Can't open file on NFS that was just written by other machine - NFS
My problem is as follows:
I have my application running on an LSF cluster (128+ nodes) that writes
files to an NFS disk.
As soon as the job has finished the files are visible with ls, but
have a file size of 0 for some time after the job is finished. A
management process running on another machine tries to open the files
as soon as LSF reports that all jobs have finished, but the files
cannot be opened by this management process yet. It sometimes takes 10
seconds before the files can be opened by this process.
-
Re: Can't open file on NFS that was just written by other machine
wrote in message
news:1c0f1271-69d8-4d51-95ec-4c3dca73c7a7@a70g2000hsh.googlegroups.com...
> My problem is as follows:
>
> I have my application running on an LSF cluster (128+ nodes) that writes
> files to an NFS disk.
> As soon as the job has finished the files are visible with ls, but
> have a file size of 0 for some time after the job is finished. A
> management process running on another machine tries to open the files
> as soon as LSF reports that all jobs have finished, but the files
> cannot be opened by this management process yet. It sometimes takes 10
> seconds before the files can be opened by this process.
>
This is a result of the NFS protocol. NFS servers will keep files locked
for a period of time after the file write ends. In your case it sounds like
this period is 10 seconds.
Mike.
-
Re: Can't open file on NFS that was just written by other machine
On Aug 10, 9:47 am, "Michael D. Ober"
wrote:
> wrote in message
>
> news:1c0f1271-69d8-4d51-95ec-4c3dca73c7a7@a70g2000hsh.googlegroups.com...
>
> > My problem is as follows:
>
> > I have my application running on an LSF cluster (128+ nodes) that writes
> > files to an NFS disk.
> > As soon as the job has finished the files are visible with ls, but
> > have a file size of 0 for some time after the job is finished. A
> > management process running on another machine tries to open the files
> > as soon as LSF reports that all jobs have finished, but the files
> > cannot be opened by this management process yet. It sometimes takes 10
> > seconds before the files can be opened by this process.
>
> This is a result of the NFS protocol. NFS servers will keep files locked
> for a period of time after the file write ends. In your case it sounds like
> this period is 10 seconds.
Really? The NFS protocol doesn't say that a server can lock the file
without the client telling it to. First, there is no reason to do so.
Second, it would seriously kill performance.
Eric, I think you might have a client caching problem. The client
caches file attributes for a few seconds, which is typically
configurable with the "actimeo=" mount option. The caching is there to
improve local access latency and so that the client doesn't generate
too much traffic.
The "file size of 0" suggests that the management process is still
using the cached file attributes, even though the file has already
changed. So you may want to look into reducing the cache timeout value.
Cheers,
bc
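-
If remounting with a shorter attribute cache timeout is not possible, the
management process could retry on its side instead. The following is only
a minimal sketch of that idea and was not proposed by anyone in the thread.
The function name wait_for_nonempty_file and its timeout and interval
defaults are made up for illustration. It assumes a Linux-style NFS client
with the default close-to-open behaviour, where opening a file prompts the
client to recheck its cached attributes with the server.

```python
import os
import time


def wait_for_nonempty_file(path, timeout=30.0, interval=0.5):
    """Poll until `path` can be opened and reports a non-zero size.

    Hypothetical helper: returns the size in bytes, or raises
    TimeoutError if the file is still empty or unreadable at the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open the file instead of only stat()-ing the path. Assumption:
            # with close-to-open consistency the open makes the client
            # revalidate cached attributes rather than trust a stale size.
            with open(path, "rb") as f:
                size = os.fstat(f.fileno()).st_size
            if size > 0:
                return size
        except OSError:
            # Not visible or not openable yet; fall through and retry.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{path} still empty or unreadable after {timeout}s"
            )
        time.sleep(interval)
```

The management process would call this for each expected output file once
LSF reports the jobs as finished, and only then start processing. A job that
legitimately writes an empty file would need a different completion signal,
since this check cannot tell "empty" from "not yet visible".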