Can't open file on NFS that was just written by other machine - NFS




  1. Can't open file on NFS that was just written by other machine

    My problem is as follows:

    I have my application running on an LSF cluster (128+ nodes) that writes
    files to an NFS disk.
    As soon as the job has finished the files are visible with ls, but
    have a file size of 0 for some time after the job is finished. A
    management process running on another machine tries to open the files
    as soon as LSF tells that all jobs have been finished, but the files
    cannot be opened by this management process yet. It sometimes takes 10
    seconds before the files can be opened by this process.
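
    Independent of any mount-option fix, a management process in this
    situation could poll until the file is visible and non-empty before
    opening it. A minimal sketch (the function name and the 1-second
    poll interval and default timeout are illustrative, not from the
    thread):

    ```shell
    # Stopgap sketch: poll until the file exists and has a non-zero
    # size, then let the caller open it. Returns 1 on timeout.
    wait_for_file() {
        f=$1
        tries=${2:-30}                # give up after this many 1-second polls
        while [ "$tries" -gt 0 ]; do
            [ -s "$f" ] && return 0   # -s: file exists and size > 0
            tries=$((tries - 1))
            sleep 1
        done
        echo "timed out waiting for $f" >&2
        return 1
    }
    ```

    Note that each stat of the file still goes through the client's
    attribute cache, so the loop just rides out the cache timeout rather
    than eliminating it.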

  2. Re: Can't open file on NFS that was just written by other machine

    wrote in message
    news:1c0f1271-69d8-4d51-95ec-4c3dca73c7a7@a70g2000hsh.googlegroups.com...
    > My problem is as follows:
    >
    > I have my application running on an LSF cluster (128+ nodes) that writes
    > files to an NFS disk.
    > As soon as the job has finished the files are visible with ls, but
    > have a file size of 0 for some time after the job is finished. A
    > management process running on another machine tries to open the files
    > as soon as LSF tells that all jobs have been finished, but the files
    > cannot be opened by this management process yet. It sometimes takes 10
    > seconds before the files can be opened by this process.
    >



    This is a result of the NFS protocol. NFS servers will keep files locked
    for a period of time after the file write ends. In your case it sounds like
    this period is 10 seconds.

    Mike.



  3. Re: Can't open file on NFS that was just written by other machine

    On Aug 10, 9:47 am, "Michael D. Ober"
    wrote:
    > wrote in message
    >
    > news:1c0f1271-69d8-4d51-95ec-4c3dca73c7a7@a70g2000hsh.googlegroups.com...
    >
    > > My problem is as follows:
    > >
    > > I have my application running on an LSF cluster (128+ nodes) that writes
    > > files to an NFS disk.
    > > As soon as the job has finished the files are visible with ls, but
    > > have a file size of 0 for some time after the job is finished. A
    > > management process running on another machine tries to open the files
    > > as soon as LSF tells that all jobs have been finished, but the files
    > > cannot be opened by this management process yet. It sometimes takes 10
    > > seconds before the files can be opened by this process.
    >
    > This is a result of the NFS protocol. NFS servers will keep files locked
    > for a period of time after the file write ends. In your case it sounds like
    > this period is 10 seconds.


    Really? The NFS protocol doesn't say that a server can lock a file
    without the client telling it to. First, there is no reason to do so.
    Second, it would seriously kill performance.

    Eric, I think you have a client caching problem. The client caches
    file attributes for a few seconds, which is typically configurable
    with the "actimeo=" mount option. The caching is there to improve
    local access latency and to keep the client from generating too much
    traffic.

    The "file size of 0" suggests that the management process is still
    using the cached file attributes, even though the file has already
    changed. So you may want to look into reducing the cache timeout
    value.

    Cheers,
    bc
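
    To make the suggestion above concrete, the attribute-cache timeout is
    set per mount. A hedged sketch (the server, export, and mount-point
    names are made up; exact option behavior varies by NFS client
    implementation):

    ```shell
    # Cap all four attribute-cache timeouts (acregmin/acregmax/
    # acdirmin/acdirmax) at 1 second instead of the usual defaults:
    mount -o remount,actimeo=1 server:/export /mnt/nfs

    # Or disable attribute caching entirely -- correctness over latency,
    # at the cost of an extra GETATTR round trip on each stat():
    mount -o remount,noac server:/export /mnt/nfs

    # Equivalent /etc/fstab entry for the first variant:
    # server:/export  /mnt/nfs  nfs  rw,actimeo=1  0 0
    ```

    Lowering the timeout trades extra GETATTR traffic from 128+ nodes
    for fresher metadata, so it may be worth setting it only on the
    machine running the management process.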
