Re: Problems with unformatted files on mpich + nfs - NFS

  1. Re: Problems with unformatted files on mpich + nfs

    antok.tm wrote:

    > I'm working on parallelizing an existing FORTRAN
    > application. Since it involves a huge number of variables per
    > iteration, I decided to use file operations (read and write) to pass
    > the values between processes.


    > If I spawn these processes (say 2 to 4 of them) on an SMP system
    > with 4 cores, the results are fine. But the problem occurs if I spawn
    > them on 2 to 4 different computers (on a cluster) with identical OS
    > and specification; the final results are wrong. Both systems (SMP and
    > cluster) use NFS for file storage. Are unformatted FORTRAN files
    > not portable between computers, even if they share the same
    > specification and compiler?

    (snip)

    > open(1,file='var6.dat',form='unformatted',status='unknown')
    > write(1) var1,var2,var3
    > close(1)


    > The read part :
    > open(1,file='var6.dat',form='unformatted',status='old')
    > read(1) var1,var2,var3
    > close(1)


    > Here is the program flow in plain language:
    > Process1 writes var6.dat ->
    > MPI barrier ->
    > Process2 reads var6.dat


    > My system specifications are:
    > OS : Linux kernel 2.6.22
    > compiler : mpich + g77
    > Filesystem : NFS


    I haven't worried about this one for a while, and added
    comp.protocols.nfs in case anyone there knows.

    NFS does some buffering, and MPI barrier may not be enough to
    make sure that all buffers are written back.

    The NFS option for synchronous writes is supposed to not return
    to the writer until the data is on a physical storage device.
    That may or may not stop any read buffering, though.
    Also, be sure you use hard mounts.

    If it is NFS buffering you should either have old data or EOF.
    Then again, there could always be bugs in the NFS implementation.

    Anyway, I believe your problem is NFS, not Fortran.

    -- glen
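
For reference, the mount options mentioned above might look like this in /etc/fstab on each cluster node. This is only a sketch: the server name and both paths are hypothetical, and `noac` (which turns off client-side attribute caching) is an addition not suggested in the post. Expect a noticeable performance cost from `sync` and `noac`, and note that these options reduce stale reads rather than guarantee their absence.

```
# Hypothetical server name and paths; adjust to the actual cluster.
# hard : retry indefinitely instead of returning I/O errors on timeout
# sync : writes reach the server before the write call returns
# noac : do not cache file attributes on the client (assumption, not from the thread)
fileserver:/export/work  /work  nfs  hard,sync,noac  0  0
```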


  2. Re: Problems with unformatted files on mpich + nfs


    In article ,
    glen herrmannsfeldt writes:
    |> antok.tm wrote:
    |>
    |> > I'm working on parallelizing an existing FORTRAN
    |> > application. Since it involves a huge number of variables per
    |> > iteration, I decided to use file operations (read and write) to pass
    |> > the values between processes.
    |>
    |> > If I spawn these processes (say 2 to 4 of them) on an SMP system
    |> > with 4 cores, the results are fine. But the problem occurs if I spawn
    |> > them on 2 to 4 different computers (on a cluster) with identical OS
    |> > and specification; the final results are wrong. Both systems (SMP and
    |> > cluster) use NFS for file storage. Are unformatted FORTRAN files
    |> > not portable between computers, even if they share the same
    |> > specification and compiler?
    |>
    |> I haven't worried about this one for a while, and added
    |> comp.protocols.nfs in case anyone there knows.
    |>
    |> NFS does some buffering, and MPI barrier may not be enough to
    |> make sure that all buffers are written back.

    It isn't. You need one of the following: the Fortran 2003 FLUSH
    statement, closing the file, or a system-dependent call to a FLUSH
    subroutine. And that IS a Fortran issue.

    |> The NFS option for synchronous writes is supposed to not return
    |> to the writer until the data is on a physical storage device.
    |> That may or may not stop any read buffering, though.
    |> Also, be sure you use hard mounts.

    Also, in NFS 3 and earlier, there are race conditions between the
    data transfer and the inode update, which can cause bizarre effects.

    This is not helped by the POSIX specification of fsync and Synchronized
    I/O File Integrity Completion - have YOU spotted what it doesn't
    require and there is no POSIX mechanism to force? :-)

    |> If it is NFS buffering you should either have old data or EOF.
    |> Then again, there could always be bugs in the NFS implementation.

    There always are - virtually every NFS implementation adds forms of
    buffering that are forbidden by NFS, because the performance is
    catastrophic if you don't.


    Regards,
    Nick Maclaren.
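
As an illustration of the FLUSH point, here is a minimal sketch of the writer side with an explicit flush before the close. It assumes a compiler that supports the Fortran 2003 FLUSH statement (gfortran, for example); g77 predates that standard, so there the equivalent would be a vendor extension such as `call flush(1)`. The variable types and values are assumptions, since the original post does not show the declarations.

```fortran
! Sketch only: assumes a Fortran 2003 compiler, not g77.
! Types and values of var1..var3 are assumptions for illustration.
program write_flush
  implicit none
  double precision :: var1, var2, var3

  var1 = 1.0d0
  var2 = 2.0d0
  var3 = 3.0d0

  open(unit=1, file='var6.dat', form='unformatted', status='unknown')
  write(1) var1, var2, var3

  ! Hand the Fortran runtime's buffer for unit 1 to the operating system.
  ! This does not by itself force the OS or the NFS client to commit
  ! the data to the server.
  flush(1)

  close(1)

  ! In the real program the MPI barrier would come here, after the close.
end program write_flush
```

Note that the flush only empties the Fortran library's own buffer. Whatever the NFS client caches underneath is a separate matter, which is the point of the rest of this post.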

  3. Re: Problems with unformatted files on mpich + nfs

    Nick Maclaren wrote:
    > (I wrote)
    > |> antok.tm wrote:


    > |> > I'm working on parallelizing an existing FORTRAN
    > |> > application. Since it involves a huge number of variables per
    > |> > iteration, I decided to use file operations (read and write) to pass
    > |> > the values between processes.


    > |> NFS does some buffering, and MPI barrier may not be enough to
    > |> make sure that all buffers are written back.


    > It isn't. You need one of the following: the Fortran 2003 FLUSH
    > statement, closing the file, or a system-dependent call to a FLUSH
    > subroutine. And that IS a Fortran issue.


    The OP's example had a CLOSE, but I snipped it out. If you
    write on one remote machine, and read on another remote machine,
    does CLOSE guarantee the changes are seen on the second machine?

    I think I remember creating files on one machine and then having to
    wait many seconds before I could see them on another machine.

    > |> The NFS option for synchronous writes is supposed to not return
    > |> to the writer until the data is on a physical storage device.
    > |> That may or may not stop any read buffering, though.
    > |> Also, be sure you use hard mounts.


    > Also, in NFS 3 and earlier, there are race conditions between the
    > data transfer and the inode update, which can cause bizarre effects.


    > This is not helped by the POSIX specification of fsync and Synchronized
    > I/O File Integrity Completion - have YOU spotted what it doesn't
    > require and there is no POSIX mechanism to force? :-)


    I gave up on locking a long time ago, when I had a system spending
    all its time in a lockd loop waiting for something that was never
    going to happen.

    > |> If it is NFS buffering you should either have old data or EOF.
    > |> Then again, there could always be bugs in the NFS implementation.


    > There always are - virtually every NFS implementation adds forms of
    > buffering that are forbidden by NFS, because the performance is
    > catastrophic if you don't.


    So, my guess is that the OP is getting a previous version of the
    file from the buffer, as the system hadn't noticed the changes.

    -- glen
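
If the reader really is seeing a previous version of the file, one way to at least detect it is to stamp each file with the iteration number and have the reader check the stamp. The sketch below is not from the thread: the stamp, the retry limit `max_tries`, and the variable types are all hypothetical additions. Writer and reader run in a single program here so the example is self-contained; in the real application they would be separate MPI processes with the barrier between them.

```fortran
! Sketch only: the iteration stamp and retry loop are illustrative
! additions, not part of the original program.
program stamp_check
  implicit none
  integer, parameter :: max_tries = 5   ! hypothetical retry limit
  integer :: iter, stamp, ios, attempt
  double precision :: var1, var2, var3
  logical :: fresh

  iter = 7                              ! iteration number known to both sides
  var1 = 1.0d0
  var2 = 2.0d0
  var3 = 3.0d0

  ! Writer side: put the iteration number in front of the data.
  open(unit=1, file='var6.dat', form='unformatted', status='unknown')
  write(1) iter, var1, var2, var3
  close(1)

  ! Reader side: reopen the file on every attempt and accept the data
  ! only if the stamp matches the expected iteration.
  fresh = .false.
  do attempt = 1, max_tries
     stamp = -1
     open(unit=1, file='var6.dat', form='unformatted', status='old', iostat=ios)
     if (ios == 0) then
        read(1, iostat=ios) stamp, var1, var2, var3
        close(1)
        if (ios == 0) then
           if (stamp == iter) then
              fresh = .true.
              exit
           end if
        end if
     end if
     ! Stale, short, or missing file: a real program would pause here
     ! before trying again.
  end do

  if (fresh) then
     print *, 'read current data for iteration', stamp
  else
     print *, 'gave up: file still looks stale'
  end if
end program stamp_check
```

This does not make NFS deliver the new data any sooner. It only turns a silently wrong result into a detectable condition, which also helps confirm whether stale data is the actual cause.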

