nfsv4; is it a whole new subsystem? - NFS


Thread: nfsv4; is it a whole new subsystem?

  1. nfsv4; is it a whole new subsystem?

    We have some issues with Linux NFS on v3 and are looking to start
    testing v4. I'm wondering if v4 would really solve the issues we're
    seeing.

    Things like:

    file not found, even though it's there a second later
    mounts disappearing, but being available a minute later

    Thanks.

    ~F

  2. Re: nfsv4; is it a whole new subsystem?


    Faeandar wrote:
    > We have some issues with Linux NFS on v3 and are looking to start
    > testing v4. I'm wondering if v4 would really solve the issues we're
    > seeing.
    >
    > Things like:
    >
    > file not found, even though it's there a second later
    > mounts disappearing, but being available a minute later


    These aren't issues with the NFSv3 protocol. Look like implementation
    issues. So changing NFS versions wouldn't necessarily make them go
    away.
    Changing the operating system on the client might, whether it is from
    Linux 2.x to 2.6, or Linux to some other UNIX that runs on x86.

    >
    > Thanks.
    >
    > ~F



  3. Re: nfsv4; is it a whole new subsystem?

    In article <1124861758.364510.293640@g44g2000cwa.googlegroups.com>,
    Mike Eisler wrote:
    >Faeandar wrote:
    >
    >> We have some issues with Linux NFS on v3 and are looking to start
    >> testing v4. I'm wondering if v4 would really solve the issues we're
    >> seeing.
    >>
    >> Things like:
    >>
    >> file not found, even though it's there a second later
    >> mounts disappearing, but being available a minute later

    >
    >These aren't issues with the NFSv3 protocol. Look like implementation
    >issues. So changing NFS versions wouldn't necessarily make them go
    >away.
    >Changing the operating system on the client might, whether it is from
    >Linux 2.x to 2.6, or Linux to some other UNIX that runs on x86.


    Not so. At least the former IS an issue with the protocol (or,
    rather, the protocol is such that the problem is unavoidable,
    even though most such problems may be implementation bugs). It is
    a failure syndrome that I am familiar with, on many different Unices.

    The second is common, too, though it should be transparent to users
    of the simple I/O interfaces (e.g. it is NOT transparent if you do
    a stat or equivalent on the mount point itself). And there is a bug
    in some Linuces (2.6 kernel) where the recovery fails, occasionally,
    so it isn't transparent even to simple I/O.

    The use of separate 'streams' for I/O and control means that there
    are necessarily race conditions as soon as you start to have a
    seriously parallel Unix implementation. I first hit this badly
    on one that provided a single system image on distributed memory
    using multiple cooperating micro-kernels (no, not Beowulf, but
    rather like Beowulf designed in).

    If that IS the cause, NFS v4 will help. In our limited experience
    of it, it does seem to help. What you CAN say is that, IF you have
    such problems with NFS v4, they ARE implementation bugs. With
    NFS v3, they probably are, but may be protocol misdesigns.


  4. Re: nfsv4; is it a whole new subsystem?


    Nick Maclaren wrote:
    > In article <1124861758.364510.293640@g44g2000cwa.googlegroups.com>,
    > Mike Eisler wrote:
    > >Faeandar wrote:
    > >
    > >> We have some issues with Linux NFS on v3 and are looking to start
    > >> testing v4. I'm wondering if v4 would really solve the issues we're
    > >> seeing.
    > >>
    > >> Things like:
    > >>
    > >> file not found, even though it's there a second later
    > >> mounts disappearing, but being available a minute later

    > >
    > >These aren't issues with the NFSv3 protocol. Look like implementation
    > >issues. So changing NFS versions wouldn't necessarily make them go
    > >away.
    > >Changing the operating system on the client might, whether it is from
    > >Linux 2.x to 2.6, or Linux to some other UNIX that runs on x86.

    >
    > Not so. At least the former IS an issue with the protocol (or,
    > rather, the protocol is such that the problem is unavoidable,


    Nick,

    Thanks for the nice comments on v4. Some questions ...

    How does NFSv3 make files disappear and re-appear?

    > even though most such problems may be implementation bugs). It is
    > a failure syndrome that I am familiar with, on many different Unices.
    >
    > The second is common, too, though it should be transparent to users
    > of the simple I/O interfaces (e.g. it is NOT transparent if you do
    > a stat or equivalent on the mount point itself). And there is a bug
    > in some Linuces (2.6 kernel) where the recovery fails, occasionally,
    > so it isn't transparent even to simple I/O.


    How does the protocol cause mounts to go away? (And what do you mean
    by "mounts going away"?)

    > The use of separate 'streams' for I/O and control means that there
    > are necessarily race conditions as soon as you start to have a


    What's an example of an NFSv3 procedure that does control?

    > seriously parallel Unix implementation. I first hit this badly
    > on one that provided a single system image on distributed memory
    > using multiple cooperating micro-kernels (no, not Beowulf, but
    > rather like Beowulf designed in).
    >
    > If that IS the cause, NFS v4 will help. In our limited experience


    I'm pleased NFSv4 is solving problems for you. That was the idea.

    > of it, it does seem to help. What you CAN say is that, IF you have
    > such problems with NFS v4, they ARE implementation bugs. With
    > NFS v3, they probably are, but may be protocol misdesigns.



  5. Re: nfsv4; is it a whole new subsystem?


    In article <1124872914.472027.271240@f14g2000cwb.googlegroups.com>,
    "Mike Eisler" writes:
    |>
    |> Thanks for the nice comments on v4. Some questions ...
    |>
    |> How does NFSv3 make files disappear and re-appear?

    Because the stat daemon is distinct from the I/O protocol. Both
    for performance reasons and, more seriously, RAS ones, it isn't
    possible to implement the strictest form of global synchronisation
    on a parallel system.

    |> How does the protocol cause mounts to go away? (And what do you mean
    |> when by "mounts going away")?

    The protocol doesn't. It's automounters :-( The problem occurs
    with failure to recover from "stale file handle". It hasn't become
    hard enough here, YET, for us to chase down the failure. It would
    be a nightmare to do!

    |> > The use of separate 'streams' for I/O and control means that there
    |> > are necessarily race conditions as soon as you start to have a
    |>
    |> What's an example of an NFSv3 procedure that does control?

    As above. Stat, locking etc.

    |> > seriously parallel Unix implementation. I first hit this badly
    |> > on one that provided a single system image on distributed memory
    |> > using multiple cooperating micro-kernels (no, not Beowulf, but
    |> > rather like Beowulf designed in).
    |> >
    |> > If that IS the cause, NFS v4 will help. In our limited experience
    |>
    |> I'm pleased NFSv4 is solving problems for you. That was the idea.

    Well, it would if a certain vendor would get a move on and fix
    a critical bug in their kernel/STREAMS/NFS implementation and we
    could move to it.

    We would also like a less brain-dead security implementation than
    Kerberos - ideally, SSH keys, with the option for host-based
    authentication and all. If the answer is Kerberos, you can be
    certain that you have asked the wrong question - it's nearly as
    bad as the X Windowing System :-(


    Regards,
    Nick Maclaren.

  6. Re: nfsv4; is it a whole new subsystem?

    On 23 Aug 2005 22:35:58 -0700, "Mike Eisler"
    wrote:

    >
    >Faeandar wrote:
    >> We have some issues with Linux NFS on v3 and are looking to start
    >> testing v4. I'm wondering if v4 would really solve the issues we're
    >> seeing.
    >>
    >> Things like:
    >>
    >> file not found, even though it's there a second later
    >> mounts disappearing, but being available a minute later

    >
    >These aren't issues with the NFSv3 protocol. Look like implementation
    >issues. So changing NFS versions wouldn't necessarily make them go
    >away.
    >Changing the operating system on the client might, whether it is from
    >Linux 2.x to 2.6, or Linux to some other UNIX that runs on x86.
    >
    >>
    >> Thanks.
    >>
    >> ~F


    So, as Nick has mentioned there are v3 specific issues. Or at least
    it seems that way.

    I can't honestly say what the exact issues are but I can say:

    1) we don't have any of the problems in Solaris
    2) both the Solaris and Linux gurus here agree it's a Linux issue
    3) both agree it's specifically NFS on Linux
    4) both know more about these OSes than me

    Sooooo, I figure let's try v4 and see what happens, but I wanted to
    know if it was completely different or just layered.

    ~F

  7. Re: nfsv4; is it a whole new subsystem?

    In article ,
    Faeandar wrote:
    >
    >So, as Nick has mentioned there are v3 specific issues. Or at least
    >it seems that way.
    >
    >I can't honestly say what the exact issues are but I can say:
    >
    >1) we don't have any of the problems in Solaris


    Oh, yes, you do. They may occur sufficiently rarely that you don't
    notice them, but I can assure you that they are there in Solaris
    and will occur in all configurations. The point is that, like most
    such race conditions, the probability varies IMMENSELY with the
    exact configuration and use. What can be once an hour for a large,
    parallel system with a difficult workload could be once a century
    for a small, serial one with an easier workload.

    >2) both the Solaris and Linux guru's here agree it's a Linux issue


    They may be right with the particular failures you see - there are
    definitely more bugs in Linux's NFS than in Solaris's.

    >3) both agree it's specifically NFS on Linux


    Until they have identified the actual bug, how can they tell? I have
    seen HUNDREDS of generic bugs that happen to show up on only a single
    system.

    >4) both know more about these OS's than me


    Probably than me, but I probably know about a wider range than they do.

    >Sooooo, I figure lets try v4 and see what happens, but I wanted to
    >know if it was completely different or just layered.


    It is definitely different, but not completely different. It most
    definitely isn't just layered on top of NFS v3.


    Regards,
    Nick Maclaren.

  8. Re: nfsv4; is it a whole new subsystem?


    Nick Maclaren wrote:
    > In article <1124872914.472027.271240@f14g2000cwb.googlegroups. com>,
    > "Mike Eisler" writes:
    > |>
    > |> Thanks for the nice comments on v4. Some questions ...
    > |>
    > |> How does NFSv3 make files disappear and re-appear?
    >
    > Because the stat daemon is distinct from the I/O protocol. Both


    What stat daemon? Walk me through how this stat daemon and the I/O
    protocol make files disappear.

    > for performance reasons and, more seriously, RAS ones, it isn't
    > possible to implement the strictest form of global synchronisation
    > on a parallel system.


    That's what byte range locking is for, albeit at a performance cost
    (which is mitigated somewhat with NFSv4 delegations).
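    For concreteness, here is a minimal sketch of the byte-range locking
    being referred to, as an application on an NFS client would issue it,
    using Python's fcntl module on a scratch local file (the path is made
    up; over NFSv3 these requests travel via the separate NLM lock
    protocol, while NFSv4 carries locking in the core protocol):

    ```python
    import fcntl
    import os
    import tempfile

    # Scratch file standing in for a file on an NFS mount.
    path = os.path.join(tempfile.mkdtemp(), "shared.dat")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    os.write(fd, b"x" * 1024)

    # Lock bytes 0-511 exclusively. On NFSv3 this goes out-of-band via
    # NLM (lockd); on NFSv4 it is a LOCK operation in the same protocol.
    fcntl.lockf(fd, fcntl.LOCK_EX, 512, 0)

    # ... read-modify-write the locked range here ...

    # Release the byte range and close the file.
    fcntl.lockf(fd, fcntl.LOCK_UN, 512, 0)
    os.close(fd)
    ```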

    > |> How does the protocol cause mounts to go away? (And what do you mean
    > |> when by "mounts going away")?
    >
    > The protocol doesn't. It's automounters :-( The problem occurs


    Ah, so there are problems specific to implementations.

    > with failure to recover from "stale file handle". It hasn't become
    > hard enough here, YET, for us to chase down the failure. It would
    > be a nightmare to do!


    How do you know you are actually getting stale file handle errors
    from the NFS server?

    I worked on one customer problem a while back where the
    Linux NFS client was reporting ESTALE back to the
    app because it claimed the inode# of the file changed.

    Reporting ESTALE for this is an incorrect implementation.

    Packet traces verified that the NFS server was not changing the
    fileid.

    Turned out the automounter being used, which was the old fashioned
    kind that was implemented as a fake NFS server, was the source of
    the changed fileid.

    Modern, autofs-based automounters wouldn't do this.
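    Assuming that diagnosis is right, the symptom can be simulated on a
    local filesystem: hold on to a path while the file underneath it is
    replaced, and watch the inode number (what NFS exposes as the fileid)
    change. The paths below are hypothetical:

    ```python
    import os
    import tempfile

    # Local simulation (not NFS) of the failure mode: something beneath
    # the application replaces the file, changing its inode number --
    # the fileid that the old fake-NFS-server automounters could change
    # on a remount, which some Linux clients then reported as ESTALE.
    path = os.path.join(tempfile.mkdtemp(), "data")
    with open(path, "w") as f:
        f.write("v1")
    ino_before = os.stat(path).st_ino

    # Replace via rename rather than rewriting in place; both files
    # exist momentarily, so the inode number must differ.
    with open(path + ".new", "w") as f:
        f.write("v2")
    os.rename(path + ".new", path)
    ino_after = os.stat(path).st_ino

    print("fileid changed:", ino_before != ino_after)  # fileid changed: True
    ```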

    > |> > The use of separate 'streams' for I/O and control means that there
    > |> > are necessarily race conditions as soon as you start to have a
    > |>
    > |> What's an example of an NFSv3 procedure that does control?
    >
    > As above. Stat, locking etc.
    >
    > |> > seriously parallel Unix implementation. I first hit this badly
    > |> > on one that provided a single system image on distributed memory
    > |> > using multiple cooperating micro-kernels (no, not Beowulf, but
    > |> > rather like Beowulf designed in).
    > |> >
    > |> > If that IS the cause, NFS v4 will help. In our limited experience
    > |>
    > |> I'm pleased NFSv4 is solving problems for you. That was the idea.
    >
    > Well, it would if a certain vendor would get a move on and fix
    > a critical bug in their kernel/STREAMS/NFS implementation and we
    > could move to it.
    >
    > We would also like a less brain-dead security implementation than
    > Kerberos - ideally, SSH keys, with the option for host-based
    > authentication and all. If the answer is Kerberos, you can be
    > certain that you have asked the wrong question - it's nearly as
    > bad as the X Windowing System :-(


    Most computers in the world use Kerberos now.


  9. Re: nfsv4; is it a whole new subsystem?

    On 24 Aug 2005 18:45:02 GMT, nmm1@cus.cam.ac.uk (Nick Maclaren) wrote:

    >In article ,
    >Faeandar wrote:
    >>
    >>So, as Nick has mentioned there are v3 specific issues. Or at least
    >>it seems that way.
    >>
    >>I can't honestly say what the exact issues are but I can say:
    >>
    >>1) we don't have any of the problems in Solaris

    >
    >Oh, yes, you do. They may occur sufficiently rarely that you don't
    >notice them, but I can assure you that they are there in Solaris
    >and will occur in all configurations. The point is that, like most
    >such race conditions, the probability varies IMMENSELY withe the
    >exact configuration and use. What can be once an hour for a large,
    >parallel system with a difficult workload could be once a century
    >for a small, serial one with an easier workload.


    Perhaps it is then, we just know that the stability of Solaris is many
    factors greater than Linux when it comes to NFS.

    The environment is one of chip design and subsequent simulation runs.
    The Linux boxes are notorious for choking on file access where the
    Solaris hosts just chug away without experiencing the problems. At
    least not noticeably anyway.

    >
    >>2) both the Solaris and Linux guru's here agree it's a Linux issue

    >
    >They may be right with the particular failures you see - there are
    >definitely more bugs in Linux's NFS than in Solaris's.
    >
    >>3) both agree it's specifically NFS on Linux

    >
    >Until they have identified the actual bug, how can they tell? I have
    >seen HUNDREDS of generic bugs that happen to show up on only a single
    >system.


    They may have, I just haven't had the time to go over it with them
    yet. We're about to embark on a full blown v4 evaluation several
    months ahead of schedule simply due to the Linux instability.

    >
    >>4) both know more about these OS's than me

    >
    >Probably than me, but I probably know about a wider range than they do.
    >
    >>Sooooo, I figure lets try v4 and see what happens, but I wanted to
    >>know if it was completely different or just layered.

    >
    >It is definitely different, but not completely different. It most
    >definitely isn't just layered on top of NFS v3.
    >
    >
    >Regards,
    >Nick Maclaren.


    ~F

  10. Re: nfsv4; is it a whole new subsystem?

    In article ,
    Faeandar wrote:
    >
    >Perhaps it is then, we just know that the stability of Solaris is many
    >factors greater than Linux when it comes to NFS.


    Yes, that is precisely what is the case! Linux is a HELL of a lot
    better than it was even two years ago, but is still a fair way behind
    Solaris in terms of the stability of its NFS - and especially its
    NFS server.


    Regards,
    Nick Maclaren.

  11. Re: nfsv4; is it a whole new subsystem?


    In article <1124922133.400912.259120@o13g2000cwo.googlegroups.com>,
    "Mike Eisler" writes:
    |> Nick Maclaren wrote:
    |> > In article <1124872914.472027.271240@f14g2000cwb.googlegroups.com>,
    |> > "Mike Eisler" writes:
    |> > |>
    |> > |> Thanks for the nice comments on v4. Some questions ...
    |> > |>
    |> > |> How does NFSv3 make files disappear and re-appear?
    |> >
    |> > Because the stat daemon is distinct from the I/O protocol. Both
    |>
    |> What stat daemon? Walk me through how this stat daemon and the I/O
    |> protocol make files disappear.

    On Solaris:

    franklin-2$ps -ef | grep statd
    daemon 358 1 0 Mar 30 ? 0:00 /usr/lib/nfs/statd

    No, I can't tell you what the detailed problem is without becoming
    a first-level expert on that area, but I can tell you two things:

    It is a known fact in computational theory that it is
    impossible to achieve perfect synchronisation of an arbitrary
    number of independent channels without the unacceptable overhead
    of global synchronisation. Yes, you can block any one problem,
    but at the expense of another springing up elsewhere. This is
    mathematically provable.

    I have observed the problem on every Unix that I have used
    seriously on large, parallel systems, including Solaris 9 and 10.

    |> > for performance reasons and, more seriously, RAS ones, it isn't
    |> > possible to implement the strictest form of global synchronisation
    |> > on a parallel system.
    |>
    |> That's what byte range locking is for, albeit at a performance cost
    |> (which is mitigated somewhat with NFSv4 delegations).

    No, no, no! Absolutely NOT!!! There are so many things wrong with
    that belief, as any ex-mainframe person can tell you, that it is
    almost impossible to explain why. Start with the fact that it does
    not synchronise the inode and file contents data (which is where
    statd comes in), and then go on to the fact that it doesn't provide
    consistency for the applications that have guaranteed temporal
    synchronicity by the use of non-NFS means (e.g. pipes, signals, MPI
    etc.) And there is MUCH more :-(

    |> > |> How does the protocol cause mounts to go away? (And what do you mean
    |> > |> when by "mounts going away")?
    |> >
    |> > The protocol doesn't. It's automounters :-( The problem occurs
    |>
    |> Ah, so there are problems specific to implementations.

    The root cause of the problem is that the protocol requires more
    than just Unix semantics - it requires a system that uses global
    synchronisation to an extent that is incompatible with RAS, let
    alone performance. And NO serious modern Unix provides that.

    |> > with failure to recover from "stale file handle". It hasn't become
    |> > hard enough here, YET, for us to chase down the failure. It would
    |> > be a nightmare to do!
    |>
    |> How do you know you are actually getting stale file handle errors
    |> from the NFS server?

    I never said that I was.

    |> I worked on one customer problem a while back where the
    |> Linux NFS client was reporting ESTALE back to the
    |> app because it claimed the inode# of the file changed.
    |>
    |> Reporting ESTALE for this is incorrect implementation.

    That could well be the issue.

    |> > We would also like a less brain-dead security implementation than
    |> > Kerberos - ideally, SSH keys, with the option for host-based
    |> > authentication and all. If the answer is Kerberos, you can be
    |> > certain that you have asked the wrong question - it's nearly as
    |> > bad as the X Windowing System :-(
    |>
    |> Most computers in the world use Kerberos now.

    Most administrators don't. It is completely brain-dead, and is
    incompatible both with most modern systems (including Solaris) and
    with running large systems with a small number of staff.


    Regards,
    Nick Maclaren.

  12. Re: nfsv4; is it a whole new subsystem?


    Nick Maclaren wrote:
    > In article <1124922133.400912.259120@o13g2000cwo.googlegroups.com>,
    > "Mike Eisler" writes:
    > |> Nick Maclaren wrote:
    > |> > In article <1124872914.472027.271240@f14g2000cwb.googlegroups.com>,
    > |> > "Mike Eisler" writes:
    > |> > |>
    > |> > |> Thanks for the nice comments on v4. Some questions ...
    > |> > |>
    > |> > |> How does NFSv3 make files disappear and re-appear?
    > |> >
    > |> > Because the stat daemon is distinct from the I/O protocol. Both
    > |>
    > |> What stat daemon? Walk me through how this stat daemon and the I/O
    > |> protocol make files disappear.
    >
    > On Solaris:
    >
    > franklin-2$ps -ef | grep statd
    > daemon 358 1 0 Mar 30 ? 0:00 /usr/lib/nfs/statd
    >


    Nick, that's the status monitor. It doesn't do anything on NFS
    clients but wait for NFS servers to tell it the server has rebooted.
    On NFS servers it waits for NFS clients to reboot. In the case of
    clients, a server indication of reboot triggers lock recovery.

    > I have observed the problem on every Unix that I have used
    > seriously on large, parallel systems, including Solaris 9 and 10.


    What problem?

    > |> > for performance reasons and, more seriously, RAS ones, it isn't
    > |> > possible to implement the strictest form of global synchronisation
    > |> > on a parallel system.
    > |>
    > |> That's what byte range locking is for, albeit at a performance cost
    > |> (which is mitigated somewhat with NFSv4 delegations).
    >
    > No, no, no! Absolutely NOT!!! There are so many things wrong with
    > that belief, as any ex-mainframe person can tell you, that it is
    > almost impossible to explain why. Start with the fact that it does
    > not synchronise the inode and file contents data (which is where
    > statd comes in), and then go on to the fact that it doesn't provide

    ^^^^^^^

    You need to buy a copy of my book, Managing NFS and NIS, 2nd Edition.

    > consistency for the applications that have guaranteed temporal
    > synchronicity by the use of non-NFS means (e.g. pipes, signals, MPI
    > etc.) And there is MUCH more :-(
    >
    > |> > |> How does the protocol cause mounts to go away? (And what do you mean
    > |> > |> when by "mounts going away")?
    > |> >
    > |> > The protocol doesn't. It's automounters :-( The problem occurs
    > |>
    > |> Ah, so there are problems specific to implementations.
    >
    > The root cause of the problem is that the protocol requires more
    > than just Unix semantics - it requires a system that uses global
    > synchronisation to an extent that is incompatible with RAS, let
    > alone performance. And NO serious modern Unix provides that.


    So how is this an NFS bug?

    > |> Most computers in the world use Kerberos now.
    >
    > Most administrators don't. It is completely brain-dead, and is


    Actually they do. Windows 2000, 2003, Vista, XP all use it.
    Administrators call it Active Directory, but it is really Kerberos V5
    (and LDAP, DNS, etc.). You have to work hard to turn it off and go
    back to Lan Manager. Kerberos V5 is everywhere. Kerberos V5 has won;
    Redmond made it so.


  13. Re: nfsv4; is it a whole new subsystem?


    In article <1124978279.598230.177470@o13g2000cwo.googlegroups.com>,
    "Mike Eisler" writes:
    |>
    |> Nick, that's the status monitor. It doesn't do anything on
    |> NFS clients but wait for NFS servers to tell it the server has
    |> rebooted.
    |> On NFS servers it waits for NFS clients to reboot. In the case of
    |> clients,
    |> a server indication of reboot triggers lock recovery.

    Now, that's interesting. It indicates that the origin of the
    problem is elsewhere in the system, and therefore it may NOT be
    cured by NFS v4. Oh, joy :-(

    |> > I have observed the problem on every Unix that I have used
    |> > seriously on large, parallel systems, including Solaris 9 and 10.
    |>
    |> What problem?

    The problem about inode data, file contents and directory contents
    getting out of synchronisation with each other, in the following
    sense:

    A set of processes/threads performs a set of I/O operations
    and other communications that define a partial ordering (a DAG).

    The results of at least some of such operations are such that
    they are incompatible with the DAG.

    |> > The root cause of the problem is that the protocol requires more
    |> > than just Unix semantics - it requires a system that uses global
    |> > synchronisation to an extent that is incompatible with RAS, let
    |> > alone performance. And NO serious modern Unix provides that.
    |>
    |> So how is this an NFS bug?

    Because NFS is assuming semantics that are not specified by the
    systems it is defined to fit in with. NFS is not interesting in
    glorious isolation from an operating system and applications.

    When an application writer does that, it is called relying on
    undefined behaviour, and is invariably a bug. Why is that not
    the case for NFS?

    |> > |> Most computers in the world use Kerberos now.
    |> >
    |> > Most administrators don't. It is completely brain-dead, and is
    |>
    |> Actually they do. Windows 2000, 2003, Vista, XP all use it.
    |> Administrators
    |> call it Active Directory, but it is really Kerberos V5 (and LDAP, DNS,
    |> etc.). You have to work hard to turn it off and go back to Lan Manager.
    |> Kerberos V5 is everywhere. Kerberos V5 has won; Redmond made it so.

    There is more to computing than Microsoft Windows. There may be
    a hell of a lot of Microsoft systems around, but most of their
    'administrators' aren't. You are correct that I was referring
    primarily to Unix systems, where Kerberos is generally regarded
    with the contempt it deserves.


    Regards,
    Nick Maclaren.

  14. Re: nfsv4; is it a whole new subsystem?


    Nick Maclaren wrote:

    > |> > I have observed the problem on every Unix that I have used
    > |> > seriously on large, parallel systems, including Solaris 9 and 10.
    > |>
    > |> What problem?
    >
    > The problem about inode data, file contents and directory contents
    > getting out of synchronisation with each other, in the following


    The synchronization of the contents is a property of the local
    file system on the NFS server.

    The synchronization of the caching of the contents is the
    responsibility of the NFS client.

    > sense:
    >
    > A set of processes/threads performs a set of I/O operations
    > and other communications that define a partial ordering (a DAG).
    >
    > The results of at least some of such operations are such that
    > they are incompatible with the DAG.


    A little more concrete please?

    > |> > The root cause of the problem is that the protocol requires more
    > |> > than just Unix semantics - it requires a system that uses global
    > |> > synchronisation to an extent that is incompatible with RAS, let
    > |> > alone performance. And NO serious modern Unix provides that.
    > |>
    > |> So how is this an NFS bug?
    >
    > Because NFS is assuming semantics that are not specified by the
    > systems it is defined to fit in with. NFS is not interesting in


    Examples?

    > glorious isolation from an operating system and applications.
    >
    > When an application writer does that, it is called relying on
    > undefined behaviour, and is invariably a bug. Why is that not
    > the case for NFS?



  15. Re: nfsv4; is it a whole new subsystem?


    In article <1124984212.757769.266790@g14g2000cwa.googlegroups.com>,
    "Mike Eisler" writes:
    |> Nick Maclaren wrote:
    |>
    |> > A set of processes/threads performs a set of I/O operations
    |> > and other communications that define a partial ordering (a DAG).
    |> >
    |> > The results of at least some of such operations are such that
    |> > they are incompatible with the DAG.
    |>
    |> A little more concrete please?

    Oh, gosh. I/O operations within a single thread have an implied
    order (i.e. that of the thread's serial ordering in the von Neumann
    sense). Communications between threads imply an order between the
    threads (i.e. that of communication method, which is not always as
    simple as appears). I/O operations to a single file system will
    imply an order where the action of one operation has an effect on
    the result of another.

    All of this defines a partial order (a DAG). If, when actually
    running such a code, the results of the program do not correspond
    with any ordering that is compatible with the DAG, you have a
    consistency failure. This shows up in a high-quality program by
    a consistency check failing and in a typical one by unrepeatable
    misbehaviour.

    This is all standard knowledge in the parallel computing arena.
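    The consistency check described above can be sketched in a few lines:
    the edges encode "must happen before" constraints among I/O and IPC
    operations (the operation names here are invented), and we test
    whether an observed completion order is compatible with that partial
    order (DAG).

    ```python
    # True iff the observed total order is a linear extension of the DAG.
    def consistent_with_dag(edges, observed):
        position = {op: i for i, op in enumerate(observed)}
        return all(position[a] < position[b] for a, b in edges)

    # write1 must precede read1 (same file); a signal must precede read2.
    edges = [("write1", "read1"), ("signal", "read2")]

    print(consistent_with_dag(edges, ["write1", "signal", "read1", "read2"]))  # True
    print(consistent_with_dag(edges, ["read1", "write1", "signal", "read2"]))  # False
    ```

    A run in which the second case is what the program actually observes
    is the "consistency failure" being described.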

    |> > Because NFS is assuming semantics that are not specified by the
    |> > systems it is defined to fit in with. NFS is not interesting in
    |>
    |> Examples?

    See above.

    I will give you one very concrete example. A set of threads is
    communicating with an NFS server, both of which are running on
    parallel systems with multiple network connexions. I am pretty
    sure that NFS assumes that only one such connexion is used, so
    that its operations cannot overtake one another in flight. That
    is NOT what systems provide.


    Regards,
    Nick Maclaren.

  16. Re: nfsv4; is it a whole new subsystem?

    Nick Maclaren wrote:

    > |> What problem?
    >
    > The problem about inode data, file contents and directory contents
    > getting out of synchronisation with each other, in the following
    > sense:
    >
    > A set of processes/threads performs a set of I/O operations
    > and other communications that define a partial ordering (a DAG).
    >
    > The results of at least some of such operations are such that
    > they are incompatible with the DAG.


    Bug, problem or feature? The defined semantics of NFS do not match
    exactly *any* operating system. This is especially true of any
    application that expects exact Posix/Unix(tm) semantics, likewise
    it cannot handle all Windows semantics. There
    are some well known classes of applications that will fail, and
    there is no desire in the NFS community to fix most of them. However
    the defined semantics do meet a very large set of applications.

    You have a very valid point that any distributed data/file system
    that strives to create pure single system semantics is a *VERY*
    hard problem. Sun's PxFS did this for a tightly coupled cluster
    with small latencies, but for many applications the cost of PxFS
    was too high so it was not widely used. Providing single system
    semantics with the latencies associated with campus or WAN networks
    is extremely difficult.

    But NFS is not designed for, nor should be used for, a distributed
    application that expects single system semantics. The bug is using
    the wrong tool for the wrong job.

    This is not limited to filesystems: looking at IBM's latest X3 SMP
    chipset, it is very complex just to manage memory within a box.

    -David

  17. Re: nfsv4; is it a whole new subsystem?


    In article ,
    David Robinson writes:
    |>
    |> But NFS is not designed for, nor should be used for, a distributed
    |> application that expects single system semantics. The bug is using
    |> the wrong tool for the wrong job.

    Unfortunately, that is not the only use of it that fails. As I
    said, I have seen it fail when used from a single Unix system to
    a single Unix fileserver, running a great many different Unices.
    Of course, all of those were parallel (often highly parallel)
    SMP and similar servers :-)

    The point is that it requires a higher degree of serialisation
    than such systems actually provide.


    Regards,
    Nick Maclaren.

  18. Re: nfsv4; is it a whole new subsystem?

    Nick Maclaren wrote:
    > In article ,
    > David Robinson writes:
    > |>
    > |> But NFS is not designed for, nor should be used for, a distributed
    > |> application that expects single system semantics. The bug is using
    > |> the wrong tool for the wrong job.


    > Unfortunately, that is not the only use of it that fails. As I
    > said, I have seen it fail when used from a single Unix system to
    > a single Unix fileserver, running a great many different Unices.
    > Of course, all of those were parallel (often highly parallel)
    > SMP and similar servers :-)


    The NFS protocol also does not claim to support single system
    semantics even in the degenerate case of single client with
    single server. However, a decent client implementation will
    make the number of failures (when compared to the multi-client
    case) extremely small. A good client implementation on an SMP
    system will front NFS with a unified VM system or unified
    buffer cache so that any store ordering issues will be handled
    before the NFS client gets involved. However, interfaces such
    as directio that bypass the VM or buffer caches will face
    the same issues that a multi-client application will see.

    > The point is that it requires a higher degree of serialisation
    > than such systems actually provide.


    Architecturally, in a parallel system you must decide where the
    determination of which of two competing operations wins is made.
    The only reliable and reproducible way to accomplish that is to
    have an explicit barrier operation. In some cases the barrier is
    exposed to the application, in others it is imposed by the
    software library or operating system, and in others it is done
    at system boundaries. Even in a total store order system a
    synchronization operation is required between parallel threads.
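    That explicit barrier can be sketched with two threads;
    threading.Barrier stands in for whatever non-NFS synchronization
    (pipes, signals, MPI) an application uses, and a list stands in for
    the file:

    ```python
    import threading

    # The writer completes its I/O before the barrier; the reader only
    # starts after the barrier, so the ordering is decided outside the
    # file system, before any request would reach an NFS client.
    log = []
    barrier = threading.Barrier(2)

    def writer():
        log.append("write A")   # phase 1: only the writer acts
        barrier.wait()          # barrier: A is durable before B starts

    def reader():
        barrier.wait()          # wait until the writer has finished
        log.append("read sees A")

    t1 = threading.Thread(target=writer)
    t2 = threading.Thread(target=reader)
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(log)  # ['write A', 'read sees A']
    ```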

    NFS, by design, has no barrier operations outside explicit
    record locking. Ordering of concurrent I/O operations must be
    established before the requests are sent to the NFS client.
    Ordering may be a function of the OS or of an application
    doing non-NFS synchronization.
    Good SMP implementations will present the same ordering to
    the NFS client as they do to a "local" filesystem, so that
    this is not an issue. Quality of implementation does vary...

    -David
