NFS v4 and pNFS - NFS


Thread: NFS v4 and pNFS

  1. NFS v4 and pNFS

    I've seen the specs for both but can someone describe the features in
    layman's terms? I understand, for the most part, how NFS v3 works.
    I'm not a total novice, but by no means a coder.

    Thanks.

    ~F

  2. Re: NFS v4 and pNFS


    In article <6c6492leholo2lgnqdq7c70n4ru3c7rju2@4ax.com>,
    Faeandar writes:
    |>
    |> I've seen the specs for both but can someone describe the features in
    |> layman's terms? I understand, for the most part, how NFS v3 works.
    |> I'm not a total novice, but by no means a coder.

    I can describe some of them for NFS v4. It has merged the communication
    streams, thus eliminating the worst of the race conditions in NFS v3.
    It has added decent support for TCP, which may help for performance.
    It has added some crude authentication, but currently only has a
    horrible Kerberos interface. If I recall, it has added support for ACLs,
    but I may be wrong.

    A quick glance at pNFS makes me think that it is yet another fiasco in
    the making. In particular, it was described as a minor tweak to NFS v4
    to allow parallelism, but specified neither its parallelism model nor
    what constraints it was going to impose on the use of NFS v4. You just
    CAN'T bolt parallelism onto a serial protocol of the NFS nature and get
    it right.


    Regards,
    Nick Maclaren.

  3. Re: NFS v4 and pNFS


    Nick Maclaren wrote:

    > A quick glance at pNFS makes me think that it is yet another fiasco in
    > the making.


    Does this mean that NFSv4 is a fiasco ?
    I wondered why I can't find anyone using it despite all the hype
    around...

    I have tried to use it myself, but can't make it work reliably, or
    at all (in most cases)...


  4. Re: NFS v4 and pNFS


    In article <1150793091.644551.132110@r2g2000cwb.googlegroups.com>,
    "Brane2" writes:
    |> Nick Maclaren wrote:
    |>
    |> > A quick glance at pNFS makes me think that it is yet another fiasco in
    |> > the making.
    |>
    |> Does this mean that NFSv4 is a fiasco ?

    No. Even for its Kerberos aspect.

    |> I wondered why I can't find anyone using it despite all the hype
    |> around...
    |>
    |> I have tried to use it myself, but can't make it work reliably, or
    |> at all (in most cases)...

    What systems? It certainly works, after a fashion, under both Solaris
    and Linux. A good rule is to start by disabling as much of Kerberos
    as you can - well, actually, that is a good rule with ANYTHING involving
    Kerberos!


    Regards,
    Nick Maclaren.

  5. Re: NFS v4 and pNFS


    "Faeandar" wrote in message
    news:6c6492leholo2lgnqdq7c70n4ru3c7rju2@4ax.com...
    > I've seen the specs for both but can someone describe the features in
    > layman's terms? I understand, for the most part, how NFS v3 works.
    > I'm not a total novice, but by no means a coder.
    >
    > Thanks.
    >
    > ~F


    F,

    My short list:

    NFSV4:
    * Single port ( NAT-able, NFS becomes one package,
    instead of the collection of cooperative tools as in the past)
    * Single namespace. ( Everything in one namespace instead of
    a collection of dis-joint directories)
    * Server re-direct. ( Now able to have coarse grain distributed
    multi-nodal NFS servers that look like one NFS server)
    * File delegation. ( Clients can operate autonomously and
    greatly improve performance under the right conditions)
    * TCP. (UDP is gone YES. TCP only improves situation
    for reliability and performance)
    * IPSEC. ( Can now use IPSEC and VPNs )
    * Bull Group working on NFSV4 over IPV6. ( Eliminate the
    island effect and get back to one unified planet)
    * pNFS. ( Parallel extensions to take advantage of multiple
    NFSV4 servers acting as one, in a single namespace, with
    overlapping parallel operations ongoing to multiple nodes.)

    Enjoy,
    Postmaster.



  6. Re: NFS v4 and pNFS

    "Postmaster" writes:

    > NFSV4:
    > * Single port ( NAT-able, NFS becomes one package,
    > instead of the collection of cooperative tools as in the past)
    > * Single namespace. ( Everything in one namespace instead of
    > a collection of dis-joint directories)
    > * Server re-direct. ( Now able to have coarse grain distributed
    > multi-nodal NFS servers that look like one NFS server)
    > * File delegation. ( Clients can operate autonomously and
    > greatly improve performance under the right conditions)
    > * TCP. (UDP is gone YES. TCP only improves situation
    > for reliability and performance)
    > * IPSEC. ( Can now use IPSEC and VPNs )


    I think you mean "mandatory security mechanisms other than AUTH_UNIX".

    You can use IPsec with any IP protocol.

    > * Bull Group working on NFSV4 over IPV6. ( Eliminate the
    > island effect and get back to one unified planet)


    Why doesn't this automatically work?

    I think you forgot:

    * User id mapping (can now unify multiple disjoint collections
    of usernames even if ids are overlapping)

    Casper

  7. Re: NFS v4 and pNFS

    Nick Maclaren wrote:

    > |> I wondered why I can't find anyone using it despite all the hype
    > |> around...
    > |>
    > |> I have tried to use it myself, but can't make it work reliably, or
    > |> at all (in most cases)...
    >
    > What systems? It certainly works, after a fashion, under both Solaris
    > and Linux. A good rule is to start by disabling as much of Kerberos
    > as you can - well, actually, that is a good rule with ANYTHING involving
    > Kerberos!


    I have several dual Opterons in a local Gbit Ethernet network. One
    machine acts as fileserver and router; the other two are plain
    workstations/clients. I have Gentoo-64bit running on all of them. NFS3
    runs just fine.

    But when I try to use v4 (without any authentication), strange things
    happen:

    -if I want to see exported shares /home/some_user and /XYZ etc, I have
    to also export a share with fsid=0, and that share has to physically
    contain all the other shares. So I can't just mkdir /nfs4_main and export
    it with fsid=0, as the folder has to contain all other exported folders
    on the filesystem.
    This is not how matters are explained on nfs4.org. The specs say that a
    special virtual filesystem will be constructed with fsid=0 even if I
    don't specify it, and that this share will "glue" all other exported
    shares into one seamless (to the client) filesystem.

    -if I want to have write access to any exported share, I have to export
    the fsid=0 share as world writeable and executable and change the "/"
    attributes accordingly.

    - if "mount -t nfs4" dies, after killing the mounting process, nothing
    can be mounted "-t nfs4" until I reboot the client and server.
    Restarting nfs service doesn't help...


  8. Re: NFS v4 and pNFS

    Postmaster wrote:

    > * TCP. (UDP is gone YES. TCP only improves situation
    > for reliability and performance)


    TCP has much higher CPU overhead than UDP.
    Wouldn't UDP have an edge on fast, small, relatively reliable Ethernet
    links ?


  9. Re: NFS v4 and pNFS


    Casper H.S. Dik wrote:

    > I think you forgot:
    >
    > * User id mapping (can now unify multiple disjoint collections
    > of usernames even if ids are overlapping)


    Is there any more info on how to configure that ? I have noticed that
    nfs4 gets the username and group attributes right when writing from
    client, even though the uids of many users don't correspond exactly, so
    defaults obviously work, but there are situations, where explicit user
    mapping is needed...
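
    One way to picture why the defaults already get names right: NFSv4
    carries file owners on the wire as name@domain strings rather than
    numeric ids, so each side translates through its own user database.
    The toy sketch below only illustrates that idea. The user tables, the
    domain and the function names are all hypothetical, and the fallback
    to a "nobody" id is an assumption about typical behaviour. On Linux
    the real mapping is normally done by rpc.idmapd, configured in
    /etc/idmapd.conf, which is not shown here.

```python
DOMAIN = "example.org"   # hypothetical NFSv4 domain shared by both hosts
NOBODY_UID = 65534       # assumed fallback id for unmappable owners

# Hypothetical user databases: same names, different numeric uids.
client_passwd = {"alice": 1001, "bob": 1002}
server_passwd = {"alice": 2005, "bob": 1001}


def uid_to_wire(uid, passwd, domain=DOMAIN):
    """Translate a local numeric uid into a name@domain owner string."""
    for name, local_uid in passwd.items():
        if local_uid == uid:
            return f"{name}@{domain}"
    # Assumption: unknown uids are sent as nobody@domain.
    return f"nobody@{domain}"


def wire_to_uid(owner, passwd, domain=DOMAIN):
    """Translate a name@domain owner string into a local numeric uid."""
    name, _, owner_domain = owner.partition("@")
    if owner_domain != domain:
        return NOBODY_UID          # foreign domain: cannot be mapped
    return passwd.get(name, NOBODY_UID)


# A file created by client uid 1001 (alice) ends up owned by server uid 2005.
owner_on_wire = uid_to_wire(1001, client_passwd)
print(owner_on_wire, "->", wire_to_uid(owner_on_wire, server_passwd))
```

    In this model the numeric uids never have to agree between hosts;
    only the names and the domain do. A mismatch in the domain is the
    case where everything collapses to "nobody".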


  10. Re: NFS v4 and pNFS


    In article <1150876821.942825.26000@i40g2000cwc.googlegroups.com>,
    "Brane2" writes:
    |> Postmaster wrote:
    |>
    |> > * TCP. (UDP is gone YES. TCP only improves situation
    |> > for reliability and performance)
    |>
    |> TCP has much higher CPU overhead than UDP.
    |> Wouldn't UDP have an edge on fast, small, relatively reliable Ethernet
    |> links ?

    Only if you don't mind NFS going bonkers every now and then. The cost
    of doing the error detection and recovery is similar, whether it is
    done in NFS using UDP or in TCP.


    Regards,
    Nick Maclaren.

  11. Re: NFS v4 and pNFS


    "Nick Maclaren" wrote in message
    news:e7b1q3$49t$1@gemini.csx.cam.ac.uk...
    >
    > In article <1150876821.942825.26000@i40g2000cwc.googlegroups.com>,
    > "Brane2" writes:
    > |> Postmaster wrote:
    > |>
    > |> > * TCP. (UDP is gone YES. TCP only improves situation
    > |> > for reliability and performance)
    > |>
    > |> TCP has much higher CPU overhead than UDP.
    > |> Wouldn't UDP have an edge on fast, small, relatively reliable Ethernet
    > |> links ?
    >
    > Only if you don't mind NFS going bonkers every now and then. The cost
    > of doing the error detection and recovery is similar, whether it is
    > done in NFS using UDP or in TCP.
    >
    >
    > Regards,
    > Nick Maclaren.


    Nick,

    For the most part I agree with you :-) But retransmits by the
    NFS layer are NFS block size (big) where the retransmits
    at the TCP layer are smaller. .... So, if you drop a single
    packet in UDP you get the entire NFS block re-transmitted.
    (And let's not forget those nasty re-trans timeouts!)
    But, if you dropped a single packet in TCP, you only get
    the packet re-transmitted. This performs oh so much
    better. Also, switch manufacturers these days rely on UDP
    being an un-reliable transport, and they prioritize traffic
    in the switch. Once things get hopping, UDP is pretty much
    sent to the dumpster. (you'd be better off moving your
    data with 5.25" floppies and sneaker-net. Heck, you might
    be better off with a TRS-80 Model 1 tape drive. Wesley
    Crusher, and NFS over UDP.. die--die--die :-)

    Simplified version:
    NFS over UDP -> Evil lurking performance nightmares.
    NFS over TCP -> A tad more overhead, but WAY better
    solution, with much better reliability.

    I'm VERY glad that NFSV4 does not support UDP!

    Zen State: NFSV4 over TCP (with TOE), IPV6, RDMA, and
    pNFS, and throw in some 10GigE, IB, Elan, or Myrinet
    to spice things up a bit :-)

    Enjoy,
    Postmaster.



  12. Re: NFS v4 and pNFS

    On Tue, 20 Jun 2006 16:26:23 GMT, "Postmaster"
    wrote:

    >
    >"Faeandar" wrote in message
    >news:6c6492leholo2lgnqdq7c70n4ru3c7rju2@4ax.com...
    >> I've seen the specs for both but can someone describe the features in
    >> layman's terms? I understand, for the most part, how NFS v3 works.
    >> I'm not a total novice, but by no means a coder.
    >>
    >> Thanks.
    >>
    >> ~F

    >
    >F,
    >
    > My short list:
    >
    > NFSV4:
    > * Single port ( NAT-able, NFS becomes one package,
    > instead of the collection of cooperative tools as in the past)
    > * Single namespace. ( Everything in one namespace instead of
    > a collection of dis-joint directories)
    > * Server re-direct. ( Now able to have coarse grain distributed
    > multi-nodal NFS servers that look like one NFS server)
    > * File delegation. ( Clients can operate autonomously and
    > greatly improve performance under the right conditions)
    > * TCP. (UDP is gone YES. TCP only improves situation
    > for reliability and performance)
    > * IPSEC. ( Can now use IPSEC and VPNs )
    > * Bull Group working on NFSV4 over IPV6. ( Eliminate the
    > island effect and get back to one unified planet)
    > * pNFS. ( Parallel extensions to take advantage of multiple
    > NFSV4 servers acting as one, in a single namespace, with
    > overlapping parallel operations ongoing to multiple nodes.)
    >
    >Enjoy,
    >Postmaster.
    >



    My thanks to you, Casper, and Nick. This is the information I was
    looking for at the level I needed.

    ~F

  13. Re: NFS v4 and pNFS


    Postmaster wrote:
    > Also, switch manufacturers these days rely on UDP
    > being an un-reliable transport, and they prioritize traffic
    > in the switch. Once things get hopping, UDP is pretty much
    > sent to the dumpster. (you'd be better off moving your
    > data with 5.25" floppies and sneaker-net. Heck, you might
    > be better off with a TRS-80 Model 1 tape drive. Wesley
    > Crusher, and NFS over UDP.. die--die--die :-)
    >


    Funny, it always worked for me. I can easily get 80 MB/s or more linear
    read speed on NFSv3 over 1G Ethernet. I also keep all the VMware virtual
    machines on the server and access them through NFSv3/UDP/Gbit Ethernet.
    It has always worked fine, even after a client or (very rare) server crash.

    I also use Linksys "SD-2008" 8xGbit switch and never had a problem with
    NFSv3/UDP, even with heavy traffic.

    TCP can present a considerable burden, and decent TCP offload Ethernet
    cards aren't exactly growing on trees AFAIK. I have been trying to find
    a decent multiple Gbit E card with TOE, but everything was $2000 or
    over... If you have a suggestion for a decent Gbit Eth interface, I'd
    love to hear it.

    TCP's error detection and retransmission can be an advantage only with
    good enough HW and with a high enough error rate, which on a local LAN
    with short distances is kind of hard to get. One would have to get a
    crappy switch for that, but what $$$ would be saved by going for expensive
    Eth. cards ?


  14. Re: NFS v4 and pNFS

    Postmaster wrote:
    > For the most part I agree with you :-) But retransmits by
    > the NFS layer are NFS block size (big) where the retransmits at
    > the TCP layer are smaller. .... So, if you drop a single packet
    > in UDP you get the entire NFS block re-transmitted. (And let's


    IIRC everything NFS sends is always a single datagram at the UDP
    layer. It might be more accurate to say that when the NFS message is
    larger than the MTU and so the IP datagram containing the UDP datagram
    and NFS message is fragmented, if any one of the IP datagram fragments
    for that UDP datagram are lost, the entire UDP datagram is lost and so
    the NFS message must be resent.

    I may have some of the math wrong (I slept too often in probstats) but
    if we have a packet loss probability p, the probability of the packet
    not being lost is (1-p) which means then that the probability of the
    entire fragmented IP/UDP datagram carrying the NFS message not being
    lost becomes (1-p)^N where N is the number of fragments. For
    something like a 32768 byte mount, that means N is ~= 22. Now, _that_
    can be rather nasty.
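
    The estimate above can be sketched in a few lines of Python. This is
    only an illustration under stated assumptions: IPv4 with a 20-byte
    header and no options, an 8-byte UDP header, independent packet
    losses, and RPC/NFS header overhead ignored. The function names are
    made up for this example.

```python
import math

IP_HEADER = 20    # bytes; IPv4 header without options (assumption)
UDP_HEADER = 8    # bytes


def fragment_count(payload_bytes, mtu=1500):
    """Number of IP fragments needed to carry one UDP datagram."""
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((payload_bytes + UDP_HEADER) / per_fragment)


def datagram_loss_probability(p, n_fragments):
    """Chance that at least one of n independently lost fragments is missing."""
    return 1.0 - (1.0 - p) ** n_fragments


n = fragment_count(32768)              # 32 KB payload on a 1500-byte MTU
for p in (0.0001, 0.001, 0.01):
    print(f"p={p}: N={n}, datagram loss={datagram_loss_probability(p, n):.4f}")
```

    By this count a 32768-byte payload needs 23 fragments on a 1500-byte
    MTU, close to the ~22 estimated above, and a per-packet loss rate of
    0.001 gives a datagram loss of a little over 2%. With a 9000-byte
    MTU and an 8192-byte payload the same function returns a single
    fragment, so the multiplication effect disappears.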

    Of course, not all NFS messages are larger than the MTU.

    > not forget those nasty re-trans timeouts! ) But, if you dropped
    > a single packet in TCP, you only get the packet
    > re-transmitted. This performs oh so much better.


    Depending on how things were segmented, if there were a series of
    sub-MSS NFS requests sent, a retransmission of the TCP segment
    carrying the first could also carry retransmissions of the later
    ones as most TCP stacks will retransmit a full-MSS worth of data
    starting from the perceived hole in the sequence space.

    With UDP as the transport that wouldn't happen because each NFS
    request/response is a separate UDP datagram.
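
    A rough model of the TCP case described above, assuming the stack
    resends exactly one MSS worth of bytes starting at the hole and that
    all the requests had already been sent back to back on the same
    connection. Real stacks differ in detail; the function name and the
    request sizes are hypothetical.

```python
def requests_in_retransmit(request_sizes, lost_index, mss=1460):
    """Indexes of requests whose bytes ride along in a one-MSS retransmit.

    request_sizes: sizes in bytes of NFS requests laid out back to back
                   in the TCP byte stream.
    lost_index:    the request at whose first byte the hole begins.
    """
    # Work out the [start, end) byte range of every request in the stream.
    ranges = []
    position = 0
    for size in request_sizes:
        ranges.append((position, position + size))
        position += size

    hole_start = ranges[lost_index][0]
    window_end = hole_start + mss      # one full-MSS segment is resent

    # A request is (at least partly) resent if it overlaps that window.
    return [i for i, (start, end) in enumerate(ranges)
            if start < window_end and end > hole_start]


# Four 200-byte requests; the segment carrying the first one is lost.
print(requests_in_retransmit([200, 200, 200, 200], lost_index=0))
```

    With four 200-byte requests and a 1460-byte MSS, a hole at the first
    request drags all four along in the retransmitted segment, whereas
    over UDP only the one lost datagram would be resent.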

    Now, as for how often that happens...

    > Also, switch manufacturers these days rely on UDP being an
    > un-reliable transport, and they prioritize traffic in the
    > switch.


    I had heard of Cisco switches doing that, but had not heard of other
    switches doing that - specifically which switches do you have in mind?

    rick jones
    --
    The computing industry isn't as much a game of "Follow The Leader" as
    it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
    - Rick Jones
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  15. Re: NFS v4 and pNFS


    Rick Jones wrote:

    > I may have some of the math wrong (I slept too often in probstats) but
    > if we have a packet loss probability p, the probability of the packet
    > not being lost is (1-p) which means then that the probability of the
    > entire fragmented IP/UDP datagram carrying the NFS message not being
    > lost becomes (1-p)^N where N is the number of fragments. For
    > something like a 32768 byte mount, that means N is ~= 22. Now, _that_
    > can be rather nasty.


    I think your calculus here is rather flawed, otherwise it would be
    almost a miracle to get ANYTHING through NFS with a smallish MTU, e.g.
    800...

    Besides, that's what jumbo frames are for ;o)
    I use MTU=9000 with NFS mount options rsize=8192 and wsize=8192 and
    everything works beautifully...


  16. Re: NFS v4 and pNFS

    Brane2 wrote:
    > Rick Jones wrote:


    >> I may have some of the math wrong (I slept too often in probstats)
    >> but if we have a packet loss probability p, the probability of the
    >> packet not being lost is (1-p) which means then that the
    >> probability of the entire fragmented IP/UDP datagram carrying the
    >> NFS message not being lost becomes (1-p)^N where N is the number of
    >> fragments. For something like a 32768 byte mount, that means N is
    >> ~= 22. Now, _that_ can be rather nasty.


    > I think your calculus here is rather flawed, otherwise it would be
    > almost a miracle to get ANYTHING through NFS with a smallish MTU, e.g.
    > 800...


    Indeed, it would be a miracle. That is one reason why IP
    fragmentation is considered so "bad."

    WRT the math - think coin flips and the probability of getting heads N
    times in a row.

    > Besides, that's what jumbo frames are for ;o)
    > I use MTU=9000 with NFS mount options rsize=8192 and wsize=8192 and
    > everything works beautifully...


    That likely would work rather well. Of course, increasing packet
    sizes further reduces the transport's contribution to the overall
    overhead, which means the difference between using UDP and TCP would
    tend towards epsilon.

    rick jones
    --
    No need to believe in either side, or any side. There is no cause.
    There's only yourself. The belief is in your own precision. - Jobert
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  17. Re: NFS v4 and pNFS


    In article ,
    Rick Jones writes:
    |>
    |> IIRC everything NFS sends is always a single datagram at the UDP
    |> layer. It might be more accurate to say that when the NFS message is
    |> larger than the MTU and so the IP datagram containing the UDP datagram
    |> and NFS message is fragmented, if any one of the IP datagram fragments
    |> for that UDP datagram are lost, the entire UDP datagram is lost and so
    |> the NFS message must be resent.

    OK.

    |> I may have some of the math wrong (I slept too often in probstats) but
    |> if we have a packet loss probability p, the probability of the packet
    |> not being lost is (1-p) which means then that the probability of the
    |> entire fragmented IP/UDP datagram carrying the NFS message not being
    |> lost becomes (1-p)^N where N is the number of fragments. For
    |> something like a 32768 byte mount, that means N is ~= 22. Now, _that_
    |> can be rather nasty.

    Grrk. Yes, the formula is correct, but the analysis isn't! Firstly,
    p is normally VERY small - well below 10^-3 - which means that the
    probability of the overall loss is still small. Secondly, that assumes
    no correlation, and packet losses are often strongly correlated, which
    (surprisingly) is likely to REDUCE the overall loss probability.

    Yes, when things go sour, they can go VERY sour. But TCP itself isn't
    reliable with a seriously unreliable transport, as its timeout-based
    recovery is prone to fail horribly under certain conditions. I don't
    know if that is a bug in the RFC or the Berkeley reference implementation,
    or whether it is due to the timing/size constants having changed over
    the years, but I have seen one particular syndrome on 3 wildly different
    systems, with NOTHING in common in the hardware, transport or software
    (except the design they wrote from, whether RFC or Berkeley). And I
    have seen others where I suspect a similar effect.

    I have less experience with NFS, but what experience I have is fairly
    similar, even over UDP. Except the players are different :-)


    Regards,
    Nick Maclaren.

  18. Re: NFS v4 and pNFS

    Nick Maclaren wrote:
    > In article ,
    > Rick Jones writes:


    > |> I may have some of the math wrong (I slept too often in
    > |> probstats) but if we have a packet loss probability p, the
    > |> probability of the packet not being lost is (1-p) which means
    > |> then that the probability of the entire fragmented IP/UDP
    > |> datagram carrying the NFS message not being lost becomes (1-p)^N
    > |> where N is the number of fragments. For something like a 32768
    > |> byte mount, that means N is ~= 22. Now, _that_ can be rather
    > |> nasty.


    > Grrk. Yes, the formula is correct, but the analysis isn't!
    > Firstly, p is normally VERY small - well below 10^-3 - which means
    > that the probability of the overall loss is still small. Secondly,
    > that assumes no correlation, and packet losses are often strongly
    > correlated, which (surprisingly) is likely to REDUCE the overall
    > loss probability.


    How is that? Because once you lose one fragment of a datagram you
    don't care if you've lost any of the others and the chances of it
    hitting more than one datagram are reduced?

    Still, is what I put up there more or less reasonable to a first
    approximation?

    rick jones
    --
    oxymoron n, Hummer H2 with California Save Our Coasts and Oceans plates
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  19. Re: NFS v4 and pNFS


    In article <71zmg.2169$oG6.634@news.cpqcorp.net>,
    Rick Jones writes:
    |>
    |> > Grrk. Yes, the formula is correct, but the analysis isn't!
    |> > Firstly, p is normally VERY small - well below 10^-3 - which means
    |> > that the probability of the overall loss is still small. Secondly,
    |> > that assumes no correlation, and packet losses are often strongly
    |> > correlated, which (surprisingly) is likely to REDUCE the overall
    |> > loss probability.
    |>
    |> How is that? Because once you lose one fragment of a datagram you
    |> don't care if you've lost any of the others and the chances of it
    |> hitting more than one datagram are reduced?

    Yes. Failures that occur in bursts with gaps larger than the NFS packet
    size are less damaging than ones that are uniformly spread - with other
    error-recovery strategies, the converse is true :-)
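
    A small deterministic illustration of that point, under made-up
    numbers: 1000 NFS datagrams of 22 fragments each are sent back to
    back, and exactly 200 packets are lost in both cases. Only the
    pattern of the losses differs. The bursts here are aligned with
    datagram boundaries, which is the most favourable case; an unaligned
    burst of this length could damage two datagrams instead of one.

```python
def damaged_datagrams(lost_packets, fragments_per_datagram):
    """Count datagrams that lose at least one fragment."""
    return len({pkt // fragments_per_datagram for pkt in lost_packets})


FRAGS = 22           # fragments per NFS datagram (assumed)
DATAGRAMS = 1000     # datagrams sent back to back (assumed)
TOTAL = FRAGS * DATAGRAMS
LOSSES = 200         # same number of lost packets in both patterns
BURST = 20           # packets lost per burst in the bursty pattern

# Uniform pattern: one lost packet every TOTAL // LOSSES packets.
uniform = [i * (TOTAL // LOSSES) for i in range(LOSSES)]

# Bursty pattern: LOSSES // BURST bursts of consecutive packets,
# with large gaps between the bursts.
bursty = [start + k
          for start in range(0, TOTAL, TOTAL // (LOSSES // BURST))
          for k in range(BURST)]

print("uniform:", damaged_datagrams(uniform, FRAGS), "datagrams to resend")
print("bursty: ", damaged_datagrams(bursty, FRAGS), "datagrams to resend")
```

    Spread evenly, the 200 losses each land in a different datagram, so
    200 datagrams must be resent. Packed into ten bursts they damage only
    ten datagrams, because once one fragment of a datagram is gone the
    rest of its fragments cost nothing extra.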

    |> Still, is what I put up there more or less reasonable to a first
    |> approximation?

    Well, yes, if you correct it to say that it is only a problem if the
    network is VERY lossy.


    Regards,
    Nick Maclaren.

  20. Re: NFS v4 and pNFS

    Nick Maclaren wrote:
    > Well, yes, if you correct it to say that it is only a problem if the
    > network is VERY lossy.


    What would be your definition of VERY here?

    rick jones
    --
    The computing industry isn't as much a game of "Follow The Leader" as
    it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
    - Rick Jones
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
