
Thread: nfsd

  1. nfsd

    Greeting,

    We're having big performance issues with our Origin 3000 and I'm
    pretty sure that it's because we don't have enough nfsd daemons. I don't
    doubt the quality of SGI's techs, but they're telling us not to use
    more than 1 nfsd per CPU; in our case, that means that only 4 nfsds are
    serving more than 230 Linux machines! On the other hand, Samba forks
    over 130 smbd daemons for our Windows stations.

    It has come to the point that we're thinking of mounting the drives
    through Samba on our Linux render farm. But to me, that's just wrong!

    Can anyone tell me if there is a reason for not running more than 1
    nfsd daemon per CPU? Would it be that bad if I boosted the number of
    nfsds to 250, or would that just screw up our server even more?

    One more thing: is there a way to assure a higher quality of service
    for the NFS clients than for the Samba ones?

    Thanks,

    Seb


  2. Re: nfsd

    Sebastien Charland wrote:

    > Greeting,
    >
    > We're having big performance issues with our Origin 3000 and I'm pretty
    > sure that it's because we don't have enough nfsd daemons. I don't doubt
    > the quality of SGI's techs, but they're telling us not to use more
    > than 1 nfsd per CPU,


    Very old advice. I'd probably run about 80 on a modern 4-CPU
    NASserver1000 (aka Origin 300) that serves mainly NFS. Perhaps more
    if you have tons of clients.

    Why don't you *try* it?
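
    If you want to try it, something like this should do - a sketch only,
    assuming the stock setup where /etc/config/nfsd.options holds the
    argument handed to nfsd at boot; verify against your own rc scripts
    first:

        # Assumption: the rc scripts pass the contents of this file
        # to nfsd as its daemon count at startup.
        echo 64 > /etc/config/nfsd.options   # 64 server daemons at next start
        /etc/init.d/nfs stop                 # stop NFS serving...
        /etc/init.d/nfs start                # ...and restart with the new count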

    --
    Alexis Cousein Senior Systems Engineer
    alexis@sgi.com SGI Brussels



  3. Re: nfsd

    >>>>> "Seb" == Sebastien Charland writes:

    Seb> We're having big performance issues with our Origin 3000 and I'm
    Seb> pretty sure that it's because we don't have enough nfsd daemons. I
    Seb> don't doubt the quality of SGI's techs, but they're telling us
    Who are 'they'? If you mean the NFS Admin Guide, then the guide
    is submarvelous and is in the process of being updated. Or if you're
    talking about real people, could you ask them to contact me or my
    manager - maybe we need to explain a few things to them. Normally, the
    kind of load you're running determines the kind of tuning you
    must do, so tell 'them' what kind of load you've got and I can
    try to give 'them' some advice on how to tune.

    Seb> not to use more than 1 nfsd per CPU; in our case, that means that
    That's bad, especially considering the stuffed Linux read-ahead behaviour.

    Seb> only 4 nfsds are serving more than 230 Linux machines!
    That's a bit rich for a single 4P machine, especially if you're
    trying to use it for something other than just NFS serving.


    Seb> Can anyone tell me if there is a reason for not running more than 1
    Seb> nfsd daemon per CPU? Would it be that bad if I boosted the number of
    Seb> nfsds to 250, or would that just screw up our server even more?

    Yes and no. The official SpecSFS benchmark runs used something like 270
    nfsds. In general, you need between 64 and 128 nfsds on a 4P machine,
    more if you're experiencing bursts of NFS activity - remember, there
    is a one-to-one ratio between the number of nfsds and the number of
    calls processed in parallel. The only thing you should be aware of is
    that 250 nfsds could peg your 4P machine at 100% CPU.

    6.5.22 could help a bit here with the introduction of dynamic nfsds, so
    if you upgrade to .22, make sure you remove your hard-coded nfsd
    number to allow the dynamic stuff to work.

    There are also other things which must be tuned to make an NFS
    server out of a general-purpose machine. If you can, look for the
    "Tuning NFS" presentation from the last developer conference - it could
    help you a bit. But as I've said - learn about your load, tune for it.
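
    If you don't know your load yet, nfsstat is the first stop - a sketch;
    the exact fields vary a bit between releases:

        # Server-side call mix: the getattr/read/write ratios tell you
        # what kind of load you're actually tuning for.
        nfsstat -s
        # Zero the counters, let a typical busy hour pass, then look again.
        nfsstat -z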

    Seb> One more thing: is there a way to assure a higher quality of service
    Seb> for the NFS clients than for the Samba ones?
    What exactly do you mean by that?

    max

  4. Re: nfsd

    Alexis Cousein wrote:
    > Sebastien Charland wrote:
    >
    >> Greeting,
    >>
    >> We're having big performance issues with our Origin 3000 and I'm
    >> pretty sure that it's because we don't have enough nfsd daemons. I
    >> don't doubt the quality of SGI's techs, but they're telling us not
    >> to use more than 1 nfsd per CPU,

    >
    >
    > Very old advice. I'd probably run about 80 on a modern 4-CPU
    > NASserver1000 (aka Origin 300) that serves mainly NFS. Perhaps more
    > if you have tons of clients.
    >
    > Why don't you *try* it?
    >

    It's a production server; a 230-node Linux render farm + 130 Windows
    users depend on it... I can't just do a series of reboots just for the
    heck of it ;-)

    That's why I need to know if there's a real issue with the number of
    nfsds I can safely run. My gut tells me I should put 250, but I'm a Linux
    guy, not an IRIX one :-)

    Cheers,

    Seb


  5. Re: nfsd

    Sebastien Charland wrote:

    > Alexis Cousein wrote:
    >
    >> Sebastien Charland wrote:
    >>
    >>> Greeting,
    >>>
    >>> We're having big performance issues with our Origin 3000 and I'm
    >>> pretty sure that it's because we don't have enough nfsd daemons. I
    >>> don't doubt the quality of SGI's techs, but they're telling us
    >>> not to use more than 1 nfsd per CPU,

    >>
    >>
    >>
    >> Very old advice. I'd probably run about 80 on a modern 4-CPU
    >> NASserver1000 (aka Origin 300) that serves mainly NFS. Perhaps more
    >> if you have tons of clients.
    >>
    >> Why don't you *try* it?
    >>

    > It's a production server; a 230-node Linux render farm + 130 Windows
    > users depend on it... I can't just do a series of reboots just for the
    > heck of it ;-)


    Why on earth do you think you have to reboot?

    [NFS is supposed to be a stateless protocol, you know. Not that I'd
    tempt fate by killing and restarting the NFS daemons with 230 machines
    connected, but still...]
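
    If you did go that route, restarting them in place is all it takes - a
    sketch, assuming the usual /etc/init.d/nfs script (check what your
    release actually ships):

        # No reboot needed: v2/v3 clients simply retry any outstanding
        # requests while the daemons are down.
        /etc/init.d/nfs stop
        /etc/init.d/nfs start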

    And no, there is *no* problem in having gobs of NFS daemons - unless
    you want to use that machine for doing something else (and even then,
    I'd tend to go ahead, but confine them in a cpuset).

    --
    Alexis Cousein Senior Systems Engineer
    alexis@sgi.com SGI/Silicon Graphics Brussels




  6. Re: nfsd

    On Thu, 16 Oct 2003, Alexis Cousein wrote:

    > Sebastien Charland wrote:
    >
    > > Greeting,
    > >
    > > We're having big performance issues with our Origin 3000 and I'm pretty
    > > sure that it's because we don't have enough nfsd daemons. I don't doubt
    > > the quality of SGI's techs, but they're telling us not to use more
    > > than 1 nfsd per CPU,

    >
    > Very old advice. I'd probably run about 80 on a modern 4-CPU
    > NASserver1000 (aka Origin 300) that serves mainly NFS. Perhaps more
    > if you have tons of clients.
    >
    > Why don't you *try* it?


    Don't forget to up the file descriptors as well. The IRIX default
    system-wide max is fine for a desktop machine, but is woefully
    inadequate for just about anything else.
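
    Something along these lines - the tunable names here are from memory,
    so treat them as assumptions and check systune(1M) on your own box:

        systune rlimit_nofile_cur   # current per-process fd limit
        systune rlimit_nofile_max   # hard per-process fd limit
        systune -i                  # interactive mode to raise them, e.g.
                                    #   systune-> rlimit_nofile_max 8192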

    Jamie Bowden
    --
    "It was half way to Rivendell when the drugs began to take hold"
    Hunter S Tolkien "Fear and Loathing in Barad Dur"
    Iain Bowen


  7. Re: nfsd

    Alexis Cousein wrote:
    > Sebastien Charland wrote:
    >
    >> Alexis Cousein wrote:
    >>
    >>> Sebastien Charland wrote:
    >>>
    >>>> Greeting,
    >>>>
    >>>> We're having big performance issues with our Origin 3000 and I'm
    >>>> pretty sure that it's because we don't have enough nfsd daemons. I
    >>>> don't doubt the quality of SGI's techs, but they're telling us
    >>>> not to use more than 1 nfsd per CPU,
    >>>
    >>>
    >>>
    >>>
    >>> Very old advice. I'd probably run about 80 on a modern 4-CPU
    >>> NASserver1000 (aka Origin 300) that serves mainly NFS. Perhaps more
    >>> if you have tons of clients.
    >>>
    >>> Why don't you *try* it?
    >>>

    >> It's a production server; a 230-node Linux render farm + 130 Windows
    >> users depend on it... I can't just do a series of reboots just for the
    >> heck of it ;-)

    >
    >
    > Why on earth do you think you have to reboot?



    >
    > [NFS is supposed to be a stateless protocol, you know. Not that I'd
    > tempt fate and kill and restart the nfs daemon with 230 machines
    > connected, but still...]
    >

    I'm not really worried about the 230 Linux servers; it's the Samba users
    that always complain :-)

    > And no, there is *no* problem in having gobs of NFS daemons - unless
    > you want to use that machine for doing something else (and even then,
    > I'd tend to go ahead but confine them in a cpuset).
    >



  8. Re: nfsd

    Sebastien Charland wrote:

    > I'm not really worried about the 230 Linux servers; it's the Samba users
    > that always complain :-)


    You can restart the nfsds without affecting the SMB daemons. Even
    kernel oplocks will be no problem -- that's why they live in the kernel,
    after all.

    --
    Alexis Cousein Senior Systems Engineer
    alexis@sgi.com SGI/Silicon Graphics Brussels




  9. Re: nfsd

    In article , Max Matveev writes:
    |>
    |> Who are 'they'? If you mean the NFS Admin Guide, then the guide
    |> is submarvelous and is in the process of being updated.

    man nfsd
    .....

    nservers This is the number of NFS server daemons to start. On an
    Origin system, one nfsd server per CPU maximizes efficiency
    and performance. Having too many daemons can be wasteful of
    system resources. The default is the number of CPUs in the
    system or 4, whichever is larger.


    --
    --------- Gordon Lack --------------- gml4410@ggr.co.uk ------------
    This message *may* reflect my personal opinion. It is *not* intended
    to reflect those of my employer, or anyone else.


  10. Re: nfsd

    Max Matveev wrote:
    >>>>>>"Seb" == Sebastien Charland writes:

    >
    >
    > Seb> We're having big performance issues with our Origin 3000 and I'm
    > Seb> pretty sure that it's because we don't have enough nfsd daemons. I
    > Seb> don't doubt the quality of SGI's techs, but they're telling us
    > Who are 'they'?


    Jean-Phillipe Lebel (Case #2459325) - he's actually quoting some SGI man
    page or something... Not really useful :-)

    > If you mean the NFS Admin Guide, then the guide
    > is submarvelous and is in the process of being updated. Or if you're
    > talking about real people, could you ask them to contact me or my
    > manager - maybe we need to explain a few things to them. Normally, the
    > kind of load you're running determines the kind of tuning you
    > must do, so tell 'them' what kind of load you've got and I can
    > try to give 'them' some advice on how to tune.


    Well, the server is an Origin 3200 with 4x400MHz CPUs. Its only job is
    Samba and NFS file serving. One of the reasons we moved our render
    farm from NT to Linux was to lower the smbd CPU overhead...

    >
    > Seb> not to use more than 1 nfsd per CPU; in our case, that means that
    > That's bad, especially considering the stuffed Linux read-ahead behaviour.
    >


    While on the subject, I'm currently using rsize=32768 and wsize=32768
    with NFSv3; should I use 8192 instead, or is that OK?
    Also, I had to change the mounts from UDP to TCP, because our Linux farm
    was causing a denial of service on the server! :-)
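
    For reference, this is roughly what our client mounts look like now
    (server name and export path are placeholders; options per nfs(5) on
    the Linux side - the actimeo=120 is explained below):

        # /etc/fstab on a render node
        server:/export/projects /proj nfs nfsvers=3,tcp,rsize=32768,wsize=32768,actimeo=120 0 0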

    > Seb> only 4 nfsds are serving more than 230 Linux machines!
    > That's a bit rich for a single 4P machine, especially if you're
    > trying to use it for something other than just NFS serving.
    >
    >
    > Seb> Can anyone tell me if there is a reason for not running more than 1
    > Seb> nfsd daemon per CPU? Would it be that bad if I boosted the number of
    > Seb> nfsds to 250, or would that just screw up our server even more?
    >
    > Yes and no. The official SpecSFS benchmark runs used something like 270
    > nfsds. In general, you need between 64 and 128 nfsds on a 4P machine,
    > more if you're experiencing bursts of NFS activity - remember, there
    > is a one-to-one ratio between the number of nfsds and the number of
    > calls processed in parallel. The only thing you should be aware of is
    > that 250 nfsds could peg your 4P machine at 100% CPU.
    >
    > 6.5.22 could help a bit here with the introduction of dynamic nfsds, so
    > if you upgrade to .22, make sure you remove your hard-coded nfsd
    > number to allow the dynamic stuff to work.
    >
    > There are also other things which must be tuned to make an NFS
    > server out of a general-purpose machine. If you can, look for the
    > "Tuning NFS" presentation from the last developer conference - it could
    > help you a bit. But as I've said - learn about your load, tune for it.


    Well, we get tons of getattr calls (because of Maya), so we have to put
    actimeo=120 on the client mounts. And we also have a lot of big-file
    (300+ files of 16MB each) transfers. It's not rare to have a single
    directory with more than 10000 entries.

    >
    > Seb> One more thing: is there a way to assure a higher quality of service
    > Seb> for the NFS clients than for the Samba ones?
    > What exactly do you mean by that?


    Basically, I would like to make sure that my (currently 4) nfsds get more
    CPU time than my (130) smbd daemons. But I guess that becomes irrelevant
    if I boost the number of nfsds to 200; then the ratio of nfsd to smbd
    processes evens out.

    But then the question becomes: can I specify that any NFS request
    coming from VLAN x should be processed before any request from VLAN y?
    Because, in a way, I don't care if 224 NFS clients are slow
    (render farm on VLANs 21 & 23), but I do when it's a user station
    (VLANs 20 & 22).

    Cheers,

    Seb


  11. Re: nfsd

    Lack Mr G M wrote:
    > In article , Max Matveev writes:
    > |>
    > |> Who are 'they'? If you mean the NFS Admin Guide, then the guide
    > |> is submarvelous and is in the process of being updated.
    >
    > man nfsd
    > ....
    >
    > nservers This is the number of NFS server daemons to start. On an
    > Origin system, one nfsd server per CPU maximizes efficiency
    > and performance. Having too many daemons can be wasteful of
    > system resources. The default is the number of CPUs in the
    > system or 4, whichever is larger.
    >
    >

    "Default" doesn't mean "optimal".


    --
    Alexis Cousein Senior Systems Engineer
    alexis@sgi.com SGI/Silicon Graphics Brussels




  12. Re: nfsd

    Sebastien Charland wrote:

    > Max Matveev wrote:
    >
    >>>>>>> "Seb" == Sebastien Charland writes:

    >>
    >>
    >>
    >> Seb> We're having big performance issues with our Origin 3000 and I'm
    >> Seb> pretty sure that it's because we don't have enough nfsd daemons. I
    >> Seb> don't doubt the quality of SGI's techs, but they're telling us
    >> Who are 'they'?

    >
    >
    > Jean-Phillipe Lebel (Case #2459325), he's actually quoting some SGI man
    > page or something... Not really useful :-)
    >
    >> If you mean the NFS Admin Guide, then the guide
    >> is submarvelous and is in the process of being updated. Or if you're
    >> talking about real people, could you ask them to contact me or my
    >> manager - maybe we need to explain a few things to them. Normally, the
    >> kind of load you're running determines the kind of tuning you
    >> must do, so tell 'them' what kind of load you've got and I can
    >> try to give 'them' some advice on how to tune.

    >
    >
    > Well, the server is an Origin 3200 with 4x400MHz CPUs. Its only job is
    > Samba and NFS file serving. One of the reasons we moved our render
    > farm from NT to Linux was to lower the smbd CPU overhead...
    >
    >>
    >> Seb> not to use more than 1 nfsd per CPU; in our case, that means that
    >> That's bad, especially considering the stuffed Linux read-ahead behaviour.
    >>

    >
    > While on the subject, I'm currently using rsize=32768 and wsize=32768
    > with NFSv3; should I use 8192 instead, or is that OK?


    That is unnecessary, at least for NFSv3 --
    systune | grep nfs3_default_xfer
    will tell you why.

    That size is only harmful if your network drops many 1500-byte MTU
    packets - in which case you're better off doing NFS over TCP anyway
    rather than tuning down the rsize/wsize (or nfs3_default_xfer).

    > Also, I had to change the mounts from UDP to TCP, because our Linux farm
    > was causing a denial of service on the server! :-)


    That *probably* means you have a switch dropping packets, in which case
    large NFS transfer sizes over UDP end up needing retransmission (all
    32KB of them!), which then causes more problems, etc. I've seen
    transfer speeds of about 155 KB/s on some networks (with the
    network cable maxed out by the Ethernet traffic!) because of
    that issue.

    OTOH, on a *good* network, there's less overhead with NFS over UDP.

    > But then the question becomes: can I specify that any NFS request
    > coming from VLAN x should be processed before any request from VLAN y?


    Not that I know of.

    As for smbd vs. nfsd -- cpuset is your friend: you can make sure that
    all the smbds consume only one CPU by creating a cpuset around the
    original parent.
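
    A sketch of that approach - the cpuset(4) config keywords and the
    smbd path are from memory, so double-check the man pages first:

        # Build a one-CPU set and start the master smbd inside it; every
        # smbd it forks inherits the restriction.
        echo EXCLUSIVE     >  /tmp/smbd.cpuset
        echo MEMORY_LOCAL  >> /tmp/smbd.cpuset
        echo "CPU 3"       >> /tmp/smbd.cpuset
        cpuset -q smbdset -c -f /tmp/smbd.cpuset     # create the cpuset
        cpuset -q smbdset -A /usr/samba/bin/smbd -D  # confine smbd to it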

    --
    Alexis Cousein Senior Systems Engineer
    alexis@sgi.com SGI/Silicon Graphics Brussels




  13. Re: nfsd

    In article , Alexis Cousein writes:
    |>
    |> "Default" doesn't mean "optimal".

    No, but that wasn't the relevant part - this was:

    |> > On an
    |> > Origin system, one nfsd server per CPU maximizes efficiency
    |> > and performance. Having too many daemons can be wasteful of
    |> > system resources.

    --
    --------- Gordon Lack --------------- gml4410@ggr.co.uk ------------
    This message *may* reflect my personal opinion. It is *not* intended
    to reflect those of my employer, or anyone else.

  14. Re: nfsd

    >>>>> "gml4410" == Lack Mr G M writes:

    gml4410> In article , Alexis
    gml4410> Cousein writes:
    >>>
    >>> "Default" doesn't mean "optimal".


    gml4410> No, but that wasn't the relevant part - this was:

    >>> > On an
    >>> > Origin system, one nfsd server per CPU maximizes
    >>> > efficiency and performance. Having too many
    >>> > daemons can be wasteful of system resources.


    Both have been changed in .22, with specific references to "wastage".
    .22 should appear in a shop near you RSN, so pre-order yours now.

    max

  15. Re: nfsd

    >>>>> "Seb" == Sebastien Charland writes:

    Seb> he's actually quoting some SGI man page or something... Not
    Seb> really useful :-)

    Well, I don't mean to cast a shadow on a particular person, especially
    considering that he was quoting from "the bible", but in this case "the
    bible" is wrong. Unfortunately, the black magic of tuning is not
    practiced widely, so the average person has to go with the gospel and
    hope it works.

    Seb> While on the subject, I'm currently using rsize=32768 and wsize=32768
    Seb> with NFSv3; should I use 8192 instead, or is that OK?

    In general, it should be OK. With Linux clients and their default of 16
    read-aheads, it could lead to some waste, since you'd read up to 64
    times more bytes for each read, but you should balance that against
    the nature of your read patterns, i.e. if your apps do sequential
    reads it could be a good thing, provided the client can hold the data
    in its cache.

    Another thing to consider is the number of packets dropped - if your
    network is only semi-reliable, raising the mount block size would lead
    to more retransmits.

    Seb> Well, we get tons of getattr calls (because of Maya), so we have to
    Seb> put actimeo=120 on the client mounts.

    That's wise if you know that there is no contention on a file on the
    server, i.e. no updates to the same file coming from two or more
    clients.

    Seb> And we also have a lot of big-file (300+ files of 16MB each) transfers.

    Reads or writes? In any case, playing with the nbufs systune could help.

    Seb> It's not rare to have a single directory with more than 10000 entries.

    An increased DNLC cache size would help here.
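
    The knobs in question, as a sketch - the names are from the standard
    systune groups, so verify locally before changing anything:

        systune nbufs    # number of filesystem buffers
        systune ncsize   # DNLC (name-lookup cache) size - raise it for
                         # directories with thousands of entries
        systune -i       # interactive mode to actually set new values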

    Seb> But then the question becomes: can I specify that any NFS request
    Seb> coming from VLAN x should be processed before any request from VLAN y?

    No - the nfsds are very egalitarian: once the requests are in the queue,
    they will be processed in order of arrival. If you want to play with
    load balancing, you should do it in the switch/router.

    You could play with segregating traffic across NICs and binding
    NICs to CPUs to approximate some kind of QoS model, but that is
    extremely crude and would probably cause more trouble than it's worth.

    max

  16. Re: nfsd

    In article ,
    Sebastien Charland wrote:
    >Alexis Cousein wrote:

    [ ... ]
    >> Very old advice. I'd probably run about 80 on a modern 4-CPU
    >> NASserver1000 (aka Origin 300) that serves mainly NFS. Perhaps more
    >> if you have tons of clients.


    >> Why don't you *try* it?
    >>

    >It's a production server; a 230-node Linux render farm + 130 Windows
    >users depend on it... I can't just do a series of reboots just for the
    >heck of it ;-)
    >
    >That's why I need to know if there's a real issue with the number of
    >nfsds I can safely run. My gut tells me I should put 250, but I'm a Linux
    >guy, not an IRIX one :-)


    How about manually starting one more per CPU? If it helps, the help
    will be noticeable. I can't see any way one extra process per CPU
    would hurt; try it just before most of the users leave for the day.
    That way, if there's a problem with the extra processes, you can kill
    them with little effect.
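
    A sketch of the manual start - assuming the server binary lives in
    /usr/etc as usual and takes the daemon count as its argument:

        /usr/etc/nfsd 4   # spawn four more server daemons by hand
        # If they cause trouble, the new processes can simply be killed.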


    Gary

    --
    Gary Heston gheston@hiwaay.net

    "Is this chicken, what I have, or is this fish? I know it's tuna, but
    it says 'Chicken by the Sea'." Jessica Simpson, on MTV _Newlyweds_

  17. Re: nfsd


    Hi Gary,

    We've learned to always do our changes during the lunch break. That
    way, if hell freezes over, we still have time to fix it before 16h,
    when the night shift begins. :-)

    To wrap up, we've decided to up the number of nfsds to 128. We also
    changed max_nfs_clients & max_xnfs_clients to 128. Then we applied the
    latest set of NFS patches for 6.5.19f.
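
    For the record, the change amounted to roughly this (the systune
    names are the ones we used; the nfsd.options convention is the usual
    one, but verify it on your release):

        echo 128 > /etc/config/nfsd.options   # 128 nfsds at next start
        systune -i                            # then at the prompt:
                                              #   max_nfs_clients 128
                                              #   max_xnfs_clients 128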

    It's hard to say for sure, since our rendering crunch is finished,
    but the server really does seem faster. I guess time will tell whether
    these changes were enough.

    We're certainly looking forward to upgrading to 6.5.22 once it
    comes out...

    I'll post another update, just to leave a trace for others :-)

    Cheers,

    Seb



    Gary Heston wrote:
    > In article ,
    > Sebastien Charland wrote:
    >
    >>Alexis Cousein wrote:

    >
    > [ ... ]
    >
    >>>Very old advice. I'd probably run about 80 on a modern 4-CPU
    >>>NASserver1000 (aka Origin 300) that serves mainly NFS. Perhaps more
    >>>if you have tons of clients.

    >
    >
    >>>Why don't you *try* it?
    >>>

    >>
    >>It's a production server; a 230-node Linux render farm + 130 Windows
    >>users depend on it... I can't just do a series of reboots just for the
    >>heck of it ;-)
    >>
    >>That's why I need to know if there's a real issue with the number of
    >>nfsds I can safely run. My gut tells me I should put 250, but I'm a Linux
    >>guy, not an IRIX one :-)

    >
    >
    > How about manually starting one more per CPU? If it helps, the help
    > will be noticeable. I can't see any way one extra process per CPU
    > would hurt; try it just before most of the users leave for the day.
    > That way, if there's a problem with the extra processes, you can kill
    > them with little effect.
    >
    >
    > Gary
    >



  18. Re: nfsd

    Lack Mr G M wrote:

    > |> > On an
    > |> > Origin system, one nfsd server per CPU maximizes efficiency
    > |> > and performance. Having too many daemons can be wasteful of
    > |> > system resources.
    >

    Well, I would agree that that "default" man page wording isn't "optimal"

    ;{.

    What that sentence *tried* to say is that if you are *also* running
    real applications on your CPUs and have only occasional (and not
    speed-critical) NFS activity, you should probably run one server per CPU.



    --
    Alexis Cousein Senior Systems Engineer
    alexis@sgi.com SGI/Silicon Graphics Brussels




  19. Re: nfsd

    Is there a way to add your quote to SGI's support bible? I'm pretty
    sure we're not the only ones using an SGI as a file server and having
    this 1-nfsd-per-CPU rule imposed on us. :-)

    Seb

    Alexis Cousein wrote:
    > Lack Mr G M wrote:
    >
    >> |> > On an
    >> |> > Origin system, one nfsd server per CPU maximizes efficiency
    >> |> > and performance. Having too many daemons can be wasteful of
    >> |> > system resources.
    >>

    > Well, I would agree that that "default" man page wording isn't "optimal"
    >
    > ;{.
    >
    > What that sentence *tried* to say is that if you are *also* running
    > real applications on your CPUs and have only occasional (and not
    > speed-critical) NFS activity, you should probably run one server per CPU.
    >
    >
    >


