Re: bad NFS/UDP performance - FreeBSD


Thread: Re: bad NFS/UDP performance

  1. Re: bad NFS/UDP performance

    > On Fri, Sep 26, 2008 at 10:04:16AM +0300, Danny Braniss wrote:
    > > Hi,
    > > There seems to be some serious degradation in performance.
    > > Under 7.0 I get about 90 MB/s (on write), while, on the same machine
    > > under 7.1 it drops to 20!
    > > Any ideas?

    >
    > 1) Network card driver changes,

    could be, but at least iperf/TCP is OK - I can't get UDP numbers. Do you
    know of any tool to measure UDP performance?
    BTW, I also checked on different hardware, and the badness is there too.
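
    (For reference: iperf itself can generate UDP traffic with -u; a rough
    sketch, assuming iperf is installed on both ends - the hostname and the
    -b target rate are made up for illustration:)

        # on the receiving side (any UDP sink will do):
        iperf -s -u
        # on the client; -b sets the UDP send rate, -l the datagram size.
        # 'nfs-server' is a placeholder hostname.
        iperf -c nfs-server -u -b 900M -l 1470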
    >
    > 2) This could be relevant, but rwatson@ will need to help determine
    > that.
    > http://lists.freebsd.org/pipermail/f...er/045109.html


    my gut feeling is that it's somewhere else:


    Writing a 16 MB file:

    BS          Count   /----- 7.0 ------/   /----- 7.1 ------/
    1*512       32768   0.16s   98.11MB/s    0.43s   37.18MB/s
    2*512       16384   0.17s   92.04MB/s    0.46s   34.79MB/s
    4*512        8192   0.16s  101.88MB/s    0.43s   37.26MB/s
    8*512        4096   0.16s   99.86MB/s    0.44s   36.41MB/s
    16*512       2048   0.16s  100.11MB/s    0.50s   32.03MB/s
    32*512       1024   0.26s   61.71MB/s    0.46s   34.79MB/s
    64*512        512   0.22s   71.45MB/s    0.45s   35.41MB/s
    128*512       256   0.21s   77.84MB/s    0.51s   31.34MB/s
    256*512       128   0.19s   82.47MB/s    0.43s   37.22MB/s
    512*512        64   0.18s   87.77MB/s    0.49s   32.69MB/s
    1024*512       32   0.18s   89.24MB/s    0.47s   34.02MB/s
    2048*512       16   0.17s   91.81MB/s    0.30s   53.41MB/s
    4096*512        8   0.16s  100.56MB/s    0.42s   38.07MB/s
    8192*512        4   0.82s   19.56MB/s    0.80s   19.95MB/s
    16384*512       2   0.82s   19.63MB/s    0.95s   16.80MB/s
    32768*512       1   0.81s   19.69MB/s    0.96s   16.64MB/s

    Average:                    75.86MB/s            33.00MB/s
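
    (The numbers above presumably come from a dd-style write test; a rough,
    untested equivalent, with the mount point and file name made up:)

        # write a 16 MB file through the NFS mount at varying block sizes
        for bs in 512 1024 2048 4096 8192 16384; do
            count=$((16777216 / bs))
            dd if=/dev/zero of=/mnt/nfs/testfile bs=$bs count=$count
        done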

    The NFS filer is a Network Appliance and is in use, so I get fluctuations
    in the measurements, but the relationship is similar: good on 7.0, bad on
    7.1.

    Cheers,
    danny




  2. Re: bad NFS/UDP performance

    >
    > :> -vfs.nfs.realign_test: 22141777
    > :> +vfs.nfs.realign_test: 498351
    > :>
    > :> -vfs.nfsrv.realign_test: 5005908
    > :> +vfs.nfsrv.realign_test: 0
    > :>
    > :> +vfs.nfsrv.commit_miss: 0
    > :> +vfs.nfsrv.commit_blks: 0
    > :>
    > :> changing them did nothing - or at least with respect to nfs throughput :-)
    > :
    > :I'm not sure what any of these do, as NFS is a bit out of my league.
    > ::-) I'll be following this thread though!
    > :
    > :--
    > :| Jeremy Chadwick jdc at parodius.com |
    >
    > A non-zero nfs_realign_count is bad, it means NFS had to copy the
    > mbuf chain to fix the alignment. nfs_realign_test is just the
    > number of times it checked. So nfs_realign_test is irrelevant;
    > it's nfs_realign_count that matters.
    >

    It's zero, so I guess I'm OK there.
    Funny though: on my 'good' machine, vfs.nfsrv.realign_test is 5862999,
    while on the slow one it's 0 - but then again, the good one has been up
    for several days.
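
    (A quick way to keep an eye on all of these counters, client and server
    side alike:)

        # dump every NFS realignment-related sysctl
        sysctl -a | grep -i realign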

    > Several things can cause NFS payloads to be improperly aligned.
    > Anything from older network drivers which can't start DMA on a
    > 2-byte boundary, resulting in the 14-byte encapsulation header
    > causing improper alignment of the IP header & payload, to rpc
    > embedded in NFS TCP streams winding up being misaligned.
    >
    > Modern network hardware either supports 2-byte-aligned DMA, allowing
    > the encapsulation header to be 2-byte aligned so the payload winds up
    > being 4-byte aligned, or supports DMA chaining, allowing the payload to
    > be placed in its own mbuf, or can pad, etc.
    >
    > --
    >
    > One thing I would check is to be sure a couple of nfsiod's are running
    > on the client when doing your tests. If none are running the RPCs wind
    > up being more synchronous and less pipelined. Another thing I would
    > check is IP fragment reassembly statistics (for UDP) - there should be
    > none for TCP connections, no matter what NFS I/O size is selected.
    >

    Ahh, nfsiod - it seems it's now started dynamically! At least, none show
    when the host is idle; after I run my tests there are 20, with ppid 0.
    I need to refresh my NFS knowledge.
    How can I see the IP fragment reassembly statistics?
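
    (For the record: the dynamic nfsiod pool is bounded by a pair of sysctls,
    and netstat shows the reassembly counters - a sketch, assuming a 7.x
    client:)

        # lower/upper bounds on the dynamic nfsiod pool
        sysctl vfs.nfs.iodmin vfs.nfs.iodmax
        # IP fragment/reassembly statistics; look for "fragments received",
        # "fragments dropped" and "packets reassembled ok"
        netstat -s -p ip | egrep -i 'fragment|reassembl'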

    > (It does seem more likely to be scheduler-related, though).
    >


    I tend to agree. I tried both ULE and 4BSD, but the badness is there
    either way.

    > -Matt
    >


    thanks,
    danny




  3. Re: bad NFS/UDP performance

    On Fri, 26 Sep 2008, Danny Braniss wrote:

    > after more testing, it seems it's related to changes made between Aug 4
    > and Aug 29; i.e., a kernel built on Aug 4 works fine, one from Aug 29 is
    > slow. I'll now try to close the gap.


    I think this is the best way forward -- skimming August changes, there are a
    number of candidate commits, including retuning of UDP hashes by mav, my
    rwlock changes, changes to mbuf chain handling, etc.

    Thanks,

    Robert N M Watson
    Computer Laboratory
    University of Cambridge


  4. Re: bad NFS/UDP performance

    > > On Fri, 26 Sep 2008, Danny Braniss wrote:
    > >
    > > > after more testing, it seems it's related to changes made between
    > > > Aug 4 and Aug 29; i.e., a kernel built on Aug 4 works fine, one from
    > > > Aug 29 is slow. I'll now try to close the gap.

    > >
    > > I think this is the best way forward -- skimming August changes, there are a
    > > number of candidate commits, including retuning of UDP hashes by mav, my
    > > rwlock changes, changes to mbuf chain handling, etc.

    >
    > it's more difficult than I expected.
    > For one, the kernel date was misleading; the actual source-update date
    > is the key, so the window of changes is now 28 July to 19 August. I have
    > the diffs, but nothing yet seems relevant.
    >
    > On the other hand, I tried NFS/TCP, and there things seem OK, i.e. the
    > 'good' and the 'bad' give the same throughput, which seems to point to
    > UDP changes ...
    >
    > danny


    Grr, there goes binary-search theory out of the window.
    So far I have managed to pinpoint the day on which the changes affect the
    throughput: between 18/08/08 00:00:00 and 19/08/08 00:00:00
    (I assume cvs's dates are GMT).
    Now would be a good time for some help, especially on how to undo changes
    - my knowledge of csup/cvs is close to zero.
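
    (For reference, csup can pin a tree to a date via the supfile; a sketch,
    assuming the standard stable-supfile - the date= value format is
    yyyy.mm.dd.hh.mm.ss:)

        # add to the supfile, then csup as usual:
        #   *default date=2008.08.18.00.00.00
        csup -L 2 /usr/share/examples/cvsup/stable-supfile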

    danny




  5. Re: bad NFS/UDP performance

    > > it's more difficult than I expected.
    > > For one, the kernel date was misleading; the actual source-update date
    > > is the key, so the window of changes is now 28 July to 19 August. I
    > > have the diffs, but nothing yet seems relevant.
    > >
    > > On the other hand, I tried NFS/TCP, and there things seem OK, i.e. the
    > > 'good' and the 'bad' give the same throughput, which seems to point to
    > > UDP changes ...

    >
    > Can you post the network-numbers?

    [again :-]
    > > Writing a 16 MB file:
    > > BS          Count   /----- 7.0 ------/   /----- 7.1 ------/

    should now read:

    > > BS          Count   /--- Aug 18 -----/   /--- Aug 19 -----/
    > > 1*512       32768   0.16s   98.11MB/s    0.43s   37.18MB/s
    > > 2*512       16384   0.17s   92.04MB/s    0.46s   34.79MB/s
    > > 4*512        8192   0.16s  101.88MB/s    0.43s   37.26MB/s
    > > 8*512        4096   0.16s   99.86MB/s    0.44s   36.41MB/s
    > > 16*512       2048   0.16s  100.11MB/s    0.50s   32.03MB/s
    > > 32*512       1024   0.26s   61.71MB/s    0.46s   34.79MB/s
    > > 64*512        512   0.22s   71.45MB/s    0.45s   35.41MB/s
    > > 128*512       256   0.21s   77.84MB/s    0.51s   31.34MB/s
    > > 256*512       128   0.19s   82.47MB/s    0.43s   37.22MB/s
    > > 512*512        64   0.18s   87.77MB/s    0.49s   32.69MB/s
    > > 1024*512       32   0.18s   89.24MB/s    0.47s   34.02MB/s
    > > 2048*512       16   0.17s   91.81MB/s    0.30s   53.41MB/s
    > > 4096*512        8   0.16s  100.56MB/s    0.42s   38.07MB/s
    > > 8192*512        4   0.82s   19.56MB/s    0.80s   19.95MB/s
    > > 16384*512       2   0.82s   19.63MB/s    0.95s   16.80MB/s
    > > 32768*512       1   0.81s   19.69MB/s    0.96s   16.64MB/s
    > >
    > > Average:                    75.86MB/s            33.00MB/s





  6. Re: bad NFS/UDP performance


    On Fri, 3 Oct 2008, Danny Braniss wrote:

    >>> it's more difficult than I expected.
    >>> For one, the kernel date was misleading; the actual source-update date
    >>> is the key, so the window of changes is now 28 July to 19 August. I
    >>> have the diffs, but nothing yet seems relevant.
    >>>
    >>> On the other hand, I tried NFS/TCP, and there things seem OK, i.e. the
    >>> 'good' and the 'bad' give the same throughput, which seems to point to
    >>> UDP changes ...

    >>
    >> Can you post the network-numbers?

    > so I ran some more tests; these are for write I/O:


    OK, so it looks like this was almost certainly the rwlock change. What
    happens if you pretty much universally substitute the following in
    udp_usrreq.c:

    Currently            Change to
    ---------            ---------
    INP_RLOCK            INP_WLOCK
    INP_RUNLOCK          INP_WUNLOCK
    INP_RLOCK_ASSERT     INP_WLOCK_ASSERT
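
    (A mechanical way to apply that substitution - an untested sketch assuming
    a stock source tree; note that the first pattern also rewrites
    INP_RLOCK_ASSERT to INP_WLOCK_ASSERT:)

        cd /usr/src/sys/netinet
        # keep the original in udp_usrreq.c.orig, then rebuild the kernel
        sed -i.orig -e 's/INP_RLOCK/INP_WLOCK/g' \
                    -e 's/INP_RUNLOCK/INP_WUNLOCK/g' udp_usrreq.c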

    Robert N M Watson
    Computer Laboratory
    University of Cambridge

    >
    > server is a NetApp:
    >
    > kernel from 18/08/08 00:00:00:
    >                     /------ UDP -----/   /------ TCP -----/
    > 1*512      38528    0.19s   83.50MB/s    0.20s   80.82MB/s
    > 2*512      19264    0.21s   76.83MB/s    0.21s   77.57MB/s
    > 4*512       9632    0.19s   85.51MB/s    0.22s   73.13MB/s
    > 8*512       4816    0.19s   83.76MB/s    0.21s   75.84MB/s
    > 16*512      2408    0.19s   83.99MB/s    0.21s   77.18MB/s
    > 32*512      1204    0.19s   84.45MB/s    0.22s   71.79MB/s
    > 64*512       602    0.20s   79.98MB/s    0.20s   78.44MB/s
    > 128*512      301    0.18s   86.51MB/s    0.22s   71.53MB/s
    > 256*512      150    0.19s   82.83MB/s    0.20s   78.86MB/s
    > 512*512       75    0.19s   82.77MB/s    0.21s   76.39MB/s
    > 1024*512      37    0.19s   85.62MB/s    0.21s   76.64MB/s
    > 2048*512      18    0.21s   77.72MB/s    0.20s   80.30MB/s
    > 4096*512       9    0.26s   61.06MB/s    0.30s   53.79MB/s
    > 8192*512       4    0.83s   19.20MB/s    0.41s   39.12MB/s
    > 16384*512      2    0.84s   19.01MB/s    0.41s   39.03MB/s
    > 32768*512      1    0.82s   19.59MB/s    0.39s   40.89MB/s
    >
    > kernel from 19/08/08 00:00:00:
    >                     /------ UDP -----/   /------ TCP -----/
    > 1*512      38528    0.45s   35.59MB/s    0.20s   81.43MB/s
    > 2*512      19264    0.45s   35.56MB/s    0.20s   79.24MB/s
    > 4*512       9632    0.49s   32.66MB/s    0.22s   73.72MB/s
    > 8*512       4816    0.47s   34.06MB/s    0.21s   75.52MB/s
    > 16*512      2408    0.53s   30.16MB/s    0.22s   72.58MB/s
    > 32*512      1204    0.31s   51.68MB/s    0.40s   40.14MB/s
    > 64*512       602    0.43s   37.23MB/s    0.25s   63.57MB/s
    > 128*512      301    0.51s   31.39MB/s    0.26s   62.70MB/s
    > 256*512      150    0.47s   34.02MB/s    0.23s   69.06MB/s
    > 512*512       75    0.47s   34.01MB/s    0.23s   70.52MB/s
    > 1024*512      37    0.53s   30.12MB/s    0.22s   73.01MB/s
    > 2048*512      18    0.55s   29.07MB/s    0.23s   70.64MB/s
    > 4096*512       9    0.46s   34.69MB/s    0.21s   75.92MB/s
    > 8192*512       4    0.81s   19.66MB/s    0.43s   36.89MB/s
    > 16384*512      2    0.80s   19.99MB/s    0.40s   40.29MB/s
    > 32768*512      1    1.11s   14.41MB/s    0.38s   42.56MB/s



  7. Re: bad NFS/UDP performance


    On Fri, 3 Oct 2008, Danny Braniss wrote:

    > gladly, but I have no idea how to do LOCK_PROFILING, so some pointers
    > would be helpful.


    The LOCK_PROFILING(9) man page isn't a bad starting point -- I find that
    the defaults work fine most of the time, so just use them. Turn the enable
    sysctl on just before you begin a run, and turn it off immediately
    afterwards. Make sure to reset between reruns (rebooting into a new kernel
    is fine too!).
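
    (Roughly, assuming a kernel built with "options LOCK_PROFILING" - the
    sysctl names below are from the debug.lock.prof tree, so double-check
    them against the man page:)

        sysctl debug.lock.prof.enable=1    # start profiling
        # ... run the NFS benchmark ...
        sysctl debug.lock.prof.enable=0    # stop profiling
        sysctl debug.lock.prof.stats       # dump the per-lock statistics
        sysctl debug.lock.prof.reset=1     # reset counters between runs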

    > as a side note: many years ago I checked out NFS/TCP and it was really
    > bad - I even remember NetApp telling us to drop TCP - but now things
    > look rather better. I wonder what caused the change.


    Well, the virtues of TCP become more apparent at higher network speeds, as
    TCP's logic for filling pipes, managing flow control, etc., is a lot more
    sophisticated than what's in the RPC code for UDP. The downsides of UDP
    are also becoming more apparent: as network speeds go up, fragmented UDP
    risks IP ID collisions, which could lead to data corruption or, at the
    very least, dropped packets. We have changed the default for NFSv3 mounts
    to TCP in 8.x, and talked about doing it for 7.1; unfortunately the timing
    wasn't quite right, so it will most likely appear in 7.2.
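
    (Until that default changes, TCP can be requested explicitly at mount
    time; the server path below is made up:)

        # mount an NFSv3 filesystem over TCP
        mount -t nfs -o nfsv3,tcp filer:/vol/home /mnt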

    Robert N M Watson
    Computer Laboratory
    University of Cambridge


  8. Re: bad NFS/UDP performance

    >
    > On Sat, 4 Oct 2008, Danny Braniss wrote:
    >
    > > at the moment, the best I can do is run it on different hardware that
    > > has if_em; the results are in
    > > ftp://ftp.cs.huji.ac.il/users/danny/...of/7.1-1000.em
    > > the benchmark ran better with the Intel NIC, averaging UDP 54MB/s, TCP
    > > 53MB/s (I get the same numbers with an older kernel).

    >
    > Dear Danny:
    >
    > Unfortunately, I was left slightly unclear on the comparison you are making
    > above. Could you confirm whether or not, with if_em, you see a performance
    > regression using UDP NFS between 7.0-RELEASE and the most recent 7.1-STABLE,
    > and if you do, whether or not the RLOCK->WLOCK change has any effect on
    > performance? It would be nice to know on the same hardware, but even
    > with different hardware we get a sense of whether this might affect
    > other systems or is limited to a narrower set of configurations.
    >
    > Thanks,


    7.1-1000.em         vanilla 7.1          1 x Intel Core Duo
    7.1-1000.x2200.em   vanilla 7.1          2 x Dual-Core AMD Opteron
    7.0-1000.x2200.em   7.0 + RLOCK->WLOCK

    The plot thickens.
    I put an em card in, and the throughput is almost the same as with the
    bge.
    All the tests were done on the same host, a Sun x2200 (2 x dual-core AMD),
    except for the one over the weekend, which was on an Intel Core Duo and
    not the same if_em card - sorry about that, but one is PCI-X, the other
    PCI Express :-(.
    What is becoming obvious is that NFS/UDP is very temperamental/sensitive
    :-)

    danny



