
Thread: Major SMP problems with lstat/namei

  1. Major SMP problems with lstat/namei


    We have encountered some serious SMP performance/scalability problems
    that we've tracked back to lstat/namei calls. I've written a quick
    benchmark with a pair of tests to simplify/measure the problem. Both
    tests use a tree of directories: the top level directory contains five
    subdirectories a, b, c, d, and e. Each subdirectory contains five
    subdirectories a, b, c, d, and e, and so on: 1 directory at level
    one, 5 at level two, 25 at level three, 125 at level four, 625 at
    level five, and 3125 at level six.
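    For reference, the tree can be rebuilt with a short recursive
    helper. This is only a sketch of the setup described above, not the
    actual benchmark code (the helper name build() is mine; the root
    path /tmp/lstat matches the example below):

        /*
         * Sketch of the directory tree setup: five subdirectories per
         * level, five levels below the root, 3125 leaf directories.
         */
        #include <sys/types.h>
        #include <sys/stat.h>
        #include <stdio.h>

        static void
        build(const char *base, int depth)
        {
            const char *names = "abcde";
            char path[1024];
            int i;

            if (depth == 0)
                return;
            for (i = 0; i < 5; i++) {
                (void)snprintf(path, sizeof(path), "%s/%c",
                    base, names[i]);
                (void)mkdir(path, 0755);   /* EEXIST is fine on reruns */
                build(path, depth - 1);
            }
        }

        int
        main(void)
        {
            (void)mkdir("/tmp/lstat", 0755);
            build("/tmp/lstat", 5);
            return (0);
        }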

    In the "realpath" test, a random path is constructed at the bottom of
    the tree (e.g. /tmp/lstat/a/b/c/d/e) and realpath() is called on that,
    provoking lstat() calls on the whole tree. This is to simulate a mix
    of high-contention and low-contention lstat() calls.

    In the "lstat" test, lstat is called directly on a path at the bottom
    of the tree. Since there are 3125 leaf directories, this simulates
    relatively
    low-contention lstat() calls.

    In both cases, the test repeats as many times as possible for 60
    seconds. Each test is run simultaneously by multiple processes, with
    progressively doubling concurrency from 1 to 512.
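    Roughly, each worker does something like the following. This is a
    hypothetical, simplified sketch of the lstat test, not the real
    harness (which forks the workers and aggregates their per-worker
    counts); the realpath variant calls realpath() on the same kind of
    random path instead of lstat():

        /*
         * Simplified single-worker sketch: pick random leaf paths and
         * lstat() them in a tight loop for 60 seconds, counting calls.
         */
        #include <sys/stat.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <unistd.h>

        int
        main(void)
        {
            const char *names = "abcde";
            struct stat sb;
            char path[64];
            time_t stop = time(NULL) + 60;
            unsigned long count = 0;

            srandom((unsigned)getpid());
            while (time(NULL) < stop) {
                /* e.g. /tmp/lstat/a/b/c/d/e */
                (void)snprintf(path, sizeof(path),
                    "/tmp/lstat/%c/%c/%c/%c/%c",
                    names[random() % 5], names[random() % 5],
                    names[random() % 5], names[random() % 5],
                    names[random() % 5]);
                if (lstat(path, &sb) != 0)
                    perror("lstat");
                count++;
            }
            printf("worker total: %lu\n", count);
            return (0);
        }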

    What I found was that everything is fine at concurrency 2:
    throughput holds roughly flat rather than doubling, which probably
    indicates the benchmark is pegged on some other resource limit. At
    concurrency 4, realpath drops to 31.8% of the concurrency-1 total.
    At concurrency 8, performance is down to 18.3%. Meanwhile, CPU load
    climbs to 80-90% system CPU. I've confirmed via ktrace and rusage
    that the CPU usage is all system time, and that lstat() is the
    *only* system call in the test (realpath() is called with an
    absolute path).
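    For reference, the user/system split is easy to read back with
    getrusage(). Here is a minimal sketch of that check (hypothetical;
    the function name report_cpu_split() and the output format are
    mine), assuming each worker calls it after its timed loop:

        /* Minimal sketch of the rusage check: compare user CPU time to
         * system CPU time accumulated by this process. */
        #include <sys/time.h>
        #include <sys/resource.h>
        #include <stdio.h>

        void
        report_cpu_split(void)
        {
            struct rusage ru;

            if (getrusage(RUSAGE_SELF, &ru) == 0)
                printf("user %ld.%06lds  sys %ld.%06lds\n",
                    (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
                    (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
        }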

    I then reran the 32-process test on one to seven cores, and found
    that performance peaks at two cores and drops sharply from there:
    eight cores (the full-machine run above) is *fifteen* times slower
    than two cores.

    The full test results are at the bottom of this message.

    This is on 6.3-RELEASE-p4 with vfs.lookup_shared=1.

    I believe this is the same issue that was previously discussed as "2 x
    quad-core system is slower that 2 x dual core on FreeBSD" archived here:

    http://lists.freebsd.org/pipermail/f...er/038441.html

    In that post, Kris Kennaway wrote:

    > It is hard to say for certain without a direct profile comparison
    > of the workload, but it is probably due to lockmgr contention.
    > lockmgr is used for various locking operations to do with VFS data
    > structures. It is known to have poor performance and scale very
    > badly.


    At this point, what I've got is one of those synthetic benchmarks,
    but it matches our production problem exactly, with one most
    unfortunate difference: the production processes need a whole lot
    more RAM, so when the problem manifests they backlog and the server
    death-spirals through swap.

    I've chased my way up the kernel source to kern_lstat(), where a
    shared lock is obtained, and from there into namei(), where
    vfs.lookup_shared comes into play. Unfortunately, I don't understand
    lockmgr, I don't know how the macros and flags I see there relate to
    it, and I can't figure out what happened to the changes Attilio Rao
    was working on; there didn't seem to be much other hope at the time.

    This is becoming a huge problem for us. Is there anything at all
    that can be done, or any news? In the case linked above, improvement
    came from changing a PHP setting that isn't applicable in our case.

    Thanks,
    Jeff

    Concurrency 1

    realpath
    Total = 1409069 (100%)
    Total/Sec = 23484
    Total/Sec/Worker = 23484

    lstat
    Total = 6828763 (100%)
    Total/Sec = 113812
    Total/Sec/Worker = 113812

    Concurrency 2

    realpath
    Total = 1450489 (100%)
    Total/Sec = 24174
    Total/Sec/Worker = 12087

    lstat
    Total = 6891417 (100.9%)
    Total/Sec = 114856
    Total/Sec/Worker = 57428


    Concurrency 4

    realpath
    Total = 448693 (31.8%)
    Total/Sec = 7478
    Total/Sec/Worker = 1869

    lstat
    Total = 3047933 (44.6%)
    Total/Sec = 50798
    Total/Sec/Worker = 12699

    Concurrency 8

    realpath
    Total = 258281 (18.3%)
    Total/Sec = 4304
    Total/Sec/Worker = 538

    lstat
    Total = 1688728 (24.7%)
    Total/Sec = 28145
    Total/Sec/Worker = 3518

    Concurrency 16

    realpath
    Total = 179150 (12.7%)
    Total/Sec = 2985
    Total/Sec/Worker = 186

    lstat
    Total = 966558 (14.1%)
    Total/Sec = 16109
    Total/Sec/Worker = 1006

    Concurrency 32

    realpath
    Total = 116982 (8.3%)
    Total/Sec = 1949
    Total/Sec/Worker = 60

    lstat
    Total = 644703 (9.4%)
    Total/Sec = 10745
    Total/Sec/Worker = 335

    Concurrency 64

    realpath
    Total = 112050 (7.9%)
    Total/Sec = 1867
    Total/Sec/Worker = 29

    lstat
    Total = 572798 (8.3%)
    Total/Sec = 9546
    Total/Sec/Worker = 149


    Concurrency 128

    realpath
    Total = 111544 (7.9%)
    Total/Sec = 1859
    Total/Sec/Worker = 14

    lstat
    Total = 570800 (8.3%)
    Total/Sec = 9513
    Total/Sec/Worker = 74


    Concurrency 256

    realpath
    Total = 96461 (6.8%)
    Total/Sec = 1607
    Total/Sec/Worker = 6

    lstat
    Total = 580679 (8.5%)
    Total/Sec = 9677
    Total/Sec/Worker = 37


    Concurrency 512

    realpath
    Total = 91224 (6.4%)
    Total/Sec = 1520
    Total/Sec/Worker = 2

    lstat
    Total = 498342 (7.2%)
    Total/Sec = 8305
    Total/Sec/Worker = 16

    realpath Concurrency 32 - 1 Core

    Total = 1289527
    Total/Sec = 21492
    Total/Sec/Worker = 671

    realpath Concurrency 32 - 2 Core

    Total = 1753625
    Total/Sec = 29227
    Total/Sec/Worker = 913

    realpath Concurrency 32 - 3 Core

    Total = 1197896
    Total/Sec = 19964
    Total/Sec/Worker = 623

    realpath Concurrency 32 - 4 Core

    Total = 631293
    Total/Sec = 10521
    Total/Sec/Worker = 328

    realpath Concurrency 32 - 5 Core

    Total = 227814
    Total/Sec = 3796
    Total/Sec/Worker = 118

    realpath Concurrency 32 - 6 Core

    Total = 153550
    Total/Sec = 2559
    Total/Sec/Worker = 79

    realpath Concurrency 32 - 7 Core

    Total = 136013
    Total/Sec = 2266
    Total/Sec/Worker = 70




  2. Re: Major SMP problems with lstat/namei

    Ivan Voras wrote:

    > There is nothing that can be done within the 6.x branch. 7.x contains
    > many improvements, but I think only 8.x will directly change the lockmgr
    > and the namei cache. The best thing you can try right now is to use
    > 7-STABLE (or the soon-to-be-released 7.1; you might need tuning with
    > 7.0-RELEASE) or try 8-CURRENT (it's quite stable).


    I remembered two more things:

    * The problematic load can also be generated with benchmarks/blogbench.
    * I don't have the numbers here, but I think I remember that ZFS had
    a noticeably higher score than UFS in this workload. Of course, ZFS
    has other problems.





  3. Re: Major SMP problems with lstat/namei

    On Wednesday 24 September 2008 01:47:32 pm Jeff Wheelhouse wrote:
    > On Sep 24, 2008, at 12:12 PM, John Baldwin wrote:
    > > Shared lookups only work on the NFS client in 6.x. I'm about to
    > > turn them on for UFS in HEAD (8.x) and will backport the needed
    > > fixes to 7.x after 7.1 (too risky to merge to 7.x this close to
    > > a release).
    >
    > Testers available, when you get to that. :-)
    >
    > > So lookup_shared=1 isn't going to really help on 6.x unless you
    > > are doing it all over NFS. You also want to backport my fix to
    > > cache_enter() before using lookup_shared at all:
    >
    > Since it sounds like 6.x is a dead end, we'll focus on 7.x, provided
    > we can get it to be stable for us.


    Yes.

    > Having never used svn, I do need to figure out how to pull the
    > specific patches you referenced, but I'm sure that's not an
    > unclimbable mountain. :-)


    You can still use cvs to pull the revisions. All those e-mail
    messages have the CVS revisions in them, too.

    --
    John Baldwin

