[tbench regression fixes]: digging out smelly deadmen. - Kernel



Thread: [tbench regression fixes]: digging out smelly deadmen.

  1. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Sat, Oct 25, 2008 at 12:25:34AM +0200, Rafael J. Wysocki (rjw@sisk.pl) wrote:
    > > > > > vanilla 27: 347.222
    > > > > > no TSO/GSO: 357.331
    > > > > > no hrticks: 382.983
    > > > > > no balance: 389.802

    >
    > Can anyone please tell me if there was any conclusion of this thread?


    For reference, a freshly pulled git tree (commit 4403b4): 361.184
    and with dirty_ratio set to 50: 361.086
    without scheduler domain tuning, things are essentially the same: 361.367

    So, things are getting worse with time, and previous tunes do not help
    anymore.
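
    (For anyone reproducing the "no TSO/GSO" case quoted above, a minimal
    sketch of the offload toggle; the interface name eth0 is an assumption,
    not something stated in the thread:)

    ethtool -K eth0 tso off gso off          # disable TSO and GSO for the run (eth0 is a placeholder)
    ethtool -k eth0 | grep -i segmentation   # verify the resulting offload state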

    --
    Evgeniy Polyakov

  2. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Sun, 26 Oct 2008, Andrew Morton wrote:

    > > > > 208.4 MB/sec -- vanilla 2.6.16.60
    > > > > 201.6 MB/sec -- vanilla 2.6.20.1
    > > > > 172.9 MB/sec -- vanilla 2.6.22.19
    > > > > 74.2 MB/sec -- vanilla 2.6.23
    > > > > 46.1 MB/sec -- vanilla 2.6.24.2
    > > > > 30.6 MB/sec -- vanilla 2.6.26.1
    > > > > I.e. huge drop for 2.6.23 (this was with default configs for each
    > > > > respective kernel).

    > Was this when we decreased the default value of
    > /proc/sys/vm/dirty_ratio, perhaps? dbench is sensitive to that.


    2.6.28 gives 41.8 MB/s with /proc/sys/vm/dirty_ratio == 50. So small
    improvement, but still far far away from the throughput of pre-2.6.23
    kernels.
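
    (For reference, the dirty_ratio tweak being discussed is a single sysctl
    write; a minimal sketch, with the value taken from the numbers above and
    nothing else assumed:)

    echo 50 > /proc/sys/vm/dirty_ratio   # raise the dirty page ratio to 50%
    sysctl -w vm.dirty_ratio=50          # equivalent sysctl form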

    --
    Jiri Kosina
    SUSE Labs

  3. Re: [tbench regression fixes]: digging out smelly deadmen.

    From: Evgeniy Polyakov
    Date: Sun, 26 Oct 2008 13:05:55 +0300

    > I'm not surprised there were no changes when I reported hrtimers to be
    > the main guilty factor in my setup for dbench tests; only when David
    > showed that they also killed his sparcs via wake_up() was something
    > done. Now this regression has even disappeared from the list.
    > Good direction, we should always follow this.


    Yes, this situation was in my opinion a complete ****ing joke. Someone
    like me shouldn't have to do all of the hard work for the scheduler
    folks in order for a bug like this to get seriously looked at.

    Evgeniy's difficult work was effectively ignored except by other
    testers who could also see and reproduce the problem.

    No scheduler developer looked seriously into these reports other than
    to say "please try to reproduce with tip" (?!?!?!). I guess showing
    the developer the exact changeset(s) which added the regression isn't
    enough these days :-/

    Did any scheduler developer try to run tbench ONCE and do even a tiny
    bit of analysis, like the kind I did? Answer honestly... Linus even
    asked you guys in the private thread to "please look into it". So, if
    none of you did, you should all be deeply ashamed of yourselves.

    People like me shouldn't have to do all of that work for you just to
    get something to happen.

    Not until I went privately to Ingo and Linus with cycle counts and a
    full diagnosis (of every single release since 2.6.22, a whole 2 days
    of work for me) of the precise code eating up too many cycles and
    causing problems DID ANYTHING HAPPEN.

    This is extremely and excruciatingly DISAPPOINTING and WRONG.

    We completely and absolutely suck if this is how we will handle any
    performance regression report.

    And although this case is specific to the scheduler, a lot of
    other areas handle well prepared bug reports similarly. So I'm not
    really picking on the scheduler folks, they just happen to be the
    current example :-)


  4. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Sun, 2008-10-26 at 20:03 +0100, Jiri Kosina wrote:
    > On Sun, 26 Oct 2008, Andrew Morton wrote:
    >
    > > > > > 208.4 MB/sec -- vanilla 2.6.16.60
    > > > > > 201.6 MB/sec -- vanilla 2.6.20.1
    > > > > > 172.9 MB/sec -- vanilla 2.6.22.19
    > > > > > 74.2 MB/sec -- vanilla 2.6.23
    > > > > > 46.1 MB/sec -- vanilla 2.6.24.2
    > > > > > 30.6 MB/sec -- vanilla 2.6.26.1
    > > > > > I.e. huge drop for 2.6.23 (this was with default configs for each
    > > > > > respective kernel).

    > > Was this when we decreased the default value of
    > > /proc/sys/vm/dirty_ratio, perhaps? dbench is sensitive to that.

    >
    > 2.6.28 gives 41.8 MB/s with /proc/sys/vm/dirty_ratio == 50. So small
    > improvement, but still far far away from the throughput of pre-2.6.23
    > kernels.


    How many clients?

    dbench 160 -t 60

    2.6.28-smp (git.today)
    Throughput 331.718 MB/sec 160 procs (no logjam)
    Throughput 309.85 MB/sec 160 procs (contains logjam)
    Throughput 392.746 MB/sec 160 procs (contains logjam)

    -Mike


  5. Re: [tbench regression fixes]: digging out smelly deadmen.


    * David Miller wrote:

    > From: Evgeniy Polyakov
    > Date: Sun, 26 Oct 2008 13:05:55 +0300
    >
    > > I'm not surprised there were no changes when I reported hrtimers to be
    > > the main guilty factor in my setup for dbench tests; only when David
    > > showed that they also killed his sparcs via wake_up() was something
    > > done. Now this regression has even disappeared from the list.
    > > Good direction, we should always follow this.

    >
    > Yes, this situation was in my opinion a complete ****ing joke.
    > Someone like me shouldn't have to do all of the hard work for the
    > scheduler folks in order for a bug like this to get seriously looked
    > at.


    yeah, that overhead was bad, and once it became clear that you had
    high-resolution timers enabled for your benchmarking runs (which is
    default-off and which is still rare for benchmarking runs - despite
    being a popular end-user feature) we immediately disabled the hrtick via
    this upstream commit:

    0c4b83d: sched: disable the hrtick for now

    that commit is included in v2.6.28-rc1 so this particular issue should
    be resolved.

    high-resolution timers are still default-disabled in the upstream
    kernel, so this never affected usual configs that folks keep
    benchmarking - it only affected those who decided they want higher
    resolution timers and more precise scheduling.

    Anyway, the sched-hrtick is off now, and we won't turn it back on without
    making sure that it's really low cost in the hotpath.
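
    (For reference, on kernels of this era built with CONFIG_SCHED_DEBUG the
    hrtick feature bit can be inspected and toggled at runtime through the
    sched_features debugfs file; a minimal sketch, assuming debugfs is
    available at the usual mount point:)

    mount -t debugfs none /sys/kernel/debug 2>/dev/null   # assumes debugfs and CONFIG_SCHED_DEBUG
    cat /sys/kernel/debug/sched_features                  # lists NO_HRTICK after commit 0c4b83d
    echo HRTICK > /sys/kernel/debug/sched_features        # re-enable the hrtick for testing
    echo NO_HRTICK > /sys/kernel/debug/sched_features     # back to the new default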

    Regarding tbench, a workload that context-switches in excess of 100,000
    times per second is inevitably going to show scheduler overhead - so you'll
    get the best numbers if you eliminate all/most scheduler code from the
    hotpath. We are working on various patches to mitigate the cost some
    more - and your patches and feedback are welcome as well.

    But it's a difficult call with no silver bullets. On one hand we have
    folks putting more and more stuff into the context-switching hotpath on
    the (mostly valid) point that the scheduler is a slowpath compared to
    most other things. On the other hand we've got folks doing
    high-context-switch ratio benchmarks and complaining about the overhead
    whenever something goes in that improves the quality of scheduling of a
    workload that does not context-switch as massively as tbench. It's a
    difficult balance and we cannot satisfy both camps.

    Nevertheless, this is not a valid argument in favor of the hrtick
    overhead: that was clearly excessive overhead and we zapped it.

    Ingo

  6. Re: [tbench regression fixes]: digging out smelly deadmen.

    From: Ingo Molnar
    Date: Mon, 27 Oct 2008 10:30:35 +0100

    > But it's a difficult call with no silver bullets. On one hand we have
    > folks putting more and more stuff into the context-switching hotpath on
    > the (mostly valid) point that the scheduler is a slowpath compared to
    > most other things.


    This I heavily disagree with. The scheduler should be so cheap
    that you cannot possibly notice that it is even there for a benchmark
    like tbench.

    If we now think it's ok that picking which task to run is more
    expensive than writing 64 bytes over a TCP socket and then blocking on
    a read, I'd like to stop using Linux. :-) That's "real work" and if
    the scheduler is more expensive than "real work" we lose.

    I do want to remind you of a thread you participated in, in April,
    where you complained about loopback TCP performance:

    http://marc.info/?l=linux-netdev&m=120696343707674&w=2

    It might be fruitful for you to rerun your tests with CFS reverted
    (start with 2.6.22 and progressively run your benchmark on every
    release), you know, just for fun :-)

    > On the other hand we've got folks doing high-context-switch ratio
    > benchmarks and complaining about the overhead whenever something
    > goes in that improves the quality of scheduling of a workload that
    > does not context-switch as massively as tbench. It's a difficult
    > balance and we cannot satisfy both camps.


    We've always been proud of our scheduling overhead being extremely
    low, and you have to face the simple fact that starting in 2.6.23 it's
    been getting progressively more and more expensive.

    Consistently so.

    People even noticed it.

  7. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Sun, 26 Oct 2008, Jiri Kosina wrote:

    > > > > > 208.4 MB/sec -- vanilla 2.6.16.60
    > > > > > 201.6 MB/sec -- vanilla 2.6.20.1
    > > > > > 172.9 MB/sec -- vanilla 2.6.22.19
    > > > > > 74.2 MB/sec -- vanilla 2.6.23
    > > > > > 46.1 MB/sec -- vanilla 2.6.24.2
    > > > > > 30.6 MB/sec -- vanilla 2.6.26.1
    > > > > > I.e. huge drop for 2.6.23 (this was with default configs for each
    > > > > > respective kernel).

    > > Was this when we decreased the default value of
    > > /proc/sys/vm/dirty_ratio, perhaps? dbench is sensitive to that.

    > 2.6.28 gives 41.8 MB/s with /proc/sys/vm/dirty_ratio == 50. So small
    > improvement, but still far far away from the throughput of pre-2.6.23
    > kernels.


    Ok, so another important datapoint:

    with c1e4fe711a4 (just before CFS was merged for 2.6.23), the dbench
    throughput measures

    187.7 MB/s

    in our testing conditions (default config).

    With c31f2e8a42c4 (just after CFS was merged for 2.6.23), the
    throughput measured by dbench is

    82.3 MB/s

    This is the huge drop we have been looking for. After this, the
    performance still kept going down gradually, to the ~45 MB/s we are
    measuring for 2.6.27. But the biggest drop (more than 50%) points directly
    to the CFS merge.

    --
    Jiri Kosina
    SUSE Labs

  8. Re: [tbench regression fixes]: digging out smelly deadmen.


    * Jiri Kosina wrote:

    > Ok, so another important datapoint:
    >
    > with c1e4fe711a4 (just before CFS was merged for 2.6.23), the dbench
    > throughput measures
    >
    > 187.7 MB/s
    >
    > in our testing conditions (default config).
    >
    > With c31f2e8a42c4 (just after CFS was merged for 2.6.23), the
    > throughput measured by dbench is
    >
    > 82.3 MB/s
    >
    > This is the huge drop we have been looking for. After this, the
    > performance still kept going down gradually, to the ~45 MB/s we are
    > measuring for 2.6.27. But the biggest drop (more than 50%) points
    > directly to the CFS merge.


    that is a well-known property of dbench: it rewards unfairness in IO,
    memory management and scheduling.

    The way to get the best possible dbench numbers in CPU-bound dbench
    runs is to throw away the scheduler completely and do this
    instead:
    - first execute all requests of client 1
    - then execute all requests of client 2
    ....
    - execute all requests of client N

    the moment the clients are allowed to overlap, the moment their requests
    are executed more fairly, the dbench numbers drop.

    Ingo

  9. Re: [tbench regression fixes]: digging out smelly deadmen.

    > The way to get the best possible dbench numbers in CPU-bound dbench
    > runs is to throw away the scheduler completely and do this
    > instead:
    >
    > - first execute all requests of client 1
    > - then execute all requests of client 2
    > ....
    > - execute all requests of client N


    Rubbish. If you do that you'll not get enough I/O in parallel to schedule
    the disk well (not that most of our I/O schedulers are doing the job
    well, and the vm writeback threads then mess it up and the lack of Arjan's
    ioprio fixes then totally screws you)

    > the moment the clients are allowed to overlap, the moment their requests
    > are executed more fairly, the dbench numbers drop.


    Fairness isn't everything. Dbench is a fairly good tool for studying some
    real world workloads. If your fairness hurts throughput that much maybe
    your scheduler algorithm is just plain *wrong* as it isn't adapting to
    workload at all well.

    Alan

  10. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Mon, 2008-10-27 at 11:33 +0000, Alan Cox wrote:
    > > The way to get the best possible dbench numbers in CPU-bound dbench
    > > runs is to throw away the scheduler completely and do this
    > > instead:
    > >
    > > - first execute all requests of client 1
    > > - then execute all requests of client 2
    > > ....
    > > - execute all requests of client N

    >
    > Rubbish. If you do that you'll not get enough I/O in parallel to schedule
    > the disk well (not that most of our I/O schedulers are doing the job
    > well, and the vm writeback threads then mess it up and the lack of Arjan's
    > ioprio fixes then totally screws you)
    >
    > > the moment the clients are allowed to overlap, the moment their requests
    > > are executed more fairly, the dbench numbers drop.

    >
    > Fairness isn't everything. Dbench is a fairly good tool for studying some
    > real world workloads. If your fairness hurts throughput that much maybe
    > your scheduler algorithm is just plain *wrong* as it isn't adapting to
    > workload at all well.


    Doesn't seem to be scheduler/fairness. 2.6.22.19 is O(1), and falls
    apart too, I posted the numbers and full dbench output yesterday.

    -Mike


  11. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Mon, 27 Oct 2008, Mike Galbraith wrote:

    > > real world workloads. If your fairness hurts throughput that much maybe
    > > your scheduler algorithm is just plain *wrong* as it isn't adapting to
    > > workload at all well.

    > Doesn't seem to be scheduler/fairness. 2.6.22.19 is O(1), and falls
    > apart too, I posted the numbers and full dbench output yesterday.


    We'll need to look into this a little bit more, I think. I have sent out
    some numbers too, and they indicate very clearly that there is a more than
    50% performance drop (measured by dbench) right after the merge of CFS
    in the 2.6.23-rc1 merge window.

    --
    Jiri Kosina
    SUSE Labs

  12. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Mon, 2008-10-27 at 14:42 +0100, Jiri Kosina wrote:
    > On Mon, 27 Oct 2008, Mike Galbraith wrote:
    >
    > > > real world workloads. If your fairness hurts throughput that much maybe
    > > > your scheduler algorithm is just plain *wrong* as it isn't adapting to
    > > > workload at all well.

    > > Doesn't seem to be scheduler/fairness. 2.6.22.19 is O(1), and falls
    > > apart too, I posted the numbers and full dbench output yesterday.

    >
    > We'll need to look into this a little bit more, I think. I have sent out
    > some numbers too, and they indicate very clearly that there is a more than
    > 50% performance drop (measured by dbench) right after the merge of CFS
    > in the 2.6.23-rc1 merge window.


    Sure. Watching the per/sec output, every kernel I have sucks at high
    client count dbench; it's just a matter of how badly, and for how long.

    BTW, the nice pretty 160-client numbers I posted yesterday for ext2
    turned out to be because somebody adds the _netdev mount option when I
    run mount -a in order to mount my freshly hotplugged external drive
    (why? that isn't in my fstab). Without that option, ext2 output is
    roughly as raggedy as ext3, and nowhere near the up to 1.4 GB/sec I can
    get with dirty_ratio=50 + ext2 + the (buy none, get one free) _netdev
    option. That freebie option does nothing for ext3.

    -Mike


  13. Re: [tbench regression fixes]: digging out smelly deadmen.

    Mike Galbraith wrote:
    > That's exactly what I've been trying to look into, but combined with
    > netperf. The thing is an incredibly twisted maze of _this_ affects
    > _that_... sometimes involving magic and/or mythical creatures.


    I cannot guarantee it will help, but the global -T option to pin netperf
    or netserver to a specific CPU might help cut down the variables.
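
    (A minimal sketch of that option; the host name and CPU numbers below are
    placeholders, not values taken from this thread:)

    netperf -H testhost -t TCP_RR -l 60 -T 0,0   # bind netperf and netserver each to CPU 0 (testhost is a placeholder)
    netperf -H testhost -t TCP_RR -l 60 -T ,2    # leave netperf floating, bind only netserver to CPU 2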

    FWIW, netperf top-of-trunk omni tests can now also determine and report
    the state of SELinux. They also have code to accept or generate their
    own RFC4122-esque UUID. Define some canonical tests and then we're ever
    closer to just needing some database-fu and automagic testing, I
    suppose... things I do not presently possess but am curious enough to
    follow some pointers.

    happy benchmarking,

    rick jones

  14. Re: [tbench regression fixes]: digging out smelly deadmen.


    * Alan Cox wrote:

    > > The way to get the best possible dbench numbers in CPU-bound dbench
    > > runs is to throw away the scheduler completely and do this
    > > instead:
    > >
    > > - first execute all requests of client 1
    > > - then execute all requests of client 2
    > > ....
    > > - execute all requests of client N

    >
    > Rubbish. [...]


    i've actually implemented that about a decade ago: i've tracked down
    what makes dbench tick, i've implemented the kernel heuristics for it
    to make dbench scale linearly with the number of clients - just to be
    shot down by Linus about my utter rubbish approach ;-)

    > [...] If you do that you'll not get enough I/O in parallel to
    > schedule the disk well (not that most of our I/O schedulers are
    > doing the job well, and the vm writeback threads then mess it up and
    > the lack of Arjan's ioprio fixes then totally screws you)


    the best dbench results come from systems that have enough RAM to
    cache the full working set, and a filesystem intelligent enough to not
    insert bogus IO serialization cycles (ext3 is not such a filesystem).

    The moment there's real IO it becomes harder to analyze but the same
    basic behavior remains: the more unfair the IO scheduler, the "better"
    dbench results we get.

    Ingo

  15. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Mon, 2008-10-27 at 10:26 -0700, Rick Jones wrote:
    > Mike Galbraith wrote:
    > > That's exactly what I've been trying to look into, but combined with
    > > netperf. The thing is an incredibly twisted maze of _this_ affects
    > > _that_... sometimes involving magic and/or mythical creatures.

    >
    > I cannot guarantee it will help, but the global -T option to pin netperf
    > or netserver to a specific CPU might help cut down the variables.


    Yup, and how. Early on, the other variables drove me bat-**** frigging
    _nuts_. I eventually selected a UP config to test _because_ those other
    variables combined with SMP overhead and config options drove me crazy ;-)

    > FWIW, netperf top-of-trunk omni tests can now also determine and report
    > the state of SELinux. They also have code to accept or generate their
    > own RFC4122-esque UUID. Define some canonical tests and then we're ever
    > closer to just needing some database-fu and automagic testing, I
    > suppose... things I do not presently possess but am curious enough to
    > follow some pointers.


    Hrm. I'm going to have to save that, and parse a few times. (usual)

    > happy benchmarking,


    Not really, but I can't seem to give up ;-)

    -Mike


  16. Re: [tbench regression fixes]: digging out smelly deadmen.

    > > I cannot guarantee it will help, but the global -T option to pin netperf
    > > or netserver to a specific CPU might help cut down the variables.

    > Yup, and how. Early on, the other variables drove me bat-**** frigging
    > _nuts_. I eventually selected a UP config to test _because_ those other
    > variables combined with SMP overhead and config options drove me crazy ;-)

    > > FWIW, netperf top-of-trunk omni tests can now also determine and report
    > > the state of SELinux.


    http://www.netperf.org/svn/netperf2/...netsec_linux.c

    Pointers to programmatic detection of AppArmor and a couple of salient
    details about the firewall (enabled, perhaps number of rules) from any
    quarter would be welcome.

    > > They also have code to accept or generate their
    > > own RFC4122-esque UUID. Define some canonical tests and then we're ever
    > > closer to just needing some database-fu and automagic testing, I
    > > suppose... things I do not presently possess but am curious enough to
    > > follow some pointers.

    > Hrm. I'm going to have to save that, and parse a few times. (usual)


    The plot thickens: it seems that autotest already knows about some version
    of netperf2... I'll be trying to see if there is some benefit to autotest
    from netperf2's top of trunk having the keyval output format, and whether
    autotest groks paired systems, to more easily do over-the-network testing.

    > > happy benchmarking,

    > Not really, but I can't seem to give up ;-)


    then I guess I'll close with

    successful benchmarking,

    if not necessarily happy

    rick jones

  17. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Mon, Oct 27, 2008 at 07:33:12PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
    > the best dbench results come from systems that have enough RAM to
    > cache the full working set, and a filesystem intelligent enough to not
    > insert bogus IO serialization cycles (ext3 is not such a filesystem).


    My test system has 8 GB of RAM for 8 clients and its performance dropped
    by 30%. There is no IO load, since tbench only exercises the network
    part while dbench itself only uses disk IO. What we see right now is
    that a usual network server, which handles a mixed set of essentially
    small reads and writes on sockets from multiple (8) clients, suddenly
    lost one third of its performance.

    > The moment there's real IO it becomes harder to analyze but the same
    > basic behavior remains: the more unfair the IO scheduler, the "better"
    > dbench results we get.


    Right now there is no disk IO at all. Only quite usual network and
    process load.

    --
    Evgeniy Polyakov

  18. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Mon, 2008-10-27 at 12:18 -0700, Rick Jones wrote:
    > >
    > > Not really, but I can't seem to give up ;-)

    >
    > then I guess I'll close with
    >
    > successful benchmarking,
    >
    > if not necessarily happy


    There ya go, happy benchmarking is when they tell you what you want to
    hear. Successful is when you learn something.

    -Mike (not happy, but learning)


  19. Re: [tbench regression fixes]: digging out smelly deadmen.

    From: Evgeniy Polyakov
    Date: Mon, 27 Oct 2008 22:39:34 +0300

    > On Mon, Oct 27, 2008 at 07:33:12PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
    > > The moment there's real IO it becomes harder to analyze but the same
    > > basic behavior remains: the more unfair the IO scheduler, the "better"
    > > dbench results we get.

    >
    > Right now there is no disk IO at all. Only quite usual network and
    > process load.


    I think the hope is that by saying there isn't a problem enough times,
    it will become truth. :-)

    More seriously, Ingo, what in the world do we need to do in order to get
    you to start doing tbench runs and optimizing things (read as: fixing
    the regression you added)?

    I'm personally working on a test fibonacci heap implementation for
    the fair sched code, and I already did all of the cost analysis all
    the way back to the 2.6.22 pre-CFS days.

    But I'm NOT a scheduler developer, so it isn't my responsibility to do
    this crap for you. You added this regression, why do I have to get my
    hands dirty in order for there to be some hope that these regressions
    start to get fixed?

  20. Re: [tbench regression fixes]: digging out smelly deadmen.

    On Mon, 2008-10-27 at 12:48 -0700, David Miller wrote:
    > From: Evgeniy Polyakov
    > Date: Mon, 27 Oct 2008 22:39:34 +0300
    >
    > > On Mon, Oct 27, 2008 at 07:33:12PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
    > > > The moment there's real IO it becomes harder to analyze but the same
    > > > basic behavior remains: the more unfair the IO scheduler, the "better"
    > > > dbench results we get.

    > >
    > > Right now there is no disk IO at all. Only quite usual network and
    > > process load.

    >
    > I think the hope is that by saying there isn't a problem enough times,
    > it will become truth. :-)
    >
    > More seriously, Ingo, what in the world do we need to do in order to get
    > you to start doing tbench runs and optimizing things (read as: fixing
    > the regression you added)?
    >
    > I'm personally working on a test fibonacci heap implementation for
    > the fair sched code, and I already did all of the cost analysis all
    > the way back to the 2.6.22 pre-CFS days.
    >
    > But I'm NOT a scheduler developer, so it isn't my responsibility to do
    > this crap for you. You added this regression, why do I have to get my
    > hands dirty in order for there to be some hope that these regressions
    > start to get fixed?


    I don't want to ruffle any feathers, but my box has a comment or two..

    Has anyone looked at the numbers box emitted? Some of what I believe to
    be very interesting data points may have been overlooked.

    Here's a piece thereof again, for better or worse. One last post won't
    burn the last electron. If they don't agree with anyone else's numbers,
    that's ok; these numbers have meaning too, and speak for themselves.

    Retest hrtick pain:

    2.6.26.7-up virgin no highres timers enabled
    ring-test - 1.155 us/cycle = 865 KHz 1.000
    netperf - 130470.93 130771.00 129872.41 rr/s avg 130371.44 rr/s 1.000 (within jitter of previous tests)
    tbench - 355.153 357.163 356.836 MB/sec avg 356.384 MB/sec 1.000

    2.6.26.7-up virgin highres timers enabled, hrtick enabled
    ring-test - 1.368 us/cycle = 730 KHz .843
    netperf - 118959.08 118853.16 117761.42 rr/s avg 118524.55 rr/s .909
    tbench - 340.999 338.655 340.005 MB/sec avg 339.886 MB/sec .953

    OK, there's the hrtick regression in all its glory. Ouch, that hurt.

    Remember those numbers, box muttered them again in 27 testing. These
    previously tested kernels don't even have highres timers enabled, so
    obviously hrtick is a non-issue for them.

    2.6.26.6-up + clock + buddy + weight
    ring-test - 1.234 us/cycle = 810 KHz .947 [cmp1]
    netperf - 128026.62 128118.48 127973.54 rr/s avg 128039.54 rr/s .977
    tbench - 342.011 345.307 343.535 MB/sec avg 343.617 MB/sec .964

    2.6.26.6-up + clock + buddy + weight + revert_to_per_rq_vruntime + buddy_overhead
    ring-test - 1.174 us/cycle = 851 KHz .995 [cmp2]
    netperf - 133928.03 134265.41 134297.06 rr/s avg 134163.50 rr/s 1.024
    tbench - 358.049 359.529 358.342 MB/sec avg 358.640 MB/sec 1.006

    Note that I added all .27 additional scheduler overhead to .26, and then
    removed every last bit of it, theoretically leaving nothing but improved
    clock accuracy in the wake. The ring-test number indicates that our max
    context switch rate was thereby indeed fully recovered. We even got a
    modest throughput improvement for our trouble.

    However..
    versus .26 counterpart
    2.6.27-up virgin
    ring-test - 1.193 us/cycle = 838 KHz 1.034 [vs cmp1]
    netperf - 121293.48 121700.96 120716.98 rr/s avg 121237.14 rr/s .946
    tbench - 340.362 339.780 341.353 MB/sec avg 340.498 MB/sec .990

    2.6.27-up + revert_to_per_rq_vruntime + buddy_overhead
    ring-test - 1.122 us/cycle = 891 KHz 1.047 [vs cmp2]
    netperf - 119353.27 118600.98 119719.12 rr/s avg 119224.45 rr/s .900
    tbench - 338.701 338.508 338.562 MB/sec avg 338.590 MB/sec .951

    ...removing the overhead from .27 does not produce the anticipated result
    despite a max context switch rate markedly above that of 2.6.26.

    There lies an as yet unaddressed regression IMBHO. The hrtick has been
    addressed. It sucked at high frequency, and it's gone. The added math
    overhead in .27 hurt some too, and is now history as well.

    These two regressions are nearly identical in magnitude per box.

    I don't know who owns that regression; neither box nor git does. I'm not
    pointing fingers in any direction. I've walked the regression hunting
    path, and know first-hand how rocky that path is.

    There are other things along the regression path that are worth noting:

    Three of the releases I tested were tested with identical schedulers,
    cfs-v24.1, yet they produced markedly different output, output which
    regresses. Again, I'm not pointing fingers, I'm merely illustrating how
    rocky this regression hunting path is. In 25, the sum of all kernel
    changes dropped our max switch rate markedly, yet both tbench and
    netperf _improved_ markedly. More rocks in the road. etc etc etc.

    To really illustrate the rockiness, cutting the network config down from a
    distro lard-ball to something leaner and meaner took SMP throughput from
    this (I was only testing netperf at that time) on 19 Aug..

    2.6.22.19 pinned
    16384 87380 1 1 300.00 59866.40
    16384 87380 1 1 300.01 59852.78
    16384 87380 1 1 300.01 59618.48
    16384 87380 1 1 300.01 59655.35

    ...to this on 13 Sept..

    2.6.22.19 (also pinned)
    Throughput 1136.02 MB/sec 4 procs

    16384 87380 1 1 60.01 94179.12
    16384 87380 1 1 60.01 88780.61
    16384 87380 1 1 60.01 91057.72
    16384 87380 1 1 60.01 94242.16

    ...and to this on 15 Sept.

    2.6.22.19 (also pinned)
    Throughput 1250.73 MB/sec 4 procs 1.00

    16384 87380 1 1 60.01 111272.55 1.00
    16384 87380 1 1 60.00 104689.58
    16384 87380 1 1 60.00 110733.05
    16384 87380 1 1 60.00 110748.88

    2.6.22.19-cfs-v24.1

    Throughput 1204.14 MB/sec 4 procs .962

    16384 87380 1 1 60.01 101799.85 .929
    16384 87380 1 1 60.01 101659.41
    16384 87380 1 1 60.01 101628.78
    16384 87380 1 1 60.01 101700.53

    wakeup granularity = 0 (make scheduler as preempt happy as 2.6.22 is)

    Throughput 1213.21 MB/sec 4 procs .970

    16384 87380 1 1 60.01 108569.27 .992
    16384 87380 1 1 60.01 108541.04
    16384 87380 1 1 60.00 108579.63
    16384 87380 1 1 60.01 108519.09

    Is that a rock in my "let's double, triple, quintuple examine scheduler
    performance along the regression path" or what? Same box, same
    benchmarks and same schedulers I've been examining the whole time.

    .992 and .970.
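
    (For reference, the "wakeup granularity = 0" runs above correspond, on CFS
    kernels of that era built with CONFIG_SCHED_DEBUG, to zeroing a single
    sysctl; a minimal sketch:)

    echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns   # make wakeup preemption as eager as 2.6.22's (assumes CONFIG_SCHED_DEBUG exposes it)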

    The list goes on and on and on, including SCHED_RR testing where I saw
    regression despite no CFS math. My point here is that every little
    change of anything changes the picture up to and including radically.
    These configuration changes, if viewed in regression terms, are HUGE.
    Build a fully enabled netfilter into the kernel vs modular, and it
    becomes even more so.

    The picture with UP config is different, but as far as box is concerned,
    while scheduler involvement is certainly interesting, there are even
    more interesting places. Somewhere.

    Hopefully this post won't be viewed in the rather cynical light of your
    first quoted stanza. Box is incapable of such, and I have no incentive
    to do such ;-) I just run the benchmarks, collect whatever numbers box
    feels like emitting, and run around trying to find the missing bits.

    -Mike

