unpredictability in scheduler test results - Kernel

  1. unpredictability in scheduler test results

    I was running some tests with the "fairtest" testcase and noticed that
    successive runs could give wildly different results.

    I was originally using the tip/master tree as of Sep 16, but I also
    confirmed the behaviour with Linus' tree as of Sep 14 (with the
    __load_balance_iterator() fix applied). The same behaviour is present
    in both cases.

    I'm using the test config listed at the bottom. It's pretty
    straightforward.

    The first run gave the following results. As expected, the system
    picked a static task distribution and didn't migrate tasks during the test.

    group  actual(%)            expected(%)  avg latency(ms)  max_latency(ms)
    1      33.31(33.33/33.2     30.00        23/23            37/37
    2      36.29                40.00        5                25
    3      30.40(27.40/33.40)   30.00        22/23            60/40



    On the second run, the task distribution is almost perfect, but the
    system was only using one of the two cpus as seen by the difference
    between actual and expected cpu time.

    Warning, actual cpu time different than expected. actual: 10033.011108,
    expected: 20000.000000
    group  actual(%)            expected(%)  avg latency(ms)  max_latency(ms)
    1      0.24(30.59/29.88)    30.00        26/27            68/58
    2      39.87                40.00        20               36
    3      29.89(29.87/29.91)   30.00        28/27            47/60
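
To make the warning above concrete, here is a minimal sketch of the arithmetic. It assumes the expected figure is simply the test duration times the number of cpus; that is a guess based on the 10 second duration and the two cpus mentioned in this post, not something taken from the fairtest source. The function names are made up for illustration.

```python
def expected_cpu_ms(duration_s, num_cpus):
    """Total CPU time (ms) available if every cpu is busy for the whole run."""
    return duration_s * 1000.0 * num_cpus


def cpus_worth_used(actual_ms, duration_s):
    """How many fully busy cpus the measured CPU time corresponds to."""
    return actual_ms / (duration_s * 1000.0)


# Figures from the warning: a 10 second run on a two-cpu machine.
expected = expected_cpu_ms(10, 2)
used = cpus_worth_used(10033.011108, 10)
print(f"expected {expected:.0f} ms, actual is about {used:.2f} cpus' worth")
```

Under that assumption the measured time corresponds to roughly one fully busy cpu, which matches the observation that only one of the two cpus was in use.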


    Any ideas what's going on?

    Chris



    test config file:
    #delay (secs)
    1

    #duration (secs)
    10

    #groupname,share,numhogs
    1,750,n
    2,1000,1
    3,750,n
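
As a sanity check on the expected(%) column, here is a minimal sketch that assumes the expected split is each group's share divided by the sum of all shares. That assumption is consistent with the 30/40/30 figures reported in the results, but the fairtest source is not shown in this thread, so treat the helper (a hypothetical name) as illustrative only.

```python
def expected_percentages(shares):
    """Map each group name to its share of total CPU time, in percent."""
    total = sum(shares.values())
    return {name: 100.0 * share / total for name, share in shares.items()}


# Shares taken from the config above (groups 1, 2 and 3).
shares = {"1": 750, "2": 1000, "3": 750}
expected = expected_percentages(shares)
for name, pct in expected.items():
    print(f"group {name}: {pct:.2f}%")
```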


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: unpredictability in scheduler test results -- still present

    Chris Friesen wrote:

    > I'm using the test config listed at the bottom. It's pretty
    > straightforward.


    > On the second run, the task distribution is almost perfect, but the
    > system was only using one of the two cpus as seen by the difference
    > between actual and expected cpu time.
    >
    > Warning, actual cpu time different than expected. actual: 10033.011108,
    > expected: 20000.000000
    > group  actual(%)            expected(%)  avg latency(ms)  max_latency(ms)
    > 1      0.24(30.59/29.88)    30.00        26/27            68/58
    > 2      39.87                40.00        20               36
    > 3      29.89(29.87/29.91)   30.00        28/27            47/60


    This behaviour (that load balancing is messed up) is now almost
    continuous with both current tip/master and current Linus git. On the
    first test after booting, it seems to work okay (although there are
    still issues with fairness). On every subsequent test, fairness is good
    but it only uses one of the two cpus.

    Also, building a kernel with "-j10" results in one cpu being mostly idle
    while the other one is 100% busy. It used to be both 100% busy--if I get
    time today I may try bisecting it.
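
One way to put numbers on "one cpu mostly idle, the other 100% busy" is to sample /proc/stat twice and compare the per-cpu counters. The sketch below is not what was used in this thread; it assumes the standard /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal) and counts idle plus iowait as not busy. The function names are made up for illustration.

```python
import os
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} for each per-cpu line."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-cpu lines start with "cpu0", "cpu1", ...; skip the aggregate "cpu".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(v) for v in fields[1:]]
        total = sum(ticks[:8])  # guest time is already included in user time
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        result[fields[0]] = (total - idle, total)
    return result


def busy_fraction(before, after):
    """Fraction of time each cpu was busy between two samples."""
    out = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        out[cpu] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return out


def sample(path="/proc/stat"):
    with open(path) as f:
        return parse_proc_stat(f.read())


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = sample()
    time.sleep(1.0)
    second = sample()
    for cpu, frac in sorted(busy_fraction(first, second).items()):
        print(f"{cpu}: {frac:.0%} busy")
```

Running it during the "-j10" build would show whether the two cpus really diverge, without relying on eyeballing top.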

    Chris

  3. Re: unpredictability in scheduler test results -- still present

    Chris Friesen wrote:

    > This behaviour (that load balancing is messed up) is now almost
    > continuous with both current tip/master and current Linus git. On the
    > first test after booting, it seems to work okay (although there are
    > still issues with fairness). On every subsequent test, fairness is good
    > but it only uses one of the two cpus.
    >
    > Also, building a kernel with "-j10" results in one cpu being mostly idle
    > while the other one is 100% busy. It used to be both 100% busy--if I get
    > time today I may try bisecting it.


    It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load
    balancing problem go away and causes all cpus to be used.

    With this option enabled, the problem seems to be present as far back as
    2.6.27-rc2. (2.6.27-rc1 doesn't compile on my machine, and 2.6.26
    doesn't have ftrace).

    I have no idea why turning on dynamic ftrace would affect load balancing
    behaviour, but it's very repeatable. The very first test run after
    booting works fine, and all successive runs fail to balance properly.

    Chris

  4. Re: unpredictability in scheduler test results -- still present


    * Chris Friesen wrote:

    > Chris Friesen wrote:
    >
    >> This behaviour (that load balancing is messed up) is now almost
    >> continuous with both current tip/master and current Linus git. On the
    >> first test after booting, it seems to work okay (although there are
    >> still issues with fairness). On every subsequent test, fairness is
    >> good but it only uses one of the two cpus.
    >>
    >> Also, building a kernel with "-j10" results in one cpu being mostly
    >> idle while the other one is 100% busy. It used to be both 100% busy--if
    >> I get time today I may try bisecting it.

    >
    > It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load
    > balancing problem go away and causes all cpus to be used.
    >
    > With this option enabled, the problem seems to be present as far back
    > as 2.6.27-rc2. (2.6.27-rc1 doesn't compile on my machine, and 2.6.26
    > doesn't have ftrace).
    >
    > I have no idea why turning on dynamic ftrace would affect load
    > balancing behaviour, but it's very repeatable. The very first test
    > run after booting works fine, and all successive runs fail to balance
    > properly.


    very weird. Would be very nice to figure it out.

    and in tip/master we don't have the 'ftraced' kernel-patching kernel
    thread anymore, so ftrace should be passive by all means.

    OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in
    the .config, or also activating it via /debug/tracing/current_tracer?

    Ingo

  5. Re: unpredictability in scheduler test results -- still present

    Ingo Molnar wrote:
    > * Chris Friesen wrote:


    >> It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load
    >> balancing problem go away and causes all cpus to be used.
    >>
    >> With this option enabled, the problem seems to be present as far back
    >> as 2.6.27-rc2. (2.6.27-rc1 doesn't compile on my machine, and 2.6.26
    >> doesn't have ftrace).
    >>
    >> I have no idea why turning on dynamic ftrace would affect load
    >> balancing behaviour, but it's very repeatable. The very first test
    >> run after booting works fine, and all successive runs fail to balance
    >> properly.


    > OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in
    > the .config, or also activating it via /debug/tracing/current_tracer?


    Just enabling it in the .config is enough to trigger the behaviour
    change. I'm not explicitly activating any traces.
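
For anyone trying to reproduce this, the distinction between "built in" and "activated" can be checked with something like the sketch below. It is only an illustration: the helper names are hypothetical, the sample .config text is made up, and the tracer path is the one quoted above, which depends on where debugfs is mounted.

```python
def config_enabled(config_text, option):
    """True if `option` is set to y in the given kernel .config text."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # "# CONFIG_FOO is not set" means the option is off
        key, sep, value = line.partition("=")
        if sep and key == option:
            return value == "y"
    return False


def current_tracer(path="/debug/tracing/current_tracer"):
    """Return the contents of current_tracer, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


# Made-up .config fragment, for illustration only.
sample_config = (
    "CONFIG_FTRACE=y\n"
    "CONFIG_DYNAMIC_FTRACE=y\n"
    "# CONFIG_SCHED_TRACER is not set\n"
)
print("dynamic ftrace built in:",
      config_enabled(sample_config, "CONFIG_DYNAMIC_FTRACE"))
print("current tracer:", current_tracer())
```

The first check corresponds to what is described here (option enabled at build time); the second would only matter if a tracer had been selected at runtime.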

    Chris

  6. Re: unpredictability in scheduler test results -- still present


    * Chris Friesen wrote:

    > Ingo Molnar wrote:
    >> * Chris Friesen wrote:

    >
    >>> It turns out that disabling CONFIG_DYNAMIC_FTRACE makes the load
    >>> balancing problem go away and causes all cpus to be used.
    >>>
    >>> With this option enabled, the problem seems to be present as far back
    >>> as 2.6.27-rc2. (2.6.27-rc1 doesn't compile on my machine, and 2.6.26
    >>> doesn't have ftrace).
    >>>
    >>> I have no idea why turning on dynamic ftrace would affect load
    >>> balancing behaviour, but it's very repeatable. The very first test
    >>> run after booting works fine, and all successive runs fail to balance
    >>> properly.

    >
    >> OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in
    >> the .config, or also activating it via /debug/tracing/current_tracer?

    >
    > Just enabling it in the .config is enough to trigger the behaviour
    > change. I'm not explicitly activating any traces.


    ok, that would be a clear ftrace bug i guess?

    Ingo

  7. Re: unpredictability in scheduler test results -- still present

    Ingo Molnar wrote:
    > * Chris Friesen wrote:
    >> Ingo Molnar wrote:


    >>> OTOH, what does 'turning on dftrace' exactly mean? Just enabling it in
    >>> the .config, or also activating it via /debug/tracing/current_tracer?


    >> Just enabling it in the .config is enough to trigger the behaviour
    >> change. I'm not explicitly activating any traces.


    > ok, that would be a clear ftrace bug i guess?


    It's either an ftrace bug or a fragile load balancer bug. I wonder if
    it's related somehow to the stop_machine() call in ftrace_dynamic_init()?

    Chris
