Strange runtime behaviour, job time fluctuates - Powerpc


  1. Strange runtime behaviour, job time fluctuates

    I have a completely deterministic executable on an idle machine, and
    yet I'm getting some wildly fluctuating running times.

    The executable in question is a SPEC benchmark, and so should be
    completely deterministic. The machine is idle, no one else is logged
    on, and the benchmark gets 99% of the CPU for the duration. The
    machine is a CELL Blade running Fedora 7 and the benchmark is single-
    threaded and running completely on the PPE, although I've seen this on
    POWER5 as well. I'm using the time command to get these measurements.

    My problem is simple: I have no explanation for the variable running
    time, which fluctuates pretty drastically. The last two runs gave me
    7m24s and 9m19s. Does anyone have any idea why this would happen? The
    only significant difference between the two runs was in involuntary
    context switches (457 vs 564), which suggests that for some reason one
    run is getting much less work done per time slice than the other...
    and I have no clue why. Things like # of page faults and voluntary
    context switches are the same. Now, I don't expect the exact same
    running time every invocation, since the rest of the machine isn't
    free from outside influences, but like I said, I've made sure that
    disturbances to the machine have been minimized and so I don't expect
    over 2 minutes difference in running time. Typically, most of the
    times are distributed closely around 7m and 9m, for whatever reason.

    Have I missed something? I've considered cache, the SMP nature of the
    PPE, and scheduling, but I don't see how those might contribute to
    this problem. If anyone has any ideas I'd love to hear them.
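
To see whether the run times really cluster around two values, the measurements described above (elapsed time, page faults, voluntary and involuntary context switches) can be collected over several runs. The following is a minimal sketch, not the poster's actual setup: the helper names `measure` and `measure_many` are made up for illustration, and the demo command is a stand-in for the real benchmark invocation. It uses `resource.getrusage(RUSAGE_CHILDREN)`, which is available on Unix-like systems only.

```python
import resource
import subprocess
import sys
import time

# Cumulative rusage counters that can be meaningfully differenced.
FIELDS = ("ru_utime", "ru_stime", "ru_minflt", "ru_majflt", "ru_nvcsw", "ru_nivcsw")


def measure(argv):
    """Run argv once and return wall time plus the child's resource usage."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates over all waited-for children, so the
    # difference isolates the run that just finished.
    sample = {name: getattr(after, name) - getattr(before, name) for name in FIELDS}
    sample["wall"] = wall
    return sample


def measure_many(argv, runs=5):
    """Repeat the measurement so the spread between runs becomes visible."""
    return [measure(argv) for _ in range(runs)]


if __name__ == "__main__":
    # Placeholder workload; substitute the real benchmark command line here.
    demo = [sys.executable, "-c", "sum(range(10**6))"]
    for i, s in enumerate(measure_many(demo, runs=3), 1):
        print(f"run {i}: wall={s['wall']:.3f}s user={s['ru_utime']:.3f}s "
              f"sys={s['ru_stime']:.3f}s invol={s['ru_nivcsw']} vol={s['ru_nvcsw']} "
              f"majflt={s['ru_majflt']}")
```

If user CPU time varies as much as wall time does, the extra time is being spent executing the benchmark itself rather than waiting, which points away from other processes and towards clock speed or memory effects.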


  2. Re: Strange runtime behaviour, job time fluctuates

    ym@dodgeit.com wrote:
    > I have a completely deterministic executable on a idle machine, and
    > yet I'm getting some wildly fluctuating running times.
    >
    > The executable in question is a SPEC benchmark, and so should be
    > completely deterministic. The machine is idle, no one else is logged
    > on, and the benchmark gets 99% of the CPU for the duration. The
    > machine is a CELL Blade running Fedora 7 and the benchmark is single-
    > threaded and running completely on the PPE, although I've seen this on
    > POWER5 as well. I'm using the time command to get these measurements.
    >
    > My problem is simple, I have no explanation for the variable running
    > time, which fluctuates pretty drastically. The last two runs gave me
    > 7m24s and 9m19s. Does anyone have any idea why this would happen? The
    > only significant difference between the two runs was in involuntary
    > context switches (457 vs 564), which suggests that for some reason one
    > run is getting much less work done per time slice than the other...
    > and I have no clue why. Things like # of page faults and voluntary
    > context switches are the same. Now, I don't expect the exact same
    > running time every invocation, since the rest of the machine isn't
    > free from outside influences, but like I said, I've made sure that
    > disturbances to the machine have been minimized and so I don't expect
    > over 2 minutes difference in running time. Typically I most of the
    > times are distributed closely around 7m and 9m, for whatever reason.
    >
    > Have I missed something? I've considered cache, the SMP nature of the
    > PPE, and scheduling, but I don't see how those might contribute to
    > this problem. If anyone has any ideas I'd love to hear them.
    >


    A couple of thoughts:

    - I don't know how many CPUs you have on each blade, but try using
    taskset to bind your process to a specific CPU/core.

    - Is it possible that your clock speed is getting throttled for some
    reason? Some blades have clock speed control which ramps up and down
    with load, and which can also ramp down if the CPU internal temperature
    gets too high. Does cpufreq-info tell you anything?

    Paul
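
Both suggestions above can also be tried from a script. This is only a sketch under assumptions: it relies on the Linux-only `os.sched_setaffinity` call (the programmatic counterpart of taskset) and on the standard cpufreq sysfs file `scaling_cur_freq`, which exists only when the kernel has a cpufreq driver loaded for that CPU. Whether the Cell blade kernel exposes it is not known from the thread. The function names are illustrative.

```python
import os
from pathlib import Path


def pin_to_cpu(cpu):
    """Restrict this process (and children started later) to a single CPU.

    Returns True if the affinity was applied, False if unsupported or if
    the CPU is not in the currently allowed set.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False  # not Linux, or interface unavailable
    if cpu not in os.sched_getaffinity(0):
        return False
    os.sched_setaffinity(0, {cpu})
    return True


def current_freq_khz(cpu=0):
    """Return the current clock of a CPU in kHz, or None if not exposed."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq")
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        cpu = min(os.sched_getaffinity(0))
        print(f"pinned to CPU {cpu}: {pin_to_cpu(cpu)}")
    else:
        cpu = 0
        print("CPU affinity control not available on this platform")
    print(f"current frequency of CPU {cpu}: {current_freq_khz(cpu)} kHz")
```

Sampling the frequency before and after a benchmark run, with the process pinned, would show whether throttling coincides with the slow runs.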

  3. Re: Strange runtime behaviour, job time fluctuates

    On 2007-08-10, ym@dodgeit.com wrote:
    > I have a completely deterministic executable on a idle machine, and
    > yet I'm getting some wildly fluctuating running times.
    >
    > The executable in question is a SPEC benchmark, and so should be
    > completely deterministic. The machine is idle, no one else is logged
    > on, and the benchmark gets 99% of the CPU for the duration. The
    > machine is a CELL Blade running Fedora 7 and the benchmark is single-
    > threaded and running completely on the PPE, although I've seen this on
    > POWER5 as well. I'm using the time command to get these measurements.
    >
    > My problem is simple, I have no explanation for the variable running
    > time, which fluctuates pretty drastically. The last two runs gave me
    > 7m24s and 9m19s. Does anyone have any idea why this would happen? The
    > only significant difference between the two runs was in involuntary
    > context switches (457 vs 564), which suggests that for some reason one
    > run is getting much less work done per time slice than the other...
    > and I have no clue why. Things like # of page faults and voluntary
    > context switches are the same. Now, I don't expect the exact same
    > running time every invocation, since the rest of the machine isn't
    > free from outside influences, but like I said, I've made sure that
    > disturbances to the machine have been minimized and so I don't expect
    > over 2 minutes difference in running time. Typically I most of the
    > times are distributed closely around 7m and 9m, for whatever reason.
    >
    > Have I missed something? I've considered cache, the SMP nature of the
    > PPE, and scheduling, but I don't see how those might contribute to
    > this problem. If anyone has any ideas I'd love to hear them.


    Two things to check in case they _MIGHT_ be the cause:

    - CPU temperature that might cause throttling, if those
    CPUs do thermal throttling.

    - If these CPUs have the inverted page tables like the
    PowerPC (and Power?) architecture, might there be some
    odd effect related to that?

    --
    Robert Riches
    spamtrap42@verizon.net
    (Yes, that is one of my email addresses.)

  4. Re: Strange runtime behaviour, job time fluctuates

    ym@dodgeit.com writes:
    >My problem is simple, I have no explanation for the variable running
    >time, which fluctuates pretty drastically. The last two runs gave me
    >7m24s and 9m19s. Does anyone have any idea why this would happen? The
    >only significant difference between the two runs was in involuntary
    >context switches (457 vs 564), which suggests that for some reason one
    >run is getting much less work done per time slice than the other...
    >and I have no clue why.


    They are getting pretty much the same length of time slice (just
    under a second: 444 s / 457 and 559 s / 564). It's not surprising if a
    longer-running job gets preempted more often.
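
The time-slice estimate can be checked with a few lines of arithmetic, using only the figures reported in the original post. Elapsed time is used here as an approximation of CPU time, which seems reasonable given the reported 99% CPU share; the names below are purely illustrative.

```python
def to_seconds(minutes, seconds):
    """Convert a 'XmYs' reading from the time command into seconds."""
    return 60 * minutes + seconds


# Figures quoted in the original post.
runs = {
    "fast": {"elapsed": to_seconds(7, 24), "involuntary_switches": 457},
    "slow": {"elapsed": to_seconds(9, 19), "involuntary_switches": 564},
}


def seconds_per_switch(run):
    """Average time the job ran between two involuntary context switches."""
    return run["elapsed"] / run["involuntary_switches"]


for name, run in runs.items():
    print(f"{name}: {seconds_per_switch(run):.3f} s per involuntary switch")
```

Both runs come out just under one second per preemption, so the higher switch count in the slow run follows from its longer duration and does not explain it.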

    Concerning your problem, I have no idea. For a normal CPU I would be
    thinking about effects from the MMU and caches (mapping several hot
    pages to the same cache set, causing increased conflict misses), but
    AFAIK a PPE has no MMU and no cache, so that can't be it.

    - anton
    --
    M. Anton Ertl Some things have to be seen to be believed
    anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
    http://www.complang.tuwien.ac.at/anton/home.html

  5. Re: Strange runtime behaviour, job time fluctuates

    ym@dodgeit.com wrote:
    > I have a completely deterministic executable on a idle machine, and
    > yet I'm getting some wildly fluctuating running times.


    [snip]

    > Have I missed something? I've considered cache, the SMP nature of the
    > PPE, and scheduling, but I don't see how those might contribute to
    > this problem. If anyone has any ideas I'd love to hear them.


    1 - Maybe cron jobs?

    2 - Is the computer connected to the Internet? Processing
    network packets might slow it down.

  6. Re: Strange runtime behaviour, job time fluctuates

    On Aug 9, 8:58 pm, y...@dodgeit.com wrote:
    > I have a completely deterministic executable on a idle machine, and
    > yet I'm getting some wildly fluctuating running times.
    >
    > The executable in question is a SPEC benchmark, and so should be
    > completely deterministic. The machine is idle, no one else is logged


    You could boot into single-user mode.

    > on, and the benchmark gets 99% of the CPU for the duration. The
    > machine is a CELL Blade running Fedora 7 and the benchmark is single-


    Which kernel is that?

    Is your kernel configured to allow CPU frequency scaling? Many are,
    to save energy. It could be that the CPU slows down while sitting
    idle and then has to speed up again. I have heard people on the
    Debian list complain that their Macs sometimes seemed to stick at
    the lower frequency.



    > threaded and running completely on the PPE, although I've seen this on
    > POWER5 as well. I'm using the time command to get these measurements.
    >
    > My problem is simple, I have no explanation for the variable running
    > time, which fluctuates pretty drastically. The last two runs gave me
    > 7m24s and 9m19s. Does anyone have any idea why this would happen? The


    Code often runs faster the second time than the first, because the
    executable and data files that had to be loaded from disk on the
    first run are now cached in memory.

    BTW, all the Cell systems I have seen have at most 512MB of RAM, and
    some have only 256MB. (I have not used them; I have been shopping
    around to maybe buy one.) This is considerably less than usual,
    so I wonder if they have more sophisticated memory management
    control.

    If I had time (and a Cell system to test on), I might try another
    distro or kernel or two.


    > only significant difference between the two runs was in involuntary
    > context switches (457 vs 564), which suggests that for some reason one
    > run is getting much less work done per time slice than the other...
    > and I have no clue why. Things like # of page faults and voluntary
    > context switches are the same. Now, I don't expect the exact same
    > running time every invocation, since the rest of the machine isn't
    > free from outside influences, but like I said, I've made sure that
    > disturbances to the machine have been minimized and so I don't expect
    > over 2 minutes difference in running time. Typically I most of the
    > times are distributed closely around 7m and 9m, for whatever reason.


    Actually, that difference is only about 26% (559 s vs 444 s). It is
    worrisome, though.
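
The relative difference between the two reported times depends on which run is taken as the baseline. A short check, using only the two times from the original post (function names are illustrative):

```python
FAST = 7 * 60 + 24  # 7m24s in seconds
SLOW = 9 * 60 + 19  # 9m19s in seconds


def slowdown(fast, slow):
    """Extra time of the slow run, relative to the fast run."""
    return (slow - fast) / fast


def saving(fast, slow):
    """Time saved by the fast run, relative to the slow run."""
    return (slow - fast) / slow


print(f"slow run takes {slowdown(FAST, SLOW):.1%} longer than the fast run")
print(f"fast run is {saving(FAST, SLOW):.1%} shorter than the slow run")
```

Either way, the gap is in the 20% to 26% range, far more than ordinary background noise on an idle machine would explain.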
    >
    > Have I missed something? I've considered cache, the SMP nature of the
    > PPE, and scheduling, but I don't see how those might contribute to
    > this problem. If anyone has any ideas I'd love to hear them.



