[RFC v1] Tunable sched_mc_power_savings=n - Kernel


Thread: [RFC v1] Tunable sched_mc_power_savings=n

  1. Re: [RFC v1] Tunable sched_mc_power_savings=n

    Peter Zijlstra wrote:

    > There used to be an option for them to also up on niced load. If that


    You could always force socket power saving mode to off globally too if
    you don't want it at all.

    > disappeared then I'd call that a huge usability regression. Basically
    > making ondemand useless.
    >
    > /me checks,..
    >
    > Yeah, on F9, my opteron runs at 1GHz when idle, but when I start distcc,
    > which like said runs on nice 19, the cpu speed goes up to 2.4GHz.


    Ok distcc is a special case, but it doesn't apply to a lot of other
    processes (do you really want your CPU to crank up for "updatedb" or
    beagle or some backup job for example?)

    Perhaps there should be a way to express this in priorities?
    "I am low priority, but want to be work conserving if the system
    is idle"

    The group scheduler is changing the semantics of nice completely
    anyways, so more changes could be applied.

    -Andi
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [RFC v1] Tunable sched_mc_power_savings=n

    Hi

    > Advantages:
    >
    > * Enterprise workloads on large hardware configurations may need
    > aggressive consolidation strategy
    > * Performance impact on server is different from desktop or laptops.
    > Interactivity is less of a concern on large enterprise servers while
    > workload response times and performance per watt is more significant
    > * Aggressive power savings even with marginal performance penalty is
    > a useful tunable for servers since it may provide good
    > performance-per-watt at low utilisation
    > * This tunable can influence other parts of scheduler like wakeup
    > biasing for overall task consolidation


    I'd like to know how much power this saves.
    If the saving is only small, I don't think this is an interesting feature.

    What percentage saving do you expect?




  3. Re: [RFC v1] Tunable sched_mc_power_savings=n

    * KOSAKI Motohiro [2008-06-27 17:08:22]:

    > Hi
    >
    > > Advantages:
    > >
    > > * Enterprise workloads on large hardware configurations may need
    > > aggressive consolidation strategy
    > > * Performance impact on server is different from desktop or laptops.
    > > Interactivity is less of a concern on large enterprise servers while
    > > workload response times and performance per watt is more significant
    > > * Aggressive power savings even with marginal performance penalty is
    > > a useful tunable for servers since it may provide good
    > > performance-per-watt at low utilisation
    > > * This tunable can influence other parts of scheduler like wakeup
    > > biasing for overall task consolidation

    >
    > I'd like to know how much power this saves.
    > If the saving is only small, I don't think this is an interesting feature.
    >
    > What percentage saving do you expect?


    The power savings depend on the number of sockets. With the present
    hardware on servers, we are seeing very small power savings. However
    deep sleep states and wide variation in CPU power consumption in
    future will increase the percentage. The percentage may be around
    1 to 5 percent. Given the system utilisation pattern and large number
    of systems idle in a datacenter, this is not an insignificant number.
    The power value can be significant in a 4 socket or larger system
    configuration.

    --Vaidy

  4. Re: [RFC v1] Tunable sched_mc_power_savings=n

    KOSAKI Motohiro wrote:
    > Hi
    >
    >
    >>Advantages:
    >>
    >>* Enterprise workloads on large hardware configurations may need
    >> aggressive consolidation strategy
    >>* Performance impact on server is different from desktop or laptops.
    >> Interactivity is less of a concern on large enterprise servers while
    >> workload response times and performance per watt is more significant
    >>* Aggressive power savings even with marginal performance penalty is
    >> a useful tunable for servers since it may provide good
    >> performance-per-watt at low utilisation
    >>* This tunable can influence other parts of scheduler like wakeup
    >> biasing for overall task consolidation

    >
    >
    > I'd like to know how much power this saves.
    > If the saving is only small, I don't think this is an interesting feature.
    >
    > What percentage saving do you expect?
    >


    An experiment using DVFS on Xeon yielded a 15-watt allowable reduction
    even while running a considerable TPC-W workload. Lesser loads allowed
    a 40-watt (out of 160) reduction.
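
To compare these wattages with the "1 to 5 percent" estimate given earlier in the thread, the figures can be turned into percentages. This is only a rough sketch: it assumes the 160 W baseline applies to both measurements, which the message states only for the 40-watt case, and `percent_reduction` is a name made up for this illustration.

```python
def percent_reduction(saved_watts: float, baseline_watts: float) -> float:
    """Return a power saving as a percentage of the baseline draw."""
    if baseline_watts <= 0:
        raise ValueError("baseline_watts must be positive")
    return 100.0 * saved_watts / baseline_watts


# Assumption: the same 160 W baseline is used for both figures.
BASELINE_WATTS = 160.0

heavy_load = percent_reduction(15.0, BASELINE_WATTS)  # under the TPC-W workload
light_load = percent_reduction(40.0, BASELINE_WATTS)  # under lesser loads

print(f"TPC-W workload: {heavy_load:.1f}% of baseline")
print(f"lesser loads:   {light_load:.1f}% of baseline")
```

Under that assumption the DVFS experiment lands at roughly 9% and 25%, well above the socket-consolidation estimate, but the two numbers measure different mechanisms and are not directly comparable.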

    --dave
    --
    David Collier-Brown | Always do right. This will gratify
    Sun Microsystems, Toronto | some people and astonish the rest
    davecb@sun.com | -- Mark Twain
    (905) 943-1983, cell: (647) 833-9377, (800) 555-9786 x56583
    bridge: (877) 385-4099 code: 506 9191#

  5. Re: [RFC v1] Tunable sched_mc_power_savings=n

    Andi Kleen said on Fri, 27 Jun 2008 00:38:53 +0200:
    > Peter Zijlstra wrote:
    >
    > >> And your workload manager could just nice processes. It should probably
    > >> do that anyways to tell ondemand you don't need full frequency.

    > >
    > > Except that I want my nice 19 distcc processes to utilize as much cpu as
    > > possible, but just not bother any other stuff I might be doing...

    >
    > They already won't do that if you run ondemand and cpufreq. It won't
    > crank up the frequency for niced processes.


    Shouldn't there be a powernice, just as there is an ionice and a nice?
    Just as you don't always want CPU priority and IO priority to be
    coupled, Peter has just demonstrated a very good case where you don't
    want power and CPU choices to be coupled. Both whether the ondemand
    governor of CPUFreq counts a process as wanting the CPU to run at a
    higher speed, and these scheduler decisions, should be controlled by
    powernice. By default, perhaps a high powernice should correspond to
    a high nice and a high ionice, but the user should be able to change
    this. The last thing you want is a distcc process taking up lots of
    time, burning more Joules because it runs 10 times longer with only
    half the power. It's not a nice choice between that and running at
    nice 0 where it interferes with the user's editing.
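
The energy argument in this paragraph is just energy = power x time. A minimal sketch of the arithmetic, taking the "10 times longer with only half the power" figures at face value (they read as a rhetorical example, not a measurement); `relative_energy` is a name invented here.

```python
def relative_energy(power_ratio: float, time_ratio: float) -> float:
    """Energy relative to the full-speed run, given relative power and runtime.

    Energy = power * time, so the ratios simply multiply.
    """
    return power_ratio * time_ratio


# Figures from the message: half the power, ten times the runtime.
slow_run = relative_energy(power_ratio=0.5, time_ratio=10.0)
print(f"The slow run uses {slow_run:.1f}x the energy of the fast run")
```

With those figures the throttled job burns five times the energy, which is the case for letting a CPU-bound niced job run at full frequency.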

  6. Re: [RFC v1] Tunable sched_mc_power_savings=n

    Andi Kleen said on Fri, 27 Jun 2008 10:06:28 +0200:
    > Peter Zijlstra wrote:
    > > disappeared then I'd call that a huge usability regression. Basically
    > > making ondemand useless.
    > >
    > > /me checks,..
    > >
    > > Yeah, on F9, my opteron runs at 1GHz when idle, but when I start distcc,
    > > which like said runs on nice 19, the cpu speed goes up to 2.4GHz.

    >
    > Ok distcc is a special case,


    No it's not. Most compute heavy jobs most people run would be better
    off being done sooner rather than later, otherwise you might as well
    go out and buy a 100MHz computer. But most users also want "nice" to
    do what was intended of it -- make one app not steal *any* CPU cycles
    from another app that would really rather those CPU cycles right now
    (yes, I know that long running CPU jobs theoretically become lower
    priority so steal less, and in theory, there is no difference between
    theory and practice. But in practice, there is, and these long
    running jobs still impact on desktop and ssh interactivity)

    I end up nicing opera and firefox half the time because I'm sick of
    their CPU leaks. It doesn't mean I don't want them to finish their
    screen updating sooner.

  7. Re: [RFC v1] Tunable sched_mc_power_savings=n

    Tim Connors wrote:
    > Andi Kleen said on Fri, 27 Jun 2008 10:06:28 +0200:
    >> Peter Zijlstra wrote:
    >>> disappeared then I'd call that a huge usability regression. Basically
    >>> making ondemand useless.
    >>>
    >>> /me checks,..
    >>>
    >>> Yeah, on F9, my opteron runs at 1GHz when idle, but when I start distcc,
    >>> which like said runs on nice 19, the cpu speed goes up to 2.4GHz.

    >> Ok distcc is a special case,

    >
    > No it's not.


    You're arguing against the current default of ondemand then.

    -Andi

  8. Re: [RFC v1] Tunable sched_mc_power_savings=n

    On Fri, Jun 27, 2008 at 10:06:28AM +0200, Andi Kleen wrote:

    > Ok distcc is a special case, but it doesn't apply to a lot of other
    > processes (do you really want your CPU to crank up for "updatedb" or
    > beagle or some backup job for example?)


    If something's CPU-bound, then you almost certainly want to speed the
    CPU up. There's no power advantage to leaving it at a low frequency. I'd
    be surprised if things like beagle or updatedb are CPU-bound, though.

    --
    Matthew Garrett | mjg59@srcf.ucam.org

  9. Re: [RFC v1] Tunable sched_mc_power_savings=n

    Matthew Garrett wrote:
    > On Fri, Jun 27, 2008 at 10:06:28AM +0200, Andi Kleen wrote:
    >
    >> Ok distcc is a special case, but it doesn't apply to a lot of other
    >> processes (do you really want your CPU to crank up for "updatedb" or
    >> beagle or some backup job for example?)

    >
    > If something's CPU-bound, then you almost certainly want to speed the
    > CPU up. There's no power advantage to leaving it at a low frequency.


    I'm not sure you can say it that certainly. While on many standalone systems
    "race to idle" is the best strategy, there are cases where it is not
    true.

    For example if you're in a data center at a specific operating point and
    you would need to crank up the air conditioning at significant power cost it might
    well be better overall to force all servers to a lower operating point
    and avoid that.

    That said in general you all should have complained when ondemand behaviour
    was introduced.

    Also it's unclear that the general "race to idle" heuristic really
    applies to the case of the "keep sockets idle" power optimization
    that started this thread.

    Usually package C states bring much more than core C states
    and keeping another package completely idle saves likely
    more power than the power cost of running something a little
    bit slower on a package that is already busy on another core.

    I still think using nice levels for this is reasonable.

    -Andi

  10. Re: [RFC v1] Tunable sched_mc_power_savings=n

    On Sat, Jun 28, 2008 at 02:36:02PM +0200, Andi Kleen wrote:

    > For example if you're in a data center at a specific operating point and
    > you would need to crank up the air conditioning at significant power cost it might
    > well be better overall to force all servers to a lower operating point
    > and avoid that.


    Sure, there are cases where you have additional constraints. But within
    those constraints, you probably want to run as fast as possible.

    > That said in general you all should have complained when ondemand behaviour
    > was introduced.


    ignore_nice seems to be set to 0 by default?
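
For anyone wanting to check this on their own machine, the ondemand governor exposes the setting through sysfs. The sketch below is hedged: the directory and the file names (`ignore_nice_load` on newer kernels, `ignore_nice` on older ones) are assumptions that vary with kernel version and with whether tunables are global or per-policy, and `read_ignore_nice` is a helper written for this note. A value of 0 means niced load is counted, so the frequency does ramp up for nice 19 jobs.

```python
from pathlib import Path
from typing import Optional

# Assumption: global ondemand tunables directory; some kernels place it
# under /sys/devices/system/cpu/cpuN/cpufreq/ondemand instead.
ONDEMAND_DIR = Path("/sys/devices/system/cpu/cpufreq/ondemand")


def parse_flag(text: str) -> bool:
    """Interpret a sysfs 0/1 value as a boolean."""
    return int(text.strip()) != 0


def read_ignore_nice(directory: Path = ONDEMAND_DIR) -> Optional[bool]:
    """Return True if niced load is ignored, False if it is counted,
    or None if the tunable cannot be read (governor not in use, etc.)."""
    for name in ("ignore_nice_load", "ignore_nice"):
        try:
            return parse_flag((directory / name).read_text())
        except (OSError, ValueError):
            continue
    return None


print("niced load ignored:", read_ignore_nice())
```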

    > Also it's unclear that the general "race to idle" heuristic really
    > applies to the case of the "keep sockets idle" power optimization
    > that started this thread.
    >
    > Usually package C states bring much more than core C states
    > and keeping another package completely idle saves likely
    > more power than the power cost of running something a little
    > bit slower on a package that is already busy on another core.


    I'd agree with that.
    --
    Matthew Garrett | mjg59@srcf.ucam.org

  11. Re: [RFC v1] Tunable sched_mc_power_savings=n

    Andi Kleen said on Fri, 27 Jun 2008 00:38:53 +0200:
    >>Peter Zijlstra wrote:
    >>>>And your workload manager could just nice processes. It should probably
    >>>>do that anyways to tell ondemand you don't need full frequency.
    >>>
    >>>Except that I want my nice 19 distcc processes to utilize as much cpu as
    >>>possible, but just not bother any other stuff I might be doing...

    >>
    >>They already won't do that if you run ondemand and cpufreq. It won't
    >>crank up the frequency for niced processes.



    Tim Connors then wrote:
    > Shouldn't there be a powernice, just as there is an ionice and a nice?

    Hmmn, how about:

    User Commands nice(1)

    NAME
    nice - invoke a command with an altered priority

    SYNOPSIS
    /usr/bin/nice [-increment | -n increment] [-s|-i|-e|-p] command [argu-
    ment...]

    DESCRIPTION
    The nice utility invokes command, requesting that it be run
    with a different priority. If -i is specified, the priority
    of (disk) I/O is modified. If -e is specified, ethernet (or
    other networking) priority is changed. If -p is specified, power
    usage priority is changed and if -s is specified, or none
    of -1, -e or -p is specified, then system scheduling priority
    is modified...

    --dave
    --
    David Collier-Brown | Always do right. This will gratify
    Sun Microsystems, Toronto | some people and astonish the rest
    davecb@sun.com | -- Mark Twain
    (905) 943-1983, cell: (647) 833-9377, (800) 555-9786 x56583
    bridge: (877) 385-4099 code: 506 9191#

  12. Re: [RFC v1] Tunable sched_mc_power_savings=n

    * David Collier-Brown [2008-06-29 14:02:58]:

    > Andi Kleen said on Fri, 27 Jun 2008 00:38:53 +0200:
    >>> Peter Zijlstra wrote:
    >>>>> And your workload manager could just nice processes. It should probably
    >>>>> do that anyways to tell ondemand you don't need full frequency.
    >>>>
    >>>> Except that I want my nice 19 distcc processes to utilize as much cpu as
    >>>> possible, but just not bother any other stuff I might be doing...
    >>>
    >>> They already won't do that if you run ondemand and cpufreq. It won't
    >>> crank up the frequency for niced processes.

    >
    >
    > Tim Connors then wrote:
    >> Shouldn't there be a powernice, just as there is an ionice and a nice?

    > Hmmn, how about:
    >
    > User Commands nice(1)
    >
    > NAME
    > nice - invoke a command with an altered priority
    >
    > SYNOPSIS
    > /usr/bin/nice [-increment | -n increment] [-s|-i|-e|-p] command [argu-
    > ment...]
    >
    > DESCRIPTION
    > The nice utility invokes command, requesting that it be run
    > with a different priority. If -i is specified, the priority
    > of (disk) I/O is modified. If -e is specified, ethernet (or
    > other networking) priority is changed. If -p is specified, power
    > usage priority is changed and if -s is specified, or none of -1,
    > -e or -p is specified, then system scheduling priority
    > is modified...


    This is good. We are exploring powernice. 'Generally' cpu, io and
    power nice values should be similar: high or low. Can we come up with
    use cases where we want to have conflicting nice values for cpu, io
    and power?

                 CPU    IO     POWER
    distcc:      low    low    low
    firefox:     low    high   high
    ssh/shell:   high   high   high
    X:           high   high   low


    I am trying to find answer to the question: Should we have the power
    saving tunable as 'nice' value per process or system wide?

    How should we interpret the POWER parameter in a datacenter with power
    constraint as mentioned in this thread? Or in a simple case of AC vs
    battery in a laptop.

    Thanks,
    Vaidy


  13. Re: [RFC v1] Tunable sched_mc_power_savings=n

    On Mon, 30 Jun 2008, Vaidyanathan Srinivasan wrote:

    > * David Collier-Brown [2008-06-29 14:02:58]:
    >
    > > Andi Kleen said on Fri, 27 Jun 2008 00:38:53 +0200:
    > >>> Peter Zijlstra wrote:
    > >>>>> And your workload manager could just nice processes. It should probably
    > >>>>> do that anyways to tell ondemand you don't need full frequency.
    > >>>>
    > >>>> Except that I want my nice 19 distcc processes to utilize as much cpu as
    > >>>> possible, but just not bother any other stuff I might be doing...
    > >>>
    > >>> They already won't do that if you run ondemand and cpufreq. It won't
    > >>> crank up the frequency for niced processes.

    > >
    > >
    > > Tim Connors then wrote:
    > >> Shouldn't there be a powernice, just as there is an ionice and a nice?

    > > Hmmn, how about:
    > >
    > > User Commands nice(1)
    > >
    > > NAME
    > > nice - invoke a command with an altered priority
    > >
    > > SYNOPSIS
    > > /usr/bin/nice [-increment | -n increment] [-s|-i|-e|-p] command [argu-
    > > ment...]
    > >
    > > DESCRIPTION
    > > The nice utility invokes command, requesting that it be run
    > > with a different priority. If -i is specified, the priority
    > > of (disk) I/O is modified. If -e is specified, ethernet (or
    > > other networking) priority is changed. If -p is specified, power
    > > usage priority is changed and if -s is specified, or none of -1,

    -i ^^^
    > > -e or -p is specified, then system scheduling priority
    > > is modified...

    >
    > This is good. We are exploring powernice. 'Generally' cpu, io and
    > power nice values should be similar: high or low. Can we come up with
    > use cases where we want to have conflicting nice values for cpu, io
    > and power?
    >
    >              CPU    IO     POWER
    > distcc:      low    low    low
    > firefox:     low    high   high
    > ssh/shell:   high   high   high
    > X:           high   high   low


    What's "high" mean? High priority, or high niceness?

    Looks like you're referring to priority there. Although, if those are
    real examples, then it demonstrates why different people would set
    different priorities (I'd say firefox would be both high CPU and power
    nice).

    distcc wants to be high CPU "nice" (low CPU priority - let other desktop
    etc things get done first). But low niceness for power and probably io
    (get it over and done with sooner, and IO traffic is bursty, so won't
    interfere so much with other IO).

    > How should we interpret the POWER parameter in a datacenter with power
    > constraint as mentioned in this thread? Or in a simple case of AC vs
    > battery in a laptop.


    On laptop battery, background tasks like firefox redrawing crappy
    animations -- high power nice, high cpu nice (ie, if it was the only thing
    running, and it still wanted to chew 100% cpu, it'll only be chewing
    850MHz of 100% cpu on my Core2 Duo). My shell though, will be running at
    the default io=cpu=power nice of 0.
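
The three examples above can be written down as independent per-process values. This is purely illustrative: no `powernice` exists, the `NiceProfile` type and the `ramps_frequency` cutoff are invented for this sketch, and all three values are put on the nice(1) scale (0 to 19, higher meaning nicer) for simplicity even though ionice really uses classes and a different range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NiceProfile:
    """Hypothetical per-process niceness triple; higher means 'nicer'."""
    cpu: int = 0
    io: int = 0
    power: int = 0  # hypothetical powernice value


def ramps_frequency(profile: NiceProfile, cutoff: int = 10) -> bool:
    """Assumed rule: only tasks below the power-nice cutoff may raise CPU speed."""
    return profile.power < cutoff


# distcc: yield the CPU to everything else, but finish quickly when it runs.
DISTCC = NiceProfile(cpu=19, io=0, power=0)
# firefox animations on battery: high CPU nice and high power nice.
FIREFOX_ON_BATTERY = NiceProfile(cpu=19, io=0, power=19)
# interactive shell: the default io=cpu=power nice of 0.
SHELL = NiceProfile()

for name, profile in [("distcc", DISTCC),
                      ("firefox", FIREFOX_ON_BATTERY),
                      ("shell", SHELL)]:
    print(f"{name:8} {profile} ramps_frequency={ramps_frequency(profile)}")
```

The point of the sketch is only that CPU niceness and power niceness can disagree for the same task, as they do for distcc.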

    Datacentre running with little loading because it's approaching midnight
    localtime, so let's run the general background tasks at high power nice,
    medium cpu nice, medium IO nice. During peak times, the main transaction
    tasks running at low power nice, low cpu and low io nice, will be busy,
    and so the cpus all go up a notch or three. It's not just a matter of
    installing powersaved and saying "performance" vs "ondemand", at various
    times of the day, because it's better to adjust dynamically based on real
    load.

    --
    Tim Connors


  14. Re: [RFC v1] Tunable sched_mc_power_savings=n

    Vaidyanathan Srinivasan wrote:
    > I am trying to find answer to the question: Should we have the power
    > saving tunable as 'nice' value per process or system wide?
    >
    > How should we interpret the POWER parameter in a datacenter with power
    > constraint as mentioned in this thread? Or in a simple case of AC vs
    > battery in a laptop.


    I agree with Tim re setting them all independently, and suggest that
    they're all really per-process values: setting power saving
    system-wide is meaningful, but so are individual settings.
    There is therefore an argument for making them subsets of
    a higher-level nice program.

    Mind you, the order in which one *implements* the capability,
    and whether one does powernice first and adds it to nice later
    is your call! I have no idea of how hard what I suggested is (;-))

    --dave
    --
    David Collier-Brown | Always do right. This will gratify
    Sun Microsystems, Toronto | some people and astonish the rest
    davecb@sun.com | -- Mark Twain
    (905) 943-1983, cell: (647) 833-9377, (800) 555-9786 x56583
    bridge: (877) 385-4099 code: 506 9191#

  15. Re: [RFC v1] Tunable sched_mc_power_savings=n

    David Collier-Brown wrote:
    > Vaidyanathan Srinivasan wrote:
    >> I am trying to find answer to the question: Should we have the power
    >> saving tunable as 'nice' value per process or system wide?
    >>
    >> How should we interpret the POWER parameter in a datacenter with power
    >> constraint as mentioned in this thread? Or in a simple case of AC vs
    >> battery in a laptop.

    >
    > I agree with Tim re setting them all independently,


    I agree that powernice is likely a good idea (although the semantics
    are not 100% clear yet), but there's still the issue
    (shared with ionice) that 99.99+% of all setups won't set powernice
    explicitly so you still need a reasonable default when it is not
    set.

    Me thinks the correct strategy would be something like this:

    - When powernice is set prefer it
    - For the idle socket optimization: use nice because it's
    unclear that "race to idle" applies here.
    - For ondemand: when nice is set behave more like the conservative
    governor and take longer to crank up [this might be controversial]
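
The three rules above amount to "use powernice when present, otherwise fall back to nice". A minimal sketch of that fallback, assuming a hypothetical `powernice` value that is `None` when unset and treating any niceness above 0 as "niced"; the function names and the threshold are inventions for illustration, not anything from the patch.

```python
from typing import Optional


def effective_power_niceness(nice: int, powernice: Optional[int]) -> int:
    """Rule 1: prefer powernice when it is set, else fall back to nice."""
    return nice if powernice is None else powernice


def consolidate_onto_busy_socket(nice: int, powernice: Optional[int] = None) -> bool:
    """Rule 2: the idle-socket optimization packs niced tasks together."""
    return effective_power_niceness(nice, powernice) > 0


def ondemand_ramp(nice: int, powernice: Optional[int] = None) -> str:
    """Rule 3: niced tasks get a slower, conservative-style ramp."""
    niced = effective_power_niceness(nice, powernice) > 0
    return "gradual" if niced else "immediate"


# nice 19 distcc with no powernice: packed and ramped slowly by default.
print(consolidate_onto_busy_socket(19), ondemand_ramp(19))
# Same task with an explicit powernice of 0: treated as power-hungry.
print(consolidate_onto_busy_socket(19, powernice=0), ondemand_ramp(19, powernice=0))
```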

    Also are the best powernice semantics the same between idle
    sockets and ondemand? I'm not sure.


    > and suggest that
    > they're all really per-process values: setting power saving system-wide
    > is meaningful, but so are individual settings.
    > There is therefore an argument for making them subsets of
    > a higher-level nice program.
    >
    > Mind you, the order in which one *implements* the capability,
    > and whether one does powernice first and adds it to nice later
    > is your call! I have no idea of how hard what I suggested is (;-))


    In general for Linux deployment it tends to be easier
    to provide another package with its own command instead of
    patching a core package like coreutils.

    With a separate package you can just tell the user
    "type (yum|zypper|apt-get|...) install powernice",
    while an updated coreutils tends to be more trouble or even
    require a distribution update.

    -Andi



  16. Re: [RFC v1] Tunable sched_mc_power_savings=n

    On Fri, Jun 27, 2008 at 10:03:06AM +0200, Andi Kleen wrote:
    > Dipankar Sarma wrote:
    > > On Thu, Jun 26, 2008 at 11:37:08PM +0200, Andi Kleen wrote:
    > >> Dipankar Sarma wrote:
    > >>

    > > The current usage of this we are looking at requires system-wide
    > > settings. That means nicing every process running on the system.
    > > That seems a little messy.

    >
    > Is it less messy than letting applications negotiate
    > for the best policy by themselves as someone else suggested on the thread?


    I don't think letting applications negotiate among
    themselves is a good idea. The kernel should do that.

    > > Secondly, even if you nice the processes
    > > they are still going to be spread all over the CPU packages
    > > running at lower frequencies due to nice.

    >
    > My point was that this could be fixed and you could use nice
    > (or another per process parameter if you prefer)
    > as an input to load balancer decisions.


    Agreed. A variation of this that allows tasks to indicate
    their CPU power requirement, is something that we experimented
    with long ago. There are some difficult issues that need to be
    sorted out if this is to be effective -

    1. For some applications, like xmms, it is easy to predict. For
    commercial workloads - like a database, it is hard to get
    it right.

    2. Conflicting power requirements are hard to resolve. Grouping
    of tasks based on various combinations of power requirement
    is complex.

    3. Setting global policy is expensive - you have to loop through
    all the tasks in the system.

    > > We are talking about a different optimization here - something
    > > that will give more benefits in powersave mode when you have large
    > > systems.

    >
    > Yes it's a different optimization (although the over all theme -- power saving
    > -- is the same), but is there a real reason it cannot be driven from the
    > same per process heuristics instead of your ugly global sysctl?


    See the issues #1 and #2 above. Apart from that, what we discovered
    was that server admins really want a global setting at the moment.
    Any finer granularity than that would be a waste for them at the
    moment. No one really is looking at running php+mysql at one powernice
    and tomcat in another level *in the same server*.


    > My point was just that the heuristics
    > used by one power saving mechanism (ondemand) could be used
    > for the other too (socket grouping) -- and it would be certainly
    > a far saner interface than a global sysctl!.


    Per-task settings were the first thing we looked at when we
    started out. I think we should experiment with it and see
    if we can come up with a simple implementation that handles
    conflicting requirements well. If this can also handle global
    system power settings without having to loop through all the
    tasks in the system, I am OK with it.
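
One way to get a global setting without looping over every task is to store only per-task overrides and let everything else inherit the global value at lookup time. The sketch below shows that idea in user-space Python; it is not how the kernel does or would do it, `PowerPolicy` and its methods are hypothetical, and it does not attempt to resolve conflicting requirements between tasks.

```python
from typing import Dict


class PowerPolicy:
    """Hypothetical policy table: a global level plus sparse per-task overrides."""

    def __init__(self, global_level: int = 0) -> None:
        self.global_level = global_level
        self._overrides: Dict[int, int] = {}

    def set_global(self, level: int) -> None:
        # O(1): no walk over the task list, tasks inherit at lookup time.
        self.global_level = level

    def set_task(self, pid: int, level: int) -> None:
        self._overrides[pid] = level

    def clear_task(self, pid: int) -> None:
        self._overrides.pop(pid, None)

    def level_for(self, pid: int) -> int:
        # A per-task override wins; otherwise fall back to the global level.
        return self._overrides.get(pid, self.global_level)


policy = PowerPolicy(global_level=0)
policy.set_task(1234, 2)   # one task asks for aggressive power saving
policy.set_global(1)       # admin flips the system-wide setting once
print(policy.level_for(1234), policy.level_for(5678))
```

Whether an admin's global setting should instead override per-task values is a policy question the thread leaves open.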


    Thanks
    Dipankar
