Re: current linux-2.6.git: cpusets completely broken - Kernel
Re: current linux-2.6.git: cpusets completely broken
2008/7/12 Dmitry Adamushko :
> 2008/7/12 Linus Torvalds :
>>
>>
>> On Sat, 12 Jul 2008, Vegard Nossum wrote:
>>>
>>> Can somebody else please test/ack/review it too? This should eventually
>>> go into 2.6.26 if it doesn't break anything else.
>>
>> And Dmitry, _please_ also explain what was going on. Why did things break
>> from calling common_cpu_mem_hotplug_unplug() too much? That function is
>> called pretty randomly anyway (for just about any random CPU event), so
>> why did it fail in some circumstances?
>
> Upon CPU_DOWN_PREPARE, update_sched_domains() ->
> detach_destroy_domains(&cpu_online_map) ;
> does the following:
>
> /*
> * Force a reinitialization of the sched domains hierarchy. The domains
> * and groups cannot be updated in place without racing with the balancing
> * code, so we temporarily attach all running cpus to the NULL domain
> * which will prevent rebalancing while the sched domains are recalculated.
> */
>
> The sched-domains should be rebuilt when a CPU_DOWN ops. is completed,
> effectively either upon CPU_DEAD{_FROZEN} (upon success) or
> CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their
> initial state). That's what update_sched_domains() also does but only
> for !CPUSETS case.
>
> With Max's patch, sched-domains' reinitialization is delegated to CPUSETS code:
>
> cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
> rebuild_sched_domains()
>
> which as you've said "called pretty randomly anyway", e.g. for CPU_UP_PREPARE.
>
> [ ah, then rebuild_sched_domains() should not be there. It should be
> nop for MEMPLUG events I presume - should make another patch. ]
I had in mind something like this:
[ yes, the patch probably makes things somewhat uglier. I tried to keep the changes minimal for now, just enough to emulate the 'old' behavior of update_sched_domains().
I guess common_cpu_mem_hotplug_unplug() needs to be split into cpu- and mem-hotplug parts to make it cleaner ]
(not tested yet)
---
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 9fceb97..965d9eb 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1882,7 +1882,7 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
  * in order to minimize text size.
  */
 
-static void common_cpu_mem_hotplug_unplug(void)
+static void common_cpu_mem_hotplug_unplug(int rebuild_sd)
 {
 	cgroup_lock();
 
@@ -1894,7 +1894,8 @@ static void common_cpu_mem_hotplug_unplug(void)
 	 * Scheduler destroys domains on hotplug events.
 	 * Rebuild them based on the current settings.
 	 */
-	rebuild_sched_domains();
+	if (rebuild_sd)
+		rebuild_sched_domains();
 
 	cgroup_unlock();
 }
@@ -1912,11 +1913,22 @@ static void common_cpu_mem_hotplug_unplug(void)
 static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 				unsigned long phase, void *unused_cpu)
 {
-	if (phase == CPU_DYING || phase == CPU_DYING_FROZEN)
+	swicth (phase) {
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		common_cpu_mem_hotplug_unplug(1);
+		break;
+	default:
 		return NOTIFY_DONE;
+	}
 
-	common_cpu_mem_hotplug_unplug();
-	return 0;
+	return NOTIFY_OK;
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -1929,7 +1941,7 @@ static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 
 void cpuset_track_online_nodes(void)
 {
-	common_cpu_mem_hotplug_unplug();
+	common_cpu_mem_hotplug_unplug(0);
 }
 #endif
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: current linux-2.6.git: cpusets completely broken
On Sat, 2008-07-12 at 12:45 +0200, Dmitry Adamushko wrote:
> 2008/7/12 Dmitry Adamushko :
> > 2008/7/12 Linus Torvalds :
> >>
> >>
> >> On Sat, 12 Jul 2008, Vegard Nossum wrote:
> >>>
> >>> Can somebody else please test/ack/review it too? This should eventually
> >>> go into 2.6.26 if it doesn't break anything else.
> >>
> >> And Dmitry, _please_ also explain what was going on. Why did things break
> >> from calling common_cpu_mem_hotplug_unplug() too much? That function is
> >> called pretty randomly anyway (for just about any random CPU event), so
> >> why did it fail in some circumstances?
> >
> > Upon CPU_DOWN_PREPARE, update_sched_domains() ->
> > detach_destroy_domains(&cpu_online_map) ;
> > does the following:
> >
> > /*
> > * Force a reinitialization of the sched domains hierarchy. The domains
> > * and groups cannot be updated in place without racing with the balancing
> > * code, so we temporarily attach all running cpus to the NULL domain
> > * which will prevent rebalancing while the sched domains are recalculated.
> > */
> >
> > The sched-domains should be rebuilt when a CPU_DOWN ops. is completed,
> > effectively either upon CPU_DEAD{_FROZEN} (upon success) or
> > CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their
> > initial state). That's what update_sched_domains() also does but only
> > for !CPUSETS case.
> >
> > With Max's patch, sched-domains' reinitialization is delegated to CPUSETS code:
> >
> > cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
> > rebuild_sched_domains()
> >
> > which as you've said "called pretty randomly anyway", e.g. for CPU_UP_PREPARE.
> >
> > [ ah, then rebuild_sched_domains() should not be there. It should be
> > nop for MEMPLUG events I presume - should make another patch. ]
>
> I had in mind something like this:
>
> [ yes, probably the patch makes things somewhat uglier. I tried to bring a minimal amount of changes so far, just to emulate the 'old' behavior of update_sched_domains().
> I guess, common_cpu_mem_hotplug_unplug() needs to be split up into cpu- and mem-hotplug parts to make it cleaner ]
>
> (not tested yet)
>
> ---
argh, this one compiles (will test shortly).
Signed-off-by: Dmitry Adamushko
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 9fceb97..798b3ab 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1882,7 +1882,7 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
  * in order to minimize text size.
  */
 
-static void common_cpu_mem_hotplug_unplug(void)
+static void common_cpu_mem_hotplug_unplug(int rebuild_sd)
 {
 	cgroup_lock();
 
@@ -1894,7 +1894,8 @@ static void common_cpu_mem_hotplug_unplug(void)
 	 * Scheduler destroys domains on hotplug events.
 	 * Rebuild them based on the current settings.
 	 */
-	rebuild_sched_domains();
+	if (rebuild_sd)
+		rebuild_sched_domains();
 
 	cgroup_unlock();
 }
@@ -1912,11 +1913,22 @@ static void common_cpu_mem_hotplug_unplug(void)
 static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 				unsigned long phase, void *unused_cpu)
 {
-	if (phase == CPU_DYING || phase == CPU_DYING_FROZEN)
+	switch (phase) {
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		common_cpu_mem_hotplug_unplug(1);
+		break;
+	default:
 		return NOTIFY_DONE;
+	}
 
-	common_cpu_mem_hotplug_unplug();
-	return 0;
+	return NOTIFY_OK;
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -1929,7 +1941,7 @@ static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 
 void cpuset_track_online_nodes(void)
 {
-	common_cpu_mem_hotplug_unplug();
+	common_cpu_mem_hotplug_unplug(0);
 }
 #endif
 
-
Re: current linux-2.6.git: cpusets completely broken
Linus,
(just so that we have it all together in one place, ready for testing and
further consideration)
Below is the patch and the explanation.
Basically, the fix below just emulates the 'old' behavior of
update_sched_domains(): we call rebuild_sched_domains() for the same
hotplug events for which it used to be called (and still is, for the
!CPUSETS case) in update_sched_domains().
The aim is to keep the sched-domains consistent wrt. cpu-down/up.
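To make the event selection concrete, here is a minimal userspace sketch in C. It is not kernel code: the hp_phase enum, phase_needs_rebuild() and count_rebuilds() are hypothetical names made up for illustration, and the _FROZEN variants are folded into their base phases. It only shows the idea that a rebuild is requested at the phases that end a hotplug transition and never at the PREPARE or DYING phases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the hotplug notifier phases. The values are
 * arbitrary; they are NOT the kernel's CPU_* constants. */
enum hp_phase {
	HP_UP_PREPARE,
	HP_UP_CANCELED,
	HP_ONLINE,
	HP_DOWN_PREPARE,
	HP_DOWN_FAILED,
	HP_DYING,
	HP_DEAD,
};

/* True only for phases at which a hotplug transition has finished,
 * successfully or not. */
static bool phase_needs_rebuild(enum hp_phase phase)
{
	switch (phase) {
	case HP_UP_CANCELED:
	case HP_DOWN_FAILED:
	case HP_ONLINE:
	case HP_DEAD:
		return true;
	default:
		return false;
	}
}

/* Count how many rebuilds a sequence of phases would trigger. */
static int count_rebuilds(const enum hp_phase *seq, size_t n)
{
	int rebuilds = 0;

	assert(seq != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (phase_needs_rebuild(seq[i]))
			rebuilds++;
	return rebuilds;
}
```

With this filter, a successful cpu-down sequence (DOWN_PREPARE, DYING, DEAD) asks for a rebuild exactly once, at DEAD.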
This should be a minimal change. Effectively, the change is against
f18f982abf183e91f435990d337164c7a43d1e6d, so the logic of this patch should be
easy to see by comparing it with what that commit does.
Ingo, could you also please comment on this issue? TIA.
Subject: fix cpuset_handle_cpuhp()
The following commit
---
commit f18f982abf183e91f435990d337164c7a43d1e6d
Author: Max Krasnyansky
Date: Thu May 29 11:17:01 2008 -0700
sched: CPU hotplug events must not destroy scheduler domains created by
the cpusets
---
[ Note, with this commit arch_update_cpu_topology is no longer called for CPUSETS. But it's just a nop.
The whole scheme should probably be reworked later. ]
introduced a hotplug-related problem as described below:
[ Basically the fix below just emulates the 'old' behavior of update_sched_domains().
We call rebuild_sched_domains() for the same hotplug-events as it was called (and is still called
for !CPUSETS case) in update_sched_domains(). ]
Upon CPU_DOWN_PREPARE, update_sched_domains() -> detach_destroy_domains(&cpu_online_map)
does the following:
/*
 * Force a reinitialization of the sched domains hierarchy. The domains
 * and groups cannot be updated in place without racing with the balancing
 * code, so we temporarily attach all running cpus to the NULL domain
 * which will prevent rebalancing while the sched domains are recalculated.
 */
The sched-domains should be rebuilt once a CPU_DOWN operation has
completed, i.e. either upon CPU_DEAD{_FROZEN} (on success) or
CPU_DOWN_FAILED{_FROZEN} (on failure, to restore things to their
initial state). That's what update_sched_domains() also does, but only
for the !CPUSETS case.
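The intended ordering can be modelled in a few lines of userspace C. This is only a toy under stated assumptions: struct toy_sched, toy_detach_all(), toy_rebuild() and toy_cpu_down() are hypothetical names, CPUs are bits in a 32-bit mask, and "rebuilding" simply means copying the online mask. None of it is the kernel's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy state, not a kernel structure. */
struct toy_sched {
	uint32_t online;   /* bit n set: CPU n is online */
	uint32_t domains;  /* bit n set: CPU n is attached to a real domain */
};

/* Models attaching every CPU to the NULL domain. */
static void toy_detach_all(struct toy_sched *s)
{
	s->domains = 0;
}

/* Models rebuilding the domains from the current online mask. */
static void toy_rebuild(struct toy_sched *s)
{
	s->domains = s->online;
}

/* Take 'cpu' down; if 'fail' is nonzero the operation is aborted. */
static void toy_cpu_down(struct toy_sched *s, unsigned cpu, int fail)
{
	assert(cpu < 32);

	toy_detach_all(s);                          /* DOWN_PREPARE */
	if (!fail)
		s->online &= ~(UINT32_C(1) << cpu); /* CPU goes offline */
	toy_rebuild(s);                             /* DEAD or DOWN_FAILED */
}
```

In this model the domains are empty for the whole duration of the transition and are repopulated only once its outcome is known, so a CPU that went offline never reappears in them.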
With Max's patch, sched-domains' reinitialization is delegated to
CPUSETS code:
cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
rebuild_sched_domains()
Being called for CPU_UP_PREPARE (and if its callback is called after
update_sched_domains()), it just negates all the work done by
update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
the sched-domains, which makes it visible to the load-balancer
while the CPU_DOWN operation is in progress.
__migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
"offline" when this function is called).
try_to_wake_up() is called for one of these tasks from another CPU ->
the load-balancer (wake_idle()) picks up a "dead" CPU and places the
task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
-> oops.
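The failure sequence above can be imitated with a toy userspace model in C. Everything here is an assumption made for illustration: struct toy_state and the toy_*() helpers are hypothetical, and toy_wake_target() is a drastic simplification (prefer the task's previous CPU if the domains still expose it, otherwise stay on the waker's CPU), not the real wake_idle() logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy state, not a kernel structure. */
struct toy_state {
	uint32_t online;   /* bit n set: CPU n is online */
	uint32_t domains;  /* bit n set: CPU n is visible to balancing */
};

/* Simplified wake-up placement: use prev_cpu if the domains expose it. */
static int toy_wake_target(const struct toy_state *s,
			   unsigned prev_cpu, unsigned waker_cpu)
{
	assert(prev_cpu < 32 && waker_cpu < 32);
	if (s->domains & (UINT32_C(1) << prev_cpu))
		return (int)prev_cpu;
	return (int)waker_cpu;
}

/* Take 'cpu' down, then wake a task that last ran on it. If early_rebuild
 * is nonzero, the domains are rebuilt while 'cpu' is still online, which
 * models the spurious rebuild described above. */
static int toy_down_then_wake(struct toy_state *s, unsigned cpu,
			      unsigned waker_cpu, int early_rebuild)
{
	assert(cpu < 32);

	s->domains = 0;                      /* detach at DOWN_PREPARE */
	if (early_rebuild)
		s->domains = s->online;      /* undoes the detach */
	s->online &= ~(UINT32_C(1) << cpu);  /* CPU is now offline */
	return toy_wake_target(s, cpu, waker_cpu);
}

static int toy_cpu_is_online(const struct toy_state *s, int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	return (int)((s->online >> cpu) & 1u);
}
```

With the early rebuild, the woken task is placed on the CPU that has just gone offline; without it, the task stays on the waker's CPU, which is still online.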
Signed-off-by: Dmitry Adamushko
CC: Ingo Molnar
CC: Vegard Nossum
CC: Paul Menage
CC: Max Krasnyansky
CC: Paul Jackson
CC: Peter Zijlstra
CC: miaox@cn.fujitsu.com
CC: rostedt@goodmis.org
CC: Thomas Gleixner
---
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 9fceb97..798b3ab 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1882,7 +1882,7 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
  * in order to minimize text size.
  */
 
-static void common_cpu_mem_hotplug_unplug(void)
+static void common_cpu_mem_hotplug_unplug(int rebuild_sd)
 {
 	cgroup_lock();
 
@@ -1894,7 +1894,8 @@ static void common_cpu_mem_hotplug_unplug(void)
 	 * Scheduler destroys domains on hotplug events.
 	 * Rebuild them based on the current settings.
 	 */
-	rebuild_sched_domains();
+	if (rebuild_sd)
+		rebuild_sched_domains();
 
 	cgroup_unlock();
 }
@@ -1912,11 +1913,22 @@ static void common_cpu_mem_hotplug_unplug(void)
 static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 				unsigned long phase, void *unused_cpu)
 {
-	if (phase == CPU_DYING || phase == CPU_DYING_FROZEN)
+	switch (phase) {
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		common_cpu_mem_hotplug_unplug(1);
+		break;
+	default:
 		return NOTIFY_DONE;
+	}
 
-	common_cpu_mem_hotplug_unplug();
-	return 0;
+	return NOTIFY_OK;
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -1929,7 +1941,7 @@ static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 
 void cpuset_track_online_nodes(void)
 {
-	common_cpu_mem_hotplug_unplug();
+	common_cpu_mem_hotplug_unplug(0);
 }
 #endif
 
---
-
Re: current linux-2.6.git: cpusets completely broken
On Sun, Jul 13, 2008 at 2:10 AM, Dmitry Adamushko
wrote:
> Subject: fix cpuset_handle_cpuhp()
>
> The following commit
>
> ---
> commit f18f982abf183e91f435990d337164c7a43d1e6d
> Author: Max Krasnyansky
> Date: Thu May 29 11:17:01 2008 -0700
>
> sched: CPU hotplug events must not destroy scheduler domains created by
> the cpusets
> ---
>
> [ Note, with this commit arch_update_cpu_topology is not called any more for CPUSETS. But it's just a nop.
> The whole scheme should be probably reworked later. ]
>
>
> introduced a hotplug-related problem as described below:
>
> [ Basically the fix below just emulates the 'old' behavior of update_sched_domains().
> We call rebuild_sched_domains() for the same hotplug-events as it was called (and is still called
> for !CPUSETS case) in update_sched_domains(). ]
>
>
> Upon CPU_DOWN_PREPARE, update_sched_domains() -> detach_destroy_domains(&cpu_online_map)
> does the following:
>
> /*
> * Force a reinitialization of the sched domains hierarchy. The domains
> * and groups cannot be updated in place without racing with the balancing
> * code, so we temporarily attach all running cpus to the NULL domain
> * which will prevent rebalancing while the sched domains are recalculated.
> */
>
> The sched-domains should be rebuilt when a CPU_DOWN ops. has been
> completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or
> CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their
> initial state). That's what update_sched_domains() also does but only
> for !CPUSETS case.
>
> With Max's patch, sched-domains' reinitialization is delegated to
> CPUSETS code:
>
> cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
> rebuild_sched_domains()
>
> Being called for CPU_UP_PREPARE (and if its callback is called after
> update_sched_domains()), it just negates all the work done by
> update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
> the sched-domains and that makes it visible for the load-balancer
> while the CPU_DOWN ops. is in progress.
>
> __migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
> "offline" when this function is called).
>
> try_to_wake_up() is called for one of these tasks from another CPU ->
> the load-balancer (wake_idle()) picks up a "dead" CPU and places the
> task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
> -> oops.
>
>
> Signed-off-by: Dmitry Adamushko
Tested-by: Vegard Nossum
Works :-)
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
-
Re: current linux-2.6.git: cpusets completely broken
* Vegard Nossum wrote:
> On Sun, Jul 13, 2008 at 2:10 AM, Dmitry Adamushko
> wrote:
> > Subject: fix cpuset_handle_cpuhp()
> >
> > The following commit
> >
> > ---
> > commit f18f982abf183e91f435990d337164c7a43d1e6d
> > Author: Max Krasnyansky
> > Date: Thu May 29 11:17:01 2008 -0700
> >
> > sched: CPU hotplug events must not destroy scheduler domains created by
> > the cpusets
> > ---
> >
> > [ Note, with this commit arch_update_cpu_topology is not called any more for CPUSETS. But it's just a nop.
> > The whole scheme should be probably reworked later. ]
> >
> >
> > introduced a hotplug-related problem as described below:
> >
> > [ Basically the fix below just emulates the 'old' behavior of update_sched_domains().
> > We call rebuild_sched_domains() for the same hotplug-events as it was called (and is still called
> > for !CPUSETS case) in update_sched_domains(). ]
> >
> >
> > Upon CPU_DOWN_PREPARE, update_sched_domains() -> detach_destroy_domains(&cpu_online_map)
> > does the following:
> >
> > /*
> > * Force a reinitialization of the sched domains hierarchy. The domains
> > * and groups cannot be updated in place without racing with the balancing
> > * code, so we temporarily attach all running cpus to the NULL domain
> > * which will prevent rebalancing while the sched domains are recalculated.
> > */
> >
> > The sched-domains should be rebuilt when a CPU_DOWN ops. has been
> > completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or
> > CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their
> > initial state). That's what update_sched_domains() also does but only
> > for !CPUSETS case.
> >
> > With Max's patch, sched-domains' reinitialization is delegated to
> > CPUSETS code:
> >
> > cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
> > rebuild_sched_domains()
> >
> > Being called for CPU_UP_PREPARE (and if its callback is called after
> > update_sched_domains()), it just negates all the work done by
> > update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
> > the sched-domains and that makes it visible for the load-balancer
> > while the CPU_DOWN ops. is in progress.
> >
> > __migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
> > "offline" when this function is called).
> >
> > try_to_wake_up() is called for one of these tasks from another CPU ->
> > the load-balancer (wake_idle()) picks up a "dead" CPU and places the
> > task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
> > -> oops.
> >
> >
> > Signed-off-by: Dmitry Adamushko
>
> Tested-by: Vegard Nossum
>
> Works :-)
thanks! I've tidied up the changelog and queued it up into
tip/sched/urgent. I'd prefer this more conservative patch so late in the
cycle, but i'll also queue up the more intrusive real fix from Linus and
Dmitry in sched/devel.
Linus, if you've not applied it already, you can pull Dmitry's fix from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git sched-fixes-for-linus
shortlog, diffstat and diff below.
Thanks,
Ingo
------------------>
Dmitry Adamushko (1):
cpusets, hotplug, scheduler: fix scheduler domain breakage
kernel/cpuset.c | 24 ++++++++++++++++++------
1 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 9fceb97..798b3ab 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1882,7 +1882,7 @@ static void scan_for_empty_cpusets(const struct cpuset *root)
  * in order to minimize text size.
  */
 
-static void common_cpu_mem_hotplug_unplug(void)
+static void common_cpu_mem_hotplug_unplug(int rebuild_sd)
 {
 	cgroup_lock();
 
@@ -1894,7 +1894,8 @@ static void common_cpu_mem_hotplug_unplug(void)
 	 * Scheduler destroys domains on hotplug events.
 	 * Rebuild them based on the current settings.
 	 */
-	rebuild_sched_domains();
+	if (rebuild_sd)
+		rebuild_sched_domains();
 
 	cgroup_unlock();
 }
@@ -1912,11 +1913,22 @@ static void common_cpu_mem_hotplug_unplug(void)
 static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 				unsigned long phase, void *unused_cpu)
 {
-	if (phase == CPU_DYING || phase == CPU_DYING_FROZEN)
+	switch (phase) {
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		common_cpu_mem_hotplug_unplug(1);
+		break;
+	default:
 		return NOTIFY_DONE;
+	}
 
-	common_cpu_mem_hotplug_unplug();
-	return 0;
+	return NOTIFY_OK;
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
@@ -1929,7 +1941,7 @@ static int cpuset_handle_cpuhp(struct notifier_block *unused_nb,
 
 void cpuset_track_online_nodes(void)
 {
-	common_cpu_mem_hotplug_unplug();
+	common_cpu_mem_hotplug_unplug(0);
 }
 #endif
 