[RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock - Kernel


Thread: [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock

  1. [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock


    This patch adds a hierarchy_mutex to the cgroup_subsys object that
    protects changes to the hierarchy observed by that subsystem. It is
    taken by the cgroup subsystem for the following operations:

    - linking a cgroup into that subsystem's cgroup tree
    - unlinking a cgroup from that subsystem's cgroup tree
    - moving the subsystem to/from a hierarchy (including across the
    bind() callback)

    Thus if the subsystem holds its own hierarchy_mutex, it can safely
    traverse its own hierarchy.
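
    As a rough illustration (not part of this patch), a walk of a
    subsystem's own tree under its hierarchy_mutex might look like the
    sketch below; the function name is made up, but the children/sibling
    links it iterates are exactly the ones this lock protects:

    /*
     * Sketch only: with ss->hierarchy_mutex held, the subsystem can
     * safely iterate the children of one of its cgroups, since
     * cgroup_create()/cgroup_rmdir() take this mutex around the
     * list_add()/list_del() of cgrp->sibling.
     */
    static void example_walk_children(struct cgroup_subsys *ss,
                                      struct cgroup *cgrp)
    {
            struct cgroup *child;

            mutex_lock(&ss->hierarchy_mutex);
            list_for_each_entry(child, &cgrp->children, sibling) {
                    /* per-child work for this subsystem goes here */
            }
            mutex_unlock(&ss->hierarchy_mutex);
    }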

    This patch also changes the cpuset hotplug code to take
    cpuset_subsys.hierarchy_mutex rather than cgroup_mutex, removing a
    cyclical locking dependency between cpu_hotplug.lock and cgroup_mutex.

    ---

    This is just a first cut - a full version of this patch would also
    make use of the hierarchy_mutex at other places in cpuset.c that use
    the cgroup hierarchy, but I wanted to see what people thought of the
    concept. It appears to keep lockdep quiet.
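
    For example (sketch only, not included in this patch), other cpuset
    code that walks a cpuset's children via the cgroup sibling links
    could take the subsystem's hierarchy_mutex for the duration of the
    walk, along these lines (helper name invented):

    /* Sketch: walk the child cpusets of 'cs' with this subsystem's
     * hierarchy pinned, without needing cgroup_mutex. */
    static void example_for_each_child_cpuset(struct cpuset *cs)
    {
            struct cgroup *cont;

            mutex_lock(&cpuset_subsys.hierarchy_mutex);
            list_for_each_entry(cont, &cs->css.cgroup->children, sibling) {
                    /* ... do something with cgroup_cs(cont) ... */
            }
            mutex_unlock(&cpuset_subsys.hierarchy_mutex);
    }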

    include/linux/cgroup.h | 10 ++++++++--
    kernel/cgroup.c | 35 ++++++++++++++++++++++++++++++++++-
    kernel/cpuset.c | 4 ++--
    3 files changed, 44 insertions(+), 5 deletions(-)

    Index: lockfix-2.6.26-rc5-mm3/include/linux/cgroup.h
    ===================================================================
    --- lockfix-2.6.26-rc5-mm3.orig/include/linux/cgroup.h
    +++ lockfix-2.6.26-rc5-mm3/include/linux/cgroup.h
    @@ -319,9 +319,15 @@ struct cgroup_subsys {
    #define MAX_CGROUP_TYPE_NAMELEN 32
    const char *name;

    - /* Protected by RCU */
    - struct cgroupfs_root *root;
    + /*
    + * Protects sibling/children links of cgroups in this
    + * hierarchy, plus protects which hierarchy (or none) the
    + * subsystem is a part of (i.e. root/sibling)
    + */
    + struct mutex hierarchy_mutex;

    + /* Protected by RCU, this->hierarchy_mutex and cgroup_lock() */
    + struct cgroupfs_root *root;
    struct list_head sibling;

    void *private;
    Index: lockfix-2.6.26-rc5-mm3/kernel/cgroup.c
    ===================================================================
    --- lockfix-2.6.26-rc5-mm3.orig/kernel/cgroup.c
    +++ lockfix-2.6.26-rc5-mm3/kernel/cgroup.c
    @@ -728,23 +728,26 @@ static int rebind_subsystems(struct cgro
    BUG_ON(cgrp->subsys[i]);
    BUG_ON(!dummytop->subsys[i]);
    BUG_ON(dummytop->subsys[i]->cgroup != dummytop);
    + mutex_lock(&ss->hierarchy_mutex);
    cgrp->subsys[i] = dummytop->subsys[i];
    cgrp->subsys[i]->cgroup = cgrp;
    list_add(&ss->sibling, &root->subsys_list);
    rcu_assign_pointer(ss->root, root);
    if (ss->bind)
    ss->bind(ss, cgrp);
    -
    + mutex_unlock(&ss->hierarchy_mutex);
    } else if (bit & removed_bits) {
    /* We're removing this subsystem */
    BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
    BUG_ON(cgrp->subsys[i]->cgroup != cgrp);
    + mutex_lock(&ss->hierarchy_mutex);
    if (ss->bind)
    ss->bind(ss, dummytop);
    dummytop->subsys[i]->cgroup = dummytop;
    cgrp->subsys[i] = NULL;
    rcu_assign_pointer(subsys[i]->root, &rootnode);
    list_del(&ss->sibling);
    + mutex_unlock(&ss->hierarchy_mutex);
    } else if (bit & final_bits) {
    /* Subsystem state should already exist */
    BUG_ON(!cgrp->subsys[i]);
    @@ -2291,6 +2294,29 @@ static void init_cgroup_css(struct cgrou
    cgrp->subsys[ss->subsys_id] = css;
    }

    +static void cgroup_lock_hierarchy(struct cgroupfs_root *root)
    +{
    + /* We need to take each hierarchy_mutex in a consistent order */
    + int i;
    +
    + for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
    + struct cgroup_subsys *ss = subsys[i];
    + if (ss->root == root)
    + mutex_lock_nested(&ss->hierarchy_mutex, i);
    + }
    +}
    +
    +static void cgroup_unlock_hierarchy(struct cgroupfs_root *root)
    +{
    + int i;
    +
    + for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
    + struct cgroup_subsys *ss = subsys[i];
    + if (ss->root == root)
    + mutex_unlock(&ss->hierarchy_mutex);
    + }
    +}
    +
    /*
    * cgroup_create - create a cgroup
    * @parent: cgroup that will be parent of the new cgroup
    @@ -2342,8 +2368,10 @@ static long cgroup_create(struct cgroup
    init_cgroup_css(css, ss, cgrp);
    }

    + cgroup_lock_hierarchy(root);
    list_add(&cgrp->sibling, &cgrp->parent->children);
    root->number_of_cgroups++;
    + cgroup_unlock_hierarchy(root);

    err = cgroup_create_dir(cgrp, dentry, mode);
    if (err < 0)
    @@ -2460,8 +2488,12 @@ static int cgroup_rmdir(struct inode *un
    if (!list_empty(&cgrp->release_list))
    list_del(&cgrp->release_list);
    spin_unlock(&release_list_lock);
    +
    + cgroup_lock_hierarchy(root);
    /* delete my sibling from parent->children */
    list_del(&cgrp->sibling);
    + cgroup_unlock_hierarchy(root);
    +
    spin_lock(&cgrp->dentry->d_lock);
    d = dget(cgrp->dentry);
    cgrp->dentry = NULL;
    @@ -2504,6 +2536,7 @@ static void __init cgroup_init_subsys(st
    * need to invoke fork callbacks here. */
    BUG_ON(!list_empty(&init_task.tasks));

    + mutex_init(&ss->hierarchy_mutex);
    ss->active = 1;
    }

    Index: lockfix-2.6.26-rc5-mm3/kernel/cpuset.c
    ===================================================================
    --- lockfix-2.6.26-rc5-mm3.orig/kernel/cpuset.c
    +++ lockfix-2.6.26-rc5-mm3/kernel/cpuset.c
    @@ -1929,7 +1929,7 @@ static void scan_for_empty_cpusets(const

    static void common_cpu_mem_hotplug_unplug(void)
    {
    - cgroup_lock();
    + mutex_lock(&cpuset_subsys.hierarchy_mutex);

    top_cpuset.cpus_allowed = cpu_online_map;
    top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
    @@ -1941,7 +1941,7 @@ static void common_cpu_mem_hotplug_unplu
    */
    rebuild_sched_domains();

    - cgroup_unlock();
    + mutex_unlock(&cpuset_subsys.hierarchy_mutex);
    }

    /*
    --

  2. Re: [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock

    On Fri, Jun 27, 2008 at 12:59 AM, Paul Menage wrote:
    >
    > This patch adds a hierarchy_mutex to the cgroup_subsys object that
    > protects changes to the hierarchy observed by that subsystem. It is
    > taken by the cgroup subsystem for the following operations:
    >
    > - linking a cgroup into that subsystem's cgroup tree
    > - unlinking a cgroup from that subsystem's cgroup tree
    > - moving the subsystem to/from a hierarchy (including across the
    > bind() callback)
    >
    > Thus if the subsystem holds its own hierarchy_mutex, it can safely
    > traverse its own hierarchy.
    >


    It struck me that there's a small race in this code now that we're not
    doing cgroup_lock() in the hotplug path.

    - we start to attach a task T to cpuset C, with a single CPU "X" in
    its "cpus" list
    - cpuset_can_attach() returns successfully since the cpuset has a cpu
    - CPU X gets hot-unplugged; any tasks in C are moved to their parent
    cpuset and C loses its cpu.
    - we update T->cgroups to point to C, which is broken since C has no cpus.

    So we'll need some additional locking work on top of this - but I
    think this patch is still a step in the right direction.

    Paul
    --

  3. Re: [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock

    Hi Paul,

    Sorry for the delay.

    Paul Menage wrote:
    > On Fri, Jun 27, 2008 at 12:59 AM, Paul Menage wrote:
    >> This patch adds a hierarchy_mutex to the cgroup_subsys object that
    >> protects changes to the hierarchy observed by that subsystem. It is
    >> taken by the cgroup subsystem for the following operations:
    >>
    >> - linking a cgroup into that subsystem's cgroup tree
    >> - unlinking a cgroup from that subsystem's cgroup tree
    >> - moving the subsystem to/from a hierarchy (including across the
    >> bind() callback)
    >>
    >> Thus if the subsystem holds its own hierarchy_mutex, it can safely
    >> traverse its own hierarchy.
    >>

    >
    > It struck me that there's a small race in this code now that we're not
    > doing cgroup_lock() in the hotplug path.
    >
    > - we start to attach a task T to cpuset C, with a single CPU "X" in
    > its "cpus" list
    > - cpuset_can_attach() returns successfully since the cpuset has a cpu
    > - CPU X gets hot-unplugged; any tasks in C are moved to their parent
    > cpuset and C loses its cpu.
    > - we update T->cgroups to point to C, which is broken since C has no cpus.
    >
    > So we'll need some additional locking work on top of this - but I
    > think this patch is still a step in the right direction.

    I was about to say "yeah, looks good" and then tried a couple of
    different hot-plug scenarios.
    We still have circular locking even with your patch:

    [ INFO: possible circular locking dependency detected ]
    2.6.26-rc8 #4
    -------------------------------------------------------
    bash/2779 is trying to acquire lock:
    (&cpu_hotplug.lock){--..}, at: [] get_online_cpus+0x24/0x40

    but task is already holding lock:
    (sched_domains_mutex){--..}, at: []
    partition_sched_domains+0x29/0x2b0

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #2 (sched_domains_mutex){--..}:
    [] __lock_acquire+0x9cf/0xe50
    [] lock_acquire+0x5b/0x80
    [] mutex_lock_nested+0x94/0x250
    [] partition_sched_domains+0x29/0x2b0
    [] rebuild_sched_domains+0x9d/0x3f0
    [] cpuset_handle_cpuhp+0x205/0x220
    [] notifier_call_chain+0x3f/0x80
    [] __raw_notifier_call_chain+0x9/0x10
    [] _cpu_down+0xa8/0x290
    [] cpu_down+0x3b/0x60
    [] store_online+0x48/0xa0
    [] sysdev_store+0x24/0x30
    [] sysfs_write_file+0xca/0x140
    [] vfs_write+0xcb/0x170
    [] sys_write+0x50/0x90
    [] system_call_after_swapgs+0x7b/0x80
    [] 0xffffffffffffffff

    -> #1 (&ss->hierarchy_mutex){--..}:
    [] __lock_acquire+0x9cf/0xe50
    [] lock_acquire+0x5b/0x80
    [] mutex_lock_nested+0x94/0x250
    [] cpuset_handle_cpuhp+0x39/0x220
    [] notifier_call_chain+0x3f/0x80
    [] __raw_notifier_call_chain+0x9/0x10
    [] _cpu_down+0xa8/0x290
    [] cpu_down+0x3b/0x60
    [] store_online+0x48/0xa0
    [] sysdev_store+0x24/0x30
    [] sysfs_write_file+0xca/0x140
    [] vfs_write+0xcb/0x170
    [] sys_write+0x50/0x90
    [] system_call_after_swapgs+0x7b/0x80
    [] 0xffffffffffffffff

    -> #0 (&cpu_hotplug.lock){--..}:
    [] __lock_acquire+0xa53/0xe50
    [] lock_acquire+0x5b/0x80
    [] mutex_lock_nested+0x94/0x250
    [] get_online_cpus+0x24/0x40
    [] sched_getaffinity+0x11/0x80
    [] __synchronize_sched+0x19/0x90
    [] detach_destroy_domains+0x46/0x50
    [] partition_sched_domains+0xf9/0x2b0
    [] rebuild_sched_domains+0x9d/0x3f0
    [] cpuset_common_file_write+0x2b8/0x5c0
    [] cgroup_file_write+0x7c/0x1a0
    [] vfs_write+0xcb/0x170
    [] sys_write+0x50/0x90
    [] system_call_after_swapgs+0x7b/0x80
    [] 0xffffffffffffffff

    other info that might help us debug this:

    2 locks held by bash/2779:
    #0: (cgroup_mutex){--..}, at: [] cgroup_lock+0x12/0x20
    #1: (sched_domains_mutex){--..}, at: []
    partition_sched_domains+0x29/0x2b0

    stack backtrace:
    Pid: 2779, comm: bash Not tainted 2.6.26-rc8 #4

    Call Trace:
    [] print_circular_bug_tail+0x8c/0x90
    [] ? print_circular_bug_entry+0x54/0x60
    [] __lock_acquire+0xa53/0xe50
    [] ? get_online_cpus+0x24/0x40
    [] lock_acquire+0x5b/0x80
    [] ? get_online_cpus+0x24/0x40
    [] mutex_lock_nested+0x94/0x250
    [] ? mark_held_locks+0x4d/0x90
    [] get_online_cpus+0x24/0x40
    [] sched_getaffinity+0x11/0x80
    [] __synchronize_sched+0x19/0x90
    [] detach_destroy_domains+0x46/0x50
    [] partition_sched_domains+0xf9/0x2b0
    [] ? trace_hardirqs_on+0xc1/0xe0
    [] rebuild_sched_domains+0x9d/0x3f0
    [] cpuset_common_file_write+0x2b8/0x5c0
    [] ? cpuset_test_cpumask+0x0/0x20
    [] ? cpuset_change_cpumask+0x0/0x20
    [] ? started_after+0x0/0x50
    [] cgroup_file_write+0x7c/0x1a0
    [] vfs_write+0xcb/0x170
    [] sys_write+0x50/0x90
    [] system_call_after_swapgs+0x7b/0x80

    CPU3 attaching NULL sched-domain.



    --

  4. Re: [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock

    On Tue, Jul 1, 2008 at 8:55 PM, Max Krasnyansky wrote:
    > I was about to say "yeah, looks good" and then tried a couple of
    > different hot-plug scenarios.
    > We still have circular locking even with your patch
    >


    What sequence of actions do you do? I've not been able to reproduce a
    lockdep failure.

    Paul

    > [ INFO: possible circular locking dependency detected ]
    > 2.6.26-rc8 #4
    > -------------------------------------------------------
    > bash/2779 is trying to acquire lock:
    > (&cpu_hotplug.lock){--..}, at: [] get_online_cpus+0x24/0x40
    >
    > but task is already holding lock:
    > (sched_domains_mutex){--..}, at: []
    > partition_sched_domains+0x29/0x2b0
    >
    > which lock already depends on the new lock.
    >
    > the existing dependency chain (in reverse order) is:
    >
    > -> #2 (sched_domains_mutex){--..}:
    > [] __lock_acquire+0x9cf/0xe50
    > [] lock_acquire+0x5b/0x80
    > [] mutex_lock_nested+0x94/0x250
    > [] partition_sched_domains+0x29/0x2b0
    > [] rebuild_sched_domains+0x9d/0x3f0
    > [] cpuset_handle_cpuhp+0x205/0x220
    > [] notifier_call_chain+0x3f/0x80
    > [] __raw_notifier_call_chain+0x9/0x10
    > [] _cpu_down+0xa8/0x290
    > [] cpu_down+0x3b/0x60
    > [] store_online+0x48/0xa0
    > [] sysdev_store+0x24/0x30
    > [] sysfs_write_file+0xca/0x140
    > [] vfs_write+0xcb/0x170
    > [] sys_write+0x50/0x90
    > [] system_call_after_swapgs+0x7b/0x80
    > [] 0xffffffffffffffff
    >
    > -> #1 (&ss->hierarchy_mutex){--..}:
    > [] __lock_acquire+0x9cf/0xe50
    > [] lock_acquire+0x5b/0x80
    > [] mutex_lock_nested+0x94/0x250
    > [] cpuset_handle_cpuhp+0x39/0x220
    > [] notifier_call_chain+0x3f/0x80
    > [] __raw_notifier_call_chain+0x9/0x10
    > [] _cpu_down+0xa8/0x290
    > [] cpu_down+0x3b/0x60
    > [] store_online+0x48/0xa0
    > [] sysdev_store+0x24/0x30
    > [] sysfs_write_file+0xca/0x140
    > [] vfs_write+0xcb/0x170
    > [] sys_write+0x50/0x90
    > [] system_call_after_swapgs+0x7b/0x80
    > [] 0xffffffffffffffff
    >
    > -> #0 (&cpu_hotplug.lock){--..}:
    > [] __lock_acquire+0xa53/0xe50
    > [] lock_acquire+0x5b/0x80
    > [] mutex_lock_nested+0x94/0x250
    > [] get_online_cpus+0x24/0x40
    > [] sched_getaffinity+0x11/0x80
    > [] __synchronize_sched+0x19/0x90
    > [] detach_destroy_domains+0x46/0x50
    > [] partition_sched_domains+0xf9/0x2b0
    > [] rebuild_sched_domains+0x9d/0x3f0
    > [] cpuset_common_file_write+0x2b8/0x5c0
    > [] cgroup_file_write+0x7c/0x1a0
    > [] vfs_write+0xcb/0x170
    > [] sys_write+0x50/0x90
    > [] system_call_after_swapgs+0x7b/0x80
    > [] 0xffffffffffffffff
    >
    > other info that might help us debug this:
    >
    > 2 locks held by bash/2779:
    > #0: (cgroup_mutex){--..}, at: [] cgroup_lock+0x12/0x20
    > #1: (sched_domains_mutex){--..}, at: []
    > partition_sched_domains+0x29/0x2b0
    >
    > stack backtrace:
    > Pid: 2779, comm: bash Not tainted 2.6.26-rc8 #4
    >
    > Call Trace:
    > [] print_circular_bug_tail+0x8c/0x90
    > [] ? print_circular_bug_entry+0x54/0x60
    > [] __lock_acquire+0xa53/0xe50
    > [] ? get_online_cpus+0x24/0x40
    > [] lock_acquire+0x5b/0x80
    > [] ? get_online_cpus+0x24/0x40
    > [] mutex_lock_nested+0x94/0x250
    > [] ? mark_held_locks+0x4d/0x90
    > [] get_online_cpus+0x24/0x40
    > [] sched_getaffinity+0x11/0x80
    > [] __synchronize_sched+0x19/0x90
    > [] detach_destroy_domains+0x46/0x50
    > [] partition_sched_domains+0xf9/0x2b0
    > [] ? trace_hardirqs_on+0xc1/0xe0
    > [] rebuild_sched_domains+0x9d/0x3f0
    > [] cpuset_common_file_write+0x2b8/0x5c0
    > [] ? cpuset_test_cpumask+0x0/0x20
    > [] ? cpuset_change_cpumask+0x0/0x20
    > [] ? started_after+0x0/0x50
    > [] cgroup_file_write+0x7c/0x1a0
    > [] vfs_write+0xcb/0x170
    > [] sys_write+0x50/0x90
    > [] system_call_after_swapgs+0x7b/0x80
    >
    > CPU3 attaching NULL sched-domain.
    >
    >
    >
    >

    --

  5. Re: [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock



    Paul Menage wrote:
    > On Tue, Jul 1, 2008 at 8:55 PM, Max Krasnyansky wrote:
    >> I was about to say "yeah, looks good" and then tried a couple of
    >> different hot-plug scenarios.
    >> We still have circular locking even with your patch
    >>

    >
    > What sequence of actions do you do? I've not been able to reproduce a
    > lockdep failure.


    mkdir /dev/cpuset
    mount -t cgroup -o cpuset cpuset /dev/cpuset
    mkdir /dev/cpuset/0
    mkdir /dev/cpuset/1
    echo 0-2 > /dev/cpuset/0/cpuset.cpus
    echo 3 > /dev/cpuset/1/cpuset.cpus
    echo 0 > /dev/cpuset/cpuset.sched_load_balance
    echo 0 > /sys/devices/system/cpu/cpu3/online

    I just tried it again and got the exact same lockdep output that I sent before.

    Max

    --

  6. Re: [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock



    Max Krasnyansky wrote:
    >
    > Paul Menage wrote:
    >> On Tue, Jul 1, 2008 at 8:55 PM, Max Krasnyansky wrote:
    >>> I was about to say "yeah, looks good" and then tried a couple of
    >>> different hot-plug scenarios.
    >>> We still have circular locking even with your patch
    >>>

    >> What sequence of actions do you do? I've not been able to reproduce a
    >> lockdep failure.

    >
    > mkdir /dev/cpuset
    > mount -t cgroup -o cpuset cpuset /dev/cpuset
    > mkdir /dev/cpuset/0
    > mkdir /dev/cpuset/1
    > echo 0-2 > /dev/cpuset/0/cpuset.cpus
    > echo 3 > /dev/cpuset/1/cpuset.cpus
    > echo 0 > /dev/cpuset/cpuset.sched_load_balance
    > echo 0 > /sys/devices/system/cpu/cpu3/online
    >
    > I just tried it again and got the exact same lockdep output that I sent before.


    Paul, any luck with that ?

    Max
    --

  7. Re: [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock

    On Wed, Jul 2, 2008 at 9:46 PM, Max Krasnyansky wrote:
    >
    > mkdir /dev/cpuset
    > mount -t cgroup -o cpuset cpuset /dev/cpuset
    > mkdir /dev/cpuset/0
    > mkdir /dev/cpuset/1
    > echo 0-2 > /dev/cpuset/0/cpuset.cpus
    > echo 3 > /dev/cpuset/1/cpuset.cpus
    > echo 0 > /dev/cpuset/cpuset.sched_load_balance
    > echo 0 > /sys/devices/system/cpu/cpu3/online
    >


    OK, I still can't reproduce this on a 2-cpu system using one cpu for
    each cpuset.

    But the basic problem seems to be that we have cpu_hotplug.lock taken
    at the outer level (when offlining a CPU) and at the inner level (via
    get_online_cpus() called from the guts of partition_sched_domains()),
    if we didn't already take it at the outer level.

    While looking at the code trying to figure out a nice way around this,
    it struck me that we have the call path

    cpuset_track_online_nodes() ->
    common_cpu_mem_hotplug_unplug() ->
    scan_for_empty_cpusets() ->
    access cpu_online_map with no calls to get_online_cpus()

    Is that allowed? Maybe we need separate versions of
    scan_for_empty_cpusets() that look at memory and cpus?
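
    (Purely for the sake of discussion, one shape that split could take -
    the two function names below are invented:)

    /*
     * Hypothetical split: the CPU-hotplug notifier, which already runs
     * with cpu_hotplug.lock held, would call the cpus variant; the
     * memory-hotplug path would call the mems variant and never read
     * cpu_online_map at all.
     */
    static void scan_for_empty_cpusets_cpus(const struct cpuset *root)
    {
            /* ... walk the cpusets below root, emptying any whose
             *     cpus_allowed no longer intersects cpu_online_map ... */
    }

    static void scan_for_empty_cpusets_mems(const struct cpuset *root)
    {
            /* ... same walk, but only checking mems_allowed against
             *     node_states[N_HIGH_MEMORY] ... */
    }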

    I think that eventually we're going to want a solution such as pushing
    the locking of cpuset_subsys.hierarchy_mutex down into the first part
    of partition_sched_domains(), which actually walks the cpuset tree.
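
    (Roughly along these lines - just a sketch, with the existing tree
    walk left as a placeholder:)

    /* Sketch: take the cpuset hierarchy lock only around the portion of
     * the rebuild that walks the cpuset tree to compute the new
     * sched-domain partition, rather than relying on cgroup_mutex. */
    mutex_lock(&cpuset_subsys.hierarchy_mutex);
    /* ... walk the cpuset tree and build the set of domains ... */
    mutex_unlock(&cpuset_subsys.hierarchy_mutex);
    /* ... then hand the computed partition to the scheduler ... */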

    Paul
    --

  8. Re: [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock

    Paul Menage wrote:
    > On Wed, Jul 2, 2008 at 9:46 PM, Max Krasnyansky wrote:
    >> mkdir /dev/cpuset
    >> mount -t cgroup -o cpuset cpuset /dev/cpuset
    >> mkdir /dev/cpuset/0
    >> mkdir /dev/cpuset/1
    >> echo 0-2 > /dev/cpuset/0/cpuset.cpus
    >> echo 3 > /dev/cpuset/1/cpuset.cpus
    >> echo 0 > /dev/cpuset/cpuset.sched_load_balance
    >> echo 0 > /sys/devices/system/cpu/cpu3/online
    >>

    >
    > OK, I still can't reproduce this, on a 2-cpu system using one cpu for
    > each cpuset.
    >
    > But the basic problem seems to be that we have cpu_hotplug.lock taken
    > at the outer level (when offlining a CPU) and at the inner level (via
    > get_online_cpus() called from the guts of partition_sched_domains()),
    > if we didn't already take it at the outer level.

    Yes. I'm looking at this stuff right now.

    > While looking at the code trying to figure out a nice way around this,
    > it struck me that we have the call path
    >
    > cpuset_track_online_nodes() ->
    > common_cpu_mem_hotplug_unplug() ->
    > scan_for_empty_cpusets() ->
    > access cpu_online_map with no calls to get_online_cpus()
    >
    > Is that allowed? Maybe we need separate versions of
    > scan_for_empty_cpusets() that look at memory and cpus?

    Yes. This also came up in the other discussions, i.e. that we need to
    have this split up somehow. Looks like my fix for domain handling
    exposed all kinds of problems that we punted on before.

    > I think that eventually we're going to want a solution such as pushing
    > the locking of cpuset_subsys.hierarchy_mutex down into the first part
    > of partition_sched_domains(), which actually walks the cpuset tree.

    Right now I'm trying to get away with not using new locks, since we need
    some solution for 2.6.26 and the new cgroup locking that you added is
    probably .27 material.

    Max
    --
