[PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2) - Kernel

This is a discussion on [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2) - Kernel ; From: Max Krasnyanskiy Addressed Ingo's comments and merged on top of latest Linus's tree. This is based on Linus' idea of creating cpu_active_map that prevents scheduler load balancer from migrating tasks to the cpu that is going down. It allows ...

+ Reply to Thread
Page 1 of 2 1 2 LastLast
Results 1 to 20 of 29

Thread: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)

  1. [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)

    From: Max Krasnyanskiy

    Addressed Ingo's comments and merged on top of latest Linus's tree.

    This is based on Linus' idea of creating cpu_active_map that prevents
    scheduler load balancer from migrating tasks to the cpu that is going
    down.

    It allows us to simplify domain management code and avoid unecessary
    domain rebuilds during cpu hotplug event handling.

    Please ignore the cpusets part for now. It needs some more work in order
    to avoid crazy lock nesting. Although I did simplfy and unify domain
    reinitialization logic. We now simply call partition_sched_domains() in
    all the cases. This means that we're using exact same code paths as in
    cpusets case and hence the test below cover cpusets too.
    Cpuset changes to make rebuild_sched_domains() callable from various
    contexts are in the separate patch (right next after this one).

    This not only boots but also easily handles
    while true; do make clean; make -j 8; done
    and
    while true; do on-off-cpu 1; done
    at the same time.
    (on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing).

    Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
    this on right now in gnome-terminal and things are moving just fine.

    Also this is running with most of the debug features enabled (lockdep,
    mutex, etc) no BUG_ONs or lockdep complaints so far.

    I beleive I addressed all of the Dmitry's comments for original Linus'
    version. I changed both fair and rt balancer to mask out non-active cpus.
    And replaced cpu_is_offline() with !cpu_active() in the main scheduler
    code where it made sense (to me).

    Peter, please take a look at how I merged it with your latest RT runtime
    fixes. I basically moved calls to disable_runtime() into the separate
    notifier callback since we do not need update_sched_domains() anymore if
    cpusets are enabled.

    Greg, please take a look at the RT balancer merge. I decided not to muck
    with the cpupri thing and instead mask inactive cpus after the search.

    I've probably missed something but I'd dare to say consider for the
    inclusion ;-)

    Signed-off-by: Max Krasnyanskiy
    Cc: ghaskins@novell.com
    Cc: Max Krasnyanskiy
    Cc: dmitry.adamushko@gmail.com
    Cc: a.p.zijlstra@chello.nl
    Cc: torvalds@linux-foundation.org
    Cc: pj@sgi.com
    Signed-off-by: Ingo Molnar
    ---
    include/linux/cpumask.h | 6 ++-
    include/linux/cpuset.h | 7 +++
    init/main.c | 7 +++
    kernel/cpu.c | 30 ++++++++++---
    kernel/cpuset.c | 2 +-
    kernel/sched.c | 108 +++++++++++++++++++---------------------------
    kernel/sched_fair.c | 3 +
    kernel/sched_rt.c | 7 +++
    8 files changed, 99 insertions(+), 71 deletions(-)

    diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
    index c24875b..d614d24 100644
    --- a/include/linux/cpumask.h
    +++ b/include/linux/cpumask.h
    @@ -359,13 +359,14 @@ static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,

    /*
    * The following particular system cpumasks and operations manage
    - * possible, present and online cpus. Each of them is a fixed size
    + * possible, present, active and online cpus. Each of them is a fixed size
    * bitmap of size NR_CPUS.
    *
    * #ifdef CONFIG_HOTPLUG_CPU
    * cpu_possible_map - has bit 'cpu' set iff cpu is populatable
    * cpu_present_map - has bit 'cpu' set iff cpu is populated
    * cpu_online_map - has bit 'cpu' set iff cpu available to scheduler
    + * cpu_active_map - has bit 'cpu' set iff cpu available to migration
    * #else
    * cpu_possible_map - has bit 'cpu' set iff cpu is populated
    * cpu_present_map - copy of cpu_possible_map
    @@ -416,6 +417,7 @@ static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,
    extern cpumask_t cpu_possible_map;
    extern cpumask_t cpu_online_map;
    extern cpumask_t cpu_present_map;
    +extern cpumask_t cpu_active_map;

    #if NR_CPUS > 1
    #define num_online_cpus() cpus_weight(cpu_online_map)
    @@ -424,6 +426,7 @@ extern cpumask_t cpu_present_map;
    #define cpu_online(cpu) cpu_isset((cpu), cpu_online_map)
    #define cpu_possible(cpu) cpu_isset((cpu), cpu_possible_map)
    #define cpu_present(cpu) cpu_isset((cpu), cpu_present_map)
    +#define cpu_active(cpu) cpu_isset((cpu), cpu_active_map)
    #else
    #define num_online_cpus() 1
    #define num_possible_cpus() 1
    @@ -431,6 +434,7 @@ extern cpumask_t cpu_present_map;
    #define cpu_online(cpu) ((cpu) == 0)
    #define cpu_possible(cpu) ((cpu) == 0)
    #define cpu_present(cpu) ((cpu) == 0)
    +#define cpu_active(cpu) ((cpu) == 0)
    #endif

    #define cpu_is_offline(cpu) unlikely(!cpu_online(cpu))
    diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
    index 0385783..e8f450c 100644
    --- a/include/linux/cpuset.h
    +++ b/include/linux/cpuset.h
    @@ -78,6 +78,8 @@ extern void cpuset_track_online_nodes(void);

    extern int current_cpuset_is_being_rebound(void);

    +extern void rebuild_sched_domains(void);
    +
    #else /* !CONFIG_CPUSETS */

    static inline int cpuset_init_early(void) { return 0; }
    @@ -156,6 +158,11 @@ static inline int current_cpuset_is_being_rebound(void)
    return 0;
    }

    +static inline void rebuild_sched_domains(void)
    +{
    + partition_sched_domains(0, NULL, NULL);
    +}
    +
    #endif /* !CONFIG_CPUSETS */

    #endif /* _LINUX_CPUSET_H */
    diff --git a/init/main.c b/init/main.c
    index f7fb200..bfccff6 100644
    --- a/init/main.c
    +++ b/init/main.c
    @@ -414,6 +414,13 @@ static void __init smp_init(void)
    {
    unsigned int cpu;

    + /*
    + * Set up the current CPU as possible to migrate to.
    + * The other ones will be done by cpu_up/cpu_down()
    + */
    + cpu = smp_processor_id();
    + cpu_set(cpu, cpu_active_map);
    +
    /* FIXME: This should be done in userspace --RR */
    for_each_present_cpu(cpu) {
    if (num_online_cpus() >= setup_max_cpus)
    diff --git a/kernel/cpu.c b/kernel/cpu.c
    index b11f06d..71c5c9d 100644
    --- a/kernel/cpu.c
    +++ b/kernel/cpu.c
    @@ -64,6 +64,8 @@ void __init cpu_hotplug_init(void)
    cpu_hotplug.refcount = 0;
    }

    +cpumask_t cpu_active_map;
    +
    #ifdef CONFIG_HOTPLUG_CPU

    void get_online_cpus(void)
    @@ -291,11 +293,20 @@ int __ref cpu_down(unsigned int cpu)
    int err = 0;

    cpu_maps_update_begin();
    - if (cpu_hotplug_disabled)
    +
    + if (cpu_hotplug_disabled) {
    err = -EBUSY;
    - else
    - err = _cpu_down(cpu, 0);
    + goto out;
    + }
    +
    + cpu_clear(cpu, cpu_active_map);
    +
    + err = _cpu_down(cpu, 0);
    +
    + if (cpu_online(cpu))
    + cpu_set(cpu, cpu_active_map);

    +out:
    cpu_maps_update_done();
    return err;
    }
    @@ -354,11 +365,18 @@ int __cpuinit cpu_up(unsigned int cpu)
    }

    cpu_maps_update_begin();
    - if (cpu_hotplug_disabled)
    +
    + if (cpu_hotplug_disabled) {
    err = -EBUSY;
    - else
    - err = _cpu_up(cpu, 0);
    + goto out;
    + }

    + err = _cpu_up(cpu, 0);
    +
    + if (cpu_online(cpu))
    + cpu_set(cpu, cpu_active_map);
    +
    +out:
    cpu_maps_update_done();
    return err;
    }
    diff --git a/kernel/cpuset.c b/kernel/cpuset.c
    index 459d601..3c3ef02 100644
    --- a/kernel/cpuset.c
    +++ b/kernel/cpuset.c
    @@ -564,7 +564,7 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
    * partition_sched_domains().
    */

    -static void rebuild_sched_domains(void)
    +void rebuild_sched_domains(void)
    {
    struct kfifo *q; /* queue of cpusets to be scanned */
    struct cpuset *cp; /* scans q */
    diff --git a/kernel/sched.c b/kernel/sched.c
    index 99e6d85..9019f63 100644
    --- a/kernel/sched.c
    +++ b/kernel/sched.c
    @@ -2881,7 +2881,7 @@ static void sched_migrate_task(struct task_struct *p, int dest_cpu)

    rq = task_rq_lock(p, &flags);
    if (!cpu_isset(dest_cpu, p->cpus_allowed)
    - || unlikely(cpu_is_offline(dest_cpu)))
    + || unlikely(!cpu_active(dest_cpu)))
    goto out;

    /* force the process onto the specified CPU */
    @@ -3849,7 +3849,7 @@ int select_nohz_load_balancer(int stop_tick)
    /*
    * If we are going offline and still the leader, give up!
    */
    - if (cpu_is_offline(cpu) &&
    + if (!cpu_active(cpu) &&
    atomic_read(&nohz.load_balancer) == cpu) {
    if (atomic_cmpxchg(&nohz.load_balancer, cpu, -1) != cpu)
    BUG();
    @@ -5876,7 +5876,7 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
    struct rq *rq_dest, *rq_src;
    int ret = 0, on_rq;

    - if (unlikely(cpu_is_offline(dest_cpu)))
    + if (unlikely(!cpu_active(dest_cpu)))
    return ret;

    rq_src = cpu_rq(src_cpu);
    @@ -7553,18 +7553,6 @@ void __attribute__((weak)) arch_update_cpu_topology(void)
    }

    /*
    - * Free current domain masks.
    - * Called after all cpus are attached to NULL domain.
    - */
    -static void free_sched_domains(void)
    -{
    - ndoms_cur = 0;
    - if (doms_cur != &fallback_doms)
    - kfree(doms_cur);
    - doms_cur = &fallback_doms;
    -}
    -
    -/*
    * Set up scheduler domains and groups. Callers must hold the hotplug lock.
    * For now this just excludes isolated cpus, but could be used to
    * exclude other special cases in the future.
    @@ -7642,7 +7630,7 @@ static int dattrs_equal(struct sched_domain_attr *cur, int idx_cur,
    * ownership of it and will kfree it when done with it. If the caller
    * failed the kmalloc call, then it can pass in doms_new == NULL,
    * and partition_sched_domains() will fallback to the single partition
    - * 'fallback_doms'.
    + * 'fallback_doms', it also forces the domains to be rebuilt.
    *
    * Call with hotplug lock held
    */
    @@ -7656,12 +7644,8 @@ void partition_sched_domains(int ndoms_new, cpumask_t *doms_new,
    /* always unregister in case we don't destroy any domains */
    unregister_sched_domain_sysctl();

    - if (doms_new == NULL) {
    - ndoms_new = 1;
    - doms_new = &fallback_doms;
    - cpus_andnot(doms_new[0], cpu_online_map, cpu_isolated_map);
    - dattr_new = NULL;
    - }
    + if (doms_new == NULL)
    + ndoms_new = 0;

    /* Destroy deleted domains */
    for (i = 0; i < ndoms_cur; i++) {
    @@ -7676,6 +7660,14 @@ match1:
    ;
    }

    + if (doms_new == NULL) {
    + ndoms_cur = 0;
    + ndoms_new = 1;
    + doms_new = &fallback_doms;
    + cpus_andnot(doms_new[0], cpu_online_map, cpu_isolated_map);
    + dattr_new = NULL;
    + }
    +
    /* Build new domains */
    for (i = 0; i < ndoms_new; i++) {
    for (j = 0; j < ndoms_cur; j++) {
    @@ -7706,17 +7698,10 @@ match2:
    #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
    int arch_reinit_sched_domains(void)
    {
    - int err;
    -
    get_online_cpus();
    - mutex_lock(&sched_domains_mutex);
    - detach_destroy_domains(&cpu_online_map);
    - free_sched_domains();
    - err = arch_init_sched_domains(&cpu_online_map);
    - mutex_unlock(&sched_domains_mutex);
    + rebuild_sched_domains();
    put_online_cpus();
    -
    - return err;
    + return 0;
    }

    static ssize_t sched_power_savings_store(const char *buf, size_t count, int smt)
    @@ -7782,59 +7767,49 @@ int sched_create_sysfs_power_savings_entries(struct sysdev_class *cls)
    }
    #endif /* CONFIG_SCHED_MC || CONFIG_SCHED_SMT */

    +#ifndef CONFIG_CPUSETS
    /*
    - * Force a reinitialization of the sched domains hierarchy. The domains
    - * and groups cannot be updated in place without racing with the balancing
    - * code, so we temporarily attach all running cpus to the NULL domain
    - * which will prevent rebalancing while the sched domains are recalculated.
    + * Add online and remove offline CPUs from the scheduler domains.
    + * When cpusets are enabled they take over this function.
    */
    static int update_sched_domains(struct notifier_block *nfb,
    unsigned long action, void *hcpu)
    {
    + switch (action) {
    + case CPU_ONLINE:
    + case CPU_ONLINE_FROZEN:
    + case CPU_DEAD:
    + case CPU_DEAD_FROZEN:
    + partition_sched_domains(0, NULL, NULL);
    + return NOTIFY_OK;
    +
    + default:
    + return NOTIFY_DONE;
    + }
    +}
    +#endif
    +
    +static int update_runtime(struct notifier_block *nfb,
    + unsigned long action, void *hcpu)
    +{
    int cpu = (int)(long)hcpu;

    switch (action) {
    case CPU_DOWN_PREPARE:
    case CPU_DOWN_PREPARE_FROZEN:
    disable_runtime(cpu_rq(cpu));
    - /* fall-through */
    - case CPU_UP_PREPARE:
    - case CPU_UP_PREPARE_FROZEN:
    - detach_destroy_domains(&cpu_online_map);
    - free_sched_domains();
    return NOTIFY_OK;

    -
    case CPU_DOWN_FAILED:
    case CPU_DOWN_FAILED_FROZEN:
    case CPU_ONLINE:
    case CPU_ONLINE_FROZEN:
    enable_runtime(cpu_rq(cpu));
    - /* fall-through */
    - case CPU_UP_CANCELED:
    - case CPU_UP_CANCELED_FROZEN:
    - case CPU_DEAD:
    - case CPU_DEAD_FROZEN:
    - /*
    - * Fall through and re-initialise the domains.
    - */
    - break;
    + return NOTIFY_OK;
    +
    default:
    return NOTIFY_DONE;
    }
    -
    -#ifndef CONFIG_CPUSETS
    - /*
    - * Create default domain partitioning if cpusets are disabled.
    - * Otherwise we let cpusets rebuild the domains based on the
    - * current setup.
    - */
    -
    - /* The hotplug lock is already held by cpu_up/cpu_down */
    - arch_init_sched_domains(&cpu_online_map);
    -#endif
    -
    - return NOTIFY_OK;
    }

    void __init sched_init_smp(void)
    @@ -7854,8 +7829,15 @@ void __init sched_init_smp(void)
    cpu_set(smp_processor_id(), non_isolated_cpus);
    mutex_unlock(&sched_domains_mutex);
    put_online_cpus();
    +
    +#ifndef CONFIG_CPUSETS
    /* XXX: Theoretical race here - CPU may be hotplugged now */
    hotcpu_notifier(update_sched_domains, 0);
    +#endif
    +
    + /* RT runtime code needs to handle some hotplug events */
    + hotcpu_notifier(update_runtime, 0);
    +
    init_hrtick();

    /* Move init over to a non-isolated CPU */
    diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
    index f2aa987..d924c67 100644
    --- a/kernel/sched_fair.c
    +++ b/kernel/sched_fair.c
    @@ -1004,6 +1004,8 @@ static void yield_task_fair(struct rq *rq)
    * not idle and an idle cpu is available. The span of cpus to
    * search starts with cpus closest then further out as needed,
    * so we always favor a closer, idle cpu.
    + * Domains may include CPUs that are not usable for migration,
    + * hence we need to mask them out (cpu_active_map)
    *
    * Returns the CPU we should wake onto.
    */
    @@ -1031,6 +1033,7 @@ static int wake_idle(int cpu, struct task_struct *p)
    || ((sd->flags & SD_WAKE_IDLE_FAR)
    && !task_hot(p, task_rq(p)->clock, sd))) {
    cpus_and(tmp, sd->span, p->cpus_allowed);
    + cpus_and(tmp, tmp, cpu_active_map);
    for_each_cpu_mask(i, tmp) {
    if (idle_cpu(i)) {
    if (i != task_cpu(p)) {
    diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
    index 47ceac9..5166080 100644
    --- a/kernel/sched_rt.c
    +++ b/kernel/sched_rt.c
    @@ -922,6 +922,13 @@ static int find_lowest_rq(struct task_struct *task)
    return -1; /* No targets found */

    /*
    + * Only consider CPUs that are usable for migration.
    + * I guess we might want to change cpupri_find() to ignore those
    + * in the first place.
    + */
    + cpus_and(*lowest_mask, *lowest_mask, cpu_active_map);
    +
    + /*
    * At this point we have built a mask of cpus representing the
    * lowest priority tasks in the system. Now we want to elect
    * the best one based on our affinity and topology.
    --
    1.5.5.1

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)

    Hi Peter,

    > > > * #ifdef CONFIG_HOTPLUG_CPU
    > > > * cpu_possible_map - has bit 'cpu' set iff cpu is populatable
    > > > * cpu_present_map - has bit 'cpu' set iff cpu is populated
    > > > * cpu_online_map - has bit 'cpu' set iff cpu available to scheduler
    > > > + * cpu_active_map - has bit 'cpu' set iff cpu available to migration

    > >
    > > why not just fixing all spelling mistakes instead of keeping them
    > > around

    >
    > What spelling mistakes?


    so it wasn't meant to say "... if cpu ..." and I simply miss something
    here (which is possible)?

    Regards

    Marcel


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)

    On Tue, 2008-07-15 at 13:49 +0200, Marcel Holtmann wrote:
    > Hi Max,
    >
    > > * #ifdef CONFIG_HOTPLUG_CPU
    > > * cpu_possible_map - has bit 'cpu' set iff cpu is populatable
    > > * cpu_present_map - has bit 'cpu' set iff cpu is populated
    > > * cpu_online_map - has bit 'cpu' set iff cpu available to scheduler
    > > + * cpu_active_map - has bit 'cpu' set iff cpu available to migration

    >
    > why not just fixing all spelling mistakes instead of keeping them
    > around


    What spelling mistakes?

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)

    On Tue, 2008-07-15 at 13:57 +0200, Marcel Holtmann wrote:
    > Hi Peter,
    >
    > > > > * #ifdef CONFIG_HOTPLUG_CPU
    > > > > * cpu_possible_map - has bit 'cpu' set iff cpu is populatable
    > > > > * cpu_present_map - has bit 'cpu' set iff cpu is populated
    > > > > * cpu_online_map - has bit 'cpu' set iff cpu available to scheduler
    > > > > + * cpu_active_map - has bit 'cpu' set iff cpu available to migration
    > > >
    > > > why not just fixing all spelling mistakes instead of keeping them
    > > > around

    > >
    > > What spelling mistakes?

    >
    > so it wasn't meant to say "... if cpu ..." and I simply miss something
    > here (which is possible)?


    It's a math term, iff means: if and only if.





    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)

    Hi Max,

    >>> On Tue, Jul 15, 2008 at 7:43 AM, in message

    <1216122229-4865-1-git-send-email-maxk@qualcomm.com>, Max Krasnyansky
    wrote:
    >
    > Greg, please take a look at the RT balancer merge. I decided not to muck
    > with the cpupri thing and instead mask inactive cpus after the search.


    Ill take a look at this later today.

    Regards,
    -Greg



    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)



    On Tue, 15 Jul 2008, Max Krasnyansky wrote:
    >
    > I've probably missed something but I'd dare to say consider for the
    > inclusion ;-)


    Acked-by: Linus Torvalds

    Thanks for doing the cleanups and the details.

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)



    On Tue, 15 Jul 2008, Marcel Holtmann wrote:
    >
    > so it wasn't meant to say "... if cpu ..." and I simply miss something
    > here (which is possible)?


    Heh. You don't have enough of a math background, it seems. 'iff' is a
    stronger 'if' - it's 'if and only if'.

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)

    2008/7/15 Max Krasnyansky :
    > From: Max Krasnyanskiy
    >
    > Addressed Ingo's comments and merged on top of latest Linus's tree.


    a few remarks:

    (1) in __migrate_task(), a test for !cpu_active(dest_cpu) should be
    done after double_rq_lock() [ or add the second one ]

    migration_thread() calls __migrate_task() with disabled interrupts (no
    rq-locks held), i.e. if we merely rely on rq-locks for
    synchronization, this can race with cpu_down(dest_cpu).

    [ assume, the test was done in __migration_task() and it's about to
    take double_lock()... and at this time, down_cpu(dest_cpu) starts and
    completes on another CPU ]

    note, we may still take the rq-lock for a "dead" cpu in this case and
    then only do a check (remark: in fact, not with stop_machine() in
    place _but_ I consider that we don't make any assumptions on its
    behavior);

    (2) it's worth to take a look at the use of any_online_cpu():

    many places are ok, because there will be an additional check against
    cpu_active_mask later on, but e.g.

    set_cpus_allowed_ptr() ->
    migrate_task(p, any_online_cpu(mask), ...) ->
    migrate_task(p, dest_cpu)

    doesn't seem to have any verifications wrt cpu_active_map.

    (3) I assume, we have some kind of explicit sched_sync() after
    cpu_clear(cpu, cpu_active_mask) because:

    (a) not all places where task-migration may take place do take the
    rq-lock for dest_cpu : e.g. try_to_wake_up() or even
    sched_migrate_task() [ yes, there is a special (one might call
    "subtle") assumption/restriction in this case ]

    that's why the fact that cpu_down() takes the rq-lock for
    soon-to-be-offline cpu at some point can not be a "natural" sync.
    point to guarantee that "stale" cpu_active_map is not used.

    (b) in fact, stop_machine() acts as a (very strong) sync. point,
    sched-wise. But perhaps, we don't want to have this new easy-to-follow
    approach to be built on top of assumptions on how something from
    another sub-system behaves.

    >
    > This is based on Linus' idea of creating cpu_active_map that prevents
    > scheduler load balancer from migrating tasks to the cpu that is going
    > down.
    >
    > It allows us to simplify domain management code and avoid unecessary
    > domain rebuilds during cpu hotplug event handling.
    >
    > Please ignore the cpusets part for now. It needs some more work in order
    > to avoid crazy lock nesting. Although I did simplfy and unify domain
    > reinitialization logic. We now simply call partition_sched_domains() in
    > all the cases. This means that we're using exact same code paths as in
    > cpusets case and hence the test below cover cpusets too.
    > Cpuset changes to make rebuild_sched_domains() callable from various
    > contexts are in the separate patch (right next after this one).
    >
    > This not only boots but also easily handles
    > while true; do make clean; make -j 8; done
    > and
    > while true; do on-off-cpu 1; done
    > at the same time.
    > (on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing).
    >
    > Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
    > this on right now in gnome-terminal and things are moving just fine.
    >
    > Also this is running with most of the debug features enabled (lockdep,
    > mutex, etc) no BUG_ONs or lockdep complaints so far.
    >
    > I beleive I addressed all of the Dmitry's comments for original Linus'
    > version. I changed both fair and rt balancer to mask out non-active cpus.
    > And replaced cpu_is_offline() with !cpu_active() in the main scheduler
    > code where it made sense (to me).
    >
    > Peter, please take a look at how I merged it with your latest RT runtime
    > fixes. I basically moved calls to disable_runtime() into the separate
    > notifier callback since we do not need update_sched_domains() anymore if
    > cpusets are enabled.
    >
    > Greg, please take a look at the RT balancer merge. I decided not to muck
    > with the cpupri thing and instead mask inactive cpus after the search.
    >
    > I've probably missed something but I'd dare to say consider for the
    > inclusion ;-)
    >
    > Signed-off-by: Max Krasnyanskiy
    > Cc: ghaskins@novell.com
    > Cc: Max Krasnyanskiy
    > Cc: dmitry.adamushko@gmail.com
    > Cc: a.p.zijlstra@chello.nl
    > Cc: torvalds@linux-foundation.org
    > Cc: pj@sgi.com
    > Signed-off-by: Ingo Molnar
    > ---
    > include/linux/cpumask.h | 6 ++-
    > include/linux/cpuset.h | 7 +++
    > init/main.c | 7 +++
    > kernel/cpu.c | 30 ++++++++++---
    > kernel/cpuset.c | 2 +-
    > kernel/sched.c | 108 +++++++++++++++++++---------------------------
    > kernel/sched_fair.c | 3 +
    > kernel/sched_rt.c | 7 +++
    > 8 files changed, 99 insertions(+), 71 deletions(-)
    >
    > diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
    > index c24875b..d614d24 100644
    > --- a/include/linux/cpumask.h
    > +++ b/include/linux/cpumask.h
    > @@ -359,13 +359,14 @@ static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,
    >
    > /*
    > * The following particular system cpumasks and operations manage
    > - * possible, present and online cpus. Each of them is a fixed size
    > + * possible, present, active and online cpus. Each of them is a fixed size
    > * bitmap of size NR_CPUS.
    > *
    > * #ifdef CONFIG_HOTPLUG_CPU
    > * cpu_possible_map - has bit 'cpu' set iff cpu is populatable
    > * cpu_present_map - has bit 'cpu' set iff cpu is populated
    > * cpu_online_map - has bit 'cpu' set iff cpu available to scheduler
    > + * cpu_active_map - has bit 'cpu' set iff cpu available to migration
    > * #else
    > * cpu_possible_map - has bit 'cpu' set iff cpu is populated
    > * cpu_present_map - copy of cpu_possible_map
    > @@ -416,6 +417,7 @@ static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp,
    > extern cpumask_t cpu_possible_map;
    > extern cpumask_t cpu_online_map;
    > extern cpumask_t cpu_present_map;
    > +extern cpumask_t cpu_active_map;
    >
    > #if NR_CPUS > 1
    > #define num_online_cpus() cpus_weight(cpu_online_map)
    > @@ -424,6 +426,7 @@ extern cpumask_t cpu_present_map;
    > #define cpu_online(cpu) cpu_isset((cpu), cpu_online_map)
    > #define cpu_possible(cpu) cpu_isset((cpu), cpu_possible_map)
    > #define cpu_present(cpu) cpu_isset((cpu), cpu_present_map)
    > +#define cpu_active(cpu) cpu_isset((cpu), cpu_active_map)
    > #else
    > #define num_online_cpus() 1
    > #define num_possible_cpus() 1
    > @@ -431,6 +434,7 @@ extern cpumask_t cpu_present_map;
    > #define cpu_online(cpu) ((cpu) == 0)
    > #define cpu_possible(cpu) ((cpu) == 0)
    > #define cpu_present(cpu) ((cpu) == 0)
    > +#define cpu_active(cpu) ((cpu) == 0)
    > #endif
    >
    > #define cpu_is_offline(cpu) unlikely(!cpu_online(cpu))
    > diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
    > index 0385783..e8f450c 100644
    > --- a/include/linux/cpuset.h
    > +++ b/include/linux/cpuset.h
    > @@ -78,6 +78,8 @@ extern void cpuset_track_online_nodes(void);
    >
    > extern int current_cpuset_is_being_rebound(void);
    >
    > +extern void rebuild_sched_domains(void);
    > +
    > #else /* !CONFIG_CPUSETS */
    >
    > static inline int cpuset_init_early(void) { return 0; }
    > @@ -156,6 +158,11 @@ static inline int current_cpuset_is_being_rebound(void)
    > return 0;
    > }
    >
    > +static inline void rebuild_sched_domains(void)
    > +{
    > + partition_sched_domains(0, NULL, NULL);
    > +}
    > +
    > #endif /* !CONFIG_CPUSETS */
    >
    > #endif /* _LINUX_CPUSET_H */
    > diff --git a/init/main.c b/init/main.c
    > index f7fb200..bfccff6 100644
    > --- a/init/main.c
    > +++ b/init/main.c
    > @@ -414,6 +414,13 @@ static void __init smp_init(void)
    > {
    > unsigned int cpu;
    >
    > + /*
    > + * Set up the current CPU as possible to migrate to.
    > + * The other ones will be done by cpu_up/cpu_down()
    > + */
    > + cpu = smp_processor_id();
    > + cpu_set(cpu, cpu_active_map);
    > +
    > /* FIXME: This should be done in userspace --RR */
    > for_each_present_cpu(cpu) {
    > if (num_online_cpus() >= setup_max_cpus)
    > diff --git a/kernel/cpu.c b/kernel/cpu.c
    > index b11f06d..71c5c9d 100644
    > --- a/kernel/cpu.c
    > +++ b/kernel/cpu.c
    > @@ -64,6 +64,8 @@ void __init cpu_hotplug_init(void)
    > cpu_hotplug.refcount = 0;
    > }
    >
    > +cpumask_t cpu_active_map;
    > +
    > #ifdef CONFIG_HOTPLUG_CPU
    >
    > void get_online_cpus(void)
    > @@ -291,11 +293,20 @@ int __ref cpu_down(unsigned int cpu)
    > int err = 0;
    >
    > cpu_maps_update_begin();
    > - if (cpu_hotplug_disabled)
    > +
    > + if (cpu_hotplug_disabled) {
    > err = -EBUSY;
    > - else
    > - err = _cpu_down(cpu, 0);
    > + goto out;
    > + }
    > +
    > + cpu_clear(cpu, cpu_active_map);
    > +
    > + err = _cpu_down(cpu, 0);
    > +
    > + if (cpu_online(cpu))
    > + cpu_set(cpu, cpu_active_map);
    >
    > +out:
    > cpu_maps_update_done();
    > return err;
    > }
    > @@ -354,11 +365,18 @@ int __cpuinit cpu_up(unsigned int cpu)
    > }
    >
    > cpu_maps_update_begin();
    > - if (cpu_hotplug_disabled)
    > +
    > + if (cpu_hotplug_disabled) {
    > err = -EBUSY;
    > - else
    > - err = _cpu_up(cpu, 0);
    > + goto out;
    > + }
    >
    > + err = _cpu_up(cpu, 0);
    > +
    > + if (cpu_online(cpu))
    > + cpu_set(cpu, cpu_active_map);
    > +
    > +out:
    > cpu_maps_update_done();
    > return err;
    > }
    > diff --git a/kernel/cpuset.c b/kernel/cpuset.c
    > index 459d601..3c3ef02 100644
    > --- a/kernel/cpuset.c
    > +++ b/kernel/cpuset.c
    > @@ -564,7 +564,7 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
    > * partition_sched_domains().
    > */
    >
    > -static void rebuild_sched_domains(void)
    > +void rebuild_sched_domains(void)
    > {
    > struct kfifo *q; /* queue of cpusets to be scanned */
    > struct cpuset *cp; /* scans q */
    > diff --git a/kernel/sched.c b/kernel/sched.c
    > index 99e6d85..9019f63 100644
    > --- a/kernel/sched.c
    > +++ b/kernel/sched.c
    > @@ -2881,7 +2881,7 @@ static void sched_migrate_task(struct task_struct *p, int dest_cpu)
    >
    > rq = task_rq_lock(p, &flags);
    > if (!cpu_isset(dest_cpu, p->cpus_allowed)
    > - || unlikely(cpu_is_offline(dest_cpu)))
    > + || unlikely(!cpu_active(dest_cpu)))
    > goto out;
    >
    > /* force the process onto the specified CPU */
    > @@ -3849,7 +3849,7 @@ int select_nohz_load_balancer(int stop_tick)
    > /*
    > * If we are going offline and still the leader, give up!
    > */
    > - if (cpu_is_offline(cpu) &&
    > + if (!cpu_active(cpu) &&
    > atomic_read(&nohz.load_balancer) == cpu) {
    > if (atomic_cmpxchg(&nohz.load_balancer, cpu, -1) != cpu)
    > BUG();
    > @@ -5876,7 +5876,7 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
    > struct rq *rq_dest, *rq_src;
    > int ret = 0, on_rq;
    >
    > - if (unlikely(cpu_is_offline(dest_cpu)))
    > + if (unlikely(!cpu_active(dest_cpu)))
    > return ret;
    >
    > rq_src = cpu_rq(src_cpu);
    > @@ -7553,18 +7553,6 @@ void __attribute__((weak)) arch_update_cpu_topology(void)
    > }
    >
    > /*
    > - * Free current domain masks.
    > - * Called after all cpus are attached to NULL domain.
    > - */
    > -static void free_sched_domains(void)
    > -{
    > - ndoms_cur = 0;
    > - if (doms_cur != &fallback_doms)
    > - kfree(doms_cur);
    > - doms_cur = &fallback_doms;
    > -}
    > -
    > -/*
    > * Set up scheduler domains and groups. Callers must hold the hotplug lock.
    > * For now this just excludes isolated cpus, but could be used to
    > * exclude other special cases in the future.
    > @@ -7642,7 +7630,7 @@ static int dattrs_equal(struct sched_domain_attr *cur, int idx_cur,
    > * ownership of it and will kfree it when done with it. If the caller
    > * failed the kmalloc call, then it can pass in doms_new == NULL,
    > * and partition_sched_domains() will fallback to the single partition
    > - * 'fallback_doms'.
    > + * 'fallback_doms', it also forces the domains to be rebuilt.
    > *
    > * Call with hotplug lock held
    > */
    > @@ -7656,12 +7644,8 @@ void partition_sched_domains(int ndoms_new, cpumask_t *doms_new,
    > /* always unregister in case we don't destroy any domains */
    > unregister_sched_domain_sysctl();
    >
    > - if (doms_new == NULL) {
    > - ndoms_new = 1;
    > - doms_new = &fallback_doms;
    > - cpus_andnot(doms_new[0], cpu_online_map, cpu_isolated_map);
    > - dattr_new = NULL;
    > - }
    > + if (doms_new == NULL)
    > + ndoms_new = 0;
    >
    > /* Destroy deleted domains */
    > for (i = 0; i < ndoms_cur; i++) {
    > @@ -7676,6 +7660,14 @@ match1:
    > ;
    > }
    >
    > + if (doms_new == NULL) {
    > + ndoms_cur = 0;
    > + ndoms_new = 1;
    > + doms_new = &fallback_doms;
    > + cpus_andnot(doms_new[0], cpu_online_map, cpu_isolated_map);
    > + dattr_new = NULL;
    > + }
    > +
    > /* Build new domains */
    > for (i = 0; i < ndoms_new; i++) {
    > for (j = 0; j < ndoms_cur; j++) {
    > @@ -7706,17 +7698,10 @@ match2:
    > #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
    > int arch_reinit_sched_domains(void)
    > {
    > - int err;
    > -
    > get_online_cpus();
    > - mutex_lock(&sched_domains_mutex);
    > - detach_destroy_domains(&cpu_online_map);
    > - free_sched_domains();
    > - err = arch_init_sched_domains(&cpu_online_map);
    > - mutex_unlock(&sched_domains_mutex);
    > + rebuild_sched_domains();
    > put_online_cpus();
    > -
    > - return err;
    > + return 0;
    > }
    >
    > static ssize_t sched_power_savings_store(const char *buf, size_t count, int smt)
    > @@ -7782,59 +7767,49 @@ int sched_create_sysfs_power_savings_entries(struct sysdev_class *cls)
    > }
    > #endif /* CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
    >
    > +#ifndef CONFIG_CPUSETS
    > /*
    > - * Force a reinitialization of the sched domains hierarchy. The domains
    > - * and groups cannot be updated in place without racing with the balancing
    > - * code, so we temporarily attach all running cpus to the NULL domain
    > - * which will prevent rebalancing while the sched domains are recalculated.
    > + * Add online and remove offline CPUs from the scheduler domains.
    > + * When cpusets are enabled they take over this function.
    > */
    > static int update_sched_domains(struct notifier_block *nfb,
    > unsigned long action, void *hcpu)
    > {
    > + switch (action) {
    > + case CPU_ONLINE:
    > + case CPU_ONLINE_FROZEN:
    > + case CPU_DEAD:
    > + case CPU_DEAD_FROZEN:
    > + partition_sched_domains(0, NULL, NULL);
    > + return NOTIFY_OK;
    > +
    > + default:
    > + return NOTIFY_DONE;
    > + }
    > +}
    > +#endif
    > +
    > +static int update_runtime(struct notifier_block *nfb,
    > + unsigned long action, void *hcpu)
    > +{
    > int cpu = (int)(long)hcpu;
    >
    > switch (action) {
    > case CPU_DOWN_PREPARE:
    > case CPU_DOWN_PREPARE_FROZEN:
    > disable_runtime(cpu_rq(cpu));
    > - /* fall-through */
    > - case CPU_UP_PREPARE:
    > - case CPU_UP_PREPARE_FROZEN:
    > - detach_destroy_domains(&cpu_online_map);
    > - free_sched_domains();
    > return NOTIFY_OK;
    >
    > -
    > case CPU_DOWN_FAILED:
    > case CPU_DOWN_FAILED_FROZEN:
    > case CPU_ONLINE:
    > case CPU_ONLINE_FROZEN:
    > enable_runtime(cpu_rq(cpu));
    > - /* fall-through */
    > - case CPU_UP_CANCELED:
    > - case CPU_UP_CANCELED_FROZEN:
    > - case CPU_DEAD:
    > - case CPU_DEAD_FROZEN:
    > - /*
    > - * Fall through and re-initialise the domains.
    > - */
    > - break;
    > + return NOTIFY_OK;
    > +
    > default:
    > return NOTIFY_DONE;
    > }
    > -
    > -#ifndef CONFIG_CPUSETS
    > - /*
    > - * Create default domain partitioning if cpusets are disabled.
    > - * Otherwise we let cpusets rebuild the domains based on the
    > - * current setup.
    > - */
    > -
    > - /* The hotplug lock is already held by cpu_up/cpu_down */
    > - arch_init_sched_domains(&cpu_online_map);
    > -#endif
    > -
    > - return NOTIFY_OK;
    > }
    >
    > void __init sched_init_smp(void)
    > @@ -7854,8 +7829,15 @@ void __init sched_init_smp(void)
    > cpu_set(smp_processor_id(), non_isolated_cpus);
    > mutex_unlock(&sched_domains_mutex);
    > put_online_cpus();
    > +
    > +#ifndef CONFIG_CPUSETS
    > /* XXX: Theoretical race here - CPU may be hotplugged now */
    > hotcpu_notifier(update_sched_domains, 0);
    > +#endif
    > +
    > + /* RT runtime code needs to handle some hotplug events */
    > + hotcpu_notifier(update_runtime, 0);
    > +
    > init_hrtick();
    >
    > /* Move init over to a non-isolated CPU */
    > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
    > index f2aa987..d924c67 100644
    > --- a/kernel/sched_fair.c
    > +++ b/kernel/sched_fair.c
    > @@ -1004,6 +1004,8 @@ static void yield_task_fair(struct rq *rq)
    > * not idle and an idle cpu is available. The span of cpus to
    > * search starts with cpus closest then further out as needed,
    > * so we always favor a closer, idle cpu.
    > + * Domains may include CPUs that are not usable for migration,
    > + * hence we need to mask them out (cpu_active_map)
    > *
    > * Returns the CPU we should wake onto.
    > */
    > @@ -1031,6 +1033,7 @@ static int wake_idle(int cpu, struct task_struct *p)
    > || ((sd->flags & SD_WAKE_IDLE_FAR)
    > && !task_hot(p, task_rq(p)->clock, sd))) {
    > cpus_and(tmp, sd->span, p->cpus_allowed);
    > + cpus_and(tmp, tmp, cpu_active_map);
    > for_each_cpu_mask(i, tmp) {
    > if (idle_cpu(i)) {
    > if (i != task_cpu(p)) {
    > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
    > index 47ceac9..5166080 100644
    > --- a/kernel/sched_rt.c
    > +++ b/kernel/sched_rt.c
    > @@ -922,6 +922,13 @@ static int find_lowest_rq(struct task_struct *task)
    > return -1; /* No targets found */
    >
    > /*
    > + * Only consider CPUs that are usable for migration.
    > + * I guess we might want to change cpupri_find() to ignore those
    > + * in the first place.
    > + */
    > + cpus_and(*lowest_mask, *lowest_mask, cpu_active_map);
    > +
    > + /*
    > * At this point we have built a mask of cpus representing the
    > * lowest priority tasks in the system. Now we want to elect
    > * the best one based on our affinity and topology.
    > --
    > 1.5.5.1
    >
    >




    --
    Best regards,
    Dmitry Adamushko
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)

    >>> On Tue, Jul 15, 2008 at 7:43 AM, in message
    <1216122229-4865-1-git-send-email-maxk@qualcomm.com>, Max Krasnyansky
    wrote:

    > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
    > index 47ceac9..5166080 100644
    > --- a/kernel/sched_rt.c
    > +++ b/kernel/sched_rt.c
    > @@ -922,6 +922,13 @@ static int find_lowest_rq(struct task_struct *task)
    > return -1; /* No targets found */
    >
    > /*
    > + * Only consider CPUs that are usable for migration.
    > + * I guess we might want to change cpupri_find() to ignore those
    > + * in the first place.
    > + */
    > + cpus_and(*lowest_mask, *lowest_mask, cpu_active_map);
    > +
    > + /*
    > * At this point we have built a mask of cpus representing the
    > * lowest priority tasks in the system. Now we want to elect
    > * the best one based on our affinity and topology.


    Hi Max,
    Its still early in the morning, and I havent had my coffee yet, so what I am about to
    say may be totally bogus

    ...but, I am not sure we need to do this mask here. If the hotcpu_notifiier is still
    running (and it appears that it is) the runqueue that is taken offline will be removed
    from cpupri as well.

    Or perhaps I am misunderstanding the intention of "active" verses "online". If I
    understand correctly, active and online mean more or less the same thing, but
    splitting it out like this allows us to skip rebuilding the domains on every hotplug.
    Is that correct?

    Assuming that the above is true, and assuming that the hotcpu_notifier is
    still invoked when the online status changes, cpupri will be properly updated
    to exclude the offline core. That will save an extra cpus_and (which the big-iron
    guys will be happy about

    Regards,
    -Greg


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. references:in-reply-to:content-type:



    Dmitry Adamushko wrote:
    > 2008/7/15 Max Krasnyansky :
    >> From: Max Krasnyanskiy
    >>
    >> Addressed Ingo's comments and merged on top of latest Linus's tree.

    >
    > a few remarks:
    >
    > (1) in __migrate_task(), a test for !cpu_active(dest_cpu) should be
    > done after double_rq_lock() [ or add the second one ]
    >
    > migration_thread() calls __migrate_task() with disabled interrupts (no
    > rq-locks held), i.e. if we merely rely on rq-locks for
    > synchronization, this can race with cpu_down(dest_cpu).
    >
    > [ assume, the test was done in __migration_task() and it's about to
    > take double_lock()... and at this time, down_cpu(dest_cpu) starts and
    > completes on another CPU ]
    >
    > note, we may still take the rq-lock for a "dead" cpu in this case and
    > then only do a check (remark: in fact, not with stop_machine() in
    > place _but_ I consider that we don't make any assumptions on its
    > behavior);

    Hmm, as you suggested I added synchronize_sched() after clearing the active
    bit (see below). Is that not nought enough ? I mean you mentioned that
    stop_machine syncs things up, I assume synchronize_sched does too.

    I guess testing inside the double_rq_lock() does not hurt anyway. We already
    have fail recovery path there. But are you sure it's needed given the explicit
    sync (in fact we have double sync now , one with synchronize_sched() and
    then with the stop_machine)).

    > (2) it's worth to take a look at the use of any_online_cpu():
    >
    > many places are ok, because there will be an additional check against
    > cpu_active_mask later on, but e.g.
    >
    > set_cpus_allowed_ptr() ->
    > migrate_task(p, any_online_cpu(mask), ...) ->
    > migrate_task(p, dest_cpu)
    >
    > doesn't seem to have any verifications wrt cpu_active_map.

    How about we just introduce any_active_cpu() and replace all the usages of
    any_online_cpu() in the scheduler ?

    > (3) I assume, we have some kind of explicit sched_sync() after
    > cpu_clear(cpu, cpu_active_mask) because:
    >
    > (a) not all places where task-migration may take place do take the
    > rq-lock for dest_cpu : e.g. try_to_wake_up() or even
    > sched_migrate_task() [ yes, there is a special (one might call
    > "subtle") assumption/restriction in this case ]
    >
    > that's why the fact that cpu_down() takes the rq-lock for
    > soon-to-be-offline cpu at some point can not be a "natural" sync.
    > point to guarantee that "stale" cpu_active_map is not used.
    >
    > (b) in fact, stop_machine() acts as a (very strong) sync. point,
    > sched-wise. But perhaps, we don't want to have this new easy-to-follow
    > approach to be built on top of assumptions on how something from
    > another sub-system behaves.

    Yep. As you suggested I've added synchronize_sched() and updated the comment
    that explains the deal with the stop machine.
    http://lkml.org/lkml/2008/7/15/736
    Peter, already ACKed it.

    Max
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. references:in-reply-to:content-type:

    Gregory Haskins wrote:
    >>>> On Tue, Jul 15, 2008 at 7:43 AM, in message

    > <1216122229-4865-1-git-send-email-maxk@qualcomm.com>, Max Krasnyansky
    > wrote:
    >
    >> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
    >> index 47ceac9..5166080 100644
    >> --- a/kernel/sched_rt.c
    >> +++ b/kernel/sched_rt.c
    >> @@ -922,6 +922,13 @@ static int find_lowest_rq(struct task_struct *task)
    >> return -1; /* No targets found */
    >>
    >> /*
    >> + * Only consider CPUs that are usable for migration.
    >> + * I guess we might want to change cpupri_find() to ignore those
    >> + * in the first place.
    >> + */
    >> + cpus_and(*lowest_mask, *lowest_mask, cpu_active_map);
    >> +
    >> + /*
    >> * At this point we have built a mask of cpus representing the
    >> * lowest priority tasks in the system. Now we want to elect
    >> * the best one based on our affinity and topology.

    >
    > Hi Max,
    > Its still early in the morning, and I havent had my coffee yet, so what I am about to
    > say may be totally bogus
    >
    > ..but, I am not sure we need to do this mask here. If the hotcpu_notifiier is still
    > running (and it appears that it is) the runqueue that is taken offline will be removed
    > from cpupri as well.
    >
    > Or perhaps I am misunderstanding the intention of "active" verses "online". If I
    > understand correctly, active and online mean more or less the same thing, but
    > splitting it out like this allows us to skip rebuilding the domains on every hotplug.
    > Is that correct?

    Basically with the cpu_active_map we're saying that sched domain masks may
    contain cpus that are going down, and the scheduler is supposed to ignore
    those (by checking cpu_active_map). ie The idea was to simplify cpu hotplug
    handling. My impression was that cpupri updates are similar to the sched
    domains in that respect.

    > Assuming that the above is true, and assuming that the hotcpu_notifier is
    > still invoked when the online status changes, cpupri will be properly updated
    > to exclude the offline core. That will save an extra cpus_and (which the big-iron
    > guys will be happy about

    Ah, now I see what you mean by the hotplug handler is still running. You're
    talking about set_rq_online()/set_rq_offline() calls from migration_call().
    Yes did not know what they were for and did not touch that path.
    btw I'm definitely with you on the cpus_and() thing. When I added it in both
    balancers I thought that it quite an overhead on bigger boxes.

    So I'm not sure what the best way to handle this. If we say we're relying on
    hotplug event sequence to ensure that rt balancer state is consistent then we
    kind of back to square one. ie Might as do the same for the sched domains.

    I guess we could update cpupri state when we update cpu_active_map. That way
    the two will be synced up and we do not have to "and" them in the fast path.
    Any other thoughts ?

    Max




    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)

    2008/7/16 Max Krasnyansky :
    >
    >
    > Dmitry Adamushko wrote:
    >> 2008/7/15 Max Krasnyansky :
    >>> From: Max Krasnyanskiy
    >>>
    >>> Addressed Ingo's comments and merged on top of latest Linus's tree.

    >>
    >> a few remarks:
    >>
    >> (1) in __migrate_task(), a test for !cpu_active(dest_cpu) should be
    >> done after double_rq_lock() [ or add the second one ]
    >>
    >> migration_thread() calls __migrate_task() with disabled interrupts (no
    >> rq-locks held), i.e. if we merely rely on rq-locks for
    >> synchronization, this can race with cpu_down(dest_cpu).
    >>
    >> [ assume, the test was done in __migration_task() and it's about to
    >> take double_lock()... and at this time, down_cpu(dest_cpu) starts and
    >> completes on another CPU ]
    >>
    >> note, we may still take the rq-lock for a "dead" cpu in this case and
    >> then only do a check (remark: in fact, not with stop_machine() in
    >> place _but_ I consider that we don't make any assumptions on its
    >> behavior);

    > Hmm, as you suggested I added synchronize_sched() after clearing the active
    > bit (see below). Is that not nought enough ? I mean you mentioned that
    > stop_machine syncs things up, I assume synchronize_sched does too.


    Yes, sorry for the noise here.

    * synchronize_sched - block until all CPUs have exited any non-preemptive
    * kernel code sequences

    so "any non-preemptive" sections, not just the ones with run-queue
    locks being held.


    >> (2) it's worth to take a look at the use of any_online_cpu():
    >>
    >> many places are ok, because there will be an additional check against
    >> cpu_active_mask later on, but e.g.
    >>
    >> set_cpus_allowed_ptr() ->
    >> migrate_task(p, any_online_cpu(mask), ...) ->
    >> migrate_task(p, dest_cpu)
    >>
    >> doesn't seem to have any verifications wrt cpu_active_map.

    > How about we just introduce any_active_cpu() and replace all the usages of
    > any_online_cpu() in the scheduler ?


    I think, at least for places related to task placement (like
    migrate_task(..., any_online_cpu()) it would make sense,
    consistency-wise.



    >
    > Max
    >


    --
    Best regards,
    Dmitry Adamushko
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redosched domain managment (take 2)

    >>> On Wed, Jul 16, 2008 at 5:44 PM, in message <487E6BD7.3020006@qualcomm.com>,
    Max Krasnyansky wrote:
    > Gregory Haskins wrote:
    >>>>> On Tue, Jul 15, 2008 at 7:43 AM, in message

    >> <1216122229-4865-1-git-send-email-maxk@qualcomm.com>, Max Krasnyansky
    >> wrote:
    >>
    >>> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
    >>> index 47ceac9..5166080 100644
    >>> --- a/kernel/sched_rt.c
    >>> +++ b/kernel/sched_rt.c
    >>> @@ -922,6 +922,13 @@ static int find_lowest_rq(struct task_struct *task)
    >>> return -1; /* No targets found */
    >>>
    >>> /*
    >>> + * Only consider CPUs that are usable for migration.
    >>> + * I guess we might want to change cpupri_find() to ignore those
    >>> + * in the first place.
    >>> + */
    >>> + cpus_and(*lowest_mask, *lowest_mask, cpu_active_map);
    >>> +
    >>> + /*
    >>> * At this point we have built a mask of cpus representing the
    >>> * lowest priority tasks in the system. Now we want to elect
    >>> * the best one based on our affinity and topology.

    >>
    >> Hi Max,
    >> Its still early in the morning, and I havent had my coffee yet, so what I

    > am about to
    >> say may be totally bogus
    >>
    >> ..but, I am not sure we need to do this mask here. If the hotcpu_notifiier

    > is still
    >> running (and it appears that it is) the runqueue that is taken offline will

    > be removed
    >> from cpupri as well.
    >>
    >> Or perhaps I am misunderstanding the intention of "active" verses "online".

    > If I
    >> understand correctly, active and online mean more or less the same thing,

    > but
    >> splitting it out like this allows us to skip rebuilding the domains on

    > every hotplug.
    >> Is that correct?

    >
    > Basically with the cpu_active_map we're saying that sched domain masks may
    > contain cpus that are going down, and the scheduler is supposed to ignore
    > those (by checking cpu_active_map).


    Well, admittedly I am not entirely clear on what problem is being solved as
    I was not part of the original thread with Linus. My impression of what you
    were trying to solve was to eliminate the need to rebuild the domains for a
    hotplug event (which I think is a good problem to solve), thus eliminating
    some complexity and (iiuc) races there.

    However, based on what you just said, I am not sure I've got that entirely
    right anymore. Can you clarify the intent (or point me at the original thread)
    so we are on the same page?

    > ie The idea was to simplify cpu hotplug
    > handling. My impression was that cpupri updates are similar to the sched
    > domains in that respect.


    Well, not quite. cpupri ends up hanging off of a root-domain, so its more
    closely related to the global cpu_XXX_maps than it is to the domains. As
    you know, to clear a bit from the domains means walking each RQ and each
    domain within that runqueue since the domain structures are per-cpu. This
    is why historically the domains have been rebuilt when a cpu is plugged in
    (iiuc).

    However, with cpupri (and root-domains in general) the structure is shared
    among all member RQs. Therefore, taking a cpu online/offline is simply a
    matter of updating a single cpumap_t. This happens today (in sched-devel,
    anyway) via the set_rq_online/offline() routines.

    >
    >> Assuming that the above is true, and assuming that the hotcpu_notifier is
    >> still invoked when the online status changes, cpupri will be properly

    > updated
    >> to exclude the offline core. That will save an extra cpus_and (which the

    > big-iron
    >> guys will be happy about

    > Ah, now I see what you mean by the hotplug handler is still running. You're
    > talking about set_rq_online()/set_rq_offline() calls from migration_call().
    > Yes did not know what they were for and did not touch that path.
    > btw I'm definitely with you on the cpus_and() thing. When I added it in both
    > balancers I thought that it quite an overhead on bigger boxes.
    >
    > So I'm not sure what the best way to handle this. If we say we're relying on
    > hotplug event sequence to ensure that rt balancer state is consistent then
    > we
    > kind of back to square one. ie Might as do the same for the sched domains.


    Im not following you here (but again, it might be because of my
    misunderstanding of what it is you are actually solving). I see the cleanup
    that you did to not require rebuilding domains on hotplug as a really good thing.
    However, I don't see the problem with updating cpupri in the manner I mentioned
    either. The rq_offline routine is invoked on the DYING event (pre-offline), and
    the rq_online on the ONLINE event. This means that it will effectively disable the
    runqueue at the earliest possible moment in the hotplug processing, thereby
    removing the possibility that we will route to that core.

    My gut tells me that if this is not adequate, there there is something wrong with
    the hotplug notifier infrastructure and we should fix that, not put some more
    infrastructure on top of it.

    Don't get me wrong. I am not nacking your concept. As ive said I like decoupling
    the hotplug from the cpuset stuff (assuming that is what you were trying to do).
    But I am also not convinced that part of what you are doing is already handled
    properly, so why duplicate it when you can just tie into the existing logic.

    On this topic, (and assuming I am right in my cpupri online/offline theory above), FYI
    you can probably use rq->rd->online instead of cpu_active_map in many places. This
    mask is updated according to the hotplug state in a similar manner as cpupri and should
    effectively be a cached version of cpus_active_map & rq->rd->span.

    I guess the point here really is that it would be best (IMO) to not introduce another
    map (at least to the scheduler) if it can be helped at all. If you could somehow tie your
    cpuset/hotplug decoupling logic with rq->rd->online, et. al., I would be very enthused

    If you want to retain the cpu_active_map for other code (e.g. cpusets/hotplug) to use, I
    see no problem with that. It just seems the scheduler already has the hooks in place
    to play nice here, so Id like to see them used if possible

    Regards,
    -Greg

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. references:in-reply-to:content-type:



    Gregory Haskins wrote:
    > Well, admittedly I am not entirely clear on what problem is being solved as
    > I was not part of the original thread with Linus. My impression of what you
    > were trying to solve was to eliminate the need to rebuild the domains for a
    > hotplug event (which I think is a good problem to solve), thus eliminating
    > some complexity and (iiuc) races there.
    >
    > However, based on what you just said, I am not sure I've got that entirely
    > right anymore. Can you clarify the intent (or point me at the original thread)
    > so we are on the same page?

    Here is the link to the original thread
    http://lkml.org/lkml/2008/7/11/328
    And here is where Linus explained the idea
    http://lkml.org/lkml/2008/7/12/137

    I'll reply to the rest of your email tomorrow (can't keep my yes open any
    longer ).

    Max
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redoscheddomain managment (take 2)

    >>> On Thu, Jul 17, 2008 at 3:16 AM, in message <487EF1E9.2040101@qualcomm.com>,
    Max Krasnyansky wrote:

    >
    > Gregory Haskins wrote:
    >> Well, admittedly I am not entirely clear on what problem is being solved as
    >> I was not part of the original thread with Linus. My impression of what you
    >> were trying to solve was to eliminate the need to rebuild the domains for a
    >> hotplug event (which I think is a good problem to solve), thus eliminating
    >> some complexity and (iiuc) races there.
    >>
    >> However, based on what you just said, I am not sure I've got that entirely
    >> right anymore. Can you clarify the intent (or point me at the original

    > thread)
    >> so we are on the same page?

    > Here is the link to the original thread
    > http://lkml.org/lkml/2008/7/11/328
    > And here is where Linus explained the idea
    > http://lkml.org/lkml/2008/7/12/137
    >
    > I'll reply to the rest of your email tomorrow (can't keep my yes open any
    > longer ).
    >
    > Max


    Hi Max,
    Thanks for the pointers. I see that I did indeed misunderstand the intent of the patch.
    It seems you already solved the rebuild problem, and were just trying to solve the
    "migrate to a dead cpu" problem that Linus mentions as a solution with cpu_active_map.

    In that case, note that rq->rd->online already fits the bill, I believe. In a nutshell,
    rq->rd->span contains all the cpus within your disjoint cpuset, and rq->rd->online,
    contains the subset of rq->rd->span that are online. The online bit is cleared at the
    earliest point in cpu hotplug removal (DYING), and it is set at the very latest point on
    insertion (ONLINE). Therefore it is redundant with the cpus_active_map concept.

    I think the simplest solution is to make sure that we cpus_and against rq->rd->online
    before allowing a migration. This is how I intended the mask to be used, anyway. Its
    what the RT scheduler does. It sounds like we just need to touch up the few places
    in the CFS side that were causing those oops.

    Thoughts?

    -Greg

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. references:in-reply-to:content-type:

    Gregory Haskins wrote:
    >>>> On Thu, Jul 17, 2008 at 3:16 AM, in message <487EF1E9.2040101@qualcomm.com>,

    > Max Krasnyansky wrote:
    >
    >> Gregory Haskins wrote:
    >>> Well, admittedly I am not entirely clear on what problem is being solved as
    >>> I was not part of the original thread with Linus. My impression of what you
    >>> were trying to solve was to eliminate the need to rebuild the domains for a
    >>> hotplug event (which I think is a good problem to solve), thus eliminating
    >>> some complexity and (iiuc) races there.
    >>>
    >>> However, based on what you just said, I am not sure I've got that entirely
    >>> right anymore. Can you clarify the intent (or point me at the original

    >> thread)
    >>> so we are on the same page?

    >> Here is the link to the original thread
    >> http://lkml.org/lkml/2008/7/11/328
    >> And here is where Linus explained the idea
    >> http://lkml.org/lkml/2008/7/12/137
    >>
    >> I'll reply to the rest of your email tomorrow (can't keep my yes open any
    >> longer ).
    >>
    >> Max

    >
    > Hi Max,
    > Thanks for the pointers. I see that I did indeed misunderstand the intent of the patch.
    > It seems you already solved the rebuild problem, and were just trying to solve the
    > "migrate to a dead cpu" problem that Linus mentions as a solution with cpu_active_map.

    Yes. btw they are definitely related, because the reason we were blowing away
    the domains is to avoid "migration to a dead cpu". ie We were relying on the
    fact that domain masks never contain cpus that are either dying or already dead.

    > In that case, note that rq->rd->online already fits the bill, I believe. In a nutshell,
    > rq->rd->span contains all the cpus within your disjoint cpuset, and rq->rd->online,
    > contains the subset of rq->rd->span that are online. The online bit is cleared at the
    > earliest point in cpu hotplug removal (DYING), and it is set at the very latest point on
    > insertion (ONLINE). Therefore it is redundant with the cpus_active_map concept.
    >
    > I think the simplest solution is to make sure that we cpus_and against rq->rd->online
    > before allowing a migration. This is how I intended the mask to be used, anyway. Its
    > what the RT scheduler does. It sounds like we just need to touch up the few places
    > in the CFS side that were causing those oops.
    >
    > Thoughts?

    None at this point . I need to run right now and will try to look at this
    later today. My knowledge of the internal sched structs is definitely lacking
    so I need to look at the rq->rd thing to have and opinion.

    Thanx
    Max


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redoscheddomainmanagment (take 2)

    >>> On Thu, Jul 17, 2008 at 2:52 PM, in message <487F9509.9050802@qualcomm.com>,
    Max Krasnyansky wrote:
    > Gregory Haskins wrote:
    >>
    >> Hi Max,
    >> Thanks for the pointers. I see that I did indeed misunderstand the intent
    >> of the patch. It seems you already solved the rebuild problem, and were
    >> just trying to solve the "migrate to a dead cpu" problem that Linus mentions
    >> as a solution with cpu_active_map.

    >
    > Yes. btw they are definitely related, because the reason we were blowing
    > away the domains is to avoid "migration to a dead cpu". ie We were relying
    > on the fact that domain masks never contain cpus that are either dying or
    > already dead.


    Agreed.

    >>
    >> Thoughts?

    >
    > None at this point . I need to run right now and will try to look at this
    > later today. My knowledge of the internal sched structs is definitely
    > lacking so I need to look at the rq->rd thing to have and opinion.


    Sounds good, Max. Thanks!

    -Greg


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)


    * Max Krasnyansky wrote:

    > From: Max Krasnyanskiy
    >
    > Addressed Ingo's comments and merged on top of latest Linus's tree.
    >
    > This is based on Linus' idea of creating cpu_active_map that prevents
    > scheduler load balancer from migrating tasks to the cpu that is going
    > down.


    applied to tip/sched/devel - thanks Max.

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: [PATCH] cpu hotplug, sched: Introduce cpu_active_map and redoscheddomainmanagment (take 2)

    On Thu, 2008-07-17 at 13:46 -0600, Gregory Haskins wrote:
    > >>> On Thu, Jul 17, 2008 at 2:52 PM, in message <487F9509.9050802@qualcomm..com>,

    > Max Krasnyansky wrote:
    > > Gregory Haskins wrote:
    > >>
    > >> Hi Max,
    > >> Thanks for the pointers. I see that I did indeed misunderstand the intent
    > >> of the patch. It seems you already solved the rebuild problem, and were
    > >> just trying to solve the "migrate to a dead cpu" problem that Linus mentions
    > >> as a solution with cpu_active_map.

    > >
    > > Yes. btw they are definitely related, because the reason we were blowing
    > > away the domains is to avoid "migration to a dead cpu". ie We were relying
    > > on the fact that domain masks never contain cpus that are either dying or
    > > already dead.

    >
    > Agreed.
    >
    > >>
    > >> Thoughts?

    > >
    > > None at this point . I need to run right now and will try to look at this
    > > later today. My knowledge of the internal sched structs is definitely
    > > lacking so I need to look at the rq->rd thing to have and opinion.

    >
    > Sounds good, Max. Thanks!


    I'm thinking doing it explicitly with the new cpu mask is clearer and
    easier to understand than 'hiding' the variable in the root domain and
    having to understand all the domain/root-domain stuff.

    I think this was Linus' main point. It should be easier to understand
    this code.


    So, if there is functional overlap with the root domain stuff, it might
    be good to remove that bit and use the cpu_active_map stuff for that
    instead.



    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [PATCH] cpu hotplug, sched:Introduce cpu_active_map and redoscheddomainmanagment (take 2)

    Hi Peter,

    >>> On Fri, Jul 18, 2008 at 7:53 AM, in message <1216382024.28405.26.camel@twins>,

    Peter Zijlstra wrote:
    > On Thu, 2008-07-17 at 13:46 -0600, Gregory Haskins wrote:
    >> >>> On Thu, Jul 17, 2008 at 2:52 PM, in message

    > <487F9509.9050802@qualcomm..com>,
    >> Max Krasnyansky wrote:
    >> > Gregory Haskins wrote:
    >> >>
    >> >> Hi Max,
    >> >> Thanks for the pointers. I see that I did indeed misunderstand the

    > intent
    >> >> of the patch. It seems you already solved the rebuild problem, and were
    >> >> just trying to solve the "migrate to a dead cpu" problem that Linus

    > mentions
    >> >> as a solution with cpu_active_map.
    >> >
    >> > Yes. btw they are definitely related, because the reason we were blowing
    >> > away the domains is to avoid "migration to a dead cpu". ie We were relying
    >> > on the fact that domain masks never contain cpus that are either dying or
    >> > already dead.

    >>
    >> Agreed.
    >>
    >> >>
    >> >> Thoughts?
    >> >
    >> > None at this point . I need to run right now and will try to look at this
    >> > later today. My knowledge of the internal sched structs is definitely
    >> > lacking so I need to look at the rq->rd thing to have and opinion.

    >>
    >> Sounds good, Max. Thanks!

    >
    > I'm thinking doing it explicitly with the new cpu mask is clearer and
    > easier to understand than 'hiding' the variable in the root domain and
    > having to understand all the domain/root-domain stuff.
    >
    > I think this was Linus' main point. It should be easier to understand
    > this code.


    While I can appreciate this sentiment, note that we conceptually require
    IMO the notion that the root-domain masks present. E.g. we really dont
    want to migrate to just cpus_active_map, but rather
    rq->rd->span & cpus_active_map (otherwise we could route outside
    of a disjoint cpuset). And this is precisely what rq->rd->online is (a
    cached version of cpus_active_map & rq->rd->span).

    So while I can understand the motivation to keep things simple, note that
    I tried to design the root-domain stuff to be as simple as possible while
    still meeting the requirements for working within disjoint sets. I am
    open to other suggestions, but I see nothing particularly complex or
    wrong with whats going on there currently. Perhaps this very
    conversation is evidence that I needed to comment better

    >
    >
    > So, if there is functional overlap with the root domain stuff, it might
    > be good to remove that bit and use the cpu_active_map stuff for that
    > instead.


    I think we would be doing the code that does use it a disservice, per above.

    Regards,
    -Greg


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 1 of 2 1 2 LastLast