[PATCH 1/2] workqueues: make get_online_cpus() useable for work->func() - Kernel

This is a discussion on [PATCH 1/2] workqueues: make get_online_cpus() useable for work->func() - Kernel ; workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under cpu_maps_update_begin(). This means that the multithreaded workqueues can't use get_online_cpus() due to the possible deadlock, very bad and very old problem. Introduce the new state, CPU_POST_DEAD, which is called after cpu_hotplug_done() but before cpu_maps_update_done(). Change workqueue_cpu_callback() ...

+ Reply to Thread
Results 1 to 6 of 6

Thread: [PATCH 1/2] workqueues: make get_online_cpus() useable for work->func()

  1. [PATCH 1/2] workqueues: make get_online_cpus() useable for work->func()

    workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under
    cpu_maps_update_begin(). This means that the multithreaded workqueues can't
    use get_online_cpus() due to the possible deadlock, very bad and very old
    problem.

    Introduce the new state, CPU_POST_DEAD, which is called after
    cpu_hotplug_done() but before cpu_maps_update_done().

    Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD.
    This means that create/destroy functions can't rely on get_online_cpus()
    any longer and should take cpu_add_remove_lock instead.

    Signed-off-by: Oleg Nesterov

    include/linux/notifier.h | 2 ++
    kernel/cpu.c | 5 +++++
    kernel/workqueue.c | 18 +++++++++---------
    3 files changed, 16 insertions(+), 9 deletions(-)

    --- 26-rc2/include/linux/notifier.h~WQ_4_GET_ONLINE_CPUS 2008-05-18 15:44:16.000000000 +0400
    +++ 26-rc2/include/linux/notifier.h 2008-05-18 15:44:16.000000000 +0400
    @@ -213,6 +213,8 @@ static inline int notifier_to_errno(int
    #define CPU_DEAD 0x0007 /* CPU (unsigned)v dead */
    #define CPU_DYING 0x0008 /* CPU (unsigned)v not running any task,
    * not handling interrupts, soon dead */
    +#define CPU_POST_DEAD 0x0009 /* CPU (unsigned)v dead, cpu_hotplug
    + * lock is dropped */

    /* Used for CPU hotplug events occuring while tasks are frozen due to a suspend
    * operation in progress
    --- 26-rc2/kernel/cpu.c~WQ_4_GET_ONLINE_CPUS 2008-05-18 15:44:18.000000000 +0400
    +++ 26-rc2/kernel/cpu.c 2008-06-29 20:03:19.000000000 +0400
    @@ -261,6 +261,11 @@ out_allowed:
    set_cpus_allowed_ptr(current, &old_allowed);
    out_release:
    cpu_hotplug_done();
    + if (!err) {
    + if (raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD | mod,
    + hcpu) == NOTIFY_BAD)
    + BUG();
    + }
    return err;
    }

    --- 26-rc2/kernel/workqueue.c~WQ_4_GET_ONLINE_CPUS 2008-06-29 18:34:06.000000000 +0400
    +++ 26-rc2/kernel/workqueue.c 2008-06-29 19:36:27.000000000 +0400
    @@ -791,7 +791,7 @@ struct workqueue_struct *__create_workqu
    err = create_workqueue_thread(cwq, singlethread_cpu);
    start_workqueue_thread(cwq, -1);
    } else {
    - get_online_cpus();
    + cpu_maps_update_begin();
    spin_lock(&workqueue_lock);
    list_add(&wq->list, &workqueues);
    spin_unlock(&workqueue_lock);
    @@ -803,7 +803,7 @@ struct workqueue_struct *__create_workqu
    err = create_workqueue_thread(cwq, cpu);
    start_workqueue_thread(cwq, cpu);
    }
    - put_online_cpus();
    + cpu_maps_update_done();
    }

    if (err) {
    @@ -817,8 +817,8 @@ EXPORT_SYMBOL_GPL(__create_workqueue_key
    static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
    {
    /*
    - * Our caller is either destroy_workqueue() or CPU_DEAD,
    - * get_online_cpus() protects cwq->thread.
    + * Our caller is either destroy_workqueue() or CPU_POST_DEAD,
    + * cpu_add_remove_lock protects cwq->thread.
    */
    if (cwq->thread == NULL)
    return;
    @@ -828,7 +828,7 @@ static void cleanup_workqueue_thread(str

    flush_cpu_workqueue(cwq);
    /*
    - * If the caller is CPU_DEAD and cwq->worklist was not empty,
    + * If the caller is CPU_POST_DEAD and cwq->worklist was not empty,
    * a concurrent flush_workqueue() can insert a barrier after us.
    * However, in that case run_workqueue() won't return and check
    * kthread_should_stop() until it flushes all work_struct's.
    @@ -852,14 +852,14 @@ void destroy_workqueue(struct workqueue_
    const cpumask_t *cpu_map = wq_cpu_map(wq);
    int cpu;

    - get_online_cpus();
    + cpu_maps_update_begin();
    spin_lock(&workqueue_lock);
    list_del(&wq->list);
    spin_unlock(&workqueue_lock);

    for_each_cpu_mask(cpu, *cpu_map)
    cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu));
    - put_online_cpus();
    + cpu_maps_update_done();

    free_percpu(wq->cpu_wq);
    kfree(wq);
    @@ -898,7 +898,7 @@ static int __devinit workqueue_cpu_callb

    case CPU_UP_CANCELED:
    start_workqueue_thread(cwq, -1);
    - case CPU_DEAD:
    + case CPU_POST_DEAD:
    cleanup_workqueue_thread(cwq);
    break;
    }
    @@ -906,7 +906,7 @@ static int __devinit workqueue_cpu_callb

    switch (action) {
    case CPU_UP_CANCELED:
    - case CPU_DEAD:
    + case CPU_POST_DEAD:
    cpu_clear(cpu, cpu_populated_map);
    }


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH 1/2] workqueues: make get_online_cpus() useable for work->func()

    On Sun, Jun 29, 2008 at 08:51:31PM +0400, Oleg Nesterov wrote:
    > workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under
    > cpu_maps_update_begin(). This means that the multithreaded workqueues can't
    > use get_online_cpus() due to the possible deadlock, very bad and very old
    > problem.
    >
    > Introduce the new state, CPU_POST_DEAD, which is called after
    > cpu_hotplug_done() but before cpu_maps_update_done().
    >
    > Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD.
    > This means that create/destroy functions can't rely on get_online_cpus()
    > any longer and should take cpu_add_remove_lock instead.


    Ah, nice!

    > --- 26-rc2/kernel/cpu.c~WQ_4_GET_ONLINE_CPUS 2008-05-18 15:44:18.000000000 +0400
    > +++ 26-rc2/kernel/cpu.c 2008-06-29 20:03:19.000000000 +0400
    > @@ -261,6 +261,11 @@ out_allowed:
    > set_cpus_allowed_ptr(current, &old_allowed);
    > out_release:
    > cpu_hotplug_done();
    > + if (!err) {


    This should be (!err && !cpu_online(cpu)), no?

    This is because it might be that __stop_machine_run() succeeded, but
    take_cpu_down() failed and therefore our cpu is still online.

    Which in turn is not good when doing this:

    > + if (raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD | mod,
    > + hcpu) == NOTIFY_BAD)
    > + BUG();
    > + }

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH 1/2] workqueues: make get_online_cpus() useable for work->func()

    On Mon, Jun 30, 2008 at 03:43:49PM +0200, Heiko Carstens wrote:
    > On Sun, Jun 29, 2008 at 08:51:31PM +0400, Oleg Nesterov wrote:
    > > workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under
    > > cpu_maps_update_begin(). This means that the multithreaded workqueues can't
    > > use get_online_cpus() due to the possible deadlock, very bad and very old
    > > problem.
    > >
    > > Introduce the new state, CPU_POST_DEAD, which is called after
    > > cpu_hotplug_done() but before cpu_maps_update_done().
    > >
    > > Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD.
    > > This means that create/destroy functions can't rely on get_online_cpus()
    > > any longer and should take cpu_add_remove_lock instead.

    >
    > Ah, nice!
    >
    > > --- 26-rc2/kernel/cpu.c~WQ_4_GET_ONLINE_CPUS 2008-05-18 15:44:18.000000000 +0400
    > > +++ 26-rc2/kernel/cpu.c 2008-06-29 20:03:19.000000000 +0400
    > > @@ -261,6 +261,11 @@ out_allowed:
    > > set_cpus_allowed_ptr(current, &old_allowed);
    > > out_release:
    > > cpu_hotplug_done();
    > > + if (!err) {

    >
    > This should be (!err && !cpu_online(cpu)), no?
    >
    > This is because it might be that __stop_machine_run() succeeded, but
    > take_cpu_down() failed and therefore our cpu is still online.


    Erk.. it's ok as is since err will contain the return value of take_cpu_down()
    after err = kthread_stop(p).
    Never mind, just ignore me
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH 1/2] workqueues: make get_online_cpus() useable for work->func()

    On Sun, Jun 29, 2008 at 08:51:31PM +0400, Oleg Nesterov wrote:
    > workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under
    > cpu_maps_update_begin(). This means that the multithreaded workqueues can't
    > use get_online_cpus() due to the possible deadlock, very bad and very old
    > problem.
    >
    > Introduce the new state, CPU_POST_DEAD, which is called after
    > cpu_hotplug_done() but before cpu_maps_update_done().
    >
    > Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD.
    > This means that create/destroy functions can't rely on get_online_cpus()
    > any longer and should take cpu_add_remove_lock instead.
    >
    > Signed-off-by: Oleg Nesterov

    That should simplify a lot of subsystems.
    Thanks Oleg!

    Acked-by: Gautham R Shenoy

    >
    > include/linux/notifier.h | 2 ++
    > kernel/cpu.c | 5 +++++
    > kernel/workqueue.c | 18 +++++++++---------
    > 3 files changed, 16 insertions(+), 9 deletions(-)
    >
    > --- 26-rc2/include/linux/notifier.h~WQ_4_GET_ONLINE_CPUS 2008-05-18 15:44:16.000000000 +0400
    > +++ 26-rc2/include/linux/notifier.h 2008-05-18 15:44:16.000000000 +0400
    > @@ -213,6 +213,8 @@ static inline int notifier_to_errno(int
    > #define CPU_DEAD 0x0007 /* CPU (unsigned)v dead */
    > #define CPU_DYING 0x0008 /* CPU (unsigned)v not running any task,
    > * not handling interrupts, soon dead */
    > +#define CPU_POST_DEAD 0x0009 /* CPU (unsigned)v dead, cpu_hotplug
    > + * lock is dropped */
    >
    > /* Used for CPU hotplug events occuring while tasks are frozen due to a suspend
    > * operation in progress
    > --- 26-rc2/kernel/cpu.c~WQ_4_GET_ONLINE_CPUS 2008-05-18 15:44:18.000000000 +0400
    > +++ 26-rc2/kernel/cpu.c 2008-06-29 20:03:19.000000000 +0400
    > @@ -261,6 +261,11 @@ out_allowed:
    > set_cpus_allowed_ptr(current, &old_allowed);
    > out_release:
    > cpu_hotplug_done();
    > + if (!err) {
    > + if (raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD | mod,
    > + hcpu) == NOTIFY_BAD)
    > + BUG();
    > + }
    > return err;
    > }
    >
    > --- 26-rc2/kernel/workqueue.c~WQ_4_GET_ONLINE_CPUS 2008-06-29 18:34:06.000000000 +0400
    > +++ 26-rc2/kernel/workqueue.c 2008-06-29 19:36:27.000000000 +0400
    > @@ -791,7 +791,7 @@ struct workqueue_struct *__create_workqu
    > err = create_workqueue_thread(cwq, singlethread_cpu);
    > start_workqueue_thread(cwq, -1);
    > } else {
    > - get_online_cpus();
    > + cpu_maps_update_begin();
    > spin_lock(&workqueue_lock);
    > list_add(&wq->list, &workqueues);
    > spin_unlock(&workqueue_lock);
    > @@ -803,7 +803,7 @@ struct workqueue_struct *__create_workqu
    > err = create_workqueue_thread(cwq, cpu);
    > start_workqueue_thread(cwq, cpu);
    > }
    > - put_online_cpus();
    > + cpu_maps_update_done();
    > }
    >
    > if (err) {
    > @@ -817,8 +817,8 @@ EXPORT_SYMBOL_GPL(__create_workqueue_key
    > static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
    > {
    > /*
    > - * Our caller is either destroy_workqueue() or CPU_DEAD,
    > - * get_online_cpus() protects cwq->thread.
    > + * Our caller is either destroy_workqueue() or CPU_POST_DEAD,
    > + * cpu_add_remove_lock protects cwq->thread.
    > */
    > if (cwq->thread == NULL)
    > return;
    > @@ -828,7 +828,7 @@ static void cleanup_workqueue_thread(str
    >
    > flush_cpu_workqueue(cwq);
    > /*
    > - * If the caller is CPU_DEAD and cwq->worklist was not empty,
    > + * If the caller is CPU_POST_DEAD and cwq->worklist was not empty,
    > * a concurrent flush_workqueue() can insert a barrier after us.
    > * However, in that case run_workqueue() won't return and check
    > * kthread_should_stop() until it flushes all work_struct's.
    > @@ -852,14 +852,14 @@ void destroy_workqueue(struct workqueue_
    > const cpumask_t *cpu_map = wq_cpu_map(wq);
    > int cpu;
    >
    > - get_online_cpus();
    > + cpu_maps_update_begin();
    > spin_lock(&workqueue_lock);
    > list_del(&wq->list);
    > spin_unlock(&workqueue_lock);
    >
    > for_each_cpu_mask(cpu, *cpu_map)
    > cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu));
    > - put_online_cpus();
    > + cpu_maps_update_done();
    >
    > free_percpu(wq->cpu_wq);
    > kfree(wq);
    > @@ -898,7 +898,7 @@ static int __devinit workqueue_cpu_callb
    >
    > case CPU_UP_CANCELED:
    > start_workqueue_thread(cwq, -1);
    > - case CPU_DEAD:
    > + case CPU_POST_DEAD:
    > cleanup_workqueue_thread(cwq);
    > break;
    > }
    > @@ -906,7 +906,7 @@ static int __devinit workqueue_cpu_callb
    >
    > switch (action) {
    > case CPU_UP_CANCELED:
    > - case CPU_DEAD:
    > + case CPU_POST_DEAD:
    > cpu_clear(cpu, cpu_populated_map);
    > }
    >
    >


    --
    Thanks and Regards
    gautham
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH 1/2] workqueues: make get_online_cpus() useable for work->func()

    On Sun, 29 Jun 2008 20:51:31 +0400 Oleg Nesterov wrote:

    > workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under
    > cpu_maps_update_begin(). This means that the multithreaded workqueues can't
    > use get_online_cpus() due to the possible deadlock, very bad and very old
    > problem.
    >
    > Introduce the new state, CPU_POST_DEAD, which is called after
    > cpu_hotplug_done() but before cpu_maps_update_done().
    >
    > Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD.
    > This means that create/destroy functions can't rely on get_online_cpus()
    > any longer and should take cpu_add_remove_lock instead.


    I know that Document/SubmitChecklist has a lot of stuff. But a basic
    allnoconfig only takes seconds and it's often the thing which breaks.

    include/linux/cpu.h | 15 +++++++++++----
    1 file changed, 11 insertions(+), 4 deletions(-)

    diff -puN include/linux/cpu.h~workqueues-make-get_online_cpus-useable-for-work-func-fix include/linux/cpu.h
    --- a/include/linux/cpu.h~workqueues-make-get_online_cpus-useable-for-work-func-fix
    +++ a/include/linux/cpu.h
    @@ -69,10 +69,11 @@ static inline void unregister_cpu_notifi
    #endif

    int cpu_up(unsigned int cpu);
    -
    extern void cpu_hotplug_init(void);
    +extern void cpu_maps_update_begin(void);
    +extern void cpu_maps_update_done(void);

    -#else
    +#else /* CONFIG_SMP */

    static inline int register_cpu_notifier(struct notifier_block *nb)
    {
    @@ -87,10 +88,16 @@ static inline void cpu_hotplug_init(void
    {
    }

    +static inline void cpu_maps_update_begin(void)
    +{
    +}
    +
    +static inline void cpu_maps_update_done(void)
    +{
    +}
    +
    #endif /* CONFIG_SMP */
    extern struct sysdev_class cpu_sysdev_class;
    -extern void cpu_maps_update_begin(void);
    -extern void cpu_maps_update_done(void);

    #ifdef CONFIG_HOTPLUG_CPU
    /* Stop CPUs going up and down. */
    _

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH 1/2] workqueues: make get_online_cpus() useable for work->func()

    On 07/02, Andrew Morton wrote:
    >
    > I know that Document/SubmitChecklist has a lot of stuff. But a basic
    > allnoconfig only takes seconds and it's often the thing which breaks.
    >
    > include/linux/cpu.h | 15 +++++++++++----
    > 1 file changed, 11 insertions(+), 4 deletions(-)
    >
    > diff -puN include/linux/cpu.h~workqueues-make-get_online_cpus-useable-for-work-func-fix include/linux/cpu.h
    > --- a/include/linux/cpu.h~workqueues-make-get_online_cpus-useable-for-work-func-fix
    > +++ a/include/linux/cpu.h
    > @@ -69,10 +69,11 @@ static inline void unregister_cpu_notifi
    > #endif
    >
    > int cpu_up(unsigned int cpu);
    > -
    > extern void cpu_hotplug_init(void);
    > +extern void cpu_maps_update_begin(void);
    > +extern void cpu_maps_update_done(void);
    >
    > -#else
    > +#else /* CONFIG_SMP */
    >
    > static inline int register_cpu_notifier(struct notifier_block *nb)
    > {
    > @@ -87,10 +88,16 @@ static inline void cpu_hotplug_init(void
    > {
    > }
    >
    > +static inline void cpu_maps_update_begin(void)
    > +{
    > +}
    > +
    > +static inline void cpu_maps_update_done(void)
    > +{
    > +}


    Oh thanks...

    This also means that we can't kill workqueue_lock (as I was going to do),
    we still need it for !CONFIG_HOTPLUG_CPU case.

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread