
Thread: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

  1. [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.


    Several places in the kernel do the following:

    saved_mask = current->cpus_allowed;
    set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
    somefunc();
    /* restore the previous state */
    set_cpus_allowed_ptr(current, &saved_mask);

    This is bad, because a process's cpumask is observable and
    manipulatable by userspace and should not be toyed with.

    We have the infrastructure, this just creates a nice wrapper to
    encourage its use.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar
    ---
    include/linux/workqueue.h |  8 ++++++++
    kernel/workqueue.c        | 48 ++++++++++++++++++++++++++++++++++++++++++++++
    2 files changed, 56 insertions(+)

    diff -r b72e0cbdd249 include/linux/workqueue.h
    --- a/include/linux/workqueue.h Thu Oct 23 00:39:36 2008 +1100
    +++ b/include/linux/workqueue.h Thu Oct 23 10:53:51 2008 +1100
    @@ -240,4 +240,12 @@ void cancel_rearming_delayed_work(struct
    cancel_delayed_work_sync(work);
    }

    +#ifndef CONFIG_SMP
    +static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    +{
    +        return fn(arg);
    +}
    +#else
    +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
    +#endif /* CONFIG_SMP */
    #endif
    diff -r b72e0cbdd249 kernel/workqueue.c
    --- a/kernel/workqueue.c Thu Oct 23 00:39:36 2008 +1100
    +++ b/kernel/workqueue.c Thu Oct 23 10:53:51 2008 +1100
    @@ -970,6 +970,54 @@ undo:
    return ret;
    }

    +#ifdef CONFIG_SMP
    +struct work_for_cpu {
    +        struct work_struct work;
    +        long (*fn)(void *);
    +        void *arg;
    +        long ret;
    +        struct completion done;
    +};
    +
    +static void do_work_for_cpu(struct work_struct *w)
    +{
    +        struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work);
    +
    +        wfc->ret = wfc->fn(wfc->arg);
    +        complete(&wfc->done);
    +}
    +
    +/**
    + * work_on_cpu - run a function in user context on a particular cpu
    + * @cpu: the cpu to run on
    + * @fn: the function to run
    + * @arg: the function arg
    + *
    + * This will return -EINVAL if the cpu is not online, or the return value
    + * of @fn otherwise.
    + */
    +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    +{
    +        struct work_for_cpu wfc;
    +
    +        INIT_WORK(&wfc.work, do_work_for_cpu);
    +        init_completion(&wfc.done);
    +        wfc.fn = fn;
    +        wfc.arg = arg;
    +        get_online_cpus();
    +        if (unlikely(!cpu_online(cpu))) {
    +                wfc.ret = -EINVAL;
    +                complete(&wfc.done);
    +        } else
    +                schedule_work_on(cpu, &wfc.work);
    +        put_online_cpus();
    +        wait_for_completion(&wfc.done);
    +
    +        return wfc.ret;
    +}
    +EXPORT_SYMBOL_GPL(work_on_cpu);
    +#endif /* CONFIG_SMP */
    +
    void __init init_workqueues(void)
    {
    cpu_populated_map = cpu_online_map;

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On Thu, Oct 23, 2008 at 01:01:28AM +0000, Rusty Russell wrote:
    >
    > Several places in the kernel do the following:
    >
    > saved_mask = current->cpus_allowed;
    > set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
    > somefunc();
    > /* restore the previous state */
    > set_cpus_allowed_ptr(current, &saved_mask);
    >
    > This is bad, because a process's cpumask is observable and
    > manipulatable by userspace and should not be toyed with.
    >
    > We have the infrastructure, this just creates a nice wrapper to
    > encourage its use.
    >
    > Signed-off-by: Rusty Russell
    > Cc: Ingo Molnar
    > ---
    > include/linux/workqueue.h | 8 +++++++
    > kernel/workqueue.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
    > 2 files changed, 56 insertions(+)
    >
    > diff -r b72e0cbdd249 include/linux/workqueue.h
    > --- a/include/linux/workqueue.h Thu Oct 23 00:39:36 2008 +1100
    > +++ b/include/linux/workqueue.h Thu Oct 23 10:53:51 2008 +1100
    > @@ -240,4 +240,12 @@ void cancel_rearming_delayed_work(struct
    > cancel_delayed_work_sync(work);
    >



    > + */
    > +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    > +{
    > + struct work_for_cpu wfc;
    > +
    > + INIT_WORK(&wfc.work, do_work_for_cpu);
    > + init_completion(&wfc.done);
    > + wfc.fn = fn;
    > + wfc.arg = arg;
    > + get_online_cpus();
    > + if (unlikely(!cpu_online(cpu))) {
    > + wfc.ret = -EINVAL;
    > + complete(&wfc.done);
    > + } else
    > + schedule_work_on(cpu, &wfc.work);


    Won't this cause priority inversion in the case of real-time tasks?

    > + put_online_cpus();
    > + wait_for_completion(&wfc.done);
    > +
    > + return wfc.ret;
    > +}
    > +EXPORT_SYMBOL_GPL(work_on_cpu);
    > +#endif /* CONFIG_SMP */
    > +
    > void __init init_workqueues(void)
    > {
    > cpu_populated_map = cpu_online_map;
    >


    --
    Thanks and Regards
    gautham

  3. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On 10/23, Rusty Russell wrote:
    >
    > +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    > +{
    > + struct work_for_cpu wfc;
    > +
    > + INIT_WORK(&wfc.work, do_work_for_cpu);
    > + init_completion(&wfc.done);
    > + wfc.fn = fn;
    > + wfc.arg = arg;
    > + get_online_cpus();
    > + if (unlikely(!cpu_online(cpu))) {
    > + wfc.ret = -EINVAL;
    > + complete(&wfc.done);
    > + } else
    > + schedule_work_on(cpu, &wfc.work);


    I do not claim this is wrong, but imho the code is a bit misleading and
    needs a comment (or the "fix", please see below).

    Once we drop the cpu_hotplug lock, the CPU can go away and this work can
    migrate to another cpu.

    > + put_online_cpus();
    > + wait_for_completion(&wfc.done);


    Actually you don't need work_for_cpu->done, you can use flush_work().

    IOW, I'd suggest

    long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    {
            struct work_for_cpu wfc;

            INIT_WORK(&wfc.work, do_work_for_cpu);
            wfc.fn = fn;
            wfc.arg = arg;
            wfc.ret = -EINVAL;

            get_online_cpus();
            if (likely(cpu_online(cpu))) {
                    schedule_work_on(cpu, &wfc.work);
                    flush_work(&wfc.work);
            }
            put_online_cpus();

            return wfc.ret;
    }

    Oleg.


  4. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On Thu, Oct 23, 2008 at 11:40:36AM +0200, Oleg Nesterov wrote:
    > On 10/23, Rusty Russell wrote:
    > >
    > > +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    > > +{
    > > + struct work_for_cpu wfc;
    > > +
    > > + INIT_WORK(&wfc.work, do_work_for_cpu);
    > > + init_completion(&wfc.done);
    > > + wfc.fn = fn;
    > > + wfc.arg = arg;
    > > + get_online_cpus();
    > > + if (unlikely(!cpu_online(cpu))) {
    > > + wfc.ret = -EINVAL;
    > > + complete(&wfc.done);
    > > + } else
    > > + schedule_work_on(cpu, &wfc.work);

    >
    > I do not claim this is wrong, but imho the code is a bit misleading and
    > needs a comment (or the "fix", please see below).
    >
    > Once we drop cpu_hotplug lock, CPU can go away and this work can migrate
    > to another cpu.


    True.

    >
    > > + put_online_cpus();
    > > + wait_for_completion(&wfc.done);

    >
    > Actually you don't need work_for_cpu->done, you can use flush_work().
    >
    > IOW, I'd suggest
    >
    > long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    > {
    > struct work_for_cpu wfc;
    >
    > INIT_WORK(&wfc.work, do_work_for_cpu);
    > wfc.fn = fn;
    > wfc.arg = arg;
    > wfc.ret = -EINVAL;
    >
    > get_online_cpus();
    > if (likely(cpu_online(cpu))) {
    > schedule_work_on(cpu, &wfc.work);
    > flush_work(&wfc.work);
    > }


    OK, how about doing the following? That will solve the problem
    of deadlock you pointed out in patch 6.

    get_online_cpus();
    if (likely(per_cpu(cpu_state, cpuid) == CPU_ONLINE)) {
            schedule_work_on(cpu, &wfc.work);
            flush_work(&wfc.work);
    } else if (per_cpu(cpu_state, cpuid) != CPU_DEAD) {
            /*
             * We're the CPU-Hotplug thread. Call the
             * function synchronously so that we don't
             * deadlock with any pending work-item blocked
             * on get_online_cpus()
             */
            cpumask_t original_mask = current->cpus_allowed;
            set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
            wfc.ret = fn(arg);
            set_cpus_allowed_ptr(current, &original_mask);
    }
    > put_online_cpus();
    >
    > return wfc.ret;
    > }
    >
    > Oleg.
    >


    --
    Thanks and Regards
    gautham

  5. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On Thursday 23 October 2008 20:40:36 Oleg Nesterov wrote:
    > Once we drop cpu_hotplug lock, CPU can go away and this work can migrate
    > to another cpu.


    Thanks Oleg. It is wrong. I'll use your suggestion; it's simpler.

    Cheers,
    Rusty.

    work_on_cpu: helper for doing task on a CPU.

    Several places in the kernel do the following:

    saved_mask = current->cpus_allowed;
    set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
    somefunc();
    /* restore the previous state */
    set_cpus_allowed_ptr(current, &saved_mask);

    This is bad, because a process's cpumask is observable and
    manipulatable by userspace and should not be toyed with.

    We have the infrastructure, this just creates a nice wrapper to
    encourage its use.

    Signed-off-by: Rusty Russell
    Cc: Ingo Molnar
    ---
    include/linux/workqueue.h |  8 ++++++++
    kernel/workqueue.c        | 45 +++++++++++++++++++++++++++++++++++++++++++++
    2 files changed, 53 insertions(+)

    diff -r 494880dd124b include/linux/workqueue.h
    --- a/include/linux/workqueue.h Fri Oct 24 00:29:43 2008 +1100
    +++ b/include/linux/workqueue.h Fri Oct 24 01:16:52 2008 +1100
    @@ -240,4 +240,12 @@ void cancel_rearming_delayed_work(struct
    cancel_delayed_work_sync(work);
    }

    +#ifndef CONFIG_SMP
    +static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    +{
    +        return fn(arg);
    +}
    +#else
    +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
    +#endif /* CONFIG_SMP */
    #endif
    diff -r 494880dd124b kernel/workqueue.c
    --- a/kernel/workqueue.c Fri Oct 24 00:29:43 2008 +1100
    +++ b/kernel/workqueue.c Fri Oct 24 01:16:52 2008 +1100
    @@ -970,6 +970,51 @@ undo:
    return ret;
    }

    +#ifdef CONFIG_SMP
    +struct work_for_cpu {
    +        struct work_struct work;
    +        long (*fn)(void *);
    +        void *arg;
    +        long ret;
    +};
    +
    +static void do_work_for_cpu(struct work_struct *w)
    +{
    +        struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work);
    +
    +        wfc->ret = wfc->fn(wfc->arg);
    +}
    +
    +/**
    + * work_on_cpu - run a function in user context on a particular cpu
    + * @cpu: the cpu to run on
    + * @fn: the function to run
    + * @arg: the function arg
    + *
    + * This will return -EINVAL if the cpu is not online, or the return value
    + * of @fn otherwise.
    + */
    +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    +{
    +        struct work_for_cpu wfc;
    +
    +        INIT_WORK(&wfc.work, do_work_for_cpu);
    +        wfc.fn = fn;
    +        wfc.arg = arg;
    +        get_online_cpus();
    +        if (unlikely(!cpu_online(cpu)))
    +                wfc.ret = -EINVAL;
    +        else {
    +                schedule_work_on(cpu, &wfc.work);
    +                flush_work(&wfc.work);
    +        }
    +        put_online_cpus();
    +
    +        return wfc.ret;
    +}
    +EXPORT_SYMBOL_GPL(work_on_cpu);
    +#endif /* CONFIG_SMP */
    +
    +
    void __init init_workqueues(void)
    {
    cpu_populated_map = cpu_online_map;

  6. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On 10/23, Gautham R Shenoy wrote:
    >
    > On Thu, Oct 23, 2008 at 11:40:36AM +0200, Oleg Nesterov wrote:
    > >
    > > IOW, I'd suggest
    > >
    > > long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
    > > {
    > > struct work_for_cpu wfc;
    > >
    > > INIT_WORK(&wfc.work, do_work_for_cpu);
    > > wfc.fn = fn;
    > > wfc.arg = arg;
    > > wfc.ret = -EINVAL;
    > >
    > > get_online_cpus();
    > > if (likely(cpu_online(cpu))) {
    > > schedule_work_on(cpu, &wfc.work);
    > > flush_work(&wfc.work);
    > > }

    >
    > OK, how about doing the following? That will solve the problem
    > of deadlock you pointed out in patch 6.
    >
    > get_online_cpus();
    > if (likely(per_cpu(cpu_state, cpuid) == CPU_ONLINE)) {
    > schedule_work_on(cpu, &wfc.work);
    > flush_work(&wfc.work);
    > } else if (per_cpu(cpu_state, cpuid) != CPU_DEAD) {
    > /*
    > * We're the CPU-Hotplug thread. Call the
    > * function synchronously so that we don't
    > * deadlock with any pending work-item blocked
    > * on get_online_cpus()
    > */
    > cpumask_t original_mask = current->cpus_allowed;
    > set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
    > wfc.ret = fn(arg);
    > set_cpus_allowed_ptr(current, &original_mask);


    Not sure I understand...

    _cpu_up() does raw_notifier_call_chain(CPU_ONLINE) after __cpu_up(),
    at this point per_cpu(cpu_state) == CPU_ONLINE.

    (OK, this is not exactly true, start_secondary() updates cpu_online_map
    and only then cpu_state = CPU_ONLINE, but __cpu_up() waits for
    cpu_online(cpu) == T).


    Anyway, personally I dislike this special case. We must not use work_on_cpu()
    if we hold a lock which can be used by some work_struct; cpu_hotplug is not
    special at all.

    Oleg.


  7. do_boot_cpu can deadlock?

    Hmm. arch/x86/kernel/smpboot.c:do_boot_cpu() can deadlock?

    It is called from _cpu_up() under cpu_hotplug_begin(), and it
    waits for c_idle.work. Again, if we have the pending work which
    needs get_online_cpus() we seem to have problems.

    Oleg.


  8. Re: do_boot_cpu can deadlock?

    On Thu, Oct 23, 2008 at 07:02:12PM +0200, Oleg Nesterov wrote:
    > Hmm. arch/x86/kernel/smpboot.c:do_boot_cpu() can deadlock ?
    >
    > It is called from _cpu_up() under cpu_hotplug_begin(), and it
    > waits for c_idle.work. Again, if we have the pending work which
    > needs get_online_cpus() we seem to have problems.


    Good point. Though this code gets triggered mostly during boot time when
    the CPUs are being brought online for the first time. If we have some
    work-item pending at that time, which needs get_online_cpus(), we could
    possibly see this deadlock.

    >
    > Oleg.


    --
    Thanks and Regards
    gautham

  9. Re: do_boot_cpu can deadlock?

    [Gautham R Shenoy - Thu, Oct 23, 2008 at 11:51:19PM +0530]
    | On Thu, Oct 23, 2008 at 07:02:12PM +0200, Oleg Nesterov wrote:
    | > Hmm. arch/x86/kernel/smpboot.c:do_boot_cpu() can deadlock ?
    | >
    | > It is called from _cpu_up() under cpu_hotplug_begin(), and it
    | > waits for c_idle.work. Again, if we have the pending work which
    | > needs get_online_cpus() we seem to have problems.
    |
    | Good point. Though this code gets triggered mostly during boot time when
    | the CPUs are being brought online for the first time. If we have some
    | work-item pending at that time, which needs get_online_cpus(), we could
    | possibly see this deadlock.
    |
    | >
    | > Oleg.
    |
    | --
    | Thanks and Regards
    | gautham
    |

    May I ask? If I understand correctly, we use this part of do_boot_cpu

    if (!keventd_up() || current_is_keventd())
            c_idle.work.func(&c_idle.work);
    else {
            schedule_work(&c_idle.work);
            wait_for_completion(&c_idle.done);
    }

    only if we've been called for the first time after power on, and all
    subsequent calls of do_boot_cpu would lead to

    if (c_idle.idle) {
            c_idle.idle->thread.sp = (unsigned long) (((struct pt_regs *)
                    (THREAD_SIZE + task_stack_page(c_idle.idle))) - 1);
            init_idle(c_idle.idle, cpu);
            goto do_rest;
    }

    i.e. we go to do_rest with no wait_for_completion/schedule_work at all.
    Did I miss something? *Sorry* in advance if the question is not quite
    related. Is this work-pending situation only a 'possible' scenario
    (i.e. we don't have such callers for now... yet)?

    - Cyrill -

  10. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On Friday 24 October 2008 01:36:05 Gautham R Shenoy wrote:
    > OK, how about doing the following? That will solve the problem
    > of deadlock you pointed out in patch 6.
    >
    > get_online_cpus();
    > if (likely(per_cpu(cpu_state, cpuid) == CPU_ONLINE)) {
    > schedule_work_on(cpu, &wfc.work);
    > flush_work(&wfc.work);
    > } else if (per_cpu(cpu_state, cpuid) != CPU_DEAD) {
    > /*
    > * We're the CPU-Hotplug thread. Call the
    > * function synchronously so that we don't
    > * deadlock with any pending work-item blocked
    > * on get_online_cpus()
    > */
    > cpumask_t original_mask = current->cpus_allowed;
    > set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
    > wfc.ret = fn(arg);
    > set_cpus_allowed_ptr(current, &original_mask);
    > }


    Hi Gautham, Oleg,

    Unfortunately that's exactly what I'm trying to get away from: another cpumask
    on the stack.

    The cpu hotplug thread is just whoever wrote 0 to "online" in sysfs. And in
    fact it already plays with its cpumask, which should be fixed too.

    I think we should BUG_ON(per_cpu(cpu_state, cpuid) != CPU_DEAD) to ensure we
    never use work_on_cpu in the hotplug cpu path. Then we use
    smp_call_function() for that hard intel_cacheinfo case. Finally, we fix the
    cpu hotplug path to use schedule_work_on() itself rather than playing games
    with cpumask.

    If you agree, I'll spin the patches...

    Thanks for the brainpower,
    Rusty.

  11. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On Fri, Oct 24, 2008 at 02:04:35PM +1100, Rusty Russell wrote:
    > On Friday 24 October 2008 01:36:05 Gautham R Shenoy wrote:
    > > OK, how about doing the following? That will solve the problem
    > > of deadlock you pointed out in patch 6.
    > >
    > > get_online_cpus();
    > > if (likely(per_cpu(cpu_state, cpuid) == CPU_ONLINE)) {
    > > schedule_work_on(cpu, &wfc.work);
    > > flush_work(&wfc.work);
    > > } else if (per_cpu(cpu_state, cpuid) != CPU_DEAD) {
    > > /*
    > > * We're the CPU-Hotplug thread. Call the
    > > * function synchronously so that we don't
    > > * deadlock with any pending work-item blocked
    > > * on get_online_cpus()
    > > */
    > > cpumask_t original_mask = current->cpus_allowed;
    > > set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
    > > wfc.ret = fn(arg);
    > > set_cpus_allowed_ptr(current, &original_mask);
    > > }

    >
    > Hi Gautham, Oleg,
    >
    > Unfortunately that's exactly what I'm trying to get away from: another cpumask
    > on the stack


    Oh, okay, understood.
    >
    > The cpu hotplug thread is just whoever wrote 0 to "online" in sys. And in
    > fact it already plays with its cpumask, which should be fixed too.


    Right, we do that during cpu_offline to ensure that we don't run on the
    cpu which is to be offlined.

    >
    > I think we should BUG_ON(per_cpu(cpu_state, cpuid) != CPU_DEAD) to ensure we
    > never use work_on_cpu in the hotplug cpu path. Then we use
    > smp_call_function() for that hard intel_cacheinfo case. Finally, we fix the
    > cpu hotplug path to use schedule_work_on() itself rather than playing games
    > with cpumask.
    >
    > If you agree, I'll spin the patches...


    How about the following?

    We go with this method, but instead of piggybacking on
    the generic kevents workqueue, we create our own on_each_cpu_wq for this
    purpose.

    Since the worker threads of this workqueue run only the work-items
    queued by us, and since we take care of the cpu-hotplug locking _before_
    queueing the work item, we can be sure that we won't deadlock
    since no work-item when running can block on get_online_cpus().

    This would hold good even for those work-items queued from
    the CPU-Hotplug notifier call path.

    Thus we can have the same semantics everywhere, rather than having
    multiple cases.

    Does this make sense?
    >
    > Thanks for the brainpower,
    > Rusty.


    --
    Thanks and Regards
    gautham

  12. Re: do_boot_cpu can deadlock?

    On 10/23, Cyrill Gorcunov wrote:
    >
    > [Gautham R Shenoy - Thu, Oct 23, 2008 at 11:51:19PM +0530]
    > | On Thu, Oct 23, 2008 at 07:02:12PM +0200, Oleg Nesterov wrote:
    > | > Hmm. arch/x86/kernel/smpboot.c:do_boot_cpu() can deadlock ?
    > | >
    > | > It is called from _cpu_up() under cpu_hotplug_begin(), and it
    > | > waits for c_idle.work. Again, if we have the pending work which
    > | > needs get_online_cpus() we seem to have problems.
    > |
    > | Good point. Though this code gets triggered mostly during boot time when
    > | the CPUs are being brought online for the first time. If we have some
    > | work-item pending at that time, which needs get_online_cpus(), we could
    > | possibly see this deadlock.
    > |
    > | >
    > | > Oleg.
    > |
    > | --
    > | Thanks and Regards
    > | gautham
    > |
    >
    > May I ask? If I understand right we do use this part of do_boot_cpu
    >
    > if (!keventd_up() || current_is_keventd())
    > c_idle.work.func(&c_idle.work);
    > else {
    > schedule_work(&c_idle.work);
    > wait_for_completion(&c_idle.done);
    > }
    >
    > if only we've been called the first time after power on. And all
    > subsequent call of this do_boot_cpu would lead to
    >
    > if (c_idle.idle) {
    > c_idle.idle->thread.sp = (unsigned long) (((struct pt_regs *)
    > (THREAD_SIZE + task_stack_page(c_idle.idle))) - 1);
    > init_idle(c_idle.idle, cpu);
    > goto do_rest;
    > }
    >
    > ie go to do_rest and no wait_for_completion/schedule_work at all.
    > Did I miss something? *Sorry* in advance if the question is quite
    > not related. This work-pending situation is in 'possible' scenario
    > only (ie we don't have such a callers for now... yet)?


    There are no problems during boot time, afaics.

    kernel_init() calls smp_init() before do_basic_setup()->init_workqueues().
    This means that do_boot_cpu() won't use workqueues due to !keventd_up().

    But let's suppose we boot with maxcpus=1, and then bring up another CPU.
    Or we really add the new physical CPU (I don't really know if this is
    possible on x86).

    Oleg.


  13. Re: do_boot_cpu can deadlock?

    On Fri, Oct 24, 2008 at 11:33:42AM +0200, Oleg Nesterov wrote:
    > On 10/23, Cyrill Gorcunov wrote:
    > >
    > > [Gautham R Shenoy - Thu, Oct 23, 2008 at 11:51:19PM +0530]
    > > | On Thu, Oct 23, 2008 at 07:02:12PM +0200, Oleg Nesterov wrote:
    > > | > Hmm. arch/x86/kernel/smpboot.c:do_boot_cpu() can deadlock ?
    > > | >
    > > | > It is called from _cpu_up() under cpu_hotplug_begin(), and it
    > > | > waits for c_idle.work. Again, if we have the pending work which
    > > | > needs get_online_cpus() we seem to have problems.
    > > |
    > > | Good point. Though this code gets triggered mostly during boot time when
    > > | the CPUs are being brought online for the first time. If we have some
    > > | work-item pending at that time, which needs get_online_cpus(), we could
    > > | possibly see this deadlock.
    > > |
    > > | >
    > > | > Oleg.
    > > |
    > > | --
    > > | Thanks and Regards
    > > | gautham
    > > |
    > >
    > > May I ask? If I understand right we do use this part of do_boot_cpu
    > >
    > > if (!keventd_up() || current_is_keventd())
    > > c_idle.work.func(&c_idle.work);
    > > else {
    > > schedule_work(&c_idle.work);
    > > wait_for_completion(&c_idle.done);
    > > }
    > >
    > > if only we've been called the first time after power on. And all
    > > subsequent call of this do_boot_cpu would lead to
    > >
    > > if (c_idle.idle) {
    > > c_idle.idle->thread.sp = (unsigned long) (((struct pt_regs *)
    > > (THREAD_SIZE + task_stack_page(c_idle.idle))) - 1);
    > > init_idle(c_idle.idle, cpu);
    > > goto do_rest;
    > > }
    > >
    > > ie go to do_rest and no wait_for_completion/schedule_work at all.
    > > Did I miss something? *Sorry* in advance if the question is quite
    > > not related. This work-pending situation is in 'possible' scenario
    > > only (ie we don't have such a callers for now... yet)?

    >
    > There are no problems during boot time, afaics.
    >
    > kernel_init() calls smp_init() before do_basic_setup()->init_workqueues().
    > This means that do_boot_cpu() won't use workqueues due to !keventd_up().
    >
    > But let's suppose we boot with maxcpus=1, and then bring up another CPU.
    > Or we really add the new physical CPU (I don't really know if this is
    > possible on x86).


    I'm not sure either whether physical hotplug is possible.

    But think about the virtualization case where we want to add an
    additional CPU to a KVM guest at runtime. This
    is not such a rare use-case. It could deadlock then, no?

    >
    > Oleg.


    --
    Thanks and Regards
    gautham

  14. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On 10/24, Gautham R Shenoy wrote:
    >
    > On Fri, Oct 24, 2008 at 02:04:35PM +1100, Rusty Russell wrote:
    > >
    > > I think we should BUG_ON(per_cpu(cpu_state, cpuid) != CPU_DEAD) to ensure we
    > > never use work_on_cpu in the hotplug cpu path. Then we use
    > > smp_call_function() for that hard intel_cacheinfo case. Finally, we fix the
    > > cpu hotplug path to use schedule_work_on() itself rather than playing games
    > > with cpumask.
    > >
    > > If you agree, I'll spin the patches...

    >
    > How about the following?
    >
    > We go with this method, but instead of piggybacking on
    > the generic kevents workqueue, we create our own on_each_cpu_wq, for this
    > purpose.


    Gautham, Rusty, I am a bit lost on this discussion...

    Why should we care about this deadlock? Just do not use work_on_cpu() from
    the hotplug cpu path, that is all.

    Once again, the "cpu_hotplug_begin()" lock is not special. You can't use
    work_on_cpu() under (say) rtnl_lock() for the same reason, this lock is
    used by work->func() too.

    Perhaps I missed something, and work_on_cpu() is really important for
    cpu-hotplug path?

    Oleg.


  15. Re: do_boot_cpu can deadlock?

    On 10/24, Gautham R Shenoy wrote:
    >
    > On Fri, Oct 24, 2008 at 11:33:42AM +0200, Oleg Nesterov wrote:
    >
    > > But let's suppose we boot with maxcpus=1, and then bring up another CPU.
    > > Or we really add the new physical CPU (I don't really know if this is
    > > possible on x86).

    >
    > Even I am not sure if physical hotplug is possible.
    >
    > But think about the virtualization case when we want to add
    > additional CPU to a KVM guest at runtime? This
    > is not such a rare use-case. It could dead-lock that time, No?


    virtualization/KVM is black magic to me; I don't know how/if it is
    possible to add a CPU at runtime.

    Anyway, booting with maxcpus=1 allows us to bring up another CPU later,
    and idle_thread_array[cpu] == NULL in that case, yes?

    Perhaps smp_prepare_cpus() can do fork_idle() for_each_possible_cpu() ?
    Or we can change do_boot_cpu() to use kthread_run() to create the idle
    thread.

    Oleg.


  16. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On Friday 24 October 2008 21:29:57 Oleg Nesterov wrote:
    > On 10/24, Gautham R Shenoy wrote:
    > > On Fri, Oct 24, 2008 at 02:04:35PM +1100, Rusty Russell wrote:
    > > > I think we should BUG_ON(per_cpu(cpu_state, cpuid) != CPU_DEAD) to
    > > > ensure we never use work_on_cpu in the hotplug cpu path. Then we use
    > > > smp_call_function() for that hard intel_cacheinfo case. Finally, we
    > > > fix the cpu hotplug path to use schedule_work_on() itself rather than
    > > > playing games with cpumask.
    > > >
    > > > If you agree, I'll spin the patches...

    > >
    > > How about the following?
    > >
    > > We go with this method, but instead of piggybacking on
    > > the generic kevents workqueue, we create our own on_each_cpu_wq, for this
    > > purpose.

    >
    > Gautham, Rusty, I am a bit lost on this discussion...
    >
    > Why should we care about this deadlock? Just do not use work_on_cpu() from
    > the hotplug cpu path, that is all.


    No, I agree with you (Oleg). Gautham's proposal would work, but at the cost
    of yet another thread per CPU.

    Since we know how to handle the one problematic case Oleg spotted, *and* we
    know how to BUG_ON() to make sure no one introduces new ones, I think this
    is clearest.

    Thanks,
    Rusty.

  17. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On Fri, Oct 24, 2008 at 12:29:57PM +0200, Oleg Nesterov wrote:
    > On 10/24, Gautham R Shenoy wrote:
    > >
    > > On Fri, Oct 24, 2008 at 02:04:35PM +1100, Rusty Russell wrote:
    > > >
    > > > I think we should BUG_ON(per_cpu(cpu_state, cpuid) != CPU_DEAD) to ensure we
    > > > never use work_on_cpu in the hotplug cpu path. Then we use
    > > > smp_call_function() for that hard intel_cacheinfo case. Finally, we fix the
    > > > cpu hotplug path to use schedule_work_on() itself rather than playing games
    > > > with cpumask.
    > > >
    > > > If you agree, I'll spin the patches...

    > >
    > > How about the following?
    > >
    > > We go with this method, but instead of piggybacking on
    > > the generic kevents workqueue, we create our own on_each_cpu_wq, for this
    > > purpose.

    >
    > Gautham, Rusty, I am a bit lost on this discussion...
    >
    > Why should we care about this deadlock? Just do not use work_on_cpu() from
    > the hotplug cpu path, that is all.
    >
    > Once again, the "cpu_hotplug_begin()" lock is not special. You can't use
    > work_on_cpu() under (say) rtnl_lock() for the same reason, this lock is
    > used by work->func() too.
    >
    > Perhaps I missed something, and work_on_cpu() is really important for
    > cpu-hotplug path?


    Rusty, Oleg,

    Having a rule that we shouldn't use work_on_cpu() in cpu-hotplug path
    is a good thing. But maintaining it can be difficult.

    We've seen that in the past with the cpucontrol mutex.
    We had clear rules that functions which get called in
    cpu-hotplug callback paths, shouldn't take this mutex. But with
    functions that were called in the cpu-hotplug notifier
    path as well as normal paths, it created a whole locking mess,
    and took quite some time to fix.

    Similarly, right now, we can have a BUG_ON() which notifies us whenever
    someone ends up calling a function that invokes work_on_cpu() from the
    CPU-Hotplug callpath. But we will fix it only when the BUG_ON() is hit.

    On the other hand, if we have a mechanism that's guaranteed to work
    irrespective of the callpaths, why not use that ?

    I am not opposed to the proposed design.
    Just voicing out an alternative thought. I could be completely wrong :-)

    >
    > Oleg.
    >


    --
    Thanks and Regards
    gautham

  18. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On 10/24, Gautham R Shenoy wrote:
    >
    > Having a rule that we shouldn't use work_on_cpu() in cpu-hotplug path
    > is a good thing. But maintaining it can be difficult.
    >
    > We've seen that in the past with the cpucontrol mutex.
    > We had clear rules that functions which get called in
    > cpu-hotplug callback paths, shouldn't take this mutex. But with
    > functions that were called in the cpu-hotplug notifier
    > path as well as normal paths, it created a whole locking mess,
    > and took quite some time to fix.
    >
    > Similarly, right now, we can have a BUG_ON() which notifies us whenever
    > someone ends up calling a function that invokes work_on_cpu() from the
    > CPU-Hotplug callpath. But we will fix it only when the BUG_ON() is hit.
    >
    > On the other hand, if we have a mechanism that's guaranteed to work
    > irrespective of the callpaths, why not use that ?


    If we add another wq for work_on_cpu(), then we add another hard-to-maintain
    rule: get_online_cpus() must not be used by any work which can be queued
    on that wq. And, yet another per-cpu thread...

    Personally I don't even think we need a BUG_ON() in work_on_cpu(), because
    I don't think cpu-hotplug path is so special.

    Not that I have a strong opinion though.

    Oleg.


  19. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On Fri, Oct 24, 2008 at 03:25:09PM +0200, Oleg Nesterov wrote:
    > On 10/24, Gautham R Shenoy wrote:
    > >
    > > Having a rule that we shouldn't use work_on_cpu() in cpu-hotplug path
    > > is a good thing. But maintaining it can be difficult.
    > >
    > > We've seen that in the past with the cpucontrol mutex.
    > > We had clear rules that functions which get called in
    > > cpu-hotplug callback paths, shouldn't take this mutex. But with
    > > functions that were called in the cpu-hotplug notifier
    > > path as well as normal paths, it created a whole locking mess,
    > > and took quite some time to fix.
    > >
    > > Similarly, right now, we can have a BUG_ON() which notifies us whenever
    > > someone ends up calling a function that invokes work_on_cpu() from the
    > > CPU-Hotplug callpath. But we will fix it only when the BUG_ON() is hit.
    > >
    > > On the other hand, if we have a mechanism that's guaranteed to work
    > > irrespective of the callpaths, why not use that ?

    >
    > If we add another wq for work_on_cpu(), then we add another hard-to-maintain
    > rule: get_online_cpus() must not be used by any work which can be queued
    > on that wq. And, yet another per-cpu thread...


    No, we don't have that rule!

    Because using Rusty's function with a separate workqueue,
    we queue the work item as follows:

    get_online_cpus();
    queue_work_on(cpu, &on_each_cpu_wq, &wfc.work);
    flush_work(&wfc.work);
    put_online_cpus();

    The very fact that we've successfully queued the work-item means that
    no cpu-hotplug operation can occur till our work item finishes
    execution.

    Hence the work can use get_online_cpus()!

    Yes, we end up using additional resources in the form of extra per-cpu
    threads. But is that so much of an issue?

    >
    > Personally I don't even think we need a BUG_ON() in work_on_cpu(), because
    > I don't think cpu-hotplug path is so special.
    >
    > Not that I have a strong opinion though.
    >
    > Oleg.


    --
    Thanks and Regards
    gautham

  20. Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

    On 10/24, Gautham R Shenoy wrote:
    >
    > On Fri, Oct 24, 2008 at 03:25:09PM +0200, Oleg Nesterov wrote:
    > >
    > > If we add another wq for work_on_cpu(), then we add another hard-to-maintain
    > > rule: get_online_cpus() must not be used by any work which can be queued
    > > on that wq. And, yet another per-cpu thread...

    >
    > No, we don't have that rule!
    >
    > Because using Rusty's function with a separate workqueue,
    > we queue the work item as follows:
    >
    > get_online_cpus();
    > queue_work_on(cpu, &on_each_cpu_wq, &wfc.work);
    > flush_work(&wfc.work);
    > put_online_cpus();
    >
    > The very fact that we've successfully queued the work-item means that
    > no cpu-hotplug operation can occur till our work item finishes
    > execution.


    Ah yes, thanks for correcting me.

    > Yes, we end up using additional resources in the form of extra per-cpu
    > threads. But is that so much of an issue?


    Well, don't ask me... but the only reason we need these threads is that
    we want to make work_on_cpu() usable from the cpu-hotplug path, which is
    a bit strange.

    Oleg.

