[PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data - Kernel


  1. [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data

    From: Hiroshi Shimamoto

    The following assignment in smp_call_function_many() may cause unexpected
    behavior, when !CPUMASK_OFFSTACK.
    data->cpumask = allbutself;

    Because it copies a pointer to the stack, and the value will be modified
    after exit from smp_call_function_many().

    The type of cpumask field of call_function_data structure should be
    cpumask_var_t and an operation to assign is needed.

    Signed-off-by: Hiroshi Shimamoto
    ---
    include/linux/cpumask.h | 9 +++++++++
    kernel/smp.c | 4 ++--
    2 files changed, 11 insertions(+), 2 deletions(-)

    diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
    index d1f22ee..7bfc0f1 100644
    --- a/include/linux/cpumask.h
    +++ b/include/linux/cpumask.h
    @@ -511,6 +511,10 @@ typedef struct cpumask *cpumask_var_t;

    bool alloc_cpumask_var(cpumask_var_t *mask, gfp_t flags);
    void free_cpumask_var(cpumask_var_t mask);
    +static inline void assign_cpumask_var(cpumask_var_t *dst, cpumask_var_t src)
    +{
    + *dst = src;
    +}

    #else
    typedef struct cpumask cpumask_var_t[1];
    @@ -524,6 +528,11 @@ static inline void free_cpumask_var(cpumask_var_t mask)
    {
    }

    +static inline void assign_cpumask_var(cpumask_var_t *dst, cpumask_var_t src)
    +{
    + (*dst)[0] = src[0];
    +}
    +
    #endif /* CONFIG_CPUMASK_OFFSTACK */

    /*
    diff --git a/kernel/smp.c b/kernel/smp.c
    index dccbb42..da98191 100644
    --- a/kernel/smp.c
    +++ b/kernel/smp.c
    @@ -24,7 +24,7 @@ struct call_function_data {
    struct call_single_data csd;
    spinlock_t lock;
    unsigned int refs;
    - struct cpumask *cpumask;
    + cpumask_var_t cpumask;
    struct rcu_head rcu_head;
    };

    @@ -370,7 +370,7 @@ void smp_call_function_many(const struct cpumask *mask,
    data->csd.func = func;
    data->csd.info = info;
    data->refs = num_cpus;
    - data->cpumask = allbutself;
    + assign_cpumask_var(&data->cpumask, allbutself);

    spin_lock_irqsave(&call_function_lock, flags);
    list_add_tail_rcu(&data->csd.list, &call_function_queue);
    --
    1.5.6

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/
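
To make the failure mode concrete, here is a minimal user-space sketch, not kernel code. It assumes a one-word `struct cpumask` stand-in and reuses the `!CONFIG_CPUMASK_OFFSTACK` typedef and the `assign_cpumask_var()` body from the patch above. `struct call_data_ptr`, `struct call_data_copy`, and `clobber_demo()` are hypothetical names. Overwriting the source mask stands in for the stack frame being reused after smp_call_function_many() returns.

```c
/* Hypothetical stand-in for the kernel's struct cpumask (one word of CPUs). */
struct cpumask { unsigned long bits[1]; };

/* Same shape as the !CONFIG_CPUMASK_OFFSTACK typedef quoted in the patch. */
typedef struct cpumask cpumask_var_t[1];

struct call_data_ptr  { struct cpumask *cpumask; };  /* field type before the patch */
struct call_data_copy { cpumask_var_t cpumask; };    /* field type after the patch  */

/* Same body as the patch's !OFFSTACK helper: copies the bits, not the address. */
static void assign_cpumask_var(cpumask_var_t *dst, cpumask_var_t src)
{
	(*dst)[0] = src[0];
}

/*
 * Store one on-stack mask both ways, then clobber the source.
 * *via_ptr reports what the pointer field sees, *via_copy what the copy holds.
 */
static void clobber_demo(unsigned long *via_ptr, unsigned long *via_copy)
{
	cpumask_var_t allbutself;            /* lives on this function's stack */
	struct call_data_ptr by_ptr;
	struct call_data_copy by_copy;

	allbutself[0].bits[0] = 0x6UL;       /* CPUs 1 and 2 */
	by_ptr.cpumask = allbutself;         /* stores the address of a stack object */
	assign_cpumask_var(&by_copy.cpumask, allbutself);

	allbutself[0].bits[0] = 0UL;         /* stands in for the stack being reused */

	*via_ptr = by_ptr.cpumask->bits[0];
	*via_copy = by_copy.cpumask[0].bits[0];
}
```

After `clobber_demo(&p, &c)`, `p` is 0 because the pointer field follows the clobbered source, while `c` is still 0x6 because the copy owns its own bits.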

  2. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data

    On Friday 24 October 2008 15:47:20 Hiroshi Shimamoto wrote:
    > From: Hiroshi Shimamoto
    >
    > The following assignment in smp_call_function_many() may cause unexpected
    > behavior, when !CPUMASK_OFFSTACK.
    > data->cpumask = allbutself;
    >
    > Because it copies a pointer to the stack, and the value will be modified
    > after exit from smp_call_function_many().


    Good catch!

    > The type of cpumask field of call_function_data structure should be
    > cpumask_var_t and an operation to assign is needed.


    This makes the lifetime rules dependent on the config option, which is
    complicated.

    Your insight into this issue is appreciated: this code is not simple!

    How's this version instead? It puts the cpumask at the end of the kmalloc,
    and falls back to smp_call_function_single instead of doing obscure
    quiescing stuff.

    (Compiles, untested).

    Thanks!
    Rusty.

    diff -r 60e2190a18cd kernel/smp.c
    --- a/kernel/smp.c Fri Oct 24 20:22:52 2008 +1100
    +++ b/kernel/smp.c Fri Oct 24 22:02:59 2008 +1100
    @@ -24,8 +24,8 @@ struct call_function_data {
    struct call_single_data csd;
    spinlock_t lock;
    unsigned int refs;
    - struct cpumask *cpumask;
    struct rcu_head rcu_head;
    + unsigned long cpumask_bits[];
    };

    struct call_single_queue {
    @@ -109,13 +109,13 @@ void generic_smp_call_function_interrupt
    list_for_each_entry_rcu(data, &call_function_queue, csd.list) {
    int refs;

    - if (!cpumask_test_cpu(cpu, data->cpumask))
    + if (!cpumask_test_cpu(cpu, to_cpumask(data->cpumask_bits)))
    continue;

    data->csd.func(data->csd.info);

    spin_lock(&data->lock);
    - cpumask_clear_cpu(cpu, data->cpumask);
    + cpumask_clear_cpu(cpu, to_cpumask(data->cpumask_bits));
    WARN_ON(data->refs == 0);
    data->refs--;
    refs = data->refs;
    @@ -265,42 +265,6 @@ void __smp_call_function_single(int cpu,
    generic_exec_single(cpu, data);
    }

    -/* Dummy function */
    -static void quiesce_dummy(void *unused)
    -{
    -}
    -
    -/*
    - * Ensure stack based data used in call function mask is safe to free.
    - *
    - * This is needed by smp_call_function_many when using on-stack data, because
    - * a single call function queue is shared by all CPUs, and any CPU may pick up
    - * the data item on the queue at any time before it is deleted. So we need to
    - * ensure that all CPUs have transitioned through a quiescent state after
    - * this call.
    - *
    - * This is a very slow function, implemented by sending synchronous IPIs to
    - * all possible CPUs. For this reason, we have to alloc data rather than use
    - * stack based data even in the case of synchronous calls. The stack based
    - * data is then just used for deadlock/oom fallback which will be very rare.
    - *
    - * If a faster scheme can be made, we could go back to preferring stack based
    - * data -- the data allocation/free is non-zero cost.
    - */
    -static void smp_call_function_mask_quiesce_stack(const struct cpumask *mask)
    -{
    - struct call_single_data data;
    - int cpu;
    -
    - data.func = quiesce_dummy;
    - data.info = NULL;
    -
    - for_each_cpu(cpu, mask) {
    - data.flags = CSD_FLAG_WAIT;
    - generic_exec_single(cpu, &data);
    - }
    -}
    -
    /**
    * smp_call_function_many(): Run a function on a set of other CPUs.
    * @mask: The set of cpus to run on (only runs on online subset).
    @@ -320,73 +284,59 @@ void smp_call_function_many(const struct
    void (*func)(void *), void *info,
    bool wait)
    {
    - struct call_function_data d;
    - struct call_function_data *data = NULL;
    - cpumask_var_t allbutself;
    + struct call_function_data *data;
    unsigned long flags;
    - int cpu, num_cpus;
    - int slowpath = 0;
    + int cpu, next_cpu;

    /* Can deadlock when called with interrupts disabled */
    WARN_ON(irqs_disabled());

    - if (!alloc_cpumask_var(&allbutself, GFP_ATOMIC)) {
    + /* So, what's a CPU they want? Ignoring this one. */
    + cpu = cpumask_first_and(mask, cpu_online_mask);
    + if (cpu == smp_processor_id())
    + cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
    + /* Nothing? We're done. */
    + if (cpu >= nr_cpu_ids)
    + return;
    +
    + /* Do we have another CPU which isn't us? */
    + next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
    + if (cpu == smp_processor_id())
    + next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
    +
    + /* Nope! Fastpath: do that cpu by itself. */
    + if (next_cpu >= nr_cpu_ids)
    + smp_call_function_single(cpu, func, info, wait);
    +
    + data = kmalloc(sizeof(*data) + cpumask_size(), GFP_ATOMIC);
    + if (unlikely(!data)) {
    /* Slow path. */
    for_each_online_cpu(cpu) {
    if (cpumask_test_cpu(cpu, mask))
    smp_call_function_single(cpu, func, info, wait);
    }
    - return;
    - }
    - cpumask_and(allbutself, cpu_online_mask, mask);
    - cpumask_clear_cpu(smp_processor_id(), allbutself);
    - num_cpus = cpumask_weight(allbutself);
    -
    - /*
    - * If zero CPUs, return. If just a single CPU, turn this request
    - * into a targetted single call instead since it's faster.
    - */
    - if (!num_cpus)
    - return;
    - else if (num_cpus == 1) {
    - cpu = cpumask_first(allbutself);
    - smp_call_function_single(cpu, func, info, wait);
    - goto out;
    - }
    -
    - data = kmalloc(sizeof(*data), GFP_ATOMIC);
    - if (data) {
    - data->csd.flags = CSD_FLAG_ALLOC;
    - if (wait)
    - data->csd.flags |= CSD_FLAG_WAIT;
    - } else {
    - data = &d;
    - data->csd.flags = CSD_FLAG_WAIT;
    - wait = 1;
    - slowpath = 1;
    + return;
    }

    spin_lock_init(&data->lock);
    + data->csd.flags = CSD_FLAG_ALLOC;
    + if (wait)
    + data->csd.flags |= CSD_FLAG_WAIT;
    data->csd.func = func;
    data->csd.info = info;
    - data->refs = num_cpus;
    - data->cpumask = allbutself;
    + cpumask_and(to_cpumask(data->cpumask_bits), mask, cpu_online_mask);
    + data->refs = cpumask_weight(to_cpumask(data->cpumask_bits));

    spin_lock_irqsave(&call_function_lock, flags);
    list_add_tail_rcu(&data->csd.list, &call_function_queue);
    spin_unlock_irqrestore(&call_function_lock, flags);

    /* Send a message to all CPUs in the map */
    - arch_send_call_function_ipi((cpumask_t)*allbutself );
    + arch_send_call_function_ipi(*to_cpumask(data->cpumask_bits));

    /* optionally wait for the CPUs to complete */
    - if (wait) {
    + if (wait)
    csd_flag_wait(&data->csd);
    - if (unlikely(slowpath))
    - smp_call_function_mask_quiesce_stack(allbutself);
    - }
    -out:
    - free_cpumask_var(allbutself);
    }
    EXPORT_SYMBOL(smp_call_function_many);
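
The idea of putting the cpumask at the end of the kmalloc can be sketched in user space with a C99 flexible array member. This is an illustration under assumptions, not the kernel implementation: `NR_CPU_IDS`, `struct call_data`, `mask_size()`, and the helper functions are hypothetical stand-ins, with malloc() in place of kmalloc() and `mask_size()` playing the role of cpumask_size().

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPU_IDS    128u   /* hypothetical CPU count for this sketch */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS    ((NR_CPU_IDS + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct call_data {
	unsigned int refs;
	unsigned long cpumask_bits[];   /* flexible array member, must be last */
};

/* Bytes needed to hold NR_CPU_IDS bits, rounded up to whole longs. */
static size_t mask_size(void)
{
	return MASK_LONGS * sizeof(unsigned long);
}

/* One allocation holds the header and the mask; returns NULL on failure. */
static struct call_data *alloc_call_data(void)
{
	struct call_data *data = malloc(sizeof(*data) + mask_size());

	if (!data)
		return NULL;
	data->refs = 0;
	memset(data->cpumask_bits, 0, mask_size());
	return data;
}

static void mask_set_cpu(struct call_data *data, unsigned int cpu)
{
	data->cpumask_bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

static int mask_test_cpu(const struct call_data *data, unsigned int cpu)
{
	return (int)((data->cpumask_bits[cpu / BITS_PER_LONG] >>
		      (cpu % BITS_PER_LONG)) & 1UL);
}
```

Because the mask shares the allocation with the rest of the structure, a single free() releases both, and the mask's lifetime no longer depends on the caller's stack frame or on a config option.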





  3. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data

    On Friday 24 October 2008 15:47:20 Hiroshi Shimamoto wrote:
    > From: Hiroshi Shimamoto


    Ingo, because of these concerns I recommend you revert
    d4de5ac3b5e70928c86e3e5ac311f16cbf2e9ab3 (cpumask: smp_call_function_many())
    for now, and apply this less contentious version.

    Subject: cpumask: smp_call_function_many()
    From: Rusty Russell

    Transition from cpumask_t-taking smp_call_function_mask() to a new
    smp_call_function_many() which takes a struct cpumask * instead.

    (Naming is inspired by smp_call_function_single).

    Unfortunately, converting the function implementation properly is
    non-trivial, but what we care about is the API, so this simply wraps
    it.

    Note that the new one returns void: the old one couldn't fail either
    unless there was a logic bug.

    The old smp_call_function_mask() isn't marked __deprecated, because
    sparc64 builds with -Werror in arch/sparc (thanks Stephen).

    Signed-off-by: Rusty Russell

    diff -r cdd2da35b209 include/linux/smp.h
    --- a/include/linux/smp.h Fri Oct 24 14:10:08 2008 +1100
    +++ b/include/linux/smp.h Fri Oct 24 22:08:55 2008 +1100
    @@ -64,8 +64,17 @@ extern void smp_cpus_done(unsigned int m
    * Call a function on all other processors
    */
    int smp_call_function(void(*func)(void *info), void *info, int wait);
    +/* Deprecated: use smp_call_function_many() which uses a cpumask ptr. */
    int smp_call_function_mask(cpumask_t mask, void(*func)(void *info), void *info,
    int wait);
    +
    +static inline void smp_call_function_many(const struct cpumask *mask,
    + void (*func)(void *info), void *info,
    + int wait)
    +{
    + smp_call_function_mask(*mask, func, info, wait);
    +}
    +
    int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
    int wait);
    void __smp_call_function_single(int cpuid, struct call_single_data *data);



  4. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data

    Rusty Russell wrote:
    > On Friday 24 October 2008 15:47:20 Hiroshi Shimamoto wrote:
    >> From: Hiroshi Shimamoto
    >>
    >> The following assignment in smp_call_function_many() may cause unexpected
    >> behavior, when !CPUMASK_OFFSTACK.
    >> data->cpumask = allbutself;
    >>
    >> Because it copies a pointer to the stack, and the value will be modified
    >> after exit from smp_call_function_many().

    >
    > Good catch!
    >
    >> The type of cpumask field of call_function_data structure should be
    >> cpumask_var_t and an operation to assign is needed.

    >
    > This makes the lifetime rules dependent on the config option, which is
    > complicated.


    Okay, I see.

    >
    > Your insight into this issue is appreciated: this code is not simple!
    >
    > How's this version instead? It puts the cpumask at the end of the kmalloc,
    > and falls back to smp_call_function_single instead of doing obscure
    > quiescing stuff.
    >
    > (Compiles, untested).


    comments below, not tested.

    >
    > Thanks!
    > Rusty.
    >
    > diff -r 60e2190a18cd kernel/smp.c
    > --- a/kernel/smp.c Fri Oct 24 20:22:52 2008 +1100
    > +++ b/kernel/smp.c Fri Oct 24 22:02:59 2008 +1100
    > @@ -24,8 +24,8 @@ struct call_function_data {
    > struct call_single_data csd;
    > spinlock_t lock;
    > unsigned int refs;
    > - struct cpumask *cpumask;
    > struct rcu_head rcu_head;
    > + unsigned long cpumask_bits[];
    > };
    >
    > struct call_single_queue {
    > @@ -109,13 +109,13 @@ void generic_smp_call_function_interrupt
    > list_for_each_entry_rcu(data, &call_function_queue, csd.list) {
    > int refs;
    >
    > - if (!cpumask_test_cpu(cpu, data->cpumask))
    > + if (!cpumask_test_cpu(cpu, to_cpumask(data->cpumask_bits)))
    > continue;
    >
    > data->csd.func(data->csd.info);
    >
    > spin_lock(&data->lock);
    > - cpumask_clear_cpu(cpu, data->cpumask);
    > + cpumask_clear_cpu(cpu, to_cpumask(data->cpumask_bits));
    > WARN_ON(data->refs == 0);
    > data->refs--;
    > refs = data->refs;
    > @@ -265,42 +265,6 @@ void __smp_call_function_single(int cpu,
    > generic_exec_single(cpu, data);
    > }
    >
    > -/* Dummy function */
    > -static void quiesce_dummy(void *unused)
    > -{
    > -}
    > -
    > -/*
    > - * Ensure stack based data used in call function mask is safe to free.
    > - *
    > - * This is needed by smp_call_function_many when using on-stack data, because
    > - * a single call function queue is shared by all CPUs, and any CPU may pick up
    > - * the data item on the queue at any time before it is deleted. So we need to
    > - * ensure that all CPUs have transitioned through a quiescent state after
    > - * this call.
    > - *
    > - * This is a very slow function, implemented by sending synchronous IPIs to
    > - * all possible CPUs. For this reason, we have to alloc data rather than use
    > - * stack based data even in the case of synchronous calls. The stack based
    > - * data is then just used for deadlock/oom fallback which will be very rare.
    > - *
    > - * If a faster scheme can be made, we could go back to preferring stack based
    > - * data -- the data allocation/free is non-zero cost.
    > - */
    > -static void smp_call_function_mask_quiesce_stack(const struct cpumask *mask)
    > -{
    > - struct call_single_data data;
    > - int cpu;
    > -
    > - data.func = quiesce_dummy;
    > - data.info = NULL;
    > -
    > - for_each_cpu(cpu, mask) {
    > - data.flags = CSD_FLAG_WAIT;
    > - generic_exec_single(cpu, &data);
    > - }
    > -}
    > -
    > /**
    > * smp_call_function_many(): Run a function on a set of other CPUs.
    > * @mask: The set of cpus to run on (only runs on online subset).
    > @@ -320,73 +284,59 @@ void smp_call_function_many(const struct
    > void (*func)(void *), void *info,
    > bool wait)
    > {
    > - struct call_function_data d;
    > - struct call_function_data *data = NULL;
    > - cpumask_var_t allbutself;
    > + struct call_function_data *data;
    > unsigned long flags;
    > - int cpu, num_cpus;
    > - int slowpath = 0;
    > + int cpu, next_cpu;
    >
    > /* Can deadlock when called with interrupts disabled */
    > WARN_ON(irqs_disabled());
    >
    > - if (!alloc_cpumask_var(&allbutself, GFP_ATOMIC)) {
    > + /* So, what's a CPU they want? Ignoring this one. */
    > + cpu = cpumask_first_and(mask, cpu_online_mask);
    > + if (cpu == smp_processor_id())
    > + cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
    > + /* Nothing? We're done. */
    > + if (cpu >= nr_cpu_ids)
    > + return;
    > +
    > + /* Do we have another CPU which isn't us? */
    > + next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);


    I'm not sure,
    if next_cpu == smp_processor_id() &&
    cpumask_next_and(next_cpu, ...) >= nr_cpu_ids

    we can go fastpath, right?

    > + if (cpu == smp_processor_id())


    so, next_cpu == smp_processor_id() ?

    > + next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
    > +
    > + /* Nope! Fastpath: do that cpu by itself. */
    > + if (next_cpu >= nr_cpu_ids)
    > + smp_call_function_single(cpu, func, info, wait);


    first path, should return here?

    > +
    > + data = kmalloc(sizeof(*data) + cpumask_size(), GFP_ATOMIC);
    > + if (unlikely(!data)) {
    > /* Slow path. */
    > for_each_online_cpu(cpu) {
    > if (cpumask_test_cpu(cpu, mask))


    I guess, another issue, should skip when cpu == smp_processor_id().

    > smp_call_function_single(cpu, func, info, wait);
    > }
    > - return;
    > - }
    > - cpumask_and(allbutself, cpu_online_mask, mask);
    > - cpumask_clear_cpu(smp_processor_id(), allbutself);
    > - num_cpus = cpumask_weight(allbutself);
    > -
    > - /*
    > - * If zero CPUs, return. If just a single CPU, turn this request
    > - * into a targetted single call instead since it's faster.
    > - */
    > - if (!num_cpus)
    > - return;
    > - else if (num_cpus == 1) {
    > - cpu = cpumask_first(allbutself);
    > - smp_call_function_single(cpu, func, info, wait);
    > - goto out;
    > - }
    > -
    > - data = kmalloc(sizeof(*data), GFP_ATOMIC);
    > - if (data) {
    > - data->csd.flags = CSD_FLAG_ALLOC;
    > - if (wait)
    > - data->csd.flags |= CSD_FLAG_WAIT;
    > - } else {
    > - data = &d;
    > - data->csd.flags = CSD_FLAG_WAIT;
    > - wait = 1;
    > - slowpath = 1;
    > + return;
    > }
    >
    > spin_lock_init(&data->lock);
    > + data->csd.flags = CSD_FLAG_ALLOC;
    > + if (wait)
    > + data->csd.flags |= CSD_FLAG_WAIT;
    > data->csd.func = func;
    > data->csd.info = info;
    > - data->refs = num_cpus;
    > - data->cpumask = allbutself;
    > + cpumask_and(to_cpumask(data->cpumask_bits), mask, cpu_online_mask);


    I guess, clear itself is needed.

    thanks,
    Hiroshi Shimamoto

    > + data->refs = cpumask_weight(to_cpumask(data->cpumask_bits));
    >
    > spin_lock_irqsave(&call_function_lock, flags);
    > list_add_tail_rcu(&data->csd.list, &call_function_queue);
    > spin_unlock_irqrestore(&call_function_lock, flags);
    >
    > /* Send a message to all CPUs in the map */
    > - arch_send_call_function_ipi((cpumask_t)*allbutself );
    > + arch_send_call_function_ipi(*to_cpumask(data->cpumask_bits));
    >
    > /* optionally wait for the CPUs to complete */
    > - if (wait) {
    > + if (wait)
    > csd_flag_wait(&data->csd);
    > - if (unlikely(slowpath))
    > - smp_call_function_mask_quiesce_stack(allbutself);
    > - }
    > -out:
    > - free_cpumask_var(allbutself);
    > }
    > EXPORT_SYMBOL(smp_call_function_many);
    >
    >
    >
    >

  5. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data

    On Saturday 25 October 2008 08:46:08 Hiroshi Shimamoto wrote:
    > Rusty Russell wrote:
    > > (Compiles, untested).

    >
    > comments below, not tested.


    > > + /* So, what's a CPU they want? Ignoring this one. */
    > > + cpu = cpumask_first_and(mask, cpu_online_mask);
    > > + if (cpu == smp_processor_id())
    > > + cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
    > > + /* Nothing? We're done. */
    > > + if (cpu >= nr_cpu_ids)
    > > + return;
    > > +
    > > + /* Do we have another CPU which isn't us? */
    > > + next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);

    >
    > I'm not sure,
    > if next_cpu == smp_processor_id() &&
    > cpumask_next_and(next_cpu, ...) >= nr_cpu_ids
    >
    > we can go fastpath, right?


    Yes, that was intent, and after your fix below it should work.

    > > + if (cpu == smp_processor_id())

    >
    > so, next_cpu == smp_processor_id() ?


    Oh, yes. That is correct.

    > > + next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
    > > +
    > > + /* Nope! Fastpath: do that cpu by itself. */
    > > + if (next_cpu >= nr_cpu_ids)
    > > + smp_call_function_single(cpu, func, info, wait);

    >
    > first path, should return here?


    First path is when no online CPUs are in mask at all. This is when one
    online CPU is in mask.

    > > /* Slow path. */
    > > for_each_online_cpu(cpu) {
    > > if (cpumask_test_cpu(cpu, mask))

    >
    > I guess, another issue, should skip when cpu == smp_processor_id().


    Yes, thanks!

    > > + data->csd.flags = CSD_FLAG_ALLOC;
    > > + if (wait)
    > > + data->csd.flags |= CSD_FLAG_WAIT;
    > > data->csd.func = func;
    > > data->csd.info = info;
    > > - data->refs = num_cpus;
    > > - data->cpumask = allbutself;
    > > + cpumask_and(to_cpumask(data->cpumask_bits), mask, cpu_online_mask);

    >
    > I guess, clear itself is needed.


    Indeed, thanks.
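
The review points above (test next_cpu rather than cpu against the calling CPU, and treat "exactly one other CPU" as its own case) can be sketched in user space as follows. This is a hedged illustration, not the kernel code: masks are single 64-bit words, and `NR_CPU_IDS`, `next_and_from()`, `classify()`, and `enum target_kind` are hypothetical names. The sketch only classifies the request, so a caller would make the single-CPU call and return on `TARGET_SINGLE`.

```c
#include <stdint.h>

#define NR_CPU_IDS 64u   /* hypothetical: one 64-bit word of CPUs */

enum target_kind { TARGET_NONE, TARGET_SINGLE, TARGET_MANY };

/* First CPU >= start that is set in both masks, or NR_CPU_IDS if none. */
static unsigned int next_and_from(unsigned int start, uint64_t a, uint64_t b)
{
	uint64_t both = a & b;
	unsigned int cpu;

	for (cpu = start; cpu < NR_CPU_IDS; cpu++)
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/*
 * Decide whether zero, one, or several CPUs other than `self` are wanted,
 * without building a temporary mask. On TARGET_SINGLE, *single is that CPU.
 */
static enum target_kind classify(uint64_t mask, uint64_t online,
				 unsigned int self, unsigned int *single)
{
	unsigned int cpu, next_cpu;

	/* First wanted online CPU, skipping ourselves. */
	cpu = next_and_from(0, mask, online);
	if (cpu == self)
		cpu = next_and_from(cpu + 1, mask, online);
	if (cpu >= NR_CPU_IDS)
		return TARGET_NONE;

	/* Is there a second one that is not us? Note: test next_cpu, not cpu. */
	next_cpu = next_and_from(cpu + 1, mask, online);
	if (next_cpu == self)
		next_cpu = next_and_from(next_cpu + 1, mask, online);
	if (next_cpu >= NR_CPU_IDS) {
		*single = cpu;
		return TARGET_SINGLE;
	}
	return TARGET_MANY;
}
```

For example, with mask 0x5 (CPUs 0 and 2), all CPUs online, and the caller on CPU 2, `classify()` reports `TARGET_SINGLE` with CPU 0, which is the case the `cpu == smp_processor_id()` test in the earlier version would have sent down the multi-CPU path.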

    Here is the updated patch, now tested. I also forced the kmalloc failure
    path to test that too...

    From: Rusty Russell
    cpumask: smp_call_function_many()

    Actually change smp_call_function_mask() to smp_call_function_many().

    S390 has its own version, so we do trivial conversion on that too.

    We have to do some dancing to figure out if 0 or 1 other cpus are in
    the mask supplied and the online mask without allocating a tmp
    cpumask. It's still fairly cheap.

    We allocate the cpumask at the end of the call_function_data
    structure: if allocation fails we fall back to smp_call_function_single
    rather than using the baroque quiescing code.

    (Thanks to Hiroshi Shimamoto for spotting several bugs in previous versions!)

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis
    Cc: Hiroshi Shimamoto
    Cc: schwidefsky@de.ibm.com
    Cc: heiko.carstens@de.ibm.com
    ---
    arch/s390/include/asm/smp.h | 3
    arch/s390/kernel/smp.c | 9 +-
    include/linux/smp.h | 15 ++--
    kernel/smp.c | 137 +++++++++++++++-----------------------------
    4 files changed, 60 insertions(+), 104 deletions(-)

    diff -r f03d70de1da6 arch/s390/include/asm/smp.h
    --- a/arch/s390/include/asm/smp.h Sun Oct 26 19:17:26 2008 +1100
    +++ b/arch/s390/include/asm/smp.h Mon Oct 27 09:16:22 2008 +1100
    @@ -90,9 +90,6 @@ extern int __cpu_up (unsigned int cpu);

    extern struct mutex smp_cpu_state_mutex;
    extern int smp_cpu_polarization[];
    -
    -extern int smp_call_function_mask(cpumask_t mask, void (*func)(void *),
    - void *info, int wait);
    #endif

    #ifndef CONFIG_SMP
    diff -r f03d70de1da6 arch/s390/kernel/smp.c
    --- a/arch/s390/kernel/smp.c Sun Oct 26 19:17:26 2008 +1100
    +++ b/arch/s390/kernel/smp.c Mon Oct 27 09:16:22 2008 +1100
    @@ -199,7 +199,7 @@ EXPORT_SYMBOL(smp_call_function_single);
    EXPORT_SYMBOL(smp_call_function_single);

    /**
    - * smp_call_function_mask(): Run a function on a set of other CPUs.
    + * smp_call_function_many(): Run a function on a set of other CPUs.
    * @mask: The set of cpus to run on. Must not include the current cpu.
    * @func: The function to run. This must be fast and non-blocking.
    * @info: An arbitrary pointer to pass to the function.
    @@ -213,16 +213,17 @@ EXPORT_SYMBOL(smp_call_function_single);
    * You must not call this function with disabled interrupts or from a
    * hardware interrupt handler or from a bottom half handler.
    */
    -int smp_call_function_mask(cpumask_t mask, void (*func)(void *), void *info,
    - int wait)
    +int smp_call_function_many(const struct cpumask *maskp,
    + void (*func)(void *), void *info, bool wait)
    {
    + cpumask_t mask = *maskp;
    spin_lock(&call_lock);
    cpu_clear(smp_processor_id(), mask);
    __smp_call_function_map(func, info, wait, mask);
    spin_unlock(&call_lock);
    return 0;
    }
    -EXPORT_SYMBOL(smp_call_function_mask);
    +EXPORT_SYMBOL(smp_call_function_many);

    void smp_send_stop(void)
    {
    diff -r f03d70de1da6 include/linux/smp.h
    --- a/include/linux/smp.h Sun Oct 26 19:17:26 2008 +1100
    +++ b/include/linux/smp.h Mon Oct 27 09:16:22 2008 +1100
    @@ -64,15 +64,16 @@ extern void smp_cpus_done(unsigned int m
    * Call a function on all other processors
    */
    int smp_call_function(void(*func)(void *info), void *info, int wait);
    -/* Deprecated: use smp_call_function_many() which uses a cpumask ptr. */
    -int smp_call_function_mask(cpumask_t mask, void(*func)(void *info), void *info,
    - int wait);
    +void smp_call_function_many(const struct cpumask *mask,
    + void (*func)(void *info), void *info, bool wait);

    -static inline void smp_call_function_many(const struct cpumask *mask,
    - void (*func)(void *info), void *info,
    - int wait)
    +/* Deprecated: Use smp_call_function_many which takes a pointer to the mask. */
    +static inline int
    +smp_call_function_mask(cpumask_t mask, void(*func)(void *info), void *info,
    + int wait)
    {
    - smp_call_function_mask(*mask, func, info, wait);
    + smp_call_function_many(&mask, func, info, wait);
    + return 0;
    }

    int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
    diff -r f03d70de1da6 kernel/smp.c
    --- a/kernel/smp.c Sun Oct 26 19:17:26 2008 +1100
    +++ b/kernel/smp.c Mon Oct 27 09:16:22 2008 +1100
    @@ -24,8 +24,8 @@ struct call_function_data {
    struct call_single_data csd;
    spinlock_t lock;
    unsigned int refs;
    - cpumask_t cpumask;
    struct rcu_head rcu_head;
    + unsigned long cpumask_bits[];
    };

    struct call_single_queue {
    @@ -109,13 +109,13 @@ void generic_smp_call_function_interrupt
    list_for_each_entry_rcu(data, &call_function_queue, csd.list) {
    int refs;

    - if (!cpu_isset(cpu, data->cpumask))
    + if (!cpumask_test_cpu(cpu, to_cpumask(data->cpumask_bits)))
    continue;

    data->csd.func(data->csd.info);

    spin_lock(&data->lock);
    - cpu_clear(cpu, data->cpumask);
    + cpumask_clear_cpu(cpu, to_cpumask(data->cpumask_bits));
    WARN_ON(data->refs == 0);
    data->refs--;
    refs = data->refs;
    @@ -265,50 +265,12 @@ void __smp_call_function_single(int cpu,
    generic_exec_single(cpu, data);
    }

    -/* Dummy function */
    -static void quiesce_dummy(void *unused)
    -{
    -}
    -
    -/*
    - * Ensure stack based data used in call function mask is safe to free.
    - *
    - * This is needed by smp_call_function_mask when using on-stack data, because
    - * a single call function queue is shared by all CPUs, and any CPU may pick up
    - * the data item on the queue at any time before it is deleted. So we need to
    - * ensure that all CPUs have transitioned through a quiescent state after
    - * this call.
    - *
    - * This is a very slow function, implemented by sending synchronous IPIs to
    - * all possible CPUs. For this reason, we have to alloc data rather than use
    - * stack based data even in the case of synchronous calls. The stack based
    - * data is then just used for deadlock/oom fallback which will be very rare.
    - *
    - * If a faster scheme can be made, we could go back to preferring stack based
    - * data -- the data allocation/free is non-zero cost.
    - */
    -static void smp_call_function_mask_quiesce_stack(cpumask_t mask)
    -{
    - struct call_single_data data;
    - int cpu;
    -
    - data.func = quiesce_dummy;
    - data.info = NULL;
    -
    - for_each_cpu_mask(cpu, mask) {
    - data.flags = CSD_FLAG_WAIT;
    - generic_exec_single(cpu, &data);
    - }
    -}
    -
    /**
    - * smp_call_function_mask(): Run a function on a set of other CPUs.
    - * @mask: The set of cpus to run on.
    + * smp_call_function_many(): Run a function on a set of other CPUs.
    + * @mask: The set of cpus to run on (only runs on online subset).
    * @func: The function to run. This must be fast and non-blocking.
    * @info: An arbitrary pointer to pass to the function.
    * @wait: If true, wait (atomically) until function has completed on other CPUs.
    - *
    - * Returns 0 on success, else a negative status code.
    *
    * If @wait is true, then returns once @func has returned. Note that @wait
    * will be implicitly turned on in case of allocation failures, since
    @@ -318,71 +280,68 @@ static void smp_call_function_mask_quies
    * hardware interrupt handler or from a bottom half handler. Preemption
    * must be disabled when calling this function.
    */
    -int smp_call_function_mask(cpumask_t mask, void (*func)(void *), void *info,
    - int wait)
    +void smp_call_function_many(const struct cpumask *mask,
    + void (*func)(void *), void *info,
    + bool wait)
    {
    - struct call_function_data d;
    - struct call_function_data *data = NULL;
    - cpumask_t allbutself;
    + struct call_function_data *data;
    unsigned long flags;
    - int cpu, num_cpus;
    - int slowpath = 0;
    + int cpu, next_cpu;

    /* Can deadlock when called with interrupts disabled */
    WARN_ON(irqs_disabled());

    - cpu = smp_processor_id();
    - allbutself = cpu_online_map;
    - cpu_clear(cpu, allbutself);
    - cpus_and(mask, mask, allbutself);
    - num_cpus = cpus_weight(mask);
    + /* So, what's a CPU they want? Ignoring this one. */
    + cpu = cpumask_first_and(mask, cpu_online_mask);
    + if (cpu == smp_processor_id())
    + cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
    + /* No online cpus? We're done. */
    + if (cpu >= nr_cpu_ids)
    + return;

    - /*
    - * If zero CPUs, return. If just a single CPU, turn this request
    - * into a targetted single call instead since it's faster.
    - */
    - if (!num_cpus)
    - return 0;
    - else if (num_cpus == 1) {
    - cpu = first_cpu(mask);
    - return smp_call_function_single(cpu, func, info, wait);
    - }
    + /* Do we have another CPU which isn't us? */
    + next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
    + if (next_cpu == smp_processor_id())
    + next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);

    - data = kmalloc(sizeof(*data), GFP_ATOMIC);
    - if (data) {
    - data->csd.flags = CSD_FLAG_ALLOC;
    - if (wait)
    - data->csd.flags |= CSD_FLAG_WAIT;
    - } else {
    - data = &d;
    - data->csd.flags = CSD_FLAG_WAIT;
    - wait = 1;
    - slowpath = 1;
    + /* Fastpath: do that cpu by itself. */
    + if (next_cpu >= nr_cpu_ids)
    + smp_call_function_single(cpu, func, info, wait);
    +
    + data = kmalloc(sizeof(*data) + cpumask_size(), GFP_ATOMIC);
    + if (unlikely(!data)) {
    + /* Slow path. */
    + for_each_online_cpu(cpu) {
    + if (cpu == smp_processor_id())
    + continue;
    + if (cpumask_test_cpu(cpu, mask))
    + smp_call_function_single(cpu, func, info, wait);
    + }
    + return;
    }

    spin_lock_init(&data->lock);
    + data->csd.flags = CSD_FLAG_ALLOC;
    + if (wait)
    + data->csd.flags |= CSD_FLAG_WAIT;
    data->csd.func = func;
    data->csd.info = info;
    - data->refs = num_cpus;
    - data->cpumask = mask;
    + cpumask_and(to_cpumask(data->cpumask_bits), mask, cpu_online_mask);
    + cpumask_clear_cpu(smp_processor_id(), to_cpumask(data->cpumask_bits));
    + data->refs = cpumask_weight(to_cpumask(data->cpumask_bits));

    spin_lock_irqsave(&call_function_lock, flags);
    list_add_tail_rcu(&data->csd.list, &call_function_queue);
    spin_unlock_irqrestore(&call_function_lock, flags);

    /* Send a message to all CPUs in the map */
    - arch_send_call_function_ipi(mask);
    + arch_send_call_function_ipi(*to_cpumask(data->cpumask_bits));

    /* optionally wait for the CPUs to complete */
    - if (wait) {
    + if (wait)
    csd_flag_wait(&data->csd);
    - if (unlikely(slowpath))
    - smp_call_function_mask_quiesce_stack(mask);
    - }
    -
    - return 0;
    }
    -EXPORT_SYMBOL(smp_call_function_mask);
    +EXPORT_SYMBOL(smp_call_function_many);

    /**
    * smp_call_function(): Run a function on all other CPUs.
    @@ -390,7 +349,7 @@ EXPORT_SYMBOL(smp_call_function_mask);
    * @info: An arbitrary pointer to pass to the function.
    * @wait: If true, wait (atomically) until function has completed on other CPUs.
    *
    - * Returns 0 on success, else a negative status code.
    + * Returns 0.
    *
    * If @wait is true, then returns once @func has returned; otherwise
    * it returns just before the target cpu calls @func. In case of allocation
    @@ -401,12 +360,10 @@ EXPORT_SYMBOL(smp_call_function_mask);
    */
    int smp_call_function(void (*func)(void *), void *info, int wait)
    {
    - int ret;
    -
    preempt_disable();
    - ret = smp_call_function_mask(cpu_online_map, func, info, wait);
    + smp_call_function_many(cpu_online_mask, func, info, wait);
    preempt_enable();
    - return ret;
    + return 0;
    }
    EXPORT_SYMBOL(smp_call_function);

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/
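The target selection at the top of the patched smp_call_function_many() can be hard to follow in diff form, so here is a minimal userspace sketch of it. It is not the kernel code: a single 64-bit word stands in for struct cpumask, and NR_IDS, next_and() and classify() are hypothetical names invented for the illustration. The sketch also assumes the single-CPU fast path is meant to return once the targeted call has been made.

```c
#include <stdint.h>

#define NR_IDS 64 /* hypothetical stand-in for nr_cpu_ids */

/*
 * Hypothetical helper modelled on cpumask_next_and(): index of the first
 * bit set in both a and b strictly after prev (pass -1 to start at bit 0),
 * or NR_IDS if there is none.
 */
static int next_and(int prev, uint64_t a, uint64_t b)
{
    uint64_t both = a & b;

    for (int i = prev + 1; i < NR_IDS; i++)
        if (both & (UINT64_C(1) << i))
            return i;
    return NR_IDS;
}

enum path { PATH_NONE, PATH_SINGLE, PATH_MANY };

/*
 * Decide which path a request takes: nothing to do, one targeted call,
 * or a queued multi-CPU call. On PATH_SINGLE, *target is the one CPU.
 */
static enum path classify(uint64_t mask, uint64_t online, int self,
                          int *target)
{
    /* First requested online CPU, ignoring the caller's own CPU. */
    int cpu = next_and(-1, mask, online);
    if (cpu == self)
        cpu = next_and(cpu, mask, online);
    if (cpu >= NR_IDS)
        return PATH_NONE;

    /* Is there a second requested online CPU that is not the caller? */
    int next = next_and(cpu, mask, online);
    if (next == self)
        next = next_and(next, mask, online);
    if (next >= NR_IDS) {
        *target = cpu; /* assumed: call this CPU alone, then return */
        return PATH_SINGLE;
    }
    return PATH_MANY;
}
```

With CPUs 0-3 online and the caller on CPU 1, a request for CPUs 1 and 2 reduces to a single targeted call to CPU 2, while a request for all four CPUs takes the multi-CPU path.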

  6. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data


    * Rusty Russell wrote:

    > On Friday 24 October 2008 15:47:20 Hiroshi Shimamoto wrote:
    > > From: Hiroshi Shimamoto

    >
    > Ingo, because of these concerns I recommend you revert
    > d4de5ac3b5e70928c86e3e5ac311f16cbf2e9ab3 (cpumask:
    > smp_call_function_many()) for now, and apply this less contentious
    > version.


    ok - applied it to tip/cpus4096-v2, thanks Rusty!

    If there's any chance for this in v2.6.28 then only if we disable the
    dynamic API branch altogether [CONFIG_MAXCPUS] and keep that for
    v2.6.29. This means we'd bring in the API changes which should have
    trivial impact only - and none of the riskier changes.

    Hm?

    Andrew, have we missed the boat on this?

    Ingo

  7. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data


    * Ingo Molnar wrote:

    >
    > * Rusty Russell wrote:
    >
    > > On Friday 24 October 2008 15:47:20 Hiroshi Shimamoto wrote:
    > > > From: Hiroshi Shimamoto

    > >
    > > Ingo, because of these concerns I recommend you revert
    > > d4de5ac3b5e70928c86e3e5ac311f16cbf2e9ab3 (cpumask:
    > > smp_call_function_many()) for now, and apply this less contentious
    > > version.

    >
    > ok - applied it to tip/cpus4096-v2, thanks Rusty!
    >
    > If there's any chance for this in v2.6.28 then only if we disable
    > the dynamic API branch altogether [CONFIG_MAXCPUS] and keep that for
    > v2.6.29. This means we'd bring in the API changes which should have
    > trivial impact only - and none of the riskier changes.


    in any case, i've started testing tip/cpus4096-v2 again on x86 - the
    problem with d4de5a above was the only outstanding known issue, right?

    Ingo

  8. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data


    * Ingo Molnar wrote:

    > > If there's any chance for this in v2.6.28 then only if we disable
    > > the dynamic API branch altogether [CONFIG_MAXCPUS] and keep that
    > > for v2.6.29. This means we'd bring in the API changes which should
    > > have trivial impact only - and none of the riskier changes.

    >
    > in any case, i've started testing tip/cpus4096-v2 again on x86 - the
    > problem with d4de5a above was the only outstanding known issue,
    > right?


    plus there are also the fixlets below.

    Ingo

    -------------->
    From a14b735fb7a82bb6561449dda4067365af7ee95c Mon Sep 17 00:00:00 2001
    From: Rusty Russell
    Date: Thu, 23 Oct 2008 14:35:31 -0700
    Subject: [PATCH] cpumask: fixlets

    [ from Mike ]

    Here are the only changes I could find from Rusty's last patches that
    apply to tip/cpus4096-v2.

    * Fix NR_CPUS reference in arch/powerpc/platforms/cell/spu_base.c

    * modify arch/x86/Kconfig so CONFIG_NR_CPUS is always defined. Also it
    does not prompt if MAXSMP is set.

    * change include/linux/threads.h so CONFIG_NR_CPUS is defined for those
    arch's that do not define it.

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar
    ---
    arch/powerpc/platforms/cell/spu_base.c | 9 ++++++---
    arch/x86/Kconfig | 2 +-
    include/linux/threads.h | 16 ++++++++--------
    3 files changed, 15 insertions(+), 12 deletions(-)

    diff --git a/arch/powerpc/platforms/cell/spu_base.c b/arch/powerpc/platforms/cell/spu_base.c
    index a5bdb89..a876904 100644
    --- a/arch/powerpc/platforms/cell/spu_base.c
    +++ b/arch/powerpc/platforms/cell/spu_base.c
    @@ -111,10 +111,13 @@ void spu_flush_all_slbs(struct mm_struct *mm)
    */
    static inline void mm_needs_global_tlbie(struct mm_struct *mm)
    {
    - int nr = (NR_CPUS > 1) ? NR_CPUS : NR_CPUS + 1;
    -
    /* Global TLBIE broadcast required with SPEs. */
    - __cpus_setall(&mm->cpu_vm_mask, nr);
    + if (NR_CPUS > 1)
    + cpumask_setall(&mm->cpu_vm_mask);
    + else {
    + cpumask_set_cpu(0, &mm->cpu_vm_mask);
    + cpumask_set_cpu(1, &mm->cpu_vm_mask);
    + }
    }

    void spu_associate_mm(struct spu *spu, struct mm_struct *mm)
    diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
    index d574cd6..a901f59 100644
    --- a/arch/x86/Kconfig
    +++ b/arch/x86/Kconfig
    @@ -585,9 +585,9 @@ config MAXSMP
    If unsure, say N.

    config NR_CPUS
    - depends on SMP
    int "Maximum number of CPUs" if SMP && !MAXSMP
    range 2 512 if SMP && !MAXSMP
    + default "1" if !SMP
    default "4096" if MAXSMP
    default "32" if X86_NUMAQ || X86_SUMMIT || X86_BIGSMP || X86_ES7000
    default "8"
    diff --git a/include/linux/threads.h b/include/linux/threads.h
    index 38d1a5d..052b12b 100644
    --- a/include/linux/threads.h
    +++ b/include/linux/threads.h
    @@ -8,17 +8,17 @@
    */

    /*
    - * Maximum supported processors that can run under SMP. This value is
    - * set via configure setting. The maximum is equal to the size of the
    - * bitmasks used on that platform, i.e. 32 or 64. Setting this smaller
    - * saves quite a bit of memory.
    + * Maximum supported processors. Setting this smaller saves quite a
    + * bit of memory. Use nr_cpu_ids instead of this except for static bitmaps.
    */
    -#ifdef CONFIG_SMP
    -#define NR_CPUS CONFIG_NR_CPUS
    -#else
    -#define NR_CPUS 1
    +#ifndef CONFIG_NR_CPUS
    +/* FIXME: This should be fixed in the arch's Kconfig */
    +#define CONFIG_NR_CPUS 1
    #endif

    +/* Places which use this should consider cpumask_var_t. */
    +#define NR_CPUS CONFIG_NR_CPUS
    +
    #define MIN_THREADS_LEFT_FOR_ROOT 4

    /*
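The include/linux/threads.h hunk above boils down to a small preprocessor pattern, sketched here in standalone C. Only the CONFIG_NR_CPUS / NR_CPUS lines come from the patch; LONG_BITS, LONGS_FOR(), struct static_mask, runtime_ids and mask_weight() are hypothetical names used to show why a static bitmap is sized by NR_CPUS while loops are bounded by a runtime count (the role nr_cpu_ids plays in the kernel).

```c
#include <limits.h>
#include <stddef.h>

/* Fallback from the patch: arches that do not set it get 1. */
#ifndef CONFIG_NR_CPUS
#define CONFIG_NR_CPUS 1
#endif

#define NR_CPUS CONFIG_NR_CPUS

/* Hypothetical helpers: number of longs needed to hold n bits. */
#define LONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define LONGS_FOR(n) (((n) + LONG_BITS - 1) / LONG_BITS)

/* A static bitmap has to be sized at compile time, hence NR_CPUS. */
struct static_mask {
    unsigned long bits[LONGS_FOR(NR_CPUS)];
};

/* Hypothetical stand-in for the runtime CPU count (nr_cpu_ids). */
static size_t runtime_ids = 1;

/* Count set bits, scanning only the ids that can exist at runtime. */
static int mask_weight(const struct static_mask *m)
{
    int weight = 0;

    for (size_t cpu = 0; cpu < runtime_ids && cpu < NR_CPUS; cpu++)
        if (m->bits[cpu / LONG_BITS] & (1UL << (cpu % LONG_BITS)))
            weight++;
    return weight;
}
```

Built without -DCONFIG_NR_CPUS=..., NR_CPUS falls back to 1 and the bitmap occupies a single unsigned long.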

  9. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data


    * Ingo Molnar wrote:

    > in any case, i've started testing tip/cpus4096-v2 again on x86 - the
    > problem with d4de5a above was the only outstanding known issue, right?


    the sched_init() slab corruption bug is still there, i just triggered it
    on two separate test-systems:

    [ 0.510620] CPU1 attaching sched-domain:
    [ 0.512007] domain 0: span 0-1 level CPU
    [ 0.517730] groups: 1 0
    [ 0.520528] =============================================================================
    [ 0.524002] BUG kmalloc-8: Wrong object count. Counter is 11 but counted were 50
    [ 0.524002] -----------------------------------------------------------------------------
    [ 0.524002]

    i've pushed out that specific tree: tip/tmp.cpus4096-v2.broken, which
    you should be able to reproduce via:

    git remote update
    git checkout -b tmp.test tip/tmp.cpus4096-v2.broken

    config attached. This config should just run through a 'make oldconfig'
    fine and if the bzImage is built, it should produce the slab corruption
    messages on any typical 64-bit PC. I've attached the boot log below as
    well. (You can see the gcc version i used in the bootup log as well.)

    I'll disable MAXSMP in these testruns - just to establish the stability
    without any of the dynamic-cpumask_t stuff.

    Ingo
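The attached boot log also reports "BUG kmalloc-8: Redzone overwritten" with "First byte 0x0 instead of 0xcc". As a rough illustration of what that check means, here is a userspace sketch, not the SLUB implementation: an 8-byte object is followed by a guard area filled with 0xcc, and a write that runs past the object is caught when the guard is inspected. struct slot, slot_init(), slot_check() and clear_bytes() are hypothetical names, and the sketch makes no claim about which kernel code path performed the overrun.

```c
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE     8    /* payload size, as in the kmalloc-8 cache */
#define REDZONE_SIZE 8    /* guard bytes placed right after the payload */
#define REDZONE_BYTE 0xcc /* fill value quoted in the report */

/* Hypothetical layout: one object followed by its guard area. */
struct slot {
    unsigned char object[OBJ_SIZE];
    unsigned char redzone[REDZONE_SIZE];
};

/* Zero the payload and arm the guard area. */
static void slot_init(struct slot *s)
{
    memset(s->object, 0, OBJ_SIZE);
    memset(s->redzone, REDZONE_BYTE, REDZONE_SIZE);
}

/* Offset of the first damaged guard byte, or -1 if the guard is intact. */
static int slot_check(const struct slot *s)
{
    for (int i = 0; i < REDZONE_SIZE; i++)
        if (s->redzone[i] != REDZONE_BYTE)
            return i;
    return -1;
}

/*
 * Clear nbytes starting at the object. Anything above OBJ_SIZE spills
 * into the guard area; the length is capped at the slot size so the
 * sketch itself never writes outside the struct.
 */
static void clear_bytes(struct slot *s, size_t nbytes)
{
    if (nbytes > sizeof(*s))
        nbytes = sizeof(*s);
    memset((unsigned char *)s, 0, nbytes);
}
```

Clearing exactly OBJ_SIZE bytes leaves the guard intact, while clearing 16 bytes zeroes it and slot_check() returns 0, the analogue of the first guard byte reading 0x0 instead of 0xcc.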

    [ 0.000000] BIOS EBDA/lowmem at: 0009f800/0009f800
    [ 0.000000] Initializing cgroup subsys cpuset
    [ 0.000000] Linux version 2.6.28-rc2-tip-00767-gd1142e8 (mingo@dione) (gcc version 4.2.3) #45748 SMP Mon Oct 27 14:24:20 CET 2008
    [ 0.000000] Command line: root=/dev/sda6 earlyprintk=serial,ttyS0,115200,keep console=tty debug initcall_debug apic=verbose sysrq_always_enabled ignore_loglevel selinux=0 nmi_watchdog=2 idle=poll panic=1
    [ 0.000000] KERNEL supported cpus:
    [ 0.000000] Intel GenuineIntel
    [ 0.000000] AMD AuthenticAMD
    [ 0.000000] Centaur CentaurHauls
    [ 0.000000] BIOS-provided physical RAM map:
    [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
    [ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
    [ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
    [ 0.000000] BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
    [ 0.000000] BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
    [ 0.000000] BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
    [ 0.000000] BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
    [ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
    [ 0.000000] console [earlyser0] enabled
    [ 0.000000] debug: ignoring loglevel setting.
    [ 0.000000] using polling idle threads.
    [ 0.000000] DMI 2.3 present.
    [ 0.000000] Phoenix BIOS detected: BIOS may corrupt low RAM, working it around.
    [ 0.000000] last_pfn = 0x3fff0 max_arch_pfn = 0x3ffffffff
    [ 0.000000] init_memory_mapping
    [ 0.000000] 0000000000 - 003fff0000 page 4k
    [ 0.000000] kernel direct mapping tables up to 3fff0000 @ 29c8000-2bca000
    [ 0.000000] last_map_addr: 3fff0000 end: 3fff0000
    [ 0.000000] ACPI: RSDP 000F76F0, 0014 (r0 Nvidia)
    [ 0.000000] ACPI: RSDT 3FFF3040, 0034 (r1 Nvidia AWRDACPI 42302E31 AWRD 0)
    [ 0.000000] ACPI: FACP 3FFF30C0, 0074 (r1 Nvidia AWRDACPI 42302E31 AWRD 0)
    [ 0.000000] ACPI: DSDT 3FFF3180, 6264 (r1 NVIDIA AWRDACPI 1000 MSFT 100000E)
    [ 0.000000] ACPI: FACS 3FFF0000, 0040
    [ 0.000000] ACPI: SRAT 3FFF9500, 00A0 (r1 AMD HAMMER 1 AMD 1)
    [ 0.000000] ACPI: MCFG 3FFF9600, 003C (r1 Nvidia AWRDACPI 42302E31 AWRD 0)
    [ 0.000000] ACPI: APIC 3FFF9440, 007C (r1 Nvidia AWRDACPI 42302E31 AWRD 0)
    [ 0.000000] ACPI: Local APIC address 0xfee00000
    [ 0.000000] (5 early reservations) ==> bootmem [0000000000 - 003fff0000]
    [ 0.000000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
    [ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
    [ 0.000000] #2 [0000200000 - 00029c7e38] TEXT DATA BSS ==> [0000200000 - 00029c7e38]
    [ 0.000000] #3 [000009f800 - 0000100000] BIOS reserved ==> [000009f800 - 0000100000]
    [ 0.000000] #4 [00029c8000 - 0002bc8000] PGTABLE ==> [00029c8000 - 0002bc8000]
    [ 0.000000] Scan SMP from ffff880000000000 for 1024 bytes.
    [ 0.000000] Scan SMP from ffff88000009fc00 for 1024 bytes.
    [ 0.000000] Scan SMP from ffff8800000f0000 for 65536 bytes.
    [ 0.000000] found SMP MP-table at [ffff8800000f5680] 000f5680
    [ 0.000000] [ffffe20000000000-ffffe200019fffff] PMD -> [ffff880002e00000-ffff8800047fffff] on node 0
    [ 0.000000] Zone PFN ranges:
    [ 0.000000] DMA 0x00000010 -> 0x00001000
    [ 0.000000] DMA32 0x00001000 -> 0x00100000
    [ 0.000000] Normal 0x00100000 -> 0x00100000
    [ 0.000000] Movable zone start PFN for each node
    [ 0.000000] early_node_map[2] active PFN ranges
    [ 0.000000] 0: 0x00000010 -> 0x0000009f
    [ 0.000000] 0: 0x00000100 -> 0x0003fff0
    [ 0.000000] On node 0 totalpages: 262015
    [ 0.000000] DMA zone: 104 pages used for memmap
    [ 0.000000] DMA zone: 101 pages reserved
    [ 0.000000] DMA zone: 3778 pages, LIFO batch:0
    [ 0.000000] DMA32 zone: 6552 pages used for memmap
    [ 0.000000] DMA32 zone: 251480 pages, LIFO batch:31
    [ 0.000000] Normal zone: 0 pages used for memmap
    [ 0.000000] Movable zone: 0 pages used for memmap
    [ 0.000000] Nvidia board detected. Ignoring ACPI timer override.
    [ 0.000000] If you got timer trouble try acpi_use_timer_override
    [ 0.000000] ACPI: PM-Timer IO Port: 0x4008
    [ 0.000000] ACPI: Local APIC address 0xfee00000
    [ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
    [ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
    [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
    [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
    [ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
    [ 0.000000] IOAPIC[0]: apic_id 2, version 0, address 0xfec00000, GSI 0-23
    [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
    [ 0.000000] ACPI: BIOS IRQ0 pin2 override ignored.
    [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
    [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
    [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
    [ 0.000000] ACPI: IRQ9 used by override.
    [ 0.000000] ACPI: IRQ14 used by override.
    [ 0.000000] ACPI: IRQ15 used by override.
    [ 0.000000] Using ACPI (MADT) for SMP configuration information
    [ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
    [ 0.000000] mapped APIC to ffffffffff5fc000 (fee00000)
    [ 0.000000] mapped IOAPIC to ffffffffff5fb000 (fec00000)
    [ 0.000000] PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
    [ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
    [ 0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
    [ 0.000000] Allocating PCI resources starting at 50000000 (gap: 40000000:a0000000)
    [ 0.000000] PERCPU: Allocating 1900544 bytes of per cpu data
    [ 0.000000] NR_CPUS:4096 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
    [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 255258
    [ 0.000000] Kernel command line: root=/dev/sda6 earlyprintk=serial,ttyS0,115200,keep console=tty debug initcall_debug apic=verbose sysrq_always_enabled ignore_loglevel selinux=0 nmi_watchdog=2 idle=poll panic=1
    [ 0.000000] debug: sysrq always enabled.
    [ 0.000000] Initializing CPU#0
    [ 0.000000] RCU-based detection of stalled CPUs is enabled.
    [ 0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
    [ 0.000000] Fast TSC calibration using PIT
    [ 0.000000] Detected 2010.302 MHz processor.
    [ 0.004000] spurious 8259A interrupt: IRQ7.
    [ 0.004000] Console: colour VGA+ 80x25
    [ 0.004000] console [tty0] enabled
    [ 0.004000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
    [ 0.004000] ... MAX_LOCKDEP_SUBCLASSES: 8
    [ 0.004000] ... MAX_LOCK_DEPTH: 48
    [ 0.004000] ... MAX_LOCKDEP_KEYS: 8191
    [ 0.004000] ... CLASSHASH_SIZE: 4096
    [ 0.004000] ... MAX_LOCKDEP_ENTRIES: 8192
    [ 0.004000] ... MAX_LOCKDEP_CHAINS: 16384
    [ 0.004000] ... CHAINHASH_SIZE: 8192
    [ 0.004000] memory used by lock dependency info: 4351 kB
    [ 0.004000] per task-struct memory footprint: 2688 bytes
    [ 0.004000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
    [ 0.004000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
    [ 0.004000] allocated 10485760 bytes of page_cgroup
    [ 0.004000] please try cgroup_disable=memory option if you don't want
    [ 0.004000] Checking aperture...
    [ 0.004000] No AGP bridge found
    [ 0.004000] Node 0: aperture @ 20000000 size 32 MB
    [ 0.004000] Aperture pointing to e820 RAM. Ignoring.
    [ 0.004000] Memory: 962348k/1048512k available (8300k kernel code, 452k absent, 85124k reserved, 12145k data, 2444k init)
    [ 0.004000] SLUB: Genslabs=12, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
    [ 0.004018] Calibrating delay loop (skipped), value calculated using timer frequency.. 4020.60 BogoMIPS (lpj=8041208)
    [ 0.012108] Security Framework initialized
    [ 0.016012] SELinux: Initializing.
    [ 0.020150] SELinux: Starting in permissive mode
    [ 0.024095] Mount-cache hash table entries: 256
    [ 0.033317] Initializing cgroup subsys ns
    [ 0.036024] Initializing cgroup subsys memory
    [ 0.040041] Initializing cgroup subsys devices
    [ 0.044028] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
    [ 0.048007] CPU: L2 Cache: 512K (64 bytes/line)
    [ 0.052007] tseg: 0000000000
    [ 0.056024] CPU: Physical Processor ID: 0
    [ 0.060007] CPU: Processor Core ID: 0
    [ 0.065643] ACPI: Core revision 20080926
    [ 0.105845] ftrace: converting mcount calls to 0f 1f 44 00 00
    [ 0.108010] ftrace: allocating 24346 entries in 191 pages
    [ 0.116342] Setting APIC routing to flat
    [ 0.120010] enabled ExtINT on CPU#0
    [ 0.124175] ENABLING IO-APIC IRQs
    [ 0.128006] init IO_APIC IRQs
    [ 0.130950] IOAPIC[0]: Set routing entry (2-0 -> 0x30 -> IRQ 0 Mode:0 Active:0)
    [ 0.132014] IOAPIC[0]: Set routing entry (2-1 -> 0x31 -> IRQ 1 Mode:0 Active:0)
    [ 0.136012] IOAPIC[0]: Set routing entry (2-3 -> 0x33 -> IRQ 3 Mode:0 Active:0)
    [ 0.140011] IOAPIC[0]: Set routing entry (2-4 -> 0x34 -> IRQ 4 Mode:0 Active:0)
    [ 0.144011] IOAPIC[0]: Set routing entry (2-5 -> 0x35 -> IRQ 5 Mode:0 Active:0)
    [ 0.148011] IOAPIC[0]: Set routing entry (2-6 -> 0x36 -> IRQ 6 Mode:0 Active:0)
    [ 0.152011] IOAPIC[0]: Set routing entry (2-7 -> 0x37 -> IRQ 7 Mode:0 Active:0)
    [ 0.156011] IOAPIC[0]: Set routing entry (2-8 -> 0x38 -> IRQ 8 Mode:0 Active:0)
    [ 0.160011] IOAPIC[0]: Set routing entry (2-9 -> 0x39 -> IRQ 9 Mode:1 Active:0)
    [ 0.164012] IOAPIC[0]: Set routing entry (2-10 -> 0x3a -> IRQ 10 Mode:0 Active:0)
    [ 0.168011] IOAPIC[0]: Set routing entry (2-11 -> 0x3b -> IRQ 11 Mode:0 Active:0)
    [ 0.172011] IOAPIC[0]: Set routing entry (2-12 -> 0x3c -> IRQ 12 Mode:0 Active:0)
    [ 0.176011] IOAPIC[0]: Set routing entry (2-13 -> 0x3d -> IRQ 13 Mode:0 Active:0)
    [ 0.180011] IOAPIC[0]: Set routing entry (2-14 -> 0x3e -> IRQ 14 Mode:0 Active:0)
    [ 0.184011] IOAPIC[0]: Set routing entry (2-15 -> 0x3f -> IRQ 15 Mode:0 Active:0)
    [ 0.188010] 2-16 2-17 2-18 2-19 2-20 2-21 2-22 2-23 (apicid-pin) not connected
    [ 0.195539] ..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
    [ 0.237588] CPU0: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
    [ 0.244005] Using local APIC timer interrupts.
    [ 0.244007] calibrating APIC timer ...
    [ 0.252001] ... lapic delta = 1256509
    [ 0.252001] ... PM timer delta = 357974
    [ 0.252001] ... PM timer result ok
    [ 0.252001] ..... delta 1256509
    [ 0.252001] ..... mult: 53963277
    [ 0.252001] ..... calibration result: 804165
    [ 0.252001] ..... CPU clock speed is 2010.1658 MHz.
    [ 0.252001] ..... host bus clock speed is 201.0165 MHz.
    [ 0.252025] calling migration_init+0x0/0x5b @ 1
    [ 0.256159] initcall migration_init+0x0/0x5b returned 1 after 0 usecs
    [ 0.260007] initcall migration_init+0x0/0x5b returned with error code 1
    [ 0.264005] calling spawn_ksoftirqd+0x0/0x58 @ 1
    [ 0.268128] initcall spawn_ksoftirqd+0x0/0x58 returned 0 after 0 usecs
    [ 0.272007] calling init_call_single_data+0x0/0x78 @ 1
    [ 0.276007] initcall init_call_single_data+0x0/0x78 returned 0 after 0 usecs
    [ 0.280006] calling relay_init+0x0/0x14 @ 1
    [ 0.284007] initcall relay_init+0x0/0x14 returned 0 after 0 usecs
    [ 0.288006] calling tracer_alloc_buffers+0x0/0x170 @ 1
    [ 0.293945] initcall tracer_alloc_buffers+0x0/0x170 returned 0 after 0 usecs
    [ 0.296292] lockdep: fixing up alternatives.
    [ 0.300181] Booting processor 1 APIC 0x1 ip 0x6000
    [ 0.004000] Initializing CPU#1
    [ 0.004000] masked ExtINT on CPU#1
    [ 0.004000] Calibrating delay using timer specific routine.. 4020.87 BogoMIPS (lpj=8041759)
    [ 0.004000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
    [ 0.004000] CPU: L2 Cache: 512K (64 bytes/line)
    [ 0.004000] CPU: Physical Processor ID: 0
    [ 0.004000] CPU: Processor Core ID: 1
    [ 0.396174] CPU1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
    [ 0.408037] Brought up 2 CPUs
    [ 0.412006] Total of 2 processors activated (8041.48 BogoMIPS).
    [ 0.416080] Testing NMI watchdog ... OK.
    [ 0.501354] CPU0 attaching sched-domain:
    [ 0.504009] domain 0: span 0-1 level CPU
    [ 0.508005] groups: 0 1
    [ 0.510620] CPU1 attaching sched-domain:
    [ 0.512007] domain 0: span 0-1 level CPU
    [ 0.517730] groups: 1 0
    [ 0.520528] =============================================================================
    [ 0.524002] BUG kmalloc-8: Wrong object count. Counter is 11 but counted were 50
    [ 0.524002] -----------------------------------------------------------------------------
    [ 0.524002]
    [ 0.524002] INFO: Slab 0xffffe200019cc270 objects=51 used=11 fp=0xffff88003f806370 flags=0x40000000000000c3
    [ 0.524002] Pid: 1, comm: swapper Not tainted 2.6.28-rc2-tip-00767-gd1142e8 #45748
    [ 0.524002] Call Trace:
    [ 0.524002] [] slab_err+0x99/0xa7
    [ 0.524002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.524002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.524002] [] ? rq_attach_root+0xc0/0xc9
    [ 0.524002] [] ? cpu_attach_domain+0x5a2/0x5d7
    [ 0.524002] [] ? slab_pad_check+0xa7/0x11f
    [ 0.524002] [] on_freelist+0x1bd/0x1ff
    [ 0.524002] [] __slab_free+0x1a5/0x2fc
    [ 0.524002] [] ? free_cpumask_var+0x9/0xb
    [ 0.524002] [] kfree+0xf0/0x128
    [ 0.524002] [] ? free_cpumask_var+0x9/0xb
    [ 0.524002] [] free_cpumask_var+0x9/0xb
    [ 0.524002] [] __build_sched_domains+0x5de/0x616
    [ 0.524002] [] sched_init_smp+0xa0/0x23a
    [ 0.524002] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 0.524002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.524002] [] ? check_bytes_and_report+0x3d/0xcc
    [ 0.524002] [] ? check_object+0x15a/0x20b
    [ 0.524002] [] ? init_object+0x6c/0x74
    [ 0.524002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.524002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.524002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.524002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.524002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.524002] [] ? native_smp_cpus_done+0x177/0x182
    [ 0.524002] [] ? cpu_maps_update_done+0x15/0x17
    [ 0.524002] [] kernel_init+0x192/0x216
    [ 0.524002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.524002] [] child_rip+0xa/0x11
    [ 0.524002] [] ? restore_args+0x0/0x30
    [ 0.524002] [] ? kernel_init+0x0/0x216
    [ 0.524002] [] ? child_rip+0x0/0x11
    [ 0.524002] FIX kmalloc-8: Object count adjusted.
    [ 0.524002] =============================================================================
    [ 0.524002] BUG kmalloc-8: Redzone overwritten
    [ 0.524002] -----------------------------------------------------------------------------
    [ 0.524002]
    [ 0.524002] INFO: 0xffff88003f806328-0xffff88003f80632f. First byte 0x0 instead of 0xcc
    [ 0.524002] INFO: Slab 0xffffe200019cc270 objects=51 used=50 fp=0xffff88003f806370 flags=0x40000000000000c3
    [ 0.524002] INFO: Object 0xffff88003f806320 @offset=800 fp=0x0000000000000000
    [ 0.524002]
    [ 0.524002] Bytes b4 0xffff88003f806310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 0.524002] Object 0xffff88003f806320: 02 00 00 00 00 00 00 00 ........
    [ 0.524002] Redzone 0xffff88003f806328: 00 00 00 00 00 00 00 00 ........
    [ 0.524002] Padding 0xffff88003f806368: 00 00 00 00 00 00 00 00 ........
    [ 0.524002] Pid: 1, comm: swapper Not tainted 2.6.28-rc2-tip-00767-gd1142e8 #45748
    [ 0.524002] Call Trace:
    [ 0.524002] [] print_trailer+0x11c/0x125
    [ 0.524002] [] check_bytes_and_report+0xa5/0xcc
    [ 0.524002] [] check_object+0x61/0x20b
    [ 0.524002] [] __slab_free+0x1c3/0x2fc
    [ 0.524002] [] ? free_cpumask_var+0x9/0xb
    [ 0.524002] [] kfree+0xf0/0x128
    [ 0.524002] [] ? free_cpumask_var+0x9/0xb
    [ 0.524002] [] free_cpumask_var+0x9/0xb
    [ 0.524002] [] __build_sched_domains+0x5de/0x616
    [ 0.524002] [] sched_init_smp+0xa0/0x23a
    [ 0.524002] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 0.524002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.524002] [] ? check_bytes_and_report+0x3d/0xcc
    [ 0.524002] [] ? check_object+0x15a/0x20b
    [ 0.524002] [] ? init_object+0x6c/0x74
    [ 0.524002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.524002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.524002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.524002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.524002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.524002] [] ? native_smp_cpus_done+0x177/0x182
    [ 0.524002] [] ? cpu_maps_update_done+0x15/0x17
    [ 0.524002] [] kernel_init+0x192/0x216
    [ 0.524002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.524002] [] child_rip+0xa/0x11
    [ 0.524002] [] ? restore_args+0x0/0x30
    [ 0.524002] [] ? kernel_init+0x0/0x216
    [ 0.524002] [] ? child_rip+0x0/0x11
    [ 0.524002] FIX kmalloc-8: Restoring 0xffff88003f806328-0xffff88003f80632f=0xcc
    [ 0.524002]
    [ 0.524006] =============================================================================
    [ 0.528002] BUG kmalloc-8: Redzone overwritten
    [ 0.528002] -----------------------------------------------------------------------------
    [ 0.528002]
    [ 0.528002] INFO: 0xffff88003f8062d8-0xffff88003f8062df. First byte 0x0 instead of 0xcc
    [ 0.528002] INFO: Slab 0xffffe200019cc270 objects=51 used=50 fp=0xffff88003f806370 flags=0x40000000000000c3
    [ 0.528002] INFO: Object 0xffff88003f8062d0 @offset=720 fp=0x0000000000000000
    [ 0.528002]
    [ 0.528002] Bytes b4 0xffff88003f8062c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 0.528002] Object 0xffff88003f8062d0: 03 00 00 00 00 00 00 00 ........
    [ 0.528002] Redzone 0xffff88003f8062d8: 00 00 00 00 00 00 00 00 ........
    [ 0.528002] Padding 0xffff88003f806318: 00 00 00 00 00 00 00 00 ........
    [ 0.528002] Pid: 1, comm: swapper Not tainted 2.6.28-rc2-tip-00767-gd1142e8 #45748
    [ 0.528002] Call Trace:
    [ 0.528002] [] print_trailer+0x11c/0x125
    [ 0.528002] [] check_bytes_and_report+0xa5/0xcc
    [ 0.528002] [] check_object+0x61/0x20b
    [ 0.528002] [] __slab_free+0x1c3/0x2fc
    [ 0.528002] [] ? free_cpumask_var+0x9/0xb
    [ 0.528002] [] kfree+0xf0/0x128
    [ 0.528002] [] ? free_cpumask_var+0x9/0xb
    [ 0.528002] [] free_cpumask_var+0x9/0xb
    [ 0.528002] [] __build_sched_domains+0x5e7/0x616
    [ 0.528002] [] sched_init_smp+0xa0/0x23a
    [ 0.528002] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 0.528002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.528002] [] ? check_bytes_and_report+0x3d/0xcc
    [ 0.528002] [] ? check_object+0x15a/0x20b
    [ 0.528002] [] ? init_object+0x6c/0x74
    [ 0.528002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.528002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.528002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.528002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.528002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.528002] [] ? native_smp_cpus_done+0x177/0x182
    [ 0.528002] [] ? cpu_maps_update_done+0x15/0x17
    [ 0.528002] [] kernel_init+0x192/0x216
    [ 0.528002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.528002] [] child_rip+0xa/0x11
    [ 0.528002] [] ? restore_args+0x0/0x30
    [ 0.528002] [] ? kernel_init+0x0/0x216
    [ 0.528002] [] ? child_rip+0x0/0x11
    [ 0.528002] FIX kmalloc-8: Restoring 0xffff88003f8062d8-0xffff88003f8062df=0xcc
    [ 0.528002]
    [ 0.528006] =============================================================================
    [ 0.532002] BUG kmalloc-8: Redzone overwritten
    [ 0.532002] -----------------------------------------------------------------------------
    [ 0.532002]
    [ 0.532002] INFO: 0xffff88003f806288-0xffff88003f80628f. First byte 0x0 instead of 0xcc
    [ 0.532002] INFO: Slab 0xffffe200019cc270 objects=51 used=50 fp=0xffff88003f806370 flags=0x40000000000000c3
    [ 0.532002] INFO: Object 0xffff88003f806280 @offset=640 fp=0x0000000000000000
    [ 0.532002]
    [ 0.532002] Bytes b4 0xffff88003f806270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 0.532002] Object 0xffff88003f806280: 00 00 00 00 00 00 00 00 ........
    [ 0.532002] Redzone 0xffff88003f806288: 00 00 00 00 00 00 00 00 ........
    [ 0.532002] Padding 0xffff88003f8062c8: 00 00 00 00 00 00 00 00 ........
    [ 0.532002] Pid: 1, comm: swapper Not tainted 2.6.28-rc2-tip-00767-gd1142e8 #45748
    [ 0.532002] Call Trace:
    [ 0.532002] [] print_trailer+0x11c/0x125
    [ 0.532002] [] check_bytes_and_report+0xa5/0xcc
    [ 0.532002] [] check_object+0x61/0x20b
    [ 0.532002] [] __slab_free+0x1c3/0x2fc
    [ 0.532002] [] ? free_cpumask_var+0x9/0xb
    [ 0.532002] [] kfree+0xf0/0x128
    [ 0.532002] [] ? free_cpumask_var+0x9/0xb
    [ 0.532002] [] free_cpumask_var+0x9/0xb
    [ 0.532002] [] __build_sched_domains+0x5f0/0x616
    [ 0.532002] [] sched_init_smp+0xa0/0x23a
    [ 0.532002] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 0.532002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.532002] [] ? check_bytes_and_report+0x3d/0xcc
    [ 0.532002] [] ? check_object+0x15a/0x20b
    [ 0.532002] [] ? init_object+0x6c/0x74
    [ 0.532002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.532002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.532002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.532002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.532002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.532002] [] ? native_smp_cpus_done+0x177/0x182
    [ 0.532002] [] ? cpu_maps_update_done+0x15/0x17
    [ 0.532002] [] kernel_init+0x192/0x216
    [ 0.532002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.532002] [] child_rip+0xa/0x11
    [ 0.532002] [] ? restore_args+0x0/0x30
    [ 0.532002] [] ? kernel_init+0x0/0x216
    [ 0.532002] [] ? child_rip+0x0/0x11
    [ 0.532002] FIX kmalloc-8: Restoring 0xffff88003f806288-0xffff88003f80628f=0xcc
    [ 0.532002]
    [ 0.532006] =============================================================================
    [ 0.536002] BUG kmalloc-8: Redzone overwritten
    [ 0.536002] -----------------------------------------------------------------------------
    [ 0.536002]
    [ 0.536002] INFO: 0xffff88003f806238-0xffff88003f80623f. First byte 0x0 instead of 0xcc
    [ 0.536002] INFO: Slab 0xffffe200019cc270 objects=51 used=50 fp=0xffff88003f806370 flags=0x40000000000000c3
    [ 0.536002] INFO: Object 0xffff88003f806230 @offset=560 fp=0x0000000000000000
    [ 0.536002]
    [ 0.536002] Bytes b4 0xffff88003f806220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 0.536002] Object 0xffff88003f806230: 00 00 00 00 00 00 00 00 ........
    [ 0.536002] Redzone 0xffff88003f806238: 00 00 00 00 00 00 00 00 ........
    [ 0.536002] Padding 0xffff88003f806278: 00 00 00 00 00 00 00 00 ........
    [ 0.536002] Pid: 1, comm: swapper Not tainted 2.6.28-rc2-tip-00767-gd1142e8 #45748
    [ 0.536002] Call Trace:
    [ 0.536002] [] print_trailer+0x11c/0x125
    [ 0.536002] [] check_bytes_and_report+0xa5/0xcc
    [ 0.536002] [] check_object+0x61/0x20b
    [ 0.536002] [] __slab_free+0x1c3/0x2fc
    [ 0.536002] [] ? free_cpumask_var+0x9/0xb
    [ 0.536002] [] kfree+0xf0/0x128
    [ 0.536002] [] ? free_cpumask_var+0x9/0xb
    [ 0.536002] [] free_cpumask_var+0x9/0xb
    [ 0.536002] [] __build_sched_domains+0x5f9/0x616
    [ 0.536002] [] sched_init_smp+0xa0/0x23a
    [ 0.536002] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 0.536002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.536002] [] ? check_bytes_and_report+0x3d/0xcc
    [ 0.536002] [] ? check_object+0x15a/0x20b
    [ 0.536002] [] ? init_object+0x6c/0x74
    [ 0.536002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.536002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.536002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.536002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.536002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.536002] [] ? native_smp_cpus_done+0x177/0x182
    [ 0.536002] [] ? cpu_maps_update_done+0x15/0x17
    [ 0.536002] [] kernel_init+0x192/0x216
    [ 0.536002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.536002] [] child_rip+0xa/0x11
    [ 0.536002] [] ? restore_args+0x0/0x30
    [ 0.536002] [] ? kernel_init+0x0/0x216
    [ 0.536002] [] ? child_rip+0x0/0x11
    [ 0.536002] FIX kmalloc-8: Restoring 0xffff88003f806238-0xffff88003f80623f=0xcc
    [ 0.536002]
    [ 0.536006] =============================================================================
    [ 0.540002] BUG kmalloc-8: Redzone overwritten
    [ 0.540002] -----------------------------------------------------------------------------
    [ 0.540002]
    [ 0.540002] INFO: 0xffff88003f8061e8-0xffff88003f8061ef. First byte 0x0 instead of 0xcc
    [ 0.540002] INFO: Slab 0xffffe200019cc270 objects=51 used=50 fp=0xffff88003f806370 flags=0x40000000000000c3
    [ 0.540002] INFO: Object 0xffff88003f8061e0 @offset=480 fp=0x0000000000000000
    [ 0.540002]
    [ 0.540002] Bytes b4 0xffff88003f8061d0: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
    [ 0.540002] Object 0xffff88003f8061e0: 03 00 00 00 00 00 00 00 ........
    [ 0.540002] Redzone 0xffff88003f8061e8: 00 00 00 00 00 00 00 00 ........
    [ 0.540002] Padding 0xffff88003f806228: 00 00 00 00 00 00 00 00 ........
    [ 0.540002] Pid: 1, comm: swapper Not tainted 2.6.28-rc2-tip-00767-gd1142e8 #45748
    [ 0.540002] Call Trace:
    [ 0.540002] [] print_trailer+0x11c/0x125
    [ 0.540002] [] check_bytes_and_report+0xa5/0xcc
    [ 0.540002] [] check_object+0x61/0x20b
    [ 0.540002] [] __slab_free+0x1c3/0x2fc
    [ 0.540002] [] ? free_cpumask_var+0x9/0xb
    [ 0.540002] [] kfree+0xf0/0x128
    [ 0.540002] [] ? free_cpumask_var+0x9/0xb
    [ 0.540002] [] free_cpumask_var+0x9/0xb
    [ 0.540002] [] __build_sched_domains+0x602/0x616
    [ 0.540002] [] sched_init_smp+0xa0/0x23a
    [ 0.540002] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 0.540002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.540002] [] ? check_bytes_and_report+0x3d/0xcc
    [ 0.540002] [] ? check_object+0x15a/0x20b
    [ 0.540002] [] ? init_object+0x6c/0x74
    [ 0.540002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.540002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.540002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.540002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.540002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.540002] [] ? native_smp_cpus_done+0x177/0x182
    [ 0.540002] [] ? cpu_maps_update_done+0x15/0x17
    [ 0.540002] [] kernel_init+0x192/0x216
    [ 0.540002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.540002] [] child_rip+0xa/0x11
    [ 0.540002] [] ? restore_args+0x0/0x30
    [ 0.540002] [] ? kernel_init+0x0/0x216
    [ 0.540002] [] ? child_rip+0x0/0x11
    [ 0.540002] FIX kmalloc-8: Restoring 0xffff88003f8061e8-0xffff88003f8061ef=0xcc
    [ 0.540002]
    [ 0.540009] =============================================================================
    [ 0.544002] BUG kmalloc-8: Redzone overwritten
    [ 0.544002] -----------------------------------------------------------------------------
    [ 0.544002]
    [ 0.544002] INFO: 0xffff88003f806378-0xffff88003f80637f. First byte 0x0 instead of 0xbb
    [ 0.544002] INFO: Slab 0xffffe200019cc270 objects=51 used=50 fp=0xffff88003f806370 flags=0x40000000000000c3
    [ 0.544002] INFO: Object 0xffff88003f806370 @offset=880 fp=0x0000000000000000
    [ 0.544002]
    [ 0.544002] Bytes b4 0xffff88003f806360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 0.544002] Object 0xffff88003f806370: 00 00 00 00 00 00 00 00 ........
    [ 0.544002] Redzone 0xffff88003f806378: 00 00 00 00 00 00 00 00 ........
    [ 0.544002] Padding 0xffff88003f8063b8: 00 00 00 00 00 00 00 00 ........
    [ 0.544002] Pid: 1, comm: swapper Not tainted 2.6.28-rc2-tip-00767-gd1142e8 #45748
    [ 0.544002] Call Trace:
    [ 0.544002] [] print_trailer+0x11c/0x125
    [ 0.544002] [] check_bytes_and_report+0xa5/0xcc
    [ 0.544002] [] ? register_sched_domain_sysctl+0xe0/0x439
    [ 0.544002] [] check_object+0x61/0x20b
    [ 0.544002] [] __slab_alloc+0x3f7/0x4fc
    [ 0.544002] [] ? register_sched_domain_sysctl+0xe0/0x439
    [ 0.544002] [] ? register_sched_domain_sysctl+0xe0/0x439
    [ 0.544002] [] __kmalloc_track_caller+0xa9/0x109
    [ 0.544002] [] kstrdup+0x2f/0xca
    [ 0.544002] [] register_sched_domain_sysctl+0xe0/0x439
    [ 0.544002] [] sched_init_smp+0xa5/0x23a
    [ 0.544002] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [ 0.544002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.544002] [] ? check_bytes_and_report+0x3d/0xcc
    [ 0.544002] [] ? check_object+0x15a/0x20b
    [ 0.544002] [] ? init_object+0x6c/0x74
    [ 0.544002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.544002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.544002] [] ? trace_hardirqs_on+0xd/0xf
    [ 0.544002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.544002] [] ? check_nmi_watchdog+0x20a/0x23b
    [ 0.544002] [] ? native_smp_cpus_done+0x177/0x182
    [ 0.544002] [] ? cpu_maps_update_done+0x15/0x17
    [ 0.544002] [] kernel_init+0x192/0x216
    [ 0.544002] [] ? trace_hardirqs_on_caller+0x11a/0x145
    [ 0.544002] [] child_rip+0xa/0x11
    [ 0.544002] [] ? restore_args+0x0/0x30
    [ 0.544002] [] ? kernel_init+0x0/0x216
    [ 0.544002] [] ? child_rip+0x0/0x11
    [ 0.544002] FIX kmalloc-8: Restoring 0xffff88003f806378-0xffff88003f80637f=0xbb
    [ 0.544002]
    [ 0.544002] FIX kmalloc-8: Marking all objects used
    [ 0.544278] device: 'platform': device_add
    [ 0.548046] PM: Adding info for No Bus:platform
    [ 0.552460] khelper used greatest stack depth: 5288 bytes left
    [ 0.556059] bus: 'platform': registered
    [ 0.560014] Registering sysdev class 'cpu'
    [ 0.568290] calling net_ns_init+0x0/0x140 @ 1
    [ 0.572006] net_namespace: 1112 bytes
    [ 0.576030] initcall net_ns_init+0x0/0x140 returned 0 after 3906 usecs
    [ 0.580019] calling init_smp_flush+0x0/0x72 @ 1
    [ 0.584008] initcall init_smp_flush+0x0/0x72 returned 0 after 0 usecs
    [ 0.588010] calling print_banner+0x0/0xe @ 1
    [ 0.592005] Booting paravirtualized kernel on bare hardware


  10. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data

    On Monday 27 October 2008 23:55:12 Ingo Molnar wrote:
    > * Rusty Russell wrote:
    > > On Friday 24 October 2008 15:47:20 Hiroshi Shimamoto wrote:
    > > > From: Hiroshi Shimamoto

    > >
    > > Ingo, because of these concerns I recommend you revert
    > > d4de5ac3b5e70928c86e3e5ac311f16cbf2e9ab3 (cpumask:
    > > smp_call_function_many()) for now, and apply this less contentious
    > > version.

    >
    > ok - applied it to tip/cpus4096-v2, thanks Rusty!
    >
    > If there's any chance for this in v2.6.28 then only if we disable the
    > dynamic API branch altogether [CONFIG_MAXCPUS] and keep that for
    > v2.6.29. This means we'd bring in the API changes which should have
    > trivial impact only - and none of the riskier changes.


    Agreed. The important thing is to get the new APIs in place so we can feed
    the updates to various maintainers (esp. arch maintainers).

    Thanks,
    Rusty.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data

    Ingo Molnar wrote:
    > * Ingo Molnar wrote:
    >
    >> in any case, i've started testing tip/cpus4096-v2 again on x86 - the
    >> problem with d4de5a above was the only outstanding known issue, right?

    >
    > the sched_init() slab corruption bug is still there, i just triggered it
    > on two separate test-systems:
    >
    > [ 0.510620] CPU1 attaching sched-domain:
    > [ 0.512007] domain 0: span 0-1 level CPU
    > [ 0.517730] groups: 1 0
    > [ 0.520528] =============================================================================
    > [ 0.524002] BUG kmalloc-8: Wrong object count. Counter is 11 but counted were 50
    > [ 0.524002] -----------------------------------------------------------------------------
    > [ 0.524002]


    Hm,

    I think kmalloc-8 is too small.
    In this case, struct cpumask is defined as:

    struct cpumask {
            DECLARE_BITMAP(bits, NR_CPUS);
    };

    So storing a cpumask such as cpu_core_map, cpu_sibling_map, sd->span, etc.
    requires NR_CPUS bits. In Ingo's config, that is 4096 bits.

    alloc_cpumask_var() uses cpumask_size() for the kmalloc():

    bool alloc_cpumask_var(cpumask_var_t *mask, gfp_t flags)
    {
            if (likely(slab_is_available()))
                    *mask = kmalloc(cpumask_size(), flags);

    cpumask_size() uses nr_cpumask_bits, which is defined as follows:

    #define nr_cpumask_bits nr_cpu_ids

    This is the CONFIG_NR_CPUS > BITS_PER_LONG case.
    And nr_cpu_ids is 2 in this boot log.

    ....
    > [ 0.000000] PERCPU: Allocating 1900544 bytes of per cpu data
    > [ 0.000000] NR_CPUS:4096 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1


    So alloc_cpumask_var() does kmalloc(8, flags) for the cpumask_var_t.
    But the content is treated as a cpumask_t, so the slab is corrupted by
    an overwrite when the mask data is copied.

    For example, in cpu_to_core_group():

    static int
    cpu_to_core_group(int cpu, const cpumask_t *cpu_map, struct sched_group **sg,
                      cpumask_t *mask)
    {
            int group;

            *mask = per_cpu(cpu_sibling_map, cpu);

    This copies 0x200 bytes (= 4096 bits). Compiled in my environment, it looks as follows:
    ffffffff80251c56 <cpu_to_core_group>:
    cpu_to_core_group():
    ffffffff80251c56: 55 push %rbp
    ffffffff80251c57: 48 63 ff movslq %edi,%rdi
    ffffffff80251c5a: 48 89 e5 mov %rsp,%rbp
    ffffffff80251c5d: 41 55 push %r13
    ffffffff80251c5f: 49 89 d5 mov %rdx,%r13
    ffffffff80251c62: ba 00 02 00 00 mov $0x200,%edx
    ffffffff80251c67: 41 54 push %r12
    ffffffff80251c69: 49 89 f4 mov %rsi,%r12
    ffffffff80251c6c: 48 c7 c6 00 c1 c8 81 mov $0xffffffff81c8c100,%rsi
    ffffffff80251c73: 53 push %rbx
    ffffffff80251c74: 48 89 cb mov %rcx,%rbx
    ffffffff80251c77: 48 83 ec 08 sub $0x8,%rsp
    ffffffff80251c7b: 48 8b 05 3e d0 98 01 mov 0x198d03e(%rip),%rax # ffffffff81bdecc0 <_cpu_pda>
    ffffffff80251c82: 48 8b 04 f8 mov (%rax,%rdi,8),%rax
    ffffffff80251c86: 48 89 cf mov %rcx,%rdi
    ffffffff80251c89: 48 03 70 08 add 0x8(%rax),%rsi
    ffffffff80251c8d: e8 de 29 25 00 callq ffffffff804a4670 <__memcpy>

    The third parameter of __memcpy is rdx = 0x200.

    So I guess we need
    kmalloc(BITS_TO_LONGS(NR_CPUS) * sizeof(long), flags)
    in alloc_cpumask_var().

    Or should we change the cpumask handling in sched.c etc.?
    I have no further ideas on this for now.

    thanks,
    Hiroshi Shimamoto

  12. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data

    On Tuesday 28 October 2008 10:07:26 Hiroshi Shimamoto wrote:
    > So alloc_cpumask_var() does kmalloc(8, flags) for the cpumask_var_t.
    > But the content is treated as a cpumask_t, so the slab is corrupted by
    > an overwrite when the mask data is copied.


    Yes. This is another one.

    Ingo, please fold.

    Subject: Fix slab corruption when using CONFIG_CPUMASK_OFFSTACK

    Found by Hiroshi Shimamoto; using assignment on a cpumask_t causes slab
    corruption, as we do not allocate sizeof(struct cpumask).

    It looks like we will have to stick with allocating all NR_CPUS bits until the
    end of the patchset (in the future) where we have eliminated all the
    cpumask_t assignments.

    (Note: this ban will be enforced by compiler when we get rid of the 'struct
    cpumask' definition, which is what we're slowly working towards).

    Signed-off-by: Rusty Russell

    diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
    index d1f22ee..004da56 100644
    --- a/include/linux/cpumask.h
    +++ b/include/linux/cpumask.h
    @@ -182,7 +182,8 @@ int cpumask_any_but(const struct cpumask *mask, unsigned int cpu);

    static inline size_t cpumask_size(void)
    {
    - return BITS_TO_LONGS(nr_cpumask_bits) * sizeof(long);
    + /* FIXME: Use nr_cpumask_bits once all cpumask_t assignments banished */
    + return BITS_TO_LONGS(NR_CPUS) * sizeof(long);
    }

    /* Deprecated. */


  13. Re: [PATCH -tip/cpus4096-v2] cpumask: fix cpumask of call_function_data

    Rusty Russell wrote:
    ....
    > From: Rusty Russell
    > cpumask: smp_call_function_many()
    >
    > Actually change smp_call_function_mask() to smp_call_function_many().
    >
    > S390 has its own version, so we do trivial conversion on that too.
    >
    > We have to do some dancing to figure out if 0 or 1 other cpus are in
    > the mask supplied and the online mask without allocating a tmp
    > cpumask. It's still fairly cheap.
    >
    > We allocate the cpumask at the end of the call_function_data
    > structure: if allocation fails we fallback to smp_call_function_single
    > rather than using the baroque quiescing code.
    >
    > (Thanks to Hiroshi Shimamoto for spotting several bugs in previous versions!)
    >
    > Signed-off-by: Rusty Russell
    > Signed-off-by: Mike Travis
    > Cc: Hiroshi Shimamoto
    > Cc: schwidefsky@de.ibm.com
    > Cc: heiko.carstens@de.ibm.com


    Hi Rusty,

    I'd like to know which tree this patch is against.

    thanks,
    Hiroshi Shimamoto
