2.6.28-rc3-git6: Reported regressions from 2.6.27 - Kernel

This is a discussion on 2.6.28-rc3-git6: Reported regressions from 2.6.27 - Kernel ; On Mon, 2008-11-10 at 18:47 -0800, Linus Torvalds wrote: > > On Tue, 11 Nov 2008, Benjamin Herrenschmidt wrote: > > > > In any case, I doesn't seem to be directly related to those radeonfb > > changes, though ...

+ Reply to Thread
Page 4 of 5 FirstFirst ... 2 3 4 5 LastLast
Results 61 to 80 of 85

Thread: 2.6.28-rc3-git6: Reported regressions from 2.6.27

  1. Re: [Bug #11875] radeonfb lockup in .28-rc (bisected)

    On Mon, 2008-11-10 at 18:47 -0800, Linus Torvalds wrote:
    >
    > On Tue, 11 Nov 2008, Benjamin Herrenschmidt wrote:
    > >
    > > In any case, I doesn't seem to be directly related to those radeonfb
    > > changes, though a clash with X like that is indeed more likely to
    > > actually happen if radeonfb relies more heavily on acceleration.

    >
    > Just a silly question, without actually looking at the code - since you
    > now do acceleration in radeonfb, do you wait for everything to drain
    > before you switch consoles?


    radeonfb has been doing acceleration for some time :-) Just not color
    expansion, only blits and solid fills (so basically scrolling). That is
    a lot less common though and thus it's possible that existing races
    didn't show up until now.

    It does drain the engine in various cases, typically mode change,
    blanking, sync callback. fbcon core should at least sync if not blank
    when switching to KD_GRAPHICS (or at least used to, I need to double
    check). I have additional guards also that disable use of the engine
    when sleeping.

    > There could be races that depend on timing, where perhaps X is unhappy
    > about being entered with the acceleration engine busy, or conversely the
    > radeonfb code is unhappy about perhaps some still-in-progress X thing that
    > hasn't been synchronously waited for..


    Yes. From what's been reported, the more likely thing would be a race
    when switching away from X.

    > Before, radeonfb_imageblit() would always end up doing a
    > "radeon_engine_idle()", so in practice, I think just about any fbcon
    > access ended up idling the engine. Now, we can probably do a lot more
    > without syncronizing - maybe there's insufficient synchronization at the
    > switch-over from X to text-mode or vice versa?


    Switch over from X should restore KD_TEXT which should turn to a call to
    set_par() that idles the engine before anything gets written to the
    screen, but those code path are intricated between the VT code and fbcon
    and things may well be subtely broken. I'll dig later today after I'm
    done with some other emergency.

    At one point, I fixed a crapload of VT bugs where things were done
    without any locking, nowadays, everything should pretty much be covered
    by the console semaphore, but maybe there's still a problem there.
    Another area to look at is X itself. I've had problems with X (or the
    DRM) still whacking the card after handing back the console to the
    kernel in the past, so it wouldn't surprise me if there was something
    bogus there too.

    I also had problems with fbcon trying to draw before it re-initialized
    the card (ie, it -should- call set_par before any new draw operation
    when switching back from KD_GRAPHICS, if not, we don't properly get to
    reconfigure the engine before we try to use it, which can be fatal), but
    those were fixed last time I looked.

    Anyway, I'll dig and let you know what I find.

    Cheers,
    Ben.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [Bug #11947] 2.6.28-rc VC switching with Intel graphics broken

    Rafael J. Wysocki wrote:
    >
    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11947
    > Subject : 2.6.28-rc VC switching with Intel graphics broken
    > Submitter : Romano Giannetti
    > Date : 2008-11-03 12:10 (7 days old)
    > Handled-By : Jesse Barnes


    Still here in 2.6.28-rc4. Complete lock switching back from a VC to X.

    Romano


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [Bug #11875] radeonfb lockup in .28-rc (bisected)

    It looks like you are observing the same failure mode that I do.

    Andreas.

    --
    Andreas Schwab, SuSE Labs, schwab@suse.de
    SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
    PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
    "And now for something completely different."
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine


    * Rafael J. Wysocki wrote:

    > However, it is reproducible by doing
    >
    > # echo core > /sys/power/pm_test
    >
    > and repeating
    >
    > # echo disk > /sys/power/state
    >
    > for a couple of times, in which case the last two lines printed to the console
    > before a (solid) hang are:
    >
    > SMP alternatives: switching to SMP code
    > Booting processor 1 APIC 0x1 ip 0x6000
    >
    > So, it evidently fails while re-enabling the non-boot CPU and not
    > during disabling it as I thought before.
    >
    > With commit c9583e55fa2b08a230c549bd1e3c0bde6c50d9cc reverted the
    > issue is not reproducible any more.


    [ Cc:-ed workqueue/locking/suspend-race-condition experts. ]

    Seems like the new kernel/stop_machine.c logic has a race for the test
    sequence above. (Below is the bisected commit again, maybe the race is
    visible via email review as well.)

    Ingo

    -------------->
    From c9583e55fa2b08a230c549bd1e3c0bde6c50d9cc Mon Sep 17 00:00:00 2001
    From: Heiko Carstens
    Date: Mon, 13 Oct 2008 23:50:10 +0200
    Subject: [PATCH] stop_machine: use workqueues instead of kernel threads

    Convert stop_machine to a workqueue based approach. Instead of using kernel
    threads for stop_machine we now use a an rt workqueue to synchronize all
    cpus.
    This has the advantage that all needed per cpu threads are already created
    when stop_machine gets called. And therefore a call to stop_machine won't
    fail anymore. This is needed for s390 which needs a mechanism to synchronize
    all cpus without allocating any memory.
    As Rusty pointed out free_module() needs a non-failing stop_machine interface
    as well.

    As a side effect the stop_machine code gets simplified.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Rusty Russell
    ---
    kernel/stop_machine.c | 111 ++++++++++++++++++-------------------------------
    1 files changed, 41 insertions(+), 70 deletions(-)

    diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
    index af3c7ce..0e688c6 100644
    --- a/kernel/stop_machine.c
    +++ b/kernel/stop_machine.c
    @@ -37,9 +37,13 @@ struct stop_machine_data {
    /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
    static unsigned int num_threads;
    static atomic_t thread_ack;
    -static struct completion finished;
    static DEFINE_MUTEX(lock);

    +static struct workqueue_struct *stop_machine_wq;
    +static struct stop_machine_data active, idle;
    +static const cpumask_t *active_cpus;
    +static void *stop_machine_work;
    +
    static void set_state(enum stopmachine_state newstate)
    {
    /* Reset ack counter. */
    @@ -51,21 +55,25 @@ static void set_state(enum stopmachine_state newstate)
    /* Last one to ack a state moves to the next state. */
    static void ack_state(void)
    {
    - if (atomic_dec_and_test(&thread_ack)) {
    - /* If we're the last one to ack the EXIT, we're finished. */
    - if (state == STOPMACHINE_EXIT)
    - complete(&finished);
    - else
    - set_state(state + 1);
    - }
    + if (atomic_dec_and_test(&thread_ack))
    + set_state(state + 1);
    }

    -/* This is the actual thread which stops the CPU. It exits by itself rather
    - * than waiting for kthread_stop(), because it's easier for hotplug CPU. */
    -static int stop_cpu(struct stop_machine_data *smdata)
    +/* This is the actual function which stops the CPU. It runs
    + * in the context of a dedicated stopmachine workqueue. */
    +static void stop_cpu(struct work_struct *unused)
    {
    enum stopmachine_state curstate = STOPMACHINE_NONE;
    -
    + struct stop_machine_data *smdata = &idle;
    + int cpu = smp_processor_id();
    +
    + if (!active_cpus) {
    + if (cpu == first_cpu(cpu_online_map))
    + smdata = &active;
    + } else {
    + if (cpu_isset(cpu, *active_cpus))
    + smdata = &active;
    + }
    /* Simple state machine */
    do {
    /* Chill out and ensure we re-read stopmachine_state. */
    @@ -90,7 +98,6 @@ static int stop_cpu(struct stop_machine_data *smdata)
    } while (curstate != STOPMACHINE_EXIT);

    local_irq_enable();
    - do_exit(0);
    }

    /* Callback for CPUs which aren't supposed to do anything. */
    @@ -101,78 +108,34 @@ static int chill(void *unused)

    int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
    {
    - int i, err;
    - struct stop_machine_data active, idle;
    - struct task_struct **threads;
    + struct work_struct *sm_work;
    + int i;

    + /* Set up initial state. */
    + mutex_lock(&lock);
    + num_threads = num_online_cpus();
    + active_cpus = cpus;
    active.fn = fn;
    active.data = data;
    active.fnret = 0;
    idle.fn = chill;
    idle.data = NULL;

    - /* This could be too big for stack on large machines. */
    - threads = kcalloc(NR_CPUS, sizeof(threads[0]), GFP_KERNEL);
    - if (!threads)
    - return -ENOMEM;
    -
    - /* Set up initial state. */
    - mutex_lock(&lock);
    - init_completion(&finished);
    - num_threads = num_online_cpus();
    set_state(STOPMACHINE_PREPARE);

    - for_each_online_cpu(i) {
    - struct stop_machine_data *smdata = &idle;
    - struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
    -
    - if (!cpus) {
    - if (i == first_cpu(cpu_online_map))
    - smdata = &active;
    - } else {
    - if (cpu_isset(i, *cpus))
    - smdata = &active;
    - }
    -
    - threads[i] = kthread_create((void *)stop_cpu, smdata, "kstop%u",
    - i);
    - if (IS_ERR(threads[i])) {
    - err = PTR_ERR(threads[i]);
    - threads[i] = NULL;
    - goto kill_threads;
    - }
    -
    - /* Place it onto correct cpu. */
    - kthread_bind(threads[i], i);
    -
    - /* Make it highest prio. */
    - if (sched_setscheduler_nocheck(threads[i], SCHED_FIFO, &param))
    - BUG();
    - }
    -
    - /* We've created all the threads. Wake them all: hold this CPU so one
    + /* Schedule the stop_cpu work on all cpus: hold this CPU so one
    * doesn't hit this CPU until we're ready. */
    get_cpu();
    - for_each_online_cpu(i)
    - wake_up_process(threads[i]);
    -
    + for_each_online_cpu(i) {
    + sm_work = percpu_ptr(stop_machine_work, i);
    + INIT_WORK(sm_work, stop_cpu);
    + queue_work_on(i, stop_machine_wq, sm_work);
    + }
    /* This will release the thread on our CPU. */
    put_cpu();
    - wait_for_completion(&finished);
    + flush_workqueue(stop_machine_wq);
    mutex_unlock(&lock);
    -
    - kfree(threads);
    -
    return active.fnret;
    -
    -kill_threads:
    - for_each_online_cpu(i)
    - if (threads[i])
    - kthread_stop(threads[i]);
    - mutex_unlock(&lock);
    -
    - kfree(threads);
    - return err;
    }

    int stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
    @@ -187,3 +150,11 @@ int stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)
    return ret;
    }
    EXPORT_SYMBOL_GPL(stop_machine);
    +
    +static int __init stop_machine_init(void)
    +{
    + stop_machine_wq = create_rt_workqueue("kstop");
    + stop_machine_work = alloc_percpu(struct work_struct);
    + return 0;
    +}
    +early_initcall(stop_machine_init);
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    On Tue, Nov 11, 2008 at 11:52:14AM +0100, Ingo Molnar wrote:
    >
    > * Rafael J. Wysocki wrote:
    >
    > > However, it is reproducible by doing
    > >
    > > # echo core > /sys/power/pm_test
    > >
    > > and repeating
    > >
    > > # echo disk > /sys/power/state
    > >
    > > for a couple of times, in which case the last two lines printed to the console
    > > before a (solid) hang are:
    > >
    > > SMP alternatives: switching to SMP code
    > > Booting processor 1 APIC 0x1 ip 0x6000
    > >
    > > So, it evidently fails while re-enabling the non-boot CPU and not
    > > during disabling it as I thought before.
    > >
    > > With commit c9583e55fa2b08a230c549bd1e3c0bde6c50d9cc reverted the
    > > issue is not reproducible any more.

    >
    > [ Cc:-ed workqueue/locking/suspend-race-condition experts. ]
    >
    > Seems like the new kernel/stop_machine.c logic has a race for the test
    > sequence above. (Below is the bisected commit again, maybe the race is
    > visible via email review as well.)


    FWIW, I tried to reproduce this on s390 and got the following:

    A process that would do nothing but onlining/offlining cpus would get
    stuck after a while:

    0 schedule+842 [0x342522]
    1 schedule_timeout+200 [0x342ec4]
    2 wait_for_common+362 [0x341fd6]
    3 wait_for_completion+54 [0x342146]
    4 __synchronize_sched+80 [0x81670]
    5 cpu_down+172 [0x33c030]
    6 store_online+96 [0x33c488]
    7 sysdev_store+52 [0x1bda84]
    8 sysfs_write_file+242 [0x1350ba]
    9 vfs_write+176 [0xd2028]
    10 sys_write+82 [0xd21ea]
    11 sysc_noemu+16 [0x269d8]

    All cpus are in cpu_idle and no other task in state TASK_INTERRUPTIBLE
    or TASK_UNINTERRUPTIBLE. However it would continue to work as soon as
    I login into the system or generate a console interrupt.
    I'm going to look into the dump and see if I can figure out what is
    broken here.
    Dunno if it is the same bug or something else.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [Bug #11875] radeonfb lockup in .28-rc (bisected)

    On Tue, 2008-11-11 at 10:31 +0100, Andreas Schwab wrote:
    > It looks like you are observing the same failure mode that I do.


    Yup, once, haven't reproduced it ever since though :-(

    Ben.


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    On Tue, Nov 11, 2008 at 12:31:34PM +0100, Heiko Carstens wrote:
    > On Tue, Nov 11, 2008 at 11:52:14AM +0100, Ingo Molnar wrote:
    > >
    > > * Rafael J. Wysocki wrote:
    > >
    > > > However, it is reproducible by doing
    > > >
    > > > # echo core > /sys/power/pm_test
    > > >
    > > > and repeating
    > > >
    > > > # echo disk > /sys/power/state
    > > >
    > > > for a couple of times, in which case the last two lines printed to the console
    > > > before a (solid) hang are:
    > > >
    > > > SMP alternatives: switching to SMP code
    > > > Booting processor 1 APIC 0x1 ip 0x6000
    > > >
    > > > So, it evidently fails while re-enabling the non-boot CPU and not
    > > > during disabling it as I thought before.
    > > >
    > > > With commit c9583e55fa2b08a230c549bd1e3c0bde6c50d9cc reverted the
    > > > issue is not reproducible any more.

    > >
    > > [ Cc:-ed workqueue/locking/suspend-race-condition experts. ]
    > >
    > > Seems like the new kernel/stop_machine.c logic has a race for the test
    > > sequence above. (Below is the bisected commit again, maybe the race is
    > > visible via email review as well.)

    >
    > FWIW, I tried to reproduce this on s390 and got the following:
    >
    > A process that would do nothing but onlining/offlining cpus would get
    > stuck after a while:
    >
    > 0 schedule+842 [0x342522]
    > 1 schedule_timeout+200 [0x342ec4]
    > 2 wait_for_common+362 [0x341fd6]
    > 3 wait_for_completion+54 [0x342146]
    > 4 __synchronize_sched+80 [0x81670]
    > 5 cpu_down+172 [0x33c030]
    > 6 store_online+96 [0x33c488]
    > 7 sysdev_store+52 [0x1bda84]
    > 8 sysfs_write_file+242 [0x1350ba]
    > 9 vfs_write+176 [0xd2028]
    > 10 sys_write+82 [0xd21ea]
    > 11 sysc_noemu+16 [0x269d8]
    >
    > All cpus are in cpu_idle and no other task in state TASK_INTERRUPTIBLE
    > or TASK_UNINTERRUPTIBLE. However it would continue to work as soon as
    > I login into the system or generate a console interrupt.
    > I'm going to look into the dump and see if I can figure out what is
    > broken here.
    > Dunno if it is the same bug or something else.


    [Cc:-ed Steven and Paul, since this backtrace seems to be RCU specific]

    Steven, Paul, any idea what could cause the hang? I think I would
    get lost in the RCU code...
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine


    * Heiko Carstens wrote:

    > On Tue, Nov 11, 2008 at 12:31:34PM +0100, Heiko Carstens wrote:
    > > On Tue, Nov 11, 2008 at 11:52:14AM +0100, Ingo Molnar wrote:
    > > >
    > > > * Rafael J. Wysocki wrote:
    > > >
    > > > > However, it is reproducible by doing
    > > > >
    > > > > # echo core > /sys/power/pm_test
    > > > >
    > > > > and repeating
    > > > >
    > > > > # echo disk > /sys/power/state
    > > > >
    > > > > for a couple of times, in which case the last two lines printed to the console
    > > > > before a (solid) hang are:
    > > > >
    > > > > SMP alternatives: switching to SMP code
    > > > > Booting processor 1 APIC 0x1 ip 0x6000
    > > > >
    > > > > So, it evidently fails while re-enabling the non-boot CPU and not
    > > > > during disabling it as I thought before.
    > > > >
    > > > > With commit c9583e55fa2b08a230c549bd1e3c0bde6c50d9cc reverted the
    > > > > issue is not reproducible any more.
    > > >
    > > > [ Cc:-ed workqueue/locking/suspend-race-condition experts. ]
    > > >
    > > > Seems like the new kernel/stop_machine.c logic has a race for the test
    > > > sequence above. (Below is the bisected commit again, maybe the race is
    > > > visible via email review as well.)

    > >
    > > FWIW, I tried to reproduce this on s390 and got the following:
    > >
    > > A process that would do nothing but onlining/offlining cpus would get
    > > stuck after a while:
    > >
    > > 0 schedule+842 [0x342522]
    > > 1 schedule_timeout+200 [0x342ec4]
    > > 2 wait_for_common+362 [0x341fd6]
    > > 3 wait_for_completion+54 [0x342146]
    > > 4 __synchronize_sched+80 [0x81670]
    > > 5 cpu_down+172 [0x33c030]
    > > 6 store_online+96 [0x33c488]
    > > 7 sysdev_store+52 [0x1bda84]
    > > 8 sysfs_write_file+242 [0x1350ba]
    > > 9 vfs_write+176 [0xd2028]
    > > 10 sys_write+82 [0xd21ea]
    > > 11 sysc_noemu+16 [0x269d8]
    > >
    > > All cpus are in cpu_idle and no other task in state TASK_INTERRUPTIBLE
    > > or TASK_UNINTERRUPTIBLE. However it would continue to work as soon as
    > > I login into the system or generate a console interrupt.
    > > I'm going to look into the dump and see if I can figure out what is
    > > broken here.
    > > Dunno if it is the same bug or something else.

    >
    > [Cc:-ed Steven and Paul, since this backtrace seems to be RCU specific]
    >
    > Steven, Paul, any idea what could cause the hang? I think I would
    > get lost in the RCU code...


    Cc:-ed Thomas - sometimes "RCU hangs" happen due to nohz confusion:
    because no timer IRQ happens so there's nothing to drive the RCU
    machinery.

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    On Tue, Nov 11, 2008 at 11:52 AM, Ingo Molnar wrote:
    > [ Cc:-ed workqueue/locking/suspend-race-condition experts. ]


    Heh. I am not expert, but I looked at the code. The obvious suspicious
    thing to see is the use of unpaired barriers? Maybe like this:

    47 static void set_state(enum stopmachine_state newstate)
    48 {
    49 /* Reset ack counter. */
    50 atomic_set(&thread_ack, num_threads);
    51 smp_wmb();

    + /* force ordering between thread_ack/state */

    52 state = newstate;
    53 }
    54
    55 /* Last one to ack a state moves to the next state. */
    56 static void ack_state(void)
    57 {
    58 if (atomic_dec_and_test(&thread_ack))

    Maybe
    + /* force ordering between thread_ack/state */
    + smp_rmb();
    here?

    59 set_state(state + 1);
    60 }
    61

    Or maybe I am wrong. But Documentation/memory-barriers.txt is rather
    explicit on this point.


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    On Tue, 2008-11-11 at 14:36 +0100, Vegard Nossum wrote:
    > On Tue, Nov 11, 2008 at 11:52 AM, Ingo Molnar wrote:
    > > [ Cc:-ed workqueue/locking/suspend-race-condition experts. ]

    >
    > Heh. I am not expert, but I looked at the code. The obvious suspicious
    > thing to see is the use of unpaired barriers? Maybe like this:
    >
    > 47 static void set_state(enum stopmachine_state newstate)
    > 48 {
    > 49 /* Reset ack counter. */
    > 50 atomic_set(&thread_ack, num_threads);
    > 51 smp_wmb();
    >
    > + /* force ordering between thread_ack/state */
    >
    > 52 state = newstate;
    > 53 }
    > 54
    > 55 /* Last one to ack a state moves to the next state. */
    > 56 static void ack_state(void)
    > 57 {
    > 58 if (atomic_dec_and_test(&thread_ack))
    >
    > Maybe
    > + /* force ordering between thread_ack/state */
    > + smp_rmb();
    > here?


    all atomic ops that have return values imply a full barrier, iirc

    > 59 set_state(state + 1);
    > 60 }
    > 61
    >
    > Or maybe I am wrong. But Documentation/memory-barriers.txt is rather
    > explicit on this point.
    >
    >
    > Vegard
    >


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    On Tue, Nov 11, 2008 at 2:36 PM, Vegard Nossum wrote:
    > On Tue, Nov 11, 2008 at 11:52 AM, Ingo Molnar wrote:
    >> [ Cc:-ed workqueue/locking/suspend-race-condition experts. ]

    >
    > Heh. I am not expert, but I looked at the code. The obvious suspicious
    > thing to see is the use of unpaired barriers? Maybe like this:


    ....

    > 55 /* Last one to ack a state moves to the next state. */
    > 56 static void ack_state(void)
    > 57 {
    > 58 if (atomic_dec_and_test(&thread_ack))
    >
    > Maybe
    > + /* force ordering between thread_ack/state */
    > + smp_rmb();
    > here?


    Oops, I am wrong (after a small investigation).

    "1490 Any atomic operation that modifies some state in memory and
    returns information
    1491 about the state (old or new) implies an SMP-conditional general
    memory barrier
    1492 (smp_mb()) on each side of the actual operation (with the exception of
    1493 explicit lock operations, described later). These include:
    1494
    ....
    1503 atomic_dec_and_test();"

    Won't fix the problem at hand, but maybe something like this would be
    nice for future generations :-)

    diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
    index 0e688c6..6796bb1 100644
    --- a/kernel/stop_machine.c
    +++ b/kernel/stop_machine.c
    @@ -55,6 +55,7 @@ static void set_state(enum stopmachine_state newstate)
    /* Last one to ack a state moves to the next state. */
    static void ack_state(void)
    {
    + /* Implicit memory barrier; no smp_rmb() needed */
    if (atomic_dec_and_test(&thread_ack))
    set_state(state + 1);
    }


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    On Tue, Nov 11, 2008 at 01:42:01PM +0100, Heiko Carstens wrote:
    > On Tue, Nov 11, 2008 at 12:31:34PM +0100, Heiko Carstens wrote:
    > > On Tue, Nov 11, 2008 at 11:52:14AM +0100, Ingo Molnar wrote:
    > > >
    > > > * Rafael J. Wysocki wrote:
    > > >
    > > > > However, it is reproducible by doing
    > > > >
    > > > > # echo core > /sys/power/pm_test
    > > > >
    > > > > and repeating
    > > > >
    > > > > # echo disk > /sys/power/state
    > > > >
    > > > > for a couple of times, in which case the last two lines printed to the console
    > > > > before a (solid) hang are:
    > > > >
    > > > > SMP alternatives: switching to SMP code
    > > > > Booting processor 1 APIC 0x1 ip 0x6000
    > > > >
    > > > > So, it evidently fails while re-enabling the non-boot CPU and not
    > > > > during disabling it as I thought before.
    > > > >
    > > > > With commit c9583e55fa2b08a230c549bd1e3c0bde6c50d9cc reverted the
    > > > > issue is not reproducible any more.
    > > >
    > > > [ Cc:-ed workqueue/locking/suspend-race-condition experts. ]
    > > >
    > > > Seems like the new kernel/stop_machine.c logic has a race for the test
    > > > sequence above. (Below is the bisected commit again, maybe the race is
    > > > visible via email review as well.)

    > >
    > > FWIW, I tried to reproduce this on s390 and got the following:
    > >
    > > A process that would do nothing but onlining/offlining cpus would get
    > > stuck after a while:
    > >
    > > 0 schedule+842 [0x342522]
    > > 1 schedule_timeout+200 [0x342ec4]
    > > 2 wait_for_common+362 [0x341fd6]
    > > 3 wait_for_completion+54 [0x342146]
    > > 4 __synchronize_sched+80 [0x81670]
    > > 5 cpu_down+172 [0x33c030]
    > > 6 store_online+96 [0x33c488]
    > > 7 sysdev_store+52 [0x1bda84]
    > > 8 sysfs_write_file+242 [0x1350ba]
    > > 9 vfs_write+176 [0xd2028]
    > > 10 sys_write+82 [0xd21ea]
    > > 11 sysc_noemu+16 [0x269d8]
    > >
    > > All cpus are in cpu_idle and no other task in state TASK_INTERRUPTIBLE
    > > or TASK_UNINTERRUPTIBLE. However it would continue to work as soon as
    > > I login into the system or generate a console interrupt.
    > > I'm going to look into the dump and see if I can figure out what is
    > > broken here.
    > > Dunno if it is the same bug or something else.

    >
    > [Cc:-ed Steven and Paul, since this backtrace seems to be RCU specific]
    >
    > Steven, Paul, any idea what could cause the hang? I think I would
    > get lost in the RCU code...


    Hello, Heiko,

    Could you please apply the following debug patch (due to Jiangshan and
    myself)? Then you should be able to build with CONFIG_RCU_TRACE,
    then mount debugfs after boot, for example, on /debug. This will
    create a /debug/rcu directory with three files, "rcucb", "rcu_data",
    and "rcu_bh_data". Since you are still able to log in, could you
    please send the contents of these three files?

    Thanx, Paul
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    On Tue, Nov 11, 2008 at 11:52 AM, Ingo Molnar wrote:
    > [ Cc:-ed workqueue/locking/suspend-race-condition experts. ]
    >
    > Seems like the new kernel/stop_machine.c logic has a race for the test
    > sequence above. (Below is the bisected commit again, maybe the race is
    > visible via email review as well.)


    I try again.

    I think that the test for stop_machine_data in stop_cpu() should not
    have been moved from __stop_machine(). Because now cpu_online_map may
    change in-between calls to stop_cpu() (if the callback tries to
    online/offline CPUs), and the end result may be different.

    Maybe?


    Vegard
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    On Tue, Nov 11, 2008 at 06:35:05AM -0800, Paul E. McKenney wrote:
    > On Tue, Nov 11, 2008 at 01:42:01PM +0100, Heiko Carstens wrote:
    > > On Tue, Nov 11, 2008 at 12:31:34PM +0100, Heiko Carstens wrote:
    > > > On Tue, Nov 11, 2008 at 11:52:14AM +0100, Ingo Molnar wrote:
    > > > >
    > > > > * Rafael J. Wysocki wrote:
    > > > >
    > > > > > However, it is reproducible by doing
    > > > > >
    > > > > > # echo core > /sys/power/pm_test
    > > > > >
    > > > > > and repeating
    > > > > >
    > > > > > # echo disk > /sys/power/state
    > > > > >
    > > > > > for a couple of times, in which case the last two lines printed to the console
    > > > > > before a (solid) hang are:
    > > > > >
    > > > > > SMP alternatives: switching to SMP code
    > > > > > Booting processor 1 APIC 0x1 ip 0x6000
    > > > > >
    > > > > > So, it evidently fails while re-enabling the non-boot CPU and not
    > > > > > during disabling it as I thought before.
    > > > > >
    > > > > > With commit c9583e55fa2b08a230c549bd1e3c0bde6c50d9cc reverted the
    > > > > > issue is not reproducible any more.
    > > > >
    > > > > [ Cc:-ed workqueue/locking/suspend-race-condition experts. ]
    > > > >
    > > > > Seems like the new kernel/stop_machine.c logic has a race for the test
    > > > > sequence above. (Below is the bisected commit again, maybe the race is
    > > > > visible via email review as well.)
    > > >
    > > > FWIW, I tried to reproduce this on s390 and got the following:
    > > >
    > > > A process that would do nothing but onlining/offlining cpus would get
    > > > stuck after a while:
    > > >
    > > > 0 schedule+842 [0x342522]
    > > > 1 schedule_timeout+200 [0x342ec4]
    > > > 2 wait_for_common+362 [0x341fd6]
    > > > 3 wait_for_completion+54 [0x342146]
    > > > 4 __synchronize_sched+80 [0x81670]
    > > > 5 cpu_down+172 [0x33c030]
    > > > 6 store_online+96 [0x33c488]
    > > > 7 sysdev_store+52 [0x1bda84]
    > > > 8 sysfs_write_file+242 [0x1350ba]
    > > > 9 vfs_write+176 [0xd2028]
    > > > 10 sys_write+82 [0xd21ea]
    > > > 11 sysc_noemu+16 [0x269d8]
    > > >
    > > > All cpus are in cpu_idle and no other task in state TASK_INTERRUPTIBLE
    > > > or TASK_UNINTERRUPTIBLE. However it would continue to work as soon as
    > > > I login into the system or generate a console interrupt.
    > > > I'm going to look into the dump and see if I can figure out what is
    > > > broken here.
    > > > Dunno if it is the same bug or something else.

    > >
    > > [Cc:-ed Steven and Paul, since this backtrace seems to be RCU specific]
    > >
    > > Steven, Paul, any idea what could cause the hang? I think I would
    > > get lost in the RCU code...

    >
    > Hello, Heiko,
    >
    > Could you please apply the following debug patch (due to Jiangshan and
    > myself)? Then you should be able to build with CONFIG_RCU_TRACE,
    > then mount debugfs after boot, for example, on /debug. This will
    > create a /debug/rcu directory with three files, "rcucb", "rcu_data",
    > and "rcu_bh_data". Since you are still able to log in, could you
    > please send the contents of these three files?
    >
    > Thanx, Paul


    This time with the patch actually attached... Thanks to Peter Z.
    for alerting me to my omission.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Lai Jiangshan
    ---

    diff --git a/include/linux/rcuclassic.h b/include/linux/rcuclassic.h
    index 4ab8436..735f35a 100644
    --- a/include/linux/rcuclassic.h
    +++ b/include/linux/rcuclassic.h
    @@ -54,6 +54,9 @@ struct rcu_ctrlblk {
    /* for current batch to proceed. */
    } ____cacheline_internodealigned_in_smp;

    +extern struct rcu_ctrlblk rcu_ctrlblk;
    +extern struct rcu_ctrlblk rcu_bh_ctrlblk;
    +
    /* Is batch a before batch b ? */
    static inline int rcu_batch_before(long a, long b)
    {
    @@ -76,6 +79,7 @@ struct rcu_data {
    long quiescbatch; /* Batch # for grace period */
    int passed_quiesc; /* User-mode/idle loop etc. */
    int qs_pending; /* core waits for quiesc state */
    + bool beenonline; /* CPU online at least once */

    /* 2) batch handling */
    long batch; /* Batch # for current RCU batch */
    diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
    index 9fdba03..ba32338 100644
    --- a/kernel/Kconfig.preempt
    +++ b/kernel/Kconfig.preempt
    @@ -68,7 +68,6 @@ config PREEMPT_RCU

    config RCU_TRACE
    bool "Enable tracing for RCU - currently stats in debugfs"
    - depends on PREEMPT_RCU
    select DEBUG_FS
    default y
    help
    diff --git a/kernel/Makefile b/kernel/Makefile
    index 4e1d7df..e0bfce7 100644
    --- a/kernel/Makefile
    +++ b/kernel/Makefile
    @@ -77,6 +77,8 @@ obj-$(CONFIG_CLASSIC_RCU) += rcuclassic.o
    obj-$(CONFIG_PREEMPT_RCU) += rcupreempt.o
    ifeq ($(CONFIG_PREEMPT_RCU),y)
    obj-$(CONFIG_RCU_TRACE) += rcupreempt_trace.o
    +else
    +obj-$(CONFIG_RCU_TRACE) += rcuclassic_trace.o
    endif
    obj-$(CONFIG_RELAY) += relay.o
    obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
    diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
    index aad93cd..06472fc 100644
    --- a/kernel/rcuclassic.c
    +++ b/kernel/rcuclassic.c
    @@ -57,13 +57,13 @@ EXPORT_SYMBOL_GPL(rcu_lock_map);


    /* Definition for rcupdate control block. */
    -static struct rcu_ctrlblk rcu_ctrlblk = {
    +struct rcu_ctrlblk rcu_ctrlblk = {
    .cur = -300,
    .completed = -300,
    .lock = __SPIN_LOCK_UNLOCKED(&rcu_ctrlblk.lock),
    .cpumask = CPU_MASK_NONE,
    };
    -static struct rcu_ctrlblk rcu_bh_ctrlblk = {
    +struct rcu_ctrlblk rcu_bh_ctrlblk = {
    .cur = -300,
    .completed = -300,
    .lock = __SPIN_LOCK_UNLOCKED(&rcu_bh_ctrlblk.lock),
    @@ -564,6 +564,7 @@ static void rcu_init_percpu_data(int cpu, struct rcu_ctrlblk *rcp,
    rdp->donetail = &rdp->donelist;
    rdp->quiescbatch = rcp->completed;
    rdp->qs_pending = 0;
    + rdp->beenonline = 1;
    rdp->cpu = cpu;
    rdp->blimit = blimit;
    }
    diff --git a/kernel/rcuclassic_trace.c b/kernel/rcuclassic_trace.c
    new file mode 100644
    index 0000000..b719048
    --- /dev/null
    +++ b/kernel/rcuclassic_trace.c
    @@ -0,0 +1,198 @@
    +/*
    + * Read-Copy Update tracing for classic implementation
    + *
    + * This program is free software; you can redistribute it and/or modify
    + * it under the terms of the GNU General Public License as published by
    + * the Free Software Foundation; either version 2 of the License, or
    + * (at your option) any later version.
    + *
    + * This program is distributed in the hope that it will be useful,
    + * but WITHOUT ANY WARRANTY; without even the implied warranty of
    + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    + * GNU General Public License for more details.
    + *
    + * You should have received a copy of the GNU General Public License
    + * along with this program; if not, write to the Free Software
    + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
    + *
    + * Copyright IBM Corporation, 2008
    + *
    + * Updated to use seqfile by Lai Jiangshan.
    + *
    + * Papers: http://www.rdrop.com/users/paulmck/RCU
    + *
    + * For detailed explanation of Read-Copy Update mechanism see -
    + * Documentation/RCU
    + *
    + */
    +#include
    +#include
    +#include
    +#include
    +
    +/* Print out rcu_data structures using seqfile facility. */
    +
    +static struct rcu_data *get_rcu_data_bh(int cpu)
    +{
    + return &per_cpu(rcu_bh_data, cpu);
    +}
    +
    +static struct rcu_data *get_rcu_data(int cpu)
    +{
    + return &per_cpu(rcu_data, cpu);
    +}
    +
    +static int show_rcu_data(struct seq_file *m, void *v)
    +{
    + struct rcu_data *rdp = v;
    +
    + if (!rdp->beenonline)
    + return 0;
    +
    + seq_printf(m, "processor\t: %d", rdp->cpu);
    + if (cpu_is_offline(rdp->cpu))
    + seq_puts(m, "!\n");
    + else
    + seq_puts(m, "\n");
    + seq_printf(m, "quiescbatch\t: %ld\n", rdp->quiescbatch);
    + seq_printf(m, "batch\t\t: %ld\n", rdp->batch);
    + seq_printf(m, "passed_quiesc\t: %d\n", rdp->passed_quiesc);
    + seq_printf(m, "qs_pending\t: %d\n", rdp->qs_pending);
    + seq_printf(m, "qlen\t\t: %ld\n", rdp->qlen);
    + seq_printf(m, "blimit\t\t: %ld\n", rdp->blimit);
    + seq_puts(m, "\n");
    + return 0;
    +}
    +
    +static void *c_start(struct seq_file *m, loff_t *pos)
    +{
    + typedef struct rcu_data *(*get_data_func)(int);
    +
    + if (*pos == 0) /* just in case, cpu 0 is not the first */
    + *pos = first_cpu(cpu_possible_map);
    + else
    + *pos = next_cpu_nr(*pos - 1, cpu_possible_map);
    + if ((*pos) < nr_cpu_ids)
    + return ((get_data_func)m->private)(*pos);
    + return NULL;
    +}
    +
    +static void *c_next(struct seq_file *m, void *v, loff_t *pos)
    +{
    + (*pos)++;
    + return c_start(m, pos);
    +}
    +
    +static void c_stop(struct seq_file *m, void *v)
    +{
    +}
    +
    +const struct seq_operations rcu_data_seq_op = {
    + .start = c_start,
    + .next = c_next,
    + .stop = c_stop,
    + .show = show_rcu_data,
    +};
    +
    +static int rcu_data_open(struct inode *inode, struct file *file)
    +{
    + int ret = seq_open(file, &rcu_data_seq_op);
    +
    + if (ret)
    + return ret;
    + ((struct seq_file *)file->private_data)->private = inode->i_private;
    + return 0;
    +}
    +
    +static const struct file_operations rcu_data_fops = {
    + .owner = THIS_MODULE,
    + .open = rcu_data_open,
    + .read = seq_read,
    + .llseek = seq_lseek,
    + .release = seq_release,
    +};
    +
    +/* Print out rcu_ctrlblk structures using seqfile facility. */
    +
    +static void print_one_rcu_ctrlblk(struct seq_file *m, struct rcu_ctrlblk *rcp)
    +{
    + seq_printf(m, "cur=%ld completed=%ld next_pending=%d s=%d\n\t",
    + rcp->cur, rcp->completed, rcp->next_pending, rcp->signaled);
    + seq_cpumask(m, &rcp->cpumask);
    + seq_puts(m, "\n");
    +}
    +
    +static int show_rcucb(struct seq_file *m, void *unused)
    +{
    + seq_puts(m, "rcu: ");
    + print_one_rcu_ctrlblk(m, &rcu_ctrlblk);
    + seq_puts(m, "rcu_bh: ");
    + print_one_rcu_ctrlblk(m, &rcu_bh_ctrlblk);
    + seq_puts(m, "online: ");
    + seq_cpumask(m, &cpu_online_map);
    + seq_puts(m, "\n");
    + return 0;
    +}
    +
    +static int rcucb_open(struct inode *inode, struct file *file)
    +{
    + return single_open(file, show_rcucb, NULL);
    +}
    +
    +static struct file_operations rcucb_fops = {
    + .owner = THIS_MODULE,
    + .open = rcucb_open,
    + .read = seq_read,
    + .llseek = seq_lseek,
    + .release = single_release,
    +};
    +
    +static struct dentry *rcudir, *rcu_bh_data_file, *rcu_data_file, *rcucb_file;
    +
    +static int __init rcuclassic_trace_init(void)
    +{
    + rcudir = debugfs_create_dir("rcu", NULL);
    + if (!rcudir)
    + goto out;
    +
    + rcu_bh_data_file = debugfs_create_file("rcu_bh_data", 0444, rcudir,
    + get_rcu_data_bh, &rcu_data_fops);
    + if (!rcu_bh_data_file)
    + goto out_rcudir;
    +
    + rcu_data_file = debugfs_create_file("rcu_data", 0444, rcudir,
    + get_rcu_data, &rcu_data_fops);
    + if (!rcu_data_file)
    + goto out_rcudata_bh_file;
    +
    + rcucb_file = debugfs_create_file("rcucb", 0444, rcudir,
    + NULL, &rcucb_fops);
    + if (!rcucb_file)
    + goto out_rcudata_file;
    + return 0;
    +
    +out_rcudata_file:
    + debugfs_remove(rcu_data_file);
    +out_rcudata_bh_file:
    + debugfs_remove(rcu_bh_data_file);
    +out_rcudir:
    + debugfs_remove(rcudir);
    +out:
    + return 1;
    +}
    +
    +static void __exit rcuclassic_trace_cleanup(void)
    +{
    + debugfs_remove(rcucb_file);
    + debugfs_remove(rcu_data_file);
    + debugfs_remove(rcu_bh_data_file);
    + debugfs_remove(rcudir);
    +}
    +
    +module_init(rcuclassic_trace_init);
    +module_exit(rcuclassic_trace_cleanup);
    +
    +MODULE_AUTHOR("Paul E. McKenney");
    +MODULE_DESCRIPTION("Read-Copy Update tracing for classic implementation");
    +MODULE_LICENSE("GPL");
    +
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    On Tue, Nov 11, 2008 at 06:35:05AM -0800, Paul E. McKenney wrote:
    > > > A process that would do nothing but onlining/offlining cpus would get
    > > > stuck after a while:
    > > >
    > > > 0 schedule+842 [0x342522]
    > > > 1 schedule_timeout+200 [0x342ec4]
    > > > 2 wait_for_common+362 [0x341fd6]
    > > > 3 wait_for_completion+54 [0x342146]
    > > > 4 __synchronize_sched+80 [0x81670]
    > > > 5 cpu_down+172 [0x33c030]
    > > > 6 store_online+96 [0x33c488]
    > > > 7 sysdev_store+52 [0x1bda84]
    > > > 8 sysfs_write_file+242 [0x1350ba]
    > > > 9 vfs_write+176 [0xd2028]
    > > > 10 sys_write+82 [0xd21ea]
    > > > 11 sysc_noemu+16 [0x269d8]
    > > >
    > > > All cpus are in cpu_idle and no other task in state TASK_INTERRUPTIBLE
    > > > or TASK_UNINTERRUPTIBLE. However it would continue to work as soon as
    > > > I login into the system or generate a console interrupt.
    > > > I'm going to look into the dump and see if I can figure out what is
    > > > broken here.
    > > > Dunno if it is the same bug or something else.

    > >
    > > [Cc:-ed Steven and Paul, since this backtrace seems to be RCU specific]
    > >
    > > Steven, Paul, any idea what could cause the hang? I think I would
    > > get lost in the RCU code...

    >
    > Hello, Heiko,
    >
    > Could you please apply the following debug patch (due to Jiangshan and
    > myself)? Then you should be able to build with CONFIG_RCU_TRACE,
    > then mount debugfs after boot, for example, on /debug. This will
    > create a /debug/rcu directory with three files, "rcucb", "rcu_data",
    > and "rcu_bh_data". Since you are still able to log in, could you
    > please send the contents of these three files?


    Hi Paul,

    could you attach the patch please?

    Does the patch also make sense if the system continues to work? That
    is the machine isn't stalled anymore as soon as I log in.
    On the other hand I do have a dump of the system and can look in
    whatever data structures you want. If that helps.

    Thanks,
    Heiko
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    2008/11/11 Vegard Nossum :
    > On Tue, Nov 11, 2008 at 11:52 AM, Ingo Molnar wrote:
    >> [ Cc:-ed workqueue/locking/suspend-race-condition experts. ]
    >>
    >> Seems like the new kernel/stop_machine.c logic has a race for the test
    >> sequence above. (Below is the bisected commit again, maybe the race is
    >> visible via email review as well.)

    >
    > I try again.
    >
    > I think that the test for stop_machine_data in stop_cpu() should not
    > have been moved from __stop_machine().


    Do you mean the following test?

    if (!active_cpus) {
    if (cpu == first_cpu(cpu_online_map))
    smdata = &active;
    } else {
    if (cpu_isset(cpu, *active_cpus))
    smdata = &active;
    }

    > Because now cpu_online_map may
    > change in-between calls to stop_cpu() (if the callback tries to
    > online/offline CPUs), and the end result may be different.


    take_cpu_down() may not run earlier than stop_cpu() on all the cpus
    have completed the STOPMACHINE_DISABLE_IRQ step, iow. "state ==
    STOPMACHINE_RUN". By that moment, 'smdata' has been set up on all
    cpus... if this is the case you had in mind.


    >
    > Maybe?
    >
    >
    > Vegard
    >



    --
    Best regards,
    Dmitry Adamushko
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    On 11/11, Vegard Nossum wrote:
    >
    > I think that the test for stop_machine_data in stop_cpu() should not
    > have been moved from __stop_machine(). Because now cpu_online_map may
    > change in-between calls to stop_cpu() (if the callback tries to
    > online/offline CPUs), and the end result may be different.


    I don't think this is possible, the callback must not be called unless
    all threads ack (at least) the STOPMACHINE_PREPARE state.


    Off-topic question, __stop_machine() does:

    /* Schedule the stop_cpu work on all cpus: hold this CPU so one
    * doesn't hit this CPU until we're ready. */
    get_cpu();
    for_each_online_cpu(i) {
    sm_work = percpu_ptr(stop_machine_work, i);
    INIT_WORK(sm_work, stop_cpu);
    queue_work_on(i, stop_machine_wq, sm_work);
    }
    /* This will release the thread on our CPU. */
    put_cpu();

    Don't we actually need preempt_disable/preempt_enable instead of
    get/put cpu? (yes, there the same currently). We don't care about
    the CPU we are running on, and it can't go away until we queue all
    works. But we must ensure that stop_cpu() on the same CPU can't
    preempt us, right?

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Q: force_quiescent_state && cpu_online_map

    I don't think this matters, but still...

    force_quiescent_state:

    * cpu_online_map is updated by the _cpu_down()
    * using __stop_machine(). Since we're in irqs disabled
    * section, __stop_machine() is not exectuting, hence
    * the cpu_online_map is stable.
    *
    * However, a cpu might have been offlined _just_ before
    * we disabled irqs while entering here.
    * And rcu subsystem might not yet have handled the CPU_DEAD
    * notification, leading to the offlined cpu's bit
    * being set in the rcp->cpumask.
    *
    * Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent
    * sending smp_reschedule() to an offlined CPU.
    */
    cpus_and(cpumask, rcp->cpumask, cpu_online_map);
    cpu_clear(rdp->cpu, cpumask);
    for_each_cpu_mask_nr(cpu, cpumask)
    smp_send_reschedule(cpu);

    However,

    // called by __stop_machine take_cpu_down()
    arch/x86/kernel/smpboot.c:cpu_disable_common()

    /*
    * HACK:
    * Allow any queued timer interrupts to get serviced
    * This is only a temporary solution until we cleanup
    * fixup_irqs as we do for IA64.
    */
    local_irq_enable();
    mdelay(1);
    local_irq_disable();
    ...
    remove_cpu_from_maps(cpu);

    So it is possible to send the ipi to the dying CPU. I know nothing
    about this low-level irq code, most probably this is harmless. We
    already did clear_local_APIC(), but I don't understand what it does.

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    On Tue, Nov 11, 2008 at 04:01:32PM +0100, Heiko Carstens wrote:
    > On Tue, Nov 11, 2008 at 06:35:05AM -0800, Paul E. McKenney wrote:
    > > > > A process that would do nothing but onlining/offlining cpus would get
    > > > > stuck after a while:
    > > > >
    > > > > 0 schedule+842 [0x342522]
    > > > > 1 schedule_timeout+200 [0x342ec4]
    > > > > 2 wait_for_common+362 [0x341fd6]
    > > > > 3 wait_for_completion+54 [0x342146]
    > > > > 4 __synchronize_sched+80 [0x81670]
    > > > > 5 cpu_down+172 [0x33c030]
    > > > > 6 store_online+96 [0x33c488]
    > > > > 7 sysdev_store+52 [0x1bda84]
    > > > > 8 sysfs_write_file+242 [0x1350ba]
    > > > > 9 vfs_write+176 [0xd2028]
    > > > > 10 sys_write+82 [0xd21ea]
    > > > > 11 sysc_noemu+16 [0x269d8]
    > > > >
    > > > > All cpus are in cpu_idle and no other task in state TASK_INTERRUPTIBLE
    > > > > or TASK_UNINTERRUPTIBLE. However it would continue to work as soon as
    > > > > I login into the system or generate a console interrupt.
    > > > > I'm going to look into the dump and see if I can figure out what is
    > > > > broken here.
    > > > > Dunno if it is the same bug or something else.
    > > >
    > > > [Cc:-ed Steven and Paul, since this backtrace seems to be RCU specific]
    > > >
    > > > Steven, Paul, any idea what could cause the hang? I think I would
    > > > get lost in the RCU code...

    > >
    > > Hello, Heiko,
    > >
    > > Could you please apply the following debug patch (due to Jiangshan and
    > > myself)? Then you should be able to build with CONFIG_RCU_TRACE,
    > > then mount debugfs after boot, for example, on /debug. This will
    > > create a /debug/rcu directory with three files, "rcucb", "rcu_data",
    > > and "rcu_bh_data". Since you are still able to log in, could you
    > > please send the contents of these three files?

    >
    > Hi Paul,
    >
    > could you attach the patch please?


    Peter Z. beat you to it. ;-)

    See previous email.

    > Does the patch also make sense if the system continues to work? That
    > is the machine isn't stalled anymore as soon as I log in.
    > On the other hand I do have a dump of the system and can look in
    > whatever data structures you want. If that helps.


    Ah!

    I would like to see the value of rcu_ctrlblk.cpumask and also the value
    of cpu_online_map. One guess would be that rcu_ctrlblk.cpumask has a
    bit set that is -not- set in cpu_online_map, which would indicate that
    RCU was incorrectly waiting on an offline CPU.

    On the other hand, if all the bits set in rcu_ctrlblk.cpumask are also
    set in cpu_online_map, then could you please dump out the instances of
    the rcu_data per-CPU variable that correspond to the bits set in
    rcu_ctrlblk.cpumask?

    Finally, if no bits are set in rcu_ctrlblk.cpumask, the question would
    be "why isn't the synchronize_sched() waking up?"

    BTW, I am assuming that you have the same config as Raphael, in other
    words, that you are running Classic RCU rather than preemptable RCU.

    The point of the patch is that it allows you to see this info by catting
    out the /debug/rcu files, at least assuming that the system is healthy
    enough to allow you to cat files. But if you already have a crash dump...

    Thanx, Paul
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine

    > > Could you please apply the following debug patch (due to Jiangshan and
    > > myself)? Then you should be able to build with CONFIG_RCU_TRACE,
    > > then mount debugfs after boot, for example, on /debug. This will
    > > create a /debug/rcu directory with three files, "rcucb", "rcu_data",
    > > and "rcu_bh_data". Since you are still able to log in, could you
    > > please send the contents of these three files?
    > >
    > > Thanx, Paul

    >
    > This time with the patch actually attached... Thanks to Peter Z.
    > for alerting me to my omission.


    Well, your patch doesn't apply on git head. However I used preemptible
    RCU instead and had tracing enabled.

    This is the output of the three files after it stalled (and continued,
    because I caused an interrupt by sending a network packet) twice:

    [root@h0545001 rcu]# cat rcuctrs
    CPU last cur F M
    1 0 0 1 1
    3 0 0 1 1
    4 0 0 0 0
    5 0 0 0 1
    6 0 0 0 0
    ggp = 1640, state = waitack

    [root@h0545001 rcu]# cat rcugp
    oldggp=1652 newggp=1655

    [root@h0545001 rcu]# cat rcustats
    na=33948 nl=3 wa=33945 wl=0 da=33945 dl=0 dr=33945 di=0
    1=0 e1=0 i1=1674 ie1=4 g1=1670 a1=1920 ae1=251 a2=1669
    z1=1669 ze1=0 z2=1669 m1=4411 me1=2742 m2=1669

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 4 of 5 FirstFirst ... 2 3 4 5 LastLast