spinlock lockup on CPU#0 - Kernel


Thread: spinlock lockup on CPU#0

  1. spinlock lockup on CPU#0

    Hello, I recorded a .mov of the spinlock lockup, but can't seem to send
    it out, so I have to manually write it down:

    BUG: spinlock lockup on CPU#0, swapper/0, c17fa6c0
    Pid: 0, comm: swapper Not tainted 2.6.25-03562-g3dc5063 #1
    [] _raw_spin_lock+0xd5/0xf9
    [] _spin_lock+0x8/0xa
    [] task_rq_lock+0x44/0x6b
    [] try_to_wake_up+0x2a/0x1c4
    [] default_wake_function+0xb/0xd
    [] __wake_up_common+0x2f/0x5a
    [] complete+0x2b/0x3e
    [] usb_api_blocking_completion+0x13/0x15
    [] usb_hcd_giveback_urb+0x52/0x82
    [] ehci_urb_done+0x6f/0x7c [ehci_hcd]
    [] qh_completions+0x2d7/0x348 [ehci_hcd]
    [] ehci_work+0x9c [ehci_hcd]
    [] ? sched_clock+0xb/0x1c
    [] ? __update_rq_clock+0x94/0x15a
    [] ehci_irq+0x138/0x15f [ehci_hcd]
    [] usb_hcd_irq+0x23/0x51
    [] handle_IRQ_event+0x2a/0x5a
    [] handle_fasteoi_irq+0x74/0xb6
    [] do_IRQ+0x71/0x8c
    [] common_interrupt+0x23/0x28
    [] ? sched_clock_idle_wakeup_event+0x5b/0x74
    [] acpi_idle_enter_bm+0x2a4/0x31f [processor]
    [] cpuidle_idle_call+0x5c/0x8c
    [] ? cpuidle_idle_call+0x0/0x8c
    [] cpu_idle+0xb1/0xd1
    [] rest_init+0x49/0x4b
    =====================================================


    Hopefully the numbers are right, and hopefully this provides enough
    info to help the kernel out
    regards;

    --
    Justin P. Mattock

  2. Re: spinlock lockup on CPU#0

    On Thu, 24 Apr 2008 01:41:32 +0000 "Justin Mattock" wrote:

    > Hello, I recorded a .mov of a spinlock, but cant seem to send it out,
    > so I have to manually write it down:
    >
    > BUG: spinlock lockup on CPU#0, swapper/0, c17fa6c0
    > Pid: 0, comm: swapper Not tainted 2.6.25-03562-g3dc5063 #1
    > [] _raw_spin_lock+0xd5/0xf9
    > [] _spin_lock+0x8/0xa
    > [] task_rq_lock+0x44/0x6b
    > [] try_to_wake_up+0x2a/0x1c4
    > [] default_wake_function+0xb/0xd
    > [] __wake_up_common+0x2f/0x5a
    > [] complete+0x2b/0x3e
    > [] usb_api_blocking_completion+0x13/0x15
    > [] usb_hcd_giveback_urb+0x52/0x82
    > [] ehci_urb_done+0x6f/0x7c [ehci_hcd]
    > [] qh_completions+0x2d7/0x348 [ehci_hcd]
    > [] ehci_work+0x9c [ehci_hcd]
    > [] ? sched_clock+0xb/0x1c
    > [] ? __update_rq_clock+0x94/0x15a
    > [] ehci_irq+0x138/0x15f [ehci_hcd]
    > [] usb_hcd_irq+0x23/0x51
    > [] handle_IRQ_event+0x2a/0x5a
    > [] handle_fasteoi_irq+0x74/0xb6
    > [] do_IRQ+0x71/0x8c
    > [] common_interrupt+0x23/0x28
    > [] ? sched_clock_idle_wakeup_event+0x5b/0x74
    > [] acpi_idle_enter_bm+0x2a4/0x31f [processor]
    > [] cpuidle_idle_call+0x5c/0x8c
    > [] ? cpuidle_idle_call+0x0/0x8c
    > [] cpu_idle+0xb1/0xd1
    > [] rest_init+0x49/0x4b
    > =====================================================
    >
    >
    > Hopefully the numbers are right, and hopefully this provides enough
    > info to help the kernel out


    Well that's cute. At a guess I'd say that acpi_processor_idle() managed to
    call sched_clock_idle_wakeup_event() with local interrupts enabled. We
    took an interrupt with rq->lock held and things went downhill from there.
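
    The pattern being described is the classic same-CPU self-deadlock; a
    minimal sketch with stand-in names (illustrative only, not the actual
    scheduler code):

    #include <linux/spinlock.h>
    #include <linux/interrupt.h>

    static DEFINE_SPINLOCK(demo_lock);     /* stand-in for rq->lock */

    static void idle_wakeup_event(void)    /* the idle-path side */
    {
            spin_lock(&demo_lock);         /* BUG: IRQs still enabled */
            /* an interrupt arriving here runs demo_irq() below */
            spin_unlock(&demo_lock);
    }

    static irqreturn_t demo_irq(int irq, void *dev_id)
    {
            spin_lock(&demo_lock);         /* this CPU already holds it:
                                              spins forever -> lockup */
            spin_unlock(&demo_lock);
            return IRQ_HANDLED;
    }

    Taking the lock with spin_lock_irqsave(), or entering the idle path
    with IRQs already disabled, closes that window.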

    Can you add this please, see if it triggers?


    --- a/kernel/sched.c~a
    +++ a/kernel/sched.c
    @@ -811,6 +811,7 @@ void sched_clock_idle_sleep_event(void)
    {
    struct rq *rq = cpu_rq(smp_processor_id());

    + WARN_ON(!irqs_disabled());
    spin_lock(&rq->lock);
    __update_rq_clock(rq);
    spin_unlock(&rq->lock);
    @@ -826,6 +827,7 @@ void sched_clock_idle_wakeup_event(u64 d
    struct rq *rq = cpu_rq(smp_processor_id());
    u64 now = sched_clock();

    + WARN_ON(!irqs_disabled());
    rq->idle_clock += delta_ns;
    /*
    * Override the previous timestamp and ignore all
    _


  3. Re: spinlock lockup on CPU#0

    On Sat, Apr 26, 2008 at 6:29 PM, Andrew Morton wrote:
    >
    > On Thu, 24 Apr 2008 01:41:32 +0000 "Justin Mattock" wrote:
    >
    > > [ hand-copied oops trace quoted from the first message; snipped ]

    >
    > Well that's cute. At a guess I'd say that acpi_processor_idle() managed to
    > call sched_clock_idle_wakeup_event() with local interrupts enabled. We
    > took an interrupt with rq->lock held and things went downhill from there.
    >
    > Can you add this please, see if it triggers?
    >
    >
    > [ WARN_ON debug patch quoted; snipped ]


    Yeah, I don't mind adding this to see what happens.
    regards

    --
    Justin P. Mattock

  4. Re: spinlock lockup on CPU#0


    * Andrew Morton wrote:

    > > Hopefully the numbers are right, and hopefully this provides enough
    > > info to help the kernel out

    >
    > Well that's cute. At a guess I'd say that acpi_processor_idle()
    > managed to call sched_clock_idle_wakeup_event() with local interrupts
    > enabled. We took an interrupt with rq->lock held and things went
    > downhill from there.
    >
    > Can you add this please, see if it triggers?


    there's fixes pending in this area. The main fix would be the one below.

    Ingo

    ---------------->
    Subject: idle (arch, acpi and apm) and lockdep
    From: Peter Zijlstra
    Date: Fri, 25 Apr 2008 17:39:01 +0200

    On Fri, 2008-04-25 at 16:59 +0200, Peter Zijlstra wrote:
    > On Fri, 2008-04-25 at 14:47 +0000, Justin Mattock wrote:
    > > On Fri, Apr 25, 2008 at 12:11 PM, Peter Zijlstra wrote:
    > > > On Fri, 2008-04-25 at 00:24 +0000, Justin Mattock wrote:
    > > >
    > > > > [ 13.269763] =================================
    > > > > [ 13.270954] [ INFO: inconsistent lock state ]
    > > > > [ 13.271865] 2.6.25-04569-gb69d398 #3
    > > > > [ 13.272614] ---------------------------------
    > > > > [ 13.273521] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
    > > > > [ 13.274745] swapper/0 [HC0[0]:SC0[0]:HE1:SE1] takes:
    > > > > [ 13.275787] (&rq->rq_lock_key){++..}, at: [] sched_clock_idle_wakeup_event+0x43/0x74
    > > > > [ 13.276859] {in-hardirq-W} state was registered at:
    > > > > [ 13.276859] [] __lock_acquire+0x419/0xb70
    > > > > [ 13.276859] [] lock_acquire+0x7f/0xa6
    > > > > [ 13.280188] [] _spin_lock+0x1c/0x49
    > > > > [ 13.280188] [] scheduler_tick+0x43/0x1bd
    > > > > [ 13.280188] [] update_process_times+0x3d/0x49
    > > > > [ 13.283524] [] tick_periodic+0x66/0x72
    > > > > [ 13.283524] [] tick_handle_periodic+0x19/0x6a
    > > > > [ 13.283524] [] timer_interrupt+0x47/0x4e
    > > > > [ 13.286855] [] handle_IRQ_event+0x1a/0x4f
    > > > > [ 13.286855] [] handle_level_irq+0x7f/0xca
    > > > > [ 13.286855] [] do_IRQ+0x71/0x8a
    > > > > [ 13.290190] [] common_interrupt+0x2e/0x34
    > > > > [ 13.290190] [] calibrate_delay+0x8f/0x276
    > > > > [ 13.290190] [] start_kernel+0x27c/0x2f8
    > > > > [ 13.293520] [] __init_begin+0x8/0xa
    > > > > [ 13.293520] [] 0xffffffff
    > > > > [ 13.293520] irq event stamp: 253965
    > > > > [ 13.296856] hardirqs last enabled at (253965): [] native_sched_clock+0xe7/0xff
    > > > > [ 13.296856] hardirqs last disabled at (253964): [] native_sched_clock+0x6d/0xff
    > > > > [ 13.300190] softirqs last enabled at (253958): [] __do_softirq+0xf9/0xff
    > > > > [ 13.300190] softirqs last disabled at (253953): [] do_softirq+0x4d/0x79
    > > > > [ 13.303522]
    > > > > [ 13.303522] other info that might help us debug this:
    > > > > [ 13.303522] no locks held by swapper/0.
    > > > > [ 13.303522]
    > > > > [ 13.303522] stack backtrace:
    > > > > [ 13.306852] Pid: 0, comm: swapper Not tainted 2.6.25-04569-gb69d398 #3
    > > > > [ 13.336851] [] print_usage_bug+0x106/0x113
    > > > > [ 13.340185] [] mark_lock+0x1ed/0x3a5
    > > > > [ 13.343519] [] __lock_acquire+0x48e/0xb70
    > > > > [ 13.346852] [] lock_acquire+0x7f/0xa6
    > > > > [ 13.350185] [] ? sched_clock_idle_wakeup_event+0x43/0x74
    > > > > [ 13.353519] [] _spin_lock+0x1c/0x49
    > > > > [ 13.360185] [] ? sched_clock_idle_wakeup_event+0x43/0x74
    > > > > [ 13.363518] [] sched_clock_idle_wakeup_event+0x43/0x74

    >
    > Got it:
    >
    > acpi_idle_do_entry()
    > acpi_processor_ffh_cstate_enter()
    > mwait_idle_with_hints() (32 bit)
    > local_irq_enable()
    >
    > sched_clock_idle_wakeup_event()
    >
    >
    > I think my recent idle patches should address this, no?
    >
    > > > > [ 13.366851] [] acpi_idle_enter_bm+0x2b3/0x333 [processor]
    > > > > [ 13.370184] [] cpuidle_idle_call+0x63/0x92
    > > > > [ 13.373517] [] ? cpuidle_idle_call+0x0/0x92
    > > > > [ 13.380184] [] cpu_idle+0xb6/0xd6
    > > > > [ 13.383517] [] rest_init+0x49/0x4b
    > > > > [ 13.386850] =======================


    (I just found out I failed to copy LKML on the last discussion about
    these patches)

    Signed-off-by: Ingo Molnar
    ---
    From: Peter Zijlstra

    OK, so 25-mm1 gave a lockdep error which made me look into this.

    The first thing that I noticed was the horrible mess; the second thing I
    saw was hacks like: 71e93d15612c61c2e26a169567becf088e71b8ff

    The problem is that arch idle routines are somewhat inconsistent with
    their IRQ state handling and, instead of fixing _that_, we go paper
    over the problem.

    So the thing I've tried to do is set a standard for idle routines and
    fix them all up to adhere to that. So the rules are:

    idle routines are entered with IRQs disabled
    idle routines will exit with IRQs enabled

    Nearly all already did this in one form or another.
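
    Expressed as code, the contract reads roughly like this (an
    illustrative sketch, not a hunk from the patch):

    /* idle routines: entered with IRQs disabled, exit with IRQs enabled */
    static void example_idle(void)
    {
            WARN_ON_ONCE(!irqs_disabled()); /* rule 1: caller disabled IRQs */
            if (!need_resched())
                    safe_halt();            /* sti;hlt: wakes with IRQs on */
            else
                    local_irq_enable();     /* rule 2: still exit IRQs on */
    }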

    Merge the 32 and 64 bit bits so they no longer have different bugs.

    As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
    irq-enable.

    Signed-off-by: Peter Zijlstra
    ---
    arch/x86/kernel/apm_32.c | 3 +
    arch/x86/kernel/process.c | 117 +++++++++++++++++++++++++++++++++++++++
    arch/x86/kernel/process_32.c | 118 +---------------------------------------
    arch/x86/kernel/process_64.c | 123 +-----------------------------------------
    drivers/acpi/processor_idle.c | 19 +++---
    include/asm-x86/processor.h | 1
    6 files changed, 137 insertions(+), 244 deletions(-)

    Index: linux/arch/x86/kernel/apm_32.c
    ===================================================================
    --- linux.orig/arch/x86/kernel/apm_32.c
    +++ linux/arch/x86/kernel/apm_32.c
    @@ -904,6 +904,7 @@ recalc:
    original_pm_idle();
    else
    default_idle();
    + local_irq_disable();
    jiffies_since_last_check = jiffies - last_jiffies;
    if (jiffies_since_last_check > idle_period)
    goto recalc;
    @@ -911,6 +912,8 @@ recalc:

    if (apm_idle_done)
    apm_do_busy();
    +
    + local_irq_enable();
    }

    /**
    Index: linux/arch/x86/kernel/process.c
    ===================================================================
    --- linux.orig/arch/x86/kernel/process.c
    +++ linux/arch/x86/kernel/process.c
    @@ -4,6 +4,8 @@
    #include
    #include
    #include
    +#include
    +#include

    struct kmem_cache *task_xstate_cachep;

    @@ -42,3 +44,118 @@ void arch_task_cache_init(void)
    __alignof__(union thread_xstate),
    SLAB_PANIC | SLAB_NOTRACK, NULL);
    }
    +
    +static void do_nothing(void *unused)
    +{
    +}
    +
    +/*
    + * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
    + * pm_idle and update to new pm_idle value. Required while changing pm_idle
    + * handler on SMP systems.
    + *
    + * Caller must have changed pm_idle to the new value before the call. Old
    + * pm_idle value will not be used by any CPU after the return of this function.
    + */
    +void cpu_idle_wait(void)
    +{
    + smp_mb();
    + /* kick all the CPUs so that they exit out of pm_idle */
    + smp_call_function(do_nothing, NULL, 0, 1);
    +}
    +EXPORT_SYMBOL_GPL(cpu_idle_wait);
    +
    +/*
    + * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
    + * which can obviate IPI to trigger checking of need_resched.
    + * We execute MONITOR against need_resched and enter optimized wait state
    + * through MWAIT. Whenever someone changes need_resched, we would be woken
    + * up from MWAIT (without an IPI).
    + *
    + * New with Core Duo processors, MWAIT can take some hints based on CPU
    + * capability.
    + */
    +void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
    +{
    + if (!need_resched()) {
    + __monitor((void *)&current_thread_info()->flags, 0, 0);
    + smp_mb();
    + if (!need_resched())
    + __mwait(ax, cx);
    + }
    +}
    +
    +/* Default MONITOR/MWAIT with no hints, used for default C1 state */
    +static void mwait_idle(void)
    +{
    + if (!need_resched()) {
    + __monitor((void *)&current_thread_info()->flags, 0, 0);
    + smp_mb();
    + if (!need_resched())
    + __sti_mwait(0, 0);
    + else
    + local_irq_enable();
    + } else
    + local_irq_enable();
    +}
    +
    +
    +static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
    +{
    + if (force_mwait)
    + return 1;
    + /* Any C1 states supported? */
    + return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
    +}
    +
    +/*
    + * On SMP it's slightly faster (but much more power-consuming!)
    + * to poll the ->work.need_resched flag instead of waiting for the
    + * cross-CPU IPI to arrive. Use this option with caution.
    + */
    +static void poll_idle(void)
    +{
    + local_irq_enable();
    + cpu_relax();
    +}
    +
    +void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
    +{
    + static int selected;
    +
    + if (selected)
    + return;
    +#ifdef CONFIG_X86_SMP
    + if (pm_idle == poll_idle && smp_num_siblings > 1) {
    + printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
    + " performance may degrade.\n");
    + }
    +#endif
    + if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
    + /*
    + * Skip, if setup has overridden idle.
    + * One CPU supports mwait => All CPUs supports mwait
    + */
    + if (!pm_idle) {
    + printk(KERN_INFO "using mwait in idle threads.\n");
    + pm_idle = mwait_idle;
    + }
    + }
    + selected = 1;
    +}
    +
    +static int __init idle_setup(char *str)
    +{
    + if (!strcmp(str, "poll")) {
    + printk("using polling idle threads.\n");
    + pm_idle = poll_idle;
    + } else if (!strcmp(str, "mwait"))
    + force_mwait = 1;
    + else
    + return -1;
    +
    + boot_option_idle_override = 1;
    + return 0;
    +}
    +early_param("idle", idle_setup);
    +
    Index: linux/arch/x86/kernel/process_32.c
    ===================================================================
    --- linux.orig/arch/x86/kernel/process_32.c
    +++ linux/arch/x86/kernel/process_32.c
    @@ -111,12 +111,10 @@ void default_idle(void)
    */
    smp_mb();

    - local_irq_disable();
    - if (!need_resched()) {
    + if (!need_resched())
    safe_halt(); /* enables interrupts racelessly */
    - local_irq_disable();
    - }
    - local_irq_enable();
    + else
    + local_irq_enable();
    current_thread_info()->status |= TS_POLLING;
    } else {
    local_irq_enable();
    @@ -128,17 +126,6 @@ void default_idle(void)
    EXPORT_SYMBOL(default_idle);
    #endif

    -/*
    - * On SMP it's slightly faster (but much more power-consuming!)
    - * to poll the ->work.need_resched flag instead of waiting for the
    - * cross-CPU IPI to arrive. Use this option with caution.
    - */
    -static void poll_idle(void)
    -{
    - local_irq_enable();
    - cpu_relax();
    -}
    -
    #ifdef CONFIG_HOTPLUG_CPU
    #include
    /* We don't actually take CPU down, just spin without interrupts. */
    @@ -196,6 +183,7 @@ void cpu_idle(void)
    if (cpu_is_offline(cpu))
    play_dead();

    + local_irq_disable();
    __get_cpu_var(irq_stat).idle_timestamp = jiffies;
    /* Don't trace irqs off for idle */
    stop_critical_timings();
    @@ -209,104 +197,6 @@ void cpu_idle(void)
    }
    }

    -static void do_nothing(void *unused)
    -{
    -}
    -
    -/*
    - * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
    - * pm_idle and update to new pm_idle value. Required while changing pm_idle
    - * handler on SMP systems.
    - *
    - * Caller must have changed pm_idle to the new value before the call. Old
    - * pm_idle value will not be used by any CPU after the return of this function.
    - */
    -void cpu_idle_wait(void)
    -{
    - smp_mb();
    - /* kick all the CPUs so that they exit out of pm_idle */
    - smp_call_function(do_nothing, NULL, 0, 1);
    -}
    -EXPORT_SYMBOL_GPL(cpu_idle_wait);
    -
    -/*
    - * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
    - * which can obviate IPI to trigger checking of need_resched.
    - * We execute MONITOR against need_resched and enter optimized wait state
    - * through MWAIT. Whenever someone changes need_resched, we would be woken
    - * up from MWAIT (without an IPI).
    - *
    - * New with Core Duo processors, MWAIT can take some hints based on CPU
    - * capability.
    - */
    -void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
    -{
    - if (!need_resched()) {
    - __monitor((void *)&current_thread_info()->flags, 0, 0);
    - smp_mb();
    - if (!need_resched())
    - __sti_mwait(ax, cx);
    - else
    - local_irq_enable();
    - } else
    - local_irq_enable();
    -}
    -
    -/* Default MONITOR/MWAIT with no hints, used for default C1 state */
    -static void mwait_idle(void)
    -{
    - local_irq_enable();
    - mwait_idle_with_hints(0, 0);
    -}
    -
    -static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
    -{
    - if (force_mwait)
    - return 1;
    - /* Any C1 states supported? */
    - return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
    -}
    -
    -void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
    -{
    - static int selected;
    -
    - if (selected)
    - return;
    -#ifdef CONFIG_X86_SMP
    - if (pm_idle == poll_idle && smp_num_siblings > 1) {
    - printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
    - " performance may degrade.\n");
    - }
    -#endif
    - if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
    - /*
    - * Skip, if setup has overridden idle.
    - * One CPU supports mwait => All CPUs supports mwait
    - */
    - if (!pm_idle) {
    - printk(KERN_INFO "using mwait in idle threads.\n");
    - pm_idle = mwait_idle;
    - }
    - }
    - selected = 1;
    -}
    -
    -static int __init idle_setup(char *str)
    -{
    - if (!strcmp(str, "poll")) {
    - printk("using polling idle threads.\n");
    - pm_idle = poll_idle;
    - } else if (!strcmp(str, "mwait"))
    - force_mwait = 1;
    - else
    - return -1;
    -
    - boot_option_idle_override = 1;
    - return 0;
    -}
    -early_param("idle", idle_setup);
    -
    void __show_regs(struct pt_regs *regs, int all)
    {
    unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
    Index: linux/arch/x86/kernel/process_64.c
    ===================================================================
    --- linux.orig/arch/x86/kernel/process_64.c
    +++ linux/arch/x86/kernel/process_64.c
    @@ -107,26 +107,13 @@ void default_idle(void)
    * test NEED_RESCHED:
    */
    smp_mb();
    - local_irq_disable();
    - if (!need_resched()) {
    + if (!need_resched())
    safe_halt(); /* enables interrupts racelessly */
    - local_irq_disable();
    - }
    - local_irq_enable();
    + else
    + local_irq_enable();
    current_thread_info()->status |= TS_POLLING;
    }

    -/*
    - * On SMP it's slightly faster (but much more power-consuming!)
    - * to poll the ->need_resched flag instead of waiting for the
    - * cross-CPU IPI to arrive. Use this option with caution.
    - */
    -static void poll_idle(void)
    -{
    - local_irq_enable();
    - cpu_relax();
    -}
    -
    #ifdef CONFIG_HOTPLUG_CPU
    DECLARE_PER_CPU(int, cpu_state);

    @@ -207,110 +194,6 @@ void cpu_idle(void)
    }
    }

    -static void do_nothing(void *unused)
    -{
    -}
    -
    -/*
    - * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
    - * pm_idle and update to new pm_idle value. Required while changing pm_idle
    - * handler on SMP systems.
    - *
    - * Caller must have changed pm_idle to the new value before the call. Old
    - * pm_idle value will not be used by any CPU after the return of this function.
    - */
    -void cpu_idle_wait(void)
    -{
    - smp_mb();
    - /* kick all the CPUs so that they exit out of pm_idle */
    - smp_call_function(do_nothing, NULL, 0, 1);
    -}
    -EXPORT_SYMBOL_GPL(cpu_idle_wait);
    -
    -/*
    - * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
    - * which can obviate IPI to trigger checking of need_resched.
    - * We execute MONITOR against need_resched and enter optimized wait state
    - * through MWAIT. Whenever someone changes need_resched, we would be woken
    - * up from MWAIT (without an IPI).
    - *
    - * New with Core Duo processors, MWAIT can take some hints based on CPU
    - * capability.
    - */
    -void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
    -{
    - if (!need_resched()) {
    - __monitor((void *)&current_thread_info()->flags, 0, 0);
    - smp_mb();
    - if (!need_resched())
    - __mwait(ax, cx);
    - }
    -}
    -
    -/* Default MONITOR/MWAIT with no hints, used for default C1 state */
    -static void mwait_idle(void)
    -{
    - if (!need_resched()) {
    - __monitor((void *)&current_thread_info()->flags, 0, 0);
    - smp_mb();
    - if (!need_resched())
    - __sti_mwait(0, 0);
    - else
    - local_irq_enable();
    - } else {
    - local_irq_enable();
    - }
    -}
    -
    -
    -static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
    -{
    - if (force_mwait)
    - return 1;
    - /* Any C1 states supported? */
    - return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
    -}
    -
    -void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
    -{
    - static int selected;
    -
    - if (selected)
    - return;
    -#ifdef CONFIG_X86_SMP
    - if (pm_idle == poll_idle && smp_num_siblings > 1) {
    - printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
    - " performance may degrade.\n");
    - }
    -#endif
    - if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
    - /*
    - * Skip, if setup has overridden idle.
    - * One CPU supports mwait => All CPUs supports mwait
    - */
    - if (!pm_idle) {
    - printk(KERN_INFO "using mwait in idle threads.\n");
    - pm_idle = mwait_idle;
    - }
    - }
    - selected = 1;
    -}
    -
    -static int __init idle_setup(char *str)
    -{
    - if (!strcmp(str, "poll")) {
    - printk("using polling idle threads.\n");
    - pm_idle = poll_idle;
    - } else if (!strcmp(str, "mwait"))
    - force_mwait = 1;
    - else
    - return -1;
    -
    - boot_option_idle_override = 1;
    - return 0;
    -}
    -early_param("idle", idle_setup);
    -
    /* Prints also some state that isn't saved in the pt_regs */
    void __show_regs(struct pt_regs * regs, int all)
    {
    Index: linux/drivers/acpi/processor_idle.c
    ===================================================================
    --- linux.orig/drivers/acpi/processor_idle.c
    +++ linux/drivers/acpi/processor_idle.c
    @@ -418,13 +418,12 @@ static void acpi_processor_idle(void)

    cx = pr->power.state;
    if (!cx || acpi_idle_suspend) {
    - if (pm_idle_save)
    - pm_idle_save();
    - else
    + if (pm_idle_save) {
    + pm_idle_save(); /* enables IRQs */
    + } else {
    acpi_safe_halt();
    -
    - if (irqs_disabled())
    local_irq_enable();
    + }

    return;
    }
    @@ -520,10 +519,12 @@ static void acpi_processor_idle(void)
    * Use the appropriate idle routine, the one that would
    * be used without acpi C-states.
    */
    - if (pm_idle_save)
    - pm_idle_save();
    - else
    + if (pm_idle_save) {
    + pm_idle_save(); /* enables IRQs */
    + } else {
    acpi_safe_halt();
    + local_irq_enable();
    + }

    /*
    * TBD: Can't get time duration while in C1, as resumes
    @@ -534,8 +535,6 @@ static void acpi_processor_idle(void)
    * skew otherwise.
    */
    sleep_ticks = 0xFFFFFFFF;
    - if (irqs_disabled())
    - local_irq_enable();

    break;

    Index: linux/include/asm-x86/processor.h
    ===================================================================
    --- linux.orig/include/asm-x86/processor.h
    +++ linux/include/asm-x86/processor.h
    @@ -725,6 +725,7 @@ static inline void __mwait(unsigned long

    static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
    {
    + trace_hardirqs_on();
    /* "mwait %eax, %ecx;" */
    asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
    :: "a" (eax), "c" (ecx));

  5. Re: spinlock lockup on CPU#0

    On Sat, Apr 26, 2008 at 7:14 PM, Ingo Molnar wrote:
    >
    > * Andrew Morton wrote:
    >
    > > Can you add this please, see if it triggers?
    >
    > there's fixes pending in this area. The main fix would be the one below.
    >
    > Ingo
    >
    > [ full "idle (arch, acpi and apm) and lockdep" patch quoted; snipped ]

    O.K.
    I added WARN_ON(!irqs_disabled()); in kernel/sched.c
    I'm not experiencing anything different; should I have done this with
    a fresh git tree that wasn't patched with the above?
    The line numbers were different for me: in kernel/sched.c they were
    at lines 1123 and 1143.
    regards;

    --
    Justin P. Mattock

  6. Re: spinlock lockup on CPU#0

    On Sat, Apr 26, 2008 at 3:14 PM, Ingo Molnar wrote:
    > > Can you add this please, see if it triggers?

    >
    > there's fixes pending in this area. The main fix would be the one below.
    >
    > Ingo
    >
    > ---------------->
    > Subject: idle (arch, acpi and apm) and lockdep


    FWIW, I was seeing the same lockdep trace with eventual hangs, and
    this patch (applied with some fuzz) fixed the problem.

    --
    Bob Copeland %% www.bobcopeland.com

  7. Re: spinlock lockup on CPU#0

    On Sat, Apr 26, 2008 at 9:06 PM, Bob Copeland wrote:
    > On Sat, Apr 26, 2008 at 3:14 PM, Ingo Molnar wrote:
    > > > Can you add this please, see if it triggers?

    > >
    > > there's fixes pending in this area. The main fix would be the one below.
    > >
    > > Ingo
    > >
    > > ---------------->
    > > Subject: idle (arch, acpi and apm) and lockdep

    >
    > FWIW, I was seeing the same lockdep trace with eventual hangs, and
    > this patch (applied with some fuzz) fixed the problem.
    >
    > --
    > Bob Copeland %% www.bobcopeland.com
    >


    Just out of curiosity I put the kernel back to its original state,
    where the freezing occurs, then booted with nohz=off and added only
    WARN_ON(!irqs_disabled()); to sched.c, no other patches. Upon
    rebooting I received different results: from what I could tell the
    screen was spitting out the spinlock messages, but instead of
    printing one and going on to the next task it just kept printing
    (something with ehci, uhci, agpgart, ieee1394 etc., too fast to
    really make anything out). The numbers on the left side kept moving
    upward and the fans started hauling ass. I waited a few minutes
    hoping this would stop so I could grab dmesg, but it wouldn't. Is
    there a way to use a boot param to write this data to a file, so I
    could capture this event?
    regards

    --
    Justin P. Mattock

  8. Re: spinlock lockup on CPU#0

    On Sat, 26 Apr 2008, Ingo Molnar wrote:
    > * Andrew Morton wrote:
    >
    > > > Hopefully the numbers are right, and hopefully this provides enough
    > > > info to help the kernel out

    > >
    > > Well that's cute. At a guess I'd say that acpi_processor_idle()
    > > managed to call sched_clock_idle_wakeup_event() with local interrupts
    > > enabled. We took an interrupt with rq->lock held and things went
    > > downhill from there.
    > >
    > > Can you add this please, see if it triggers?

    >
    > there's fixes pending in this area. The main fix would be the one below.
    >
    > Ingo
    >
    > ---------------->
    > Subject: idle (arch, acpi and apm) and lockdep
    > From: Peter Zijlstra
    > Date: Fri, 25 Apr 2008 17:39:01 +0200


    Oh good, thanks, I see you've just asked Linus to pull that idle fix.

    Thanks to Peter for the fix, and to Justin for calling attention to it:
    this freeze has been plaguing me on one x86_32 box since I put -git8
    on, and I've just verified that it's the same as what's been plaguing
    me on that machine since 2.6.25-mm1 came out.

    Andrew, here's a belated hotfix for 2.6.25-mm1, running nicely at last:

    Peter Zijlstra's "idle (arch, acpi and apm) and lockdep" fix to
    recent x86 freezes - adapted to 2.6.25-mm1 by omitting merges
    from process_32.c and process_64.c into process.c.

    Signed-off-by: Hugh Dickins
    ---

    arch/x86/kernel/apm_32.c | 3 +++
    arch/x86/kernel/process_32.c | 27 +++++++++++++++------------
    arch/x86/kernel/process_64.c | 8 +++-----
    drivers/acpi/processor_idle.c | 19 +++++++++----------
    include/asm-x86/processor.h | 1 +
    5 files changed, 31 insertions(+), 27 deletions(-)

    --- 2.6.25-mm1/arch/x86/kernel/apm_32.c 2008-04-18 12:18:09.000000000 +0100
    +++ linux/arch/x86/kernel/apm_32.c 2008-04-26 22:17:06.000000000 +0100
    @@ -904,6 +904,7 @@ recalc:
                             original_pm_idle();
                     else
                             default_idle();
    +                local_irq_disable();
                     jiffies_since_last_check = jiffies - last_jiffies;
                     if (jiffies_since_last_check > idle_period)
                             goto recalc;
    @@ -911,6 +912,8 @@ recalc:
     
             if (apm_idle_done)
                     apm_do_busy();
    +
    +        local_irq_enable();
     }
     
     /**
    --- 2.6.25-mm1/arch/x86/kernel/process_32.c 2008-04-18 12:18:09.000000000 +0100
    +++ linux/arch/x86/kernel/process_32.c 2008-04-26 22:29:36.000000000 +0100
    @@ -111,12 +111,10 @@ void default_idle(void)
                      */
                     smp_mb();
     
    -                local_irq_disable();
    -                if (!need_resched()) {
    +                if (!need_resched())
                             safe_halt();    /* enables interrupts racelessly */
    -                        local_irq_disable();
    -                }
    -                local_irq_enable();
    +                else
    +                        local_irq_enable();
                     current_thread_info()->status |= TS_POLLING;
             } else {
                     local_irq_enable();
    @@ -196,6 +194,7 @@ void cpu_idle(void)
                             if (cpu_is_offline(cpu))
                                     play_dead();
     
    +                        local_irq_disable();
                             __get_cpu_var(irq_stat).idle_timestamp = jiffies;
                             /* Don't trace irqs off for idle */
                             stop_critical_timings();
    @@ -245,18 +244,22 @@ void mwait_idle_with_hints(unsigned long
                     __monitor((void *)&current_thread_info()->flags, 0, 0);
                     smp_mb();
                     if (!need_resched())
    -                        __sti_mwait(ax, cx);
    -                else
    -                        local_irq_enable();
    -        } else
    -                local_irq_enable();
    +                        __mwait(ax, cx);
    +        }
     }
     
     /* Default MONITOR/MWAIT with no hints, used for default C1 state */
     static void mwait_idle(void)
     {
    -        local_irq_enable();
    -        mwait_idle_with_hints(0, 0);
    +        if (!need_resched()) {
    +                __monitor((void *)&current_thread_info()->flags, 0, 0);
    +                smp_mb();
    +                if (!need_resched())
    +                        __sti_mwait(0, 0);
    +                else
    +                        local_irq_enable();
    +        } else
    +                local_irq_enable();
     }
     
     static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
    --- 2.6.25-mm1/arch/x86/kernel/process_64.c 2008-04-18 12:18:09.000000000 +0100
    +++ linux/arch/x86/kernel/process_64.c 2008-04-26 22:19:29.000000000 +0100
    @@ -106,12 +106,10 @@ void default_idle(void)
              * test NEED_RESCHED:
              */
             smp_mb();
    -        local_irq_disable();
    -        if (!need_resched()) {
    +        if (!need_resched())
                     safe_halt();    /* enables interrupts racelessly */
    -                local_irq_disable();
    -        }
    -        local_irq_enable();
    +        else
    +                local_irq_enable();
             current_thread_info()->status |= TS_POLLING;
     }
     
    --- 2.6.25-mm1/drivers/acpi/processor_idle.c 2008-04-18 12:18:10.000000000 +0100
    +++ linux/drivers/acpi/processor_idle.c 2008-04-26 22:17:06.000000000 +0100
    @@ -436,13 +436,12 @@ static void acpi_processor_idle(void)
     
             cx = pr->power.state;
             if (!cx || acpi_idle_suspend) {
    -                if (pm_idle_save)
    -                        pm_idle_save();
    -                else
    +                if (pm_idle_save) {
    +                        pm_idle_save(); /* enables IRQs */
    +                } else {
                             acpi_safe_halt();
    -
    -                if (irqs_disabled())
                             local_irq_enable();
    +                }
     
                     return;
             }
    @@ -538,10 +537,12 @@ static void acpi_processor_idle(void)
                      * Use the appropriate idle routine, the one that would
                      * be used without acpi C-states.
                      */
    -                if (pm_idle_save)
    -                        pm_idle_save();
    -                else
    +                if (pm_idle_save) {
    +                        pm_idle_save(); /* enables IRQs */
    +                } else {
                             acpi_safe_halt();
    +                        local_irq_enable();
    +                }
     
                     /*
                      * TBD: Can't get time duration while in C1, as resumes
    @@ -552,8 +553,6 @@ static void acpi_processor_idle(void)
                      * skew otherwise.
                      */
                     sleep_ticks = 0xFFFFFFFF;
    -                if (irqs_disabled())
    -                        local_irq_enable();
     
                     break;
     
    --- 2.6.25-mm1/include/asm-x86/processor.h 2008-04-18 12:18:34.000000000 +0100
    +++ linux/include/asm-x86/processor.h 2008-04-26 22:17:06.000000000 +0100
    @@ -726,6 +726,7 @@ static inline void __mwait(unsigned long
     
     static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
     {
    +        trace_hardirqs_on();
             /* "mwait %eax, %ecx;" */
             asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
                          :: "a" (eax), "c" (ecx));
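
    The include/asm-x86/processor.h hunk is the subtle part of the fix: on
    x86, "sti" only takes effect after the following instruction completes,
    so "sti; hlt" and "sti; mwait" re-enable interrupts and go idle as one
    uninterruptible step - a wakeup interrupt cannot slip in between and
    leave the CPU asleep. Because lockdep cannot see that implicit enable,
    trace_hardirqs_on() has to announce it. A sketch of the idiom (the
    function name is illustrative; the .byte sequence mirrors the kernel's
    encoding of mwait for old assemblers):

        #include <linux/irqflags.h>

        static inline void sti_mwait_sketch(unsigned long eax, unsigned long ecx)
        {
                trace_hardirqs_on();    /* tell lockdep IRQs turn on here */
                /* "sti; mwait %eax, %ecx" - the one-instruction interrupt
                 * shadow after sti guarantees we enter mwait before any
                 * interrupt is delivered, so the wakeup cannot be lost. */
                asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
                             :: "a" (eax), "c" (ecx));
        }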

  9. Re: spinlock lockup on CPU#0

    On Sat, Apr 26, 2008 at 09:48:55PM +0000, Justin Mattock wrote:
    > On Sat, Apr 26, 2008 at 9:06 PM, Bob Copeland wrote:
    > > On Sat, Apr 26, 2008 at 3:14 PM, Ingo Molnar wrote:
    > > > > Can you add this please, see if it triggers?
    > > >
    > > > There are fixes pending in this area. The main fix would be the one below.
    > > >
    > > > Ingo
    > > >
    > > > ---------------->
    > > > Subject: idle (arch, acpi and apm) and lockdep

    > >
    > > FWIW, I was seeing the same lockdep trace with eventual hangs, and
    > > this patch (applied with some fuzz) fixed the problem.
    > >
    > > --
    > > Bob Copeland %% www.bobcopeland.com
    > >

    >
    > Just out of curiosity I put the kernel back to its original state,
    > where the freezing occurs, then booted with nohz=off, then added
    > WARN_ON(!irqs_disabled()); to sched.c in the kernel, with no other
    > patches applied. Upon rebooting
    > I received different results: the screen, from what I could tell, was
    > spitting out the spinlock messages, but instead of printing one out
    > and going on to the next task it just kept printing; from what I
    > could tell something with ehci, uhci, agpgart, ieee1394 etc... too
    > fast to really make anything out. The numbers on the left side kept
    > moving upward, the fans started hauling ass, and I waited a few minutes
    > hoping this would stop
    > so I could grab dmesg, but it wouldn't. Is there a way to use a boot
    > param to write data to a file, so I could capture this event?
    > regards
    >
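
    On the capture question in the quote above: one way to get this kind of
    output off a wedged box is netconsole, which streams printk output over
    UDP to another machine as it is printed. A sketch of the setup - the
    addresses, interface and MAC below are placeholders for your own network:

        netconsole=6665@192.168.0.10/eth0,6666@192.168.0.20/00:19:d1:aa:bb:cc

    goes on the failing box's kernel command line (with netconsole built in),
    and on the receiving box something like:

        nc -u -l -p 6666 | tee lockup.log

    logs the stream (flag spelling varies between netcat flavors). A serial
    console, where available, works just as well.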


    OK. Hunted this bug down to
    commit 3b22ec7b13cb31e0d87fbc0aabe14caaaad309e8

    which for some reason enables interrupts in mwait_idle_with_hints(), which
    eventually causes interrupts to be enabled in the acpi idle call, resulting in
    sched_clock_idle_wakeup_event() being called with interrupts enabled. This bug
    was only in the x86 32-bit version.

    Peter's patch below, which is already in git, fixes this. So we don't need any
    additional fixes here...

    Thanks,
    Venki
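
    For reference, the WARN_ON() Justin added to sched.c is exactly the right
    tool for this class of bug; a sketch of that kind of check (the real
    function body is elided - only the assertion is the point, and
    WARN_ON_ONCE() keeps it to a single backtrace):

        #include <linux/kernel.h>
        #include <linux/irqflags.h>
        #include <linux/types.h>

        /* Catch any caller that reaches an IRQs-off-only path with
         * interrupts still enabled, and print one backtrace naming it. */
        void sched_clock_idle_wakeup_event(u64 delta_ns)
        {
                WARN_ON_ONCE(!irqs_disabled());
                /* ... existing sched_clock bookkeeping ... */
        }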


  10. Re: spinlock lockup on CPU#0

    On Mon, Apr 28, 2008 at 8:38 PM, Venki Pallipadi wrote:
    >
    > OK. Hunted this bug down to
    > commit 3b22ec7b13cb31e0d87fbc0aabe14caaaad309e8
    >
    > which for some reason enables interrupts in mwait_idle_with_hints(), which
    > eventually causes interrupts to be enabled in the acpi idle call, resulting in
    > sched_clock_idle_wakeup_event() being called with interrupts enabled. This bug
    > was only in the x86 32-bit version.
    >
    > Peter's patch below, which is already in git, fixes this. So we don't need any
    > additional fixes here...
    >
    > Thanks,
    > Venki
    >
    >


    Alright, I was concerned about needing additional fixes.
    regards;

    --
    Justin P. Mattock
