[PATCH] sched: do not stop ticks when cpu is not idle - Kernel

This is a discussion on [PATCH] sched: do not stop ticks when cpu is not idle - Kernel ; Issue: the sched tick would be stopped in some race conditions. One of issues caused by that is: Since there is no timer ticks any more from then, the jiffies update will be up to other interrupt to happen. The ...

+ Reply to Thread
Results 1 to 14 of 14

Thread: [PATCH] sched: do not stop ticks when cpu is not idle

  1. [PATCH] sched: do not stop ticks when cpu is not idle

    Issue: the sched tick would be stopped in some race conditions.

    One of issues caused by that is:

    Since there is no timer ticks any more from then, the jiffies update will be
    up to other interrupt to happen. The jiffies will not be updated for a long
    time, until next interrupt happens. That will cause APIs like
    wait_for_completion_timeout(&complete, timeout) to return timeout by mistake,
    since it is using a old jiffies as start time.

    Please see comments (1)~(6) inline for how the ticks are stopped
    by mistake when cpu is not idle:

    void cpu_idle(void)
    {
    ....
    while (1) {
    void (*idle)(void) = pm_idle;
    if (!idle)
    idle = default_idle;
    leds_event(led_idle_start);
    tick_nohz_stop_sched_tick();
    while (!need_resched())
    idle();
    leds_event(led_idle_end);
    tick_nohz_restart_sched_tick();
    (1) ticks are retarted before switch to other tasks
    preempt_enable_no_resched();
    schedule();
    preempt_disable();
    }
    }

    asmlinkage void __sched schedule(void)
    {
    ...
    ...
    need_resched:
    (6) the idle task will be scheduled out again and switch to next task,
    with ticks stopped in (5). So the next task will be running with tick stopped.
    preempt_disable();
    cpu = smp_processor_id();
    rq = cpu_rq(cpu);
    rcu_qsctr_inc(cpu);
    prev = rq->curr;
    switch_count = &prev->nivcsw;

    release_kernel_lock(prev);
    need_resched_nonpreemptible:

    schedule_debug(prev);

    hrtick_clear(rq);

    /*
    * Do the rq-clock update outside the rq lock:
    */
    local_irq_disable();
    __update_rq_clock(rq);
    spin_lock(&rq->lock);
    clear_tsk_need_resched(prev); (2) resched flag is clear from idle task

    ....

    context_switch(rq, prev, next); /* unlocks the rq */
    (3) IRQ will be enabled at end of context_swtich( ).
    ...
    preempt_enable_no_resched();
    if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
    (4) the idle task is scheduled back. If an interrupt happen here,
    The irq_exit( ) will be called at end of the irq handler.
    goto need_resched;
    }

    void irq_exit(void)
    {
    ....
    /* Make sure that timer wheel updates are propagated */
    if (!in_interrupt() && idle_cpu(smp_processor_id()) && !need_resched())
    tick_nohz_stop_sched_tick();
    (5) The ticks will be stopped again since current
    task is idle task and its resched flag is clear in (2).
    rcu_irq_exit();
    preempt_enable_no_resched();
    }

    Signed-off-by: Jack Ren
    ---
    kernel/sched.c | 3 ++-
    1 files changed, 2 insertions(+), 1 deletions(-)

    diff --git a/kernel/sched.c b/kernel/sched.c
    index ff0a7e2..fd17d74 100644
    --- a/kernel/sched.c
    +++ b/kernel/sched.c
    @@ -4027,7 +4027,8 @@ need_resched_nonpreemptible:
    rq->nr_switches++;
    rq->curr = next;
    ++*switch_count;
    -
    + if (rq->curr != rq->idle)
    + tick_nohz_restart_sched_tick();
    context_switch(rq, prev, next); /* unlocks the rq */
    /*
    * the context switch might have flipped the stack from under
    --
    1.5.4
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] sched: do not stop ticks when cpu is not idle


    * eric miao wrote:

    > Issue: the sched tick would be stopped in some race conditions.


    > --- a/kernel/sched.c
    > +++ b/kernel/sched.c
    > @@ -4027,7 +4027,8 @@ need_resched_nonpreemptible:
    > rq->nr_switches++;
    > rq->curr = next;
    > ++*switch_count;
    > -
    > + if (rq->curr != rq->idle)
    > + tick_nohz_restart_sched_tick();
    > context_switch(rq, prev, next); /* unlocks the rq */


    applied to tip/sched/urgent, thanks Eric.

    Thomas, Peter, Dmitry, do you concur with the analysis? (commit below)

    It looks a bit ugly to me in the middle of schedule() - is there no wait
    to solve this within kernel/time/*.c ?

    Ingo

    -------------->
    commit ca1b5a8a9abb3db57562a838f41cdba842f13fe8
    Author: eric miao
    Date: Fri Jul 18 14:41:29 2008 +0800

    sched: do not stop ticks when cpu is not idle

    Issue: the sched tick would be stopped in some race conditions.

    One of issues caused by that is:

    Since there is no timer ticks any more from then, the jiffies update will be
    up to other interrupt to happen. The jiffies will not be updated for a long
    time, until next interrupt happens. That will cause APIs like
    wait_for_completion_timeout(&complete, timeout) to return timeout by mistake,
    since it is using a old jiffies as start time.

    Please see comments (1)~(6) inline for how the ticks are stopped
    by mistake when cpu is not idle:

    void cpu_idle(void)
    {
    ...
    while (1) {
    void (*idle)(void) = pm_idle;
    if (!idle)
    idle = default_idle;
    leds_event(led_idle_start);
    tick_nohz_stop_sched_tick();
    while (!need_resched())
    idle();
    leds_event(led_idle_end);
    tick_nohz_restart_sched_tick();
    (1) ticks are retarted before switch to other tasks
    preempt_enable_no_resched();
    schedule();
    preempt_disable();
    }
    }

    asmlinkage void __sched schedule(void)
    {
    ...
    ...
    need_resched:
    (6) the idle task will be scheduled out again and switch to next task,
    with ticks stopped in (5). So the next task will be running with tick stopped.
    preempt_disable();
    cpu = smp_processor_id();
    rq = cpu_rq(cpu);
    rcu_qsctr_inc(cpu);
    prev = rq->curr;
    switch_count = &prev->nivcsw;

    release_kernel_lock(prev);
    need_resched_nonpreemptible:

    schedule_debug(prev);

    hrtick_clear(rq);

    /*
    * Do the rq-clock update outside the rq lock:
    */
    local_irq_disable();
    __update_rq_clock(rq);
    spin_lock(&rq->lock);
    clear_tsk_need_resched(prev); (2) resched flag is clear from idle task

    ....

    context_switch(rq, prev, next); /* unlocks the rq */
    (3) IRQ will be enabled at end of context_swtich( ).
    ...
    preempt_enable_no_resched();
    if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
    (4) the idle task is scheduled back. If an interrupt happen here,
    The irq_exit( ) will be called at end of the irq handler.
    goto need_resched;
    }

    void irq_exit(void)
    {
    ...
    /* Make sure that timer wheel updates are propagated */
    if (!in_interrupt() && idle_cpu(smp_processor_id()) && !need_resched())
    tick_nohz_stop_sched_tick();
    (5) The ticks will be stopped again since current
    task is idle task and its resched flag is clear in (2).
    rcu_irq_exit();
    preempt_enable_no_resched();
    }

    Signed-off-by: Jack Ren
    Signed-off-by: Ingo Molnar
    ---
    kernel/sched.c | 3 ++-
    1 files changed, 2 insertions(+), 1 deletions(-)

    diff --git a/kernel/sched.c b/kernel/sched.c
    index 1ee18db..e0e0162 100644
    --- a/kernel/sched.c
    +++ b/kernel/sched.c
    @@ -4446,7 +4446,8 @@ need_resched_nonpreemptible:
    rq->nr_switches++;
    rq->curr = next;
    ++*switch_count;
    -
    + if (rq->curr != rq->idle)
    + tick_nohz_restart_sched_tick();
    context_switch(rq, prev, next); /* unlocks the rq */
    /*
    * the context switch might have flipped the stack from under
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] sched: do not stop ticks when cpu is not idle


    * Ingo Molnar wrote:

    > --- a/kernel/sched.c
    > +++ b/kernel/sched.c
    > @@ -4446,7 +4446,8 @@ need_resched_nonpreemptible:
    > rq->nr_switches++;
    > rq->curr = next;
    > ++*switch_count;
    > -
    > + if (rq->curr != rq->idle)
    > + tick_nohz_restart_sched_tick();
    > context_switch(rq, prev, next); /* unlocks the rq */


    hm, one problem i can see is lock dependencies. This code is executed
    with the rq lock while tick_nohz_restart_sched_tick() takes hr locks =>
    not good. So i havent applied this just yet - this needs to be solved
    differently.

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH] sched: do not stop ticks when cpu is not idle

    On Fri, 2008-07-18 at 12:54 +0200, Ingo Molnar wrote:
    > * Ingo Molnar wrote:
    >
    > > --- a/kernel/sched.c
    > > +++ b/kernel/sched.c
    > > @@ -4446,7 +4446,8 @@ need_resched_nonpreemptible:
    > > rq->nr_switches++;
    > > rq->curr = next;
    > > ++*switch_count;
    > > -
    > > + if (rq->curr != rq->idle)
    > > + tick_nohz_restart_sched_tick();
    > > context_switch(rq, prev, next); /* unlocks the rq */

    >
    > hm, one problem i can see is lock dependencies. This code is executed
    > with the rq lock while tick_nohz_restart_sched_tick() takes hr locks =>
    > not good. So i havent applied this just yet - this needs to be solved
    > differently.


    Actually, that should work these days...

    Also, I assume Eric actually tested this with lockdep enabled (right,
    Eric?) and that'll shout - or rather, lockup hard in this case - if you
    got it wrong.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH] sched: do not stop ticks when cpu is not idle

    2008/7/18 eric miao :
    > Issue: the sched tick would be stopped in some race conditions.
    >
    > One of issues caused by that is:
    >
    > Since there is no timer ticks any more from then, the jiffies update will be
    > up to other interrupt to happen. The jiffies will not be updated for a long
    > time, until next interrupt happens. That will cause APIs like
    > wait_for_completion_timeout(&complete, timeout) to return timeout by mistake,
    > since it is using a old jiffies as start time.
    >
    > Please see comments (1)~(6) inline for how the ticks are stopped
    > by mistake when cpu is not idle:
    >
    > void cpu_idle(void)
    > {
    > ...
    > while (1) {
    > void (*idle)(void) = pm_idle;
    > if (!idle)
    > idle = default_idle;
    > leds_event(led_idle_start);
    > tick_nohz_stop_sched_tick();
    > while (!need_resched())
    > idle();
    > leds_event(led_idle_end);
    > tick_nohz_restart_sched_tick();
    > (1) ticks are retarted before switch to other tasks
    > preempt_enable_no_resched();
    > schedule();
    > preempt_disable();
    > }
    > }
    >
    > asmlinkage void __sched schedule(void)
    > {
    > ...
    > ...
    > need_resched:
    > (6) the idle task will be scheduled out again and switch to next task,
    > with ticks stopped in (5). So the next task will be running with tick stopped.
    > preempt_disable();
    > cpu = smp_processor_id();
    > rq = cpu_rq(cpu);
    > rcu_qsctr_inc(cpu);
    > prev = rq->curr;
    > switch_count = &prev->nivcsw;
    >
    > release_kernel_lock(prev);
    > need_resched_nonpreemptible:
    >
    > schedule_debug(prev);
    >
    > hrtick_clear(rq);
    >
    > /*
    > * Do the rq-clock update outside the rq lock:
    > */
    > local_irq_disable();
    > __update_rq_clock(rq);
    > spin_lock(&rq->lock);
    > clear_tsk_need_resched(prev); (2) resched flag is clear from idle task
    >
    > ....
    >
    > context_switch(rq, prev, next); /* unlocks the rq */
    > (3) IRQ will be enabled at end of context_swtich( ).
    > ...
    > preempt_enable_no_resched();
    > if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
    > (4) the idle task is scheduled back. If an interrupt happen here,
    > The irq_exit( ) will be called at end of the irq handler.
    > goto need_resched;


    (I've taken just a quick look so far, that's maybe why I'm a bit confused)

    So what did set TIF_NEED_RESCHED flag here in (4)?

    - at first, it was cleared in (2) - ok.
    - An interrupt happens somewhere after context_switch() [ btw., what's
    about archs that do ctx-switches with interrupts enabled... ]

    irq_exit() calls tick_nohz_stop_sched_tick() _only_ when
    !need_resched(), meaning TIF_NEED_RESCHED is _not_ set for rq->idle
    (no new tasks were activated)
    ..

    So do we have 2 interruts in a raw?

    my (likely wrong) interpretation:

    (1) schedule() some task - (switch to) -> idle

    idle becomes active but is still running in schedule()

    (2) an interrupt happens at (3), then irq_exit() calls
    tick_nohz_stop_sched_tick()

    so far, idle should still run -> this interrupt didn't lead to new
    tasks having been activated

    (3) another interrupt happens which actually wakes up a task;

    TIF_NEED_RESCHED is set

    (4) this fact is detected at (4) and --> goto need_resched() to pick
    up a new task.

    (5) we kind of have idle -> new task reschedule but cpu_idle() never
    happened to be called _so_ sched-ticks were not resterted...

    is it like this or I'm missing something?


    > }
    >
    > void irq_exit(void)
    > {
    > ...
    > /* Make sure that timer wheel updates are propagated */
    > if (!in_interrupt() && idle_cpu(smp_processor_id()) && !need_resched())
    > tick_nohz_stop_sched_tick();
    > (5) The ticks will be stopped again since current
    > task is idle task and its resched flag is clear in (2).
    > rcu_irq_exit();
    > preempt_enable_no_resched();
    > }
    >
    > Signed-off-by: Jack Ren
    > ---
    > kernel/sched.c | 3 ++-
    > 1 files changed, 2 insertions(+), 1 deletions(-)
    >
    > diff --git a/kernel/sched.c b/kernel/sched.c
    > index ff0a7e2..fd17d74 100644
    > --- a/kernel/sched.c
    > +++ b/kernel/sched.c
    > @@ -4027,7 +4027,8 @@ need_resched_nonpreemptible:
    > rq->nr_switches++;
    > rq->curr = next;
    > ++*switch_count;
    > -
    > + if (rq->curr != rq->idle)
    > + tick_nohz_restart_sched_tick();
    > context_switch(rq, prev, next); /* unlocks the rq */
    > /*
    > * the context switch might have flipped the stack from under
    > --
    > 1.5.4
    > --
    > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    > the body of a message to majordomo@vger.kernel.org
    > More majordomo info at http://vger.kernel.org/majordomo-info.html
    > Please read the FAQ at http://www.tux.org/lkml/
    >




    --
    Best regards,
    Dmitry Adamushko
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH] sched: do not stop ticks when cpu is not idle

    On Fri, 18 Jul 2008, Ingo Molnar wrote:
    >
    > * eric miao wrote:
    >
    > > Issue: the sched tick would be stopped in some race conditions.

    >
    > > --- a/kernel/sched.c
    > > +++ b/kernel/sched.c
    > > @@ -4027,7 +4027,8 @@ need_resched_nonpreemptible:
    > > rq->nr_switches++;
    > > rq->curr = next;
    > > ++*switch_count;
    > > -
    > > + if (rq->curr != rq->idle)
    > > + tick_nohz_restart_sched_tick();
    > > context_switch(rq, prev, next); /* unlocks the rq */

    >
    > applied to tip/sched/urgent, thanks Eric.
    >
    > Thomas, Peter, Dmitry, do you concur with the analysis? (commit below)


    Yes. I did not understand the issue when Jack pointed it out to me,
    but with Erics explanation it's really clear. Thanks for tracking that
    down.

    > It looks a bit ugly to me in the middle of schedule() - is there no wait
    > to solve this within kernel/time/*.c ?


    Hmm, yes. I think the proper fix is to enable the tick stop mechanism
    in the idle loop and disable it before we go to schedule. That takes
    an additional parameter to tick_nohz_stop_sched_tick(), but we then
    gain a very clear section where the nohz mimic can be active.

    I'll whip up a patch.

    Thanks,

    tglx
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH] sched: do not stop ticks when cpu is not idle

    On Fri, 18 Jul 2008, eric miao wrote:
    > On Fri, Jul 18, 2008 at 9:52 PM, Thomas Gleixner wrote:
    > >> Thomas, Peter, Dmitry, do you concur with the analysis? (commit below)

    > >
    > > Yes. I did not understand the issue when Jack pointed it out to me,
    > > but with Erics explanation it's really clear. Thanks for tracking that
    > > down.

    >
    > Actually, Jack did most of the analysis and came up with this quick
    > fix.
    >
    > >
    > >> It looks a bit ugly to me in the middle of schedule() - is there no wait
    > >> to solve this within kernel/time/*.c ?

    > >
    > > Hmm, yes. I think the proper fix is to enable the tick stop mechanism
    > > in the idle loop and disable it before we go to schedule. That takes
    > > an additional parameter to tick_nohz_stop_sched_tick(), but we then
    > > gain a very clear section where the nohz mimic can be active.
    > >
    > > I'll whip up a patch.

    >
    > Sounds great, thanks.


    Hey, thanks for tracking that down. I was banging my head against the
    wall when I understood the problem.

    I tried to pinpoint the occasional softlockup bug reports, but I
    probably stared too long into that code so I just saw what I expected
    to see.

    Can you give the patch below a try please ?

    Thanks,

    tglx

    ------------------->
    Subject: nohz: prevent tick stop outside of the idle loop
    From: Thomas Gleixner

    Jack Ran and Eric Miao tracked down the following long standing
    problem in the NOHZ code:

    scheduler switch to idle task
    enable interrupts

    Window starts here

    ----> interrupt happens (does not set NEED_RESCHED)
    irq_exit() stops the tick

    ----> interrupt happens (does set NEED_RESCHED)

    return from schedule()

    cpu_idle(): preempt_disable();

    Window ends here

    The interrupts can happen at any point inside the race window. The
    first interrupt stops the tick, the second one causes the scheduler to
    rerun and switch away from idle again and we end up with the tick
    disabled.

    The fact that it needs two interrupts where the first one does not set
    NEED_RESCHED and the second one does made the bug obscure and extremly
    hard to reproduce and analyse. Kudos to Jack and Eric.

    Solution: Limit the NOHZ functionality to the idle loop to make sure
    that we can not run into such a situation ever again.

    cpu_idle()
    {
    preempt_disable();

    while(1) {
    tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
    are in the idle loop

    while (!need_resched())
    halt();

    tick_nohz_restart_sched_tick(); <- disables NOHZ mode
    preempt_enable_no_resched();
    schedule();
    preempt_disable();
    }
    }

    In hindsight we should have done this forever, but ...

    /me grabs a large brown paperbag.

    Debugged-by: Jack Ren ,
    Debugged-by: eric miao
    Signed-off-by: Thomas Gleixner

    ---
    diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
    index 199b368..89bfded 100644
    --- a/arch/arm/kernel/process.c
    +++ b/arch/arm/kernel/process.c
    @@ -162,7 +162,7 @@ void cpu_idle(void)
    if (!idle)
    idle = default_idle;
    leds_event(led_idle_start);
    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);
    while (!need_resched())
    idle();
    leds_event(led_idle_end);
    diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
    index 6cf9df1..ff820a9 100644
    --- a/arch/avr32/kernel/process.c
    +++ b/arch/avr32/kernel/process.c
    @@ -31,7 +31,7 @@ void cpu_idle(void)
    {
    /* endless idle loop with no priority at all */
    while (1) {
    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);
    while (!need_resched())
    cpu_idle_sleep();
    tick_nohz_restart_sched_tick();
    diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
    index 53c2cd2..77800dd 100644
    --- a/arch/blackfin/kernel/process.c
    +++ b/arch/blackfin/kernel/process.c
    @@ -105,7 +105,7 @@ void cpu_idle(void)
    #endif
    if (!idle)
    idle = default_idle;
    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);
    while (!need_resched())
    idle();
    tick_nohz_restart_sched_tick();
    diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
    index c06f5b5..b16facd 100644
    --- a/arch/mips/kernel/process.c
    +++ b/arch/mips/kernel/process.c
    @@ -53,7 +53,7 @@ void __noreturn cpu_idle(void)
    {
    /* endless idle loop with no priority at all */
    while (1) {
    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);
    while (!need_resched()) {
    #ifdef CONFIG_SMTC_IDLE_HOOK_DEBUG
    extern void smtc_idle_loop_hook(void);
    diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
    index c3cf0e8..d308a9f 100644
    --- a/arch/powerpc/kernel/idle.c
    +++ b/arch/powerpc/kernel/idle.c
    @@ -60,7 +60,7 @@ void cpu_idle(void)

    set_thread_flag(TIF_POLLING_NRFLAG);
    while (1) {
    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);
    while (!need_resched() && !cpu_should_die()) {
    ppc64_runlatch_off();

    diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
    index b721207..70b688c 100644
    --- a/arch/powerpc/platforms/iseries/setup.c
    +++ b/arch/powerpc/platforms/iseries/setup.c
    @@ -561,7 +561,7 @@ static void yield_shared_processor(void)
    static void iseries_shared_idle(void)
    {
    while (1) {
    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);
    while (!need_resched() && !hvlpevent_is_pending()) {
    local_irq_disable();
    ppc64_runlatch_off();
    @@ -591,7 +591,7 @@ static void iseries_dedicated_idle(void)
    set_thread_flag(TIF_POLLING_NRFLAG);

    while (1) {
    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);
    if (!need_resched()) {
    while (!need_resched()) {
    ppc64_runlatch_off();
    diff --git a/arch/sh/kernel/process_32.c b/arch/sh/kernel/process_32.c
    index b98e37a..921892c 100644
    --- a/arch/sh/kernel/process_32.c
    +++ b/arch/sh/kernel/process_32.c
    @@ -86,7 +86,7 @@ void cpu_idle(void)
    if (!idle)
    idle = default_idle;

    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);
    while (!need_resched())
    idle();
    tick_nohz_restart_sched_tick();
    diff --git a/arch/sparc64/kernel/process.c b/arch/sparc64/kernel/process.c
    index 2084f81..0798928 100644
    --- a/arch/sparc64/kernel/process.c
    +++ b/arch/sparc64/kernel/process.c
    @@ -97,7 +97,7 @@ void cpu_idle(void)
    set_thread_flag(TIF_POLLING_NRFLAG);

    while(1) {
    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);

    while (!need_resched() && !cpu_is_offline(cpu))
    sparc64_yield(cpu);
    diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
    index 83603cf..a1c6d07 100644
    --- a/arch/um/kernel/process.c
    +++ b/arch/um/kernel/process.c
    @@ -243,7 +243,7 @@ void default_idle(void)
    if (need_resched())
    schedule();

    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);
    nsecs = disable_timer();
    idle_sleep(nsecs);
    tick_nohz_restart_sched_tick();
    diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
    index 0c3927a..53bc653 100644
    --- a/arch/x86/kernel/process_32.c
    +++ b/arch/x86/kernel/process_32.c
    @@ -128,7 +128,7 @@ void cpu_idle(void)

    /* endless idle loop with no priority at all */
    while (1) {
    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);
    while (!need_resched()) {

    check_pgt_cache();
    diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
    index a8e5362..9a10c18 100644
    --- a/arch/x86/kernel/process_64.c
    +++ b/arch/x86/kernel/process_64.c
    @@ -120,7 +120,7 @@ void cpu_idle(void)
    current_thread_info()->status |= TS_POLLING;
    /* endless idle loop with no priority at all */
    while (1) {
    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(1);
    while (!need_resched()) {

    rmb();
    diff --git a/include/linux/tick.h b/include/linux/tick.h
    index a881c65..d3c0269 100644
    --- a/include/linux/tick.h
    +++ b/include/linux/tick.h
    @@ -49,6 +49,7 @@ struct tick_sched {
    unsigned long check_clocks;
    enum tick_nohz_mode nohz_mode;
    ktime_t idle_tick;
    + int inidle;
    int tick_stopped;
    unsigned long idle_jiffies;
    unsigned long idle_calls;
    @@ -105,14 +106,14 @@ static inline int tick_check_oneshot_change(int allow_nohz) { return 0; }
    #endif /* !CONFIG_GENERIC_CLOCKEVENTS */

    # ifdef CONFIG_NO_HZ
    -extern void tick_nohz_stop_sched_tick(void);
    +extern void tick_nohz_stop_sched_tick(int inidle);
    extern void tick_nohz_restart_sched_tick(void);
    extern void tick_nohz_update_jiffies(void);
    extern ktime_t tick_nohz_get_sleep_length(void);
    extern void tick_nohz_stop_idle(int cpu);
    extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
    # else
    -static inline void tick_nohz_stop_sched_tick(void) { }
    +static inline void tick_nohz_stop_sched_tick(int inidle) { }
    static inline void tick_nohz_restart_sched_tick(void) { }
    static inline void tick_nohz_update_jiffies(void) { }
    static inline ktime_t tick_nohz_get_sleep_length(void)
    diff --git a/kernel/softirq.c b/kernel/softirq.c
    index 81e2fe0..f6b03d5 100644
    --- a/kernel/softirq.c
    +++ b/kernel/softirq.c
    @@ -286,7 +286,7 @@ void irq_exit(void)
    #ifdef CONFIG_NO_HZ
    /* Make sure that timer wheel updates are propagated */
    if (!in_interrupt() && idle_cpu(smp_processor_id()) && !need_resched())
    - tick_nohz_stop_sched_tick();
    + tick_nohz_stop_sched_tick(0);
    rcu_irq_exit();
    #endif
    preempt_enable_no_resched();
    diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
    index beef7cc..a5c26d2 100644
    --- a/kernel/time/tick-sched.c
    +++ b/kernel/time/tick-sched.c
    @@ -195,7 +195,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
    * Called either from the idle loop or from irq_exit() when an idle period was
    * just interrupted by an interrupt which did not cause a reschedule.
    */
    -void tick_nohz_stop_sched_tick(void)
    +void tick_nohz_stop_sched_tick(int inidle)
    {
    unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
    struct tick_sched *ts;
    @@ -224,6 +224,11 @@ void tick_nohz_stop_sched_tick(void)
    if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
    goto end;

    + if (!inidle && !ts->inidle)
    + goto end;
    +
    + ts->inidle = 1;
    +
    if (need_resched())
    goto end;

    @@ -373,11 +378,14 @@ void tick_nohz_restart_sched_tick(void)
    local_irq_disable();
    tick_nohz_stop_idle(cpu);

    - if (!ts->tick_stopped) {
    + if (!ts->inidle || !ts->tick_stopped) {
    + ts->inidle = 0;
    local_irq_enable();
    return;
    }

    + ts->inidle = 0;
    +
    rcu_exit_nohz();

    /* Update jiffies first */
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH] sched: do not stop ticks when cpu is not idle

    On Fri, Jul 18, 2008 at 05:27:28PM +0200, Thomas Gleixner wrote:
    > On Fri, 18 Jul 2008, eric miao wrote:
    > > On Fri, Jul 18, 2008 at 9:52 PM, Thomas Gleixner wrote:
    > > >> Thomas, Peter, Dmitry, do you concur with the analysis? (commit below)
    > > >
    > > > Yes. I did not understand the issue when Jack pointed it out to me,
    > > > but with Erics explanation it's really clear. Thanks for tracking that
    > > > down.

    > >
    > > Actually, Jack did most of the analysis and came up with this quick
    > > fix.
    > >
    > > >
    > > >> It looks a bit ugly to me in the middle of schedule() - is there no wait
    > > >> to solve this within kernel/time/*.c ?
    > > >
    > > > Hmm, yes. I think the proper fix is to enable the tick stop mechanism
    > > > in the idle loop and disable it before we go to schedule. That takes
    > > > an additional parameter to tick_nohz_stop_sched_tick(), but we then
    > > > gain a very clear section where the nohz mimic can be active.
    > > >
    > > > I'll whip up a patch.

    > >
    > > Sounds great, thanks.

    >
    > Hey, thanks for tracking that down. I was banging my head against the
    > wall when I understood the problem.
    >
    > I tried to pinpoint the occasional softlockup bug reports, but I
    > probably stared too long into that code so I just saw what I expected
    > to see.
    >
    > Can you give the patch below a try please ?


    If this patch works, could you also add the s390 hunk before submitting?
    Thanks!
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [PATCH] sched: do not stop ticks when cpu is not idle


    * Peter Zijlstra wrote:

    > On Fri, 2008-07-18 at 12:54 +0200, Ingo Molnar wrote:
    > > * Ingo Molnar wrote:
    > >
    > > > --- a/kernel/sched.c
    > > > +++ b/kernel/sched.c
    > > > @@ -4446,7 +4446,8 @@ need_resched_nonpreemptible:
    > > > rq->nr_switches++;
    > > > rq->curr = next;
    > > > ++*switch_count;
    > > > -
    > > > + if (rq->curr != rq->idle)
    > > > + tick_nohz_restart_sched_tick();
    > > > context_switch(rq, prev, next); /* unlocks the rq */

    > >
    > > hm, one problem i can see is lock dependencies. This code is executed
    > > with the rq lock while tick_nohz_restart_sched_tick() takes hr locks =>
    > > not good. So i havent applied this just yet - this needs to be solved
    > > differently.

    >
    > Actually, that should work these days...
    >
    > Also, I assume Eric actually tested this with lockdep enabled (right,
    > Eric?) and that'll shout - or rather, lockup hard in this case - if
    > you got it wrong.


    nope:

    [ 0.188011] =================================
    [ 0.188011] [ INFO: inconsistent lock state ]
    [ 0.188011] 2.6.26-tip-03835-g9d964b9-dirty #20198
    [ 0.188011] ---------------------------------
    [ 0.188011] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
    [ 0.188011] swapper/0 [HC0[0]:SC0[0]:HE1:SE1] takes:
    [ 0.188011] (&rq->rq_lock_key){+...}, at: [] schedule+0x191/0x900
    [ 0.188011] {in-hardirq-W} state was registered at:
    [ 0.188011] [] 0xffffffffffffffff

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [PATCH] sched: do not stop ticks when cpu is not idle

    On Fri, 18 Jul 2008, Heiko Carstens wrote:
    > On Fri, Jul 18, 2008 at 05:27:28PM +0200, Thomas Gleixner wrote:
    > > Can you give the patch below a try please ?

    >
    > If this patch works, could you also add the s390 hunk before submitting?


    Sure. It should have been there already

    Thanks,
    tglx

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [PATCH] sched: do not stop ticks when cpu is not idle

    Hi Thomas,
    I applied your patch, and rerun the test on my environment, the
    issue went away. I also analyzed your patch and did not find any other
    race condition so far.

    cheers,
    Jack Ren

    On Fri, Jul 18, 2008 at 11:27 PM, Thomas Gleixner wrote:
    > On Fri, 18 Jul 2008, eric miao wrote:
    >> On Fri, Jul 18, 2008 at 9:52 PM, Thomas Gleixner wrote:
    >> >> Thomas, Peter, Dmitry, do you concur with the analysis? (commit below)
    >> >
    >> > Yes. I did not understand the issue when Jack pointed it out to me,
    >> > but with Erics explanation it's really clear. Thanks for tracking that
    >> > down.

    >>
    >> Actually, Jack did most of the analysis and came up with this quick
    >> fix.
    >>
    >> >
    >> >> It looks a bit ugly to me in the middle of schedule() - is there no wait
    >> >> to solve this within kernel/time/*.c ?
    >> >
    >> > Hmm, yes. I think the proper fix is to enable the tick stop mechanism
    >> > in the idle loop and disable it before we go to schedule. That takes
    >> > an additional parameter to tick_nohz_stop_sched_tick(), but we then
    >> > gain a very clear section where the nohz mimic can be active.
    >> >
    >> > I'll whip up a patch.

    >>
    >> Sounds great, thanks.

    >
    > Hey, thanks for tracking that down. I was banging my head against the
    > wall when I understood the problem.
    >
    > I tried to pinpoint the occasional softlockup bug reports, but I
    > probably stared too long into that code so I just saw what I expected
    > to see.
    >
    > Can you give the patch below a try please ?
    >
    > Thanks,
    >
    > tglx
    >
    > ------------------->
    > Subject: nohz: prevent tick stop outside of the idle loop
    > From: Thomas Gleixner
    >
    > Jack Ran and Eric Miao tracked down the following long standing
    > problem in the NOHZ code:
    >
    > scheduler switch to idle task
    > enable interrupts
    >
    > Window starts here
    >
    > ----> interrupt happens (does not set NEED_RESCHED)
    > irq_exit() stops the tick
    >
    > ----> interrupt happens (does set NEED_RESCHED)
    >
    > return from schedule()
    >
    > cpu_idle(): preempt_disable();
    >
    > Window ends here
    >
    > The interrupts can happen at any point inside the race window. The
    > first interrupt stops the tick, the second one causes the scheduler to
    > rerun and switch away from idle again and we end up with the tick
    > disabled.
    >
    > The fact that it needs two interrupts where the first one does not set
    > NEED_RESCHED and the second one does made the bug obscure and extremly
    > hard to reproduce and analyse. Kudos to Jack and Eric.
    >
    > Solution: Limit the NOHZ functionality to the idle loop to make sure
    > that we can not run into such a situation ever again.
    >
    > cpu_idle()
    > {
    > preempt_disable();
    >
    > while(1) {
    > tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
    > are in the idle loop
    >
    > while (!need_resched())
    > halt();
    >
    > tick_nohz_restart_sched_tick(); <- disables NOHZ mode
    > preempt_enable_no_resched();
    > schedule();
    > preempt_disable();
    > }
    > }
    >
    > In hindsight we should have done this forever, but ...
    >
    > /me grabs a large brown paperbag.
    >
    > Debugged-by: Jack Ren ,
    > Debugged-by: eric miao
    > Signed-off-by: Thomas Gleixner
    >
    > ---
    > diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
    > index 199b368..89bfded 100644
    > --- a/arch/arm/kernel/process.c
    > +++ b/arch/arm/kernel/process.c
    > @@ -162,7 +162,7 @@ void cpu_idle(void)
    > if (!idle)
    > idle = default_idle;
    > leds_event(led_idle_start);
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    > while (!need_resched())
    > idle();
    > leds_event(led_idle_end);
    > diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
    > index 6cf9df1..ff820a9 100644
    > --- a/arch/avr32/kernel/process.c
    > +++ b/arch/avr32/kernel/process.c
    > @@ -31,7 +31,7 @@ void cpu_idle(void)
    > {
    > /* endless idle loop with no priority at all */
    > while (1) {
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    > while (!need_resched())
    > cpu_idle_sleep();
    > tick_nohz_restart_sched_tick();
    > diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
    > index 53c2cd2..77800dd 100644
    > --- a/arch/blackfin/kernel/process.c
    > +++ b/arch/blackfin/kernel/process.c
    > @@ -105,7 +105,7 @@ void cpu_idle(void)
    > #endif
    > if (!idle)
    > idle = default_idle;
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    > while (!need_resched())
    > idle();
    > tick_nohz_restart_sched_tick();
    > diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
    > index c06f5b5..b16facd 100644
    > --- a/arch/mips/kernel/process.c
    > +++ b/arch/mips/kernel/process.c
    > @@ -53,7 +53,7 @@ void __noreturn cpu_idle(void)
    > {
    > /* endless idle loop with no priority at all */
    > while (1) {
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    > while (!need_resched()) {
    > #ifdef CONFIG_SMTC_IDLE_HOOK_DEBUG
    > extern void smtc_idle_loop_hook(void);
    > diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
    > index c3cf0e8..d308a9f 100644
    > --- a/arch/powerpc/kernel/idle.c
    > +++ b/arch/powerpc/kernel/idle.c
    > @@ -60,7 +60,7 @@ void cpu_idle(void)
    >
    > set_thread_flag(TIF_POLLING_NRFLAG);
    > while (1) {
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    > while (!need_resched() && !cpu_should_die()) {
    > ppc64_runlatch_off();
    >
    > diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
    > index b721207..70b688c 100644
    > --- a/arch/powerpc/platforms/iseries/setup.c
    > +++ b/arch/powerpc/platforms/iseries/setup.c
    > @@ -561,7 +561,7 @@ static void yield_shared_processor(void)
    > static void iseries_shared_idle(void)
    > {
    > while (1) {
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    > while (!need_resched() && !hvlpevent_is_pending()) {
    > local_irq_disable();
    > ppc64_runlatch_off();
    > @@ -591,7 +591,7 @@ static void iseries_dedicated_idle(void)
    > set_thread_flag(TIF_POLLING_NRFLAG);
    >
    > while (1) {
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    > if (!need_resched()) {
    > while (!need_resched()) {
    > ppc64_runlatch_off();
    > diff --git a/arch/sh/kernel/process_32.c b/arch/sh/kernel/process_32.c
    > index b98e37a..921892c 100644
    > --- a/arch/sh/kernel/process_32.c
    > +++ b/arch/sh/kernel/process_32.c
    > @@ -86,7 +86,7 @@ void cpu_idle(void)
    > if (!idle)
    > idle = default_idle;
    >
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    > while (!need_resched())
    > idle();
    > tick_nohz_restart_sched_tick();
    > diff --git a/arch/sparc64/kernel/process.c b/arch/sparc64/kernel/process.c
    > index 2084f81..0798928 100644
    > --- a/arch/sparc64/kernel/process.c
    > +++ b/arch/sparc64/kernel/process.c
    > @@ -97,7 +97,7 @@ void cpu_idle(void)
    > set_thread_flag(TIF_POLLING_NRFLAG);
    >
    > while(1) {
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    >
    > while (!need_resched() && !cpu_is_offline(cpu))
    > sparc64_yield(cpu);
    > diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
    > index 83603cf..a1c6d07 100644
    > --- a/arch/um/kernel/process.c
    > +++ b/arch/um/kernel/process.c
    > @@ -243,7 +243,7 @@ void default_idle(void)
    > if (need_resched())
    > schedule();
    >
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    > nsecs = disable_timer();
    > idle_sleep(nsecs);
    > tick_nohz_restart_sched_tick();
    > diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
    > index 0c3927a..53bc653 100644
    > --- a/arch/x86/kernel/process_32.c
    > +++ b/arch/x86/kernel/process_32.c
    > @@ -128,7 +128,7 @@ void cpu_idle(void)
    >
    > /* endless idle loop with no priority at all */
    > while (1) {
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    > while (!need_resched()) {
    >
    > check_pgt_cache();
    > diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
    > index a8e5362..9a10c18 100644
    > --- a/arch/x86/kernel/process_64.c
    > +++ b/arch/x86/kernel/process_64.c
    > @@ -120,7 +120,7 @@ void cpu_idle(void)
    > current_thread_info()->status |= TS_POLLING;
    > /* endless idle loop with no priority at all */
    > while (1) {
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(1);
    > while (!need_resched()) {
    >
    > rmb();
    > diff --git a/include/linux/tick.h b/include/linux/tick.h
    > index a881c65..d3c0269 100644
    > --- a/include/linux/tick.h
    > +++ b/include/linux/tick.h
    > @@ -49,6 +49,7 @@ struct tick_sched {
    > unsigned long check_clocks;
    > enum tick_nohz_mode nohz_mode;
    > ktime_t idle_tick;
    > + int inidle;
    > int tick_stopped;
    > unsigned long idle_jiffies;
    > unsigned long idle_calls;
    > @@ -105,14 +106,14 @@ static inline int tick_check_oneshot_change(int allow_nohz) { return 0; }
    > #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
    >
    > # ifdef CONFIG_NO_HZ
    > -extern void tick_nohz_stop_sched_tick(void);
    > +extern void tick_nohz_stop_sched_tick(int inidle);
    > extern void tick_nohz_restart_sched_tick(void);
    > extern void tick_nohz_update_jiffies(void);
    > extern ktime_t tick_nohz_get_sleep_length(void);
    > extern void tick_nohz_stop_idle(int cpu);
    > extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
    > # else
    > -static inline void tick_nohz_stop_sched_tick(void) { }
    > +static inline void tick_nohz_stop_sched_tick(int inidle) { }
    > static inline void tick_nohz_restart_sched_tick(void) { }
    > static inline void tick_nohz_update_jiffies(void) { }
    > static inline ktime_t tick_nohz_get_sleep_length(void)
    > diff --git a/kernel/softirq.c b/kernel/softirq.c
    > index 81e2fe0..f6b03d5 100644
    > --- a/kernel/softirq.c
    > +++ b/kernel/softirq.c
    > @@ -286,7 +286,7 @@ void irq_exit(void)
    > #ifdef CONFIG_NO_HZ
    > /* Make sure that timer wheel updates are propagated */
    > if (!in_interrupt() && idle_cpu(smp_processor_id()) && !need_resched())
    > - tick_nohz_stop_sched_tick();
    > + tick_nohz_stop_sched_tick(0);
    > rcu_irq_exit();
    > #endif
    > preempt_enable_no_resched();
    > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
    > index beef7cc..a5c26d2 100644
    > --- a/kernel/time/tick-sched.c
    > +++ b/kernel/time/tick-sched.c
    > @@ -195,7 +195,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
    > * Called either from the idle loop or from irq_exit() when an idle period was
    > * just interrupted by an interrupt which did not cause a reschedule.
    > */
    > -void tick_nohz_stop_sched_tick(void)
    > +void tick_nohz_stop_sched_tick(int inidle)
    > {
    > unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
    > struct tick_sched *ts;
    > @@ -224,6 +224,11 @@ void tick_nohz_stop_sched_tick(void)
    > if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
    > goto end;
    >
    > + if (!inidle && !ts->inidle)
    > + goto end;
    > +
    > + ts->inidle = 1;
    > +
    > if (need_resched())
    > goto end;
    >
    > @@ -373,11 +378,14 @@ void tick_nohz_restart_sched_tick(void)
    > local_irq_disable();
    > tick_nohz_stop_idle(cpu);
    >
    > - if (!ts->tick_stopped) {
    > + if (!ts->inidle || !ts->tick_stopped) {
    > + ts->inidle = 0;
    > local_irq_enable();
    > return;
    > }
    >
    > + ts->inidle = 0;
    > +
    > rcu_exit_nohz();
    >
    > /* Update jiffies first */
    > --
    > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    > the body of a message to majordomo@vger.kernel.org
    > More majordomo info at http://vger.kernel.org/majordomo-info.html
    > Please read the FAQ at http://www.tux.org/lkml/
    >

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [PATCH] sched: do not stop ticks when cpu is not idle

    Thomas Gleixner writes:

    > On Fri, 18 Jul 2008, eric miao wrote:
    > > On Fri, Jul 18, 2008 at 9:52 PM, Thomas Gleixner wrote:
    > > >> Thomas, Peter, Dmitry, do you concur with the analysis? (commit below)
    > > >
    > > > Yes. I did not understand the issue when Jack pointed it out to me,
    > > > but with Erics explanation it's really clear. Thanks for tracking that
    > > > down.

    > >
    > > Actually, Jack did most of the analysis and came up with this quick
    > > fix.
    > >
    > > >
    > > >> It looks a bit ugly to me in the middle of schedule() - is there no wait
    > > >> to solve this within kernel/time/*.c ?
    > > >
    > > > Hmm, yes. I think the proper fix is to enable the tick stop mechanism
    > > > in the idle loop and disable it before we go to schedule. That takes
    > > > an additional parameter to tick_nohz_stop_sched_tick(), but we then
    > > > gain a very clear section where the nohz mimic can be active.
    > > >
    > > > I'll whip up a patch.

    > >
    > > Sounds great, thanks.

    >
    > Hey, thanks for tracking that down. I was banging my head against the
    > wall when I understood the problem.
    >
    > I tried to pinpoint the occasional softlockup bug reports, but I
    > probably stared too long into that code so I just saw what I expected
    > to see.
    >
    > Can you give the patch below a try please ?


    Hi Thomas,

    I've seen weird timer behavior on both i386 and x86_64 on SMP
    machines. By weird I mean:

    - time stops for a few hours, then resumes as if nothing happened;

    - time flows too fast or slow (4x faster to 2x slower depending on
    phase of the moon);

    - the last one I've seen (yesterday), was:
    sleep(1) sleeps for 1 second, but
    select(0, NULL, NULL, NULL, 0.5) sleeps for nine seconds.

    I have been trying to track this problem for a few weeks now, without
    success. Booting a CONFIG_NO_HZ-enabled kernel with "highres=off
    nohz=off" does not make a difference. However booting a kernel with
    CONFIG_NO_HZ and CONFIG_HIGH_RES_TIMERS disabled seems to be working
    (I cannot garantee that since I've been using that for 48h so far, but
    sometimes the problem takes a few days to manifest itself).

    After a cursory reading of your patch, it looks to me that the race
    could happen on a kernel compiled with CONFIG_NO_HZ and
    CONFIG_HIGH_RES_TIMERS and booted with "nohz=off highres=off". Can
    you confirm that?

    If you need more details (dmesg, lspci, etc), I have posted some
    details on LKML ( http://lkml.org/lkml/2008/7/9/330 ) and I have a bug
    posted on the Fedora/RH bugzilla (
    https://bugzilla.redhat.com/show_bug.cgi?id=451824 ).

    Phil.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [PATCH] sched: do not stop ticks when cpu is not idle

    On Mon, 21 Jul 2008, Philippe Troin wrote:
    > Thomas Gleixner writes:
    > I've seen weird timer behavior on both i386 and x86_64 on SMP
    > machines. By weird I mean:
    >
    > - time stops for a few hours, then resumes as if nothing happened;
    >
    > - time flows too fast or slow (4x faster to 2x slower depending on
    > phase of the moon);
    >
    > - the last one I've seen (yesterday), was:
    > sleep(1) sleeps for 1 second, but
    > select(0, NULL, NULL, NULL, 0.5) sleeps for nine seconds.
    >
    > I have been trying to track this problem for a few weeks now, without
    > success. Booting a CONFIG_NO_HZ-enabled kernel with "highres=off
    > nohz=off" does not make a difference. However booting a kernel with
    > CONFIG_NO_HZ and CONFIG_HIGH_RES_TIMERS disabled seems to be working
    > (I cannot garantee that since I've been using that for 48h so far, but
    > sometimes the problem takes a few days to manifest itself).
    >
    > After a cursory reading of your patch, it looks to me that the race
    > could happen on a kernel compiled with CONFIG_NO_HZ and
    > CONFIG_HIGH_RES_TIMERS and booted with "nohz=off highres=off". Can
    > you confirm that?


    No, I can not confirm that. With nohz=off / highres=off that code path
    is not invoked.

    > If you need more details (dmesg, lspci, etc), I have posted some
    > details on LKML ( http://lkml.org/lkml/2008/7/9/330 ) and I have a bug
    > posted on the Fedora/RH bugzilla (
    > https://bugzilla.redhat.com/show_bug.cgi?id=451824 ).


    Will have a look.

    Question: which clocksource is active ?

    cat /sys/devices/system/clocksource/clocksource0/current_clocksource

    Thanks,

    tglx

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: [PATCH] sched: do not stop ticks when cpu is not idle

    Thomas Gleixner writes:

    > On Mon, 21 Jul 2008, Philippe Troin wrote:
    > > Thomas Gleixner writes:
    > > I've seen weird timer behavior on both i386 and x86_64 on SMP
    > > machines. By weird I mean:
    > >
    > > - time stops for a few hours, then resumes as if nothing happened;
    > >
    > > - time flows too fast or slow (4x faster to 2x slower depending on
    > > phase of the moon);
    > >
    > > - the last one I've seen (yesterday), was:
    > > sleep(1) sleeps for 1 second, but
    > > select(0, NULL, NULL, NULL, 0.5) sleeps for nine seconds.
    > >
    > > I have been trying to track this problem for a few weeks now, without
    > > success. Booting a CONFIG_NO_HZ-enabled kernel with "highres=off
    > > nohz=off" does not make a difference. However booting a kernel with
    > > CONFIG_NO_HZ and CONFIG_HIGH_RES_TIMERS disabled seems to be working
    > > (I cannot garantee that since I've been using that for 48h so far, but
    > > sometimes the problem takes a few days to manifest itself).
    > >
    > > After a cursory reading of your patch, it looks to me that the race
    > > could happen on a kernel compiled with CONFIG_NO_HZ and
    > > CONFIG_HIGH_RES_TIMERS and booted with "nohz=off highres=off". Can
    > > you confirm that?

    >
    > No, I can not confirm that. With nohz=off / highres=off that code path
    > is not invoked.


    Darn. You're right, on a more detailed reading:

    With CONFIG_NO_HZ set, the tick_nohz_stop_sched_tick() function is
    defined (declared in tick.h and defined in tick-sched.c).

    There's nothing to prevent tick_nohz_stop_sched_tick() to be called
    from cpu_idle().

    However in tick_nohz_stop_sched_tick(), ts->nohz_mode ==
    NOHZ_MODE_INACTIVE is true and the function bails out early. And
    just before the section which was patched.

    > > If you need more details (dmesg, lspci, etc), I have posted some
    > > details on LKML ( http://lkml.org/lkml/2008/7/9/330 ) and I have a bug
    > > posted on the Fedora/RH bugzilla (
    > > https://bugzilla.redhat.com/show_bug.cgi?id=451824 ).

    >
    > Will have a look.
    >
    > Question: which clocksource is active ?
    >
    > cat /sys/devices/system/clocksource/clocksource0/current_clocksource


    As mentionned earlier I found two systems showing up the problem, a
    dual Pentium III system (i386) and a dual Opteron system running in
    64-bit (x86_64).

    On the i386:

    current_clocksource is jiffies

    On this one, the symptoms tend to be that the clock goes too fast or
    too slow, always by an integer multiple (seen 2x slower and 4x
    faster so far).

    Once on this system, while the clock was running 4x faster, changing
    current_clocksource to tsc (the only other available choice)
    reestablished the "normal flow of time" Back to jiffies, and the
    clock went back to 4x faster. I could switch back and forth.

    On the x86_64:

    current_clocksource is hpet

    On the dual Opteron system, the symptoms I've seen are that the
    system becomes unresponsive, with some "stuck" processes, and the
    time not changing for long periods of time (like a few hours).

    It's also on this sytem that I saw yesterday:

    sleep(1) takes 1 seconds.
    select(0, NULL, NULL, NULL, .5) takes 9 seconds.
    date was reporting a wall time flowing normally.

    A question I had was: when the system(s) gets wedged, what kind of
    debugging information could I gather on the live system before I
    reboot?

    Phil.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread