2.6.25-rc5-git6: Reported regressions from 2.6.24 - Kernel


  1. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    Thomas Gleixner wrote:
    > On Fri, 21 Mar 2008, Thomas Gleixner wrote:
    >>> | 1.78 us, TSC-warps:0 | 19.27 us, TOD-warps:0 | 19.37 us, CLOCK-warps:0

    >> Ok. So the watchdog trigger is a false positive.
    >>
    >> Thinking more about it, it looks like Andi's change triggers some
    >> hidden bug in the combination of NO_HZ and add_timer_on(), where the
    >> CPU on which the timer is added is likely in a long idle sleep. I look
    >> into this tomorrow.

    >
    > Ok. Here is what's happening:
    >
    > CPU0 runs the watchdog timer and schedules it on CPU1.
    >
    > With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
    > boot process there is probably no timer pending on CPU1, which means
    > the idle sleep is infinite.
    >
    > Now some time later CPU1 gets woken by an interrupt/IPI and runs the
    > timer wheel. At this point the pm_timer which is the reference clock
    > has already wrapped around, so the watchdog thinks that there is a
    > huge time difference and marks the TSC unstable.
    >
    > Aside of that watchdog issue this also affects the other users of
    > add_timer_on(): e.g. queue_delayed_work_on().
    >
    > Can you please apply the patch below and verify it with Andi's
    > watchdog patch applied ?



    Did that: git head, Andi's + your patch, but TSC is still marked unstable.

    >
    > Thanks,
    >
    > tglx
    >



    Gabriel
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    > CPU0 runs the watchdog timer and schedules it on CPU1.
    >
    > With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
    > boot process there is probably no timer pending on CPU1, which means
    > the idle sleep is infinite.
    >
    > Now some time later CPU1 gets woken by an interrupt/IPI and runs the
    > timer wheel. At this point the pm_timer which is the reference clock
    > has already wrapped around, so the watchdog thinks that there is a


    In my own original noidletick code I simply limited all sleeps
    to below the wraparound of the primary timer. Wouldn't something
    like that work?

    In the case of the watchdog I guess it would need to be limited
    to the wraparound of multiple timers, at least all that
    are used by the watchdog.

    I'm not sure doing this for add_timer_on() only is correct.
    After all, it could affect any other code not run by add_timer_on(),
    couldn't it?

    -Andi
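To make the wraparound concrete, here is a minimal sketch of the arithmetic, not the kernel's clocksource code. It assumes the ACPI PM timer runs at 3.579545 MHz with a 24-bit counter (the thread does not state the width), which wraps roughly every 4.7 seconds. All function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed hardware parameters: 3.579545 MHz, 24-bit counter. */
#define PMTMR_HZ    3579545ULL
#define PMTMR_MASK  0x00FFFFFFu

/* Elapsed ticks as a reader of the counter computes them: modulo 2^24. */
static uint32_t pmtmr_delta(uint32_t then, uint32_t now)
{
    return (now - then) & PMTMR_MASK;
}

/* Raw counter value after real_ns nanoseconds, starting from 'start'. */
static uint32_t pmtmr_read_after(uint32_t start, uint64_t real_ns)
{
    uint64_t ticks = real_ns * PMTMR_HZ / 1000000000ULL;

    return (uint32_t)((start + ticks) & PMTMR_MASK);
}

/* Nanoseconds the reference clock appears to have advanced. */
static uint64_t pmtmr_apparent_ns(uint64_t real_ns)
{
    uint32_t then = 0;
    uint32_t now = pmtmr_read_after(then, real_ns);

    return (uint64_t)pmtmr_delta(then, now) * 1000000000ULL / PMTMR_HZ;
}

/* The "limit all sleeps" idea, sketched: stay below one wrap period. */
static uint64_t clamp_sleep_ns(uint64_t wanted_ns)
{
    uint64_t wrap_ns = ((uint64_t)PMTMR_MASK + 1) * 1000000000ULL / PMTMR_HZ;

    return wanted_ns < wrap_ns ? wanted_ns : wrap_ns - 1;
}
```

Under these assumptions a 10 second sleep, `pmtmr_apparent_ns(10000000000ULL)`, looks like about 0.63 seconds on the reference clock, while the TSC has counted the full 10 seconds. That mismatch is what the watchdog reads as an unstable TSC. Clamping the sleep hides the mismatch but does not make a timer queued from another CPU fire on time.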


  3. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    On Sat, 22 Mar 2008, Gabriel C wrote:
    > > Now some time later CPU1 gets woken by an interrupt/IPI and runs the
    > > timer wheel. At this point the pm_timer which is the reference clock
    > > has already wrapped around, so the watchdog thinks that there is a
    > > huge time difference and marks the TSC unstable.
    > >
    > > Aside of that watchdog issue this also affects the other users of
    > > add_timer_on(): e.g. queue_delayed_work_on().
    > >
    > > Can you please apply the patch below and verify it with Andi's
    > > watchdog patch applied ?

    >
    >
    > Did that: git head, Andi's + your patch, but TSC is still marked unstable.


    Doh, stupid me. We do not reevaluate the timer wheel when the CPU is
    merely woken by the smp_reschedule IPI while its resched flag is not
    set. That IPI uses a separate vector which does not go through
    irq_enter() / irq_exit().

    Does the patch below solve the problem?

    Thanks,

    tglx

    ---
     include/linux/tick.h     |    4 +++
     kernel/time/tick-sched.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++
     kernel/timer.c           |   14 ++++++++++++-
     3 files changed, 67 insertions(+), 1 deletion(-)

    Index: linux-2.6/include/linux/tick.h
    ===================================================================
    --- linux-2.6.orig/include/linux/tick.h
    +++ linux-2.6/include/linux/tick.h
    @@ -111,6 +111,8 @@ extern void tick_nohz_update_jiffies(voi
     extern ktime_t tick_nohz_get_sleep_length(void);
     extern void tick_nohz_stop_idle(int cpu);
     extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
    +extern int tick_nohz_cpu_needs_wakeup(int cpu);
    +extern void tick_nohz_rescan_timers_on(int cpu);
     # else
     static inline void tick_nohz_stop_sched_tick(void) { }
     static inline void tick_nohz_restart_sched_tick(void) { }
    @@ -123,6 +125,8 @@ static inline ktime_t tick_nohz_get_slee
     }
     static inline void tick_nohz_stop_idle(int cpu) { }
     static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return 0; }
    +static inline int tick_nohz_cpu_needs_wakeup(int cpu) { return 0; }
    +static inline void tick_nohz_rescan_timers_on(int cpu) { }
     # endif /* !NO_HZ */

     #endif
    Index: linux-2.6/kernel/time/tick-sched.c
    ===================================================================
    --- linux-2.6.orig/kernel/time/tick-sched.c
    +++ linux-2.6/kernel/time/tick-sched.c
    @@ -183,6 +183,56 @@ u64 get_cpu_idle_time_us(int cpu, u64 *l
     }

     /**
    + * tick_nohz_cpu_needs_wakeup - check possible wakeup of cpu in add_timer_on()
    + *
    + * when add_timer_on() happens on a CPU which is in a long idle sleep,
    + * then we need to wake it up so the timer wheel gets reevaluated.
    + *
    + * Note: we use idle_cpu() which checks the idle state lockless, but
    + * we are ordered against the other cpu which might be on the way to
    + * idle by the timer base lock, which we hold.
    + */
    +int tick_nohz_cpu_needs_wakeup(int cpu)
    +{
    +	return tick_nohz_enabled && idle_cpu(cpu) &&
    +		(cpu != smp_processor_id());
    +}
    +
    +/*
    + * Rescan the timer wheel, when
    + *
    + * - the CPU is idle
    + * - the CPU is not processing an interrupt
    + * - the need_resched flag is off
    + */
    +static void tick_nohz_rescan_timers(void *unused)
    +{
    +	int cpu = smp_processor_id();
    +
    +	if (!idle_cpu(cpu) || in_interrupt() || need_resched())
    +		return;
    +
    +	tick_nohz_stop_idle(cpu);
    +	tick_nohz_update_jiffies();
    +	tick_nohz_stop_sched_tick();
    +}
    +
    +/**
    + * tick_nohz_rescan_timers_on - reevaluate the idle sleep time of a CPU
    + *
    + * When a CPU is idle and a timer got added to this CPU timer wheel
    + * via add_timer_on() then we need to make sure that the CPU
    + * reevaluates the timer wheel. Otherwise the timer might be delayed
    + * for a real long time.
    + */
    +void tick_nohz_rescan_timers_on(int cpu)
    +{
    +	if (tick_nohz_enabled && idle_cpu(cpu))
    +		smp_call_function_single(cpu, tick_nohz_rescan_timers, NULL,
    +					 0, 0);
    +}
    +
    +/**
      * tick_nohz_stop_sched_tick - stop the idle tick from the idle task
      *
      * When the next event is more than a tick into the future, stop the idle tick
    Index: linux-2.6/kernel/timer.c
    ===================================================================
    --- linux-2.6.orig/kernel/timer.c
    +++ linux-2.6/kernel/timer.c
    @@ -445,15 +445,27 @@ void add_timer_on(struct timer_list *tim
     {
     	struct tvec_base *base = per_cpu(tvec_bases, cpu);
     	unsigned long flags;
    +	int wakeidle;

     	timer_stats_timer_set_start_info(timer);
     	BUG_ON(timer_pending(timer) || !timer->function);
     	spin_lock_irqsave(&base->lock, flags);
     	timer_set_base(timer, base);
     	internal_add_timer(base, timer);
    +	/*
    +	 * Check whether the other CPU is idle and needs to be
    +	 * triggered to reevaluate the timer wheel when nohz is
    +	 * active. We are protected against the other CPU fiddling
    +	 * with the timer by holding the timer base lock. This also
    +	 * makes sure that a CPU on the way to idle can not evaluate
    +	 * the timer wheel.
    +	 */
    +	wakeidle = tick_nohz_cpu_needs_wakeup(cpu);
     	spin_unlock_irqrestore(&base->lock, flags);
    -}

    +	if (wakeidle)
    +		tick_nohz_rescan_timers_on(cpu);
    +}

     /**
      * mod_timer - modify a timer's timeout

  4. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    On Sat, 22 Mar 2008, Andi Kleen wrote:
    > > CPU0 runs the watchdog timer and schedules it on CPU1.
    > >
    > > With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
    > > boot process there is probably no timer pending on CPU1, which means
    > > the idle sleep is infinite.
    > >
    > > Now some time later CPU1 gets woken by an interrupt/IPI and runs the
    > > timer wheel. At this point the pm_timer which is the reference clock
    > > has already wrapped around, so the watchdog thinks that there is a

    >
    > In my old original own noidletick code I simply limited all sleeps
    > to below the wrap around of the primary timer. Wouldn't something
    > like that work?


    No, it does not solve the real problem of not reevaluating the timer
    wheel on the idle CPU when a timer gets added from some other CPU. We
    would paper over the watchdog issue, but postponing a timer event
    which was added cross-CPU to some artificial expiry time is simply
    wrong.

    > I'm not sure just doing this for add_timer_on() only is correct.
    > After all it could affect any other code not run by add_timer_on()
    > couldn't it?


    No, it's limited to add_timer_on() simply because no other code can
    add a new timer (timer_list or hrtimer) which modifies the next event
    on another CPU. There is also the rare case, when one CPU runs the
    timer callback and the other one modifies the timer, but that's not
    relevant for the NOHZ problem because the CPU which runs the callback
    is not idle at this point.

    All other timer operations are CPU local and reevaluated before the
    CPU goes idle again.

    Thanks,

    tglx
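The point about cross-CPU timers can be illustrated with a toy model. This is only a sketch under simplifying assumptions: one idle CPU, one timer wheel reduced to its earliest expiry, and time as a plain integer. Every name below is hypothetical and none of it is kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_EVENT UINT64_MAX   /* "sleep forever": nothing pending */

/* Toy model of one CPU; all fields are illustrative only. */
struct toy_cpu {
    bool     idle;
    uint64_t earliest_timer;  /* earliest expiry queued in the wheel */
    uint64_t programmed_wake; /* next event chosen when going idle */
};

/* The CPU evaluates its wheel and sleeps until the earliest expiry. */
static void toy_enter_idle(struct toy_cpu *c)
{
    c->programmed_wake = c->earliest_timer;
    c->idle = true;
}

/* Another CPU queues a timer here; 'kick' models an explicit wakeup. */
static void toy_add_timer_on(struct toy_cpu *c, uint64_t expires, bool kick)
{
    if (expires < c->earliest_timer)
        c->earliest_timer = expires;
    if (kick && c->idle) {
        /* The woken CPU leaves idle, then rescans on the way back. */
        c->idle = false;
        toy_enter_idle(c);
    }
}

/* When the callback really runs, given the next unrelated interrupt. */
static uint64_t toy_timer_runs_at(const struct toy_cpu *c, uint64_t next_irq)
{
    uint64_t wake = c->programmed_wake < next_irq ? c->programmed_wake
                                                  : next_irq;

    return wake > c->earliest_timer ? wake : c->earliest_timer;
}
```

In this model a timer due at tick 100, queued without a kick on a CPU that went idle with nothing pending, runs only when some unrelated interrupt arrives (for example at tick 5000). With the kick it runs at tick 100. Locally added timers never hit this case because the CPU rescans its own wheel before it goes idle again.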

  5. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    Thomas Gleixner wrote:
    > On Sat, 22 Mar 2008, Gabriel C wrote:
    >>> Now some time later CPU1 gets woken by an interrupt/IPI and runs the
    >>> timer wheel. At this point the pm_timer which is the reference clock
    >>> has already wrapped around, so the watchdog thinks that there is a
    >>> huge time difference and marks the TSC unstable.
    >>>
    >>> Aside of that watchdog issue this also affects the other users of
    >>> add_timer_on(): e.g. queue_delayed_work_on().
    >>>
    >>> Can you please apply the patch below and verify it with Andi's
    >>> watchdog patch applied ?

    >>
    >> Did that , git head , Andi's + your patch but TSC is still marked unstable.

    >
    > Doh, stupid me. We do not reevaluate the timer wheel, when we just
    > wake up via the smp_reschedule IPI when the resched flag on the other
    > CPU is not set. That's a separate vector which is not going through
    > irq_enter() / irq_exit().
    >
    > Does the patch below solve the problem ?


    With this one TSC is fine, but now I get a warning on boot:

    ...

    [ 0.041037] ------------[ cut here ]------------
    [ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
    [ 0.041074] Modules linked in:
    [ 0.041087] Pid: 1, comm: swapper Not tainted 2.6.25-rc6-00243-g028011e-dirty #12
    [ 0.041107] [] warn_on_slowpath+0x40/0x65
    [ 0.041128] [] autoremove_wake_function+0xd/0x2d
    [ 0.041148] [] schedule_timeout+0x13/0x99
    [ 0.041167] [] __wake_up+0x29/0x39
    [ 0.041182] [] __wake_up+0x29/0x39
    [ 0.041197] [] call_usermodehelper_exec+0x97/0xa2
    [ 0.041214] [] native_smp_call_function_mask+0x23/0x11e
    [ 0.041233] [] kobject_uevent_env+0x346/0x368
    [ 0.041251] [] smp_call_function_single+0x50/0x6f
    [ 0.041268] [] tick_nohz_rescan_timers_on+0x27/0x2b
    [ 0.041287] [] clocksource_register+0x162/0x174
    [ 0.041306] [] kernel_init+0x126/0x25e
    [ 0.041322] [] schedule_tail+0x17/0x44
    [ 0.041337] [] ret_from_fork+0x6/0x1c
    [ 0.041353] [] kernel_init+0x0/0x25e
    [ 0.041367] [] kernel_init+0x0/0x25e
    [ 0.041381] [] kernel_thread_helper+0x7/0x10
    [ 0.041397] =======================
    [ 0.041417] ---[ end trace ca143223eefdc828 ]---

    ...

    Full dmesg here: http://frugalware.org/~crazy/dmesg/dmesg_tsc


    >
    > Thanks,
    >
    > tglx
    >


    Gabriel

  6. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    On Sat, 22 Mar 2008, Gabriel C wrote:
    > Thomas Gleixner wrote:
    > > On Sat, 22 Mar 2008, Gabriel C wrote:
    > >>> Now some time later CPU1 gets woken by an interrupt/IPI and runs the
    > >>> timer wheel. At this point the pm_timer which is the reference clock
    > >>> has already wrapped around, so the watchdog thinks that there is a
    > >>> huge time difference and marks the TSC unstable.
    > >>>
    > >>> Aside of that watchdog issue this also affects the other users of
    > >>> add_timer_on(): e.g. queue_delayed_work_on().
    > >>>
    > >>> Can you please apply the patch below and verify it with Andi's
    > >>> watchdog patch applied ?
    > >>
    > >> Did that , git head , Andi's + your patch but TSC is still marked unstable.

    > >
    > > Doh, stupid me. We do not reevaluate the timer wheel, when we just
    > > wake up via the smp_reschedule IPI when the resched flag on the other
    > > CPU is not set. That's a separate vector which is not going through
    > > irq_enter() / irq_exit().
    > >
    > > Does the patch below solve the problem ?

    >
    > With this one TSC is fine but now I get a warning on boot :


    Good. It confirms my assumptions about the root cause.

    > [ 0.041037] ------------[ cut here ]------------
    > [ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()


    Grr. I'll work out a solution for that one.

    Thanks,

    tglx

  7. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    On Sat, 22 Mar 2008, Thomas Gleixner wrote:
    > On Sat, 22 Mar 2008, Gabriel C wrote:
    > > With this one TSC is fine but now I get a warning on boot :

    >
    > Good. It confirms my assumptions about the root cause.
    >
    > > [ 0.041037] ------------[ cut here ]------------
    > > [ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()

    >
    > Grr. I'll work out a solution for that one.


    Gabriel,

    I'm happy to rack your nerves some more.

    After discussing the issue with Peter and Ingo the following solution
    seems to be the one which is the least intrusive.

    Can you please give it a test ride?

    Thanks,

    tglx
    ---
     include/linux/sched.h |    6 ++++++
     kernel/sched.c        |   42 ++++++++++++++++++++++++++++++++++++++++++
     kernel/timer.c        |   10 +++++++++-
     3 files changed, 57 insertions(+), 1 deletion(-)

    Index: linux-2.6/include/linux/sched.h
    ===================================================================
    --- linux-2.6.orig/include/linux/sched.h
    +++ linux-2.6/include/linux/sched.h
    @@ -1541,6 +1541,12 @@ static inline void idle_task_exit(void)

     extern void sched_idle_next(void);

    +#ifdef CONFIG_NO_HZ
    +extern void wake_up_idle_cpu(int cpu);
    +#else
    +static inline void wake_up_idle_cpu(int cpu) { }
    +#endif
    +
     #ifdef CONFIG_SCHED_DEBUG
     extern unsigned int sysctl_sched_latency;
     extern unsigned int sysctl_sched_min_granularity;
    Index: linux-2.6/kernel/sched.c
    ===================================================================
    --- linux-2.6.orig/kernel/sched.c
    +++ linux-2.6/kernel/sched.c
    @@ -848,6 +848,48 @@ static inline void resched_task(struct t
     	__resched_task(p, TIF_NEED_RESCHED);
     }

    +#ifdef CONFIG_NO_HZ
    +/*
    + * When add_timer_on() enqueues a timer into the timer wheel of an
    + * idle CPU then this timer might expire before the next timer event
    + * which is scheduled to wake up that CPU. In case of a completely
    + * idle system the next event might even be infinite time into the
    + * future. wake_up_idle_cpu() ensures that the CPU is woken up and
    + * leaves the inner idle loop so the newly added timer is taken into
    + * account when the CPU goes back to idle and evaluates the timer
    + * wheel for the next timer event.
    + */
    +void wake_up_idle_cpu(int cpu)
    +{
    +	struct rq *rq = cpu_rq(cpu);
    +
    +	if (cpu == smp_processor_id())
    +		return;
    +
    +	/*
    +	 * This is safe, as this function is called with the timer
    +	 * wheel base lock of (cpu) held. When the CPU is on the way
    +	 * to idle and has not yet set rq->curr to idle then it will
    +	 * be serialized on the timer wheel base lock and take the new
    +	 * timer into account automatically.
    +	 */
    +	if (rq->curr != rq->idle)
    +		return;
    +
    +	/*
    +	 * We can set TIF_RESCHED on the idle task of the other CPU
    +	 * lockless. The worst case is that the other CPU runs the
    +	 * idle task through an additional NOOP schedule()
    +	 */
    +	set_tsk_thread_flag(rq->idle, TIF_NEED_RESCHED);
    +
    +	/* NEED_RESCHED must be visible before we test polling */
    +	smp_mb();
    +	if (!tsk_is_polling(rq->idle))
    +		smp_send_reschedule(cpu);
    +}
    +#endif
    +
     #ifdef CONFIG_SCHED_HRTICK
     /*
      * Use HR-timers to deliver accurate preemption points.
    Index: linux-2.6/kernel/timer.c
    ===================================================================
    --- linux-2.6.orig/kernel/timer.c
    +++ linux-2.6/kernel/timer.c
    @@ -451,10 +451,18 @@ void add_timer_on(struct timer_list *tim
     	spin_lock_irqsave(&base->lock, flags);
     	timer_set_base(timer, base);
     	internal_add_timer(base, timer);
    +	/*
    +	 * Check whether the other CPU is idle and needs to be
    +	 * triggered to reevaluate the timer wheel when nohz is
    +	 * active. We are protected against the other CPU fiddling
    +	 * with the timer by holding the timer base lock. This also
    +	 * makes sure that a CPU on the way to idle can not evaluate
    +	 * the timer wheel.
    +	 */
    +	wake_up_idle_cpu(cpu);
     	spin_unlock_irqrestore(&base->lock, flags);
     }

    -
     /**
      * mod_timer - modify a timer's timeout
      * @timer: the timer to be modified
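
The "set flag, barrier, test polling" sequence in wake_up_idle_cpu() can be sketched in userspace with C11 atomics. This is an illustration of the ordering idea only, under the assumption that the idle side performs the mirror-image sequence (announce polling, barrier, check the flag). The patch itself shows only the waker side. The struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the idle task's thread flags. */
struct toy_idle {
    atomic_bool need_resched; /* models TIF_NEED_RESCHED */
    atomic_bool polling;      /* models an idle loop that polls the flag */
    int ipis_sent;            /* counts simulated reschedule IPIs */
};

static void toy_idle_init(struct toy_idle *t)
{
    atomic_init(&t->need_resched, false);
    atomic_init(&t->polling, false);
    t->ipis_sent = 0;
}

/* Waker side: publish the flag, fence, then decide whether to send an IPI. */
static void toy_wake_idle(struct toy_idle *t)
{
    atomic_store_explicit(&t->need_resched, true, memory_order_relaxed);
    /* The flag store must be visible before the polling state is read. */
    atomic_thread_fence(memory_order_seq_cst);
    if (!atomic_load_explicit(&t->polling, memory_order_relaxed))
        t->ipis_sent++;       /* stands in for smp_send_reschedule(cpu) */
}

/* Idle side (assumed): announce polling, fence, then check the flag once. */
static bool toy_idle_poll_once(struct toy_idle *t)
{
    atomic_store_explicit(&t->polling, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&t->need_resched, memory_order_relaxed);
}
```

With a full fence on both sides, at least one party sees the other's store: either the idle loop notices the flag while polling, or the waker notices that the target is not polling and sends the IPI. The cost of getting both is one redundant wakeup, which matches the patch comment about an additional NOOP schedule().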

  8. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    Thomas Gleixner wrote:
    > On Sat, 22 Mar 2008, Thomas Gleixner wrote:
    >> On Sat, 22 Mar 2008, Gabriel C wrote:
    >>> With this one TSC is fine but now I get a warning on boot :

    >> Good. It confirms my assumptions about the root cause.
    >>
    >>> [ 0.041037] ------------[ cut here ]------------
    >>> [ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()

    >> Grr. I'll work out a solution for that one.

    >
    > Gabriel,
    >
    > I'm happy to rack your nerves some more.


    No worries

    >
    > After discussing the issue with Peter and Ingo the following solution
    > seems to be the one which is the least intrusive.
    >
    > Can you please give it a test ride ?


    Done. Git head + Andi's patch + this version of your patch works here.

    Also time-warp-test is just fine and everything else seems to work.




  9. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    Gabriel C wrote:
    > Thomas Gleixner wrote:
    >> On Sat, 22 Mar 2008, Thomas Gleixner wrote:
    >>> On Sat, 22 Mar 2008, Gabriel C wrote:
    >>>> With this one TSC is fine but now I get a warning on boot :
    >>> Good. It confirms my assumptions about the root cause.
    >>>
    >>>> [ 0.041037] ------------[ cut here ]------------
    >>>> [ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
    >>> Grr. I'll work out a solution for that one.

    >> Gabriel,
    >>
    >> I'm happy to rack your nerves some more.

    >
    > No worries
    >
    >> After discussing the issue with Peter and Ingo the following solution
    >> seems to be the one which is the least intrusive.
    >>
    >> Can you please give it a test ride ?

    >
    > Done , git head + Andi's patch + this version of your patch does work here.
    >
    > Also time-warp-test is just fine and everything else seems to work.


    I've also tested with my other motherboard and it is fine too.

    Feel free to add my Tested-by when you push this patch.



  10. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    Hi Rafael,

    On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
    > Subject : INFO: task mount:11202 blocked for more than 120 seconds
    > Submitter : Christian Kujau
    > Date : 2008-03-07 21:32 (10 days old)
    > References : http://lkml.org/lkml/2008/3/7/308
    > http://lkml.org/lkml/2008/3/9/186
    >


    The other Christian reported this as fixed: http://lkml.org/lkml/2008/3/17/232
    I too can confirm that the hangs are gone now: http://lkml.org/lkml/2008/3/21/532

    Thanks for maintaining the regression list,
    Christian.
    --
    BOFH excuse #91:

    Mouse chewed through power cable

  11. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    On Sunday, 23 of March 2008, Christian Kujau wrote:
    > Hi Rafael,
    >
    > On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
    > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
    > > Subject : INFO: task mount:11202 blocked for more than 120 seconds
    > > Submitter : Christian Kujau
    > > Date : 2008-03-07 21:32 (10 days old)
    > > References : http://lkml.org/lkml/2008/3/7/308
    > > http://lkml.org/lkml/2008/3/9/186
    > >

    >
    > The other Christian reported this as fixed: http://lkml.org/lkml/2008/3/17/232
    > I too can confirm that the hangs are gone now: http://lkml.org/lkml/2008/3/21/532


    Is the patch present in the mainline yet?

    > Thanks for maintaining the regression list,


    You're welcome. :-)

    Thanks,
    Rafael

  12. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    On Sunday 23 March 2008 20:06:56 Rafael J. Wysocki wrote:
    > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10207
    > > > Subject : INFO: task mount:11202 blocked for more than 120 seconds
    > > > Submitter : Christian Kujau
    > > > Date : 2008-03-07 21:32 (10 days old)
    > > > References : http://lkml.org/lkml/2008/3/7/308
    > > > http://lkml.org/lkml/2008/3/9/186

    > >
    > > The other Christian reported this as fixed:
    > > http://lkml.org/lkml/2008/3/17/232 I too can confirm that the hangs are
    > > gone now: http://lkml.org/lkml/2008/3/21/532

    >
    > Is the patch present in the mainline yet?

    No... it isn't in the mainline?! (Or was it committed as I wrote this mail?!)
    Anyway, can someone please merge the patch there?

    http://lkml.org/lkml/2008/3/17/214

    Regards,
    Christian

  13. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9983
    > Subject : PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
    > Submitter : Linas Žvirblis <0x0007@gmail.com>
    > Date : 2008-02-13 22:38 (33 days old)
    > References : http://lkml.org/lkml/2008/2/13/566


    Linas did not respond any more, and you closed the bug

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10051
    > Subject : Spurious messages at boot, eventually hangs the usb subsustem
    > Submitter : Jean-Luc Coulon
    > Date : 2008-02-20 09:10 (26 days old)


    Hm, Jean-Luc said:
    > ------- Comment #4 From Jean-Luc Coulon 2008-03-09 22:50:19 ----
    > BTW, I can normally boot my system since rc4


    Close?

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10086
    > Subject : 2.6.25-rc2 + smartd = hang
    > Submitter : Anders Eriksson
    > Date : 2008-02-22 17:51 (24 days old)
    > References : http://lkml.org/lkml/2008/2/22/239
    > Handled-By : Bartlomiej Zolnierkiewicz


    http://bugzilla.kernel.org/show_bug.cgi?id=10086#c5 says it's fixed.


    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10133
    > Subject : INFO: possible circular locking in the resume
    > Submitter : Zdenek Kabelac
    > Date : 2008-02-27 (19 days old)
    > References : http://lkml.org/lkml/2008/2/26/479
    > Handled-By : Gautham R Shenoy


    Gautham said on 2008-02-28 he has a patch - but did not post it. What now?


    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10146
    > Subject : 2.6.25-rc: complete lockup on boot/start of X (bisected)
    > Submitter : Marcin Slusarz
    > Date : 2008-03-02 20:00 (15 days old)
    > References : http://lkml.org/lkml/2008/3/2/91
    > Handled-By : Peter Zijlstra


    Seems to be fixed: http://lkml.org/lkml/2008/3/23/275

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10152
    > Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
    > Submitter : Gabriel C
    > Date : 2008-02-24 01:31 (22 days old)
    > References : http://lkml.org/lkml/2008/2/23/380
    > http://lkml.org/lkml/2008/2/24/281
    > Handled-By : Thomas Gleixner


    Seems to be fixed by: http://lkml.org/lkml/2008/3/22/66
    Which introduced a WARNING:, fixed by the subsequent:
    http://lkml.org/lkml/2008/3/23/199

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10190
    > Subject : [BUG] Linux-2.6.25-rc4 (and also in rc3) Compile Error
    > Submitter : Tarkan Erimer
    > Date : 2008-03-05 05:01 (12 days old)
    > References : http://www.ussg.iu.edu/hypermail/lin...03.0/1867.html


    Bugzilla entry is closed.

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10211
    > Subject : drivers/media/video/cx2341x.c: undefined references
    > Submitter : Toralf Förster
    > Date : 2008-03-07 13:48 (10 days old)
    > References : http://lkml.org/lkml/2008/3/7/168


    Closed.

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10234
    > Subject : pciehp hang on hp ia64 rx6600
    > Submitter : Alex Chiang
    > Date : 2008-03-12 00:47 (5 days old)
    > References : http://lkml.org/lkml/2008/3/12/31
    > Handled-By : Mark Lord


    Closed.

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10238
    > Subject : netconsole still hangs
    > Submitter : Andrew Morton
    > Date : 2008-03-12 23:14 (5 days old)
    > References : http://marc.info/?t=120536379200004&r=1&w=2
    > Handled-By : David Miller
    > Stephen Hemminger


    Closed.

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10242
    > Subject : rm command hangs
    > Submitter : Jean-Luc Coulon
    > Date : 2008-03-14 05:47 (3 days old)


    Maybe related to http://bugzilla.kernel.org/show_bug.cgi?id=10207, which
    is (about to be) closed?

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10266
    > Subject : [PATCH] i810fb: Fix console switch regression
    > Submitter : Stefan Bauer
    > Date : 2008-03-16 19:42 (1 days old)
    > References : http://lkml.org/lkml/2008/3/16/84


    Closed.

    > Regressions with patches
    > ------------------------
    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10016
    > Subject : cobalt_btns.c <-> struct platform_device compile error
    > Submitter : Adrian Bunk
    > Date : 2008-02-17 12:12 (29 days old)
    > References : http://lkml.org/lkml/2008/2/17/293
    > Handled-By : Yoichi Yuasa
    > Patch : http://lkml.org/lkml/2008/3/9/25


    Closed.

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10017
    > Subject : cdev removal broke cobalt_btns.c compilation
    > Submitter : Adrian Bunk
    > Date : 2008-02-17 12:14 (29 days old)
    > References : http://lkml.org/lkml/2008/2/17/295
    > Handled-By : Yoichi Yuasa
    > Patch : http://lkml.org/lkml/2008/3/9/25


    Closed.

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10186
    > Subject : SCSI_AIC94XX must depend on SCSI
    > Submitter : Toralf Förster
    > Date : 2008-03-06 19:09 (11 days old)
    > References : http://marc.info/?l=linux-kernel&m=120483073617232&w=2
    > Handled-By : Adrian Bunk
    > Patch : http://marc.info/?l=linux-kernel&m=120483499725928&w=2


    Testing...

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10210
    > Subject : 2.6.25-rc4-git3: Handling of audio CDs broken on pata_ali
    > Submitter : Rafael J. Wysocki
    > Date : 2008-03-08 22:46 (9 days old)
    > References : http://lkml.org/lkml/2008/3/8/123
    > Handled-By : Tejun Heo
    > Patch : http://lkml.org/lkml/2008/3/10/69


    Closed.

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10232
    > Subject : intel mtrr fixups apparently broke display and e1000 probe
    > Submitter : Stephen Gran
    > Date : 2008-03-12 08:37 (5 days old)
    > Handled-By : Yinghai Lu
    > Patch : http://bugzilla.kernel.org/attachmen...71&action=view


    Closed.

    > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10259
    > Subject : /sys/class/hwmon/hwmon0 is missing a device link
    > Submitter : Jean-Luc Coulon
    > Date : 2008-03-16 04:56 (1 days old)
    > Handled-By : Jean Delvare
    > Patch : http://bugzilla.kernel.org/attachmen...01&action=view


    Closed.


    Thanks,
    Christian.
    --
    BOFH excuse #387:

    Your computer's union contract is set to expire at midnight.

  14. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    Hi,

    Please have a look at the latest report:
    http://lkml.org/lkml/2008/3/21/516

    On Sunday, 23 of March 2008, Christian Kujau wrote:
    > On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
    > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9983
    > > Subject : PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
    > > Submitter : Linas Žvirblis <0x0007@gmail.com>
    > > Date : 2008-02-13 22:38 (33 days old)
    > > References : http://lkml.org/lkml/2008/2/13/566

    >
    > Linas did not respond any more, and you closed the bug
    >
    > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10051
    > > Subject : Spurious messages at boot, eventually hangs the usb subsustem
    > > Submitter : Jean-Luc Coulon
    > > Date : 2008-02-20 09:10 (26 days old)

    >
    > Hm, Jean-Luc said:
    > > ------- Comment #4 From Jean-Luc Coulon 2008-03-09 22:50:19 ----
    > > BTW, I can normally boot my system since rc4

    >
    > Close?


    Yes, if he doesn't respond for a couple of days.

    > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10086
    > > Subject : 2.6.25-rc2 + smartd = hang
    > > Submitter : Anders Eriksson
    > > Date : 2008-02-22 17:51 (24 days old)
    > > References : http://lkml.org/lkml/2008/2/22/239
    > > Handled-By : Bartlomiej Zolnierkiewicz

    >
    > http://bugzilla.kernel.org/show_bug.cgi?id=10086#c5 says it's fixed.


    Yes, it's closed now.

    > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10133
    > > Subject : INFO: possible circular locking in the resume
    > > Submitter : Zdenek Kabelac
    > > Date : 2008-02-27 (19 days old)
    > > References : http://lkml.org/lkml/2008/2/26/479
    > > Handled-By : Gautham R Shenoy

    >
    > Gautham said on 2008-02-28 he has a patch - but did not post it. What now?


    The reporter is unresponsive. We're waiting for him to respond.

    > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10146
    > > Subject : 2.6.25-rc: complete lockup on boot/start of X (bisected)
    > > Submitter : Marcin Slusarz
    > > Date : 2008-03-02 20:00 (15 days old)
    > > References : http://lkml.org/lkml/2008/3/2/91
    > > Handled-By : Peter Zijlstra

    >
    > Seems to be fixed: http://lkml.org/lkml/2008/3/23/275


    Yes, I've already updated the entry with this patch.

    > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10152
    > > Subject : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
    > > Submitter : Gabriel C
    > > Date : 2008-02-24 01:31 (22 days old)
    > > References : http://lkml.org/lkml/2008/2/23/380
    > > http://lkml.org/lkml/2008/2/24/281
    > > Handled-By : Thomas Gleixner

    >
    > Seems to be fixed by: http://lkml.org/lkml/2008/3/22/66
    > Which introduced a WARNING:, fixed by the subsequent:
    > http://lkml.org/lkml/2008/3/23/199


    This is not on the list any more.

    BTW, the reports I send reflect the state of the Bugzilla entries. The closed
    entries will not be reported next time.

    Thanks,
    Rafael

  15. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    Gabriel C wrote:
    > Gabriel C wrote:
    >> Thomas Gleixner wrote:
    >>> On Sat, 22 Mar 2008, Thomas Gleixner wrote:
    >>>> On Sat, 22 Mar 2008, Gabriel C wrote:
    >>>>> With this one TSC is fine but now I get a warning on boot :
    >>>> Good. It confirms my assumptions about the root cause.
    >>>>
    >>>>> [ 0.041037] ------------[ cut here ]------------
    >>>>> [ 0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
    >>>> Grr. I'll work out a solution for that one.
    >>> Gabriel,
    >>>
    >>> I'm happy to rack your nerves some more.

    >> No worries
    >>
    >>> After discussing the issue with Peter and Ingo the following solution
    >>> seems to be the one which is the least intrusive.
    >>>
    >>> Can you please give it a test ride ?

    >> Done , git head + Andi's patch + this version of your patch does work here.
    >>
    >> Also time-warp-test is just fine and everything else seems to work.

    >
    > Also I've tested with my other motherboard and is fine too
    >
    > Feel free to add my Tested-by when you push this patch.


    Heh :/

    ....

    [ 5902.632878] Clocksource tsc unstable (delta = 4686687272 ns)
    [ 5920.650516] Time: acpi_pm clocksource has been installed.

    ....

    Seems like something still triggers that :/



  16. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    On Mon, 24 Mar 2008, Gabriel C wrote:
    > >>> Can you please give it a test ride ?
    > >> Done , git head + Andi's patch + this version of your patch does work here.
    > >>
    > >> Also time-warp-test is just fine and everything else seems to work.

    > >
    > > Also I've tested with my other motherboard and is fine too
    > >
    > > Feel free to add my Tested-by when you push this patch.

    >
    > Heh :/
    >
    > ...
    >
    > [ 5902.632878] Clocksource tsc unstable (delta = 4686687272 ns)
    > [ 5920.650516] Time: acpi_pm clocksource has been installed.
    >
    > ...
    >
    > Seems like something still triggers that :/


    Hmm. Can you please apply the patch below? It adds some more info and
    triggers the sysrq-q timer list printout when the watchdog
    triggers. That might give us some insight into this.

    Thanks,
    tglx

    ---
    kernel/time/clocksource.c | 6 ++++--
    1 file changed, 4 insertions(+), 2 deletions(-)

    Index: linux-2.6/kernel/time/clocksource.c
    ===================================================================
    --- linux-2.6.orig/kernel/time/clocksource.c
    +++ linux-2.6/kernel/time/clocksource.c
    @@ -87,8 +87,10 @@ static void clocksource_ratewd(struct cl
     	if (delta > -WATCHDOG_THRESHOLD && delta < WATCHDOG_THRESHOLD)
     		return;

    -	printk(KERN_WARNING "Clocksource %s unstable (delta = %Ld ns)\n",
    -	       cs->name, delta);
    +	printk(KERN_WARNING
    +	       "Clocksource %s unstable (delta = %Ld ns) E:%lu J:%lu\n",
    +	       cs->name, delta, watchdog_timer.expires, jiffies);
    +	sysrq_timer_list_show();
     	cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
     	clocksource_change_rating(cs, 0);
     	list_del(&cs->wd_list);
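
The check at the top of that hunk can be sketched in userspace as follows. This is a minimal illustration, not the kernel function: `watchdog_delta_ok()` is a hypothetical name, and the threshold value (`NSEC_PER_SEC >> 4`, i.e. 62.5 ms) is an assumption about trees of this era that should be checked against the actual source.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
/* Assumed value for kernels of this era (1/16 s); verify against your tree. */
#define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)

/*
 * Compare the time elapsed according to the clocksource under test with the
 * time elapsed according to the watchdog (reference) clocksource. Returns
 * true while the two agree to within the threshold in either direction.
 */
bool watchdog_delta_ok(int64_t cs_elapsed_ns, int64_t wd_elapsed_ns)
{
    int64_t delta = cs_elapsed_ns - wd_elapsed_ns;

    return delta > -WATCHDOG_THRESHOLD && delta < WATCHDOG_THRESHOLD;
}
```

Under that assumed threshold, the delta Gabriel reported (4686687272 ns) is far outside the window, so the clocksource gets marked unstable regardless of which side is actually at fault.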

  17. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    Thomas Gleixner wrote:
    > On Mon, 24 Mar 2008, Gabriel C wrote:
    >>>>> Can you please give it a test ride ?
    >>>> Done , git head + Andi's patch + this version of your patch does work here.
    >>>>
    >>>> Also time-warp-test is just fine and everything else seems to work.
    >>> Also I've tested with my other motherboard and is fine too
    >>>
    >>> Feel free to add my Tested-by when you push this patch.

    >> Heh :/
    >>
    >> ...
    >>
    >> [ 5902.632878] Clocksource tsc unstable (delta = 4686687272 ns)
    >> [ 5920.650516] Time: acpi_pm clocksource has been installed.
    >>
    >> ...
    >>
    >> Seems like something still triggers that :/

    >
    > Hmm. Can you please apply the patch below? It adds some more info and
    > triggers the sysrq-q timer list printout when the watchdog
    > triggers. That might give us some insight into this.


    Sorry for the lag , I was out the whole day.

    Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):

    ....

    [34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723
    [34528.893380] Timer List Version: v0.3
    [34528.893386] HRTIMER_MAX_CLOCK_BASES: 2
    [34528.893392] now at 34510722407314 nsecs
    [34528.893396]
    [34528.893399] cpu: 0
    [34528.893402] clock 0:
    [34528.893404] .index: 0
    [34528.893407] .resolution: 1 nsecs
    [34528.893409] .get_time: ktime_get_real
    [34528.893422] .offset: 1206358214734619011 nsecs
    [34528.893425] active timers:
    [34528.893428] clock 1:
    [34528.893430] .index: 1
    [34528.893433] .resolution: 1 nsecs
    [34528.893435] .get_time: ktime_get
    [34528.893440] .offset: 0 nsecs
    [34528.893443] active timers:
    [34528.893445] #0: , tick_sched_timer, S:01
    [34528.893467] # expires at 34510723000000 nsecs [in 592686 nsecs]
    [34528.893470] #1: , it_real_fn, S:01
    [34528.893481] # expires at 34510724648354 nsecs [in 2241040 nsecs]
    [34528.893485] #2: , hrtimer_wakeup, S:01
    [34528.893495] # expires at 34510997616597 nsecs [in 275209283 nsecs]
    [34528.893498] #3: , hrtimer_wakeup, S:01
    [34528.893508] # expires at 34511115498292 nsecs [in 393090978 nsecs]
    [34528.893512] #4: , hrtimer_wakeup, S:01
    [34528.893521] # expires at 34511328809630 nsecs [in 606402316 nsecs]
    [34528.893525] #5: , it_real_fn, S:01
    [34528.893534] # expires at 34511515619673 nsecs [in 793212359 nsecs]
    [34528.893537] #6: , hrtimer_wakeup, S:01
    [34528.893547] # expires at 34512265383335 nsecs [in 1542976021 nsecs]
    [34528.893551] #7: , hrtimer_wakeup, S:01
    [34528.893561] # expires at 34518835323224 nsecs [in 8112915910 nsecs]
    [34528.893564] #8: , hrtimer_wakeup, S:01
    [34528.893574] # expires at 34546891223588 nsecs [in 36168816274 nsecs]
    [34528.893578] #9: , hrtimer_wakeup, S:01
    [34528.893588] # expires at 36035545999324 nsecs [in 1524823592010 nsecs]
    [34528.893592] #10: , hrtimer_wakeup, S:01
    [34528.893601] # expires at 36035980869577 nsecs [in 1525258462263 nsecs]
    [34528.893606] .expires_next : 34510723000000 nsecs
    [34528.893609] .hres_active : 1
    [34528.893612] .nr_events : 3447408
    [34528.893615] .nohz_mode : 2
    [34528.893618] .idle_tick : 34510712000000 nsecs
    [34528.893621] .tick_stopped : 0
    [34528.893624] .idle_jiffies : 34210712
    [34528.893627] .idle_calls : 3267634
    [34528.893630] .idle_sleeps : 1588325
    [34528.893633] .idle_entrytime : 34510722486118 nsecs
    [34528.893636] .idle_waketime : 34510722348607 nsecs
    [34528.893640] .idle_exittime : 34510722383780 nsecs
    [34528.893643] .idle_sleeptime : 33379861006002 nsecs
    [34528.893646] .last_jiffies : 34210723
    [34528.893649] .next_jiffies : 34210725
    [34528.893652] .idle_expires : 34510724000000 nsecs
    [34528.893655] jiffies: 34210723
    [34528.893657]
    [34528.893660] cpu: 1
    [34528.893662] clock 0:
    [34528.893664] .index: 0
    [34528.893666] .resolution: 1 nsecs
    [34528.893669] .get_time: ktime_get_real
    [34528.893675] .offset: 1206358214734619011 nsecs
    [34528.893677] active timers:
    [34528.893680] clock 1:
    [34528.893682] .index: 1
    [34528.893685] .resolution: 1 nsecs
    [34528.893687] .get_time: ktime_get
    [34528.893692] .offset: 0 nsecs
    [34528.893694] active timers:
    [34528.893697] #0: , tick_sched_timer, S:01
    [34528.893706] # expires at 34510996000000 nsecs [in 273592686 nsecs]
    [34528.893710] .expires_next : 34510996000000 nsecs
    [34528.893713] .hres_active : 1
    [34528.893716] .nr_events : 3081558
    [34528.893719] .nohz_mode : 2
    [34528.893722] .idle_tick : 34510713125000 nsecs
    [34528.893725] .tick_stopped : 1
    [34528.893727] .idle_jiffies : 34210713
    [34528.893730] .idle_calls : 2673472
    [34528.893733] .idle_sleeps : 1233326
    [34528.893736] .idle_entrytime : 34510712135468 nsecs
    [34528.893740] .idle_waketime : 34507995998292 nsecs
    [34528.893743] .idle_exittime : 34510711012024 nsecs
    [34528.893746] .idle_sleeptime : 33654735968486 nsecs
    [34528.893749] .last_jiffies : 34210713
    [34528.893752] .next_jiffies : 34210997
    [34528.893755] .idle_expires : 34510996000000 nsecs
    [34528.893758] jiffies: 34210723
    [34528.893760]
    [34528.893763] cpu: 2
    [34528.893765] clock 0:
    [34528.893767] .index: 0
    [34528.893769] .resolution: 1 nsecs
    [34528.893772] .get_time: ktime_get_real
    [34528.893778] .offset: 1206358214734619011 nsecs
    [34528.893780] active timers:
    [34528.893783] clock 1:
    [34528.893785] .index: 1
    [34528.893787] .resolution: 1 nsecs
    [34528.893790] .get_time: ktime_get
    [34528.893795] .offset: 0 nsecs
    [34528.893797] active timers:
    [34528.893799] #0: , tick_sched_timer, S:01
    [34528.893809] # expires at 34511541000000 nsecs [in 818592686 nsecs]
    [34528.893813] .expires_next : 34511541000000 nsecs
    [34528.893815] .hres_active : 1
    [34528.893818] .nr_events : 2005329
    [34528.893821] .nohz_mode : 2
    [34528.893824] .idle_tick : 34510562250000 nsecs
    [34528.893827] .tick_stopped : 1
    [34528.893830] .idle_jiffies : 34210562
    [34528.893833] .idle_calls : 1749202
    [34528.893836] .idle_sleeps : 898585
    [34528.893839] .idle_entrytime : 34510561258541 nsecs
    [34528.893842] .idle_waketime : 34509285251187 nsecs
    [34528.893845] .idle_exittime : 34510176022616 nsecs
    [34528.893848] .idle_sleeptime : 33931425421772 nsecs
    [34528.893851] .last_jiffies : 34210562
    [34528.893854] .next_jiffies : 34211542
    [34528.893858] .idle_expires : 34511541000000 nsecs
    [34528.893860] jiffies: 34210723
    [34528.893863]
    [34528.893865] cpu: 3
    [34528.893867] clock 0:
    [34528.893869] .index: 0
    [34528.893872] .resolution: 1 nsecs
    [34528.893874] .get_time: ktime_get_real
    [34528.893880] .offset: 1206358214734619011 nsecs
    [34528.893883] active timers:
    [34528.893885] clock 1:
    [34528.893887] .index: 1
    [34528.893890] .resolution: 1 nsecs
    [34528.893892] .get_time: ktime_get
    [34528.893897] .offset: 0 nsecs
    [34528.893899] active timers:
    [34528.893902] #0: , tick_sched_timer, S:01
    [34528.893911] # expires at 34510723375000 nsecs [in 967686 nsecs]
    [34528.893915] .expires_next : 34510723375000 nsecs
    [34528.893918] .hres_active : 1
    [34528.893921] .nr_events : 1532911
    [34528.893923] .nohz_mode : 2
    [34528.893926] .idle_tick : 34510713375000 nsecs
    [34528.893929] .tick_stopped : 0
    [34528.893932] .idle_jiffies : 34210714
    [34528.893935] .idle_calls : 1350449
    [34528.893938] .idle_sleeps : 896094
    [34528.893941] .idle_entrytime : 34510713334805 nsecs
    [34528.893944] .idle_waketime : 34509973216268 nsecs
    [34528.893947] .idle_exittime : 34510722367621 nsecs
    [34528.893951] .idle_sleeptime : 34031256949569 nsecs
    [34528.893954] .last_jiffies : 34210714
    [34528.893957] .next_jiffies : 34240714
    [34528.893960] .idle_expires : 34540713000000 nsecs
    [34528.893963] jiffies: 34210723
    [34528.893965]
    [34528.893967]
    [34528.893969] Tick Device: mode: 1
    [34528.893972] Clock Event Device: pit
    [34528.893976] max_delta_ns: 27461866
    [34528.893979] min_delta_ns: 12571
    [34528.893982] mult: 5124677
    [34528.893984] shift: 32
    [34528.893987] mode: 1
    [34528.893990] next_event: 9223372036854775807 nsecs
    [34528.893992] set_next_event: pit_next_event
    [34528.894000] set_mode: init_pit_timer
    [34528.894005] event_handler: tick_handle_oneshot_broadcast
    [34528.894013] tick_broadcast_mask: 00000000
    [34528.894016] tick_broadcast_oneshot_mask: 00000000
    [34528.894019]
    [34528.894021]
    [34528.894023] Tick Device: mode: 1
    [34528.894026] Clock Event Device: lapic
    [34528.894030] max_delta_ns: 1346255303
    [34528.894033] min_delta_ns: 2407
    [34528.894035] mult: 26762229
    [34528.894038] shift: 32
    [34528.894041] mode: 3
    [34528.894044] next_event: 34510724000000 nsecs
    [34528.894046] set_next_event: lapic_next_event
    [34528.894054] set_mode: lapic_timer_setup
    [34528.894059] event_handler: hrtimer_interrupt
    [34528.894064]
    [34528.894066] Tick Device: mode: 1
    [34528.894069] Clock Event Device: lapic
    [34528.894073] max_delta_ns: 1346255303
    [34528.894075] min_delta_ns: 2407
    [34528.894078] mult: 26762229
    [34528.894081] shift: 32
    [34528.894083] mode: 3
    [34528.894086] next_event: 34510996000000 nsecs
    [34528.894089] set_next_event: lapic_next_event
    [34528.894094] set_mode: lapic_timer_setup
    [34528.894099] event_handler: hrtimer_interrupt
    [34528.894104]
    [34528.894107] Tick Device: mode: 1
    [34528.894109] Clock Event Device: lapic
    [34528.894113] max_delta_ns: 1346255303
    [34528.894115] min_delta_ns: 2407
    [34528.894118] mult: 26762229
    [34528.894121] shift: 32
    [34528.894123] mode: 3
    [34528.894126] next_event: 34511541000000 nsecs
    [34528.894129] set_next_event: lapic_next_event
    [34528.894134] set_mode: lapic_timer_setup
    [34528.894139] event_handler: hrtimer_interrupt
    [34528.894144]
    [34528.894146] Tick Device: mode: 1
    [34528.894149] Clock Event Device: lapic
    [34528.894153] max_delta_ns: 1346255303
    [34528.894155] min_delta_ns: 2407
    [34528.894158] mult: 26762229
    [34528.894161] shift: 32
    [34528.894163] mode: 3
    [34528.894166] next_event: 34510723375000 nsecs
    [34528.894169] set_next_event: lapic_next_event
    [34528.894174] set_mode: lapic_timer_setup
    [34528.894179] event_handler: hrtimer_interrupt
    [34528.894184]
    [34528.894350] Time: acpi_pm clocksource has been installed.

    ....

    And that made irqbalance go mad which got killed by OOM , very strange.


    >
    > Thanks,
    > tglx



    Gabriel
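
A small worked example for reading the `E:` and `J:` fields in the first line of the dump above (`watchdog_timer.expires` and `jiffies`, as printed by the debug patch). It is a sketch under one assumption: `HZ_ASSUMED` is 1000, which the dump suggests (one step in `.idle_jiffies` corresponds to about 1 ms in `.idle_tick`) but which the thread never states. The function names are hypothetical.

```c
#include <stdint.h>

#define HZ_ASSUMED 1000UL  /* inferred from the dump, not stated in the thread */

/* Jiffies by which a timer ran late; unsigned subtraction tolerates wraparound. */
unsigned long timer_lateness_jiffies(unsigned long expires, unsigned long now)
{
    return now - expires;
}

/* Convert a jiffies count to milliseconds under the assumed HZ. */
uint64_t jiffies_to_ms_assumed(unsigned long j)
{
    return (uint64_t)j * 1000u / HZ_ASSUMED;
}
```

Calling `timer_lateness_jiffies(34204592UL, 34210723UL)` gives 6131 jiffies, so under the HZ assumption the watchdog timer ran roughly 6.1 seconds after it was due.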

  18. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    On Mon, 24 Mar 2008, Gabriel C wrote:
    > > Hmm. Can you please apply the patch below? It adds some more info and
    > > triggers the sysrq-q timer list printout when the watchdog
    > > triggers. That might give us some insight into this.

    >
    > Sorry for the lag , I was out the whole day.
    >
    > Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):
    >
    > ...
    >
    > [34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723


    Ok. The timer got delayed. It got delayed because it is initialized as
    a deferrable timer, which is obviously wrong. Sigh, I signed off on
    that commit myself without thinking about the consequences.

    Can you please apply the patch below on top of the others?

    > ...
    >
    > And that made irqbalance go mad which got killed by OOM , very strange.


    Ouch.

    revert: 1077f5a917b7c630231037826b344b2f7f5b903f

    ---
    kernel/time/clocksource.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

    Index: linux-2.6/kernel/time/clocksource.c
    ===================================================================
    --- linux-2.6.orig/kernel/time/clocksource.c
    +++ linux-2.6/kernel/time/clocksource.c
    @@ -176,7 +176,7 @@ static void clocksource_check_watchdog(s
     		if (watchdog)
     			del_timer(&watchdog_timer);
     		watchdog = cs;
    -		init_timer_deferrable(&watchdog_timer);
    +		init_timer(&watchdog_timer);
     		watchdog_timer.function = clocksource_watchdog;

     		/* Reset watchdog cycles */
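
Why a delayed watchdog timer produces a delta of this particular size can be illustrated with some arithmetic. This is a reading of the numbers, not something stated in the thread. It assumes the reference clock is the ACPI PM timer running at 3.579545 MHz with a 24-bit counter (some chipsets provide 32 bits), and the function names are hypothetical.

```c
#include <stdint.h>

#define PMTMR_HZ   3579545ULL   /* ACPI PM timer frequency */
#define PMTMR_BITS 24           /* assumed counter width; may be 32 on some chipsets */
#define PMTMR_MASK ((1ULL << PMTMR_BITS) - 1)
#define NS_PER_SEC 1000000000ULL

/* Time the counter takes to wrap around once, in nanoseconds. */
uint64_t pmtmr_wrap_ns(void)
{
    return (PMTMR_MASK + 1) * NS_PER_SEC / PMTMR_HZ;
}

/*
 * Elapsed time that a masked counter difference reports after real_ns of
 * real time has passed: whole wraps are silently lost.
 */
uint64_t pmtmr_measured_ns(uint64_t real_ns)
{
    uint64_t ticks = real_ns * PMTMR_HZ / NS_PER_SEC;

    return (ticks & PMTMR_MASK) * NS_PER_SEC / PMTMR_HZ;
}
```

Under these assumptions the wrap period is about 4.687 s. If the watchdog runs 6.131 s late, `pmtmr_measured_ns(6131000000ULL)` reports only about 1.444 s, so the TSC appears to be ahead by one wrap period, which is close to the delta of 4686697613 ns in Gabriel's log. A deferrable timer does not wake an idle CPU on its own, so on an idle box such delays are to be expected, and that is what the `init_timer()` change above avoids.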



  19. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    Thomas Gleixner wrote:
    > On Mon, 24 Mar 2008, Gabriel C wrote:
    >>> Hmm. Can you please apply the patch below? It adds some more info and
    >>> triggers the sysrq-q timer list printout when the watchdog
    >>> triggers. That might give us some insight into this.

    >> Sorry for the lag , I was out the whole day.
    >>
    >> Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):
    >>
    >> ...
    >>
    >> [34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723

    >
    > Ok. The timer got delayed. It got delayed because it is initialized as
    > a deferrable timer, which is obviously wrong. Sigh, I signed off on
    > that commit myself without thinking about the consequences.
    >
    > Can you please apply the patch below on top of the others?


    The box has been up for almost one day with that patch on top of the other ones, and everything is fine so far.


    > revert: 1077f5a917b7c630231037826b344b2f7f5b903f
    >
    > ---
    > kernel/time/clocksource.c | 2 +-
    > 1 file changed, 1 insertion(+), 1 deletion(-)
    >
    > Index: linux-2.6/kernel/time/clocksource.c
    > ===================================================================
    > --- linux-2.6.orig/kernel/time/clocksource.c
    > +++ linux-2.6/kernel/time/clocksource.c
    > @@ -176,7 +176,7 @@ static void clocksource_check_watchdog(s
    > if (watchdog)
    > del_timer(&watchdog_timer);
    > watchdog = cs;
    > - init_timer_deferrable(&watchdog_timer);
    > + init_timer(&watchdog_timer);
    > watchdog_timer.function = clocksource_watchdog;
    >
    > /* Reset watchdog cycles */
    >
    >



    Gabriel

  20. Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24

    On Wed, 26 Mar 2008, Gabriel C wrote:
    > Thomas Gleixner wrote:
    > > On Mon, 24 Mar 2008, Gabriel C wrote:
    > >>> Hmm. Can you please apply the patch below? It adds some more info and
    > >>> triggers the sysrq-q timer list printout when the watchdog
    > >>> triggers. That might give us some insight into this.
    > >> Sorry for the lag , I was out the whole day.
    > >>
    > >> Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):
    > >>
    > >> ...
    > >>
    > >> [34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723

    > >
    > > Ok. The timer got delayed. It got delayed because it is initialized as
    > > a deferrable timer, which is obviously wrong. Sigh, I signed off on
    > > that commit myself without thinking about the consequences.
    > >
    > > Can you please apply the patch below on top of the others?

    >
    > The box has been up for almost one day with that patch on top of the other ones, and everything is fine so far.


    Thanks for testing. I'll push the patches Linuswards.

    @Andi: The revert of the reverted clocksource watchdog is staged for .26

    Thanks,

    tglx
