[regression bisected] HR-timers bug >=2.6.25 - Kernel

  1. [regression bisected] HR-timers bug >=2.6.25

    Hi,

    I've reported this problem previously:
    http://lkml.org/lkml/2008/3/12/290
    http://bugzilla.kernel.org/show_bug.cgi?id=10235
    https://bugs.freedesktop.org/show_bug.cgi?id=15602

    This bug is still in mainline as of 2.6.26-rc1. (affected versions
    2.6.25-rc0 - 2.6.26-rc1)

    Ever since 2.6.25-rc3 (the first rc I tested), my screen blanks out
    when mode switching from the Ubuntu usplash to the gdm login. Normally
    after the boot splash, the screen goes black and "refreshes" to select
    the correct resolution. But with any kernel newer than 2.6.24 the
    screen doesn't come back on - it stays blank. This doesn't happen on
    every boot, but on a good majority of them. The workaround is to disable
    the splash, which fixes the problem almost completely.

    I did a git bisect, rebooting 3 times whenever the problem didn't occur,
    just to make sure.
    The bisect points at the commit below, which suggests the problem is
    timing related - which is what Jesse Barnes said it could be.
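
Because the blank screen is intermittent, one clean boot does not prove a commit is good. The retry rule described above (bad on the first failure, good only after three clean trials) can be sketched as a small shell helper. The function name `verdict_after_trials` and the idea of passing the check as a command are illustrative assumptions; in the actual bisect each trial was a manual reboot.

```shell
#!/bin/sh
# Hypothetical helper: run a check up to N times.
# Prints "bad" as soon as one trial fails, "good" only if all N trials pass.
verdict_after_trials() {
    trials=$1
    shift
    i=1
    while [ "$i" -le "$trials" ]; do
        if ! "$@"; then
            echo bad
            return 0
        fi
        i=$((i + 1))
    done
    echo good
}

# Example: a check that always passes is reported good after 3 trials.
verdict_after_trials 3 true
```

The printed word is what would then be handed to `git bisect good` or `git bisect bad` for the commit under test.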

    > commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
    > Author: Peter Zijlstra
    > Date: Fri Jan 25 21:08:29 2008 +0100
    >
    > sched: high-res preemption tick
    >
    > Use HR-timers (when available) to deliver an accurate preemption tick.
    >
    > The regular scheduler tick that runs at 1/HZ can be too coarse when nice
    > level are used. The fairness system will still keep the cpu utilisation
    > 'fair' by then delaying the task that got an excessive amount of CPU time
    > but try to minimize this by delivering preemption points spot-on.
    >
    > The average frequency of this extra interrupt is sched_latency /
    > nr_latency. Which need not be higher than 1/HZ, its just that the
    > distribution within the sched_latency period is important.
    >
    > Signed-off-by: Peter Zijlstra
    > Signed-off-by: Ingo Molnar
    >
    > :040000 040000 ab225228500f7a19d5ad20ca12ca3fc8ff5f5ad1 f1742e1d225a72aecea9d6961ed989b5943d31d8 M arch
    > :040000 040000 25d85e4ef7a71b0cc76801a2526ebeb4dce180fe ae61510186b4fad708ef0211ac169decba16d4e5 M include
    > :040000 040000 9247cec7dd506c648ac027c17e5a07145aa41b26 950832cc1dc4d30923f593ecec883a06b45d62e9 M kernel


    > # bad: [29e8c3c304b62f31b799565c9ee85d42bd163f80] Linux 2.6.25-rc4
    > # good: [49914084e797530d9baaf51df9eda77babc98fa8] Linux 2.6.24
    > git-bisect start 'v2.6.25-rc4' 'v2.6.24'
    > # bad: [bd45ac0c5daae35e7c71138172e63df5cf644cf6] Merge branch 'linux-2.6'
    > git-bisect bad bd45ac0c5daae35e7c71138172e63df5cf644cf6
    > # good: [a6f71745969d495d697d1ccd96385d2f7a963375] [POWERPC] 85xx: Only invalidate TLB0 and TLB1
    > git-bisect good a6f71745969d495d697d1ccd96385d2f7a963375
    > # bad: [fb5b6095f320bd5a615049aa5fe8827ae9d1bf80] [NETFILTER]: arp_tables: move entry and target checks to seperate functions
    > git-bisect bad fb5b6095f320bd5a615049aa5fe8827ae9d1bf80
    > # bad: [50d9a126240f9961cfdd063336bbeb91f77a7dce] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
    > git-bisect bad 50d9a126240f9961cfdd063336bbeb91f77a7dce
    > # good: [102df6a785bd5ff22b0ca745f3107ab9780fc30b] V4L/DVB (6685): ir-keymaps.c: extra keys on winfast Y04G0033 remote
    > git-bisect good 102df6a785bd5ff22b0ca745f3107ab9780fc30b
    > # good: [a999337b49fcdd2c4a475e97e4b8337ebdfa4abf] V4L/DVB (7078): radio: fix sf16fmi section mismatch
    > git-bisect good a999337b49fcdd2c4a475e97e4b8337ebdfa4abf
    > # bad: [b31fde6db2b76a9f7f59bf016652b46cff43f8da] Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb
    > git-bisect bad b31fde6db2b76a9f7f59bf016652b46cff43f8da
    > # bad: [6478d8800b75253b2a934ddcb734e13ade023ad0] sched: remove the !PREEMPT_BKL code
    > git-bisect bad 6478d8800b75253b2a934ddcb734e13ade023ad0
    > # good: [80bf3171dcdf0f5d236e2e48afe2a95c7ce23879] sched: clean up pull_rt_task()
    > git-bisect good 80bf3171dcdf0f5d236e2e48afe2a95c7ce23879
    > # good: [eaf649e9fe6685f4c5a392cd0e16df5fd6660b7c] Preempt-RCU: CPU Hotplug handling
    > git-bisect good eaf649e9fe6685f4c5a392cd0e16df5fd6660b7c
    > # bad: [6f505b16425a51270058e4a93441fe64de3dd435] sched: rt group scheduling
    > git-bisect bad 6f505b16425a51270058e4a93441fe64de3dd435
    > # good: [78f2c7db6068fd6ef75b8c120f04a388848eacb5] sched: SCHED_FIFO/SCHED_RR watchdog timer
    > git-bisect good 78f2c7db6068fd6ef75b8c120f04a388848eacb5
    > # good: [02b67cc3ba36bdba351d6c3a00593f4ec550d9d3] sched: do not do cond_resched() when CONFIG_PREEMPT
    > git-bisect good 02b67cc3ba36bdba351d6c3a00593f4ec550d9d3
    > # bad: [fa85ae2418e6843953107cd6a06f645752829bc0] sched: rt time limit
    > git-bisect bad fa85ae2418e6843953107cd6a06f645752829bc0
    > # bad: [8f4d37ec073c17e2d4aa8851df5837d798606d6f] sched: high-res preemption tick
    > git-bisect bad 8f4d37ec073c17e2d4aa8851df5837d798606d6f
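
A log like the one above can in principle be replayed in a kernel tree, but once it has been quoted in a mail it carries "> " markers and mail clients may wrap the comment lines. A minimal sketch of pulling out just the command lines follows; the function name `extract_bisect_cmds` and the sample input are illustrative assumptions, not part of the original report.

```shell
#!/bin/sh
# Hypothetical filter: keep only the git-bisect command lines from a quoted
# e-mail, dropping leading whitespace and the "> " quote marker.
# Comment lines and wrapped continuation text are discarded.
extract_bisect_cmds() {
    sed -n 's/^[[:space:]]*>[[:space:]]*\(git-bisect .*\)$/\1/p'
}

# Sample input taken from the log above, including one wrapped comment line.
printf '%s\n' \
    "    > # good: [49914084e797530d9baaf51df9eda77babc98fa8] Linux 2.6.24" \
    "    > git-bisect start 'v2.6.25-rc4' 'v2.6.24'" \
    "    > Only invalidate TLB0 and TLB1" \
    "    > git-bisect good a6f71745969d495d697d1ccd96385d2f7a963375" |
    extract_bisect_cmds
```

Saved to a file, the filtered lines are the kind of input `git bisect replay <file>` expects. Whether a given git version accepts the dashed `git-bisect` spelling is worth checking; if it does not, rewriting the prefix to `git bisect` should be enough.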



    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [regression bisected] HR-timers bug >=2.6.25

    On Thu, 15 May 2008, Justin Madru wrote:
    > I've reported this problem previously:
    > http://lkml.org/lkml/2008/3/12/290
    > http://bugzilla.kernel.org/show_bug.cgi?id=10235
    > https://bugs.freedesktop.org/show_bug.cgi?id=15602
    >
    > This bug is still in mainline as of 2.6.26-rc1. (affected versions 2.6.25-rc0
    > - 2.6.26-rc1)
    >
    > Ever since 2.6.25-rc3 (the first rc I tested), my screen would blank out when
    > mode switching from the Ubuntu usplash to the gdm login. Normally after the
    > boot splash, the screen goes black and "refreshes" to select the correct
    > resolution. But when using a kernel newer than 2.6.24 the screen doesn't come
    > back on - it stays blank. This doesn't happen every time, but a good majority
    > of the time. The workaround is to disable the splash, and that fixes the
    > problem almost completely.
    >
    > I did a git bisect. Rebooting 3 times if the problem didn't occur just to make
    > sure.
    > The git bisect reports that it's timing related - which is what Jesse Barnes
    > said it could be.


    Well, it's a timing problem related to splash which is exposed by the
    scheduler timing changes. This is neither a hrtimer nor a scheduler
    bug. Something in the splash / video driver switchover relies on some
    obscure timing which is nowhere guaranteed.

    Thanks,
    tglx
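
The thread never identifies which code makes the timing assumption, so the following is only a generic illustration of the class of bug described above, not a reconstruction of usplash or the video driver. It contrasts a fixed delay with waiting for an explicit condition; the helper name `wait_until` and the marker file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: poll a condition once per second instead of assuming
# it becomes true after a fixed delay. Returns 0 once the condition holds,
# 1 if it still fails after the given number of attempts.
wait_until() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Fragile pattern: "sleep 2" and hope the other side has finished by then.
# Sturdier pattern: wait for an explicit signal, here a marker file.
marker=$(mktemp)
if wait_until 5 test -e "$marker"; then
    echo "ready"
fi
rm -f "$marker"
```

Code written the first way can work for years and then break when scheduling behaviour shifts, which matches the kind of failure described here.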



  3. Re: [regression bisected] HR-timers bug >=2.6.25

    On Fri, 2008-05-16 at 09:15 +0200, Thomas Gleixner wrote:
    > On Thu, 15 May 2008, Justin Madru wrote:
    > > I've reported this problem previously:
    > > http://lkml.org/lkml/2008/3/12/290
    > > http://bugzilla.kernel.org/show_bug.cgi?id=10235
    > > https://bugs.freedesktop.org/show_bug.cgi?id=15602
    > >
    > > This bug is still in mainline as of 2.6.26-rc1. (affected versions 2.6.25-rc0
    > > - 2.6.26-rc1)
    > >
    > > Ever since 2.6.25-rc3 (the first rc I tested), my screen would blank out when
    > > mode switching from the Ubuntu usplash to the gdm login. Normally after the
    > > boot splash, the screen goes black and "refreshes" to select the correct
    > > resolution. But when using a kernel newer than 2.6.24 the screen doesn't come
    > > back on - it stays blank. This doesn't happen every time, but a good majority
    > > of the time. The workaround is to disable the splash, and that fixes the
    > > problem almost completely.
    > >
    > > I did a git bisect. Rebooting 3 times if the problem didn't occur just to make
    > > sure.
    > > The git bisect reports that it's timing related - which is what Jesse Barnes
    > > said it could be.

    >
    > Well, it's a timing problem related to splash which is exposed by the
    > scheduler timing changes. This is neither a hrtimer nor a scheduler
    > bug. Something in the splash / video driver switchover relies on some
    > obscure timing which is nowhere guaranteed.


    @Justin:

    The kernel doesn't hang, right? My reading of the description tells me
    the splash just doesn't work.

    If it does hang the kernel, an NMI trace collected over serial or
    netconsole (regular console being out of the question since it's
    graphical stuff :/) would be most helpful.


  4. Re: [regression bisected] HR-timers bug >=2.6.25

    Peter Zijlstra wrote:
    > The kernel doesn't hang, right? My reading of the description tells me
    > the splash just doesn't work.
    >
    > If it does hang the kernel, an NMI trace collected over serial or
    > netconsole (regular console being out of the question since it's
    > graphical stuff :/) would be most helpful.


    I've seen it blank out in 2 different ways.

    1) Usplash completely finishes. Then screen goes blank for the usual mode change/screen refreshing,
    doing several refreshes of a blank screen. But, the screen then stays black.
    I do still hear the gdm sound, and can switch to a console.
    Pressing ctrl+alt+f1 makes the screen "refresh" again, but stay blank/black. I can still login but blindly.

    2) More rarely. Usplash hasn't finished (but near the end). The usplash screen fades out to black.
    It's like a screen burn in, or after image; the screen slowly fades out to black.
    After this the computer is _seemingly_unresponsive_ - Only alt+sysrq+b seems to work. (I have to hard reset)
    After I reboot the backlight is at the lowest level.

    Anyway, I think I can figure out how to set up a netconsole, but
    is there something more needed to set up an NMI trace? I found
    Documentation/networking/netconsole.txt, but it doesn't say anything
    about an NMI trace.

    Justin



  5. Re: [regression bisected] HR-timers bug >=2.6.25

    On Fri, 2008-05-16 at 10:26 -0700, Justin Madru wrote:
    > Peter Zijlstra wrote:
    > > The kernel doesn't hang, right? My reading of the description tells me
    > > the splash just doesn't work.
    > >
    > > If it does hang the kernel, an NMI trace collected over serial or
    > > netconsole (regular console being out of the question since it's
    > > graphical stuff :/) would be most helpful.

    >
    > I've seen it blank out in 2 different ways.
    >
    > 1) Usplash completely finishes. Then screen goes blank for the usual mode change/screen refreshing,
    > doing several refreshes of a blank screen. But, the screen then stays black.
    > I do still hear the gdm sound, and can switch to a console.
    > Pressing ctrl+alt+f1 makes the screen "refresh" again, but stay blank/black. I can still login but blindly.
    >
    > 2) More rarely. Usplash hasn't finished (but near the end). The usplash screen fades out to black.
    > It's like a screen burn in, or after image; the screen slowly fades out to black.
    > After this the computer is _seemingly_unresponsive_ - Only alt+sysrq+b seems to work. (I have to hard reset)
    > After I reboot the backlight is at the lowest level.
    >
    > Anyways, I think I can figure out how to set up a netconsole, but
    > is there something more to setup a NMI trace? I found Documentation/networking/netconsole.txt
    > But, it doesn't say anything about a NMI trace.


    Those can be obtained by adding:

    nmi_watchdog=[12]

    to the kernel boot parameters - it depends a bit on the hardware which
    of the two choices works best, just start with 1 and if that doesn't
    work try 2.

    This enables the NMI watchdog, which will print a backtrace when it
    times out after 30 or so seconds.
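
To make the advice above concrete, here is a minimal sketch that assembles the two boot parameters discussed in this thread into one string for the bootloader's kernel line. The function name, IP addresses and interface name are made-up examples; the netconsole syntax is the one given in Documentation/networking/netconsole.txt.

```shell
#!/bin/sh
# All addresses and the interface name below are made-up examples.
# netconsole syntax per Documentation/networking/netconsole.txt:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# Ports and target MAC are left empty so the driver's defaults apply.
debug_boot_params() {
    watchdog_mode=$1   # 1 or 2, whichever works on the hardware
    src_ip=$2          # address of the machine being debugged
    dev=$3             # its network interface
    tgt_ip=$4          # address of the machine collecting the log
    printf 'nmi_watchdog=%s netconsole=@%s/%s,@%s/\n' \
        "$watchdog_mode" "$src_ip" "$dev" "$tgt_ip"
}

debug_boot_params 1 192.168.0.10 eth0 192.168.0.20
```

After booting with these parameters, looking at the NMI line in /proc/interrupts a few times should show whether the watchdog is actually ticking; if the counts do not change, the other nmi_watchdog value is worth trying.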


  6. Re: [regression bisected] HR-timers bug >=2.6.25

    >
    > those can be obtained by adding:
    >
    > nmi_watchdog=[12]
    >
    > to the kernel boot parameters - it depends a bit on the hardware which
    > of the two choices works best, just start with 1 and if that doesn't
    > work try 2.
    >
    > This enables the NMI watchdog, which will print a backtrace when it
    > times out after 30 or so seconds.

    I got netconsole to work, but it stops logging. Below are the last few
    lines of output:

    > ACPI: device:25 is registered as cooling_device2
    > input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A03:00/device:21/device:22/input/input4
    > phy0: Selected rate control algorithm 'iwl-3945-rs'
    > ACPI: Video Device [VID] (multi-head: yes rom: no post: no)
    > ACPI: device:2a is registered as cooling_device3
    > input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A03:00/device:27/input/input5
    > ACPI: Video Device [VID1] (multi-head: yes rom: no post: no)
    > input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A03:00/device:2c/input/input6
    > ACPI: Video Device [VID2] (multi-head: yes rom: no post: no)
    > ACPI: PCI Interrupt 0000:00:1b.0[A] -> GSI 21 (level, low) -> IRQ 21
    > PCI: Setting latency timer of device 0000:00:1b.0 to 64
    > ACPI: PCI interrupt for device 0000:0b:00.0 disabled
    > Synaptics Touchpad, model: 1, fw: 6.2, id: 0x180b1, caps: 0xa04713/0x200000
    > input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input7
    > dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.2)
    > fuse init (API version 7.9)
    I do get several of the following after several minutes:
    > trying to get vblank count for disabled pipe 0


    Even when I got the screen to "fade" out, I never got anything more. I
    guess it doesn't hang the kernel. It just must be a different timing
    problem related to usplash/xorg which is exposed by the scheduler timing
    changes since around that one commit that I found by git bisect. It does
    seem to be a very weird problem.

    Any ideas on how to help the xorg intel driver developers pinpoint the
    problem?

    Justin
