x86-64 sporadic hang in 2.6.23rc7 and 2.6.22 - Kernel


  1. x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

    The two kernels mentioned hang occasionally.
    It typically happens when I compile something and pass the time
    by surfing the web.

    After a few minutes, I notice that the mouse (and everything else in X)
    stops. The kbd LEDs do not react to numlock/capslock.
    The only thing that still works is sysrq+B.
    So far this has happened while running X, so there are no messages.

    I have gone back to 2.6.22rc4, which seems to work.

    This is a single opteron, although on a dual-slot board.


    Helge Hafting
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22


    On Mon, 2007-09-24 at 23:08 +0200, Helge Hafting wrote:
    > The two kernels mentioned hang occasionally.
    > It typically happens when I compile something and pass the time
    > by surfing the web.
    >
    > After a few minutes, I notice that the mouse (and everything else in X)
    > stops. The kbd LEDs do not react to numlock/capslock.
    > The only thing that still works is sysrq+B.
    > So far this has happened while running X, so there are no messages.
    >
    > I have gone back to 2.6.22rc4, which seems to work.
    >
    > This is a single opteron, although on a dual-slot board.


    Can you switch to serial console, so we can get some information out of
    that box? Sysrq-B is working, so we can get info from other sysrq
    functions as well.

    tglx


    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

    Thomas Gleixner wrote:
    > On Mon, 2007-09-24 at 23:08 +0200, Helge Hafting wrote:
    >
    >> The two kernels mentioned hang occasionally.
    >> It typically happens when I compile something and pass the time
    >> by surfing the web.
    >>
    >> After a few minutes, I notice that the mouse (and everything else in X)
    >> stops. The kbd LEDs do not react to numlock/capslock.
    >> The only thing that still works is sysrq+B.
    >> So far this has happened while running X, so there are no messages.
    >>
    >> I have gone back to 2.6.22rc4, which seems to work.
    >>
    >> This is a single opteron, although on a dual-slot board.
    >>

    >
    > Can you switch to serial console, so we can get some information out of
    > that box? Sysrq-B is working, so we can get info from other sysrq
    > functions as well.
    >

    I didn't need the serial - it crashes during console work too.
    I think a "make clean" was in progress at the time. There must be work
    going on in order to crash.

    This time 2.6.22rc4 died on me with a general protection fault.

    I got two reports; the first one scrolled partially off screen, but
    the whole trace was there:

    shrink_dcache_memory
    shrink_slab
    kswapd
    autoremove_wake_function
    thread_return
    trace_hardirqs_on
    kswapd
    kswapd
    kthread
    child_rip
    restore_args
    kthread
    child_rip

    Then I got:
    spinlock lockup on cpu #0, kswapd0/212
    _raw_spin_lock
    shrink_dcache_parent
    shrink_dcache_parent
    proc_flush_task
    release_task
    do_exit
    die
    error_exit
    prune_dcache
    [From here on, it continues exactly like the first report:]
    shrink_dcache_memory
    shrink_slab
    kswapd
    autoremove_wake_function
    thread_return
    trace_hardirqs_on
    kswapd
    kswapd
    kthread
    child_rip
    restore_args
    kthread
    child_rip


    sysrq P says:
    cpu 0
    pid 212 comm: kswapd0 not tainted 2.6.22-rc4 #18
    RIP: __delay

    I took a picture of the screen, in case the register dumps are interesting.
    I wonder what this is: dcache trouble? Swap trouble?
    Helge Hafting

  4. Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

    On Sat, 29 Sep 2007, Helge Hafting wrote:
    > Thomas Gleixner wrote:
    > > > I have gone back to 2.6.22rc4, which seems to work.
    > > >
    > > > This is a single opteron, although on a dual-slot board.
    > > >

    > >
    > > Can you switch to serial console, so we can get some information out of
    > > that box? Sysrq-B is working, so we can get info from other sysrq
    > > functions as well.
    > >

    > I didn't need the serial - it crashes during console work too.
    > I think a "make clean" was in progress at the time. There must be work going
    > on in order to crash.
    >
    > This time 2.6.22rc4 died on me with a general protection fault.
    >
    > I got two reports; the first one scrolled partially off screen, but
    > the whole trace was there:


    That's why I asked for a serial console. That way we can get all the
    information from the reports including the register dumps ...

    > Then I got:
    > spinlock lockup on cpu #0, kswapd0/212


    That's probably caused by the previous one.

    tglx

  5. Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

    Thomas Gleixner wrote:
    > On Sat, 29 Sep 2007, Helge Hafting wrote:
    >
    >> Thomas Gleixner wrote:
    >>
    >>>> I have gone back to 2.6.22rc4, which seems to work.
    >>>>
    >>>> This is a single opteron, although on a dual-slot board.
    >>>>
    >>>>
    >>> Can you switch to serial console, so we can get some information out of
    >>> that box? Sysrq-B is working, so we can get info from other sysrq
    >>> functions as well.
    >>>
    >>>

    >> I didn't need the serial - it crashes during console work too.
    >> I think a "make clean" was in progress at the time. There must be work going
    >> on in order to crash.
    >>
    >> This time 2.6.22rc4 died on me with a general protection fault.
    >>
    >> I got two reports; the first one scrolled partially off screen, but
    >> the whole trace was there:
    >>

    >
    > That's why I asked for a serial console. That way we can get all the
    > information from the reports including the register dumps ...
    >

    Sure. But I can't get a cable right now. Were the registers necessary
    in this case? Often, the trace turns out to be enough.

    Helge Hafting


  6. Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

    Helge Hafting writes:
    >
    > shrink_dcache_memory


    That usually means random memory corruption from somewhere -- dcache
    tends to use a lot of memory and when it is corrupted anywhere these
    functions tend to crash while walking the lists.

    Unfortunately memory corruption is hard to track down because
    the messenger is usually not the one to blame.

    Perhaps enable slab debugging and see if it turns
    something up. It could also be broken hardware. Does an older kernel
    run stably? If yes, and if it can be reproduced, bisecting would
    be good.

    -Andi
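Slab debugging works partly by having the allocator check freed objects for signs of tampering. The following is a toy Python model of one such check, poisoning: freed objects are filled with a known byte pattern, and the pattern is verified before an object is handed out again, so a stray write (or a flipped bit) in freed memory gets reported. It is a sketch under simplifying assumptions, not the kernel's SLAB/SLUB code. The class `ToySlab` and its methods are hypothetical; only the two poison byte values are borrowed from the kernel's `POISON_FREE` and `POISON_END`.

```python
# Toy model of slab poisoning: a simplified illustration, NOT the kernel's
# SLAB/SLUB implementation. ToySlab and its methods are hypothetical names.
# Freed objects are filled with a poison pattern; the pattern is checked
# before an object is reused, so writes into freed memory are detected.

POISON_FREE = 0x6B  # value borrowed from the kernel's POISON_FREE
POISON_END = 0xA5   # value borrowed from POISON_END: marks an object's last byte


class ToySlab:
    def __init__(self, obj_size, count):
        self.obj_size = obj_size
        self.memory = bytearray(obj_size * count)
        self.free = list(range(count))  # indices of currently free objects
        for idx in self.free:
            self._poison(idx)

    def _span(self, idx):
        start = idx * self.obj_size
        return start, start + self.obj_size

    def _poison(self, idx):
        # Fill the object with POISON_FREE and terminate it with POISON_END.
        start, end = self._span(idx)
        self.memory[start:end - 1] = bytes([POISON_FREE]) * (self.obj_size - 1)
        self.memory[end - 1] = POISON_END

    def _corrupted_offsets(self, idx):
        """Offsets inside a free object whose poison byte was overwritten."""
        start, _ = self._span(idx)
        bad = []
        for off in range(self.obj_size):
            expected = POISON_END if off == self.obj_size - 1 else POISON_FREE
            if self.memory[start + off] != expected:
                bad.append(off)
        return bad

    def alloc(self):
        idx = self.free.pop()  # reuse the most recently freed object
        bad = self._corrupted_offsets(idx)
        if bad:
            raise RuntimeError(f"object {idx} modified after free at offsets {bad}")
        return idx

    def release(self, idx):
        # Re-poison on free so later stray writes can be detected.
        self._poison(idx)
        self.free.append(idx)
```

For example, releasing an object, flipping a single bit inside it, and allocating again makes `alloc()` raise `RuntimeError`. Note that poisoning only catches damage to free objects; corruption of a live dentry would instead surface later as a bad list pointer, which fits the point that the function that crashes is usually not the one to blame.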

  7. Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

    Andi Kleen wrote:
    > Helge Hafting writes:
    >
    >> shrink_dcache_memory
    >>

    >
    > That usually means random memory corruption from somewhere -- dcache
    > tends to use a lot of memory and when it is corrupted anywhere these
    > functions tend to crash while walking the lists.
    >
    > Unfortunately memory corruption is hard to track down because
    > the messenger is usually not the one to blame.
    >
    > Perhaps enable slab debugging and see if it turns
    > something up. It could also be broken hardware. Does an older kernel
    > run stably? If yes, and if it can be reproduced, bisecting would
    > be good.
    >

    2.6.18 could compile stuff without crashing.
    Looks like I have some work to do then.

    Helge Hafting

  8. Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

    Andi Kleen wrote:
    > Helge Hafting writes:
    >
    >> shrink_dcache_memory
    >>

    >
    > That usually means random memory corruption from somewhere -- dcache
    > tends to use a lot of memory and when it is corrupted anywhere these
    > functions tend to crash while walking the lists.
    >
    > Unfortunately memory corruption is hard to track down because
    > the messenger is usually not the one to blame.
    >
    > Perhaps enable slab debugging and see if it turns
    > something up. It could also be broken hardware. Does an older kernel
    > run stably? If yes, and if it can be reproduced, bisecting would
    > be good.
    >

    I attempted bisecting, and failed. The problem is that
    2.6.23rc7 seems very unstable, while 2.6.22rc4 is much better
    but not perfect. 2.6.22rc4 has crashed only once; it can compile
    for hours, swap a lot, and keep running. But since it died at least
    once, I can't trust any kernel to be a known-good bisection point.

    I'll try running recent kernels with more debugging instead.
    I think I used SLUB instead of SLAB; either way, I can switch
    that over to see if it changes things.

    Helge Hafting


  9. Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22

    Thomas Gleixner wrote:
    > On Sat, 29 Sep 2007, Helge Hafting wrote:
    >
    >> Thomas Gleixner wrote:
    >>
    >>>> I have gone back to 2.6.22rc4, which seems to work.
    >>>>
    >>>> This is a single opteron, although on a dual-slot board.
    >>>>
    >>>>
    >>> Can you switch to serial console, so we can get some information out of
    >>> that box? Sysrq-B is working, so we can get info from other sysrq
    >>> functions as well.
    >>>
    >>>

    >> I didn't need the serial - it crashes during console work too.
    >> I think a "make clean" was in progress at the time. There must be work going
    >> on in order to crash.
    >>
    >> This time 2.6.22rc4 died on me with a general protection fault.
    >>
    >> I got two reports; the first one scrolled partially off screen, but
    >> the whole trace was there:
    >>

    >
    > That's why I asked for a serial console. That way we can get all the
    > information from the reports including the register dumps ...
    >

    I got another crash, this time with a full dump. I have also discovered
    files with lots of single-bit errors, so this is probably just some kind
    of hardware problem. :-(

    Replace the memory, or the motherboard with everything on it... :-(

    Helge Hafting
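The thread doesn't say how the corrupted files were found. One plausible check, assuming a known-good copy of the file is available (for example from a backup or a fresh download), is to compare the two byte by byte and flag every byte whose XOR with the reference has exactly one bit set. Many isolated single-bit flips, rather than overwritten blocks, point toward failing memory or other hardware rather than a kernel bug. This Python sketch is hypothetical; `single_bit_diffs` and `compare_files` are illustrative names, not an existing tool.

```python
import sys


def single_bit_diffs(good, suspect):
    """Yield (offset, bit) for every byte that differs from the reference
    copy in exactly one bit position. Only the overlapping length of the
    two byte strings is compared."""
    for offset, (a, b) in enumerate(zip(good, suspect)):
        diff = a ^ b
        # diff is a nonzero power of two exactly when one bit differs
        if diff and diff & (diff - 1) == 0:
            yield offset, diff.bit_length() - 1


def compare_files(good_path, suspect_path):
    with open(good_path, "rb") as f:
        good = f.read()
    with open(suspect_path, "rb") as f:
        suspect = f.read()
    return list(single_bit_diffs(good, suspect))


if __name__ == "__main__" and len(sys.argv) == 3:
    for offset, bit in compare_files(sys.argv[1], sys.argv[2]):
        print(f"offset {offset:#x}: bit {bit} flipped")
```

Run it with the known-good file first and the suspect file second. Bytes that differ in more than one bit are deliberately ignored here, so a large count of reported offsets specifically suggests scattered bit flips.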
