Bug#494365: 2.6.26 hangs on opteron CPUs - Debian


  1. Bug#494365: 2.6.26 hangs on opteron CPUs

    On Fri, Aug 08, 2008 at 08:51:33PM +0200, Peter Palfrader wrote:
    > Package: linux-image-2.6.26-1-amd64
    > Version: 2.6.26-1
    > Severity: important
    >
    > Hi,
    >
    > it seems that 2.6.26 (whether the debian package or the kernel.org
    > kernel) locks up after a while on Debian's DL385G1 systems.
    >
    > After a while, sooner with more disk IO/filesystem load, the system
    > hangs: it continues to do stuff but everything involving disk hangs
    > forever.
    >
    > The systems work just fine on a 2.6.25.10 kernel.
    >
    > The servers have Opterons like this:
    > cpu family : 15
    > model : 33
    >
    > so http://www.uwsg.iu.edu/hypermail/lin...08.0/0882.html might
    > explain it.


    hey Peter,
    This is readily reproducible - a simple kernel compile was all it
    took. git bisecting suggests that this issue was introduced by [1]
    and unmasked by [2] during 2.6.26 development. It was later fixed
    during 2.6.27 development by [3].

    Can you confirm that the attached backport of [3] fixes the problem
    for you?

    [1] 35605a1027ac630f85a1b95684f7e86b82498cd6
    [2] 8d539108560ec121d59eee05160236488266221c
    [3] 8004dd965b13b01a96def054d420f6df7ff22d53


    --
    dann frazier



  2. Bug#494365: 2.6.26 hangs on opteron CPUs

    adding stable@kernel.org on cc for
    8004dd965b13b01a96def054d420f6df7ff22d53

    On Mon, Aug 11, 2008 at 11:17:34PM -0600, dann frazier wrote:
    > On Fri, Aug 08, 2008 at 08:51:33PM +0200, Peter Palfrader wrote:
    > > Package: linux-image-2.6.26-1-amd64
    > > Version: 2.6.26-1
    > > Severity: important
    > >
    > > Hi,
    > >
    > > it seems that 2.6.26 (whether the debian package or the kernel.org
    > > kernel) locks up after a while on Debian's DL385G1 systems.
    > >
    > > After a while, sooner with more disk IO/filesystem load, the system
    > > hangs: it continues to do stuff but everything involving disk hangs
    > > forever.
    > >
    > > The systems work just fine on a 2.6.25.10 kernel.
    > >
    > > The servers have Opterons like this:
    > > cpu family : 15
    > > model : 33
    > >
    > > so http://www.uwsg.iu.edu/hypermail/lin...08.0/0882.html might
    > > explain it.

    >
    > hey Peter,
    > This is readily reproducible - a simple kernel compile was all it
    > took. git bisecting suggests that this issue was introduced by [1]
    > and unmasked by [2] during 2.6.26 development. It was later fixed
    > during 2.6.27 development by [3].
    >
    > Can you confirm that the attached backport of [3] fixes the problem
    > for you?
    >
    > [1] 35605a1027ac630f85a1b95684f7e86b82498cd6
    > [2] 8d539108560ec121d59eee05160236488266221c
    > [3] 8004dd965b13b01a96def054d420f6df7ff22d53
    >
    >
    > --
    > dann frazier
    >


    > commit 8004dd965b13b01a96def054d420f6df7ff22d53
    > Author: Yinghai Lu
    > Date: Mon May 12 17:40:39 2008 -0700
    >
    > x86: amd opteron TOM2 mask val fix
    >
    > there is a typo in the mask value, need to remove that extra 0,
    > to avoid 4bit clearing.
    >
    > Signed-off-by: Yinghai Lu
    > Signed-off-by: Ingo Molnar
    >
    > Backported to Debian's 2.6.26 by dann frazier
    >
    > diff -urpN linux-source-2.6.26.orig/arch/x86/kernel/cpu/mtrr/generic.c linux-source-2.6.26/arch/x86/kernel/cpu/mtrr/generic.c
    > --- linux-source-2.6.26.orig/arch/x86/kernel/cpu/mtrr/generic.c 2008-08-11 22:55:59.000000000 -0600
    > +++ linux-source-2.6.26/arch/x86/kernel/cpu/mtrr/generic.c 2008-08-11 22:57:13.000000000 -0600
    > @@ -219,7 +219,7 @@ void __init get_mtrr_state(void)
    > tom2 = hi;
    > tom2 <<= 32;
    > tom2 |= lo;
    > - tom2 &= 0xffffff8000000ULL;
    > + tom2 &= 0xffffff800000ULL;
    > }
    > if (mtrr_show) {
    > int high_width;
    > diff -urpN linux-source-2.6.26.orig/arch/x86/pci/k8-bus_64.c linux-source-2.6.26/arch/x86/pci/k8-bus_64.c
    > --- linux-source-2.6.26.orig/arch/x86/pci/k8-bus_64.c 2008-08-11 22:55:59.000000000 -0600
    > +++ linux-source-2.6.26/arch/x86/pci/k8-bus_64.c 2008-08-11 22:57:13.000000000 -0600
    > @@ -384,7 +384,7 @@ static int __init early_fill_mp_bus_info
    > /* need to take out [0, TOM) for RAM*/
    > address = MSR_K8_TOP_MEM1;
    > rdmsrl(address, val);
    > - end = (val & 0xffffff8000000ULL);
    > + end = (val & 0xffffff800000ULL);
    > printk(KERN_INFO "TOM: %016lx aka %ldM\n", end, end>>20);
    > if (end < (1ULL<<32))
    > update_range(range, 0, end - 1);
    > @@ -478,7 +478,7 @@ static int __init early_fill_mp_bus_info
    > /* TOP_MEM2 */
    > address = MSR_K8_TOP_MEM2;
    > rdmsrl(address, val);
    > - end = (val & 0xffffff8000000ULL);
    > + end = (val & 0xffffff800000ULL);
    > printk(KERN_INFO "TOM2: %016lx aka %ldM\n", end, end>>20);
    > update_range(range, 1ULL<<32, end - 1);
    > }
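
To make the "extra 0" from the commit message concrete, here is a minimal C sketch (userspace, not kernel code). The two mask constants are copied from the diff above; the helper names tom_apply_mask and tom_bytes_lost_to_typo are hypothetical and exist only for this illustration. The typo'd mask keeps bits 27..51 where the fixed one keeps bits 23..47, so bits 23..26 of the MSR value get cleared (the "4bit clearing") and the reported top of memory is rounded down to a 128 MiB boundary instead of an 8 MiB one.

```c
#include <stdint.h>

/* Mask values copied from the backported diff. */
#define TOM_MASK_TYPO  0xffffff8000000ULL /* pre-fix:  keeps bits 27..51 */
#define TOM_MASK_FIXED 0xffffff800000ULL  /* post-fix: keeps bits 23..47 */

/* Mask a raw TOP_MEM/TOP_MEM2 MSR value the way the patched lines do. */
static uint64_t tom_apply_mask(uint64_t raw, uint64_t mask)
{
	return raw & mask;
}

/*
 * Number of bytes by which the typo'd mask under-reports the top of
 * memory compared with the fixed mask.
 * Assumption for this sketch: bits 48..51 of raw are zero, so the
 * typo'd result can never exceed the fixed one.
 */
static uint64_t tom_bytes_lost_to_typo(uint64_t raw)
{
	return tom_apply_mask(raw, TOM_MASK_FIXED) -
	       tom_apply_mask(raw, TOM_MASK_TYPO);
}
```

With a made-up raw value of 0x124000000 (4672 MiB), the fixed mask leaves the value unchanged, while the typo'd mask yields 0x120000000, so 64 MiB just below the real top of memory is left out. A value that already sits on a 128 MiB boundary is unaffected, so not every memory configuration would be expected to show a difference. How the truncated value leads to the disk hangs is not spelled out in this thread.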





    --
    To UNSUBSCRIBE, email to debian-bugs-dist-REQUEST@lists.debian.org
    with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

  3. Bug#494365: 2.6.26 hangs on opteron CPUs

    On Fri, Aug 08, 2008 at 08:51:33PM +0200, Peter Palfrader wrote:
    > > > it seems that 2.6.26 (whether the debian package or the kernel.org
    > > > kernel) locks up after a while on Debian's DL385G1 systems.


    On Mon, Aug 11, 2008 at 11:17:34PM -0600, dann frazier wrote:
    > > hey Peter,
    > > This is readily reproducible - a simple kernel compile was all it
    > > took. git bisecting suggests that this issue was introduced by [1]
    > > and unmasked by [2] during 2.6.26 development. It was later fixed
    > > during 2.6.27 development by [3].
    > >
    > > Can you confirm that the attached backport of [3] fixes the problem
    > > for you?


    > > [3] 8004dd965b13b01a96def054d420f6df7ff22d53


    I upgraded several of Debian's machines to 2.6.26.2 plus the patch you
    provided. So far they are still running - and I put them under load
    that would reliably kill them before.

    Thanks,
    weasel
    --
                               |  .''`.  ** Debian GNU/Linux **
          Peter Palfrader      | : :' :      The  universal
     http://www.palfrader.org/ | `. `'      Operating System
                               |   `-    http://www.debian.org/


