kernel: Bad page state in process 'cc1' - Linux

  1. kernel: Bad page state in process 'cc1'



    Hi,

    just a quick note to record a problem and our work-around for it.

    We had been running the 2.6.16.17 kernel for a few months when we
    started noticing
    kernel: Backtrace:
    dumps in the syslog at a particular time. It turned out that a system
    cron job was triggering them. The reason was unclear.

    Soon afterwards, we started seeing paging failures. Initially, the
    same cron job caused messages like the following to be dumped to the
    terminals; later on, other applications triggered them too.

    Message from syslogd@chaos at Wed Nov 22 06:26:21 2006 ...
    chaos kernel: Bad page state in process 'sort'

    Message from syslogd@chaos at Wed Nov 22 06:26:21 2006 ...
    chaos kernel: page:c1736a00 flags:0xc0000000 mapping:00000000 mapcount:33554432 count:0

    Message from syslogd@chaos at Wed Nov 22 06:26:21 2006 ...
    chaos kernel: Backtrace:

    Message from syslogd@chaos at Wed Nov 22 06:26:21 2006 ...
    chaos kernel: Trying to fix it up, but a reboot is needed
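
To check whether reports like these really are becoming more frequent, the syslog lines can be tallied mechanically. The following Python sketch parses the two message formats quoted above. The function name `summarize_bad_pages` and the regular expressions are assumptions based only on this excerpt; this is not a general parser for kernel logs.

```python
import re
from collections import Counter

# Patterns modeled on the two syslog lines quoted above (an assumption:
# other kernel versions may print these fields differently).
PROCESS_RE = re.compile(r"Bad page state in process '([^']+)'")
PAGE_RE = re.compile(
    r"page:([0-9a-f]+) flags:0x([0-9a-f]+) mapping:([0-9a-f]+) "
    r"mapcount:(-?\d+) count:(-?\d+)"
)

def summarize_bad_pages(lines):
    """Return (reports per process name, list of parsed page records)."""
    per_process = Counter()
    pages = []
    for line in lines:
        match = PROCESS_RE.search(line)
        if match:
            per_process[match.group(1)] += 1
            continue
        match = PAGE_RE.search(line)
        if match:
            page, flags, mapping, mapcount, count = match.groups()
            pages.append({
                "page": int(page, 16),        # printed in hex without 0x
                "flags": int(flags, 16),      # printed in hex with 0x
                "mapping": int(mapping, 16),
                "mapcount": int(mapcount),    # printed in decimal
                "count": int(count),
            })
    return per_process, pages
```

Feeding it the syslog one day at a time would give a per-day count, which is one way to see whether the frequency is rising.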

    The odd thing was that this symptom occurred with increasing frequency,
    and rebooting did not reduce it: even after reboots, the frequency kept
    increasing. Eventually, searching Usenet turned up reports of issues
    with the kernels between 2.6.14 and 2.6.16.
    However, upgrading to the latest kernel was not easy. For reasons
    specific to us, we had to build our own kernel; attempts at downloading
    a generic kernel did not help anyway.

    The trouble was that the paging problem showed up during the kernel
    build itself, and so the build would fail. Building under an old vanilla
    2.4 kernel choked as well, for other reasons. The build would fail with

    internal compiler error: Segmentation fault

    We even reduced the runlevel, to no avail. Finally, we gave up, and resumed
    the build after every failure. That is, we simply continued the build with
    all its previous object files. If the make failed to remove the object file
    which it was trying to build at the point of failure, then we manually
    removed it. It was simply not possible to do a brand new clean build after
    each failure: it would fail at some point!
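
The resume-after-failure procedure described above can be sketched as a small retry loop. This is only an illustration in Python: `resume_build`, `max_attempts` and the `cleanup` hook are hypothetical names, and the original work was done by hand, not with a script.

```python
import subprocess

def resume_build(command, max_attempts=50, cleanup=None):
    """Re-run an incremental build command until it exits successfully.

    Returns the number of attempts used, or None if all attempts failed.
    `cleanup` is an optional callable run after each failure, e.g. to
    delete a half-written object file that make left behind.
    """
    for attempt in range(1, max_attempts + 1):
        # No "make clean" between attempts: existing object files are kept,
        # so each run continues from where the previous one stopped.
        if subprocess.run(command).returncode == 0:
            return attempt
        if cleanup is not None:
            cleanup()
    return None
```

Calling `resume_build(["make"])` from the kernel source tree would correspond to simply re-running make after each failure. The loop relies on make skipping targets that are already up to date, which is why a leftover partial object file has to be removed first.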

    Eventually, the build finished while using the old 2.6.16.17 kernel.
    The new 2.6.18 kernel has been running for over 40 days now, with no
    problems. So there seem to have been no problems with using the old,
    faulty kernel for building the new kernel.

  2. Re: kernel: Bad page state in process 'cc1'

    On Feb 2, 8:53 pm, Dan Bray wrote:
    > Hi,
    >
    > just a quick note to record a problem and our work-around for it.


    It sounds a lot like bad hardware. Don't waste anyone's time with
    this.

    > We even reduced the runlevel, to no avail. Finally, we gave up, and resumed
    > the build after every failure. That is, we simply continued the build with
    > all its previous object files. If the make failed to remove the object file
    > which it was trying to build at the point of failure, then we manually
    > removed it. It was simply not possible to do a brand new clean build after
    > each failure: it would fail at some point!


    Why would you build the kernel /on/ the machine that it is targeted
    for, if that machine is failing?

    Did you run any diagnostic software on the machine? Memory, disk.

    Did you try booting a copy of the exact same kernel image on another
    machine?

    Did you try simply swapping some memory modules to see whether it
    makes a difference?

    These are just the obvious things that can be done to narrow something
    like this down.

    Continuing to run the system and even compiling on it, that's just
    idiocy.

    > Eventually, the build finished while using the old 2.6.16.17 kernel.


    Yeah, after how many reboots? It would probably have been faster to
    install enough Linux on some spare PC to get a kernel-compiling
    toolchain up and running, plus networking to get the image across.

    > The new 2.6.18 kernel has been running for over 40 days now, with no
    > problems. So there seems to have been no problems with using the old, faulty
    > kernel for building the new kernel.


    Your reference here to a ``faulty kernel'' is based on inconclusive
    evidence.

    A bad memory chip might cause completely different behavior with a
    different operating system image.

    If you randomly flip a bit in a program, you might cause it to fail
    miserably, or the change could do something as innocuous as
    introducing a spelling error into a debugging trace message.
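
The harmless end of that spectrum is easy to illustrate. The following Python sketch inverts one chosen bit in a byte string; `flip_bit` and the sample message are made up for the example and are not taken from any kernel or compiler.

```python
def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with one bit inverted (bit 0 = LSB of byte 0)."""
    byte_index, offset = divmod(bit_index, 8)
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << offset
    return bytes(corrupted)

message = b"debug: page ok"
print(flip_bit(message, 0))  # b'eebug: page ok'
```

Here the flipped bit lands in a text string, so the result is only a misspelling. The same single-bit change in an instruction or a pointer could instead crash the program.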

    How can you trust this box?


  3. Re: kernel: Bad page state in process 'cc1'

    Dan Bray wrote in part:
    > Eventually, the build finished while using the old 2.6.16.17
    > kernel. The new 2.6.18 kernel has been running for over 40 days
    > now, with no problems. So there seems to have been no problems
    > with using the old, faulty kernel for building the new kernel.


    This is somewhat surprising. Linus might call it unintended, because
    he expects perfect hardware. I'm running an ancient (1999) Abit
    BP6 (dual 500 Celeron). It has an overly long APIC bus (Intel
    didn't have decent guidelines) and most users reported lockups every
    10-20 days. I've patched successive kernels and have been running for
    ~8 years with less than one lockup per year. Kernel code sometimes
    aggravates marginal hardware.

    It appears you have less-than-perfect hardware (details please!)
    that one kernel works harder than another. So defects show much
    more on the former.

    However, you really should investigate your hardware. Progressively
    increasing errors are extremely worrisome. Power supplies and
    hard disks can fail this way, and you ought to investigate.
    Repeated `md5sum` on an unmounted disk partition is good,
    `memtest86` is good to exhaustively test RAM, and my (somewhat old)
    utilities are good for testing CPUs and mobo.

    -- Robert author `cpuburn` http://pages.sbcglobal.net/redelm
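
The repeated-checksum test suggested above can be sketched in Python as follows. This is an illustration, not one of the utilities mentioned: `md5_of` and `repeated_md5` are hypothetical helper names, and the path passed in (a partition device or a large file) is up to the reader.

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file or block device, chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def repeated_md5(path, passes=5):
    """Hash `path` several times; return the set of distinct digests seen."""
    return {md5_of(path) for _ in range(passes)}
```

If the returned set holds more than one digest, two reads of the same unchanged data disagreed, which points at the disk, the controller, or RAM. One caveat: repeated reads may be served from the page cache instead of the disk, so data larger than RAM (or dropping caches between passes) gives a more meaningful disk test.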



  4. Re: kernel: Bad page state in process 'cc1'

    Robert Redelmeier writes:

    > It appears you have less-than-perfect hardware (details please!)


    The hardware is fine. Period. I do not have time to record the
    history of this, but I do thank you for your response.

    If you do a search for this 'bad page state' issue, you will find
    some (though not many) references to it. It applied to specific
    kernels, as noted in the original post.



    > that one kernel works harder than another. So defects show much
    > more on the former.
    >
    > However, you really should investigate your hardware. Progressively
    > increasing errors are extremely worrysome. Power supplies and


    Agreed. The post was not made in order to help diagnose the problem;
    the root cause still remains unknown: either the kernels were at fault,
    or the kernels merely brought out fickle h/w issues.
    The 'bad page state' issue has been tackled elsewhere, presumably on
    the kernel list. I did not aim to delve into that.

    The post was made for archival purposes only, so that it may be of help
    to someone getting the same error messages.

