kernel: Uhhuh. NMI received for unknown reason 21. - Linux


  1. kernel: Uhhuh. NMI received for unknown reason 21.

    hey,

    A server has started to complain. Every 2 minutes I get this in /var/log/messages:

    Feb 16 17:36:41 svr kernel: Uhhuh. NMI received for unknown reason 21.
    Feb 16 17:36:41 svr kernel: Dazed and confused, but trying to continue
    Feb 16 17:36:41 svr kernel: Do you have a strange power saving mode enabled?
    ....
    Feb 16 17:38:41 svr kernel: Uhhuh. NMI received for unknown reason 21.
    Feb 16 17:38:41 svr kernel: Dazed and confused, but trying to continue
    Feb 16 17:38:41 svr kernel: Do you have a strange power saving mode enabled?
    ....

    etc

    # cat /proc/version
    Linux version 2.4.18-14 (bhcompile@stripples.devel.redhat.com)
    (gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7))
    #1 Wed Sep 4 13:35:50 EDT 2002

    Anyone know what generates these messages?


    --
    Søren Dideriksen

  2. Re: kernel: Uhhuh. NMI received for unknown reason 21.

    Søren Dideriksen wrote:
    > Feb 16 17:38:41 svr kernel: Dazed and confused, but trying to continue
    > Anyone know what generates these messages?


    Just typing the error message into Google, and following a couple of links, it
    appears that the error is something like a memory parity error, and is tied to
    the error checking and correcting facility on intel motherboards. You could
    have a faulty memory chip, or it may be that you need to adjust some chipset
    related settings in the system BIOS.

    Regards,

    Mark.

    --
    Mark Hobley
    393 Quinton Road West
    QUINTON
    Birmingham
    B32 1QE

    Telephone: (0121) 247 1596
    International: 0044 121 247 1596

    Email: markhobley at hotpop dot donottypethisbit com

    http://markhobley.yi.org/


  3. Re: kernel: Uhhuh. NMI received for unknown reason 21.

    On 16 Feb 2007, in the Usenet newsgroup comp.os.linux, in article
    <87sld6b4ig.fsf@localhost.localdomain>,
    Søren Dideriksen wrote:

    >A server has started to complain. Every 2 minutes I get in
    >/var/log/messages
    >
    >Feb 16 17:36:41 svr kernel: Uhhuh. NMI received for unknown reason 21.
    >Feb 16 17:36:41 svr kernel: Dazed and confused, but trying to continue
    >Feb 16 17:36:41 svr kernel: Do you have a strange power saving mode enabled?


    Hmmh, I would have thought that was in the Linux FAQ - and while several
    HOWTOs mention it, none describe it. "NMI" is a Non-Maskable-Interrupt,
    which is literally a pin on the CPU. When IBM built the original PC in
    1981, they included what is called parity checking of RAM - an extra bit
    that was set to indicate if the number of "one" bits was odd or even. It
    was an extremely crude check to detect memory problems. IBM was of the
    philosophy that a memory error was bad. This parity check was connected
    to the NMI pin, and was used to tell the BIOS that a failure had occurred
    and processing should stop so that the hardware could be checked. Apple
    did not use this philosophy, mainly because of their non-business history.
    Also, modern memory is less likely to have problems than the 1960s to 1980s
    versions were.
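
    The parity idea described above can be sketched in a few lines of
    Python. This is only an illustration: the function names are made up,
    and the even-parity convention is an assumption chosen for simplicity
    (real hardware may store the opposite sense). It shows why the check is
    crude: one flipped bit is caught, two flipped bits are not.

```python
def parity_bit(byte):
    """Return 1 if the number of one bits in an 8-bit value is odd, else 0."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(byte).count("1") % 2


def store(byte):
    """Model a 9-bit memory cell: the data byte plus its parity bit."""
    return byte, parity_bit(byte)


def check(byte, stored_parity):
    """True if the byte still matches the parity bit written alongside it."""
    return parity_bit(byte) == stored_parity


data, p = store(0b10110010)         # four one bits, so the parity bit is 0
flipped_once = data ^ 0b00000100    # one bit flips in RAM
flipped_twice = data ^ 0b00000110   # two bits flip in RAM

print(check(data, p))           # True: memory intact
print(check(flipped_once, p))   # False: single-bit error detected
print(check(flipped_twice, p))  # True: double-bit error goes unnoticed
```

    On the original PC a failed check of this kind is what raised the NMI;
    the sketch only models the arithmetic, not the interrupt.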

    The Linux operating system has memory bounds checking so a memory error is
    less likely to crash a program (though the data could still be corrupted).
    More modern computers use either the Apple (no parity) form, or use Error
    Correcting Code memory to "handle" such memory errors as may occur. Thus,
    memory errors are less common. Even so, the _hardware_ still has this NMI
    circuitry - although it's rarely used any more. You'd have to look at your
    BIOS setup to see if something has changed.

    What is happening here is that your computer is suffering from false
    signals on the NMI circuitry. This used to indicate a memory problem, but in
    the past few years it has usually been a minor hardware error. It can ALSO
    be caused by a hardware configuration error - possibly something related
    to a power saving mode as the error message indicates.
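
    For anyone curious what the "21" might mean, here is a hedged sketch.
    It ASSUMES that the number in the message is the byte the i386 kernel
    reads from I/O port 0x61, printed in hex, and that the bits follow the
    usual PC/AT layout for that port. Neither assumption comes from this
    thread, so check the kernel source and the chipset documentation for
    your machine before relying on it. The table and function names are
    made up for illustration.

```python
# ASSUMPTION: conventional PC/AT meaning of each bit when port 0x61 is read.
# Verify against your chipset documentation.
PORT_61_BITS = {
    7: "memory parity error / SERR#",
    6: "I/O channel check (IOCHK)",
    5: "timer 2 output",
    4: "refresh toggle",
    3: "I/O channel check disabled",
    2: "parity check disabled",
    1: "speaker data enabled",
    0: "timer 2 gate enabled",
}


def decode_nmi_reason(reason_hex):
    """Decode the hex reason from the log line.

    Returns (known_source, names_of_set_bits). known_source is True only
    when bit 7 or bit 6 is set, i.e. when the port itself names a cause.
    """
    reason = int(reason_hex, 16)
    set_bits = [name for bit, name in PORT_61_BITS.items() if reason & (1 << bit)]
    known_source = bool(reason & 0xC0)
    return known_source, set_bits


known, bits = decode_nmi_reason("21")
print(known)  # False
print(bits)   # ['timer 2 output', 'timer 2 gate enabled']
```

    Under those assumptions, 0x21 has neither the parity bit nor the
    channel-check bit set, which would fit the "unknown reason" wording and
    would not by itself point at bad RAM.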

    >Feb 16 17:38:41 svr kernel: Uhhuh. NMI received for unknown reason 21.


    Is it exactly two minutes all of the time? That's definitely some timer
    setting in the BIOS - I've no idea what.
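
    One way to answer the "exactly two minutes?" question is to compute
    the gaps between the NMI lines in /var/log/messages. The following is
    a minimal sketch, not a finished tool: the function name is
    hypothetical, the year is supplied by hand because classic syslog
    timestamps do not carry one, and month names are assumed to be in
    English. The sample lines are the ones quoted in this thread.

```python
from datetime import datetime, timedelta

NMI_MARKER = "NMI received for unknown reason"


def nmi_intervals(lines, year=2007):
    """Return the time gaps between consecutive NMI log lines."""
    stamps = []
    for line in lines:
        if NMI_MARKER not in line:
            continue
        # Syslog format: "Feb 16 17:36:41 host kernel: ..."
        month, day, clock = line.split(None, 3)[:3]
        stamps.append(
            datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        )
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


SAMPLE = [
    "Feb 16 17:36:41 svr kernel: Uhhuh. NMI received for unknown reason 21.",
    "Feb 16 17:36:41 svr kernel: Dazed and confused, but trying to continue",
    "Feb 16 17:36:41 svr kernel: Do you have a strange power saving mode enabled?",
    "Feb 16 17:38:41 svr kernel: Uhhuh. NMI received for unknown reason 21.",
    "Feb 16 17:38:41 svr kernel: Dazed and confused, but trying to continue",
    "Feb 16 17:38:41 svr kernel: Do you have a strange power saving mode enabled?",
]

for gap in nmi_intervals(SAMPLE):
    print(gap.total_seconds())  # 120.0
```

    To run it against the real log, pass an open file instead of SAMPLE,
    for example nmi_intervals(open("/var/log/messages")). If every gap is
    120 seconds, a timer is a far more likely culprit than random memory
    faults.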

    ># cat /proc/version Linux version 2.4.18-14
    >(bhcompile@stripples.devel.redhat.com) (gcc version 3.2 20020903 (Red
    >Hat Linux 8.0 3.2-7)) #1 Wed Sep 4 13:35:50 EDT 2002


    Oh, boy - Red Hat 8.0 with the original out-of-box kernel. Over the
    supported life, Red Hat released at least 12 kernel updates for various
    security and bug problems, and fedoralegacy.org backported another before
    all support for this old distribution ended in May 2004. You may still
    find the old errata at download.fedoralegacy.org, but you really should
    have replaced this old release a long time ago.

    >Anyone know what generates these messages?


    Depends on the motherboard, and how it's configured. What have you changed
    recently?

    Old guy

  4. Re: kernel: Uhhuh. NMI received for unknown reason 21.

    ibuprofin@painkiller.example.tld (Moe Trin) writes:

    > Hmmh, I would have thought that was in the Linux FAQ - and while several
    > HOWTOs mention it, none describe it.


    [...cut...]

    Thank you - for a good explanation!

    > >Feb 16 17:38:41 svr kernel: Uhhuh. NMI received for unknown reason 21.

    >
    > Is it exactly two minutes all of the time? That's definitely some timer
    > setting in the BIOS - I've no idea what.
    >
    > ># cat /proc/version Linux version 2.4.18-14
    > >(bhcompile@stripples.devel.redhat.com) (gcc version 3.2 20020903 (Red
    > >Hat Linux 8.0 3.2-7)) #1 Wed Sep 4 13:35:50 EDT 2002

    >
    > Oh, boy - Red Hat 8.0 with the original out-of-box kernel. Over the
    > supported life, Red Hat released at least 12 kernel updates for various
    > security and bug problems, and fedoralegacy.org backported another before
    > all support for this old distribution ended in May 2004. You may still
    > find the old errata at download.fedoralegacy.org, but you really should
    > have replaced this old release a long time ago.


    You are right. The thing is - the hardware is old as well - IBM x305.
    But then again, the whole thing just runs a secondary nameserver,
    which it - if I may add - runs just fine. I only just recently looked
    into this machine. Even though it complains, it has no performance
    issues at all. When I change a record I usually tail messages while
    reloading named to see if I did it right, and that is how I noticed
    the NMI messages.

    > >Anyone know what generates these messages?

    >
    > Depends on the motherboard, and how it's configured. What have you changed
    > recently?


    I didn't change anything recently. Others could have, although I doubt
    it ... for all I know, it could have had this problem for ages. I'll
    look into the BIOS settings and run a memcheck on it. I was planning
    to upgrade it to FC6, but if the memory is faulty, I think I'll
    upgrade the hardware as well.

    Thanks for a good answer.

    --
    Søren Dideriksen

  5. Re: kernel: Uhhuh. NMI received for unknown reason 21.

    On 16 Feb 2007, in the Usenet newsgroup comp.os.linux, in article
    <87y7mx3jhd.fsf@localhost.localdomain>,
    Søren Dideriksen wrote:

    >ibuprofin@painkiller.example.tld (Moe Trin) writes:


    >Thank you - for a good explanation!


    Glad to help!

    >> Oh, boy - Red Hat 8.0 with the original out-of-box kernel.


    >You are right. The thing is - the hardware is old as well - IBM x305.
    >but then again, the whole thing just runs a secondary nameserver,
    >which it - if I may add - runs just fine.


    But have you updated the DNS server for the exploits (CVE-2007-0493 and
    CVE-2007-0494) reported last month?

    >I only just recently looked into this machine. Even though it complains,
    >it has no performance issues at all. When I change a record I usually tail
    >messages while reloading named to see if I did it right, and that is how I
    >noticed the NMI messages.


    It's a kernel error message - if you are compiling your own kernel for an
    Intel based processor, you can set a configuration flag (CONFIG_IGNORE_NMI)
    to ignore the error, but it's still going to waste a few CPU cycles.

    >> Depends on the motherboard, and how it's configured. What have you changed
    >> recently?

    >
    >I didn't change anything recently. Others could have, although I doubt
    >it ... for all I know, it could have had this problem for ages.


    ;-)

    >I'll look into the BIOS settings and run a memcheck on it.


    If the error is occurring at a regular two minute rate, it's not very
    likely to be a memory error - that's merely what the NMI was used for
    in the past. It's possible of course - you may have a cron job that runs
    every two minutes and is resident in an area of bad memory, but that's
    pushing probability pretty hard.

    >I was planning to upgrade it to FC6, but if the memory is faulty, I
    >think I'll upgrade the hardware as well.


    That would be a good solution, but remember that the Fedora project is
    meant to have a short lifetime. Support for FC4 (June 2005) ended last
    August officially, and fedoralegacy.org dropped it in November.

    Old guy
