smp failure after power supply death? - BSD

  1. smp failure after power supply death?

    I came back from holidays to discover that my FreeBSD 6.2-STABLE(SMP) (as
    of about August) system was "off", irrespective of how I flipped the
    power switch. Power supply death, I supposed, and so I replaced the
    supply.

    Now the box "goes", but only if I tell the boot loader to boot in "safe"
    mode, or if SMP mode is disabled in /boot/loader.conf. If I let it try
    to start the second core (the processor is an Athlon64 X2 4400) then it
    immediately panics so:

    SMP: AP CPU #1 launched!
    panic: failed to create swap_zone

    Since that panic comes from the swap pager code, which works fine with
    one processor, that doesn't seem terribly related to the presence or
    otherwise of the second core, so I suspect that the specific panic is a
    red herring.

    How likely is it that a dead power supply (possibly killed by an
    electrical storm) could maim just one of the cores on a dual core CPU?

    Any other reasons why there could be a problem launching the second core?

    Am I up for a chip replacement?
    A motherboard replacement?
    Any other thoughts?

    Cheers,

    Andrew
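
The workaround mentioned above (disabling SMP mode in /boot/loader.conf) is usually done with a single loader tunable. A minimal fragment, assuming the `kern.smp.disabled` tunable described in smp(4) is the setting that was used here (the post does not say which one was changed):

```
# /boot/loader.conf
# Assumption: keep the kernel from starting the second core (see smp(4)).
kern.smp.disabled="1"
```

Removing the line, or setting it to "0", lets the kernel try to launch the application processor again on the next boot.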

  2. Re: smp failure after power supply death?

    Begin <5ppq8rFsapa2U1@mid.individual.net>
    On 12 Nov 2007 01:57:47 GMT,
    Andrew Reilly wrote:
    > I came back from holidays to discover that my FreeBSD 6.2-STABLE(SMP) (as
    > of about August) system was "off", irrespective of how I flipped the
    > power switch. Power supply death, I supposed, and so I replaced the
    > supply.
    >
    > Now the box "goes", but only if I tell the boot loader to boot in "safe"
    > mode, or if SMP mode is disabled in /boot/loader.conf. If I let it try
    > to start the second core (the processor is an Athlon64 X2 4400) then it
    > immediately panics so:
    >
    > SMP: AP CPU #1 launched!
    > panic: failed to create swap_zone


    You might want to ask about this on the freebsd-hardware list. There's a
    somewhat greater chance people on there understand the exact meaning of
    this message.

    All I can really suggest is to run memory tests and swap parts with
    known-good parts to see what part or what combination of parts triggers
    the problem. Might even be the new PSU is underspecified or marginal.


    > Since that panic comes from the swap pager code, which works fine with
    > one processor, that doesn't seem terribly related to the presence or
    > otherwise of the second core, so I suspect that the specific panic is a
    > red herring.


    Maybe, maybe not. Broken hardware can cause improbable things to happen.
    Most software assumes the hardware is no more broken than designed
    (often it can hardly do anything else) and that can easily make hardware
    failure related errors look... strange.


    > How likely is it that a dead power supply (possibly killed by an
    > electrical storm) could maim just one of the cores on a dual core CPU?


    Likely? Personally, I would expect both to die. But that doesn't mean it
    can't happen. It could also, conceivably, be because some part of the
    system board fried, so it could also be that both cores function fine in
    a different board. Or some memory area fried, or what-have-you.

    Since we are talking about relatively cheap peecee hardware, just
    replacing the entire thing might be the simplest option, depending.


    --
    j p d (at) d s b (dot) t u d e l f t (dot) n l .
    This message was originally posted on Usenet in plain text.
    Any other representation, additions, or changes do not have my
    consent and may be a violation of international copyright law.

  3. Re: smp failure after power supply death?

    On Nov 11, 8:57 pm, Andrew Reilly wrote:
    > How likely is it that a dead power supply (possibly killed by an
    > electrical storm) could maim just one of the cores on a dual core CPU?


    Not likely. But this is only an answer from speculation. To know
    better, ask what the path could have been from AC mains, through the
    motherboard and CPU, to earth ground. Look for possible outgoing
    conductors such as a table top, a mouse wire touching a baseboard
    heater, etc.

    Generally, CPUs are rarely damaged because there are so many layers in
    front of the CPU (including its own power supply) and because there is
    rarely an outgoing path to earth. Electricity without both an incoming
    and outgoing path does not cause damage.

    It is just as possible that the new power supply is defective.
    Defective supplies can still boot a system. Without voltage
    measurements taken with a multimeter while the computer is trying to
    use both cores, you really don't even know that the supply is good.
    Especially important in your case would be measurements on the yellow
    wire (as well as red and orange). Yellow wire voltage must exceed 11.7
    volts. A defective power supply, for example, might measure only 11.5
    volts as both cores draw more power.
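
The check described above can be written down as a small script. This is only a sketch: the 11.7 V floor for the yellow wire is the figure quoted in the post, while the floors for the red (+5 V) and orange (+3.3 V) wires are assumed here from the usual ATX tolerance of 5% below nominal, and the readings are made-up example values.

```python
# Minimum acceptable voltage per wire colour.
# yellow: figure quoted in the post; red/orange: assumed ATX -5% limits.
RAIL_FLOORS = {
    "yellow": 11.7,   # +12 V rail
    "red": 4.75,      # +5 V rail
    "orange": 3.135,  # +3.3 V rail
}


def low_rails(readings):
    """Return the sorted names of rails measured below their floor.

    Readings exactly at a floor are treated as passing in this sketch.
    """
    return sorted(
        wire for wire, volts in readings.items()
        if volts < RAIL_FLOORS[wire]
    )


if __name__ == "__main__":
    # Hypothetical multimeter readings taken while both cores are busy.
    readings = {"yellow": 11.5, "red": 5.02, "orange": 3.31}
    print(low_rails(readings))
```

The readings only mean something if they are taken under load, that is, while the machine is attempting to run both cores.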


  4. Re: smp failure after power supply death?

    On Wed, 14 Nov 2007 19:44:41 UTC, w_tom wrote:

    > On Nov 11, 8:57 pm, Andrew Reilly wrote:
    > > How likely is it that a dead power supply (possibly killed by an
    > > electrical storm) could maim just one of the cores on a dual core CPU?

    >
    > Not likely. But this is only an answer from speculation. To know


    You mentioned 'electrical storm'. Now you've attracted 'w_tom'! Help!
    --
    Bob Eager
    UNIX since v6..
    http://tinyurl.com/2xqr6h


  5. Re: smp failure after power supply death?

    In article <5ppq8rFsapa2U1@mid.individual.net>,
    Andrew Reilly wrote:
    >I came back from holidays to discover that my FreeBSD 6.2-STABLE(SMP) (as
    >of about August) system was "off", irrespective of how I flipped the
    >power switch. Power supply death, I supposed, and so I replaced the
    >supply.
    >
    >Now the box "goes", but only if I tell the boot loader to boot in "safe"
    >mode, or if SMP mode is disabled in /boot/loader.conf. If I let it try
    >to start the second core (the processor is an Athlon64 X2 4400) then it
    >immediately panics so:
    >
    >SMP: AP CPU #1 launched!
    >panic: failed to create swap_zone


    >Since that panic comes from the swap pager code, which works fine with
    >one processor, that doesn't seem terribly related to the presence or
    >otherwise of the second core, so I suspect that the specific panic is a
    >red herring.


    >How likely is it that a dead power supply (possibly killed by an
    >electrical storm) could maim just one of the cores on a dual core
    >CPU?


    It could be one of the glue chips and not the CPU. In fact it
    could be an electrical device on the motherboard.

    >Any other reasons why there could be a problem launching the
    >second core?


    >Am I up for a chip replacement?


    Perhaps.

    >A motherboard replacement?


    That could be required, as with all the surface-mounted devices and
    complex chips you just can't do component-level repair anymore.

    I've seen some truly strange events. I saw one machine that got hit by
    lightning, and the main problem was that a platter on the hard drive
    had a little upward protrusion that looked like a small volcano.

    Electricity has a mind of its own when it comes to killing certain
    components.

    Bill
    >Any other thoughts?
    >
    >Cheers,
    >
    >Andrew



    --
    Bill Vermillion - bv @ wjv . com

  6. Re: smp failure after power supply death?

    On Wed, 14 Nov 2007 22:41:27 +0000, Bill Vermillion wrote:

    >>How likely is it that a dead power supply (possibly killed by an
    >>electrical storm) could maim just one of the cores on a dual core CPU?

    >
    > It could be one of the glue chips and not the CPU. In fact it could be
    > an electrical device on the motherboard.


    Thanks to all for the suggestions.

    I'm still unsure whether or not the hardware is damaged, but more (all?)
    of it is working now than it was when I first posted...

    On the off-chance that the forced power down had done nothing more than
    corrupt something on the hard disk, I used the single working CPU to
    cvsup and rebuild world and kernel (it now claims to be 6.3-PRERELEASE).
    Reboot, and both cores claim to work. Yay! Maybe riding in the boot of
    the car for a while re-seated some components. Who knows.

    Since a buildworld worked without flaw or error, I figured I'd do a
    port-upgrade to get at the GNOME 2.20 goodness. That finished OK too
    (eventually), but the bonobo-activation service hangs (running) and the
    desktop doesn't come all the way up. Aargh. Probably some glitched
    configuration foo. I'm busy reading mailing lists and web pages for
    clues...

    Cheers,

    Andrew
