Kernel mode trap type E - SCO

Thread: Kernel mode trap type E

  1. Kernel mode trap type E

    When it rains, it pours.

    End user B, running 5.0.6 on a Proliant 1600 sold during
    the Clinton administration, had two crashes tonight in the
    space of an hour, trap type E. He thinks he saw an NMI message.
    After eight years, it's gotta be bad hardware, right?
    Let it be the memory. Oh, please, let it be the memory.

    Reading Tony L's article on the subject makes me wonder
    what to replace, however. He says that it (almost certainly)
    cannot be the memory if it is ECC. And the (not very clear)
    Compaq docs seem to point to ECC memory being installed at
    the factory. Bummer. But I think (hope) after-market memory was
    installed, as the factory sticker says "128" and hw reports
    384, so someone (probably me) installed more. Can you mix ECC
    and non-ECC?

    Anyway, the Proliant articles I've found tend to deal with new
    installs, which is not my problem - this server has been rock
    solid for ages.

    So, anyone out there care to volunteer some troubleshooting tips?

    Thanks!

    --
    _________________________________________
    Nachman Yaakov Ziskind, FSPA, LLM awacs@ziskind.us
    Attorney and Counselor-at-Law http://ziskind.us
    Economic Group Pension Services http://egps.com
    Actuaries and Employee Benefit Consultants

  2. Re: Kernel mode trap type E

    On Tue, Jun 17, 2008, N. Yaakov Ziskind wrote:
    >When it rains, it pours.
    >
    >End user B, running 5.0.6 on a Proliant 1600 sold during
    >the Clinton administration, had two crashes tonight in the
    >space of an hour, trap type E. He thinks he saw an NMI message.
    >After eight years, it's gotta be bad hardware, right?
    >Let it be the memory. Oh, please, let it be the memory.
    >

    ....
    >So, anyone out there care to volunteer some troubleshooting tips?


    The trap ``E'' probably indicates a hardware error. I would check all
    fans, remove and replace RAM and other cards to clean possible corrosion on
    the contacts. Clean as much dust out of the innards of the machine as
    possible.
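
For reference, a minimal sketch of what the "E" probably encodes. It assumes the panic message reports the trap type as the hexadecimal x86 exception vector, which the thread itself does not state. The table holds the standard Intel vector assignments; `describe_trap` is a hypothetical helper name.

```python
# Standard x86 exception vectors, keyed by vector number.
X86_EXCEPTIONS = {
    0x00: "divide error",
    0x01: "debug",
    0x02: "non-maskable interrupt (NMI)",
    0x03: "breakpoint",
    0x04: "overflow",
    0x05: "bound range exceeded",
    0x06: "invalid opcode",
    0x07: "device not available",
    0x08: "double fault",
    0x0A: "invalid TSS",
    0x0B: "segment not present",
    0x0C: "stack-segment fault",
    0x0D: "general protection fault",
    0x0E: "page fault",
    0x10: "x87 floating-point error",
    0x11: "alignment check",
    0x12: "machine check",
}


def describe_trap(trap_type: str) -> str:
    """Translate a hex trap type such as 'E' or '0x0000000E' into a name."""
    vector = int(trap_type, 16)  # base 16 also accepts an optional 0x prefix
    return X86_EXCEPTIONS.get(vector, "unknown or reserved vector")


print(describe_trap("E"))
```

On that reading, trap E is a page fault taken in kernel mode, and the NMI the user glimpsed would be a separate vector (2).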

    Bill
    --
    INTERNET: bill@celestial.com Bill Campbell; Celestial Software LLC
    URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
    Voice: (206) 236-1676 Mercer Island, WA 98040-0820
    Fax: (206) 232-9186

    We believe...that a mugger will kill you in the half-second it takes to
    draw from the holster, but won't harm you while you dial the police on your
    cell phone, talk to the dispatcher and wait half an hour for officers to
    arrive. -- Gun-Control Net-work Credo

  3. Re: Kernel mode trap type E

    Bill Campbell wrote (on Tue, Jun 17, 2008 at 08:08:56PM -0700):
    > On Tue, Jun 17, 2008, N. Yaakov Ziskind wrote:
    > >When it rains, it pours.
    > >
    > >End user B, running 5.0.6 on a Proliant 1600 sold during
    > >the Clinton administration, had two crashes tonight in the
    > >space of an hour, trap type E. He thinks he saw an NMI message.
    > >After eight years, it's gotta be bad hardware, right?
    > >Let it be the memory. Oh, please, let it be the memory.
    > >

    > ...
    > >So, anyone out there care to volunteer some troubleshooting tips?

    >
    > The trap ``E'' probably indicates a hardware error. I would check all
    > fans, remove and replace RAM and other cards to clean possible corrosion on
    > the contacts. Clean as much dust out of the innards of the machine as
    > possible.
    >
    > Bill


    Not a bad idea - thank you very much! The Compaq has an INCREDIBLE
    amount of dust collected. And, it's *dirty* dust, unlike the clean
    dust I usually see in other computers. Ick.

    Five hours of intense disk activity so far and no crash. Intermittent
    problem. Good news, bad news.

    --
    _________________________________________
    Nachman Yaakov Ziskind, FSPA, LLM awacs@ziskind.us
    Attorney and Counselor-at-Law http://ziskind.us
    Economic Group Pension Services http://egps.com
    Actuaries and Employee Benefit Consultants

  4. Re: Kernel mode trap type E

    On 18 Jun, 04:20, "N. Yaakov Ziskind" wrote:
    >
    > Five hours of intense disk activity so far and no crash. Intermittent
    > problem. Good news, bad news.


    I think the hardware is trying in its own way to gently push you to
    upgrade to a supported version of OpenServer on newer hardware.
    I would recommend that you look to upgrade to OpenServer 6
    and at the same time, upgrade the hardware.

    John

  5. Re: Kernel mode trap type E

    On Tue, 2008-06-17 at 20:08 -0700, Bill Campbell wrote:
    > On Tue, Jun 17, 2008, N. Yaakov Ziskind wrote:
    > >When it rains, it pours.
    > >
    > >End user B, running 5.0.6 on a Proliant 1600 sold during
    > >the Clinton administration, had two crashes tonight in the
    > >space of an hour, trap type E. He thinks he saw an NMI message.
    > >After eight years, it's gotta be bad hardware, right?
    > >Let it be the memory. Oh, please, let it be the memory.
    > >

    > ...
    > >So, anyone out there care to volunteer some troubleshooting tips?

    >
    > The trap ``E'' probably indicates a hardware error. I would check all
    > fans, remove and replace RAM and other cards to clean possible corrosion on
    > the contacts. Clean as much dust out of the innards of the machine as
    > possible.
    >
    > Bill,


    I was never so surprised as when I saw a hardware veteran take out a pencil
    with an eraser on the end, and clean the tabs of the memory of a fallen
    machine. To my surprise the problem was fixed. I have been a 'tab
    eraser' ever since, and have fixed more problems with a pencil than I
    would have expected. It always surprises me. Bill's advice is
    certainly the first place to start.

    Greg

  6. Re: Kernel mode trap type E

    Surprised about an eraser to clean contacts?
    Then you'll love this:

    I take my keyboards and other electronic boards and
    run them thru the dishwasher (NO HEAT DRY). Let them dry
    for 2 days and now I got like new hardware again.
    Never messed up one piece of hardware with this method.

    Carl



  7. Re: Kernel mode trap type E

    ceaton@dpcsystems.com wrote (on Wed, Jun 18, 2008 at 01:18:59PM +0000):
    > Surprised about an eraser to clean contacts?
    > Then you'll love this:
    >
    > I take my keyboards and other electronic boards and
    > run them thru the dishwasher (NO HEAT DRY). Let them dry
    > for 2 days and now I got like new hardware again.
    > Never messed up one piece of hardware with this method.
    >
    > Carl


    You're right, I do love it.

    I'm *not* putting a fifty pound Compaq server in the dishwasher,
    however. :-)

    --
    _________________________________________
    Nachman Yaakov Ziskind, FSPA, LLM awacs@ziskind.us
    Attorney and Counselor-at-Law http://ziskind.us
    Economic Group Pension Services http://egps.com
    Actuaries and Employee Benefit Consultants

  8. Re: Kernel mode trap type E

    On Wed, Jun 18, 2008, Gregory P. Ennis wrote:
    >On Tue, 2008-06-17 at 20:08 -0700, Bill Campbell wrote:
    >> On Tue, Jun 17, 2008, N. Yaakov Ziskind wrote:
    >> >When it rains, it pours.
    >> >
    >> >End user B, running 5.0.6 on a Proliant 1600 sold during
    >> >the Clinton administration, had two crashes tonight in the
    >> >space of an hour, trap type E. He thinks he saw an NMI message.
    >> >After eight years, it's gotta be bad hardware, right?
    >> >Let it be the memory. Oh, please, let it be the memory.
    >> >

    >> ...
    >> >So, anyone out there care to volunteer some troubleshooting tips?

    >>
    >> The trap ``E'' probably indicates a hardware error. I would check all
    >> fans, remove and replace RAM and other cards to clean possible corrosion on
    >> the contacts. Clean as much dust out of the innards of the machine as
    >> possible.
    >>
    >> Bill,

    >
    >I was never so surprised as when I saw a hardware veteran take out a pencil
    >with an eraser on the end, and clean the tabs of the memory of a fallen
    >machine. To my surprise the problem was fixed. I have been a 'tab
    >eraser' ever since, and have fixed more problems with a pencil than I
    >would have expected. It always surprises me. Bill's advice is
    >certainly the first place to start.


    I think the first time I saw this procedure was over 40 years ago
    when the Bendix field engineer was doing that with boards in a
    Bendix G-20 main frame.

    A standard cure for electrical issues on the old 6-volt VW Bugs
    was to pull the panel behind the dash, then remove and replace
    each of the little spade connectors to clean the contacts. Doing
    this every 6 months or so cured a multitude of ills (a 12-volt
    battery with a center tap for 6-volts really cured starting
    problems as well :-).

    Dust also has some ``interesting'' electrical properties, none of
    them particularly useful in a computer.

    Bill
    --
    INTERNET: bill@celestial.com Bill Campbell; Celestial Software LLC
    URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
    Voice: (206) 236-1676 Mercer Island, WA 98040-0820
    Fax: (206) 232-9186

    We contend that for a nation to try to tax itself into prosperity is like a
    man standing in a bucket and trying to lift himself up by the handle.
    -- Winston Churchill

  9. Re: Kernel mode trap type E


    "N. Yaakov Ziskind" wrote in message
    news:20080617223816.A4333@egps.egps.com...
    > When it rains, it pours.
    >
    > End user B, running 5.0.6 on a Proliant 1600 sold during
    > the Clinton administration, had two crashes tonight in the
    > space of an hour, trap type E. He thinks he saw an NMI message.
    > After eight years, it's gotta be bad hardware, right?
    > Let it be the memory. Oh, please, let it be the memory.
    >
    > Reading Tony L's article on the subject makes me wonder
    > what to replace, however. He says that it (almost certainly)
    > cannot be the memory if it is ECC. And the (not very clear)
    > Compaq docs seem to point to ECC memory being installed at
    > the factory. Bummer. But I think (hope) after-market memory was
    > installed, as the factory sticker says "128" and hw reports
    > 384, so someone (probably me) installed more. Can you mix ECC
    > and non-ECC?


    Answering the unanswered question above:
    No, you shouldn't mix ECC and non-ECC memory.
    And you should avoid mixing different memory brands.

    >
    > Anyway, the Proliant articles I've found tend to deal with new
    > installs, which is not my problem - this server has been rock
    > solid for ages.
    >
    > So, anyone out there care to volunteer some troubleshooting tips?
    >
    > Thanks!



    After cleaning out all the dust (don't forget the power supply and the
    CPU heatsink) and reseating the memory and boards, grab a copy
    of memtest86+ and let it run overnight to fully exercise the memory.

    While you have the case open to clean it out, carefully check the
    numerous capacitors surrounding the CPU for leaks and bulges.
    This was a problem with motherboards from the early 2000s, so it
    probably won't affect your customer's, but it's worthwhile to check.

    Bob



  10. Re: Kernel mode trap type E

    Bob Bailin wrote (on Wed, Jun 18, 2008 at 11:41:06AM -0400):
    >
    > "N. Yaakov Ziskind" wrote in message
    > news:20080617223816.A4333@egps.egps.com...
    > > When it rains, it pours.
    > >
    > > End user B, running 5.0.6 on a Proliant 1600 sold during
    > > the Clinton administration, had two crashes tonight in the
    > > space of an hour, trap type E. He thinks he saw an NMI message.
    > > After eight years, it's gotta be bad hardware, right?
    > > Let it be the memory. Oh, please, let it be the memory.
    > >
    > > Reading Tony L's article on the subject makes me wonder
    > > what to replace, however. He says that it (almost certainly)
    > > cannot be the memory if it is ECC. And the (not very clear)
    > > Compaq docs seem to point to ECC memory being installed at
    > > the factory. Bummer. But I think (hope) after-market memory was
    > > installed, as the factory sticker says "128" and hw reports
    > > 384, so someone (probably me) installed more. Can you mix ECC
    > > and non-ECC?

    >
    > Answering the unanswered question above:
    > No, you shouldn't mix ECC and non-ECC memory.
    > And you should avoid mixing different memory brands.


    Oh. Does 'shouldn't' mean 'don't do it - the machine won't boot'
    or 'don't do it - the machine may run fine for years and years
    and then drop dead'?

    You can imagine which one I'm hoping for.

    This just in: another panic, after three hours. Screen message, I'm
    told, had 'cpqw' (the compaq wellness driver), 'nmi' and 'memory'
    in it.

    That means for sure the RAM, right?

    --
    _________________________________________
    Nachman Yaakov Ziskind, FSPA, LLM awacs@ziskind.us
    Attorney and Counselor-at-Law http://ziskind.us
    Economic Group Pension Services http://egps.com
    Actuaries and Employee Benefit Consultants

  11. Autoboot (was: Kernel mode trap type E)

    Can anyone hazard a guess why autoboot doesn't work
    on this 506 server after a panic? Server reboots, and sits
    at the Boot: prompt. Here's /etc/default/boot:

    DEFBOOTSTR=hd(40)unix swap=hd(41) dump=hd(41) root=hd(42)
    TIMEOUT=10
    AUTOBOOT=YES
    FSCKFIX=YES
    MULTIUSER=YES
    PANICBOOT=YES
    MAPKEY=YES
    SERIAL8=NO
    SLEEPTIME=0
    BOOTMNT=RO

    --
    _________________________________________
    Nachman Yaakov Ziskind, FSPA, LLM awacs@ziskind.us
    Attorney and Counselor-at-Law http://ziskind.us
    Economic Group Pension Services http://egps.com
    Actuaries and Employee Benefit Consultants

  12. Re: Kernel mode trap type E

    bonixsas@gmail.com wrote (on Wed, Jun 18, 2008 at 02:28:33AM -0700):
    > On 18 Jun, 04:20, "N. Yaakov Ziskind" wrote:
    > >
    > > Five hours of intense disk activity so far and no crash. Intermittent
    > > problem. Good news, bad news.

    >
    > I think the hardware is trying in its own way to gently push you to
    > upgrade to a supported version of OpenServer on newer hardware.
    > I would recommend that you look to upgrade to OpenServer 6
    > and at the same time, upgrade the hardware.


    and the client will tell me to go take a hike, while spending another
    hundred grand with the windows guy. unix is being phased out ...

    --
    _________________________________________
    Nachman Yaakov Ziskind, FSPA, LLM awacs@ziskind.us
    Attorney and Counselor-at-Law http://ziskind.us
    Economic Group Pension Services http://egps.com
    Actuaries and Employee Benefit Consultants

  13. Re: Autoboot (was: Kernel mode trap type E)

    N. Yaakov Ziskind wrote (on Wed, Jun 18, 2008 at 12:48:49PM -0400):
    > Can anyone hazard a guess why autoboot doesn't work
    > on this 506 server after a panic? Server reboots, and sits
    > at the Boot: prompt. Here's /etc/default/boot:
    >
    > DEFBOOTSTR=hd(40)unix swap=hd(41) dump=hd(41) root=hd(42)
    > TIMEOUT=10
    > AUTOBOOT=YES
    > FSCKFIX=YES
    > MULTIUSER=YES
    > PANICBOOT=YES
    > MAPKEY=YES
    > SERIAL8=NO
    > SLEEPTIME=0
    > BOOTMNT=RO


    i answered my own question, i think. from the boot man page:

    Two distinct copies of /etc/default/boot exist; in the
    root filesystem and on the boot filesystem, /stand. (This
    permits /etc/default/boot to be read by programs that run
    from the boot filesystem before the kernel has loaded.)
    [...] /etc/default/boot is copied to /stand/etc/default/boot
    whenever the system makes an orderly shutdown, so that
    any changes are loaded at the next reboot.

    so, i tried 'btmnt -w' and copied it myself. we'll find out at the
    next crash if i was successful. :-(
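
A minimal sketch of how one might check whether the two copies mentioned in the man page excerpt have drifted apart. It assumes the file is plain KEY=VALUE lines, as in the listing quoted above; the function names are hypothetical, and treating lines that start with # as comments is an assumption. It can be run on copies of the two files moved to any machine with Python.

```python
from pathlib import Path


def parse_boot_defaults(text: str) -> dict:
    """Parse KEY=VALUE lines; the value is everything after the first '='."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, assumed '#' comments, and anything without a '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_boot_defaults(root_text: str, stand_text: str) -> dict:
    """Return {key: (root_value, stand_value)} for every setting that differs."""
    root = parse_boot_defaults(root_text)
    stand = parse_boot_defaults(stand_text)
    return {
        key: (root.get(key), stand.get(key))
        for key in sorted(set(root) | set(stand))
        if root.get(key) != stand.get(key)
    }


if __name__ == "__main__":
    root_copy = Path("/etc/default/boot")
    stand_copy = Path("/stand/etc/default/boot")
    if root_copy.exists() and stand_copy.exists():
        changes = diff_boot_defaults(root_copy.read_text(), stand_copy.read_text())
        for key, (root_value, stand_value) in changes.items():
            print(f"{key}: root={root_value!r} stand={stand_value!r}")
```

An empty result means the copy under /stand already matches, so a stale AUTOBOOT or PANICBOOT value there would not explain the stall at the Boot: prompt.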

    --
    _________________________________________
    Nachman Yaakov Ziskind, FSPA, LLM awacs@ziskind.us
    Attorney and Counselor-at-Law http://ziskind.us
    Economic Group Pension Services http://egps.com
    Actuaries and Employee Benefit Consultants

  14. Re: Kernel mode trap type E


    "N. Yaakov Ziskind" wrote in message
    news:20080618124609.A2814@egps.egps.com...
    > Bob Bailin wrote (on Wed, Jun 18, 2008 at 11:41:06AM -0400):
    >>
    >> "N. Yaakov Ziskind" wrote in message
    >> news:20080617223816.A4333@egps.egps.com...
    >> > When it rains, it pours.
    >> >
    >> > End user B, running 5.0.6 on a Proliant 1600 sold during
    >> > the Clinton administration, had two crashes tonight in the
    >> > space of an hour, trap type E. He thinks he saw an NMI message.
    >> > After eight years, it's gotta be bad hardware, right?
    >> > Let it be the memory. Oh, please, let it be the memory.
    >> >
    >> > Reading Tony L's article on the subject makes me wonder
    >> > what to replace, however. He says that it (almost certainly)
    >> > cannot be the memory if it is ECC. And the (not very clear)
    >> > Compaq docs seem to point to ECC memory being installed at
    >> > the factory. Bummer. But I think (hope) after-market memory was
    >> > installed, as the factory sticker says "128" and hw reports
    >> > 384, so someone (probably me) installed more. Can you mix ECC
    >> > and non-ECC?

    >>
    >> Answering the unanswered question above:
    >> No, you shouldn't mix ECC and non-ECC memory.
    >> And you should avoid mixing different memory brands.

    >
    > Oh. Does 'shouldn't' mean 'don't do it - the machine won't boot'
    > or 'don't do it - the machine may run fine for years and years
    > and then drop dead'?
    >
    > You can imagine which one I'm hoping for.
    >
    > This just in: another panic, after three hours. Screen message, I'm
    > told, had 'cpqw' (the compaq wellness driver), 'nmi' and 'memory'
    > in it.
    >
    > That means for sure the RAM, right?


    Not necessarily. An NMI (non-maskable interrupt) can be the result of
    any fault on the motherboard not attributable to a known interrupt.

    Regarding mixing ECC and non-ECC: It depends on the order they
    are placed in the slots (banks) on the motherboard. Most motherboards
    assume that the SPD info read from the memory in bank 1 applies to
    the rest of the banks. So if the memory in bank 1 is non-ECC, then
    ECC is not used and all is well (insofar as you're running with
    non-ECC memory). Reverse the situation, and the computer will
    probably not boot after failing the memory test with ECC on the
    non-ECC memory bank 2.
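
The rule of thumb above can be written as a small decision function. This is only a model of the behavior described in this post (the board trusting the SPD data from bank 1), not a statement of how the Proliant 1600 firmware actually behaves; `predict_ecc_behavior` and its return strings are hypothetical.

```python
def predict_ecc_behavior(banks):
    """Predict the outcome for DIMMs listed in slot order.

    banks: list of booleans, True if the DIMM in that bank is ECC.
    Models the assumption that bank 1 decides the mode for every bank.
    """
    if not banks:
        raise ValueError("at least one populated bank is required")
    if not banks[0]:
        # Bank 1 is non-ECC, so ECC is never switched on.
        return "ECC off: all DIMMs run as non-ECC"
    if all(banks):
        return "ECC on: all DIMMs support it"
    # Bank 1 is ECC but a later bank is not.
    return "ECC assumed from bank 1: a later non-ECC DIMM will probably fail the memory test"


print(predict_ecc_behavior([False, True, True]))
print(predict_ecc_behavior([True, False, True]))
```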

    Again I urge you to boot up memtest86+ from floppy or CD.
    It tests ECC memory if your motherboard uses it and supports
    it.

    Bob


  15. Re: Kernel mode trap type E

    Bob Bailin wrote (on Wed, Jun 18, 2008 at 10:17:33PM -0400):
    >
    > "N. Yaakov Ziskind" wrote in message
    > news:20080618124609.A2814@egps.egps.com...
    > >Bob Bailin wrote (on Wed, Jun 18, 2008 at 11:41:06AM -0400):
    > >>
    > >>"N. Yaakov Ziskind" wrote in message
    > >>news:20080617223816.A4333@egps.egps.com...
    > >>> When it rains, it pours.
    > >>>
    > >>> End user B, running 5.0.6 on a Proliant 1600 sold during
    > >>> the Clinton administration, had two crashes tonight in the
    > >>> space of an hour, trap type E. He thinks he saw an NMI message.
    > >>> After eight years, it's gotta be bad hardware, right?
    > >>> Let it be the memory. Oh, please, let it be the memory.
    > >>>
    > >>> Reading Tony L's article on the subject makes me wonder
    > >>> what to replace, however. He says that it (almost certainly)
    > >>> cannot be the memory if it is ECC. And the (not very clear)
    > >>> Compaq docs seem to point to ECC memory being installed at
    > >>> the factory. Bummer. But I think (hope) after-market memory was
    > >>> installed, as the factory sticker says "128" and hw reports
    > >>> 384, so someone (probably me) installed more. Can you mix ECC
    > >>> and non-ECC?
    > >>
    > >>Answering the unanswered question above:
    > >>No, you shouldn't mix ECC and non-ECC memory.
    > >>And you should avoid mixing different memory brands.

    > >
    > >Oh. Does 'shouldn't' mean 'don't do it - the machine won't boot'
    > >or 'don't do it - the machine may run fine for years and years
    > >and then drop dead'?
    > >
    > >You can imagine which one I'm hoping for.
    > >
    > >This just in: another panic, after three hours. Screen message, I'm
    > >told, had 'cpqw' (the compaq wellness driver), 'nmi' and 'memory'
    > >in it.
    > >
    > >That means for sure the RAM, right?

    >
    > Not necessarily. An NMI (non-maskable interrupt) can be the result of
    > any fault on the motherboard not attributable to a known interrupt.
    >
    > Regarding mixing ECC and non-ECC: It depends on the order they
    > are placed in the slots (banks) on the motherboard. Most motherboards
    > assume that the SPD info read from the memory in bank 1 applies to
    > the rest of the banks. So if the memory in bank 1 is non-ECC, then
    > ECC is not used and all is well (insofar as you're running with
    > non-ECC memory). Reverse the situation, and the computer will
    > probably not boot after failing the memory test with ECC on the
    > non-ECC memory bank 2.
    >
    > Again I urge you to boot up memtest86+ from floppy or CD.
    > It tests ECC memory if your motherboard uses it and supports
    > it.
    >
    > Bob


    Well, thanks to all who responded.
    memtest86 takes too long on a production server, but i did run
    compaq's own memory diagnostics, which told me i had three DIMMs,
    and an error occurred in the middle one. switched a couple of them
    around, and the error followed the DIMM around. took it out entirely,
    and the error disappeared. so, i booted up the server; except for a
    couple of disk hiccups related to the unclean shutdown, it's now been
    running for fifteen hours straight, under the severest stress testing
    i could devise. woo-hoo!
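
The swap test described here is a standard way to tell a bad module from a bad slot, and its logic can be sketched as below. The function name, slot numbers, and result strings are hypothetical; the thread only says the failing DIMM started in the middle slot and that the error followed it.

```python
def classify_fault(error_slot_before, suspect_moved_to, error_slot_after):
    """Interpret a swap test on a suspect DIMM.

    error_slot_before: slot that reported the error originally.
    suspect_moved_to:  slot the suspect DIMM was moved into.
    error_slot_after:  slot reporting the error after the move, or None
                       if the error did not reproduce.
    """
    if suspect_moved_to == error_slot_before:
        raise ValueError("move the suspect DIMM to a different slot first")
    if error_slot_after is None:
        return "inconclusive: error did not reproduce"
    if error_slot_after == suspect_moved_to:
        return "DIMM: the error followed the module"
    if error_slot_after == error_slot_before:
        return "slot: the error stayed with the socket"
    return "inconclusive: error appeared somewhere unexpected"


# Middle slot (2) failed; the module was moved to slot 1 (illustrative number)
# and the error was then reported for slot 1.
print(classify_fault(error_slot_before=2, suspect_moved_to=1, error_slot_after=1))
```

Removing the module and seeing the error disappear, as happened here, is the confirming step for the "DIMM" result.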

    but i'm now down to 256 meg. fedex dumped more ram on my desk. install
    it, or wait?

    --
    _________________________________________
    Nachman Yaakov Ziskind, FSPA, LLM awacs@ziskind.us
    Attorney and Counselor-at-Law http://ziskind.us
    Economic Group Pension Services http://egps.com
    Actuaries and Employee Benefit Consultants

  16. Re: Kernel mode trap type E

    N. Yaakov Ziskind wrote:
    > bonixsas@gmail.com wrote (on Wed, Jun 18, 2008 at 02:28:33AM -0700):
    >> On 18 Jun, 04:20, "N. Yaakov Ziskind" wrote:
    >>> Five hours of intense disk activity so far and no crash. Intermittent
    >>> problem. Good news, bad news.

    >> I think the hardware is trying in its own way to gently push you to
    >> upgrade to a supported version of OpenServer on newer hardware.
    >> I would recommend that you look to upgrade to OpenServer 6
    >> and at the same time, upgrade the hardware.

    >
    > and the client will tell me to go take a hike, while spending another
    > hundred grand with the windows guy. unix is being phased out ...
    >


    You've my sympathies: the stripping of UNIX and Linux setups to the bone and
    then comparing them to overpriced Windows resources is one of the banes of
    my work.

    Can you virtualize what you need? This gets you away from a lot of the
    hardware integration problems with SCO.

  17. Re: Kernel mode trap type E

    Bill Campbell wrote:
    > On Wed, Jun 18, 2008, Gregory P. Ennis wrote:
    >> On Tue, 2008-06-17 at 20:08 -0700, Bill Campbell wrote:
    >>> On Tue, Jun 17, 2008, N. Yaakov Ziskind wrote:
    >>>> When it rains, it pours.
    >>>>
    >>>> End user B, running 5.0.6 on a Proliant 1600 sold during
    >>>> the Clinton administration, had two crashes tonight in the
    >>>> space of an hour, trap type E. He thinks he saw an NMI message.
    >>>> After eight years, it's gotta be bad hardware, right?
    >>>> Let it be the memory. Oh, please, let it be the memory.
    >>>>
    >>> ...
    >>>> So, anyone out there care to volunteer some troubleshooting tips?
    >>> The trap ``E'' probably indicates a hardware error. I would check all
    >>> fans, remove and replace RAM and other cards to clean possible corrosion on
    >>> the contacts. Clean as much dust out of the innards of the machine as
    >>> possible.
    >>>
    >>> Bill,

    >> I was never so surprised as when I saw a hardware veteran take out a pencil
    >> with an eraser on the end, and clean the tabs of the memory of a fallen
    >> machine. To my surprise the problem was fixed. I have been a 'tab
    >> eraser' ever since, and have fixed more problems with a pencil than I
    >> would have expected. It always surprises me. Bill's advice is
    >> certainly the first place to start.

    >
    > I think the first time I saw this procedure was over 40 years ago
    > when the Bendix field engineer was doing that with boards in a
    > Bendix G-20 main frame.
    >
    > A standard cure for electrical issues on the old 6-volt VW Bugs
    > was to pull the panel behind the dash, then remove and replace
    > each of the little spade connectors to clean the contacts. Doing
    > this every 6 months or so cured a multitude of ills (a 12-volt
    > battery with a center tap for 6-volts really cured starting
    > problems as well :-).
    >
    > Dust also has some ``interesting'' electrical properties, none of
    > them particularly useful in a computer.
    >
    > Bill


    There's an interesting reason for this eraser trick. The gold contacts are
    soft, and it gets warm inside computers, and as they soften you get slick
    surfaces that do *not* mesh tightly enough together on contacts that are
    relaxed enough for you to insert the card gracefully. Rubbing them with an
    eraser roughs up the gold, so that the rough surfaces form better contacts.

  19. Re: Kernel mode trap type E

    On Fri, Jun 20, 2008, Nico Kadel-Garcia wrote:
    >N. Yaakov Ziskind wrote:
    >> bonixsas@gmail.com wrote (on Wed, Jun 18, 2008 at 02:28:33AM -0700):
    >>> On 18 Jun, 04:20, "N. Yaakov Ziskind" wrote:
    >>>> Five hours of intense disk activity so far and no crash. Intermittent
    >>>> problem. Good news, bad news.
    >>> I think the hardware is trying in its own way to gently push you to
    >>> upgrade to a supported version of OpenServer on newer hardware.
    >>> I would recommend that you look to upgrade to OpenServer 6
    >>> and at the same time, upgrade the hardware.

    >>
    >> and the client will tell me to go take a hike, while spending another
    >> hundred grand with the windows guy. unix is being phased out ...
    >>

    >
    >You've my sympathies: the stripping of UNIX and Linux setups to the bone and
    >then comparing them to overpriced Windows resources is one of the banes of
    >my work.
    >

    I have a customer who just migrated a large FilePro application
    which has run on SCO Unix for the better part of 20 years to a
    Windows ``solution'' (which doesn't do nearly as much as the FP).

    The Windows system is acting like the normal Windows Virus,
    crashing, requiring reboots, etc. (and the software vendors do
    not have a clue about basic networking security or practices).

    One of the owners of the company asked their IR guy how many
    crashes they had had on their SCO box over the last 15 years, to
    which the answer was ``None''.

    We have moved our own OSR 5.0.6a system to a VMware server VM,
    and it is doing very nicely, and is considerably faster than the
    same system running on native iron. It's on a Supermicro box
    with dual Opteron 2000s and 4GB RAM, with the primary OS being CentOS
    5.1 x86_64.

    Bill
    --
    INTERNET: bill@celestial.com Bill Campbell; Celestial Software LLC
    URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
    Voice: (206) 236-1676 Mercer Island, WA 98040-0820
    Fax: (206) 232-9186

    More laws, less justice. -- Marcus Tullius Cicero (42 BC)

  20. Re: Electrical contacts, was Re: Kernel mode trap type E

    At the start of the micro age one of the manufacturers used tin plated
    molex pins as the bus connector, what appeared to be nickel in the
    edge connectors, and the best of the repair tools was a "pink pearl"
    eraser. For some, it was a daily process of pulling the cards and
    running the eraser down the contact side. Builders of later machines
    went to gold in both locations.

    About the time of the 486 there were issues when there was gold in the
    memory slots and tin on the memory, or reversed, and the cleaning
    issue hit again.

    I've always been hesitant to use anything more abrasive than a cotton
    cloth to do gold contacts. Worst case has been to rub the contacts
    against my pants leg.
