Kernel mode trap type E - SCO

Thread: Kernel mode trap type E

  1. Re: Kernel mode trap type E

    N. Yaakov Ziskind wrote (on Thu, Jun 19, 2008 at 12:02:26PM -0400):
    > Bob Bailin wrote (on Wed, Jun 18, 2008 at 10:17:33PM -0400):
    > >
    > > "N. Yaakov Ziskind" wrote in message
    > > news:20080618124609.A2814@egps.egps.com...
    > > >Bob Bailin wrote (on Wed, Jun 18, 2008 at 11:41:06AM -0400):
    > > >>
    > > >>"N. Yaakov Ziskind" wrote in message
    > > >>news:20080617223816.A4333@egps.egps.com...
    > > >>> When it rains, it pours.
    > > >>>
    > > >>> End user B, running 5.0.6 on a Proliant 1600 sold during
    > > >>> the Clinton administration, had two crashes tonight in the
    > > >>> space of an hour, trap type E. He thinks he saw an NMI message.
    > > >>> After eight years, it's gotta be bad hardware, right?
    > > >>> Let it be the memory. Oh, please, let it be the memory.
    > > >>>
    > > >>> Reading Tony L's article on the subject makes me wonder
    > > >>> what to replace, however. He says that it (almost certainly)
    > > >>> cannot be the memory if it is ECC. And the (not very clear)
    > > >>> Compaq docs seem to point to ECC memory being installed at
    > > >>> the factory. Bummer. But I think (hope) after-market memory was
    > > >>> installed, as the factory sticker says "128" and hw reports
    > > >>> 384, so someone (probably me) installed more. Can you mix ECC
    > > >>> and non-ECC?
    > > >>
    > > >>Answering the unanswered question above:
    > > >>No, you shouldn't mix ECC and non-ECC memory.
    > > >>And you should avoid mixing different memory brands.
    > > >
    > > >Oh. Does 'shouldn't' mean 'don't do it - the machine won't boot'
    > > >or 'don't do it - the machine may run fine for years and years
    > > >and then drop dead'?
    > > >
    > > >You can imagine which one I'm hoping for.
    > > >
    > > >This just in: another panic, after three hours. Screen message, I'm
    > > >told, had 'cpqw' (the compaq wellness driver), 'nmi' and 'memory'
    > > >in it.
    > > >
    > > >That means for sure the RAM, right?

    > >
    > > Not necessarily. An NMI (non-maskable interrupt) can be the result of
    > > any fault on the motherboard not attributable to a known interrupt.
    > >
    > > Regarding mixing ECC and non-ECC: It depends on the order they
    > > are placed in the slots (banks) on the motherboard. Most motherboards
    > > assume that the SPD info read from the memory in bank 1 applies to
    > > the rest of the banks. So if the memory in bank 1 is non-ECC, then
    > > ECC is not used and all is well (insofar as you're running with
    > > non-ECC memory). Reverse the situation, and the computer will
    > > probably not boot after failing the memory test with ECC on the
    > > non-ECC memory bank 2.
    > >
    > > Again I urge you to boot up memtest86+ from floppy or CD.
    > > It tests ECC memory if your motherboard uses it and supports
    > > it.
    > >
    > > Bob

    >
    > Well, thanks to all who responded.
    > memtest86 takes too long on a production server, but i did run
    > compaq's own memory diagnostics, which told me i had three DIMMS,
    > and an error occurred in the middle one. switched a couple of them
    > around, and the error followed the DIMM around. took it out entirely,
    > and the error disappeared. so, i booted up the server, except for a
    > couple of disk hiccups related to the unclean shutdown, it's now been
    > running for fifteen hours straight, under the severest stress testing
    > i could devise. woo-hoo!
    >
    > but i'm now down to 256 meg. fedex dumped more ram on my desk. install
    > it, or wait?


    Well, i cheered too soon. after a couple of quiet days, server started
    locking up: no response on the network, console keyboard dead, screen
    in 'power save' mode (which apparently doesn't happen normally, X is
    not installed). Only cure is cycling the power.

    SO, client is convinced that a new server is needed. suggestions?
    i'd like to (ideally) just plop the hot-pluggable SCSI drives in it
    and boot right up. Are the SCSI controllers Dell makes compatible with
    506?

    Thanks!

    --
    _________________________________________
    Nachman Yaakov Ziskind, FSPA, LLM awacs@ziskind.us
    Attorney and Counselor-at-Law http://ziskind.us
    Economic Group Pension Services http://egps.com
    Actuaries and Employee Benefit Consultants

  2. Re: Kernel mode trap type E

    N. Yaakov Ziskind wrote:
    >>>>> N. Yaakov Ziskind wrote:
    >>>>>>
    >>>>>> End user B, running 5.0.6 on a Proliant 1600 sold during
    >>>>>> the Clinton administration, had two crashes tonight in the
    >>>>>> space of an hour, trap type E. He thinks he saw an NMI message.
    >>>>>> After eight years, it's gotta be bad hardware, right?
    >>>>>> Let it be the memory. Oh, please, let it be the memory.

    >
    > [...]
    >
    > SO, client is convinced that a new server is needed. suggestions?
    > i'd like to (ideally) just plop the hot-pluggable SCSI drives in it
    > and boot right up.


    I think you may be out of luck. I expect you'd need an identical 1600
    for that to work. The 1600 was discontinued a long time ago (mine still
    runs fine, although I've replaced the PSU, the DAT drive and the CD-ROM
    drive)

    The 1600 I have has hot plug drives that are considerably larger
    (physically) than any recent Proliants. I have wondered whether it is
    feasible to remove the drives from the 1600 carriers and attach them to
    a different carrier. There's the question of SCSI version, speed, width,
    connectors too. I don't think I'd bother.

    I recall some recipes for transplanting a SCO system to new and
    different hardware. They used the supertars to back up on old and restore
    on new. I'd look into that. I'd be stunned if you didn't at least need
    some new drivers though.


    > Are the SCSI controllers Dell makes compatible with 506?


    At least some were :-) I think I'd find out the details and check the
    CHWP. http://wdb1.caldera.com/chwp/owa/hch_search_form



    --
    RGB

  3. Re: Kernel mode trap type E

    RedGrittyBrick wrote:
    > N. Yaakov Ziskind wrote:
    >>
    >> SO, client is convinced that a new server is needed. suggestions?
    >> i'd like to (ideally) just plop the hot-pluggable SCSI drives in it
    >> and boot right up.

    >
    > I recall some recipes for transplanting a SCO system to new and
    > different hardware. They used the supertars to back up on old and restore
    > on new. I'd look into that. I'd be stunned if you didn't at least need
    > some new drivers though.
    >


    http://aplawrence.com/Unixart/supertarxfer.html

    --
    RGB

  4. Re: Kernel mode trap type E

    On Mon, Jun 23, 2008, N. Yaakov Ziskind wrote:
    ....
    >Well, i cheered too soon. after a couple of quiet days, server started
    >locking up: no response on the network, console keyboard dead, screen
    >in 'power save' mode (which apparently doesn't happen normally, X is
    >not installed). Only cure is cycling the power.
    >
    >SO, client is convinced that a new server is needed. suggestions?
    >i'd like to (ideally) just plop the hot-pluggable SCSI drives in it
    >and boot right up. Are the SCSI controllers Dell makes compatible with
    >506?


    I would seriously consider moving this to a VMware virtual
    machine running under Linux. We are running SCO 5.0.6a on a
    dual Opteron200 Supermicro box with 3ware controller and multiple
    250GB SATA drives on CentOS 5.1 x86_64. The SCO VM has 20GB
    assigned, which could be pretty much anything you want depending
    mostly on the Openserver size limits. It's running on VMware
    server, the free version.

    The OpenServer VM is considerably faster than it was on the old
    machine it replaced which was running on multiple SCSI drives.

    The current crop of hard drives are so much larger than the ones
    available when your Proliant box was built, it doesn't generally
    make a lot of sense to stick with the old drives.

    Linux support for 3ware is excellent, and you have several RAID
    options available.

    Bill
    --
    INTERNET: bill@celestial.com Bill Campbell; Celestial Software LLC
    URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
    Voice: (206) 236-1676 Mercer Island, WA 98040-0820
    Fax: (206) 232-9186

    When a place gets crowded enough to require ID's, social collapse is
    not far away. It is time to go elsewhere. The best thing about space
    travel is that it made it possible to go elsewhere. -- Robert Heinlein

  5. Re: Kernel mode trap type E

    N. Yaakov Ziskind wrote:
    > N. Yaakov Ziskind wrote (on Thu, Jun 19, 2008 at 12:02:26PM -0400):
    >> Bob Bailin wrote (on Wed, Jun 18, 2008 at 10:17:33PM -0400):
    >>> "N. Yaakov Ziskind" wrote in message
    >>> news:20080618124609.A2814@egps.egps.com...
    >>>> Bob Bailin wrote (on Wed, Jun 18, 2008 at 11:41:06AM -0400):
    >>>>> "N. Yaakov Ziskind" wrote in message
    >>>>> news:20080617223816.A4333@egps.egps.com...
    >>>>>> When it rains, it pours.
    >>>>>>
    >>>>>> End user B, running 5.0.6 on a Proliant 1600 sold during
    >>>>>> the Clinton administration, had two crashes tonight in the
    >>>>>> space of an hour, trap type E. He thinks he saw an NMI message.
    >>>>>> After eight years, it's gotta be bad hardware, right?
    >>>>>> Let it be the memory. Oh, please, let it be the memory.
    >>>>>>
    >>>>>> Reading Tony L's article on the subject makes me wonder
    >>>>>> what to replace, however. He says that it (almost certainly)
    >>>>>> cannot be the memory if it is ECC. And the (not very clear)
    >>>>>> Compaq docs seem to point to ECC memory being installed at
    >>>>>> the factory. Bummer. But I think (hope) after-market memory was
    >>>>>> installed, as the factory sticker says "128" and hw reports
    >>>>>> 384, so someone (probably me) installed more. Can you mix ECC
    >>>>>> and non-ECC?
    >>>>> Answering the unanswered question above:
    >>>>> No, you shouldn't mix ECC and non-ECC memory.
    >>>>> And you should avoid mixing different memory brands.
    >>>> Oh. Does 'shouldn't' mean 'don't do it - the machine won't boot'
    >>>> or 'don't do it - the machine may run fine for years and years
    >>>> and then drop dead'?
    >>>>
    >>>> You can imagine which one I'm hoping for.
    >>>>
    >>>> This just in: another panic, after three hours. Screen message, I'm
    >>>> told, had 'cpqw' (the compaq wellness driver), 'nmi' and 'memory'
    >>>> in it.
    >>>>
    >>>> That means for sure the RAM, right?
    >>> Not necessarily. An NMI (non-maskable interrupt) can be the result of
    >>> any fault on the motherboard not attributable to a known interrupt.
    >>>
    >>> Regarding mixing ECC and non-ECC: It depends on the order they
    >>> are placed in the slots (banks) on the motherboard. Most motherboards
    >>> assume that the SPD info read from the memory in bank 1 applies to
    >>> the rest of the banks. So if the memory in bank 1 is non-ECC, then
    >>> ECC is not used and all is well (insofar as you're running with
    >>> non-ECC memory). Reverse the situation, and the computer will
    >>> probably not boot after failing the memory test with ECC on the
    >>> non-ECC memory bank 2.
    >>>
    >>> Again I urge you to boot up memtest86+ from floppy or CD.
    >>> It tests ECC memory if your motherboard uses it and supports
    >>> it.
    >>>
    >>> Bob

    >> Well, thanks to all who responded.
    >> memtest86 takes too long on a production server, but i did run
    >> compaq's own memory diagnostics, which told me i had three DIMMS,
    >> and an error occurred in the middle one. switched a couple of them
    >> around, and the error followed the DIMM around. took it out entirely,
    >> and the error disappeared. so, i booted up the server, except for a
    >> couple of disk hiccups related to the unclean shutdown, it's now been
    >> running for fifteen hours straight, under the severest stress testing
    >> i could devise. woo-hoo!
    >>
    >> but i'm now down to 256 meg. fedex dumped more ram on my desk. install
    >> it, or wait?

    >
    > Well, i cheered too soon. after a couple of quiet days, server started
    > locking up: no response on the network, console keyboard dead, screen
    > in 'power save' mode (which apparently doesn't happen normally, X is
    > not installed). Only cure is cycling the power.
    >
    > SO, client is convinced that a new server is needed. suggestions?
    > i'd like to (ideally) just plop the hot-pluggable SCSI drives in it
    > and boot right up. Are the SCSI controllers Dell makes compatible with
    > 506?


    I have a client running SCO 6.0 on a Dell 2600 he picked up at one
    of the surplus dealers here in town. Getting Openserver 6.0 to run
    on it is another long and painful story.

    But what makes this somewhat relevant to your situation is that the
    box failed over a weekend when someone turned off the building air conditioning
    and the server died from heat stroke. When I could not get the machine
    back up, he just ordered an identical 2600 sans hard disks from
    www.stikc.com (local to us so he just dropped by and picked it up the
    same day) and we moved the RAID-1 hard disks from the old box to the new
    box, booted and ran fsck and it was up and running in 20 minutes.

    See this link for refurbished Proliant 1600 for $189.95

    http://www.pcandserverstore.com/serv...00-Dual/Detail

    >
    > Thanks!
    >


    --
    Steve Fabac
    S.M. Fabac & Associates
    816/765-1670
