SCO OSR5.06 box doesn't boot anymore - SCO


Thread: SCO OSR5.06 box doesn't boot anymore

  1. SCO OSR5.06 box doesn't boot anymore

    Hi,

    SCO Unix OSR5.06 + 5.06a supplement + SMP
    SuperMicro motherboard
    dual PIII Xeon 700 CPUs
    1 GB ECC RAM
    ICP vortex GDT7523RN SCSI controller
    external array with 5 + 1 HDDs in RAID 5
    HDDs on channel A of the SCSI controller
    SLR60 tape drive on channel B of the SCSI controller

    The server has been running since 1994 without any problem.
    This morning a kernel trap occurred. I tried to reboot the server;
    sometimes it worked, and sometimes the system froze and/or another
    kernel trap occurred.

    Sometimes a "bad address" message about swap was displayed during the
    boot process, or the system was unable to tell whether there was
    anything in /dev/swap.

    I finally managed to boot and made a backup of the / filesystem with the
    Lone-Tar utility; the /u filesystem was backed up on Saturday.

    So what could it be?
    Could it be a corrupted /dev/swap filesystem?

    What are the solutions? (Restore the / filesystem from tape with the
    Air-Bag utility?)

    Thanks.

    F. STOCK

  2. Re: SCO OSR5.06 box doesn't boot anymore (missed element)



    Hi,

    I forgot to mention that the system hangs at

    %cpu ... unit=2 ...
    %cpuid ... unit = 2 ...
    %ftp ... unit = 2 ...

    which is just before the point where, when the boot worked, the
    "type root password ..." prompt appeared, or where the memory dump
    image was written when a kernel trap occurred.

    Thanks.
    F. STOCK

  3. Re: SCO OSR5.06 box doesn't boot anymore (missed element)

    Please tell us what the kernel trap number is. If it's a trap E, then you have
    some sort of hardware problem, possibly bad memory.
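
    As a minimal sketch of how to read the trap number, assuming the trap
    type the kernel reports is the x86 exception vector: vector 0x0E is the
    architectural page fault. The table below is deliberately partial, and
    the helper name describe_trap is hypothetical, not part of any SCO tool.

```python
# Partial table of x86 exception vectors (architectural names).
X86_EXCEPTIONS = {
    0x00: "Divide Error",
    0x06: "Invalid Opcode",
    0x08: "Double Fault",
    0x0D: "General Protection Fault",
    0x0E: "Page Fault",
}


def describe_trap(trap: str) -> str:
    """Turn a hex trap string such as '0x0000000E' or 'E' into a readable name."""
    vector = int(trap, 16)  # base 16 accepts an optional '0x' prefix
    name = X86_EXCEPTIONS.get(vector, "unknown / not listed here")
    return f"trap 0x{vector:02X}: {name}"


print(describe_trap("0x0000000E"))
```

    A page fault raised in kernel mode at unpredictable points is consistent
    with a hardware cause, but the number alone does not prove it.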

    Bob



  4. Re: SCO OSR5.06 box doesn't boot anymore (missed element)




    Hi Bob,

    Yes, the kernel traps were always type 0x0000000E.

    Thanks.
    F. STOCK

  5. Re: SCO OSR5.06 box doesn't boot anymore


    Besides Bob's answer that it is probably memory because of the
    trap number: if that machine has been running since 1994, has it
    been cleaned out regularly? Dust on chips traps heat, and the
    resulting overheating can also lead to memory errors.

    Bill
    --
    Bill Vermillion - bv @ wjv . com

  6. Re: SCO OSR5.06 box doesn't boot anymore



    Hi Bill,

    Well, today I'm going to buy (if I can find them) 2 x 512 MB ECC SDRAM
    modules and try again.
    I'm restoring the OS with Air-Bag on another machine. For the moment
    the restore is going well (on the server it would start and then stop at
    a random point, with the system frozen), and I'll see.

    Thanks.

    F. STOCK

  7. Re: SCO OSR5.06 box doesn't boot anymore (missed element)

    Get a new computer.
    This is exactly what happens when a motherboard (and every other part that
    has components that age and go out of spec over time, specifically
    capacitors) gets old.
    New RAM, new hard drives, etc. won't fix it, but might seem to for long
    enough to waste everyone's time and cause the customer to suffer more
    crashes and increased chances of filesystem corruption and data loss.

    My suggestion:

    1) Do not waste time trying to fix this one. It will seem to work sometimes,
    just enough to make you think it's fixed, so you walk away, and the next day
    it does something unpredictable and then there is some file corruption, or
    worse, filesystem corruption, etc.

    2) Whatever else you do, _immediately_ find the most recent tape that you
    know is good and was made before the first hint of problems, even if that's
    a few days or even weeks old, move its write-protect tab into place, and
    keep it to the side from now on for reference. Two or more such tapes is
    even better. I predict gradually worsening data and filesystem corruption
    from this point on until you get a new box, and by then you may no longer
    have any tape good enough to do a trustworthy restore from unless you take
    one out of circulation right now while you still have any.

    Remember to disable hyperthreading for 5.0.6 and get rs5.0.6a installed as
    soon as possible, since you will probably not be able to find a PIII
    motherboard that isn't so old that it will be just as flaky as the one
    you're replacing, even if it was never used! (Not counting some
    mobile/embedded/low-power/small form factor boards, which wouldn't make
    great servers.)

    --
    Brian K. White brian@aljex.com http://www.myspace.com/KEYofR
    +++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
    filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!


    ----- Original Message -----
    From: "Frédéric STOCK"
    Newsgroups: comp.unix.sco.misc
    To:
    Sent: Monday, April 23, 2007 12:31 PM
    Subject: Re: SCO OSR5.06 box doesn't boot anymore (missed element)

    > Hi Bob,
    >
    > yes the kernel trap where always type 0x0000000E
    >
    > Thank's.
    > F. STOCK
    >



  8. Re: SCO OSR5.06 box doesn't boot anymore


    Don't forget to obtain a copy of memtest86+ from www.memtest.org
    Let it run for an hour or more to see what turns up *before* you swap
    out the memory.

    Follow Bill's advice and blow out the dust with a can of compressed
    air. Pay special attention to the case fans, CPU fan & heatsink fins,
    and the power supply. Also examine carefully the little capacitors
    surrounding the CPU sockets (they look like tiny cans) to make sure
    that none are burst or leaking.

    If you decide you need replacement memory (and not a new motherboard),
    get a quality brand such as Kingston so that you don't have to consider
    the possibility that you've just purchased more defective memory.

    Bob


  9. Re: SCO OSR5.06 box doesn't boot anymore

    Just a comment on Kingston memory. This dates back a few years.
    They have a memory catalog that lists more machines than you can
    imagine.

    I needed a memory upgrade for a Cisco 25xx router, and called them.

    Tech support was like few others I've encountered, such as you'd find
    at Specialix or in the early days of Digi.

    The tech told me that I also needed a ROM upgrade for the Cisco, and
    gave me the exact part number I needed to ensure the new memory
    would work.

    Did I mention here last week a place we provide mail for
    through a HW/SW support house, which handles all the preliminary calls,
    so I guess I'd be second-level support? :-)

    I was at the HW/SW house working on getting a computer I support
    remotely back in service after a failure.

    The client had been on a shared T1 service with 20 allocated
    phone numbers, leaving 4 64K channels for data. That meant that if
    all 20 phones were in use and all people were trying to get
    to the 'net, the 20 users were sharing 256K of bandwidth.
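
    The arithmetic above can be sketched as follows, assuming a standard T1
    with 24 channels of 64 kbit/s each. The constant and function names are
    illustrative only.

```python
T1_CHANNELS = 24   # a standard T1 carries 24 channels
CHANNEL_KBPS = 64  # each channel is 64 kbit/s


def data_bandwidth_kbps(voice_channels: int) -> int:
    """Bandwidth left for data once voice_channels are reserved for phones."""
    data_channels = T1_CHANNELS - voice_channels
    return data_channels * CHANNEL_KBPS


total_kbps = data_bandwidth_kbps(20)  # 4 channels remain for data
per_user_kbps = total_kbps / 20       # if all 20 users are online at once

print(total_kbps, per_user_kbps)
```

    With 20 channels reserved for phones, the 4 remaining channels give
    256 kbit/s in total, or under 13 kbit/s per user when all 20 are active.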

    So we convinced them to go DSL.

    BellSouth came in to install the DSL. And the tech got totally
    lost, got the vox lines disconnected at one point, and finally
    admitted he couldn't get the DSL installed in their system
    which had a firewall - even when given the instructions over the
    phone.

    So the HW/SW house owner drove to the site and installed the DSL
    that BellSouth [abbreviated BS :-)] couldn't get running.

    Tech support seems to reach lower levels each day - if that is
    possible. My impression is that there are far too many
    'certified techs' who studied to pass the tests, but never
    understood the underlying technology that the questions were based
    upon.

    Yuk.

    Bill

    --
    Bill Vermillion - bv @ wjv . com
