My unix server unexpectedly shut down - Unix

  1. My unix server unexpectedly shut down

    Just this morning, I was in the middle of working on the database on
    our Unix server and then, blip, it was gone.
    After a little bit of panicking I managed to get the KVM working on the
    server itself and found that it had simply restarted itself. It halted
    on boot-up because of some error message about not being able to find
    a hard disk.

    So my question is, since I'm somewhat of a Unix noob and the usual IT
    guy is away:
    Does it keep any kind of logs of this kind of thing?

    I'd love to know
    a) Why it shut down in the first place
    b) What the error message was about when it was booting.

    It's SCO 3.2 if it would be at all platform specific.

    Cheers

  2. Re: My unix server unexpectedly shut down

    If this happens after a long uptime, it is likely a hardware defect,
    most often a bad disk.
    Your boot message also points to a bad disk.
    A "kernel panic" happens when there is a severe I/O problem from
    which the kernel cannot recover.
    Readable messages are then written to the console and to the disk,
    and a binary kernel memory dump is written to the disk for later
    analysis. Finally, the kernel reboots.
    But of course any disk write can fail if the disk is damaged.
    If you can bring your disk back to life (e.g. by a power cycle)
    so that you can log in, you can run the command

    more /var/adm/syslog

    But please be aware that disk faults will become more and more
    frequent, until finally the disk is permanently dead.


    --
    echo imhcea\.lophc.tcs.hmo |
    sed 's2\(....\)\(.\{5\}\)2\2\122;s1\(.\)\(.\)1\2\11g;1 s;\.;::;2'

  3. Re: My unix server unexpectedly shut down

    The version of Unix is very important, especially when it comes to
    hardware problems as the appropriate remedies vary wildly amongst Unix
    versions.

    It has 'lost' the hard disk, which could be due to

    That is probably SCO Unix 3.2.4.2 - although it could be an even older
    version like 3.2.4.0 or a variation like ODT.

    I haven't dealt with that *very* old version of SCO for 12 years, and
    would have to dig up some support docs for it.

    Try reposting the problem in the comp.unix.sco.misc newsgroup, which
    specializes in SCO Unix.

    --
    ----------------------------------------------------
    Pat Welch, UBB Computer Services, a WCS Affiliate
    SCO Authorized Partner
    Microlite BackupEdge Certified Reseller
    Unix/Linux/Windows/Hardware Sales/Support
    (209) 745-1401 Cell: (209) 251-9120
    E-mail: patubb@inreach.com
    ----------------------------------------------------

  4. Re: My unix server unexpectedly shut down

    whoops:
    It has 'lost' the hard disk, which could be due to a hard disk crash, a
    failed SCSI/IDE controller, or (if RAID) a lost logical disk setup.

    I hope you have a good recent backup.


    --
    ----------------------------------------------------
    Pat Welch, UBB Computer Services, a WCS Affiliate
    SCO Authorized Partner
    Microlite BackupEdge Certified Reseller
    Unix/Linux/Windows/Hardware Sales/Support
    (209) 745-1401 Cell: (209) 251-9120
    E-mail: patubb@inreach.com
    ----------------------------------------------------

  5. Re: My unix server unexpectedly shut down

    Cheers for the replies.

    The OS is SCO 3.2.0 if I remember right.

    The server has successfully booted, and it appears that no data has
    been lost.
    We have 4 disks in a RAID setup (I'm not sure which setup), so 1 disk
    could have failed and it can keep going.

    I looked at the file /var/adm/syslog
    but unfortunately it is 2.5GB.
    How would I go about reading through a file this size?
    If I run it through 'more' - well, I wouldn't like to guess how many
    pages it is, but it will take me a long, long time to page through to
    the end.
    Unless there are some options in more that I am missing?

    I'm going to repost this in comp.unix.sco.misc and see what they say.
    But any more ideas are welcomed here.

    Are there any HDD diagnostic tools in *nix to check the status of a
    disk, see what disks are installed and see how RAID is configured?
    I'm sure there are, but I wouldn't know where to look.

    I'm just getting paranoid now that it might fail completely.
    We do make daily backups, but it's a server we really cannot afford to
    lose.

  6. Re: My unix server unexpectedly shut down

    uname -X (capital X) will tell you the exact version and other info.

    HOW big is syslog???

    That might have caused the problem if it's really over 2 GB - that
    version of SCO has a 2 GB limit on files, and if syslog is over that,
    then you might see PANICs and sudden shutdowns.

    3rd-party drivers (your RAID driver perhaps) trying to write info to
    syslog would be at special risk of causing PANICs if they can't write to
    syslog.
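    A rough way to keep an eye on this, as a sketch only: the helper
    name check_log_size is made up for illustration, the limit is
    assumed to be 2^31 - 1 bytes, and the demo uses a scratch file
    rather than the real log. It assumes a POSIX-style sh with mktemp,
    which may not match what that old SCO release ships.

```shell
# Hypothetical helper: warn when a file is within 10% of a size limit.
# The default limit of 2147483647 bytes (2^31 - 1) is an assumption.
check_log_size() {
    file=$1
    limit=${2:-2147483647}
    size=$(wc -c < "$file" | tr -d ' ')
    # Subtract a tenth instead of multiplying, to avoid overflow in small shells.
    threshold=$((limit - limit / 10))
    if [ "$size" -ge "$threshold" ]; then
        echo "WARNING: $file is $size bytes, near the limit of $limit"
    else
        echo "OK: $file is $size bytes"
    fi
}

# Demo on a scratch file, not on /var/adm/syslog.
scratch=$(mktemp)
printf 'hello\n' > "$scratch"
check_log_size "$scratch"      # 6 bytes, far below the default limit
check_log_size "$scratch" 6    # pretend the limit is 6 bytes to show the warning
```

    On the real machine you would point it at /var/adm/syslog and
    /var/adm/messages. Note that wc -c reads the whole file, so it is
    slow on a multi-gigabyte log.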

    Copy (NOT move) the syslog file:

    cp /var/adm/syslog /var/adm/syslog.old

    Then erase the file in place (zero it out) by simply entering the '>' symbol and
    the name:

    > /var/adm/syslog


    Do the same to the /var/adm/messages file if it's also at or close to
    the 2 GB limit.
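    The copy-then-zero steps above could be wrapped up roughly like
    this. It is only a sketch: archive_and_truncate is a made-up name,
    and the demo runs on a scratch directory (created with mktemp,
    assumed to be available) instead of the real /var/adm files.

```shell
# Hypothetical helper: keep a copy of a log, then empty the original in place.
# Emptying with ':>' keeps the same file, so a daemon that already has the
# log open can keep writing to it.
archive_and_truncate() {
    log=$1
    cp "$log" "$log.old" || return 1   # stop if the copy failed (e.g. disk full)
    : > "$log"                         # zero the file without removing it
}

# Demo on a scratch file rather than the real /var/adm/syslog.
demo_dir=$(mktemp -d)
printf 'line one\nline two\n' > "$demo_dir/syslog"
archive_and_truncate "$demo_dir/syslog"
wc -c < "$demo_dir/syslog"       # size of the emptied log
wc -l < "$demo_dir/syslog.old"   # lines preserved in the copy
```

    The point of checking cp before truncating is that you don't want
    to zero the only copy if the filesystem had no room for the second
    one.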

    You can examine the old syslog file with various tools; tail and split
    come to mind. See man tail or man split for the options.

    cd /var/adm
    tail syslog.old

    - displays the last few lines of the file (might be 10 lines - I
    forget the default on that very old version of SCO)

    tail -100 syslog.old

    - would display the last 100 lines. You can redirect that into another
    file so you can vi the info if you'd like:

    tail -100 syslog.old > syslog.small

    Note there's an old bug in tail - you won't get more than around 200
    lines from the end of the file no matter what the line argument is (I
    don't remember the exact limit).
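    Instead of paging, you can also filter for the interesting lines.
    A small sketch: the log lines below are invented purely for
    illustration (they are not real SCO output), and the grep options
    shown are the modern POSIX ones, which that old release may not
    all support.

```shell
# Made-up sample log, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Jan 10 09:00:01 server cron: job started
Jan 10 09:14:22 server kernel: disk 0: read error
Jan 10 09:14:23 server kernel: PANIC: cannot read root filesystem
Jan 10 09:20:05 server init: system boot
EOF

# Last 3 lines only ('tail -n 3' is the newer spelling of 'tail -3').
tail -n 3 "$sample"

# Only lines mentioning a panic or an error, prefixed with their line numbers.
grep -n -i -E 'panic|error' "$sample"
```

    Run against syslog.old, the line numbers from grep -n tell you
    roughly where in the file to look.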

    split allows you to chunk the file into smaller files if
    you need to go back further than ~200 lines from the end.

    split -l 100000 /var/adm/syslog.old syslog.old.

    - would split the file into pieces of 100,000 lines each, auto-naming
    them with the given prefix, like

    syslog.old.aa syslog.old.ab syslog.old.ac ..... syslog.old.zz
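    One thing to watch: when split is given no name argument, it calls
    the pieces xaa, xab and so on. To get names like the ones listed
    above, pass the prefix as the last argument. A small sketch on a
    250-line stand-in file in a scratch directory (mktemp assumed
    available); very old split versions may want -100 rather than
    -l 100.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# Build a 250-line stand-in for the real log.
i=1
while [ "$i" -le 250 ]; do
    echo "entry $i"
    i=$((i + 1))
done > syslog.old

# 100 lines per piece; the final argument is the output name prefix.
split -l 100 syslog.old syslog.old.

ls syslog.old.*          # syslog.old.aa, syslog.old.ab, syslog.old.ac
wc -l < syslog.old.ac    # the last piece holds the remaining 50 lines
```

    The pieces sort in order, so the most recent entries are in the
    last one.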

    Time to upgrade the SCO to SCO OpenServer 5.0.7 at least, if not
    OpenServer 6.0, or migrate to Linux, before that machine dies completely.

    That version of SCO MAY not support any chip newer than Pentium 3, so if
    you want to keep the same OS (and you have the original install
    diskettes or tape AND the licenses (COLAs)) you have to get a Pentium 3
    based server. Supermicro still had some last time I checked a year or so
    ago.

    If you need help with migration or other issues, contact me privately.

    --
    ----------------------------------------------------
    Pat Welch, UBB Computer Services, a WCS Affiliate
    SCO Authorized Partner
    Microlite BackupEdge Certified Reseller
    Unix/Linux/Windows/Hardware Sales/Support
    (209) 745-1401 Cell: (209) 251-9120
    E-mail: patubb@inreach.com
    ----------------------------------------------------
