Openserver 6.0 wc -l /usr/adm/syslog reboots system - SCO

  1. Openserver 6.0 wc -l /usr/adm/syslog reboots system

    Just returned from a new client with problems.
    I was called when the Backup Edge Verify pass
    was causing the machine to reboot.

    When I arrived, I checked the system and found it
    running OS 6.0 with MP1 and some package add
    patch.

    While surveying the system I used less /usr/adm/syslog
    to view the system log file. Pressing shift G to
    go to the bottom of the file took a long time.
    So long that I gave up and pressed del to interrupt.

    When I executed l -l /usr/adm/syslog it showed it
    at 108+ megabytes. I tried wc -l /usr/adm/syslog and
    within .5 to 1 second the screen went blank and the
    system was rebooting.

    I brought it up in single user mode and ran fsck -ofull
    several times with no unusual problems reported.

    In single user mode the wc -l /usr/adm/syslog would
    still cause the system to reboot.

    Funny thing: With the system back up in single user mode,
    running cat /usr/adm/syslog > /dev/null worked without
    problems.

    And, cat /usr/adm/syslog | wc -l reported 1.5+ million lines.
    But wc -l /usr/adm/syslog (or /usr/adm/messages) will
    trigger the reboot.

    No panic messages, the monitor just goes black and
    then the boot up screen is displayed.

    Any suggestions on what to check first? I plan to install
    a new IDE hard drive tomorrow and use recovery media
    on the Backup Edge overnight backup to restore the system
    to the new drive.

    TIA

  2. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system

    The total size of the file might be more relevant than the number of
    entries. 6.0 has special large file utilities for handling files >
    1Gb which it sounds like this might be.

    Prepend /u95/bin to your PATH. I would then just blow away syslog:
    /u95/bin/cp /dev/null /usr/adm/syslog
    Restart syslogd and then read it to see what all the errors are.

    --Ray Robert




  3. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system

    ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
    | The total size of the file might be more relevant than the number of
    | entries. 6.0 has special large file utilities for handling files >
    | 1Gb which it sounds like this might be.
    |
    | Prepend /u95/bin to your PATH. I would then just blow away syslog:
    | /u95/bin/cp /dev/null /usr/adm/syslog
    | Restart syslogd and then read it to see what all the errors are.

    Oh? I thought that the binaries for "large" files had to do with files
    exceeding 2GB, not 1GB.

    --
    JP
    ==> http://www.frappr.com/cusm <==

  4. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system

    Jean-Pierre Radley wrote:
    > ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
    > | The total size of the file might be more relevant than the number of
    > | entries. 6.0 has special large file utilities for handling files >
    > | 1Gb which it sounds like this might be.
    > |
    > | Prepend /u95/bin to your PATH. I would then just blow away syslog:
    > | /u95/bin/cp /dev/null /usr/adm/syslog
    > | Restart syslogd and then read it to see what all the errors are.
    >
    > Oh? I thought that the binaries for "large" files had to do with files
    > exceeding 2GB, not 1GB.
    >


    Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.

    I copied syslog to syslog.old and zeroed out syslog in maintenance mode.

    wc -l syslog.old caused immediate reboot.

    cat syslog.old | wc -l returned 1.5M+ lines.

    examining syslog.old shows no recorded messages about any problems on the
    hard disk.

    And attempting badtrk returns a message that the command is removed
    from OS6 and the kernel is in charge of automatically managing bad
    tracks.

    If this were a SCSI system, I would use the SCSI controller to perform
    a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
    tool to check the disk.

    Replacing the disk is a shotgun approach to diagnosing the problem.

    Anyone have any suggestions as to why a system command would
    cause a reboot?

  5. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system

    Steve M. Fabac, Jr. wrote (on Thu, Mar 08, 2007 at 03:27:31AM +0000):
    > Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
    >
    > I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
    >
    > wc -l syslog.old caused immediate reboot.
    >
    > cat syslog.old | wc -l returned 1.5M+ lines.
    >
    > examining syslog.old shows no recorded messages about any problems on the
    > hard disk.
    >
    > And attempting badtrk returns a message that the command is removed
    > from OS6 and the kernel is in charge of automatically managing bad
    > tracks.
    >
    > If this were a SCSI system, I would use the SCSI controller to perform
    > a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
    > tool to check the disk.
    >
    > Replacing the disk is a shotgun approach to diagnosing the problem.
    >
    > Anyone have any suggestions as to why a system command would
    > cause a reboot?


    Perhaps you could a) copy it (both cp, say, and dd and/or tar) to a
    new file to see if the new file had the identical behavior; or
    b) split it into pieces to narrow it down to one piece; or c)
    create a brand new identical sized file to see if the file size
    has anything to do with it.
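
    A minimal sketch of option (c), assuming a POSIX-style shell and the
    standard dd and wc utilities. The function name make_same_size and the
    file names in the usage note are made up for illustration. The size is
    taken through a pipe because `cat file | wc` is the form reported to
    work in this thread.

```shell
#!/bin/sh
# make_same_size SRC DST (hypothetical helper)
# Create DST with the same byte count as SRC, filled with zero bytes,
# so that the two files share a size but no content.
make_same_size() {
    src=$1
    dst=$2
    # Count bytes through a pipe rather than letting wc open the file.
    size=$(cat "$src" | wc -c) || return 1
    blocks=$((size / 1024))
    rest=$((size % 1024))
    # Whole 1 KB blocks first (this also creates or truncates DST).
    dd if=/dev/zero of="$dst" bs=1024 count="$blocks" 2>/dev/null || return 1
    # Then append the leftover bytes, if any.
    if [ "$rest" -gt 0 ]; then
        dd if=/dev/zero bs=1 count="$rest" 2>/dev/null >> "$dst" || return 1
    fi
}

# Hypothetical usage:
#   make_same_size /usr/adm/syslog.old /usr/adm/samesize.test
#   wc -l /usr/adm/samesize.test
```

    If the zero-filled file also triggers the reboot, the content of the
    log is probably not the cause.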

    --
    _________________________________________
    Nachman Yaakov Ziskind, FSPA, LLM awacs@ziskind.us
    Attorney and Counselor-at-Law http://ziskind.us
    Economic Group Pension Services http://egps.com
    Actuaries and Employee Benefit Consultants

  6. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system


    ----- Original Message -----
    From: "Steve M. Fabac, Jr."
    Newsgroups: comp.unix.sco.misc
    To:
    Sent: Wednesday, March 07, 2007 10:27 PM
    Subject: Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system


    > Jean-Pierre Radley wrote:
    >> ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
    >> | The total size of the file might be more relevant than the number of
    >> | entries. 6.0 has special large file utilities for handling files >
    >> | 1Gb which it sounds like this might be.
    >> |
    >> | Prepend /u95/bin to your PATH. I would then just blow away syslog:
    >> | /u95/bin/cp /dev/null /usr/adm/syslog
    >> | Restart syslogd and then read it to see what all the errors are.
    >>
    >> Oh? I thought that the binaries for "large" files had to do with files
    >> exceeding 2GB, not 1GB.
    >>

    >
    > Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
    >
    > I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
    >
    > wc -l syslog.old caused immediate reboot.
    >
    > cat syslog.old | wc -l returned 1.5M+ lines.
    >
    > examining syslog.old shows no recorded messages about any problems on the
    > hard disk.
    >
    > And attempting badtrk returns a message that the command is removed
    > from OS6 and the kernel is in charge of automatically managing bad
    > tracks.
    >
    > If this were a SCSI system, I would use the SCSI controller to perform
    > a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
    > tool to check the disk.
    >
    > Replacing the disk is a shotgun approach to diagnosing the problem.
    >
    > Anyone have any suggestions as to why a system command would
    > cause a reboot?


    First: I would NOT reduce the size of or delete the original bad file,
    currently syslog.old!
    If there is some magic bad spot in the fs or on the disk, then by renaming
    the original syslog to syslog.old you have luckily still not freed that bad
    spot for some other file to chance upon with potentially much uglier
    results. In any further testing I'd read or rename that file at will, but
    not zero it out or reduce its size, or delete it, unless and until you are
    sure you know what caused the crash and are sure it won't happen again.

    On to the good stuff

    I'm curious if wc -l on a copy (not just renamed) of the bad file also
    reboots.

    ie: if it's wonky disk or filesystem, probably the copy won't cause a crash.
    also in that case before even getting to wc, the copy might not match the
    original for that matter.
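
    A sketch of that copy-and-compare check, assuming cp and cmp behave as
    on any POSIX system. The function name copy_and_compare and the name of
    the copy are illustrative only. Note that cmp reads both files itself,
    so on the affected machine it could hit the same problem as wc.

```shell
#!/bin/sh
# copy_and_compare SRC DST (hypothetical helper)
# Copy SRC to DST, then compare the two byte for byte.
copy_and_compare() {
    cp "$1" "$2" || return 1
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        # A mismatch right after a copy points at disk, filesystem or memory.
        echo "DIFFERENT"
        return 1
    fi
}

# Hypothetical usage:
#   copy_and_compare /usr/adm/syslog.old /usr/adm/syslog.copy
#   wc -l /usr/adm/syslog.copy
```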

    Then I'd try feeding wc equivalent but different data to see if it's simply
    wc can't handle more than x lines or x bytes etc...
    maybe uuencode the bad file (to a new file) and try wc on that, just because
    it's an easy way to make an even bigger file, that has all different data
    than the original, but that is all text and all short lines.

    I'd also see if there is a gnu version of wc in /usr/gnu/bin (does unixware
    have an equivalent of the gnutools & gwxlibs packages for osr5?)
    it might also be in a skunkware package sh-utils

    Any other (besides gnu) versions of wc on there? like u95, ibin, obin, sbin,
    etc?

    Does the stock wc reading the known bad file still crash the system if you
    are not root when you try it?

    I'd probably also try chopping up the bad file into pieces to see if it's a
    particular size (in lines or in bytes) that wc can't handle, or if there is
    some spot somewhere in the file that causes the crash by touching that spot.
    Maybe a loop that uses dd to break it up into 100k chunks, then wc -l each
    chunk to see if one crashes, That points to some funky data that wc chokes
    on, which would be pretty strange since your cat test shows it didn't choke
    on the same data via stdin, but, *shrug* it's gotta be something.
    maybe another loop that reassembles the chunks building a new file one at a
    time and wc-l the new growing file each time to see if it crashes at some
    point. that would suggest a magic size barrier.
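
    The first loop described above could look roughly like this. It is a
    sketch under assumptions: a POSIX-style shell, dd reading in 1 KB
    blocks (the read size that appeared safe with cat), and made-up names
    (split_and_count, chunk.N, progress.log). It writes the name of each
    piece to a log and calls sync before running wc, so the last piece
    attempted can be found after a reboot.

```shell
#!/bin/sh
# split_and_count FILE OUTDIR [CHUNK_KB] (hypothetical helper)
# Cut FILE into CHUNK_KB-kilobyte pieces (default 100) under OUTDIR and
# run "wc -l" on each piece by name, the form that triggered the reboot.
split_and_count() {
    file=$1
    outdir=$2
    kb=${3:-100}
    mkdir -p "$outdir" || return 1
    # Size via a pipe, the read path that was reported to work.
    total=$(cat "$file" | wc -c) || return 1
    chunk_bytes=$((kb * 1024))
    nchunks=$(( (total + chunk_bytes - 1) / chunk_bytes ))
    i=0
    while [ "$i" -lt "$nchunks" ]; do
        piece="$outdir/chunk.$i"
        dd if="$file" of="$piece" bs=1024 skip=$((i * kb)) count="$kb" \
            2>/dev/null || return 1
        # Record which piece is next and flush it to disk, so the log
        # survives if the machine resets during the wc below.
        echo "about to test $piece" >> "$outdir/progress.log"
        sync
        lines=$(wc -l "$piece" | awk '{print $1}')
        echo "$piece $lines"
        i=$((i + 1))
    done
}

# Hypothetical usage:
#   split_and_count /usr/adm/syslog.old /usr/adm/chunks 100
```

    If every piece passes, the second loop (rebuilding a growing file from
    the pieces and testing it each time) is the next thing to try.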

    If I were really curious I'd maybe try to set up a sort of "high speed
    camera" that collects info using sar or vmstat repeatedly in a tight loop in
    the background just before trying the wc command, and then after the reboot
    see if the memory usage or process stack or some other resource went nuts
    just before the end.
    Of course the tight loop itself will hit the system pretty hard, so maybe a
    control run first that loops x thousand times but doesn't run wc.
    It would generate a lot of data very fast so I'd make it into one big
    command or script that starts the loop in the background and then runs the
    wc-of-death immediately after.
    Perhaps a sleep .5 in between.
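
    A rough sketch of such a sampler, assuming a POSIX-style shell. The
    function name sample_loop and the log path are invented, and the
    sampling command is passed in by the caller because the exact sar or
    vmstat arguments on this system are not given here. Each sample is
    followed by sync so that the log has a chance of surviving the reset.

```shell
#!/bin/sh
# sample_loop LOGFILE COUNT CMD [ARGS...] (hypothetical helper)
# Run CMD COUNT times, appending its output to LOGFILE and flushing
# to disk after every sample.
sample_loop() {
    log=$1
    count=$2
    shift 2
    n=0
    while [ "$n" -lt "$count" ]; do
        "$@" >> "$log" 2>&1
        sync
        n=$((n + 1))
    done
}

# Hypothetical usage: start the sampler in the background, then run the
# command under suspicion. "vmstat" stands in for whichever sar or
# vmstat invocation the system documents.
#   sample_loop /tmp/samples.log 1000 vmstat &
#   sleep 1
#   wc -l /usr/adm/syslog.old
```

    Running the same loop once without the wc gives the control run
    mentioned above.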

    And of course, this should probably have been first,
    truss/trace/whatever-unixware-has.

    Brian K. White -- brian@aljex.com -- http://www.aljex.com/bkw/
    +++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
    filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!


  7. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system

    Steve M. Fabac, Jr. wrote:

    > Just returned from a new client with problems.
    > I was called when the Backup Edge Verify pass
    > was causing the machine to reboot.
    >
    > When I arrived, I checked the system and found it
    > running OS 6.0 with MP1 and some package add
    > patch.
    >
    > While surveying the system I used less /usr/adm/syslog
    > to view the system log file. Pressing shift G to
    > go to the bottom of the file took a long time.
    > So long that I gave up and pressed del to interrupt.
    >
    > When I executed l -l /usr/adm/syslog it showed it
    > at 108+ megabytes. I tried wc -l /usr/adm/syslog and
    > within .5 to 1 second the screen went blank and the
    > system was rebooting.
    >
    > I brought it up in single user mode and ran fsck -ofull
    > several times with no unusual problems reported.
    >
    > In single user mode the wc -l /usr/adm/syslog would
    > still cause the system to reboot.
    >
    > Funny thing: With the system back up in single user mode,
    > running cat /usr/adm/syslog > /dev/null worked without
    > problems.
    >
    > And, cat /usr/adm/syslog | wc -l reported 1.5+ million lines.
    > But wc -l /usr/adm/syslog (or /usr/adm/messages) will
    > trigger the reboot.
    >
    > No panic messages, the monitor just goes black and
    > then the boot up screen is displayed.
    >
    > Any suggestions on what to check first? I plan to install
    > a new IDE hard drive tomorrow and use recovery media
    > on the Backup Edge overnight backup to restore the system
    > to the new drive.


    You say that both /usr/adm/syslog and /usr/adm/messages trigger the
    problem. And `wc -l /usr/adm/syslog` crashes while `cat /usr/adm/syslog
    | wc -l` doesn't. Peculiar.

    I don't have an OSR6 system here, but a quick `truss` test on OSR506
    shows that `cat` does 1KB reads while `wc` does 16KB. OSR6 userland is
    mostly OSR5, so this is probably the same.

    See if:

    dd if=/usr/adm/syslog of=/dev/null bs=16k

    triggers a crash. If so, it has nothing to do with `wc`, only with a
    certain manner of reading that file.

    Then see if:

    dd if=/unix of=/dev/null bs=16k

    triggers the same crash. (/unix may not be big enough. Poke around the
    system for other 100MB-ish files to try.)
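
    The two dd tests can be generalized into a small loop over block
    sizes. This is only a sketch, assuming a POSIX-style shell; the
    function name read_with_bs is made up, and the sizes in the usage note
    are the 1 KB and 16 KB read sizes mentioned above, given in bytes.

```shell
#!/bin/sh
# read_with_bs FILE BS [BS...] (hypothetical helper)
# Read FILE to /dev/null once per block size and report each result.
read_with_bs() {
    file=$1
    shift
    for bs in "$@"; do
        # Announce the attempt and flush, in case the read resets the box.
        echo "reading $file with bs=$bs"
        sync
        if dd if="$file" of=/dev/null bs="$bs" 2>/dev/null; then
            echo "bs=$bs ok"
        else
            echo "bs=$bs failed"
        fi
    done
}

# Hypothetical usage, smallest block size first:
#   read_with_bs /usr/adm/syslog.old 1024 16384
```

    Redirecting the output to a file on another filesystem, or watching it
    on a serial console, would show which block size was in progress when
    the machine went down.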

    I'm thinking it's either a filesystem bug or a hardware problem with the
    machine's memory. Seems unlikely to be a problem with the specific
    files, when two different files cause the same thing.

    Run memtest86 (almost all Linux CDs have it as a boot option).

    >Bela<


  8. Openserver 6.0 wc -l /usr/adm/syslog reboots system

    On Mar 8, 3:38 am, Bela Lubkin wrote:
    > Steve M. Fabac, Jr. wrote:
    >
    >
    >
    > > Just returned from a new client with problems.

    Hi Steve,

    Don't you love it? These situations always seem to crop up
    with the NEW clients.

    > > I was called when the Backup Edge Verify pass
    > > was causing the machine to reboot.


    Maybe check the verify_master.log to see what was being
    verified at the time of the crash. That might give you a similarly
    sized file to manipulate in your testing, as others have suggested.
    (Or maybe it barfed on ./usr/adm/syslog)

    A note of thanks to Bela, for dropping in and continuing to help
    us all out! Hope all is going well at VMware.

    Regards,
    Dan Martin







  9. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system

    Nachman Yaakov Ziskind wrote:
    > Steve M. Fabac, Jr. wrote (on Thu, Mar 08, 2007 at 03:27:31AM +0000):
    >> Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
    >>
    >> I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
    >>
    >> wc -l syslog.old caused immediate reboot.
    >>
    >> cat syslog.old | wc -l returned 1.5M+ lines.
    >>
    >> examining syslog.old shows no recorded messages about any problems on the
    >> hard disk.
    >>
    >> And attempting badtrk returns a message that the command is removed
    >> from OS6 and the kernel is in charge of automatically managing bad
    >> tracks.
    >>
    >> If this were a SCSI system, I would use the SCSI controller to perform
    >> a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
    >> tool to check the disk.
    >>
    >> Replacing the disk is a shotgun approach to diagnosing the problem.
    >>
    >> Anyone have any suggestions as to why a system command would
    >> cause a reboot?

    >
    > Perhaps you could a) copy it (both cp, say, and dd and/or tar) to a
    > new file to see if the new file had the identical behavior; or
    > b) split it into pieces to narrow it down to one piece; or c)
    > create a brand new identical sized file to see if the file size
    > has anything to do with it.
    >


    I did: cd /usr/adm ; cp syslog syslog.old; > syslog

    wc -l syslog.old reboots the system.

    wc -l messages reboots the system.

    cat syslog.old | wc -l returns 1.5M+ lines

    cat messages | wc -l works (don't remember number of lines)

  10. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system

    Brian K. White wrote:
    > ----- Original Message -----
    > From: "Steve M. Fabac, Jr."
    > Newsgroups: comp.unix.sco.misc
    > To:
    > Sent: Wednesday, March 07, 2007 10:27 PM
    > Subject: Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
    >
    >
    >> Jean-Pierre Radley wrote:
    >>> ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
    >>> | The total size of the file might be more relevant than the number of
    >>> | entries. 6.0 has special large file utilities for handling files >
    >>> | 1Gb which it sounds like this might be.
    >>> |
    >>> | Prepend /u95/bin to your PATH. I would then just blow away syslog:
    >>> | /u95/bin/cp /dev/null /usr/adm/syslog
    >>> | Restart syslogd and then read it to see what all the errors are.
    >>>
    >>> Oh? I thought that the binaries for "large" files had to do with files
    >>> exceeding 2GB, not 1GB.
    >>>

    >> Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
    >>
    >> I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
    >>
    >> wc -l syslog.old caused immediate reboot.
    >>
    >> cat syslog.old | wc -l returned 1.5M+ lines.
    >>
    >> examining syslog.old shows no recorded messages about any problems on the
    >> hard disk.
    >>
    >> And attempting badtrk returns a message that the command is removed
    >> from OS6 and the kernel is in charge of automatically managing bad
    >> tracks.
    >>
    >> If this were a SCSI system, I would use the SCSI controller to perform
    >> a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
    >> tool to check the disk.
    >>
    >> Replacing the disk is a shotgun approach to diagnosing the problem.
    >>
    >> Anyone have any suggestions as to why a system command would
    >> cause a reboot?

    >
    > First: I would NOT reduce the size of or delete the original bad file,
    > currently syslog.old!


    Too late, already reset to zero with > syslog

    > If there is some magic bad spot in the fs or on the disk, then by renaming
    > the original syslog to syslog.old you have luckily still not freed that bad
    > spot for some other file to chance upon with potentially much uglier
    > results. In any further testing I'd read or rename that file at will, but
    > not zero it out or reduce its size, or delete it, unless and until you are
    > sure you know what caused the crash and are sure it won't happen again.


    Good suggestions.

    >
    > On to the good stuff
    >
    > I'm curious if wc -l on a copy (not just renamed) of the bad file also
    > reboots.


    Yes, wc -l syslog.old reboots, cat syslog.old | wc -l returns 1.5M+ lines.

    >
    > ie: if it's wonky disk or filesystem, probably the copy won't cause a crash.
    > also in that case before even getting to wc, the copy might not match the
    > original for that matter.


    Too bad I failed to run sum -r on syslog and syslog.old before zeroing out syslog.

    >
    > Then I'd try feeding wc equivalent but different data to see if it's simply
    > wc can't handle more than x lines or x bytes etc...
    > maybe uuencode the bad file (to a new file) and try wc on that, just because
    > it's an easy way to make an even bigger file, that has all different data
    > than the original, but that is all text and all short lines.


    I'm puzzled why a system command will cause a reboot. The most it should do is
    core dump. This has got to be a hardware or memory issue.

    >
    > I'd also see if there is a gnu version of wc in /usr/gnu/bin (does unixware
    > have an equivalent of the gnutools & gwxlibs packages for osr5?)
    > it might also be in a skunkware package sh-utils


    Looking for other software to replace wc is not the issue. I just happened
    upon wc -l failing when I was trying to determine why less syslog took
    so long to seek to the end of the file.

    The reason I was called to the client was because the nightly Backup Edge
    backup was triggering a reboot during the verify pass. So whatever is
    affecting Backup Edge is probably in play with wc.

    The system has not rebooted during the day with normal business operations.
    (At least, I have not been told that it has.)

    >
    > Any other (besides gnu) versions of wc on there? like u95, ibin, obin, sbin,
    > etc?
    >
    > Does the stock wc reading the known bad file still crash the system if you
    > are not root when you try it?
    >
    > I'd probably also try chopping up the bad file into pieces to see if it's a
    > particular size (in lines or in bytes) that wc can't handle, or if there is
    > some spot somewhere in the file that causes the crash by touching that spot.
    > Maybe a loop that uses dd to break it up into 100k chunks, then wc -l each
    > chunk to see if one crashes, That points to some funky data that wc chokes
    > on, which would be pretty strange since your cat test shows it didn't choke
    > on the same data via stdin, but, *shrug* it's gotta be something.
    > maybe another loop that reassembles the chunks building a new file one at a
    > time and wc-l the new growing file each time to see if it crashes at some
    > point. that would suggest a magic size barrier.
    >
    > If I were really curious I'd maybe try to set up a sort of "high speed
    > camera" that collects info using sar or vmstat repeatedly in a tight loop in
    > the background just before trying the wc command, and then after the reboot
    > see if the memory usage or process stack or some other resource went nuts
    > just before the end.
    > Of course the tight loop itself will hit the system pretty hard, so maybe a
    > control run first that loops x thousand times but doesn't run wc.
    > It would generate a lot of data very fast so I'd make it into one big
    > command or script that starts the loop in the background and then runs the
    > wc-of-death immediately after.
    > Perhaps a sleep .5 in between.


    I thought about serial console on a Wyse terminal connected to tty1a. That should
    preserve the screen when the reboot is triggered. I've never used serial console
    and need step by step instructions on how to accomplish it. (I can dig it out
    of the documentation, but help from someone that has experience will help me
    set up the test much sooner than I could without the help.)

    >
    > And of course, this should probably have been first,
    > truss/trace/whatever-unixware-has.


    Again, a tool that I have not used before. How would I use truss to execute
    wc and see the result if that still causes a reboot?

    >
    > Brian K. White -- brian@aljex.com -- http://www.aljex.com/bkw/
    > +++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
    > filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
    >


  11. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system

    Bela Lubkin wrote:
    > Steve M. Fabac, Jr. wrote:
    >
    >> Just returned from a new client with problems.
    >> I was called when the Backup Edge Verify pass
    >> was causing the machine to reboot.
    >>
    >> When I arrived, I checked the system and found it
    >> running OS 6.0 with MP1 and some package add
    >> patch.
    >>
    >> While surveying the system I used less /usr/adm/syslog
    >> to view the system log file. Pressing shift G to
    >> go to the bottom of the file took a long time.
    >> So long that I gave up and pressed del to interrupt.
    >>
    >> When I executed l -l /usr/adm/syslog it showed it
    >> at 108+ megabytes. I tried wc -l /usr/adm/syslog and
    >> within .5 to 1 second the screen went blank and the
    >> system was rebooting.
    >>
    >> I brought it up in single user mode and ran fsck -ofull
    >> several times with no unusual problems reported.
    >>
    >> In single user mode the wc -l /usr/adm/syslog would
    >> still cause the system to reboot.
    >>
    >> Funny thing: With the system back up in single user mode,
    >> running cat /usr/adm/syslog > /dev/null worked without
    >> problems.
    >>
    >> And, cat /usr/adm/syslog | wc -l reported 1.5+ million lines.
    >> But wc -l /usr/adm/syslog (or /usr/adm/messages) will
    >> trigger the reboot.
    >>
    >> No panic messages, the monitor just goes black and
    >> then the boot up screen is displayed.
    >>
    >> Any suggestions on what to check first? I plan to install
    >> a new IDE hard drive tomorrow and use recovery media
    >> on the Backup Edge overnight backup to restore the system
    >> to the new drive.

    >
    > You say that both /usr/adm/syslog and /usr/adm/messages trigger the
    > problem. And `wc -l /usr/adm/syslog` crashes while `cat /usr/adm/syslog
    > | wc -l` doesn't. Peculiar.
    >
    > I don't have an OSR6 system here, but a quick `truss` test on OSR506
    > shows that `cat` does 1KB reads while `wc` does 16KB. OSR6 userland is
    > mostly OSR5, so this is probably the same.
    >
    > See if:
    >
    > dd if=/usr/adm/syslog of=/dev/null bs=16k


    Good to hear from you, Bela. I'll give dd a try when I get back on site
    today.

    I just talked to the client. I had sent him out to pick up a hard disk
    so that we can eliminate the disk by replacing it. He suggested that
    he pick up a new machine so that we can restore the backup to it
    and leave the current system in production for today's business operations
    (front counter POS, 8 stations two remote locations).

    He called back from his preferred used computer supplier and they have
    a Dell PowerEdge 2600 with 1G RAM, unspecified RAID controller, and three
    73G disks for $325. So I will be attempting to use the boot loadable
    drivers for Backup Edge RE2 to transfer last night's backup to the
    new hardware. Once that's accomplished, I'll test wc again prior to installing
    MP2 and other recommended patches.

    I've done this before on OS5 but not OS6 so I'll likely need recommendations on
    how to reconfigure the NIC (reconfigure?)


    Whoops. Just checked with Microlite tech support, and BackupEdge on OS6 does not
    support moving to non-identical hardware (no btld support). So it looks like a
    fresh install and porting their application and data to the new system. Rats.

    >
    > triggers a crash. If so, it has nothing to do with `wc`, only with a
    > certain manner of reading that file.
    >
    > Then see if:
    >
    > dd if=/unix of=/dev/null bs=16k
    >
    > triggers the same crash. (/unix may not be big enough. Poke around the
    > system for other 100MB-ish files to try.)
    >
    > I'm thinking it's either a filesystem bug or a hardware problem with the
    > machine's memory. Seems unlikely to be a problem with the specific
    > files, when two different files cause the same thing.
    >
    > Run memtest86 (almost all Linux CDs have it as a boot option).
    >
    > >Bela<


  12. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system

    In article ,
    Steve M. Fabac, Jr. wrote:
    >Jean-Pierre Radley wrote:
    >> ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
    >> | The total size of the file might be more relevant than the number of
    >> | entries. 6.0 has special large file utilities for handling files >
    >> | 1Gb which it sounds like this might be.
    >> |
    >> | Prepend /u95/bin to your PATH. I would then just blow away syslog:
    >> | /u95/bin/cp /dev/null /usr/adm/syslog
    >> | Restart syslogd and then read it to see what all the errors are.
    >>
    >> Oh? I thought that the binaries for "large" files had to do with files
    >> exceeding 2Gb, not 1GB.
    >>

    >
    >Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
    >
    >I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
    >
    >wc -l syslog.old caused immediate reboot.
    >
    >cat syslog.old | wc -l returned 1.5M+ lines.
    >
    >examining syslog.old shows no recorded messages about any problems on the
    >hard disk.
    >
    >And attempting badtrk returns a message that the command is removed
    >from OS6 and the kernel is in charge of automatically managing bad
    >tracks.
    >
    >If this were a SCSI system, I would use the SCSI controller to perform
    >a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
    >tool to check the disk.


    And many IDE manufacturers have utility disks for download with
    which you can verify an entire drive, and if needed reformat and
    lock out bad sectors, very similar to SCSI utilities.

    Check the IDE vendor's site. The disks are handy.

    >Replacing the disk is a shotgun approach to diagnosing the problem.


    >Anyone have any suggestions as to why a system command would
    >cause a reboot?


    Could there be something strange IN the syslog file?




    --
    Bill Vermillion - bv @ wjv . com

  13. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system

    In article <01a001c7614b$9c4f9090$6800000a@venti>,
    Brian K. White wrote:

    >----- Original Message ----- From: "Steve M. Fabac, Jr."
    > Newsgroups: comp.unix.sco.misc To:
    > Sent: Wednesday, March 07, 2007 10:27 PM
    >Subject: Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
    >
    >
    >> Jean-Pierre Radley wrote:
    >>
    >>> ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
    >>> | The total size of the file might be more relevant than the
    >>> number of | entries. 6.0 has special large file utilities for
    >>> handling files > | 1Gb which it sounds like this might be. |
    >>> | Prepend /u95/bin to your PATH. I would then just blow away
    >>> syslog: | /u95/bin/cp /dev/null /usr/adm/syslog | Restart
    >>> syslogd and then read it to see what all the errors are.
    >>>
    >>> Oh? I thought that the binaries for "large" files had to do
    >>> with files exceeding 2Gb, not 1GB.

    >>
    >>
    >> Even so, the size of syslog was 108M (about 10% of 1G). So not
    >> relevant.
    >>
    >> I copied syslog to syslog.old and zeroed out syslog in
    >> maintenance mode.
    >>
    >> wc -l syslog.old caused immediate reboot.
    >>
    >> cat syslog.old | wc -l returned 1.5M+ lines.
    >>
    >> examining syslog.old shows no recorded messages about any
    >> problems on the hard disk.
    >>
    >> And attempting badtrk returns a message that the command is
    >> removed from OS6 and the kernel is in charge of automatically
    >> managing bad tracks.
    >>
    >> If this were a SCSI system, I would use the SCSI controller to
    >> perform a "verify" (ala Adaptec 29160), but since it is IDE, I
    >> don't have that tool to check the disk.
    >>
    >> Replacing the disk is a shotgun approach to diagnosing the
    >> problem.
    >>
    >> Anyone have any suggestions as to why a system command would
    >> cause a reboot?


    >First: I would NOT reduce the size of or delete the original
    >bad file (currently syslog.old)! If there is some magic bad spot
    >in the fs or on the disk, then by renaming the original syslog
    >to syslog.old you have luckily still not freed that bad spot
    >for some other file to chance upon with potentially much uglier
    >results. In any further testing I'd read or rename that file at
    >will, but not zero it out or reduce its size, or delete it,
    >unless and until you are sure you know what caused the crash and
    >are sure it won't happen again.


    When I had a problem like that with a bad spot that could not be
    fixed [Irix's system had no fsck as it was not needed] I renamed the
    file .bad, and then ran chmod 000 on it so that even the system
    couldn't see it. If it were in a normally used directory, I'd
    rename the directory it was in, recreate a directory with the
    original name and permissions, and move all the other files over.
    That will keep it hidden until you can do whatever it is you wish
    to recover/fix/change the system.
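
    A minimal sketch of that quarantine step, assuming the suspect file is
    simply renamed in place with a .bad suffix (the function name quarantine is
    made up for this example). Because the file is renamed rather than removed
    or truncated, its blocks stay allocated and no other file can land on them.
    Note that chmod 000 keeps ordinary users and most programs away from the
    file, but root can still read it.

```shell
# quarantine FILE
# Rename FILE to FILE.bad and remove all permission bits.
# Nothing is deleted or truncated, so the underlying blocks stay in use.
quarantine() {
    f=$1
    mv "$f" "$f.bad" && chmod 000 "$f.bad"
}

# Example:
# quarantine /usr/adm/syslog.old
```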


    >On to the good stuff
    >
    >I'm curious if wc -l on a copy (not just renamed) of the bad file also
    >reboots.
    >
    >ie: if it's wonky disk or filesystem, probably the copy won't cause a crash.
    >also in that case before even getting to wc, the copy might not match the
    >original for that matter.
    >
    >Then I'd try feeding wc equivalent but different data to see if it's simply
    >wc can't handle more than x lines or x bytes etc...
    >maybe uuencode the bad file (to a new file) and try wc on that, just because
    >it's an easy way to make an even bigger file, that has all different data
    >than the original, but that is all text and all short lines.
    >
    >I'd also see if there is a gnu version of wc in /usr/gnu/bin (does unixware
    >have an equivalent of the gnutools & gwxlibs packages for osr5?)
    >it might also be in a skunkware package sh-utils
    >
    >Any other (besides gnu) versions of wc on there? like u95, ibin, obin, sbin,
    >etc?
    >
    >Does the stock wc reading the known bad file still crash the system if you
    >are not root when you try it?
    >
    >I'd probably also try chopping up the bad file into pieces to see if it's a
    >particular size (in lines or in bytes) that wc can't handle, or if there is
    >some spot somewhere in the file that causes the crash by touching that spot.
    >Maybe a loop that uses dd to break it up into 100k chunks, then wc -l each
    >chunk to see if one crashes, ....


    Most Unix systems I've used have had a 'split' utility [ and I used
    to use that on old SCO systems when the 'vi' had about a 500K file
    size limit] that will automatically do that so it obviates the need
    to write a 'dd loop'.
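
    As a rough sketch of that approach: split the file into fixed-size pieces,
    then run wc -l on each piece by name, printing the piece's name first so the
    last name shown before a reboot points at the piece that triggered it. The
    function name count_chunks, the 100k default, and the scratch directory are
    assumptions for this example. Keep in mind that split itself has to read
    the suspect file, and the pieces should go on a filesystem you trust.

```shell
# count_chunks FILE [CHUNK_BYTES]
# Split FILE into CHUNK_BYTES-sized pieces (default 102400) in a scratch
# directory, then run wc -l on each piece in order.
count_chunks() {
    file=$1
    size=${2:-102400}
    dir=${TMPDIR:-/tmp}/chunks.$$
    mkdir "$dir" || return 1
    split -b "$size" "$file" "$dir/part." || return 1
    for piece in "$dir"/part.*; do
        echo "checking $piece"
        sync            # make sure the message above is not lost
        wc -l "$piece"  # same direct form of wc that triggered the reboot
    done
}

# Example:
# count_chunks /usr/adm/syslog.old
```

    Splitting by bytes cuts some lines in half, but the per-piece counts still
    add up to the total number of newlines, so the sum can be checked against
    the figure from cat syslog.old | wc -l. The pieces are left in place for
    further testing and need to be removed by hand afterwards.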

    Bill
    --
    Bill Vermillion - bv @ wjv . com

  14. Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system

    Bill Vermillion wrote:
    > In article ,
    > Steve M. Fabac, Jr. wrote:
    >> Jean-Pierre Radley wrote:
    >>> ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
    >>> | The total size of the file might be more relevant than the number of
    >>> | entries. 6.0 has special large file utilities for handling files >
    >>> | 1Gb which it sounds like this might be.
    >>> |
    >>> | Prepend /u95/bin to your PATH. I would then just blow away syslog:
    >>> | /u95/bin/cp /dev/null /usr/adm/syslog
    >>> | Restart syslogd and then read it to see what all the errors are.
    >>>
    >>> Oh? I thought that the binaries for "large" files had to do with files
    >>> exceeding 2Gb, not 1GB.
    >>>

    >> Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
    >>
    >> I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
    >>
    >> wc -l syslog.old caused immediate reboot.
    >>
    >> cat syslog.old | wc -l returned 1.5M+ lines.
    >>
    >> examining syslog.old shows no recorded messages about any problems on the
    >> hard disk.
    >>
    >> And attempting badtrk returns a message that the command is removed
    >>from OS6 and the kernel is in charge of automatically managing bad
    >> tracks.
    >>
    >> If this were a SCSI system, I would use the SCSI controller to perform
    >> a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
    >> tool to check the disk.

    >
    > And many IDE manufacturers have utility disks for download with
    > which you can verify an entire drive, and if needed reformat and
    > lock out bad sectors, very similar to SCSI utilities.
    >
    > Check the IDE vendor's site. The disks are handy.
    >
    >> Replacing the disk is a shotgun approach to diagnosing the problem.

    >
    >> Anyone have any suggestions as to why a system command would
    >> cause a reboot?

    >
    > Could there be something strange IN the syslog file?


    I'm beginning to suspect that this is a DMA issue.
    I see in the syslog that DMA has been disabled due to
    too many failures (or words to that effect, not on-site
    and I don't have the print out).
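
    For pulling those entries out of the saved log, something like the
    following would do. It is a sketch only: the exact wording of the kernel
    message is not known here, so it just matches "dma" in any case, and the
    function name dma_messages is made up. The file is fed through cat because
    cat piped into wc was the form that worked on this machine.

```shell
# dma_messages FILE
# Print the last 20 lines of FILE that mention DMA (any case).
dma_messages() {
    cat "$1" | grep -i dma | tail -n 20
}

# Example:
# dma_messages /usr/adm/syslog.old
```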

    Other failures that I have noted include reboot when executing
    mkisofs to create an ISO image containing a small subdirectory
    with files I wanted to move to the replacement system.

    But since the client has purchased a used Dell PowerEdge 2600,
    we have abandoned trying to get this system repaired and have
    concentrated on loading the OS on the replacement hardware.

    There is the rub: The Dell has a PERC4/Di RAID on motherboard
    controller and everything (every driver downloaded from Dell
    and SCO) has failed to resolve the "No root disk controller"
    issue.

    Just last night while searching Google for PERC4/Di, I came
    across a post to the Dell Forums that recommended that both
    channels be set to RAID. That may be my problem, as in the BIOS
    only the first channel is set to RAID; the second is set to SCSI.

    This apparently was not a problem for Windows Server 2000,
    as when we first powered on the machine, it booted to the
    login screen. The machine had a DLT tape drive on the second
    channel and I think that it was set to SCSI.
