Openserver 6.0 wc -l /usr/adm/syslog reboots system - SCO
-
Openserver 6.0 wc -l /usr/adm/syslog reboots system
Just returned from a new client with problems.
I was called when the Backup Edge Verify pass
was causing the machine to reboot.
When I arrived, I checked the system and found it
running OS 6.0 with MP1 and some package add
patch.
While surveying the system I used less /usr/adm/syslog
to view the system log file. Pressing shift G to
go to the bottom of the file took a long time.
So long that I gave up and pressed del to interrupt.
When I executed l -l /usr/adm/syslog it showed it
at 108+ megabytes. I tried wc -l /usr/adm/syslog and
within .5 to 1 second the screen went blank and the
system was rebooting.
I brought it up in single user mode and ran fsck -ofull
several times with no unusual problems reported.
In single user mode the wc -l /usr/adm/syslog would
still cause the system to reboot.
Funny thing: With the system back up in single user mode,
running cat /usr/adm/syslog > /dev/null worked without
problems.
And, cat /usr/adm/syslog | wc -l reported 1.5+ million lines.
But wc -l /usr/adm/syslog (or /usr/adm/messages) will
trigger the reboot.
No panic messages, the monitor just goes black and
then the boot up screen is displayed.
Any suggestions on what to check first? I plan to install
a new IDE hard drive tomorrow and use recovery media
on the Backup Edge overnight backup to restore the system
to the new drive.
TIA
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
The total size of the file might be more relevant than the number of
entries. 6.0 has special large file utilities for handling files >
1Gb which it sounds like this might be.
Prepend /u95/bin to your PATH. I would then just blow away syslog:
/u95/bin/cp /dev/null /usr/adm/syslog
Restart syslogd and then read it to see what all the errors are.
--Ray Robert
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
| The total size of the file might be more relevant than the number of
| entries. 6.0 has special large file utilities for handling files >
| 1Gb which it sounds like this might be.
|
| Prepend /u95/bin to your PATH. I would then just blow away syslog:
| /u95/bin/cp /dev/null /usr/adm/syslog
| Restart syslogd and then read it to see what all the errors are.
Oh? I thought that the binaries for "large" files had to do with files
exceeding 2GB, not 1GB.
--
JP
==> http://www.frappr.com/cusm <==
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
Jean-Pierre Radley wrote:
> ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
> | The total size of the file might be more relevant than the number of
> | entries. 6.0 has special large file utilities for handling files >
> | 1Gb which it sounds like this might be.
> |
> | Prepend /u95/bin to your PATH. I would then just blow away syslog:
> | /u95/bin/cp /dev/null /usr/adm/syslog
> | Restart syslogd and then read it to see what all the errors are.
>
> Oh? I thought that the binaries for "large" files had to do with files
> exceeding 2GB, not 1GB.
>
Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
wc -l syslog.old caused immediate reboot.
cat syslog.old | wc -l returned 1.5M+ lines.
examining syslog.old shows no recorded messages about any problems on the
hard disk.
And attempting badtrk returns a message that the command is removed
from OS6 and the kernel is in charge of automatically managing bad
tracks.
If this were a SCSI system, I would use the SCSI controller to perform
a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
tool to check the disk.
Replacing the disk is a shotgun approach to diagnosing the problem.
Anyone have any suggestions as to why a system command would
cause a reboot?
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
Steve M. Fabac, Jr. wrote (on Thu, Mar 08, 2007 at 03:27:31AM +0000):
> Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
>
> I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
>
> wc -l syslog.old caused immediate reboot.
>
> cat syslog.old | wc -l returned 1.5M+ lines.
>
> examining syslog.old shows no recorded messages about any problems on the
> hard disk.
>
> And attempting badtrk returns a message that the command is removed
> from OS6 and the kernel is in charge of automatically managing bad
> tracks.
>
> If this were a SCSI system, I would use the SCSI controller to perform
> a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
> tool to check the disk.
>
> Replacing the disk is a shotgun approach to diagnosing the problem.
>
> Anyone have any suggestions as to why a system command would
> cause a reboot?
Perhaps you could a) copy it (both cp, say, and dd and/or tar) to a
new file to see if the new file had the identical behavior; or
b) split it into pieces to narrow it down to one piece; or c)
create a brand new identical sized file to see if the file size
has anything to do with it.
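
As a rough sketch, options (a) and (c) could be scripted along these lines. The function names and output file names are made up for illustration, nothing here has been tried on OSR6, and it assumes a shell with $(( )) arithmetic (ksh or similar) and a /dev/zero device. Sizes are taken with cat piped into wc, since that is the access path reported to work without a reboot.

```sh
#!/bin/sh
# Hypothetical helpers for options (a) and (c); names are illustrative only.

# (a) Copy the suspect file two ways so each copy can be tested on its own.
copy_both_ways() {
    src=$1; dest_dir=$2
    cp "$src" "$dest_dir/copy.cp" &&
    dd if="$src" of="$dest_dir/copy.dd" bs=1k 2>/dev/null
}

# (c) Build a zero-filled file with the same byte count as the suspect file.
same_size_file() {
    src=$1; out=$2
    # Count bytes through a pipe from cat, the path that did not crash.
    bytes=$(( $(cat "$src" | wc -c) ))
    # Whole 1 KB blocks first, then the remaining bytes one at a time.
    dd if=/dev/zero of="$out" bs=1024 count=$(( bytes / 1024 )) 2>/dev/null &&
    dd if=/dev/zero bs=1 count=$(( bytes % 1024 )) 2>/dev/null >> "$out"
}
```

Each resulting file would then get its own `wc -l`, run by hand one at a time, since any of them might take the machine down.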
--
_________________________________________
Nachman Yaakov Ziskind, FSPA, LLM awacs@ziskind.us
Attorney and Counselor-at-Law http://ziskind.us
Economic Group Pension Services http://egps.com
Actuaries and Employee Benefit Consultants
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
----- Original Message -----
From: "Steve M. Fabac, Jr."
Newsgroups: comp.unix.sco.misc
To:
Sent: Wednesday, March 07, 2007 10:27 PM
Subject: Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
> Jean-Pierre Radley wrote:
>> ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
>> | The total size of the file might be more relevant than the number of
>> | entries. 6.0 has special large file utilities for handling files >
>> | 1Gb which it sounds like this might be.
>> |
>> | Prepend /u95/bin to your PATH. I would then just blow away syslog:
>> | /u95/bin/cp /dev/null /usr/adm/syslog
>> | Restart syslogd and then read it to see what all the errors are.
>>
>> Oh? I thought that the binaries for "large" files had to do with files
>> exceeding 2GB, not 1GB.
>>
>
> Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
>
> I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
>
> wc -l syslog.old caused immediate reboot.
>
> cat syslog.old | wc -l returned 1.5M+ lines.
>
> examining syslog.old shows no recorded messages about any problems on the
> hard disk.
>
> And attempting badtrk returns a message that the command is removed
> from OS6 and the kernel is in charge of automatically managing bad
> tracks.
>
> If this were a SCSI system, I would use the SCSI controller to perform
> a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
> tool to check the disk.
>
> Replacing the disk is a shotgun approach to diagnosing the problem.
>
> Anyone have any suggestions as to why a system command would
> cause a reboot?
First: I would NOT reduce the size of or delete the original bad file,
currently syslog.old!
If there is some magic bad spot in the fs or on the disk, then by renaming
the original syslog to syslog.old you have luckily still not freed that bad
spot for some other file to chance upon with potentially much uglier
results. In any further testing I'd read or rename that file at will, but
not zero it out, reduce its size, or delete it, unless and until you are
sure you know what caused the crash and are sure it won't happen again.
On to the good stuff
I'm curious if wc -l on a copy (not just renamed) of the bad file also
reboots.
I.e., if it's a wonky disk or filesystem, the copy probably won't cause a crash.
In that case the copy might not even match the original, before even getting
to wc.
Then I'd try feeding wc equivalent but different data to see if it's simply
that wc can't handle more than x lines or x bytes, etc.
Maybe uuencode the bad file (to a new file) and try wc on that, just because
it's an easy way to make an even bigger file that has all different data
than the original, but that is all text and all short lines.
I'd also see if there is a GNU version of wc in /usr/gnu/bin (does UnixWare
have an equivalent of the gnutools & gwxlibs packages for OSR5?).
It might also be in a Skunkware package, sh-utils.
Any other (besides GNU) versions of wc on there? Like u95, ibin, obin, sbin,
etc.?
Does the stock wc reading the known bad file still crash the system if you
are not root when you try it?
I'd probably also try chopping up the bad file into pieces to see if it's a
particular size (in lines or in bytes) that wc can't handle, or if there is
some spot somewhere in the file that causes the crash when it is touched.
Maybe a loop that uses dd to break it up into 100k chunks, then wc -l each
chunk to see if one crashes. That would point to some funky data that wc chokes
on, which would be pretty strange since your cat test shows it didn't choke
on the same data via stdin, but, *shrug*, it's gotta be something.
Maybe another loop that reassembles the chunks, building a new file one chunk
at a time, and runs wc -l on the growing file each time to see if it crashes at
some point. That would suggest a magic size barrier.
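
A minimal sketch of those two loops, under some assumptions: the function names, the chunk.N and grown file names, and the default 100 KB chunk size are illustrative, and the shell is assumed to support $(( )) arithmetic. The dd calls use 1 KB reads and the size is taken with cat piped into wc, on the theory that cat-style reading was the one thing seen not to crash.

```sh
#!/bin/sh
# Hypothetical helpers; file names and defaults are illustrative only.

# Split SRC into chunk.0, chunk.1, ... under OUTDIR, CHUNK_KB kilobytes each.
# Prints the number of chunks written.
split_chunks() {
    src=$1; outdir=$2; chunk_kb=${3:-100}
    bytes=$(( $(cat "$src" | wc -c) ))
    n=0
    while [ $(( n * chunk_kb * 1024 )) -lt "$bytes" ]; do
        dd if="$src" of="$outdir/chunk.$n" bs=1024 \
           skip=$(( n * chunk_kb )) count="$chunk_kb" 2>/dev/null
        n=$(( n + 1 ))
    done
    echo "$n"
}

# Rebuild the file one chunk at a time, running wc -l after every append.
# The progress line is printed (and disks synced) before each risky wc call.
regrow() {
    outdir=$1; total=$2; grown=$outdir/grown
    : > "$grown"
    i=0
    while [ "$i" -lt "$total" ]; do
        cat "$outdir/chunk.$i" >> "$grown"
        sync
        echo "about to count after appending chunk $i"
        wc -l "$grown"
        i=$(( i + 1 ))
    done
}
```

Running `wc -l` on each individual chunk.N first would test the "funky data" theory; `regrow` tests the "magic size" theory. If the box reboots partway through, the last progress line seen on screen (or on a serial console) shows how far it got.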
If I were really curious I'd maybe try to set up a sort of "high speed
camera" that collects info using sar or vmstat repeatedly in a tight loop in
the background just before trying the wc command, and then after the reboot
see if the memory usage or process stack or some other resource went nuts
just before the end.
Of course the tight loop itself will hit the system pretty hard, so maybe a
control run first that loops x thousand times but doesn't run wc.
It would generate a lot of data very fast so I'd make it into one big
command or script that starts the loop in the background and then runs the
wc-of-death immediately after.
Perhaps a sleep .5 in between.
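
Roughly, the "camera" could be a loop like the sketch below. The function name and log path are invented for illustration, and the sampling command is passed in as arguments because the right vmstat or sar invocation on OSR6 is not something this sketch can vouch for. Note that a fractional `sleep .5` is not supported by every sleep implementation.

```sh
#!/bin/sh
# Hypothetical sampler: appends a timestamp plus the output of the given
# command to LOG, COUNT times, syncing after each sample so that as much
# as possible reaches the disk before a sudden reboot.
snapshot_loop() {
    log=$1; count=$2; shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        { date; "$@"; } >> "$log" 2>&1
        sync
        i=$(( i + 1 ))
    done
}

# Possible use (assumes a one-shot vmstat invocation exists on the system):
#   snapshot_loop /camera.log 1000 vmstat      # control run, no wc
#   snapshot_loop /camera.log 100000 vmstat &  # real run, in the background
#   sleep 1; wc -l /usr/adm/syslog.old
```

The log should live somewhere that survives a reboot, and the sync on every pass will itself load the system, which is one more reason to do the control run first.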
And of course, this should probably have been first,
truss/trace/whatever-unixware-has.
Brian K. White -- brian@aljex.com -- http://www.aljex.com/bkw/
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
Steve M. Fabac, Jr. wrote:
> Just returned from a new client with problems.
> I was called when the Backup Edge Verify pass
> was causing the machine to reboot.
>
> When I arrived, I checked the system and found it
> running OS 6.0 with MP1 and some package add
> patch.
>
> While surveying the system I used less /usr/adm/syslog
> to view the system log file. Pressing shift G to
> go to the bottom of the file took a long time.
> So long that I gave up and pressed del to interrupt.
>
> When I executed l -l /usr/adm/syslog it showed it
> at 108+ megabytes. I tried wc -l /usr/adm/syslog and
> within .5 to 1 second the screen went blank and the
> system was rebooting.
>
> I brought it up in single user mode and ran fsck -ofull
> several times with no unusual problems reported.
>
> In single user mode the wc -l /usr/adm/syslog would
> still cause the system to reboot.
>
> Funny thing: With the system back up in single user mode,
> running cat /usr/adm/syslog > /dev/null worked without
> problems.
>
> And, cat /usr/adm/syslog | wc -l reported 1.5+ million lines.
> But wc -l /usr/adm/syslog (or /usr/adm/messages) will
> trigger the reboot.
>
> No panic messages, the monitor just goes black and
> then the boot up screen is displayed.
>
> Any suggestions on what to check first? I plan to install
> a new IDE hard drive tomorrow and use recovery media
> on the Backup Edge overnight backup to restore the system
> to the new drive.
You say that both /usr/adm/syslog and /usr/adm/messages trigger the
problem. And `wc -l /usr/adm/syslog` crashes while `cat /usr/adm/syslog
| wc -l` doesn't. Peculiar.
I don't have an OSR6 system here, but a quick `truss` test on OSR506
shows that `cat` does 1KB reads while `wc` does 16KB. OSR6 userland is
mostly OSR5, so this is probably the same.
See if:
dd if=/usr/adm/syslog of=/dev/null bs=16k
triggers a crash. If so, it has nothing to do with `wc`, only with a
certain manner of reading that file.
Then see if:
dd if=/unix of=/dev/null bs=16k
triggers the same crash. (/unix may not be big enough. Poke around the
system for other 100MB-ish files to try.)
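
A sketch of how the read-size test might be stepped through, smallest block size first. The function names and the marker-file idea are assumptions added for illustration and have not been run on OSR6; the marker file should sit somewhere that survives a reboot.

```sh
#!/bin/sh
# Hypothetical helpers; names are illustrative only.

# Read FILE with dd at each block size in turn. A "trying" line is written
# and synced before each read, and an "ok" line after it, so that after a
# reboot the last "trying" line without a matching "ok" names the culprit.
read_with_blocksizes() {
    file=$1; marker=$2; shift 2
    for bs in "$@"; do
        echo "trying bs=$bs" >> "$marker"
        sync
        dd if="$file" of=/dev/null bs="$bs" 2>/dev/null || return 1
        echo "ok bs=$bs" >> "$marker"
    done
}

# List regular files under DIR larger than MINBYTES bytes, as candidates
# for repeating the test on files other than the logs.
big_files() {
    find "$1" -type f -size +"$2"c -print 2>/dev/null
}

# Possible use:
#   read_with_blocksizes /usr/adm/syslog.old /bs.log 1k 2k 4k 8k 16k
#   big_files /usr 100000000
```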
I'm thinking it's either a filesystem bug or a hardware problem with the
machine's memory. Seems unlikely to be a problem with the specific
files, when two different files cause the same thing.
Run memtest86 (almost all Linux CDs have it as a boot option).
>Bela<
-
Openserver 6.0 wc -l /usr/adm/syslog reboots system
On Mar 8, 3:38 am, Bela Lubkin wrote:
> Steve M. Fabac, Jr. wrote:
>
>
>
> > Just returned from a new client with problems.
Hi Steve,
Don't you love it? These situations always seem to crop up
with the NEW clients.
> > I was called when the Backup Edge Verify pass
> > was causing the machine to reboot.
Maybe check the verify_master.log to see what was being
verified at the time of the crash. That might give you a similarly
sized file to manipulate in your testing, as others have suggested.
(Or maybe it barfed on ./usr/adm/syslog)
A note of thanks to Bela, for dropping in and continuing to help
us all out! Hope all is going well at VMware.
Regards,
Dan Martin
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
Nachman Yaakov Ziskind wrote:
> Steve M. Fabac, Jr. wrote (on Thu, Mar 08, 2007 at 03:27:31AM +0000):
>> Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
>>
>> I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
>>
>> wc -l syslog.old caused immediate reboot.
>>
>> cat syslog.old | wc -l returned 1.5M+ lines.
>>
>> examining syslog.old shows no recorded messages about any problems on the
>> hard disk.
>>
>> And attempting badtrk returns a message that the command is removed
>> from OS6 and the kernel is in charge of automatically managing bad
>> tracks.
>>
>> If this were a SCSI system, I would use the SCSI controller to perform
>> a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
>> tool to check the disk.
>>
>> Replacing the disk is a shotgun approach to diagnosing the problem.
>>
>> Anyone have any suggestions as to why a system command would
>> cause a reboot?
>
> Perhaps you could a) copy it (both cp, say, and dd and/or tar) to a
> new file to see if the new file had the identical behavior; or
> b) split it into pieces to narrow it down to one piece; or c)
> create a brand new identical sized file to see if the file size
> has anything to do with it.
>
I did: cd /usr/adm ; cp syslog syslog.old; > syslog
wc -l syslog.old reboots the system.
wc -l messages reboots the system.
cat syslog.old | wc -l returns 1.5M+ lines
cat messages | wc -l works (don't remember number of lines)
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
Brian K. White wrote:
> ----- Original Message -----
> From: "Steve M. Fabac, Jr."
> Newsgroups: comp.unix.sco.misc
> To:
> Sent: Wednesday, March 07, 2007 10:27 PM
> Subject: Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
>
>
>> Jean-Pierre Radley wrote:
>>> ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
>>> | The total size of the file might be more relevant than the number of
>>> | entries. 6.0 has special large file utilities for handling files >
>>> | 1Gb which it sounds like this might be.
>>> |
>>> | Prepend /u95/bin to your PATH. I would then just blow away syslog:
>>> | /u95/bin/cp /dev/null /usr/adm/syslog
>>> | Restart syslogd and then read it to see what all the errors are.
>>>
>>> Oh? I thought that the binaries for "large" files had to do with files
>>> exceeding 2GB, not 1GB.
>>>
>> Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
>>
>> I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
>>
>> wc -l syslog.old caused immediate reboot.
>>
>> cat syslog.old | wc -l returned 1.5M+ lines.
>>
>> examining syslog.old shows no recorded messages about any problems on the
>> hard disk.
>>
>> And attempting badtrk returns a message that the command is removed
>> from OS6 and the kernel is in charge of automatically managing bad
>> tracks.
>>
>> If this were a SCSI system, I would use the SCSI controller to perform
>> a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
>> tool to check the disk.
>>
>> Replacing the disk is a shotgun approach to diagnosing the problem.
>>
>> Anyone have any suggestions as to why a system command would
>> cause a reboot?
>
> First: I would NOT reduce the size of or delete the original bad file,
> currently syslog.old!
Too late, already reset to zero with > syslog
> If there is some magic bad spot in the fs or on the disk, then by renaming
> the original syslog to syslog.old you have luckily still not freed that bad
> spot for some other file to chance upon with potentially much uglier
> results. In any further testing I'd read or rename that file at will, but
> not zero it out, reduce its size, or delete it, unless and until you are
> sure you know what caused the crash and are sure it won't happen again.
Good suggestions.
>
> On to the good stuff
>
> I'm curious if wc -l on a copy (not just renamed) of the bad file also
> reboots.
Yes, wc -l syslog.old reboots, cat syslog.old | wc -l returns 1.5M+ lines.
>
> ie: if it's wonky disk or filesystem, probably the copy won't cause a crash.
> also in that case before even getting to wc, the copy might not match the
> original for that matter.
Too bad I failed to run sum -r on syslog and syslog.old before zeroing out syslog.
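
For next time, the copy-then-check step might look like the sketch below. The function name is invented for illustration; it uses `sum -r` as mentioned above, fed through cat so the file is read the same way as in the test that did not crash. A matching checksum is only a quick sanity check, not proof that the files are identical.

```sh
#!/bin/sh
# Hypothetical helper: copy SRC to DST and succeed only if both files
# produce the same `sum -r` output. Truncate the original only after
# this returns success.
copy_and_verify() {
    src=$1; dst=$2
    cp "$src" "$dst" || return 1
    [ "$(cat "$src" | sum -r)" = "$(cat "$dst" | sum -r)" ]
}

# Possible use:
#   cd /usr/adm && copy_and_verify syslog syslog.old && : > syslog
```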
>
> Then I'd try feeding wc equivalent but different data to see if it's simply
> wc can't handle more than x lines or x bytes etc...
> maybe uuencode the bad file (to a new file) and try wc on that, just because
> it's an easy way to make an even bigger file, that has all different data
> than the original, but that is all text and all short lines.
I'm puzzled why a system command will cause a reboot. The most it should do is
core dump. This has got to be a hardware or memory issue.
>
> I'd also see if there is a gnu version of wc in /usr/gnu/bin (does unixware
> have an equivalent of the gnutools & gwxlibs packages for osr5?)
> it might also be in a skunkware package sh-utils
Looking for other software to replace wc is not the issue. I just happened
upon wc -l failing when I was trying to determine why less syslog took
so long to seek to the end of the file.
The reason I was called to the client was because the nightly Backup Edge
backup was triggering a reboot during the verify pass. So whatever is
affecting Backup Edge is probably in play with wc.
The system has not rebooted during the day with normal business operations.
(At least, I have not been told that it has.)
>
> Any other (besides gnu) versions of wc on there? like u95, ibin, obin, sbin,
> etc?
>
> Does the stock wc reading the known bad file still crash the system if you
> are not root when you try it?
>
> I'd probably also try chopping up the bad file into pieces to see if it's a
> particular size (in lines or in bytes) that wc can't handle, or if there is
> some spot somewhere in the file that causes the crash by touching that spot.
> Maybe a loop that uses dd to break it up into 100k chunks, then wc -l each
> chunk to see if one crashes, That points to some funky data that wc chokes
> on, which would be pretty strange since your cat test shows it didn't choke
> on the same data via stdin, but, *shrug* it's gotta be something.
> maybe another loop that reassembles the chunks building a new file one at a
> time and wc-l the new growing file each time to see if it crashes at some
> point. that would suggest a magic size barrier.
>
> If I were really curious I'd maybe try to set up a sort of "high speed
> camera" that collects info using sar or vmstat repeatedly in a tight loop in
> the background just before trying the wc command, and then after the reboot
> see if the memory usage or process stack or some other resource went nuts
> just before the end.
> Of course the tight loop itself will hit the system pretty hard, so maybe a
> control run first that loops x thousand times but doesn't run wc.
> It would generate a lot of data very fast so I'd make it into one big
> command or script that starts the loop in the background and then runs the
> wc-of-death immediately after.
> Perhaps a sleep .5 in between.
I thought about a serial console on a Wyse terminal connected to tty1a. That should
preserve the screen when the reboot is triggered. I've never used a serial console
and need step-by-step instructions on how to accomplish it. (I can dig it out
of the documentation, but help from someone who has experience will let me
set up the test much sooner than I could without it.)
>
> And of course, this should probably have been first,
> truss/trace/whatever-unixware-has.
Again, a tool that I have not used before. How would I use truss to execute
wc and see the result if that still causes a reboot?
>
> Brian K. White -- brian@aljex.com -- http://www.aljex.com/bkw/
> +++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
> filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
>
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
Bela Lubkin wrote:
> Steve M. Fabac, Jr. wrote:
>
>> Just returned from a new client with problems.
>> I was called when the Backup Edge Verify pass
>> was causing the machine to reboot.
>>
>> When I arrived, I checked the system and found it
>> running OS 6.0 with MP1 and some package add
>> patch.
>>
>> While surveying the system I used less /usr/adm/syslog
>> to view the system log file. Pressing shift G to
>> go to the bottom of the file took a long time.
>> So long that I gave up and pressed del to interrupt.
>>
>> When I executed l -l /usr/adm/syslog it showed it
>> at 108+ megabytes. I tried wc -l /usr/adm/syslog and
>> within .5 to 1 second the screen went blank and the
>> system was rebooting.
>>
>> I brought it up in single user mode and ran fsck -ofull
>> several times with no unusual problems reported.
>>
>> In single user mode the wc -l /usr/adm/syslog would
>> still cause the system to reboot.
>>
>> Funny thing: With the system back up in single user mode,
>> running cat /usr/adm/syslog > /dev/null worked without
>> problems.
>>
>> And, cat /usr/adm/syslog | wc -l reported 1.5+ million lines.
>> But wc -l /usr/adm/syslog (or /usr/adm/messages) will
>> trigger the reboot.
>>
>> No panic messages, the monitor just goes black and
>> then the boot up screen is displayed.
>>
>> Any suggestions on what to check first? I plan to install
>> a new IDE hard drive tomorrow and use recovery media
>> on the Backup Edge overnight backup to restore the system
>> to the new drive.
>
> You say that both /usr/adm/syslog and /usr/adm/messages trigger the
> problem. And `wc -l /usr/adm/syslog` crashes while `cat /usr/adm/syslog
> | wc -l` doesn't. Peculiar.
>
> I don't have an OSR6 system here, but a quick `truss` test on OSR506
> shows that `cat` does 1KB reads while `wc` does 16KB. OSR6 userland is
> mostly OSR5, so this is probably the same.
>
> See if:
>
> dd if=/usr/adm/syslog of=/dev/null bs=16k
Good to hear from you, Bela. I'll give dd a try when I get back on site
today.
I just talked to the client. I had sent him out to pick up a hard disk
so that we can eliminate the disk by replacing it. He suggested that
he pick up a new machine so that we can restore the backup to it
and leave the current system in production for today's business operations
(front counter POS, 8 stations two remote locations).
He called back from his preferred used computer supplier and they have
a Dell PowerEdge 2600 with 1G RAM, unspecified RAID controller, and three
73G disks for $325. So I will be attempting to use the boot loadable
drivers for Backup Edge RE2 to transfer last night's backup to the
new hardware. Once that's accomplished, I'll test wc again prior to installing
MP2 and other recommended patches.
I've done this before on OS5 but not OS6 so I'll likely need recommendations on
how to reconfigure the NIC (reconfigure?)
Whoops. Just checked with Microlite tech support, and BackupEdge on OS6 does not
support moving to non-identical hardware (no btld support). So it looks like a
fresh install and porting their application and data to the new system. Rats.
>
> triggers a crash. If so, it has nothing to do with `wc`, only with a
> certain manner of reading that file.
>
> Then see if:
>
> dd if=/unix of=/dev/null bs=16k
>
> triggers the same crash. (/unix may not be big enough. Poke around the
> system for other 100MB-ish files to try.)
>
> I'm thinking it's either a filesystem bug or a hardware problem with the
> machine's memory. Seems unlikely to be a problem with the specific
> files, when two different files cause the same thing.
>
> Run memtest86 (almost all Linux CDs have it as a boot option).
>
> >Bela<
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
In article ,
Steve M. Fabac, Jr. wrote:
>Jean-Pierre Radley wrote:
>> ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
>> | The total size of the file might be more relevant than the number of
>> | entries. 6.0 has special large file utilities for handling files >
>> | 1Gb which it sounds like this might be.
>> |
>> | Prepend /u95/bin to your PATH. I would then just blow away syslog:
>> | /u95/bin/cp /dev/null /usr/adm/syslog
>> | Restart syslogd and then read it to see what all the errors are.
>>
>> Oh? I thought that the binaries for "large" files had to do with files
>> exceeding 2GB, not 1GB.
>>
>
>Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
>
>I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
>
>wc -l syslog.old caused immediate reboot.
>
>cat syslog.old | wc -l returned 1.5M+ lines.
>
>examining syslog.old shows no recorded messages about any problems on the
>hard disk.
>
>And attempting badtrk returns a message that the command is removed
>from OS6 and the kernel is in charge of automatically managing bad
>tracks.
>
>If this were a SCSI system, I would use the SCSI controller to perform
>a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
>tool to check the disk.
And many IDE manufacturers have utility disks for download with
which you can verify an entire drive and, if needed, reformat and
lock out bad sectors, very similar to SCSI utilities.
Check the IDE vendor's site. The disks are handy.
>Replacing the disk is a shotgun approach to diagnosing the problem.
>Anyone have any suggestions as to why a system command would
>cause a reboot?
Could there be something strange IN the syslog file?
--
Bill Vermillion - bv @ wjv . com
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
In article <01a001c7614b$9c4f9090$6800000a@venti>,
Brian K. White wrote:
>----- Original Message ----- From: "Steve M. Fabac, Jr."
> Newsgroups: comp.unix.sco.misc To:
> Sent: Wednesday, March 07, 2007 10:27 PM
>Subject: Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
>
>
>> Jean-Pierre Radley wrote:
>>
>>> ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
>>> | The total size of the file might be more relevant than the number of
>>> | entries. 6.0 has special large file utilities for handling files >
>>> | 1Gb which it sounds like this might be.
>>> |
>>> | Prepend /u95/bin to your PATH. I would then just blow away syslog:
>>> | /u95/bin/cp /dev/null /usr/adm/syslog
>>> | Restart syslogd and then read it to see what all the errors are.
>>>
>>> Oh? I thought that the binaries for "large" files had to do
>>> with files exceeding 2GB, not 1GB.
>>
>>
>> Even so, the size of syslog was 108M (about 10% of 1G). So not
>> relevant.
>>
>> I copied syslog to syslog.old and zeroed out syslog in
>> maintenance mode.
>>
>> wc -l syslog.old caused immediate reboot.
>>
>> cat syslog.old | wc -l returned 1.5M+ lines.
>>
>> examining syslog.old shows no recorded messages about any
>> problems on the hard disk.
>>
>> And attempting badtrk returns a message that the command is
>> removed from OS6 and the kernel is in charge of automatically
>> managing bad tracks.
>>
>> If this were a SCSI system, I would use the SCSI controller to
>> perform a "verify" (ala Adaptec 29160), but since it is IDE, I
>> don't have that tool to check the disk.
>>
>> Replacing the disk is a shotgun approach to diagnosing the
>> problem.
>>
>> Anyone have any suggestions as to why a system command would
>> cause a reboot?
>First: I would NOT reduce the size of or delete the original
>bad file (currently syslog.old)! If there is some magic bad spot
>in the fs or on the disk, then by renaming the original syslog
>to syslog.old you have luckily still not freed that bad spot
>for some other file to chance upon with potentially much uglier
>results. In any further testing I'd read or rename that file at
>will, but not zero it out or reduce its size, or delete it,
>unless and until you are sure you know what caused the crash and
>are sure it won't happen again.
When I had a problem like that with a bad spot that could not be
fixed [Irix's system had no fsck as it was not needed] I renamed the
file .bad, and then chmod 000 so that even the system
couldn't see it. If it were in a normally used directory, I'd
rename the directory it was in, recreate a directory with the
original name and permissions, and move all the other files over.
That will keep it hidden until you can do whatever it is you wish
to recover/fix/change the system.
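
The rename-and-hide steps described above could be sketched roughly as
follows. This is only an illustration, not a tested procedure for OSR6:
the helper names quarantine_file and rebuild_dir are made up for this
example, and the ".bad" and ".quarantine" suffixes are assumptions. The
point is that nothing is deleted or truncated, so the suspect blocks stay
allocated to the quarantined file.

```shell
# Sketch only: quarantine a suspect file without freeing its disk blocks.
# quarantine_file and rebuild_dir are hypothetical helper names.

quarantine_file() {
    # $1 = path of the suspect file
    # Rename it and strip all permissions; the data is left untouched.
    mv "$1" "$1.bad" && chmod 000 "$1.bad"
}

rebuild_dir() {
    # $1 = directory that holds an already quarantined *.bad file
    dir=$1
    mv "$dir" "$dir.quarantine" || return 1
    mkdir "$dir" || return 1
    # The new directory's owner, group and mode still have to be set by
    # hand (chown/chgrp/chmod) to match the original.
    # Note: the * glob skips dot files; move those separately if any exist.
    for f in "$dir.quarantine"/*; do
        [ -e "$f" ] || continue
        case $f in
            *.bad) ;;                 # leave the suspect file behind
            *) mv "$f" "$dir"/ ;;     # everything else goes back
        esac
    done
}
```

Assumed usage: quarantine_file /usr/adm/syslog.old, then
rebuild_dir /usr/adm, preferably in single user mode so nothing is
writing into the directory while it is being moved.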
>On to the good stuff
>
>I'm curious if wc -l on a copy (not just renamed) of the bad file also
>reboots.
>
>ie: if it's wonky disk or filesystem, probably the copy won't cause a crash.
>also in that case before even getting to wc, the copy might not match the
>original for that matter.
>
>Then I'd try feeding wc equivalent but different data to see if it's simply
>wc can't handle more than x lines or x bytes etc...
>maybe uuencode the bad file (to a new file) and try wc on that, just because
>it's an easy way to make an even bigger file, that has all different data
>than the original, but that is all text and all short lines.
>
>I'd also see if there is a gnu version of wc in /usr/gnu/bin (does unixware
>have an equivalent of the gnutools & gwxlibs packages for osr5?)
>it might also be in a skunkware package sh-utils
>
>Any other (besides gnu) versions of wc on there? like u95, ibin, obin, sbin,
>etc?
>
>Does the stock wc reading the known bad file still crash the system if you
>are not root when you try it?
>
>I'd probably also try chopping up the bad file into pieces to see if it's a
>particular size (in lines or in bytes) that wc can't handle, or if there is
>some spot somewhere in the file that causes the crash by touching that spot.
>Maybe a loop that uses dd to break it up into 100k chunks, then wc -l each
>chunk to see if one crashes, ....
Most Unix systems I've used have had a 'split' utility [ and I used
to use that on old SCO systems when the 'vi' had about a 500K file
size limit] that will automatically do that so it obviates the need
to write a 'dd loop'.
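
As a rough sketch of the chunk test using split instead of a dd loop:
the function name count_chunks, the 100k chunk size and the "chunk."
prefix are assumptions for illustration only. On the affected machine any
of the wc calls might itself trigger the reboot, so the chunk name is
printed (and the disk synced) before each count; the last name shown is
then the one to look at.

```shell
# Sketch only: split a large file into 100k pieces and run wc -l on each.
# count_chunks is a hypothetical helper name.

count_chunks() {
    # $1 = file to examine
    # $2 = scratch directory with room for a full copy of the file
    src=$1
    work=$2
    # -a 4 gives four-letter suffixes; the default of two allows only 676
    # pieces, which is too few for a 108 MB file cut into 100k chunks.
    split -a 4 -b 100k "$src" "$work/chunk." || return 1
    for c in "$work"/chunk.*; do
        echo "testing $c" >&2    # progress goes to stderr
        sync                     # flush buffers before the risky call
        wc -l "$c"               # open the chunk by name, as in the failing case
    done
}
```

Assumed usage: count_chunks /usr/adm/syslog.old /tmp/chunks, with
/tmp/chunks created beforehand. The per-chunk counts should add up to
the figure reported by cat syslog.old | wc -l.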
Bill
--
Bill Vermillion - bv @ wjv . com
-
Re: Openserver 6.0 wc -l /usr/adm/syslog reboots system
Bill Vermillion wrote:
> In article ,
> Steve M. Fabac, Jr. wrote:
>> Jean-Pierre Radley wrote:
>>> ThreeStar typed (on Wed, Mar 07, 2007 at 05:14:05PM -0800):
>>> | The total size of the file might be more relevant than the number of
>>> | entries. 6.0 has special large file utilities for handling files >
>>> | 1Gb which it sounds like this might be.
>>> |
>>> | Prepend /u95/bin to your PATH. I would then just blow away syslog:
>>> | /u95/bin/cp /dev/null /usr/adm/syslog
>>> | Restart syslogd and then read it to see what all the errors are.
>>>
>>> Oh? I thought that the binaries for "large" files had to do with files
>>> exceeding 2Gb, not 1GB.
>>>
>> Even so, the size of syslog was 108M (about 10% of 1G). So not relevant.
>>
>> I copied syslog to syslog.old and zeroed out syslog in maintenance mode.
>>
>> wc -l syslog.old caused immediate reboot.
>>
>> cat syslog.old | wc -l returned 1.5M+ lines.
>>
>> examining syslog.old shows no recorded messages about any problems on the
>> hard disk.
>>
>> And attempting badtrk returns a message that the command is removed
>> from OS6 and the kernel is in charge of automatically managing bad
>> tracks.
>>
>> If this were a SCSI system, I would use the SCSI controller to perform
>> a "verify" (ala Adaptec 29160), but since it is IDE, I don't have that
>> tool to check the disk.
>
> And many IDE manufacturers have utility disks for download with
> which you can verify an entire drive, and if needed reformat and
> lock out bad sectors, very similar to SCSI utilities.
>
> Check the IDE vendor's site. The disks are handy.
>
>> Replacing the disk is a shotgun approach to diagnosing the problem.
>
>> Anyone have any suggestions as to why a system command would
>> cause a reboot?
>
> Could there be something strange IN the syslog file?
I'm beginning to suspect that this is a DMA issue.
I see in the syslog that DMA has been disabled due to
too many failures (or words to that effect; I'm not on-site
and I don't have the printout).
Other failures that I have noted include reboot when executing
mkisofs to create an ISO image containing a small subdirectory
with files I wanted to move to the replacement system.
But since the client has purchased a used Dell PowerEdge 2600,
we have abandoned trying to get this system repaired and have
concentrated on loading the OS on the replacement hardware.
There is the rub: The Dell has a PERC4/Di RAID-on-motherboard
controller and everything (every driver downloaded from Dell
and SCO) has failed to resolve the "No root disk controller"
issue.
Just last night while searching Google for PERC4/Di, I came
across a post to the Dell Forums that recommended that both
channels be set to RAID. That may be my problem, as in the BIOS
only the first channel is set to RAID; the second is set as SCSI.
This apparently was not a problem for Windows Server 2000,
as when we first powered on the machine, it booted to the
login screen. The machine had a DLT tape drive on the second
channel and I think that it was set to SCSI.