"Mysterious" system crashes - VMS
-
"Mysterious" system crashes
Hello all,
Environment:
Digital PWS433au
VMS V8.3 with the following patches:
UPDATE V4
SYS V5
LAN V2
TCPware V5.7-2, with all required patches
This is a home hobbyist system. I've been experiencing "mysterious"
system reboots each day for the past four days (when I say "mysterious",
I mean that there are no CLUE$*.LIS files in sys$errorlog, and
no entries in the CLUE$history.dat file, either). There are no console
error messages that appear before the crashes.
These crashes happen while I am away at work, so I guess the system
misses me. :-) They do not happen at the same time each day, and
questioning of family members that are home at the time reveals no
unusual "power" glitches.
I recently patched the system each day after the crashes, with the OS
patches listed above. I have now downloaded SHADOWING V1 and FIBRE_SCSI
V3 (and will patch/reboot the system shortly), in a last desperate
attempt to abate the (seemingly) inevitable daily crash.
A quick scan of the ITRC OpenVMS forum entries for the past 3 weeks
reveals nothing.
Has anyone out there seen a "footprint" like this recently? My
apologies in advance for asking a technical question in this forum, :-)
but I feel a sense of loyalty to this group, and I want to give you
folks first crack at this before I venture onto ITRC.
TIA
-
Re: "Mysterious" system crashes
On Nov 15, 5:23 pm, bradhamilton wrote:
[...]
My DS10-L acted like this with faulty memory DIMMs in it. I
understand a failing power supply _can_ do this also, but in my case
the system hung rather than rebooted. Over time mine got worse and
worse until it would only start to boot before failing (and I found
out I had a faulty main logic board too after the DIMMs were replaced,
so that's also a possibility).
I'd do some preventative maintenance. Open it up, clean out the dust
bunnies, clear the fans and grilles, reseat cards, connectors, and
cables, reseat the DIMMs, etc. As long as you're careful it shouldn't
hurt and just might help.
-
Re: "Mysterious" system crashes
Rich Jordan wrote:
> On Nov 15, 5:23 pm, bradhamilton wrote:
[...]
> My DS10-L acted like this with faulty memory DIMMs in it. I
> understand a failing power supply _can_ do this also, but in my case
> the system hung rather than rebooted. Over time mine got worse and
> worse until it would only start to boot before failing (and I found
> out I had a faulty main logic board too after the DIMMs were replaced,
> so that's also a possibility).
>
> I'd do some preventative maintenance. Open it up, clean out the dust
> bunnies, clear the fans and grilles, reseat cards, connectors, and
> cables, reseat the DIMMs, etc. As long as you're careful it shouldn't
> hurt and just might help.
Hi Rich,
Thanks for the suggestion - I'll schedule a PM this weekend, and I might
do a cursory external cleaning of the grilles and fan before then. Of
course, my system runs in a "clean-room" environment, with filtered
power supplies and plenty-o'-AC, so that *can't* be the problem.
:-)
-
Re: "Mysterious" system crashes
bradhamilton wrote:
[...]
Do you have a dump file? SYS$SYSTEM:SYSDUMP.DMP
Is a dump being written to it?
Note that your family might not even notice a "power glitch" that could
cause a computer to reboot.
Invest in a UPS. Small units, suitable for PCs and workstations can
frequently be found at flea markets or yard sales. I've seen them up to
1000 VA. If you buy used you may have to replace the battery which,
depending on the size of the unit, may cost you anywhere from $25 to
$100 US. Note that automobile batteries are NOT suitable for this
service!!!
-
Re: "Mysterious" system crashes
Richard B. Gilbert wrote:
[...]
> Do you have a dump file? SYS$SYSTEM:SYSDUMP.DMP
> Is a dump being written to it?
RABBIT::SYSTEM$ dir/dat=(cre,mod) sys$system:sysdump.dmp
Directory SYS$SYSROOT:[SYSEXE]
SYSDUMP.DMP;2 245215/245248 28-DEC-2002 23:33:40.29 12-NOV-2006 22:20:02.28
Not for a year or so...
> Note that your family might not even notice a "power glitch" that could
> cause a computer to reboot.
True.
After the second question this evening, my wife asked, "No we haven't
had a power problem - why? Are you expecting one??" :-)
She always gets a glazed-over look in her eye when my explanations are
too detailed or technical, so I've learned to keep my questions (and
answers) simple. :-) :-)
> Invest in a UPS. Small units, suitable for PCs and workstations can
> frequently be found at flea markets or yard sales. I've seen them up to
> 1000 VA. If you buy used you may have to replace the battery which,
> depending on the size of the unit, may cost you anywhere from $25 to
> $100 US. Note that automobile batteries are NOT suitable for this
> service!!!
Yes, I've thought about the "small" APC units that I see for sale at HW
stores, but I've never experienced these problems until this week, so I
never saw a need for one. I guess I'll price some units (keeping in
mind that I need to match capacity with my estimated power draw from the
CPU, disks and disk shelves). My wife keeps asking me what I want for
Yule - perhaps I have a legitimate need...
-
Re: "Mysterious" system crashes
bradhamilton wrote:
> Richard B. Gilbert wrote:
> [...]
>
>> Do you have a dump file? SYS$SYSTEM:SYSDUMP.DMP
>> Is a dump being written to it?
>
>
> RABBIT::SYSTEM$ dir/dat=(cre,mod) sys$system:sysdump.dmp
>
> Directory SYS$SYSROOT:[SYSEXE]
>
> SYSDUMP.DMP;2 245215/245248 28-DEC-2002 23:33:40.29 12-NOV-2006 22:20:02.28
>
> Not for a year or so...
>
>> Note that your family might not even notice a "power glitch" that could
>> cause a computer to reboot.
>
>
> True.
>
> After the second question this evening, my wife asked, "No we haven't
> had a power problem - why? Are you expecting one??" :-)
>
> She always gets a glazed-over look in her eye when my explanations are
> too detailed or technical, so I've learned to keep my questions (and
> answers) simple. :-) :-)
>
>> Invest in a UPS. Small units, suitable for PCs and workstations can
>> frequently be found at flea markets or yard sales. I've seen them up
>> to 1000 VA. If you buy used you may have to replace the battery
>> which, depending on the size of the unit, may cost you anywhere from
>> $25 to $100 US. Note that automobile batteries are NOT suitable for
>> this service!!!
>
>
> Yes, I've thought about the "small" APC units that I see for sale at HW
> stores, but I've never experienced these problems until this week, so I
> never saw a need for one. I guess I'll price some units (keeping in
> mind that I need to match capacity with my estimated power draw from the
> CPU, disks and disk shelves). My wife keeps asking me what I want for
> Yule - perhaps I have a legitimate need...
>
"Disk Shelves" implies a rather large installation. How many VA for the
whole thing?
-
Re: "Mysterious" system crashes
bradhamilton wrote:
[...]
Since you are not running a production system, I suggest that you run
some hardware diagnostics to try to "beat the hell" out of the CPU,
memory, and I/O bus to see if you get any failures when "VMS" is not
running. "Mysterious" system crashes suggest possible hardware
issue(s). In any event, it would be nice to KNOW that the cpu, memory,
and I/O controllers are "rock solid" before trying to track down any
possible flaky software. While you are at work, the hobbyist system has
plenty of time to .... When some things don't work 'right', it is
really helpful to know which ones of the pieces and parts ARE working
correctly. I suggest running the diagnostics for 2 to 4 times as long as
the average time between the failures you are currently experiencing. And,
yes, an appropriate small UPS is a good suggestion to rule out power issues.
Good luck at isolating this problem!
Joe H. Gallagher, Ph. D.
Former DATATRIEVE/4GL SIG Chair/Newsletter Editor
Former WRUG LUG Chair
-
Re: "Mysterious" system crashes
Richard B. Gilbert wrote:
[...]
>> Yes, I've thought about the "small" APC units that I see for sale at
>> HW stores, but I've never experienced these problems until this week,
>> so I never saw a need for one. I guess I'll price some units (keeping
>> in mind that I need to match capacity with my estimated power draw
>> from the CPU, disks and disk shelves). My wife keeps asking me what I
>> want for Yule - perhaps I have a legitimate need...
>>
>
> "Disk Shelves" implies a rather large installation. How many VA for the
> whole thing?
PWS433au: 100-120 VAC 5.5A (550-660 VA?)
2 fully-populated BA356 (top-gun blue) shelves: 100-120 VAC 7.0A
(1400-1680 VA?)
For an approximate total of:
1950-2340 VA
Assuming my math is not too far off, then approximately $675-750 for an
APC rated at 2500 VA.
Out of my price range as a mere hobbyist. :-(
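For anyone who wants to check the arithmetic, here is a minimal Python sketch of the estimate above. It assumes the nameplate figures I quoted (5.5 A for the PWS433au, 7.0 A for each of the two BA356 shelves) and simply multiplies current by the 100-120 VAC range; the function and variable names are made up for illustration. Nameplate currents are normally maximum ratings, so the real draw is probably lower than this.

```python
def va_range(amps, volts_low=100, volts_high=120):
    """Return (low, high) apparent power in VA for a nameplate current rating."""
    return (amps * volts_low, amps * volts_high)

# Nameplate figures quoted in the post (assumed to be per-unit maxima).
loads = {
    "PWS433au": (5.5, 1),     # (amps per unit, number of units)
    "BA356 shelf": (7.0, 2),
}

total_low = total_high = 0.0
for name, (amps, count) in loads.items():
    low, high = va_range(amps)
    total_low += low * count
    total_high += high * count
    print(f"{count} x {name}: {low * count:.0f}-{high * count:.0f} VA")

print(f"Total: {total_low:.0f}-{total_high:.0f} VA")
```

This reproduces the 1950-2340 VA total, so the sum itself is fine; whether the nameplate numbers reflect what the boxes actually pull is a separate question that only a meter could answer.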
-
Re: "Mysterious" system crashes
In article <473CF41C.1060500@comcast.net>,
bradhamilton wrote:
> Richard B. Gilbert wrote:
> [...]
> > Do you have a dump file? SYS$SYSTEM:SYSDUMP.DMP
> > Is a dump being written to it?
>
> RABBIT::SYSTEM$ dir/dat=(cre,mod) sys$system:sysdump.dmp
>
> Directory SYS$SYSROOT:[SYSEXE]
>
> SYSDUMP.DMP;2 245215/245248 28-DEC-2002 23:33:40.29 12-NOV-2006 22:20:02.28
>
> Not for a year or so...
Dunno if the modification date on the file is a good indicator.
Use ANALYZE/CRASH to see what, if anything, is in the dump file.
When you can tolerate downtime, force a crash, and verify that the dump
is written to the dump file as it should be. If there are any problems
with this, fix them.
-- Robert
-
Re: "Mysterious" system crashes
Now that I've had some time to think about this, another possibility
comes to mind...
The Alpha is located in a small workshop room, off of our home "office".
The office has no heating ducts, and is located next to the kitchen,
so to keep the office warm, we use a combination of:
Opening the kitchen door, and
using an oil-filled electric heater (one that looks like an
old-fashioned steam radiator).
My wife works nights, so she gets up between 3-5 PM, and opens the
kitchen door, and turns on the electric heater, to warm up the office.
She's at work right now, so I can't ask her, but tomorrow I'll ask her
what time she performed those activities on the last three days, and
I'll ask her to hold off the "ritual" until after I come home from work
tomorrow night...
Assuming my suspicion is correct, why did this only start happening a
few days ago? We've been running the heater in that "pattern" for a
month or so now...
-
Re: "Mysterious" system crashes
Robert Deininger wrote:
> In article <473CF41C.1060500@comcast.net>,
> bradhamilton wrote:
>
>> Richard B. Gilbert wrote:
>> [...]
>>> Do you have a dump file? SYS$SYSTEM:SYSDUMP.DMP
>>> Is a dump being written to it?
>> RABBIT::SYSTEM$ dir/dat=(cre,mod) sys$system:sysdump.dmp
>>
>> Directory SYS$SYSROOT:[SYSEXE]
>>
>> SYSDUMP.DMP;2 245215/245248 28-DEC-2002 23:33:40.29 12-NOV-2006 22:20:02.28
>>
>> Not for a year or so...
>
> Dunno if the modification date on the file is a good indicator.
>
> Use ANALYZE/CRASH to see what, if anything, is in the dump file.
Hi Robert,
Just an old crash from July of this year (unrelated).
See my latest post in this thread for a possible hypothesis...
[...]
-
Re: "Mysterious" system crashes
bradhamilton wrote:
> Assuming my suspicion is correct, why did this only start happening a
> few days ago? We've been running the heater in that "pattern" for a
> month or so now...
Do you have an electronic voltage meter for AC lines? Utilities
sometimes drop voltages when the power load is very high, as this reduces
demand (although this is not true for compact fluorescent and switched
power supplies, but was true of incandescent, ovens and conventional
electric heating).
A defective/old transformer that serves your house may also end up
providing lower than expected voltages, and if the computer's power supply
is getting old, that could explain it.
Another possibility is lint accumulating in your computer's navel(s), and
when your wife turns on the heater, it causes the temperature in the
computer to rise to the point where it automatically shuts off. The
margin isn't very large between normal operating temperature
(about 40-48°C) and the default shutoff at 55°C. So if ventilation is
poor and you have a heater near the unit that causes warmer than usual
air to be sucked into the computer, that could explain it.
-
Re: "Mysterious" system crashes
look in the errorlog as crashes should show as bug check and startup
entries
-
Re: "Mysterious" system crashes
IanMiller wrote:
> look in the errorlog as crashes should show as bug check and startup
> entries
BTW - thanks for reminding me - DIAG is not installed (I can't use WEBES
or such, because of the age of the system). ISTR that I've attempted to
install DIAG in the past, and met with failure, perhaps because of my
status as a hobbyist - wasn't DIAG "crippled" in some major way because
of the lack of a license (which implies a service contract)?
I'll hunt around to see if I still have a DIAA kit hanging around somewhere.
-
Re: "Mysterious" system crashes
IanMiller wrote:
> look in the errorlog as crashes should show as bug check and startup
> entries
Thanks for the reminder - DIAG installed OK, but isn't telling me much...
**** V3.4 ********************* ENTRY 119 ********************************
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V8.3
Event sequence number 20.
Timestamp of occurrence 15-NOV-2007 17:04:03
Time since reboot 0 Day(s) 20:41:11
Host name RABBIT
System Model Digital Personal WorkStation
Entry Type 38. Time Stamp Entry
SWI Minor class 7. Timestamp
**** V3.4 ********************* ENTRY 120 ********************************
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V8.3
Event sequence number 0.
Timestamp of occurrence 15-NOV-2007 17:12:24
Time since reboot 0 Day(s) 0:00:16
Host name RABBIT
System Model Digital Personal WorkStation
Entry Type 32. Cold Start (ie: System Boot)
SWI Minor class 2. System startup
TODR xB4225913
...unless xB4225913 can be translated to read, "power spike or
transient, don't *do* that, it hurts..."
:-)
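The two entries do at least bracket the outage. Here is a rough Python sketch that works out the window from the fields shown above. It assumes the "Time since reboot" value on the cold start entry counts from the beginning of that boot; the helper names are invented for this example and are not part of any VMS tool.

```python
from datetime import datetime, timedelta

MONTHS = {m: i for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

def parse_vms_time(text):
    """Parse a 'DD-MMM-YYYY HH:MM:SS' timestamp as shown in the listing."""
    date_part, time_part = text.split()
    day, mon, year = date_part.split("-")
    hour, minute, second = (int(x) for x in time_part.split(":"))
    return datetime(int(year), MONTHS[mon.upper()], int(day), hour, minute, second)

def parse_uptime(text):
    """Parse 'D Day(s) HH:MM:SS' from the 'Time since reboot' field."""
    days, _, clock = text.split()
    hours, minutes, seconds = (int(x) for x in clock.split(":"))
    return timedelta(days=int(days), hours=hours, minutes=minutes, seconds=seconds)

# Values copied from entries 119 and 120 above.
last_stamp = parse_vms_time("15-NOV-2007 17:04:03")
last_uptime = parse_uptime("0 Day(s) 20:41:11")
cold_start = parse_vms_time("15-NOV-2007 17:12:24")
boot_uptime = parse_uptime("0 Day(s) 0:00:16")

previous_boot = last_stamp - last_uptime   # when the previous boot happened
this_boot = cold_start - boot_uptime       # when the new boot began (assumption)

print("Previous boot:    ", previous_boot)
print("Last sign of life:", last_stamp)
print("This boot began:  ", this_boot)
print("Unaccounted gap:  ", this_boot - last_stamp)
```

By that reckoning the previous boot was on 14-NOV-2007 at 20:22:52, and the system went away somewhere in the roughly eight minutes between 17:04:03 and 17:12:08, with no bugcheck entry logged in between.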
-
Re: "Mysterious" system crashes
In article <473CF41C.1060500@comcast.net>, bradhamilton writes:
>
>
>Richard B. Gilbert wrote:
>[...]
>> Do you have a dump file? SYS$SYSTEM:SYSDUMP.DMP
>> Is a dump being written to it?
>
>RABBIT::SYSTEM$ dir/dat=(cre,mod) sys$system:sysdump.dmp
>
>Directory SYS$SYSROOT:[SYSEXE]
>
>SYSDUMP.DMP;2 245215/245248 28-DEC-2002 23:33:40.29 12-NOV-2006 22:20:02.28
>
>Not for a year or so...
>
>> Note that your family might not even notice a "power glitch" that could
>> cause a computer to reboot.
>
>True.
>
>After the second question this evening, my wife asked, "No we haven't
>had a power problem - why? Are you expecting one??" :-)
>
>She always gets a glazed-over look in her eye when my explanations are
>too detailed or technical, so I've learned to keep my questions (and
>answers) simple. :-) :-)
>
>> Invest in a UPS. Small units, suitable for PCs and workstations can
>> frequently be found at flea markets or yard sales. I've seen them up to
>> 1000 VA. If you buy used you may have to replace the battery which,
>> depending on the size of the unit, may cost you anywhere from $25 to
>> $100 US. Note that automobile batteries are NOT suitable for this
>> service!!!
>
>Yes, I've thought about the "small" APC units that I see for sale at HW
>stores, but I've never experienced these problems until this week, so I
>never saw a need for one. I guess I'll price some units (keeping in
>mind that I need to match capacity with my estimated power draw from the
>CPU, disks and disk shelves). My wife keeps asking me what I want for
>Yule - perhaps I have a legitimate need...
I have written and I have been selling (no, it hasn't made me wealthy)
software for VMS and APC's UPS units. I often do not see glitches in
line power that the APC and the software report. I can be working and,
if not for the email reports I have the software send, I would never
know there was a power event.
FWIW, get:
APC's Smart UPS
---------------
APC's Back UPS
--
VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)COM
"Well my son, life is like a beanstalk, isn't it?"
http://tmesis.com/drat.html
-
Re: "Mysterious" system crashes
In article <9fb3d$473d39d1$cef8887a$31145@TEKSAVVY.COM>, JF Mezei writes:
>
>
>bradhamilton wrote:
>> Assuming my suspicion is correct, why did this only start happening a
>> few days ago? We've been running the heater in that "pattern" for a
>> month or so now...
>
>Do you have an electronic voltage meter for AC lines? Utilities
>sometimes drop voltages when the power load is very high, as this reduces
>demand (although this is not true for compact fluorescent and switched
>power supplies, but was true of incandescent, ovens and conventional
>electric heating).
The software I wrote about in my previous post in this thread can show the
voltage as monitored off of an APC Smart-UPS (or better) UPS. I have,
during the long hot spells of summer, seen the line voltage in my area
drop by more than 10% of the nominal line voltage. I have my Smart-UPS 'trip'
when the voltage drops below 106Vac (with 117Vac being nominal voltage)
to provide the proper line voltage to my equipment.
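To put numbers on that, here is a small Python sketch. The 117 Vac nominal and 106 Vac trip point are the figures given above; the readings in the loop are made-up examples rather than measurements, and the function names are only illustrative (this is not the monitoring software itself).

```python
NOMINAL_VAC = 117.0   # nominal line voltage quoted in the post
TRIP_VAC = 106.0      # Smart-UPS trip threshold quoted in the post

def sag_percent(measured, nominal=NOMINAL_VAC):
    """Percentage by which a measured voltage is below nominal."""
    return (nominal - measured) / nominal * 100.0

def ups_would_trip(measured, trip=TRIP_VAC):
    """True if the measured voltage is below the trip threshold."""
    return measured < trip

for volts in (117.0, 110.0, 106.0, 105.0):  # hypothetical readings
    print(f"{volts:6.1f} Vac  sag {sag_percent(volts):4.1f}%  "
          f"trip={ups_would_trip(volts)}")
```

A 106 Vac threshold on a 117 Vac nominal line corresponds to a sag of about 9.4%, so a drop of more than 10% lands below the trip point.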
--
VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)COM
"Well my son, life is like a beanstalk, isn't it?"
http://tmesis.com/drat.html
-
Re: "Mysterious" system crashes
In article <473D0765.2050904@comcast.net>,
bradhamilton wrote:
> Robert Deininger wrote:
> > In article <473CF41C.1060500@comcast.net>,
> > bradhamilton wrote:
> >
> >> Richard B. Gilbert wrote:
> >> [...]
> >>> Do you have a dump file? SYS$SYSTEM:SYSDUMP.DMP
> >>> Is a dump being written to it?
> >> RABBIT::SYSTEM$ dir/dat=(cre,mod) sys$system:sysdump.dmp
> >>
> >> Directory SYS$SYSROOT:[SYSEXE]
> >>
> >> SYSDUMP.DMP;2 245215/245248 28-DEC-2002 23:33:40.29 12-NOV-2006 22:20:02.28
> >>
> >> Not for a year or so...
> >
> > Dunno if the modification date on the file is a good indicator.
> >
> > Use ANALYZE/CRASH to see what, if anything, is in the dump file.
>
> Hi Robert,
>
> Just an old crash from July of this year (unrelated).
> See my latest post in this thread for a possible hypothesis...
> [...]
Unless you force a crash, and actually see the resulting crash dump in
the dump file, you can't rule out a configuration error. (See the
System Manager's Manual for advice on dump files and forcing a crash.)
This is 5 or 10 minutes' work, and should be standard procedure whenever
VMS has been installed, upgraded, or reconfigured.
Another useful diagnostic in your situation would be a serial console,
connected to something that can log all the output.
You might be getting some useful information from the FW or from VMS,
but if you have a graphics console, and automatically reboot, the output
will be gone by the time you get home.
-- Robert
-
Re: "Mysterious" system crashes
In article <473CD50D.70109@comcast.net>, bradhamilton writes:
>
> These crashes happen while I am away at work, so I guess the system
> misses me. :-) They do not happen at the same time each day, and
> questioning of family members that are home at the time reveal no
> unusual "power" glitches.
>
What does analyze/crash sys$system:sysdump.dmp first report?
Minor power glitches, too small to be seen by most consumer
electronics, can disturb older VAXen.
-
Re: "Mysterious" system crashes
In article <473CF41C.1060500@comcast.net>, bradhamilton writes:
>
> SYSDUMP.DMP;2 245215/245248 28-DEC-2002 23:33:40.29 12-NOV-2006 22:20:02.28
>
The system crash code doesn't mess with anything it doesn't
absolutely have to. Ignore the dates.