OpenServer 6.0.0 hangs randomly. Please help! - SCO

  1. OpenServer 6.0.0 hangs randomly. Please help!


    Hi all,

    I have a customer with a big problem in a very critical server. SCO
    OpenServer 6.0.0 hangs randomly without any error message...

    The scenario:

    Server:
    -----------------------------------------------------

    - Hewlett Packard ProLiant ML370-G4
    - Two Intel Xeon @ 3.6 GHz, 3 GB RAM
    - Dual channel on-board LSI U320-SCSI controller
    - HP SmartArray 6404, four channel U320-SCSI RAID adapter
    - Two HP disks, 146 GB, RAID-1, installed in the server's hot-swap
    cabinet, connected to 6404's channel #1 (Logical Unit 1)
    - External cabinet Hewlett Packard StorageWorks Modular Smart Array 30
    (MSA30), dual bus, 14 bays. First bus (bays 1 to 7) connected to
    6404's channel #3. Second bus (bays 8 to 14) connected to 6404's
    channel #4
    - 6404's channel #2, not used
    - Four HP disks, 76 GB, U320-SCSI, RAID 0+1, located in bays 1, 2, 8,
    9 of MSA30 (Logical Unit 2)
    - Five HP disks, 300 GB, U320-SCSI, RAID 5, located in bays 3, 4, 5,
    6, 7 of MSA30 (Logical Unit 3)
    - Three HP disks, 300 GB, U320-SCSI, RAID 5, located in bays 10, 11,
    12 of MSA30 (Logical Unit 4)
    - Bays 13 and 14 are empty
    - HP StorageWorks Ultrium 460 SCSI Tape connected to LSI HBA
    - Redundant Power Supplies in the ML370 and MSA30
    - ML370 and MSA 30 plugged into an on-line Merlin Gerin UPS with two
    power cords each
    - All firmware and BIOSes (motherboard, HBAs, disks) updated to the
    latest version (HP FW Maintenance CD 7.90, 03/aug/2007)


    Operating System and Patches (custom & pkginfo)
    ----------------------------------------------------

    - SCO OpenServer 6.0.0, Enterprise Edition
    - Maintenance Pack 2 (MP2)
    - OSS703A
    - OSS706C
    - Network Drivers (nd 8.0.6e)
    - mtools 3.9.10Sa
    - Graphical User Interface (Qt)
    - Mozilla 1.7.13Ba
    - KDE 3.5.2
    - KDE i18n Language Support 3.5.2 (set to Spanish)
    - Java 2 Standard Edition Runtime Environment 1.4.2.06
    - Java 2 Standard Edition Runtime Environment 1.5.0.06
    - mpt driver (LSI) 8.0.2 (from OSR6 CD)
    - ciss driver (SmartArray 6404) 8.0.2 (from OSR6 CD)
    - Generic IDE/ATAPI Driver
    - HP ProLiant Support Pack 7.770a for SCO OpenServer 6 (latest at HP
    website), including:
    --- HP Proliant Extended Feature Supplement
    --- HP Proliant EFS Documentation Package 7.770a
    - Samba 3.0.20



    Logical Units
    --------------------------------------------

    Logical Unit 1 (RAID 1) contains root, swap and /u filesystems. /u -->
    TransTOOLs MultiBase 3.0 Database Engine
    Logical Unit 2 (RAID 0+1) contains one filesystem --> Progress 8.3B
    Database Engine
    Logical Unit 3 (RAID 5) contains three filesystems --> Files
    Logical Unit 4 (RAID 5) contains one filesystem --> Files


    About 50 PCs (Windows 2000/XP) running a Client-Server application
    with Progress Runtime
    A few PCs (Windows 2000/XP) running a Client-Server application with
    MultiBase Runtime
    About 20 HP Network Printers driven with "netcat"
    scologin is disabled; X is started only from the shell with "startx".
    The graphical environment is not used in daily work.



    And now the weird problem...


    The system runs flawlessly and then suddenly hangs. No Notices, Warnings
    or Panics in /usr/adm/messages or /usr/adm/syslog...

    No keyboard at console, no telnet connections... NO NOTHING! It just
    hangs!

    The only option is to reset or power-cycle the server, enter
    maintenance mode, run a full fsck on all filesystems (about 45 min.),
    and bring it up again... until the next crash.

    The system was installed in February 2007, and the problems have been
    happening since the beginning.

    Hewlett Packard replaced the motherboard and power supplies, without
    success.

    There is no pattern to the hangs. They may occur once in 15 days or
    twice in the same day.

    The same software and applications ran without problems for years on
    an old HP NetServer LH 6000 with OSR 5.0.5.

    The only strange messages in /usr/adm/messages and /usr/adm/syslog are
    related to Samba. There are A LOT of error messages, but this happens
    in all OSR6 boxes I've installed and I think this must be treated in
    another thread.


    What can I do? I'm desperate, not to mention my customer...

    My apologies for the long post.

    Any help will be much appreciated!!!

    Many thanks in advance and best regards.


    Greetings from Spain.
    --
    Alberto Rodriguez Rodriguez
    alberto----@----unilogic.es


  2. Re: OpenServer 6.0.0 hangs randomly. Please help!

    On Tue, 21 Aug 2007, Alberto Rodriguez wrote:
    > Hi all,
    >
    > I have a customer with a big problem in a very critical server. SCO
    > OpenServer 6.0.0 hangs randomly without any error message...

    ....

    I have seen this happen with bad memory. That is why I have customers
    use only parity memory modules, with parity checking enabled. I have
    found it worthwhile to run a Linux utility to check memory for 96 hours;
    random memory problems often cause what you are seeing.

    --
    Boyd Gerber
    ZENEZ 1042 East Fort Union #135, Midvale Utah 84047

  3. Re: OpenServer 6.0.0 hangs randomly. Please help!

    In article <1187725153.026239.209070@r29g2000hsg.googlegroups.com>,
    Alberto Rodriguez wrote:
    >
    >Hi all,
    >
    >I have a customer with a big problem in a very critical server. SCO
    >OpenServer 6.0.0 hangs randomly without any error message...


    I'll second what Boyd says.

    The HW vendor did NOT follow my specs and had a machine that would
    not recognize nor use ECC memory.

    We tried several things including putting in an AC recording line
    monitor. We looked around the area to see if anyone had a
    high-power transmitter that could be causing the problem.

    And - since I really thought it had ECC we never suspected that.

    Got the HW vendor to replace the MB with one that supported
    ECC and put in ECC and problem solved.

    I've also seen places where the computer is on a circuit that
    has some devices that make the line unstable: things such as
    a new refrigerator in the lunch room next to the computer room,
    PCs set up against a wall with an incoming power panel on the
    other side, microwave ovens, and anything else that may be
    on the power line.

    And the strangest one I had was where we put in an excellent BEST
    power supply for a critical office [they thought they were critical
    and it was the secretaries of the President that were using this].

    Things were good for a while, and then came unexpected reboots. There
    were power sags in that area that the BEST should have taken care
    of. I monitored its output on install, and we could see less than
    0.1 V difference in output with over a 20 V swing on input.

    It turns out the secretaries re-arranged their office.

    The computer was plugged into the wall socket. Their cheap
    transistor radio was the only thing plugged into the $$$ BEST.

    As Boyd says - it really sounds like memory.

    If that doesn't fix it, go through the litany I posted above
    [which is far from complete].

    Bill
    --
    Bill Vermillion - bv @ wjv . com

  4. Re: OpenServer 6.0.0 hangs randomly. Please help!

    In article ,
    Bill Campbell wrote:
    >On Tue, Aug 21, 2007, Bill Vermillion wrote:
    >>In article <1187725153.026239.209070@r29g2000hsg.googlegroups.com>,
    >>Alberto Rodriguez wrote:


    >>>Hi all,


    >>>I have a customer with a big problem in a very critical server. SCO
    >>>OpenServer 6.0.0 hangs randomly without any error message...


    >>I'll second what Boyd says.


    >>The HW vendor did NOT follow my specs and had a machine that would
    >>not recognize nor use ECC memory.


    >Bela and others went into long rants when Intel and others were
    >pushing the idea that we don't need parity checking RAM.


    Yup. And when I was running several SGI machines they really
    had error correcting RAM, and every now and then I'd see a message
    in the log file about a correction being made. That was much nicer
    than just parity checking to tell you that there was a problem.
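    The difference Bill describes (parity only says "something is wrong",
    ECC fixes the error and logs a correction) can be shown with a toy
    Hamming(7,4) code. This is a textbook sketch with hypothetical function
    names, not the scheme SGI or HP memory controllers actually use: real
    ECC DIMMs protect wider words (for example 64 data bits plus 8 check
    bits) and add an extra bit so double-bit errors are detected rather
    than miscorrected. The point is only that the syndrome names the bad
    bit, which is what makes correction possible.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Toy illustration only; real ECC memory uses wider codes.
    Bit positions are 1..7; parity bits sit at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(word: list[int]) -> tuple[list[int], int]:
    """Return (corrected data bits, position of the flipped bit, or 0 if none)."""
    fixed = list(word)
    syndrome = 0
    for pos, bit in enumerate(fixed, start=1):
        if bit:
            syndrome ^= pos  # XOR of the positions of all 1 bits; 0 for a valid codeword
    if syndrome:
        fixed[syndrome - 1] ^= 1  # flip the bad bit back: this is the "correction"
    return [fixed[2], fixed[4], fixed[5], fixed[6]], syndrome


codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1  # simulate a single-bit memory error at position 5
print(hamming74_correct(codeword))  # ([1, 0, 1, 1], 5)
```

    A plain parity bit over the same data would only report that the
    word was bad; here the returned position is what an ECC subsystem
    would record in its log after silently repairing the data.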

    >>We tried several things including putting in an AC recording line
    >>monitor. We looked around the area to see if anyone had a
    >>high-power transmitter that could be causing the problem.


    >>And - since I really thought it had ECC we never suspected that.


    >>Got the HW vendor to replace the MB with one that supported
    >>ECC and put in ECC and problem solved.


    >>I've also seen places where the computer is on a circuit that
    >>has some devices that make the line unstable. Such things
    >>as a new refrigerator in the lunch room next to the computer room,
    >>setting up PCs while on the other side of the wall was an incoming
    >>power panel, microwave ovens, and anything else that may be
    >>on the power line.


    >Welding and machine shops are fun too.


    The only problem I had with a machine shop was at one with HUGE
    Poreba lathes from Poland, as nothing that big was being made in the
    US anymore. One lathe would take a piece 23 inches in diameter
    and FORTY FEET LONG - and cut threads on it - used for feedscrews
    in the plastic injection industry. He also had a gun-barrel
    drilling machine to put a hole through these 40 foot pieces, so
    that device had to have 80 feet of space to be able to get
    the drilling tool back far enough to get to the end.

    He's gotten out of that and is now running web-sites geared toward
    that industry and I have about 10 of those on a dedicated server
    he has in our Level 3 rack.

    But back to the story. I had just gotten a new UPS installed so
    the owner decided to test it by pulling the plug from the wall.

    Instantly everything crashed.

    The building was fairly large, and the Wyse terminals at the far
    end were on a different circuit - and the serial board had
    pin 1 (frame ground) connected. [Better quality multi-port
    serial boards had this pin disconnected.]

    Since the terminals were in another part of the building, they were
    on a different leg of the circuit, and there was 110V coming down
    the pin 1 wire from the terminal back to the computer.

    Luckily nothing was damaged, but the next step was to make SURE
    that no frame grounds were EVER connected to the serial cables.

    >>And the strangest one I had was where we put in an excellent BEST
    >>power supply for a critical office [they thought they were critical
    >>and it was the secretaries of the President that were using this].


    >>Things were good for a while and then unexpected reboots - and there
    >>were power-sags in that area that the BEST should have taken care
    >>of. I monitored its output on install and we could see less than
    >>.1 V difference in output with over a 20V swing on input.


    >>It turns out the secretaries re-arranged their office.


    >>The computer was plugged into the wall socket. Their cheap
    >>transistor radio was the only thing plugged into the $$$ BEST.


    >Did the secretaries also have floppies stored on the side of filing
    >cabinets with refrigerator magnets?


    Not really. These two ladies had been there since the college
    started - about 25 years before. It took one lady a LONG time
    to stop using 'l' instead of '1' on the keyboard - as she grew up
    in the typewriter days when there was no '1' on the keyboard.

    But they were running Wyse 160 terminals - and I had two ports
    on those set up as the apps wanted entirely different terminal
    setups, and that was the best and easiest way to run things.
    They also liked being able to hot-key between applications.
    This was on SCO's Xenix and I had the Specialix RIO installed
    with a huge loop running through about 6 offices. I really liked
    that system as it was pretty much self healing. Lose a machine in
    the middle and it would re-route. Sort of like a mini sonet ring.

    >One of the strangest problems I came across was a Radio Shack
    >Model II that was intermittently failing. We finally traced
    >the problem to a bad ballast in the lighting fixture above the
    >computer.


    Argh. And I've had a problem - in the place above - where whoever
    ran the serial cables from room to room ran them over and
    alongside the fluorescent lights.

    >In 1981 or 1982 I got a bunch of calls about strange problems,
    >which turned out to be the result of major sun spot activity.


    Ah yes. That was back when chips were quite susceptible to outside
    interference and the best chips were the ones in ceramic not
    plastic.

    I'm really glad we don't have problems like that anymore - at least
    not very often.

    Bill

    --
    Bill Vermillion - bv @ wjv . com

  5. Re: OpenServer 6.0.0 hangs randomly. Please help!

    On 21 Aug, 21:39, Alberto Rodriguez wrote:
    > Hi all,
    >
    > I have a customer with a big problem in a very critical server. SCO
    > OpenServer 6.0.0 hangs randomly without any error message...

    [ Detail removed]
    > What can I do? I'm desperate, not to mention my customer...


    Alberto,

    As a start check out:

    http://wdb1.sco.com/kb/showta?taid=116163

    If the server is critical and the issue has been happening since last
    February, I would recommend that you also escalate the issue to SCO
    Support via your support provider.

    John





  6. Re: OpenServer 6.0.0 hangs randomly. Please help!

    Alberto,

    I suggest you try the procedure here:
    http://osr600doc.sco.com/en/SM_troub...ing_crash.html

    It's been many years since I tried this myself, and it's going to
    lengthen the time to reboot after the next crash. But you will likely
    be able to isolate the issue.

    Or you can just replace the memory.

    Mark





  7. Re: OpenServer 6.0.0 hangs randomly. Please help!

    On Aug 21, 3:39 pm, Alberto Rodriguez wrote:
    > Hi all,
    >
    > I have a customer with a big problem in a very critical server. SCO
    > OpenServer 6.0.0 hangs randomly without any error message...


    Check the file /stand/boot. Make sure a line "KHZ=100" is there, or
    add it and reboot.
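    Mike's fix comes down to making sure the boot file carries exactly
    one KHZ=100 line. As a sketch, assuming the file is a plain list of
    KEY=VALUE lines, a hypothetical helper could rewrite an existing
    KHZ=... line or append one. It needs Python 3 and is meant to be run
    against a copy of the file; Mike names /stand/boot while Alberto
    edited /etc/default/boot, so use whichever path applies on your
    system, keep a backup, and note the change only takes effect after
    a reboot.

```python
from pathlib import Path


def ensure_boot_setting(path: Path, key: str = "KHZ", value: str = "100") -> bool:
    """Ensure the boot file contains `key=value`; return True if the file changed.

    Assumes a simple KEY=VALUE-per-line format (an assumption, not taken from
    SCO documentation). Every existing `key=...` line is rewritten to the wanted
    value; if none exists, one is appended at the end.
    """
    wanted = f"{key}={value}"
    out, found, changed = [], False, False
    for line in path.read_text().splitlines():
        if line.split("=", 1)[0].strip() == key:
            found = True
            changed |= line.strip() != wanted
            out.append(wanted)  # normalise any existing KHZ line, e.g. KHZ=1000
        else:
            out.append(line)  # comments and other settings are kept untouched
    if not found:
        out.append(wanted)
        changed = True
    if changed:
        path.write_text("\n".join(out) + "\n")
    return changed
```

    For example, ensure_boot_setting(Path("boot.copy")) returns True the
    first time it has to change the file and False once the line is
    already correct, so running it twice is harmless.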


    Mike


  8. Re: OpenServer 6.0.0 hangs randomly. Please help!

    Hi all,

    THANK YOU to all of you guys for your interest and fast answers. You
    are GREAT!!!

    John wrote:
    >
    > As a start check out:
    >
    > http://wdb1.sco.com/kb/showta?taid=116163




    Mike wrote:
    >
    > Check the file /stand/boot. Make sure a line "KHZ=100" is there, or
    > add it and reboot.
    >
    > Mike



    Following the link supplied by John, there is another link
    (http://www.sco.com/ta/126735) about UnixWare 7.1.4 which explains
    the behaviour of the KHZ parameter (also valid for OSR6).

    This parameter is changed from KHZ=100 to KHZ=1000 in OpenServer
    6.0.0 when patch OSS706B or OSS706C is loaded (not OSS706A).

    It is known that this change "... may be related to some system hangs
    or reboots reported to SCO, as described above. These issues are under
    investigation."

    The latest ptf9052g for UW 7.1.4 resets this parameter to its
    original value (KHZ=100).

    So I agree with Mike: this may be the origin of the trouble. I've
    added KHZ=100 to /etc/default/boot, and the server will be rebooted
    tonight at midnight. I'll have to wait a few days for results, and
    I'll post them here.

    To everyone who mentioned memory problems: I forgot to say that the
    memory in this server is HP ECC memory, 6 DIMMs of 512 MB each, DDR2.
    Memory, motherboard, RAID controllers, etc. have been tested with the
    test programs supplied by HP, with no errors (though not for 96 hours
    as Boyd proposed...)

    Thank you all again. I'll keep you informed.

    Best regards.
    --
    Alberto Rodriguez Rodriguez







  9. Re: OpenServer 6.0.0 hangs randomly. Please help!



    Hi Alberto,

    The ML370 G4 has a very robust Advanced ECC memory subsystem.
    I suppose hardware problems are possible, but they would be recorded
    in the hardware log. If you have the Management Agents loaded, you can
    access the log with a web browser on port 2301. I think it is the
    KHZ tunable; I have seen the same issue on other servers.
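    Before pointing a browser at port 2301, you could check from another
    machine that something is listening there at all. The helper below
    is hypothetical, not an HP tool: a plain TCP connect test shows only
    that a service accepts connections on that port, not that the
    Management Agents are healthy or that the hardware log is clean.

```python
import socket


def port_open(host: str, port: int = 2301, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name lookup failed
        return False
```

    For example, port_open("proliant.example.local") (a placeholder host
    name) returning True suggests the agents' web page should answer at
    http://proliant.example.local:2301/.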

    Here is a 10-page PDF on the ProLiant 300 series memory features:

    http://h20000.www2.hp.com/bizsupport...Fc00218059.pdf

    Mike


  10. Re: OpenServer 6.0.0 hangs randomly. Please help!


    ----- Original Message -----
    From: "Bill Campbell"
    Newsgroups: comp.unix.sco.misc
    To:
    Sent: Tuesday, August 21, 2007 7:12 PM
    Subject: Re: OpenServer 6.0.0 hangs randomly. Please help!


    > On Tue, Aug 21, 2007, Bill Vermillion wrote:
    >>In article <1187725153.026239.209070@r29g2000hsg.googlegroups.com>,
    >>Alberto Rodriguez wrote:
    >>>
    >>>Hi all,
    >>>
    >>>I have a customer with a big problem in a very critical server. SCO
    >>>OpenServer 6.0.0 hangs randomly without any error message...

    >>
    >>I'll second what Boyd says.
    >>
    >>The HW vendor did NOT follow my specs and had a machine that would
    >>not recognize nor use ECC memory.

    >
    > Bela and others went into long rants when Intel and others were pushing
    > the idea that we don't need parity checking RAM.


    Actually, why weren't they right?
    Yes, you need more robust memory solutions, but why couldn't you implement
    the parity checking and/or error correction in the chipset using extra
    sticks of any-old RAM instead of building it into each stick of RAM?
    RAM-RAID, as it were. Which I think they actually have. That would take
    quite a big chunk of overhead off our backs, and it would be better for
    everyone if a whole type of RAM, with its design and manufacturing chain,
    went away completely and those resources went into making more regular RAM.

    Brian K. White brian@aljex.com http://www.myspace.com/KEYofR
    +++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
    filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!


  11. Re: OpenServer 6.0.0 hangs randomly. Please help!

    Brian K. White wrote:

    > From: "Bill Campbell"
    >
    > > On Tue, Aug 21, 2007, Bill Vermillion wrote:
    > >>
    > >>The HW vendor did NOT follow my specs and had a machine that would
    > >>not recognize nor use ECC memory.

    > >
    > > Bela and others went into long rants when Intel and others were pushing
    > > the idea that we don't need parity checking RAM.

    >
    > Actually why weren't they right?
    > Yes you need more robust memory solutions, but why couldn't you implement
    > the parity checking and-or error correcting in the chipset using extra
    > sticks of any-old ram instead of building it into each stick of ram?
    > Ram-raid as it were. Which I think they have actually. That would be quite a
    > big chunk of overhead off our backs and better for everyone if a whole type
    > of ram and it's design and manufacture chain went completely away and those
    > resources just went into making more regular ram.


    Such a scheme would have met my requirements -- it's still a parity or
    ECC scheme, even if the memory sticks you install into the machine don't
    have extra parity bits.

    But... your scheme basically wouldn't work. It would limit the
    amount of RAM in the machine to the amount that could be covered by
    the parity/ECC RAM bits in the chipset (so you would have to buy a
    fancier chipset model for a larger machine). Worse, you would need
    a different speed of chipset internal parity/ECC RAM for each speed
    of external memory stick (look how much trouble ensues when you use
    mismatched sticks in a single machine, even if they're the _same_
    speed and possibly even the same SKU, but different batches from the
    manufacturer...) Worst of all, the timing for it just wouldn't work
    very well and you'd end up with a flaky design.

    And all that just to save 1 bit in 9. It's not "quite a big chunk of
    overhead". If parity/ECC RAM is more expensive than regular RAM, it
    certainly isn't because of the 1/9 extra hardware, it's due to low
    demand because the chipset and system designers don't use it.
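    Bela's "1 bit in 9" figure is easy to check. The sketch below is only an
    illustration, and its function names are made up rather than taken from
    any hardware or SCO interface. It does three things:

    - computes an even-parity bit for one byte;
    - shows that flipping a single bit is detected;
    - works out the storage overhead for byte parity (1 check bit per 8 data
      bits) and for the common 72/64 SECDED ECC word (8 check bits per 64
      data bits).

    Both layouts spend 1/9 of the stored bits on checking.

```python
from fractions import Fraction


def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the byte has an odd number of 1-bits."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """True if the byte still agrees with the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


def overhead(data_bits: int, check_bits: int) -> Fraction:
    """Fraction of all stored bits that are spent on checking."""
    return Fraction(check_bits, data_bits + check_bits)


if __name__ == "__main__":
    data = 0b1011_0010               # four 1-bits, so the parity bit is 0
    p = parity_bit(data)
    corrupted = data ^ 0b0000_1000   # flip a single bit in "memory"
    print(parity_ok(data, p), parity_ok(corrupted, p))  # True False
    print(overhead(8, 1), overhead(64, 8))               # 1/9 1/9
```

    A single parity bit only detects an odd number of flipped bits and cannot
    repair anything. Correcting single-bit errors needs the extra check bits
    of an ECC code such as SECDED, which is why ECC DIMMs use 72-bit-wide words.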

    (I was going to say something here about how a certain large
    hardware-spec-setting OS vendor should push parity or ECC for all
    systems (down to desktops & laptops) so they could show us how much of
    the instability really isn't their fault at all, it's the memory -- then
    it occurred to me that perhaps it wouldn't make much difference and
    would only further highlight the true source of instability...)

    >Bela<


  12. Re: OpenServer 6.0.0 hangs randomly. Please help!

    This is very similar to my issue.

    I have an ML350 G5, put into service around the first of June.

    It randomly locks up: I still have access to the terminal, but the network
    shuts down. Nothing can come in, nothing can go out.

    I currently have a new motherboard sitting on my floor; I plan to swap
    it in tomorrow and see what that does.

    Hate to say this, but I'm glad I'm not the only one with issues.

    Alberto, please keep us/me informed, as I will also post an update on my
    situation when I know something.

    -David


  13. Re: OpenServer 6.0.0 hangs randomly. Please help!

    On 29 Aug, 20:58, "David C. Moody" wrote:
    > This is very similar to my issue.
    >
    > I have an ML350 G5, put into service around the first of June.
    >
    > It randomly locks up: I still have access to the terminal, but the network
    > shuts down. Nothing can come in, nothing can go out.
    >
    > I currently have a new motherboard sitting on my floor; I plan to swap
    > it in tomorrow and see what that does.
    >
    > Hate to say this, but I'm glad I'm not the only one with issues.
    >
    > Alberto, please keep us/me informed, as I will also post an update on my
    > situation when I know something.
    >
    > -David


    David,

    The server has now been up and running for seven days without problems,
    since the day I set the parameter KHZ=100 in /stand/boot.

    But I think it's too soon to celebrate. I should wait for at least
    three or four weeks without hangs before concluding that things are OK.

    Anyway, I'll keep you informed weekly.

    Best regards
    --
    Alberto Rodriguez Rodriguez


  14. Re: OpenServer 6.0.0 hangs randomly. Please help!

    Eighteen days without hangs!!!

    --
    Alberto Rodriguez Rodriguez


  15. Re: OpenServer 6.0.0 hangs randomly. Please help!

    Just an update from me.

    I tried the KHZ=100 parameter and my problem remains the same. Every
    3-5 days my network connection shuts down and I cannot access anything
    via the NIC card.

    I've already replaced the motherboard (and with it the built-in NIC); now HP is
    putting me through another round of tests.

    This is just ridiculous.

    Any other help would be greatly appreciated.

    -David


  16. Re: OpenServer 6.0.0 hangs randomly. Please help!

    In article <1189473775.705261.268090@w3g2000hsg.googlegroups.com>,
    David C. Moody wrote:
    >Just an update from me.
    >
    >I tried the KHZ=100 parameter and my problem remains the same. Every
    >3-5 days my network connection shuts down and I cannot access anything
    >via the NIC card.
    >
    >I've already replaced the motherboard (and with it the built-in NIC); now HP is
    >putting me through another round of tests.
    >
    >This is just ridiculous.
    >
    >Any other help would be greatly appreciated.
    >
    >-David


    How about some more information? I had a client with
    a Sonic-Wall [I think that was it - I get some clients confused]
    who would set the system on the weekend so the owner could log in
    from home, and then on Monday they could not access the network.

    So they would reboot the machine. After doing this several times
    they called me, and the fix for them was 1) don't do this [which
    was not a solution] or 2) just restart the network.

    I just wrote a 2 or 3 line script so they did not have to remember
    the CLI, other than to log in and type one command.

    What do the network stats show when you can't connect? E.g., IP
    number, and any other messages. Check them again after you
    restart and see what happens.

    What is your network connection type? A direct link such
    as shared or dedicated T1, ?DSL, cable modem, whatever.

    I really expect some external change is causing this and not
    anything internal to SCO.

    Bill


    --
    Bill Vermillion - bv @ wjv . com

  17. Re: OpenServer 6.0.0 hangs randomly. Please help!

    On Sep 11, 8:02 am, b...@wjv.com (Bill Vermillion) wrote:
    > In article <1189473775.705261.268...@w3g2000hsg.googlegroups.com>,
    > David C. Moody wrote:
    >
    > >Just an update from me.
    > >
    > >I tried the KHZ=100 parameter and my problem remains the same. Every
    > >3-5 days my network connection shuts down and I cannot access anything
    > >via the NIC card.
    > >
    > >I've already replaced the motherboard (and with it the built-in NIC); now HP is
    > >putting me through another round of tests.
    > >
    > >This is just ridiculous.
    > >
    > >Any other help would be greatly appreciated.
    > >
    > >-David
    >
    > How about some more information? I had a client with
    > a Sonic-Wall [I think that was it - I get some clients confused]
    > who would set the system on the weekend so the owner could log in
    > from home, and then on Monday they could not access the network.
    >
    > So they would reboot the machine. After doing this several times
    > they called me, and the fix for them was 1) don't do this [which
    > was not a solution] or 2) just restart the network.
    >
    > I just wrote a 2 or 3 line script so they did not have to remember
    > the CLI, other than to log in and type one command.
    >
    > What do the network stats show when you can't connect? E.g., IP
    > number, and any other messages. Check them again after you
    > restart and see what happens.
    >
    > What is your network connection type? A direct link such
    > as shared or dedicated T1, ?DSL, cable modem, whatever.
    >
    > I really expect some external change is causing this and not
    > anything internal to SCO.
    >
    > Bill
    >
    > --
    > Bill Vermillion - bv @ wjv . com


    Hi Bill,

    I have a Sonicwall Pro3060 on my network; I've had it for years. I do
    allow VPN access, but the SCO box loses network connectivity
    randomly. I will have to try a netstat command next time to see what
    it is doing. Any other commands you would like to see?

    This has nothing to do with VPN access; VPN access is provided 24/7
    and is used very frequently by several users. So I'm at a loss, and this
    is just becoming annoying.

    I have an ML350 G3 that is running SCO6 with NO problems at all. It's
    just my G5, and they are configured exactly the same: software, etc.

    Now HP wants me to uninstall all the EFS drivers and reinstall them,
    thinking that I got a screwed-up version. Guess I will try that this
    weekend.

    I'm looking for any help anyone can give me. Can anyone tell me what
    command to issue to see what driver the network interface is using?
    Or where to go look for it? That's another thing HP wanted me to look
    at, but the commands they were giving me were all Linux commands and
    not available on SCO.

    Thanks,
    -David


  18. Re: OpenServer 6.0.0 hangs randomly. Please help!

    On Sep 13, 10:31 am, "David C. Moody" wrote:
    > On Sep 11, 8:02 am, b...@wjv.com (Bill Vermillion) wrote:
    >
    > > In article <1189473775.705261.268...@w3g2000hsg.googlegroups.com>,
    > > David C. Moody wrote:
    > >
    > > >Just an update from me.
    > > >
    > > >I tried the KHZ=100 parameter and my problem remains the same. Every
    > > >3-5 days my network connection shuts down and I cannot access anything
    > > >via the NIC card.
    > > >
    > > >I've already replaced the motherboard (and with it the built-in NIC); now HP is
    > > >putting me through another round of tests.
    > > >
    > > >This is just ridiculous.
    > > >
    > > >Any other help would be greatly appreciated.
    > > >
    > > >-David
    > >
    > > How about some more information? I had a client with
    > > a Sonic-Wall [I think that was it - I get some clients confused]
    > > who would set the system on the weekend so the owner could log in
    > > from home, and then on Monday they could not access the network.
    > >
    > > So they would reboot the machine. After doing this several times
    > > they called me, and the fix for them was 1) don't do this [which
    > > was not a solution] or 2) just restart the network.
    > >
    > > I just wrote a 2 or 3 line script so they did not have to remember
    > > the CLI, other than to log in and type one command.
    > >
    > > What do the network stats show when you can't connect? E.g., IP
    > > number, and any other messages. Check them again after you
    > > restart and see what happens.
    > >
    > > What is your network connection type? A direct link such
    > > as shared or dedicated T1, ?DSL, cable modem, whatever.
    > >
    > > I really expect some external change is causing this and not
    > > anything internal to SCO.
    > >
    > > Bill
    > >
    > > --
    > > Bill Vermillion - bv @ wjv . com
    >
    > Hi Bill,
    >
    > I have a Sonicwall Pro3060 on my network; I've had it for years. I do
    > allow VPN access, but the SCO box loses network connectivity
    > randomly. I will have to try a netstat command next time to see what
    > it is doing. Any other commands you would like to see?
    >
    > This has nothing to do with VPN access; VPN access is provided 24/7
    > and is used very frequently by several users. So I'm at a loss, and this
    > is just becoming annoying.
    >
    > I have an ML350 G3 that is running SCO6 with NO problems at all. It's
    > just my G5, and they are configured exactly the same: software, etc.
    >
    > Now HP wants me to uninstall all the EFS drivers and reinstall them,
    > thinking that I got a screwed-up version. Guess I will try that this
    > weekend.
    >
    > I'm looking for any help anyone can give me. Can anyone tell me what
    > command to issue to see what driver the network interface is using?
    > Or where to go look for it? That's another thing HP wanted me to look
    > at, but the commands they were giving me were all Linux commands and
    > not available on SCO.
    >
    > Thanks,
    > -David


    HP's outsourced help desk often doesn't even know their own products,
    much less the OSes that run on them. SCO recommends using its own NIC
    drivers rather than HP's; see TA 116163, which, despite its title, also
    applies to OS6. So I don't think reinstalling EFS is going to help,
    and it may hurt, given how finicky OpenServer is about uninstalls.

    The OS6 netconfig utility is pretty good at auto-detecting NICs and
    applying the right driver. The only important thing is that you have
    installed the latest SCO network drivers (NDs).

    I had reports of an OS 6 box developing periodic network problems.
    Looking into it, I found that the problems began when the network admin
    decided to segment the local network using M$ ISA servers. The problems
    went away when he put all the boxes back in the same segment. So, yeah,
    I'm in the camp that suspects something external to the box.

    --RLR


  19. Re: OpenServer 6.0.0 hangs randomly. Please help!

    In article <1189704716.209832.186910@d55g2000hsg.googlegroups.com>,
    David C. Moody wrote:
    >On Sep 11, 8:02 am, b...@wjv.com (Bill Vermillion) wrote:
    >> In article <1189473775.705261.268...@w3g2000hsg.googlegroups.com>,
    >> David C. Moody wrote:
    >>
    >> >Just an update from me.
    >> >
    >> >I tried the KHZ=100 parameter and my problem remains the same. Every
    >> >3-5 days my network connection shuts down and I cannot access anything
    >> >via the NIC card.
    >> >
    >> >I've already replaced the motherboard (and with it the built-in NIC); now HP is
    >> >putting me through another round of tests.
    >> >
    >> >This is just ridiculous.
    >> >
    >> >Any other help would be greatly appreciated.
    >> >
    >> >-David
    >>
    >> How about some more information? I had a client with
    >> a Sonic-Wall [I think that was it - I get some clients confused]
    >> who would set the system on the weekend so the owner could log in
    >> from home, and then on Monday they could not access the network.
    >>
    >> So they would reboot the machine. After doing this several times
    >> they called me, and the fix for them was 1) don't do this [which
    >> was not a solution] or 2) just restart the network.
    >>
    >> I just wrote a 2 or 3 line script so they did not have to remember
    >> the CLI, other than to log in and type one command.
    >>
    >> What do the network stats show when you can't connect? E.g., IP
    >> number, and any other messages. Check them again after you
    >> restart and see what happens.
    >>
    >> What is your network connection type? A direct link such
    >> as shared or dedicated T1, ?DSL, cable modem, whatever.
    >>
    >> I really expect some external change is causing this and not
    >> anything internal to SCO.
    >>
    >> Bill
    >>
    >> --
    >> Bill Vermillion - bv @ wjv . com
    >
    >Hi Bill,
    >
    >I have a Sonicwall Pro3060 on my network; I've had it for years. I do
    >allow VPN access, but the SCO box loses network connectivity
    >randomly. I will have to try a netstat command next time to see what
    >it is doing. Any other commands you would like to see?


    Run your netstat commands and your arp commands. Do it once when
    everything works and again when it fails. It could be that something
    is usurping the IP of the SCO machine.

    >This has nothing to do with VPN access; VPN access is provided 24/7
    >and is used very frequently by several users. So I'm at a loss, and this
    >is just becoming annoying.


    On the VPN using the Sonic, people could still log in, but the SCO
    was disconnected. Restarting the tcp daemon fixed it. No need to
    reboot. I just built a small script with an easy name so that
    when things went away they could just log in and type

    >I have an ML350 G3 that is running SCO6 with NO problems at all. It's
    >just my G5, and they are configured exactly the same: software, etc.


    Something is different.

    >Now HP wants me to uninstall all the EFS drivers and reinstall them,
    >thinking that I got a screwed-up version. Guess I will try that this
    >weekend.


    That makes no sense. Software/settings should not change
    dynamically unless there is some outside influence.


    >I'm looking for any help anyone can give me. Can anyone tell me what
    >command to issue to see what driver the network interface is using?
    >Or where to go look for it? That's another thing HP wanted me to look
    >at, but the commands they were giving me were all Linux commands and
    >not available on SCO.


    If it works part of the time, then I would not suspect the driver.

    I really suspect something is grabbing the SCO's IP, or that
    there is something that may be disconnecting the SCO from the
    network.

    Again, look at the network status before and after, and also
    look at the ARP output. Run arp on the SCO
    machine >>AND<< on other machines on the network and compare the
    output. Note the MAC addresses to make sure they are the same.
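    Once you have captured arp output from the SCO box and from another host,
    Bill's compare-the-MACs check can be scripted. The sketch below is a hedged
    illustration with hypothetical function names. It assumes only that each
    ARP entry line contains an IPv4 address and a MAC written as six hex
    groups separated by ':' or '-'. The exact arp output format varies
    between SCO, Linux, BSD and Windows, so check your own captures against
    that assumption.

    How the sketch works:

    - It normalizes MACs, because some tools print 0:c:29:... without
      leading zeros.
    - It reports every IP that maps to different MACs in the two captures.
      That is the symptom you would expect if something else is claiming the
      SCO machine's IP.

```python
import re

# Assumed layout: an ARP entry line holds an IPv4 address plus a MAC written
# as six 1-2 digit hex groups separated by ':' or '-'.
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
MAC_RE = re.compile(r"\b([0-9A-Fa-f]{1,2}(?:[:-][0-9A-Fa-f]{1,2}){5})\b")


def normalize_mac(mac: str) -> str:
    """Lower-case, colon-separated, two hex digits per octet."""
    return ":".join(part.zfill(2) for part in re.split(r"[:-]", mac.lower()))


def parse_arp(text: str) -> dict[str, str]:
    """Map IP -> normalized MAC for every line that contains both."""
    table = {}
    for line in text.splitlines():
        ip, mac = IP_RE.search(line), MAC_RE.search(line)
        if ip and mac:
            table[ip.group(1)] = normalize_mac(mac.group(1))
    return table


def mac_conflicts(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """IPs that appear in both tables but resolve to different MACs."""
    return {ip: (a[ip], b[ip]) for ip in a.keys() & b.keys() if a[ip] != b[ip]}


if __name__ == "__main__":
    # Made-up sample lines in two different arp output styles.
    capture_1 = "host (192.168.1.10) at 0:c:29:ab:cd:ef on net0"
    capture_2 = "192.168.1.10   00-50-56-11-22-33   dynamic"
    print(mac_conflicts(parse_arp(capture_1), parse_arp(capture_2)))
    # {'192.168.1.10': ('00:0c:29:ab:cd:ef', '00:50:56:11:22:33')}
```

    An empty result only means the two captures agree with each other. It is
    still worth checking the SCO box's IP entry against the MAC of its own NIC.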

    Also check what the SCO machine is connected to: if that device
    goes away, loses power, or becomes inaccessible in any way, that could
    also cause the problems.

    I can't give you the exact commands as I don't have access to an
    SCO machine at the moment.

    And just what commands were being given to you by HP?
    Post them and we can give you real Unix equivalents :-)

    Bill
    --
    Bill Vermillion - bv @ wjv . com

  20. Re: OpenServer 6.0.0 hangs randomly. Please help!


    Twenty five days without hangs!!!

    (I'm becoming a happy man...)

    --
    Alberto Rodriguez Rodriguez

