Re: Beginning to think about VMware and SCO 5.0.5 - SCO



Thread: Re: Beginning to think about VMware and SCO 5.0.5

  1. Re: Beginning to think about VMware and SCO 5.0.5

    Bob Bailin wrote:
    > Steve,
    >
    > Being in a similar hardware situation as your client,
    > (Gigabyte dual PIII @ 1.4GHz, 512MB, DPT 3420 with 256MB
    > cache, a RAID 10 with 4 Seagate 18GB drives), I would suggest that
    > keeping the underlying hardware similar to what you
    > have now would be the simplest solution.
    >
    > Adding VMware running under some version of Linux
    > adds 2 extra layers of complexity that someone's going
    > to have to deal with in the future. You're able to deal
    > with this client's 8 yr old system confidently because
    > it's very straightforward and easily understood after
    > all these years. Do you think the system you're proposing
    > will be as understandable 8 years from now?


    Agreed.

    A new box (actually two new boxes) running SCO 5.0.7 should still
    be good. VMware, I don't know. And I don't know the impact of
    virtualized hardware on SCO. Somewhere I read that someone running SCO
    under VMware was reporting better performance than the same SCO OS
    installed natively on the same hardware. (Might have been 5.0.5, which
    will not run at full speed on P4s.)

    >
    > You didn't mention the number of users on this system
    > or whether they consider the current performance
    > adequate. Upgrading to a dual or quad-core Core 2


    They have 28-30 logged-in users, with sar showing 96-97%
    idle on average throughout the day, except during a period
    in the morning and at 17:00, when daily reports are run
    from cron, and at 3:00am, when the backup to tape is running.
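    As an aside, average idle figures like the ones above can be pulled out of `sar -u` style output with a short script. This is only a sketch: the sample text and the four-column layout (time, %usr, %sys, %wio, %idle) are assumptions modeled on classic SCO/UNIX sar output, not data from the thread.

```python
# Sketch: average the %idle column from `sar -u` style output.
# Assumes the classic layout: time %usr %sys %wio %idle (hypothetical sample).
sample = """\
08:00:00  %usr  %sys  %wio  %idle
08:20:00     2     1     0     97
08:40:00     2     1     1     96
09:00:00     3     1     0     96
"""

def average_idle(sar_text):
    idles = []
    for line in sar_text.splitlines():
        fields = line.split()
        # Skip headers and blank lines; data rows end in a numeric %idle.
        if len(fields) == 5 and fields[-1].isdigit():
            idles.append(int(fields[-1]))
    return sum(idles) / len(idles) if idles else None

print(average_idle(sample))
```

    With numbers like these hovering in the mid-90s, the box is essentially loafing outside the report and backup windows.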

    > processor might be preferable to switching over to
    > Xeon processors simply because of their more likely
    > future widespread availability. Even these lowly desktop
    > processors will provide a 5-10x performance improvement,
    > especially combined with a gigabit network switch
    > (I assume users are connected with telnet or ssh?)


    Telnet, as all users are local. I use ssh for remote administration.
    The application programmer uses a VPN to connect to the LAN
    and then telnet to the SCO boxes.

    > A newer RAID supporting SCO and switching over to
    > RAID10 using smaller, faster 15K drives will provide
    > an additional boost, along with a Quantum DDS5 tape


    By smaller, are you referring to 2.5" drives?

    RAID10 is total overkill for this application. The systems
    are running with RAID1 on two 36G drives, with 14G of
    unassigned drive space remaining.

    Besides, I am now gun shy of SCSI RAID: On 6/12 I was called at 19:50,
    after they had been down for 4 hours following a loss of building
    power. They tell me that the servers had all been shut down
    before the UPS batteries ran down.

    When I arrived, the two RAID1 disks in the primary were down
    and would not come back up. (Subsequently attaching the SCA
    cage to the adapted 29160 controller POST'ed the disks as
    "Failed Start Unit Request.") There were two disks in RAID1
    in the backup server and two disks listed as hot spares
    at ID4 and ID5. The RAID controller listed the disks as ID9, ID11,
    ID14, and ID15. The RAID would not boot, getting partway
    into loading the kernel and then hanging. I found that
    the disk at physical ID4 (a hot spare) had been swapped into
    the RAID (ID0 & ID1). I pulled all the disks,
    moved ID4 to ID0, and booted unix.old; it came up very slowly,
    taking 10 times as long as usual to get to the "press Ctl-D..."
    prompt, with the RAID alarm sounding all the while.

    I ran fsck -ofull and fsck finished without an error
    message. I rebooted and the system came up normally. I used
    the dpt raidutil command to silence the alarm for the critical
    RAID1.

    The disk from ID5 was moved to the primary system
    ID0 position, and the nightly backup was restored to it.
    Both servers were back up and running at 02:15 am.

    Both systems were now running on one disk each: the primary just
    restored from backup, the backup system still showing as
    a critical RAID1 with a missing disk.

    On 6/14, I put two new Fujitsu 10K drives in the primary system
    as RAID1. The backup system was still running on the old Seagate
    10K drive.

    On 6/19, I added three new Seagate 15K 146G disks to the
    mix by removing Fujitsu disk ID0 from the primary system,
    installing one of the 15K disks, and allowing the controller to rebuild
    from the 36G disk at ID1. I installed a 15K disk in the backup system
    at ID1, shut down, and created a RAID1 out of the 36G 10K Seagate
    at ID0 and the new 15K disk at ID1. Both RAIDs completed the
    rebuild and went "optimal."

    During the night of 6/19 they lost power again after the night
    shift had left, and the UPS batteries ran down. The next morning, none
    of the disks would come up. All showed "no media" for the block size
    in the RAID controller setup screen. All drives connected to
    the Adaptec 29160 controller POST'ed as "Failed Start Unit Request."

    So, in two weeks we lost four of the original six 10K 36G Seagate
    drives, two new Fujitsu 10K 36G drives, the remaining two original
    10K 36G Seagate drives, and three new 15K 146G Seagate drives. All
    these drives report "Failed Start Unit Request."

    On 6/20, I got the customer up and running on two borrowed Fujitsu
    36G 10K drives, one in each server on the RAID controller but not
    in a RAID.

    I called DTI Data and talked to Scott. He indicated that "Failed
    Start Unit Request" is purely hardware related: the drive would have
    to be opened, and an engineer would have to determine why the drive is
    reporting "not ready." That's also what I got from Seagate and Fujitsu
    technical support, who said that I must get an RMA for the
    in-warranty drives and send them in for replacement.

    How the hell can all the disks in two servers go bad in less than
    two weeks? So I'm a little shell-shocked and gun shy concerning
    RAID10 at this time.

    > backup solution (we switched over from an SDT11000
    > earlier this year) and gigabit transfer speeds between
    > the two servers will improve things noticeably, even
    > if they aren't state-of-the-art.


    We are looking to upgrade the Microlite Backup Edge 1.1
    to Edge 2.2 and perform full system backups to an FTP
    server that is backed up with a separate tape drive.

    I was pleased to find that the Backup Edge RE2 boot
    media I created will perform a bare-metal restore from the
    FTP archive. I tested this at another client's co-location
    site when they moved their 5.0.7 system from in-house to
    the co-location center and the on-site technicians
    would not agree to rotate tapes through the system.

    >
    > Your client ends up with a spiffier, faster system
    > that's still the same one they're used to after all
    > these years. You must, however, upgrade to 5.0.7
    > before the hardware upgrade. You'll then be able
    > to transfer to the new systems by using BackupEdge
    > and a BTLD for the new disk controller.


    Been there, done that before. Looks like I'm about
    to quote two new systems with two upgrades to 5.0.7,
    along with the attendant 25-user licenses on each box.

    Now I'm torn between SCSI SCA and SAS or SATA for these
    boxes. Are any SAS RAIDs working on 5.0.7?

    I have used Adaptec 2420SA SATA RAID on another client's
    5.0.7 system and would plan to use it again with WD Raptor
    10K SATA drives.

    However, the client just ordered six more Seagate 146G 15K
    drives, so the new hardware will likely be SCA to accept
    the new drives when we move.


    >
    > Bob
    >
    > "Steve M. Fabac, Jr." wrote in message
    > news:<4861F85D.3090502@att.net>...
    >> I have not touched VMWare and don't know where to
    >> start to investigate this issue. So I thought I'd
    >> post it here where several users have implemented various
    >> systems for their own use or client's to solicit recommendations
    >> on suitable system configurations to replace the client's
    >> current servers.
    >>
    >> The client is running SCO 5.0.5 Enterprise on two
    >> servers: one is the live production machine and the other is
    >> a "hot spare."
    >>
    >> These are identical SuperMicro PIII 1.4Ghz machines with
    >> 512M RAM and a DPT 3754U2 RAID controller with 16M cache RAM,
    >> and two 36G SCA 10K disks in RAID1.
    >>
    >> Both machines are backed up each night to their own Sony
    >> SDT-9000 DAT drive. And the application data directories and
    >> user home directories are copied from the live server to the
    >> backup server before the tape backup runs.
    >>
    >> Charged with upgrading this hardware, it makes sense to plan
    >> a migration to a single-CPU system board hosting a 2-3 GHz dual-
    >> or quad-core Xeon CPU. I would then replace the full-length
    >> DPT RAID controllers with a current-technology RAID controller,
    >> either SATA or SAS, with suitable 76G to 146G hard drives in RAID1.
    >>
    >> Clearly, moving both live and backup systems to modern hardware
    >> will require upgrading the SCO 5.0.5 OS to either 5.0.7 or
    >> SCO Openserver 6.0 on both machines.
    >>
    >> What's the current opinion on a solid system that will run either
    >> 5.0.7 or 6.0 and have drivers for a RAID controller?
    >>
    >> An alternative strategy I'd like to offer the client is a configuration
    >> using VMWare to Virtualize both the primary and backup 5.0.5
    >> servers.
    >>
    >> I've checked the VMware web site and I see VMware products ranging
    >> in price from $3624 to $21824, and I have no clue how to
    >> specify the product the client needs.
    >>
    >> Please comment on the following:
    >>
    >> 1) One or two hardware platforms? The client desires to maintain
    >> hardware redundancy so that if the primary box goes down, we can
    >> switch operations to the secondary box.
    >>
    >> 2) Then should I use one platform to host the primary (live) 5.0.5
    >> instance and a second VMWare platform hosting a running instance
    >> of the current backup server? This continues to require we take
    >> the time to copy the application data from the primary instance
    >> to the backup instance so that the backup instance is ready to
    >> go should the primary box fail.
    >>
    >> 3) Or, not bother to keep a backup 5.0.5 server running on the redundant
    >> VMWare host but just migrate the live 5.0.5 image from the primary
    >> VMWare host to the backup VMWare host as needed? Can that even be
    >> done?
    >>
    >> 4) How do we back up the live 5.0.5 server? Continue to use a dedicated
    >> SONY tape drive and BackupEdge running in the live 5.0.5 instance?
    >> Or is backup performed at the VMWare level? Is that reliable?
    >>
    >> 5) Where is the UPS communication and monitoring software installed?
    >> Under 5.0.5 or VMware? Do we shut down the 5.0.5 instance and then
    >> shut down VMware and power off the UPS to preserve the UPS battery?
    >>
    >>
    >> I'm sure that there are other questions that I have not thought of that
    >> will have to be answered to design the optimal strategy for the client.
    >> If you can answer any of the above questions or offer insights into
    >> other important considerations, please post them.
    >> --
    >> Steve Fabac
    >> S.M. Fabac & Associates
    >> 816/765-1670

    >
    >


    --
    Steve Fabac
    S.M. Fabac & Associates
    816/765-1670

  2. Re: Beginning to think about VMware and SCO 5.0.5

    On Wed, 25 Jun 2008, Steve M. Fabac, Jr. wrote:
    > > processor might be preferable to switching over to Xeon processors
    > > simply because of their more likely future widespread availability.
    > > Even these lowly desktop processors will provide a 5-10x performance
    > > improvement, especially combined with a gigabit network switch (I
    > > assume users are connected with telnet or ssh?)

    >
    > Telnet as all are local. I use ssh for remote administration.
    > The application programmer uses VPN to connect to the LAN
    > and then telnet to the SCO Box(s).
    >
    > > A newer RAID supporting SCO and switching over to RAID10 using
    > > smaller, faster 15K drives will provide an additional boost, along
    > > with a Quantum DDS5 tape

    >
    > By smaller, are you referring to 2.5" drives?
    >
    > Raid-10 is total overkill for this application. The systems
    > are running with RAID1 on two 36G drives with 14G remaining
    > un-assigned drive space.
    >
    > Besides, I am now gun shy of SCSI RAID: On 6/12 I was called at 19:50
    > when they had been down for 4 hours after losing building power. They
    > tell me that the servers had all been shutdown before the UPS batteries
    > ran down. On 6/14, I put two new Fujitsu 10K drives in the primary
    > system as RAID1, The backup system was still running on the old Seagate
    > 10K drive.
    >
    > On 6/19, I added three new Seagate 15K 146G disks to the mix by removing
    > Fujitsu disk ID0 from the primary system and installing one of the 15K
    > disks and allowed the controller rebuild from the 36G disk at ID1. I
    > installed a 15K disk in the backup system at ID1, shutdown and created a
    > RAID1 out of the 36G 10K Seagate at ID0 and the new 15K disk at ID1.
    > Both RAIDs completed the rebuild and went "optimal."
    >
    > During the night of 6/19 they lost power again after the night shift had
    > left, and the UPS batteries ran down. The next morning, none of the disks
    > would come up. All showed "no media" for the block size in the RAID
    > controller setup screen. All drives connected to the Adaptec 29160
    > controller POST'ed as "Failed Start Unit Request."
    >
    > So, in two weeks we lost four of the original six 10K 36G Seagate
    > drives, two new Fujitsu 10K 36G drives, the remaining two original 10K
    > 36G Seagate drives, and three new 15K 146G Seagate drives. All these
    > drives report "Failed Start Unit Request."


    I have lost 8 drives all at once. All of the drives were purchased at
    the same time, so I now like to stagger my drive purchases, buying
    drives 1-2 years apart. This has put an end to all of my drives
    failing at the same time. I have often seen multiple drive failures
    on drives purchased at the same time; drives just seem to drop dead
    together when bought together, but staggering the purchases has
    really helped prevent simultaneous failures. I like doing 2 at a
    time. (Knock on wood, I do not want to jinx anything.) Since then, at
    most the 2 drives bought together have died at the same time. The old
    bell curve on failures seems to show that drives fail at about the
    same age, when in use.
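    Boyd's point about same-batch drives dying together can be illustrated with a toy simulation. Everything here is a made-up assumption for illustration only: a shared mean lifetime per purchase batch plus a small per-drive scatter, with numbers chosen arbitrarily rather than taken from any real drive data.

```python
import random

def simulate(batches, drives_per_batch, window_days=14, trials=2000, seed=1):
    """Estimate how often *every* drive dies within one `window_days` span.

    Toy model: each batch gets a common mean lifetime (in days), and each
    drive's actual lifetime scatters slightly around it, so same-batch
    drives tend to die close together while separate batches do not.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        deaths = []
        for _ in range(batches):
            batch_life = rng.gauss(1500, 300)        # batch-to-batch spread
            for _ in range(drives_per_batch):
                deaths.append(batch_life + rng.gauss(0, 5))  # tight within batch
        if max(deaths) - min(deaths) <= window_days:
            hits += 1
    return hits / trials

# Six drives from one batch vs. three staggered pairs:
one_batch = simulate(batches=1, drives_per_batch=6)
staggered = simulate(batches=3, drives_per_batch=2)
print(one_batch, staggered)
```

    Under these assumed parameters the single-batch case lands all six failures inside a two-week window far more often than the staggered purchases do, which is the argument for buying 2 at a time, a year or two apart.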

    > 6/20 I got the customer up and running on two borrowed Fujitsu
    > 36G 10K drives, one in each server on the RAID controller but not
    > in a RAID.


    ....

    > How the hell can all the disks in two servers go bad in less than
    > two weeks? So I'm a little shell-shocked and gun shy concerning
    > RAID10 at this time.


    All were purchased/manufactured at the same time and had been running
    for about the same length of time. The bell curve predicts failures
    at the same time, so I am really a believer in drives dying together,
    and hence in staggering when they are purchased/manufactured. Mine
    have come from the same manufacturing run. I have had at most 8
    drives die within 2 weeks, and research showed they were all from the
    same manufacturing run. So yeah, they die at the same time.

    Good Luck,

    --
    Boyd Gerber
    ZENEZ 1042 East Fort Union #135, Midvale Utah 84047

  3. Re: Beginning to think about VMware and SCO 5.0.5

    Bob Bailin wrote:
    > "Steve M. Fabac, Jr." wrote in message news:4862CF7E.7030205@att.net...
    >> Bob Bailin wrote:
    >>> Steve,
    >>> A newer RAID supporting SCO and switching over to
    >>> RAID10 using smaller, faster 15K drives will provide
    >>> an additional boost, along with a Quantum DDS5 tape

    >> By smaller, are you referring to 2.5" drives?

    >
    > No, just 36 or 72GB 15K drives instead of 200GB drives.
    >
    >> Raid-10 is total overkill for this application. The systems
    >> are running with RAID1 on two 36G drives with 14G remaining
    >> un-assigned drive space.
    >>
    >> Besides, I am now gun shy of SCSI RAID: On 6/12 I was called at 19:50
    >> when they had been down for 4 hours after losing building
    >> power. They tell me that the servers had all been shutdown
    >> before the UPS batteries ran down.

    >
    > I don't believe it, unless they shut down the system without actually
    > powering it off manually. When was the last time this server was
    > powered down before this?


    May 27 22:23:36 xxxreal syslogd: restart
    May 27 22:23:36 xxxreal SCO OpenServer(TM) Release 5
    May 27 22:23:36 xxxreal
    May 27 22:23:36 xxxreal (C) 1976-1998 The Santa Cruz Operation, Inc.
    >
    > [snip the details]
    >> On 6/19, I added three new Seagate 15K 146G disks to the
    >> mix by removing Fujitsu disk ID0 from the primary system and
    >> installing one of the 15K disks and allowed the controller rebuild from
    >> the 36G disk at ID1. I installed a 15K disk in the backup system
    >> at ID1, shutdown and created a RAID1 out of the 36G 10K Seagate
    >> at ID0 and the new 15K disk at ID1. Both RAID's completed the
    >> rebuild and went "optimal."
    >>
    >> During the night of 6/19 they lost power again after the night
    >> shift had left, and the UPS batteries ran down. The next morning, none
    >> of the disks would come up. All showed "no media" for the block size
    >> in the RAID controller setup screen. All drives connected to
    >> the Adaptec 29160 controller POST'ed as "Failed Start Unit Request."
    >>
    >> So, in two weeks we lost four of the original six 10K 36G Seagate drives,
    >> two new Fujitsu 10K 36G drives, the remaining two original 10K 36G Seagate
    >> drives, and three new 15K 146G Seagate drives. All these drives
    >> report "Failed Start Unit Request."
    >>

    >
    >> How the hell can all the disks in two servers go bad in less than
    >> two weeks? So I'm a little shell-shocked and gun shy concerning
    >> RAID10 at this time.
    >>

    >
    > I would be more concerned with how your UPS handles a power loss
    > situation after the batteries have run down. It should cut the output
    > power cleanly, swiftly and permanently when the battery charge
    > reaches a certain level. It shouldn't power on automatically when
    > the power resumes. And it shouldn't try to extend the battery life
    > by outputting a lower voltage. Consider testing the backup system
    > by cutting the power to its UPS at the breaker (not by pulling
    > the plug) and monitoring the results until complete power down.
    >
    > You didn't mention whether the SCA cages were internal or
    > external. If external, were they plugged into the same UPS?


    SuperMicro SC742S-400 (400W PS) with 6 internal SCA hot-swap cages


    > In either case, are the drives configured to spin up with a delay
    > or on receipt of a Start Unit command from the hba? This is one
    > way to get around an inadequate power supply that can't handle
    > the initial startup currents all at once, killing the drives by rapidly
    > cycling the power.
    >
    > Bob
    >
    >
    >


    --
    Steve Fabac
    S.M. Fabac & Associates
    816/765-1670
