HSD52 dead reduced raidset - VMS


  1. HSD52 dead reduced raidset


    Hi.

    I have a dual redundant hsd50 controller (SCSI over DSSI).
    One of the storagesets was a 4 x 18GB raidset that went into reduced mode.
    There was not a large enough hot (or cold) spare available at the time.
    When I replaced the 4th drive, one of the remaining 3 drives was howling
    and spun down before the 4th drive could be added.

    Is there a secret HSD toolset that can resurrect a dead reduced raidset?
    All 3 drives of the 4 disk reduced raidset will spin up. The third drive
    will only stay spinning until it gets hot and starts howling.

    I have replaced the 4th 18G drive, and it currently sits in the spareset.

    The controller runs HSOF 5.7 under OpenVMS VAX V6.2

  2. Re: HSD52 dead reduced raidset

    Muddflapp Mohican wrote:
    >
    > Hi.
    >
    > I have a dual redundant hsd50 controller (SCSI over DSSI).
    > One of the storagesets was a 4 x 18GB raidset that went into reduced mode.
    > There was not a large enough hot (or cold) spare available at the time.
    > When I replaced the 4th drive, one of the remaining 3 drives was howling
    > and spun down before the 4th drive could be added.
    >
    > Is there a secret HSD toolset that can resurrect a dead reduced raidset?


    YES! It's called BACKUP.

    *SOME* RAID configurations will protect you against loss of data caused
    by the failure of a single disk drive. No RAID configuration is, in any
    way, a substitute for regular disk backups!

    I have this uneasy feeling that you neglected to make regular backups.
    If I'm right, lots of luck in looking for a new job!

    I might add that there is much to be said for having both hot spares and
    cold spares. If all your drives are the same size, one spare may be
    enough. If not, the problem gets a little messier.

  3. Re: HSD52 dead reduced raidset


    You should be able to rebuild the raidset if there hasn't been a data loss.
    The following command should start the rebuild of the raidset.

    HSD50> set yourraidsetname replace=DISKxxx <- this is the spare disk

    If the howling disk keeps running until the raidset is rebuilt, you can
    replace that also. When you are ready to replace the howling disk (=
    when the raidset is in normal state again) you need to remove the
    howling disk from the raidset with the following command:

    HSD50> set yourraidsetname remove=DISKxxx <- this is the disk you are
    removing from the raidset

    After that you remove it physically and insert a replacement disk and do:

    HSD50> run config

    Then you do the replace command again with the new disk as the DISKxxx

    Then you should have the raidset running normally again.


    Regards,

    Kari


  4. Re: HSD52 dead reduced raidset

    George Cornelius wrote:
    > Muddflapp Mohican writes:
    >
    >> I have a dual redundant hsd50 controller (SCSI over DSSI).
    >> One of the storagesets was a 4 x 18GB raidset that went into reduced mode.
    >> There was not a large enough hot (or cold) spare available at the time.
    >> When I replaced the 4th drive, one of the remaining 3 drives was howling
    >> and spun down before the 4th drive could be added.
    >>
    >> Is there a secret HSD toolset that can resurrect a dead reduced raidset?
    >> All 3 drives of the 4 disk reduced raidset will spin up. The third drive
    >> will only stay spinning until it gets hot and starts howling.
    >
    > I would say you have a serious problem.
    >
    > I have heard a recommendation to _refrigerate_ a bad drive to get the
    > data from it. I have no clue whether this is even reasonable or safe.
    > A BACKUP/PHYSICAL to tape (or a Unix/Linux dd command) might have some
    > hope of successfully cloning the drive.
    >
    > If your data is important, seek professional help. It's costly but if
    > as you say the noisy drive still has good data they may be the most
    > likely to recover it for you.
    >
    > If they are HP drives, I would start with them first. Or the original
    > manufacturer - Seagate, IBM (now Hitachi), ... whoever.
    >
    > A coworker's wife is a Hitachi, previously IBM, engineer, and I hear
    > stories from time to time about white knuckle recovery operations on
    > drives shipped from customers who did not have a good backup...
    >
    > There are professional recovery services as well.
    >


    Yes, there are. But I have heard stories about $1,000 US/megabyte
    recovered. True, or not, it's a poor position to be in!! And regular
    backups are almost certainly cheaper.


  5. Re: HSD52 dead reduced raidset

    Muddflapp Mohican writes:

    > I have a dual redundant hsd50 controller (SCSI over DSSI).
    > One of the storagesets was a 4 x 18GB raidset that went into reduced mode.
    > There was not a large enough hot (or cold) spare available at the time.
    > When I replaced the 4th drive, one of the remaining 3 drives was howling
    > and spun down before the 4th drive could be added.
    >
    > Is there a secret HSD toolset that can resurrect a dead reduced raidset?
    > All 3 drives of the 4 disk reduced raidset will spin up. The third drive
    > will only stay spinning until it gets hot and starts howling.


    I would say you have a serious problem.

    I have heard a recommendation to _refrigerate_ a bad drive to get the
    data from it. I have no clue whether this is even reasonable or safe.
    A BACKUP/PHYSICAL to tape (or a Unix/Linux dd command) might have some
    hope of successfully cloning the drive.

    If your data is important, seek professional help. It's costly but if
    as you say the noisy drive still has good data they may be the most
    likely to recover it for you.

    If they are HP drives, I would start with them first. Or the original
    manufacturer - Seagate, IBM (now Hitachi), ... whoever.

    A coworker's wife is a Hitachi, previously IBM, engineer, and I hear
    stories from time to time about white knuckle recovery operations on
    drives shipped from customers who did not have a good backup...

    There are professional recovery services as well.

    --
    George Cornelius cornelius A T eisner D O T decus D O T org



  6. Re: HSD52 dead reduced raidset

    George Cornelius wrote:
    > "Richard B. Gilbert" writes:
    >> Yes, there are. But I have heard stories about $1,000 US/megabyte
    >> recovered.
    >
    > I believe there is a setup cost - a minimum fee associated with getting
    > into a position to perform a recovery - after which the cost per megabyte
    > may be a bit lower than what you quote.
    >

    That $1000/megabyte recovered figure is about ten or twelve years old
    and it was rumor rather than personal experience. I haven't heard
    anything about the price of such services since. The last time I had a
    disk crash we simply restored from backup. It doesn't happen often and,
    if you pay attention to your error counters, you may see some warning
    signs prior to any catastrophic failure. When I saw a disk start to log
    media errors (emphasis on the plural) I simply asked DEC/Compaq/HP to
    replace it under warranty or contract, as appropriate. Anything REALLY
    critical was RAID 1 or RAID 5 anyway. The only time a disk ever rolled
    over and died on me happened while I was recovering from a triple
    bypass. My boss called me at home and asked me what to do.

    I told him where to find a spare drive, had him replace the failed
    drive, and restore from backup.



  7. Re: HSD52 dead reduced raidset

    In article <5P6F8yDjwtQj@eisner.encompasserve.org>, BEGINcornelius@decuserve.orgEND (George Cornelius) writes:
    >Muddflapp Mohican writes:
    >> Is there a secret HSD toolset that can resurrect a dead reduced raidset?
    >> All 3 drives of the 4 disk reduced raidset will spin up. The third drive
    >> will only stay spinning until it gets hot and starts howling.


    [...]

    > I have heard a recommendation to _refrigerate_ a bad drive to get the
    > data from it. I have no clue whether this is even reasonable or safe.
    > A BACKUP/PHYSICAL to tape (or a Unix/Linux dd command) might have some
    > hope of successfully cloning the drive.


    I forgot about controller metadata.

    On the HSx50's, you have to set a drive TRANSPORTABLE to be able to get
    at all of the blocks. This _may_ involve writing to the drive.

    So it may be better to put the drive on some other SCSI bus in order to
    copy the bits from it.

    Personally, I would tend to use Linux just because dd is more versatile
    than $ BACKUP/PHYSICAL . But, as I said, you may be best off getting
    the professionals to work on this.
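
    For anyone who does go the Linux route, the dd behaviour that matters
    here is usually conv=noerror,sync: keep going past unreadable blocks
    and pad them so that offsets in the image stay aligned. Below is a
    minimal Python sketch of that same idea, not a replacement for dd or
    for a real recovery tool. The names rescue_copy and copy_device, the
    512-byte block size and any device paths are illustrative assumptions.

```python
import os


def rescue_copy(read_block, write_block, total_blocks, block_size=512):
    """Copy blocks in order, zero-filling any block whose read fails.

    read_block(n) returns the bytes of block n or raises OSError.
    write_block(n, data) stores exactly block_size bytes for block n.
    Returns the list of block numbers that could not be read.
    """
    bad = []
    for n in range(total_blocks):
        try:
            data = read_block(n)
        except OSError:
            data = b""
            bad.append(n)
        # Pad failed or short reads so later blocks keep their offsets
        # (the same effect as dd's "sync" conversion).
        write_block(n, data.ljust(block_size, b"\0"))
    return bad


def copy_device(src_path, dst_path, block_size=512):
    """Apply rescue_copy to a device or file (Unix only: uses pread/pwrite)."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        size = os.lseek(src, 0, os.SEEK_END)
        total = (size + block_size - 1) // block_size
        return rescue_copy(
            lambda n: os.pread(src, block_size, n * block_size),
            lambda n, d: os.pwrite(dst, d, n * block_size),
            total,
            block_size,
        )
    finally:
        os.close(src)
        os.close(dst)
```

    A call such as copy_device("/dev/sdX", "member3.img") would return the
    block numbers that had to be zero-filled; both paths are placeholders.
    On a drive that only runs until it heats up, a single sequential pass
    like this is probably all you get, which is one more argument for
    leaving it to the professionals.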

    --
    George Cornelius cornelius A T eisner D O T decus D O T org

  8. Re: HSD52 dead reduced raidset

    "Richard B. Gilbert" writes:
    > Yes, there are. But I have heard stories about $1,000 US/megabyte
    > recovered.


    I believe there is a setup cost - a minimum fee associated with getting
    into a position to perform a recovery - after which the cost per megabyte
    may be a bit lower than what you quote.

    But if you can't get the thing to spin up at all, and the manufacturer
    doesn't have some tricks up his sleeve (replace the bearings?? perhaps
    not...), then the next jump up in recovery expense is probably rather
    high.

    By the way, I have heard that it is (was) relatively common for the circuit
    board on a drive to fail with the mechanism intact, with a correspondingly
    less expensive recovery option available. In the current situation that is
    almost certainly not the case with the second drive that failed; and the
    first one that failed is most likely out of date and therefore close to
    useless. Still, three of each four blocks contain data you might want
    (assuming RAID 5), so, who knows... It's that troublesome parity block
    that is the killer in trying to recover a reduced raidset where one of
    the remaining drives has stale data...
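
    To make the stale-data problem concrete, here is a small Python sketch
    of a single RAID 5 stripe, assuming plain XOR parity across four
    members. The block contents and the helper name xor_blocks are made up
    for illustration, and nothing here reflects how HSOF actually lays out
    or tracks its stripes.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe across four members: A, B and C hold data, D holds parity.
a_old, b, c = b"\x11\x22", b"\x33\x44", b"\x55\x66"
parity = xor_blocks(a_old, b, c)

# Healthy case: with B missing, the other three members rebuild it exactly.
assert xor_blocks(a_old, c, parity) == b

# Member A drops out and the set runs reduced. A write aimed at A's block
# can only be recorded by updating parity; the disk keeps its old contents.
a_new = b"\xaa\xbb"
parity_new = xor_blocks(a_new, b, c)

# Now B dies as well. Rebuilding B from the stale A, C and the new parity
# yields B mixed with the difference between A's old and new contents.
b_from_stale = xor_blocks(a_old, c, parity_new)
assert b_from_stale != b
assert b_from_stale == xor_blocks(b, a_old, a_new)
```

    In this simplified model, a stripe that was never written while the set
    was reduced still reconstructs correctly from the stale member. Any
    stripe that was written comes back wrong, and nothing on the disks
    themselves says which is which.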

    --
    George Cornelius cornelius A T eisner D O T decus D O T org

  9. Re: HSD52 dead reduced raidset

    On Mon, Apr 21, 2008 at 4:49 PM, Richard B. Gilbert wrote:
    > George Cornelius wrote:
    > > "Richard B. Gilbert" writes:
    > > > Yes, there are. But I have heard stories about $1,000 US/megabyte
    > > > recovered.
    > >
    > > I believe there is a setup cost - a minimum fee associated with getting
    > > into a position to perform a recovery - after which the cost per megabyte
    > > may be a bit lower than what you quote.
    >
    > That $1000/megabyte recovered figure is about ten or twelve years old and
    > it was rumor rather than personal experience. I haven't heard anything
    > about the price of such services since. The last time I had a disk crash we
    > simply restored from backup. It doesn't happen often and, if you pay
    > attention to your error counters, you may see some warning signs prior to
    > any catastrophic failure. When I saw a disk start to log media errors
    > (emphasis on the plural) I simply asked DEC/Compaq/HP to replace it under
    > warranty or contract, as appropriate. Anything REALLY critical was RAID 1 or
    > RAID 5 anyway. The only time a disk ever rolled over and died on me
    > happened while I was recovering from a triple bypass. My boss called me at
    > home and asked me what to do.
    >
    > I told him where to find a spare drive, had him replace the failed drive,
    > and restore from backup.


    ROTFFLMAO.

    Except for the fact that you actually told him the *right* thing to do
    to recover his data, that's a very BOFHish story.

    I once interviewed at a place that wanted me to look at whatever they
    did and figure out ways to make it more efficient.

    This place had an entire floor of an office building, with just about
    everything d|i|g|i|t|a|l and Compaq had ever made. And some
    third-party stuff, too.

    One of the first questions I asked, of course, was about how they did
    their backups.

    I was told that the second shift operators kicked them off in the
    early evening as the workload settled down, and that the morning guys
    would kill them off because they hadn't finished by the time the
    daytime workload ramped up and things started "getting slow".

    I finished the interview, thanked them for their time, and ran like hell.

    WWWebb

    WW [What *was* your username?] Webb

  10. Re: HSD52 dead reduced raidset

    Bob Koehler wrote:
    > "Richard B. Gilbert" writes:
    >> Yes, there are. But I have heard stories about $1,000 US/megabyte
    >> recovered. True, or not, it's a poor position to be in!! And regular
    >> backups are almost certainly cheaper.
    >
    > I know of a group that lost a critical hard drive. The backups also
    > failed. I don't know how much they paid but they went from a
    > several million dollar loss to a fairly cheap bill.
    >
    > Most likely the electronics failed but the platters still held the
    > data.
    >


    I suppose it happens that way occasionally. I'm not convinced, however,
    that it is a common failure mode. Given a decent environment to work
    in, the electronics can be expected to outlast the moving parts by a
    substantial margin.

    Do you ever test your ability to read and restore your backups? It's a
    very good idea!

  11. Re: HSD52 dead reduced raidset

    "Richard B. Gilbert" writes:
    >
    > Yes, there are. But I have heard stories about $1,000 US/megabyte
    > recovered. True, or not, it's a poor position to be in!! And regular
    > backups are almost certainly cheaper.


    I know of a group that lost a critical hard drive. The backups also
    failed. I don't know how much they paid but they went from a
    several million dollar loss to a fairly cheap bill.

    Most likely the electronics failed but the platters still held the
    data.


  12. Re: HSD52 dead reduced raidset

    "Richard B. Gilbert" wrote in
    news:LOednXulcqgIe5DVnZ2dnUVZ_umdnZ2d@comcast.com:

    > Bob Koehler wrote:
    >> Most likely the electronics failed but the platters still held the
    >> data.
    >>

    >
    > I suppose it happens that way occasionally. I'm not convinced,
    > however, that it is a common failure mode. Given a decent environment
    > to work in, the electronics can be expected to outlast the moving
    > parts by a substantial margin.


    I've sold two old (working but no longer required) drives on a "popular
    internet auction site". In both cases the kinds of questions I was asked
    made me suspect that they were being used as donors. In both cases I was
    right and in both cases the buyer sent an email thanking me for saving the
    data. Someone in the office has also successfully performed a board-ectomy
    and recovered data.

    So electronics failure certainly does happen.

    Another spurious data point while I'm here. When RAID fails spectacularly,
    the failure mode always seems to be: drive failed, swapped in a spare,
    second drive died before the replacement had had enough copied on to it.
    I don't know whether RAID 6 will help, since I suspect that as the 2nd
    replacement drive is coming online a 3rd one will die (drives tend to be
    installed all together and all from the same batch, so they're roughly the
    same age and have done roughly the same amount of work, so I'm never
    surprised that they often decide to peg it synchronously!).
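
    As a rough way to put numbers on that, here is a back-of-envelope
    Python sketch. It assumes the surviving drives fail independently at a
    constant rate, which is exactly what same-batch drives are unlikely to
    do, so treat any result as a floor rather than an estimate. The
    function name and its parameters are illustrative assumptions, not
    measured figures.

```python
import math


def p_second_failure(surviving_drives, annual_failure_rate, rebuild_hours):
    """Chance that at least one surviving drive fails during the rebuild.

    Assumes independent drives with a constant (exponential) failure rate,
    where annual_failure_rate is the per-drive probability of failing
    within one year (0 <= rate < 1).
    """
    hours_per_year = 365 * 24
    rate_per_hour = -math.log(1.0 - annual_failure_rate) / hours_per_year
    return 1.0 - math.exp(-surviving_drives * rate_per_hour * rebuild_hours)
```

    For a 4-disk set, p_second_failure(3, rate, hours) covers the three
    members that must all survive the rebuild. If the drives are the same
    age and have done the same work, the real figure is presumably higher.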

    Antonio


  13. Re: HSD52 dead reduced raidset

    Antonio Carlini wrote:
    > "Richard B. Gilbert" wrote in
    > news:LOednXulcqgIe5DVnZ2dnUVZ_umdnZ2d@comcast.com:
    >
    >> Bob Koehler wrote:
    >>> Most likely the electronics failed but the platters still held the
    >>> data.
    >>>

    >> I suppose it happens that way occasionally. I'm not convinced,
    >> however, that it is a common failure mode. Given a decent environment
    >> to work in, the electronics can be expected to outlast the moving
    >> parts by a substantial margin.

    >
    > I've sold two old (working but no longer required) drives on a "popular
    > internet auction site". In both cases the kinds of questions I was asked
    > made me suspect that they were being used as donors. In both cases I was
    > right and in both cases the buyer sent an email thanking me for saving the
    > data. Someone in the office has also successfully performed a board-ectomy
    > and recovered data.
    >
    > So electronics failure certainly does happen.
    >
    > Another spurious data point while I'm here. When RAID fails spectacularly,
    > the failure mode always seems to be: drive failed, swapped in a spare,
    > second drive died before the replacement had had enough copied on to it.
    > I don't know whether RAID 6 will help, since I suspect that as the 2nd
    > replacement drive is coming online a 3rd one will die (drives tend to be
    > installed all together and all from the same batch, so they're roughly the
    > same age and have done roughly the same amount of work, so I'm never
    > surprised that they often decide to peg it synchronously!).


    This suggests, to me, that people are expecting more than the hardware
    can deliver. Five years of service from a drive is quite a lot. In the
    case of the very latest 15,000 RPM drives, it's probably two or three
    lifetimes!

    Where possible, I would try to retire older drives to non-critical
    service. Drives are not THAT expensive, especially when compared with
    the cost of restoring a few hundred gigabytes that may, or may not, have
    been backed up recently. My personal experience suggests that backups
    will not be done unless someone is cracking the whip!!

  14. Re: HSD52 dead reduced raidset

    Bob Koehler wrote:
    > "Richard B. Gilbert" writes:
    >> This suggests, to me, that people are expecting more than the hardware
    >> can deliver. Five years of service from a drive is quite a lot. In the
    >> case of the very latest 15,000 RPM drives, it's probably two or three
    >> lifetimes!

    >
    > Tell that to the 16 year old RZ28 or 20 year old RF73 drives I use.
    > Nary a bit dropped.
    >


    Well, it shows that Digital made, or at least supplied, good stuff. I'm
    not sure that I would want to have that drive in any sort of critical
    service. Has that drive been powered up more or less continuously? Or
    has it been sitting on the shelf most of the time?


  15. Re: HSD52 dead reduced raidset

    "Richard B. Gilbert" writes:
    >
    > This suggests, to me, that people are expecting more than the hardware
    > can deliver. Five years of service from a drive is quite a lot. In the
    > case of the very latest 15,000 RPM drives, it's probably two or three
    > lifetimes!


    Tell that to the 16 year old RZ28 or 20 year old RF73 drives I use.
    Nary a bit dropped.


  16. Re: HSD52 dead reduced raidset

    "Richard B. Gilbert" writes:
    > Bob Koehler wrote:
    >> "Richard B. Gilbert" writes:
    >>> This suggests, to me, that people are expecting more than the hardware
    >>> can deliver. Five years of service from a drive is quite a lot. In the
    >>> case of the very latest 15,000 RPM drives, it's probably two or three
    >>> lifetimes!

    >>
    >> Tell that to the 16 year old RZ28 or 20 year old RF73 drives I use.
    >> Nary a bit dropped.
    >>

    >
    > Well, it shows that Digital made, or at least supplied, good stuff. I'm
    > not sure that I would want to have that drive in any sort of critical
    > service. Has that drive been powered up more or less continuously? Or
    > has it been sitting on the shelf most of the time?


    We have RZ29B drives which have been in nearly continuous service since
    they were new. If they survive infancy, then they very rarely develop
    problems. Spinning them up/down as little as possible and keeping
    them cool is probably one reason they seem to run forever, however the
    newer stuff (in particular, the 36GB drives) have lots of problems no
    matter how well you treat them.

    We also had RA90 and RA92 drives which ran many years under heavy load
    (i.e., near maximum I/O rate 24/7) with zero problems once they were
    "burnt in".


    George Cook
    WVNET

  17. Re: HSD52 dead reduced raidset

    "Richard B. Gilbert" writes:
    >
    > Well, it shows that Digital made, or at least supplied, good stuff. I'm
    > not sure that I would want to have that drive in any sort of critical
    > service. Has that drive been powered up more or less continuously? Or
    > has it been sitting on the shelf most of the time?


    Except for occasional unannounced power outages those drives run
    all the time.


  18. Re: HSD52 dead reduced raidset

    On Tue, Apr 22, 2008 at 8:38 PM, Richard B. Gilbert wrote:
    > Antonio Carlini wrote:
    > > "Richard B. Gilbert" wrote in
    > > news:LOednXulcqgIe5DVnZ2dnUVZ_umdnZ2d@comcast.com:
    > >
    > > > Bob Koehler wrote:
    > > > > Most likely the electronics failed but the platters still held the
    > > > > data.
    > > >
    > > > I suppose it happens that way occasionally. I'm not convinced,
    > > > however, that it is a common failure mode. Given a decent environment
    > > > to work in, the electronics can be expected to outlast the moving
    > > > parts by a substantial margin.
    > >
    > > I've sold two old (working but no longer required) drives on a "popular
    > > internet auction site". In both cases the kinds of questions I was asked
    > > made me suspect that they were being used as donors. In both cases I was
    > > right and in both cases the buyer sent an email thanking me for saving the
    > > data. Someone in the office has also successfully performed a board-ectomy
    > > and recovered data.
    > >
    > > So electronics failure certainly does happen.
    > >
    > > Another spurious data point while I'm here. When RAID fails spectacularly,
    > > the failure mode always seems to be: drive failed, swapped in a spare,
    > > second drive died before the replacement had had enough copied on to it.
    > > I don't know whether RAID 6 will help, since I suspect that as the 2nd
    > > replacement drive is coming online a 3rd one will die (drives tend to be
    > > installed all together and all from the same batch, so they're roughly the
    > > same age and have done roughly the same amount of work, so I'm never
    > > surprised that they often decide to peg it synchronously!).
    >
    > This suggests, to me, that people are expecting more than the hardware can
    > deliver. Five years of service from a drive is quite a lot. In the case of
    > the very latest 15,000 RPM drives, it's probably two or three lifetimes!
    >
    > Where possible, I would try to retire older drives to non-critical service.
    > Drives are not THAT expensive, especially when compared with the cost of
    > restoring a few hundred gigabytes that may, or may not, have been backed up
    > recently. My personal experience suggests that backups will not be done
    > unless someone is cracking the whip!!


    We spend several million dollars a year on our backups.

    WWWebb

  19. Re: HSD52 dead reduced raidset

    On Apr 20, 4:40 pm, Muddflapp Mohican wrote:
    > Hi.
    >
    > I have a dual redundant hsd50 controller (SCSI over DSSI).
    > One of the storagesets was a 4 x 18GB raidset that went into reduced mode.
    > There was not a large enough hot (or cold) spare available at the time.
    > When I replaced the 4th drive, one of the remaining 3 drives was howling
    > and spun down before the 4th drive could be added.
    >
    > Is there a secret HSD toolset that can resurrect a dead reduced raidset?
    > All 3 drives of the 4 disk reduced raidset will spin up. The third drive
    > will only stay spinning until it gets hot and starts howling.
    >
    > I have replaced the 4th 18G drive and currently sits as a spareset.
    >
    > The controller runs HSOF 5.7 under OpenVMS VAX V6.2


    What type of RAID set are the drives configured in? From the
    description, I assume it is not a RAID 0 (stripe) or RAID 1
    (mirror).

    If it is a RAID 0+1 (mirrored 2 disk stripe), all of the data should
    be present; the trick may be to get the controller to allow you to
    access it, especially if one drive in each stripe failed.

    If it is a RAID 5 (n+1 redundancy with distributed parity), you have
    lost data unless you can repair the 2nd failed drive. Repairing the
    first failed drive may be better than nothing, but if you can retrieve
    any data at all there will be corruption or lost data as the result of
    changes that were made while the RAID set was operating in reduced
    mode. The controller may detect the inconsistency using the metadata
    and prevent you from attempting this. If the corruption affects
    critical portions of the OpenVMS file system you will have to work
    around that as well.

    In short, unless this is a RAID 0+1 in which the two failures still
    leave one readable copy of every block, you will need professional
    assistance to recover the data. Contact someone before you do anything
    else to the disks: the more you manipulate them, the greater the risk
    of additional damage.

    Good luck!

    Jerry
