Failure of brand new drive... possibly due to staggered spinup? - Storage
#1
Hi all,

I've just experienced a mystifying failure of a hard disk that was literally only one day old. It is a Hitachi Travelstar 5K160 (80 GB, 5400 RPM, SATA) that came with my new Dell laptop. I had installed Ubuntu Feisty Linux, everything seemed to be working fine, and I had even checked the S.M.A.R.T. data for the drive, which looked great.

I was playing around with drive power settings using hdparm under Linux, and I enabled "power-on in standby mode", which is supposed to enable staggered spin-up. No particular reason; I was just trying it out. I assumed the effect would be harmless in a single-drive system. Everything continued to work fine until I powered off the computer an hour or so later...

I tried to turn it back on, and the BIOS reported failure of the first disk drive. I tried a variety of rescue CDs and boot disks, to no avail... I could not get the drive to respond. I then removed the drive from the laptop and put it in my desktop tower. Again, the computer was unable to communicate with it, ruling out the possibility of a drive controller issue. I tried holding the drive in my hand as it powered up, and I could not feel the characteristic hum of the motor!

So I'm quite mystified. The coincidence is uncanny, and I've never had a brand-spanking-new drive fail like this. Is it possible that enabling "power-on in standby mode" destroyed this drive?? In my experience, drives in standby mode are still capable of communicating with the host, so I don't understand what the problem with this drive could be. Anyone have any advice/anecdotes/explanation?

Dan Lenski
#2
Your BIOS can fail to send the START STOP UNIT command to the boot drive.

Or the drive firmware can have a bug in this path, in which case, yes, the drive is killed. But this is unlikely.

Solution:
- attach the disk to a second Linux machine as a _non-primary_ disk
- boot Linux
- play with "hdparm"

Or:
- find a USB/1394 box for the disk
- install it inside
- attach the box to a Windows machine. What will Windows say?

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com

"Dan Lenski" news:1180713827.391586.305640@q69g2000hsb.googlegroups.com...
> Hi all,
[SNIP]
#3
"Dan Lenski" news:1180713827.391586.305640@q69g2000hsb.googlegroups.com...
> Hi all,
[SNIP]
>
> So I'm quite mystified. The coincidence is uncanny, and I've never
> had a brand-spanking-new drive fail like this. Is it possible that
> enabling "power-on in standby mode" destroyed this drive?? In my
> experience, drives in standby mode are still capable of communicating
> with the host, so I don't understand what the problem with this drive
> could be. Anyone have any advice/anecdotes/explanation?
>
> Dan Lenski

You've just experienced an early failure. Nothing special, just a fact of life. The frequency of failures usually follows the 'bathtub' curve. Relatively many drives fail early in their life due to one weak component that barely made it through the manufacturing tests. The failure rate drops over time and remains low for a number of years. Then it goes up again when the drive components start to wear.

Just call Dell and have the drive replaced under warranty.

Rob
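Rob's 'bathtub' description can be made concrete with a common textbook model. In that model the hazard (instantaneous failure rate) is the sum of three terms: a decreasing Weibull term for infant mortality (shape k < 1), a constant term for random failures, and an increasing Weibull term for wear-out (k > 1). The Python sketch below assumes that model. Every parameter value is made up for illustration and is not a measured figure for the 5K160 or any other drive.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (k / lam) * (t / lam) ** (k - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t, infant=(0.5, 1e6), random_rate=1e-5, wearout=(5.0, 50000.0)):
    """Hypothetical bathtub curve: infant mortality + random failures + wear-out.

    t is power-on hours; each tuple is (shape, scale_in_hours). All defaults are
    illustrative assumptions, not data for any real drive.
    """
    return weibull_hazard(t, *infant) + random_rate + weibull_hazard(t, *wearout)


if __name__ == "__main__":
    # Early hours: high rate; middle years: low and flat; late life: rising again.
    for hours in (1, 24, 1000, 10000, 40000, 60000):
        print(f"{hours:>6} h: {bathtub_hazard(hours):.2e} failures/hour")
```

Whether field data really follow this shape is the question raised a couple of posts later. The sketch only illustrates the claim.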
#4
On Jun 1, 12:15 pm, "Rob Turk" wrote:
> You've just experienced an early failure. Nothing special, just a fact of
> life. The frequency of failures usually follows the 'bathtub' curve.
> Relatively many drives fail early in their life due to one weak component
> that barely made it through the manufacturing tests. The failure rate drops
> over time and remains low for a number of years. Then it goes up again when
> the drive components start to wear.

I guess so. It's just... spooky! Is it typical for such early failures to occur when the drive is power-cycled? I'm going to live in fear of the "hdparm -s1" option in the future :-)

> Just call Dell and have the drive replaced under warranty.

I've done that, after assuaging my conscience that this wasn't my fault. Well, actually I used their Internet chat tech support... which was a pleasant surprise, since it turns out to be less annoying than speaking on the phone.

Dan
#5
Dan Lenski wrote:
> On Jun 1, 12:15 pm, "Rob Turk" wrote:
>> You've just experienced an early failure. Nothing special, just a fact of
>> life. The frequency of failures usually follows the 'bathtub' curve.
>> Relatively many drives fail early in their life due to one weak component
>> that barely made it through the manufacturing tests. The failure rate drops
>> over time and remains low for a number of years. Then it goes up again when
>> the drive components start to wear.

Didn't Google, or some other large company, recently discredit the bathtub curve failure rate theory?
#6
On Jun 1, 12:13 pm, "Maxim S. Shatskih" wrote:
> Your BIOS can fail to send the START STOP UNIT command to the boot drive.
>
> Or the drive firmware can have a bug in this path, in which case, yes, the
> drive is killed. But this is unlikely.
>
> Solution:
> - attach the disk to a second Linux machine as a _non-primary_ disk
> - boot Linux
> - play with "hdparm"

Tried this. (I actually booted off a USB drive containing some Linux utilities, since I only had one SATA cable.) When Linux boots, it complains that it cannot communicate with SATA disk 1, so no /dev/sd* node ever gets allocated for the disk.

I can see the possibility that the BIOS fails to send the appropriate initialization commands to the drive, knowing how buggy BIOSes can be. But it seems unlikely that *both* the BIOS and the Linux kernel would fail to do so! And from other mailing list posts, I've read that SATA drives should not have any problem identifying themselves to the host in standby mode, before spin-up.

> Or:
> - find a USB/1394 box for the disk
> - install it inside
> - attach the box to a Windows machine. What will Windows say?

An interesting idea, though I don't have a SATA enclosure handy, only an IDE enclosure.

I guess the drive really is just plain dead. I really wish I could confirm or refute the notion that standby mode did it, though!

Dan
#7
Dan Lenski wrote:
>
> I guess the drive really is just plain dead. I really wish I could
> confirm or refute the notion that standby mode did it, though!

So get another one and try it again; repeat until you have a statistically valid sample :-)

--
Nik Simpson
#8
On Jun 1, 1:37 pm, Nik Simpson wrote:
> So get another one and try it again; repeat until you have a
> statistically valid sample :-)

I hadn't planned to get into the hard disk testing business anytime soon :-) I'm just worried that there could be some issue with standby mode on this brand of drive. Having my drive die after 2 days is bad enough... having it die after 2 months, when I have all my work on there, would be a lot worse.

Dan
#9
On Fri, 01 Jun 2007 16:03:47 -0000, Dan Lenski wrote:
>Hi all,
[SNIP]

This is as expected. You need to send a

Power-Up In Standby feature set device spin-up.

command to spin up the disk, or a

Disable Power-Up In Standby feature set.

to disable the feature.
--
Svend Olaf
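The two commands Svend names are subcommands of the ATA SET FEATURES command (opcode EFh), selected by the value placed in the Features register:
- 06h enables Power-Up In Standby, the feature Dan switched on with hdparm.
- 86h disables it.
- 07h is the "PUIS feature set device spin-up" request.

The Python sketch below only assembles the register values for these non-data commands. Actually delivering them to a drive needs a host driver or pass-through interface, which is outside its scope. The helper name `set_features_taskfile` is hypothetical.

```python
# ATA SET FEATURES opcode and the Power-Up In Standby (PUIS) subcommands.
ATA_CMD_SET_FEATURES = 0xEF
SF_PUIS_ENABLE = 0x06   # Enable Power-Up In Standby feature set
SF_PUIS_SPINUP = 0x07   # Power-Up In Standby feature set device spin-up
SF_PUIS_DISABLE = 0x86  # Disable Power-Up In Standby feature set


def set_features_taskfile(subcommand):
    """Return the taskfile for a non-data SET FEATURES command (hypothetical helper).

    The subcommand goes in the Features register; this sketch leaves every other
    register at zero.
    """
    if subcommand not in (SF_PUIS_ENABLE, SF_PUIS_SPINUP, SF_PUIS_DISABLE):
        raise ValueError(f"not a PUIS subcommand: {subcommand:#04x}")
    return {
        "features": subcommand,
        "sector_count": 0,
        "lba_low": 0,
        "lba_mid": 0,
        "lba_high": 0,
        "device": 0,
        "command": ATA_CMD_SET_FEATURES,
    }


if __name__ == "__main__":
    # Svend's two options: spin the drive up now, or turn the feature off.
    for name, sub in (("spin-up", SF_PUIS_SPINUP), ("disable", SF_PUIS_DISABLE)):
        tf = set_features_taskfile(sub)
        print(f"{name}: command={tf['command']:#04x} features={tf['features']:#04x}")
```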
#10
"Dan Lenski" wrote:
> Hi all,
[SNIP]
> I tried to turn it back on, and BIOS reported failure of the first
> disk drive. I tried a variety of rescue CDs and boot disks, to no
> avail... I could not get the drive to respond. I then removed the
> drive from the laptop and put it in my desktop tower. Again, the
> computer was unable to communicate with it, ruling out the possibility
> of a drive controller issue. I tried holding the drive in my hand as
> it powered up, and I could not feel the characteristic hum of the motor!

And why should you? You set it up to "power-on in standby mode". So it does.

>
> So I'm quite mystified. The coincidence is uncanny, and I've never
> had a brand-spanking-new drive fail like this. Is it possible that
> enabling "power-on in standby mode" destroyed this drive??

Nope, it is just doing what you told it to do: it's in standby until you tell it to come out of it.

> In my experience, drives in standby mode are still capable of communicating
> with the host,

And it probably does. The problem is likely that the host doesn't understand why the drive is in standby mode, so it fails it.

> so I don't understand what the problem with this drive could be.

Most likely none.

> Anyone have any advice/anecdotes/explanation?

Most likely your host isn't compatible with power-on in standby mode. Set the drive back to normal. That may be easier said than done, apparently.

>
> Dan Lenski
#11
On Jun 1, 1:43 pm, "Folkert Rienstra" wrote:
> > of a drive controller issue. I tried holding the drive in my hand as
> > it powered up, and I could not feel the characteristic hum of the motor!
>
> And why should you? You set it up to "power-on in standby mode".
> So it does.

Indeed. However, I would expect it to come out of standby mode when addressed by the host :-) For example, under Linux I can put a drive temporarily into standby with "hdparm -y /dev/sda". However, the Linux IDE/SATA drivers will bring it out of standby as soon as I try to access it.

> > In my experience, drives in standby mode are still capable of communicating
> > with the host,
>
> And it probably does.
> Problem is likely that host doesn't understand why it is in standby mode,
> so it fails it.

Okay. I would believe this if it were only the laptop BIOS that didn't know what to do. But not only can't the laptop BIOS initialize it, the BIOS on my desktop can't initialize it either, and neither can the Linux kernel when booting from an external disk.

I certainly think a recent Linux 2.6.20 kernel must know how to deal with this situation... I've never met another hard drive feature that the Linux kernel couldn't handle with ease.

Of course, now that I dig around a little more, I find this patch on the linux-ide mailing list:
http://www.mail-archive.com/linux-id.../msg04323.html
Maybe with this patch my kernel will figure out what to do? I'll try it tonight...

> > Anyone have any advice/anecdotes/explanation?
>
> Most likely your host isn't compatible with power-on in standby mode.
> Set the drive back to normal. That may be easier said than done, apparently.

Indeed. Is there any utility to do this??

Dan Lenski
#12
"Dan Lenski" wrote:
> On Jun 1, 1:43 pm, "Folkert Rienstra" wrote:
> > > of a drive controller issue. I tried holding the drive in my hand as
> > > it powered up, and I could not feel the characteristic hum of the motor!
> >
> > And why should you? You set it up to "power-on in standby mode".
> > So it does.
> Indeed. However, I would expect it to come out of standby mode when
> addressed by the host :-)

Nope. It wants/needs to be specifically told. Otherwise any access would wake it up.

> For example, under Linux I can put a drive
> temporarily into standby with "hdparm -y /dev/sda". However, the
> Linux IDE/SATA drivers will bring it out of standby as soon as I try
> to access it.

Power-on in standby mode is an altogether different feature. It's similar to the SCSI START UNIT command, which is required if a SCSI drive has been jumpered with auto-spin disabled. The difference here is that the jumper has been implemented in software, so you have a "jumper" command and a spin-up command. Svend has already mentioned them both.

> >
> > > In my experience, drives in standby mode are still capable of communicating
> > > with the host,
> >
> > And it probably does.
> > Problem is likely that host doesn't understand why it is in standby mode,
> > so it fails it.
>
> Okay. I would believe this if it was only the laptop BIOS that didn't
> know what to do. But not only the laptop BIOS can't initialize it,
> also the BIOS on my desktop can't initialize it, and the Linux kernel
> can't initialize it when booting from an external disk.

That's not so surprising at all. Even IBM/Hitachi, who are normally well equipped (with either their Drive Fitness Test or Feature Tool), don't have it in their toolkits.

>
> I certainly think a recent Linux 2.6.20 kernel must know how to deal
> with this situation... I've never met another hard drive feature that
> the Linux kernel couldn't handle with ease.
>
> Of course, now that I dig around a little more, I find this patch on
> the linux-ide mailing list: http://www.mail-archive.com/linux-id.../msg04323.html
> Maybe with this patch my kernel will figure out what to do? I'll try
> it tonight...
>
> > > Anyone have any advice/anecdotes/explanation?
> >
> > Most likely your host isn't compatible with power-on in standby mode.
> > Set the drive back to normal. That may be easier said than done, apparently.
> Indeed. Is there any utility to do this??

Now that you mention it, Svend was experimenting with it:
http://www.partitionsupport.com/advancednotes.htm

>
> Dan Lenski
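Folkert distinguishes two cases. One is a drive put into standby at run time, which is what `hdparm -y` requests. The other is a drive that powers up in standby. That difference can be sketched as a small state model.

The model follows our reading of the ATA Power-Up In Standby description:
- A drive that implements the spin-up subcommand stays in standby after power-up until it receives SET FEATURES 07h.
- A drive that does not implement it spins up on the first media access.

The class and its methods are hypothetical, and they ignore everything else a real drive does. The configuration assumed for Dan's drive is likewise an assumption.

```python
class DriveModel:
    """Toy spindle-state model of Power-Up In Standby (PUIS); not real firmware."""

    def __init__(self, puis_enabled, needs_set_features_spinup):
        self.puis_enabled = puis_enabled
        # True if the drive implements the SET FEATURES 07h spin-up subcommand.
        self.needs_set_features_spinup = needs_set_features_spinup
        self.spinning = False
        self.awaiting_spinup_cmd = False

    def power_on(self):
        if self.puis_enabled:
            self.spinning = False  # comes up in Standby instead of spinning up
            self.awaiting_spinup_cmd = self.needs_set_features_spinup
        else:
            self.spinning = True
            self.awaiting_spinup_cmd = False

    def standby_immediate(self):
        # Roughly what "hdparm -y" requests: spin down, but any access wakes it.
        self.spinning = False

    def set_features_spinup(self):
        # SET FEATURES 07h: the explicit "come out of standby" Folkert refers to.
        self.spinning = True
        self.awaiting_spinup_cmd = False

    def read_sector(self):
        """Return True if a media access is serviced."""
        if self.awaiting_spinup_cmd:
            return False  # refused until the host sends SET FEATURES 07h
        self.spinning = True  # spins up on demand if it was in standby
        return True


if __name__ == "__main__":
    # Assumed configuration of Dan's drive: PUIS on, spin-up subcommand required.
    drive = DriveModel(puis_enabled=True, needs_set_features_spinup=True)
    drive.power_on()
    print("after power-on:", drive.read_sector())          # False: looks "dead"
    drive.set_features_spinup()
    print("after spin-up command:", drive.read_sector())   # True
```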
#13
On Jun 1, 5:12 pm, svo...@partitionsupport.com (Svend Olaf Mikkelsen) wrote:
> This is as expected. You need to send a
>
> Power-Up In Standby feature set device spin-up.
>
> command to spin up the disk, or a
>
> Disable Power-Up In Standby feature set.
>
> to disable the feature.
> --
> Svend Olaf

Wow. Just wow. I can hardly believe it, but that worked. Thanks, Svend and Folkert, for helping me figure out that the drive wasn't actually dead.

Issuing those commands to the drive wasn't so easy: I had to apply Mark Lord's patch (http://www.mail-archive.com/linux-ide@vger.kernel.org/msg04323.html) to the 2.6.20 kernel. But lo and behold, when I booted with that patch, the SETFEATURE_SPINUP command was sent to the drive, and it began to operate again.

The whole thing is kind of amazing: toggling the "power up in standby" feature caused the BIOS of *three* desktop computers to pronounce the drive dead, and to freeze when booting. In order to get past the BIOS, I had to hotplug the drive at the GRUB boot menu. And the default 2.6.20 Linux kernel of Ubuntu failed to spin the drive up as well. Probably the Linux kernel doesn't support this because it expects the BIOS to have spun the drive up already.

So I still have some questions...
* Does anyone know of a BIOS that actually *does* know how to spin up drives that boot in standby?
* Why isn't this feature marked as DANGEROUS in the hdparm manual :-) ?
* Is there a way to issue raw commands to a drive from Linux (maybe via /sys) without recompiling the kernel? I'd like to make a standalone boot disk to help out other folks who've bricked their drive in a similar fashion. It'd be great to figure out a way to do it without a custom kernel.

Wow. This is definitely the strangest hardware/firmware quirk I've ever encountered... and one of the most time-consuming.

Dan
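Dan reports that Mark Lord's patch makes the kernel send the spin-up command. As we understand the ATA specification, a host can tell that a freshly powered-up drive is waiting for that command from word 2 ("specific configuration") of its IDENTIFY DEVICE data:
- 37C8h means the drive needs SET FEATURES to spin up, and the IDENTIFY data is incomplete.
- 738Ch means it needs SET FEATURES, and the data is complete.

The Python sketch below is a hypothetical host-side sequence built on that check, not the patch itself. `read_identify` and `send_spinup` are placeholder callables the caller would supply.

```python
# IDENTIFY DEVICE word 2 values meaning "SET FEATURES 07h required to spin up".
SPINUP_REQUIRED_ID_INCOMPLETE = 0x37C8
SPINUP_REQUIRED_ID_COMPLETE = 0x738C


def needs_spinup(identify_words):
    """identify_words: the 256 16-bit words returned by IDENTIFY DEVICE."""
    return identify_words[2] in (SPINUP_REQUIRED_ID_INCOMPLETE, SPINUP_REQUIRED_ID_COMPLETE)


def bring_up(read_identify, send_spinup):
    """Hypothetical bring-up: identify, spin up if required, re-identify if incomplete."""
    ident = read_identify()
    if needs_spinup(ident):
        send_spinup()  # e.g. SET FEATURES (EFh) with Features = 07h
        if ident[2] == SPINUP_REQUIRED_ID_INCOMPLETE:
            # The data read before spin-up was partial, so fetch it again.
            ident = read_identify()
    return ident


if __name__ == "__main__":
    # Simulated drive: first IDENTIFY is incomplete, the second (after spin-up) is not.
    responses = iter([[0, 0, 0x37C8] + [0] * 253, [0, 0, 0xC837] + [0] * 253])
    log = []

    def fake_identify():
        log.append("IDENTIFY")
        return next(responses)

    def fake_spinup():
        log.append("SET FEATURES 07h")

    bring_up(fake_identify, fake_spinup)
    print(" -> ".join(log))  # IDENTIFY -> SET FEATURES 07h -> IDENTIFY
```

This is the kind of small check Dan has in mind when he later calls the fix "about a 3-line addition".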
#14
> The whole thing is kind of amazing: toggling the "power up in standby"
> feature caused the BIOS of *three* desktop computers to pronounce the
> drive dead, and to freeze when booting.

A clear sign of bad industry support for this (S)ATA feature, especially for laptop drives.

For SCSI drives, the SCSI BIOSes have been able to send START STOP UNIT (the analogous SCSI command) at boot for a very long time, and the drive can be mechanically jumpered to "no spin at power-up".

This is because spinning up a SCSI drive imposes a significant load on the PSU, so it is a good idea to delay its spin-up until after the BIOS self-tests, whereas (S)ATA drives spin up as soon as they are powered up. Delaying spin-up reduces the PSU power load.

But this is relevant only for "heavy" SCSI drives, not for a laptop drive. That's why, IMHO, industry support for the feature is bad on (S)ATA.

> * why isn't this feature marked as DANGEROUS in the hdparm
> manual :-) ?

Hey, it's open source; mark it yourself and tell the maintainer :-)

> * is there a way to issue raw commands to a drive from Linux (maybe
> via /sys) without recompiling the kernel?

Try FreeBSD and "camcontrol".

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com
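The SCSI counterpart Maxim mentions, START STOP UNIT (opcode 1Bh), is a 6-byte CDB with three relevant bits:
- START (byte 4, bit 0) spins the unit up.
- LOEJ (byte 4, bit 1) adds load/eject.
- IMMED (byte 1, bit 0) asks for status before the operation completes.

The minimal Python sketch below only builds the bytes; sending them requires a SCSI pass-through interface. The function name is hypothetical, and the power-condition field is left at zero.

```python
START_STOP_UNIT = 0x1B


def start_stop_unit_cdb(start=True, load_eject=False, immediate=False):
    """Build a 6-byte START STOP UNIT CDB (hypothetical helper; power condition = 0)."""
    cdb = bytearray(6)
    cdb[0] = START_STOP_UNIT
    cdb[1] = 0x01 if immediate else 0x00  # IMMED
    cdb[4] = (0x02 if load_eject else 0x00) | (0x01 if start else 0x00)  # LOEJ | START
    # cdb[5] is the CONTROL byte, left at zero.
    return bytes(cdb)


if __name__ == "__main__":
    print(start_stop_unit_cdb().hex())  # 1b0000000100
```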
#15
Maxim S. Shatskih wrote:
>> The whole thing is kind of amazing: toggling the "power up in standby"
>> feature caused the BIOS of *three* desktop computers to pronounce the
>> drive dead, and to freeze when booting.
>
> A clear sign of bad industry support of this (S)ATA feature, especially for
> laptop drives.
>
> For SCSI drives, their SCSI BIOSes can send START STOP UNIT (the similar SCSI
> command) at boot for very long times, and the drive can be mechanically
> jumpered to "no spin at powerup".
>
> This is because spinning up a SCSI drive imposes significant load to the PSU,
> so, it is a good idea to delay its spinup until after the BIOS self-tests,
> while the (S)ATA drives will be spinned up and power up. This reduces the PSU
> power load.
>
> But this is relevant for "heavy" SCSI drives only, not relevant for a laptop
> drive. That's why - IMHO - the industry support for a feature is bad on (S)ATA.

I don't agree on that. Don't forget that SATA drives are also used in big (and very expensive) storage arrays for low-performance, high-capacity disk storage.
#16
On Jun 2, 4:02 am, "Maxim S. Shatskih" wrote:
> > The whole thing is kind of amazing: toggling the "power up in standby"
> > feature caused the BIOS of *three* desktop computers to pronounce the
> > drive dead, and to freeze when booting.
>
> A clear sign of bad industry support of this (S)ATA feature, especially for
> laptop drives.

Right. It's about a 3-line addition to the BIOS code, as can be seen from Mark Lord's libata patch, which I linked to. In my opinion, it *is* a feature that would benefit desktop computers and embedded systems, since you could significantly reduce the load on the PSU by not spinning up the HD at boot time. For example, a friend of mine built an automotive PC and had problems with it crashing at boot, due to excessive drain on the car's 12 V supply.

Also, I don't think the distinction between 2.5" and 3.5" drives is relevant here, since they all use the same (S)ATA command set.

> > * why isn't this feature marked as DANGEROUS in the hdparm
> > manual :-) ?
>
> Hey, it's open source; mark it yourself and tell the maintainer :-)

Oh, believe me, I plan to :-) In my opinion, it is MUCH more dangerous than the other features marked dangerous. Most of them can simply crash the OS or lock up the drive until the next reboot. This one can make the drive appear dead *and* freeze the BIOS.

> > * is there a way to issue raw commands to a drive from Linux (maybe
> > via /sys) without recompiling the kernel?
>
> Try FreeBSD and "camcontrol".

Cool. That's a neat utility. I feel like it ought to be possible to send some commands via /sys/bus/scsi/devices or something like that... but it's just a hunch. I'm going to email Mark Lord about his patch, and maybe he'll have a suggestion for that!

I'd also like to poke the freakin' BIOS vendors with a clue stick and tell them to support this feature... but that's probably a lost cause, right?

Dan
#17
On Jun 2, 4:22 am, Dirk Munk wrote:
> I don't agree on that. Don't forget that SATA drives are also used in
> big (and very expensive) storage arrays for low-performance, high-capacity
> disk storage.

Right. I assume that's why this drive has the feature. I have heard that some data centers use arrays of 2.5" disks because they consume significantly less power, and I'm assuming that's why this feature is implemented for SATA disks.

Dan
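The power argument behind staggered spin-up is simple arithmetic. It underlies Maxim's PSU load, Dan's automotive PC and Dirk's arrays alike. If every drive starts its motor at once, the spin-up surges add up. If the starts are spaced out, only one drive is surging at a time while the others draw their lower running current.

The Python sketch assumes a crude model: each drive draws a constant spin-up current for a fixed time and a constant running current afterwards. The figures in the example are invented for illustration, not taken from any drive's datasheet.

```python
def peak_current(n_drives, spinup_amps, running_amps, spinup_secs, stagger_secs):
    """Peak supply current (crude model) when drive i starts its motor at i * stagger_secs.

    Assumes spinup_amps >= running_amps, so the peak occurs at a start instant.
    """
    starts = [i * stagger_secs for i in range(n_drives)]
    peak = 0.0
    for t in starts:
        total = 0.0
        for s in starts:
            if s <= t < s + spinup_secs:
                total += spinup_amps   # still spinning up
            elif t >= s + spinup_secs:
                total += running_amps  # up to speed
            # drives that have not started yet draw nothing in this model
        peak = max(peak, total)
    return peak


if __name__ == "__main__":
    # Hypothetical figures: 4 drives, 2.0 A spin-up for 8 s, 0.5 A once running.
    print(peak_current(4, 2.0, 0.5, 8.0, stagger_secs=0.0))   # 8.0
    print(peak_current(4, 2.0, 0.5, 8.0, stagger_secs=10.0))  # 3.5
```

With a single drive the peak is the same either way. The benefit Dan describes for a one-drive system comes from moving the surge away from other boot-time loads, which this model does not include.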
#18
> > Try FreeBSD and "camcontrol".
>
> Cool. That's a neat utility. I feel like it ought to be possible to
> send some commands via /sys/bus/scsi/devices or something like that...
> but it's just a hunch.

IIRC, "camcontrol" can do this. But to send SCSI commands to an (S)ATA drive in FreeBSD, you need a properly built kernel: not the direct ATA disk driver, but the SCSI-to-ATA bridge driver.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com
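Maxim's SCSI-to-ATA bridge remark is about pushing a raw ATA command through a SCSI-style stack. One standardized way to do that is the ATA PASS-THROUGH (16) command (opcode 85h), defined by the SCSI/ATA Translation (SAT) standard.

The Python sketch below only packs such a CDB around SET FEATURES 07h. This thread does not establish whether a given camcontrol build or Linux kernel of the time accepted it. In any case, it needs a device node to send it to, which Dan's unpatched kernel never created. The function name and defaults are assumptions.

```python
ATA_PASS_THROUGH_16 = 0x85
PROTO_NON_DATA = 3          # SAT protocol field value for non-data commands
ATA_CMD_SET_FEATURES = 0xEF
SF_PUIS_SPINUP = 0x07


def ata_pass_through_16(command, features=0, sector_count=0, lba=0, device=0):
    """Pack a non-data 28-bit ATA command into a 16-byte SAT CDB (hypothetical helper)."""
    cdb = bytearray(16)
    cdb[0] = ATA_PASS_THROUGH_16
    cdb[1] = PROTO_NON_DATA << 1   # MULTIPLE_COUNT = 0, EXTEND = 0 (28-bit command)
    cdb[2] = 0x20                  # CK_COND = 1 (return ATA registers), no data transfer
    cdb[4] = features & 0xFF
    cdb[6] = sector_count & 0xFF
    cdb[8] = lba & 0xFF            # LBA low
    cdb[10] = (lba >> 8) & 0xFF    # LBA mid
    cdb[12] = (lba >> 16) & 0xFF   # LBA high
    cdb[13] = device & 0xFF
    cdb[14] = command
    return bytes(cdb)


if __name__ == "__main__":
    # Wrap Svend's "PUIS device spin-up" request for a SCSI-style transport.
    print(ata_pass_through_16(ATA_CMD_SET_FEATURES, features=SF_PUIS_SPINUP).hex())
```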
#19
In comp.sys.ibm.pc.hardware.storage Dan Lenski wrote:
> On Jun 2, 4:22 am, Dirk Munk wrote:
>> I don't agree on that. Don't forget that SATA drives are also used in
>> big (and very expensive) storage arrays for low performance high
>> capacity disk storage.
> Right. I assume that's why this drive has the feature. I have heard
> that some data centers use arrays of 2.5" disks since they consume
> significantly less power, and I'm assuming that's why this feature is
> implemented for SATA disks.

Actually, 2.5" SATA drives are used as local disks in blade servers, where space and power are at a premium. There are also high-performance 2.5" disks that are unsuitable for laptops, but AFAIK they are not available to ordinary customers, just to OEMs.

And yes, I believe you are correct that this is the reason the feature is present. Another one is that 2.5" disks are far better at starting fast than 3.5" disks, since on laptops spinning the disk down and restarting it is a typical way to save power.

Still, basically the BIOS manufacturers or customizers messed up badly here.

Arno
#20
In comp.sys.ibm.pc.hardware.storage Dan Lenski wrote:
[...]
> I'd also like to poke the freakin' BIOS vendors with a clue stick and
> tell them to support this feature... but that's probably a lost cause,
> right?

Very likely. These people believe they know what they are doing, which is the worst kind of incompetence.

Arno