Bad disk controller? Dying HDD? - Hardware


Thread: Bad disk controller? Dying HDD?

  1. Re: Bad disk controller? Dying HDD?

    In comp.sys.ibm.pc.hardware.storage Trevor Hemsley wrote:
    > On Wed, 13 Aug 2008 18:28:20 UTC in comp.os.linux.hardware, ANTant@zimage.com
    > (Ant) wrote:


    > > > I've never heard of Fortron but that doesn't mean that they aren't any good.
    > > > What I have seen before is IDE devices on the same cable interfering with each
    > > > other when one of them is going south. You could try removing the 2nd device
    > > > from the primary cable and attaching it to another one (if there is one) and see
    > > > if that eliminates the errors on one or both of the drives.

    > >
    > > Is unmounting hdb via umount command valid?


    > Not really, you need to get the drive off the cable to tell which means opening
    > things up and performing surgery.


    OK, I will when I get a chance (I'm at work right now).
    --
    "She's got ants in her pants." --unknown
    /\___/\
    / /\ /\ \ Ant @ http://antfarm.home.dhs.org (Personal Web Site)
    | |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
    \ _ / Please remove ANT if replying by e-mail.
    ( )

  2. Re: Bad disk controller? Dying HDD?

    Ant wrote:
    >>>>>>> Hi. My old mini-tower PC's HDDs are acting crazy today. I came
    >>>>>>> home and noticed my second drive wasn't there so I checked
    >>>>>>> dmesg (deleted normal logs before hdb):

    >
    >>>>>>> SMART Attributes Data Structure revision number: 7
    >>>>>>> Vendor Specific SMART Attributes with Thresholds:
    >>>>>>> ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
    >>>>>>>   1 Raw_Read_Error_Rate     0x0029 100   100   020    Pre-fail Offline -           0
    >>>>>>>   3 Spin_Up_Time            0x0027 089   076   020    Pre-fail Always  -           1489
    >>>>>>>   4 Start_Stop_Count        0x0032 099   099   008    Old_age  Always  -           1198
    >>>>>>>   5 Reallocated_Sector_Ct   0x0033 100   100   020    Pre-fail Always  -           0
    >>>>>>>   7 Seek_Error_Rate         0x000b 100   100   023    Pre-fail Always  -           0
    >>>>>>>   9 Power_On_Hours          0x0012 007   007   001    Old_age  Always  -           61139
    >>>>>>>  11 Calibration_Retry_Count 0x0013 100   100   020    Pre-fail Always  -           0
    >>>>>>>  12 Power_Cycle_Count       0x0032 099   099   008    Old_age  Always  -           1191
    >>>>>>>  13 Read_Soft_Error_Rate    0x000b 100   100   023    Pre-fail Always  -           0
    >>>>>>> 199 UDMA_CRC_Error_Count    0x001a 200   200   000    Old_age  Always  -           0
    >>>>>>>
    >>>>>>> Warning: device does not support Error Logging
    >>>>>>> SMART Error Log Version: 0
    >>>>>>> No Errors Logged
    >>>>>>>
    >>>>>>> Warning: device does not support Self Test Logging
    >>>>>>> SMART Self-test log structure revision number 0
    >>>>>>> Warning: ATA Specification requires self-test log structure revision number = 1
    >>>>>>> No self-tests have been logged. [To run self-tests, use: smartctl -t]
    >>>>>>> Device does not support Selective Self Tests/Logging
    >>>>>>> What's going on with my HDDs?
    >>>>>> Something's died, Jim. You into necrophilia ?
    >>>>>> P
    >>>>>
    >>>>>>> I am not a hardware and Linux/Debian expert. My hardware
    >>>>>>> specifications can be found here:
    >>>>>>> http://alpha.zimage.com/~ant/antfarm.../computers.txt
    >>>>>>> (secondary/backup computer). For now, I am going to unmount
    >>>>>>> /dev/hdb to see if hdb is causing all this madness.

    >
    >>>>>> There's certainly quite a few bad sectors on that drive.

    >
    >>>>> How can you tell?

    >
    >>>> Thats the reallocated sector count in the SMART report.

    >
    >>> So I have 100 bad sectors? Am I reading that right?


    >> No, 20, the last number, the raw value. Still very high.


    Sorry, I ****ed that up, it's actually got no reallocated sectors.

    > Can you please kindly explain what these raw values mean?


    That varies with the parameter.

    > Are those 20 bad sectors or something else?


    It's actually got 0 bad sectors.
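
As a rough illustration of the correction above: in each row of the attribute table, VALUE, WORST and THRESH are normalized health figures, while the last column is the vendor-specific raw count. The sketch below splits the Reallocated_Sector_Ct row quoted in this thread. It assumes the ten-column layout that smartctl printed here, and `SmartAttribute` and `parse_attribute` are made-up names for illustration.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized ("cooked") value
    worst: int   # lowest normalized value recorded so far
    thresh: int  # failure threshold for the normalized value
    raw: str     # vendor-specific raw value, kept as text


def parse_attribute(line: str) -> SmartAttribute:
    """Parse one row of the SMART attribute table shown in this thread.

    Assumed column order:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    fields = line.split()
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        raw=" ".join(fields[9:]),
    )


row = "5 Reallocated_Sector_Ct 0x0033 100 100 020 Pre-fail Always - 0"
attr = parse_attribute(row)
print(f"{attr.name}: normalized={attr.value} threshold={attr.thresh} raw={attr.raw}")
```

Read this way, 100 is the normalized value, 020 is the threshold, and the trailing 0 is the raw count of reallocated sectors.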


    >>>>> I am so used to chkdsk, scandisk, pretty images, etc.

    >
    >>>> A Jap would at least have the decency to disembowel itself.

    >
    >>>>>> That shouldnt have produced the errors with the other drive tho
    >>>>>> so
    >>>>>> something else has died too.

    >
    >>>>> True. So far, unmounting hdb hasn't resulted any symptoms.
    >>>>> However,
    >>>>> I haven't done much on it since I went to eat dinner and do other
    >>>>> things.
    >>>
    >>>>>> The system appears to be pretty comprehensively ****ed.
    >>>>
    >>>>>
    >>>>
    >>>>>> It would be worth trying a spare power supply, it might be
    >>>>>> something as basic as that.
    >>>>
    >>>>> OK, I will keep that in mind. My PSU in that box isn't that old. I
    >>>>> had to replace it with a new one on 5/14/2007 according to
    >>>>> http://alpha.zimage.com/~ant/antfarm/about/toys.html: "Replaced
    >>>>> the dead Antec PSU in Linux/Debian box with a Fortron
    >>>>> FSP650-80GLC PSU (650 watts)."
    >>>>
    >>>>>> Worth checking for bad caps on the motherboard too. These are the
    >>>>>> usually blue or black plastic covered post like things that stick
    >>>>>> up vertically from the motherboard surface. The tops should be
    >>>>>> flat. If any have bulged or have leaked, thats a bad cap and for
    >>>>>> someone like you the only
    >>>>>> real fix is to replace the motherboard. May be better to just bin
    >>>>>> the PC tho.
    >>>>
    >>>>> OK, I will check again. The last time I opened this PC was in late
    >>>>> January 2008, and I had several kernel panics and PC lockups due
    >>>>> to
    >>>>> a bad 512 MB Kingston memory (took a day to figure that out). This
    >>>>> could be related, but then I don't recall any funny odors, saw any
    >>>>> coloring on hardwares, bad caps, etc. Even my computer friend who
    >>>>> build computers looked and didn't see anything back then and told
    >>>>> me it
    >>>>> was bad RAM. Hmm.
    >>>>
    >>>>> Let's see how having hdb disabled go. hdb is mainly a storage and
    >>>>> backup drive, and I already made a backup of it a few weeks ago
    >>>>> (not much changed that I can recall).


    >>> Still no problems yet overnight with hdb unmounted in Debian/Linux.


    >> Thats more likely to be due to an intermittent fault unless debian is so
    >> ****ed that it cant report which hd* has a problem with some bad sectors.


    > Oh look, I found something new in my recent dmesg (wished it had dates
    > and time stamps) showed even with unmounted hdb drive even though my
    > PC
    > isn't pausing from what I can tell remotely (ssh since I am at work):
    > hdb: drive_cmd: status=0x7d { DriveReady DeviceFault SeekComplete
    > DataRequest CorrectedError Error }
    > hdb: drive_cmd: error=0x7d { DriveStatusError UncorrectableError
    > SectorIdNotFound AddrMarkNotFound }, LBAsect=226327933, sector=0 ide:
    > failed opcode was: 0xb0
    > hdb: status error: status=0x7d { DriveReady DeviceFault SeekComplete
    > DataRequest CorrectedError Error }
    > hdb: status error: error=0x7d { DriveStatusError UncorrectableError
    > SectorIdNotFound AddrMarkNotFound }, LBAsect=226327933, sector=0 ide:
    > failed opcode was: 0xb0
    > hdb: drive not ready for command


    > Nothing for hda (good). If this gets a lot worse, I am going to shutdown the PC.


    > I have a question, where do smartctl's test results go?


    Nowhere, they come from the drive. You can pipe them wherever you want in the usual way.

    > I have never seen any results from them in the past, including dmesg:


    > # smartctl -t long /dev/hdb
    > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > Allen
    > Home page is http://smartmontools.sourceforge.net/
    >
    > === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    > Warning: device does not support Self-Test functions.
    >
    > Sending command: "Execute SMART Extended self-test routine immediately
    > in off-line mode".
    > Drive command "Execute SMART Extended self-test routine immediately in
    > off-line mode" successful.
    > Testing has begun.


    > According to man, it says it can take 10+ minutes but I see nothing.
    >


    Just run smartctl yourself.



  3. Re: Bad disk controller? Dying HDD?

    > >>>>>>> Hi. My old mini-tower PC's HDDs are acting crazy today. I came
    > >>>>>>> home and noticed my second drive wasn't there so I checked
    > >>>>>>> dmesg (deleted normal logs before hdb):

    > >
    > >>>>>>> SMART Attributes Data Structure revision number: 7
    > >>>>>>> Vendor Specific SMART Attributes with Thresholds:
    > >>>>>>> ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
    > >>>>>>>   1 Raw_Read_Error_Rate     0x0029 100   100   020    Pre-fail Offline -           0
    > >>>>>>>   3 Spin_Up_Time            0x0027 089   076   020    Pre-fail Always  -           1489
    > >>>>>>>   4 Start_Stop_Count        0x0032 099   099   008    Old_age  Always  -           1198
    > >>>>>>>   5 Reallocated_Sector_Ct   0x0033 100   100   020    Pre-fail Always  -           0
    > >>>>>>>   7 Seek_Error_Rate         0x000b 100   100   023    Pre-fail Always  -           0
    > >>>>>>>   9 Power_On_Hours          0x0012 007   007   001    Old_age  Always  -           61139
    > >>>>>>>  11 Calibration_Retry_Count 0x0013 100   100   020    Pre-fail Always  -           0
    > >>>>>>>  12 Power_Cycle_Count       0x0032 099   099   008    Old_age  Always  -           1191
    > >>>>>>>  13 Read_Soft_Error_Rate    0x000b 100   100   023    Pre-fail Always  -           0
    > >>>>>>> 199 UDMA_CRC_Error_Count    0x001a 200   200   000    Old_age  Always  -           0
    > >>>>>>>
    > >>>>>>> Warning: device does not support Error Logging
    > >>>>>>> SMART Error Log Version: 0
    > >>>>>>> No Errors Logged
    > >>>>>>>
    > >>>>>>> Warning: device does not support Self Test Logging
    > >>>>>>> SMART Self-test log structure revision number 0
    > >>>>>>> Warning: ATA Specification requires self-test log structure revision number = 1
    > >>>>>>> No self-tests have been logged. [To run self-tests, use: smartctl -t]
    > >>>>>>> Device does not support Selective Self Tests/Logging
    > >>>>>>> What's going on with my HDDs?
    > >>>>>> Something's died, Jim. You into necrophilia ?
    > >>>>>> P
    > >>>>>
    > >>>>>>> I am not a hardware and Linux/Debian expert. My hardware
    > >>>>>>> specifications can be found here:
    > >>>>>>> http://alpha.zimage.com/~ant/antfarm.../computers.txt
    > >>>>>>> (secondary/backup computer). For now, I am going to unmount
    > >>>>>>> /dev/hdb to see if hdb is causing all this madness.

    > >
    > >>>>>> There's certainly quite a few bad sectors on that drive.

    > >
    > >>>>> How can you tell?

    > >
    > >>>> Thats the reallocated sector count in the SMART report.

    > >
    > >>> So I have 100 bad sectors? Am I reading that right?


    > >> No, 20, the last number, the raw value. Still very high.


    > Sorry, I ****ed that up, it's actually got no reallocated sectors.


    Ah.


    > > Can you please kindly explain what these raw values mean?


    > That varies with the parameter.


    OK.


    > > Are those 20 bad sectors or something else?


    > Its actually got 0 bad sectors.


    OK. So, it's not a bad sector problem then.


    > >>>>> I am so used to chkdsk, scandisk, pretty images, etc.

    > >
    > >>>> A Jap would at least have the decency to disembowel itself.

    > >
    > >>>>>> That shouldnt have produced the errors with the other drive tho
    > >>>>>> so
    > >>>>>> something else has died too.

    > >
    > >>>>> True. So far, unmounting hdb hasn't resulted any symptoms.
    > >>>>> However,
    > >>>>> I haven't done much on it since I went to eat dinner and do other
    > >>>>> things.
    > >>>
    > >>>>>> The system appears to be pretty comprehensively ****ed.
    > >>>>
    > >>>>>
    > >>>>
    > >>>>>> It would be worth trying a spare power supply, it might be
    > >>>>>> something as basic as that.
    > >>>>
    > >>>>> OK, I will keep that in mind. My PSU in that box isn't that old. I
    > >>>>> had to replace it with a new one on 5/14/2007 according to
    > >>>>> http://alpha.zimage.com/~ant/antfarm/about/toys.html: "Replaced
    > >>>>> the dead Antec PSU in Linux/Debian box with a Fortron
    > >>>>> FSP650-80GLC PSU (650 watts)."
    > >>>>
    > >>>>>> Worth checking for bad caps on the motherboard too. These are the
    > >>>>>> usually blue or black plastic covered post like things that stick
    > >>>>>> up vertically from the motherboard surface. The tops should be
    > >>>>>> flat. If any have bulged or have leaked, thats a bad cap and for
    > >>>>>> someone like you the only
    > >>>>>> real fix is to replace the motherboard. May be better to just bin
    > >>>>>> the PC tho.
    > >>>>
    > >>>>> OK, I will check again. The last time I opened this PC was in late
    > >>>>> January 2008, and I had several kernel panics and PC lockups due
    > >>>>> to
    > >>>>> a bad 512 MB Kingston memory (took a day to figure that out). This
    > >>>>> could be related, but then I don't recall any funny odors, saw any
    > >>>>> coloring on hardwares, bad caps, etc. Even my computer friend who
    > >>>>> build computers looked and didn't see anything back then and told
    > >>>>> me it
    > >>>>> was bad RAM. Hmm.
    > >>>>
    > >>>>> Let's see how having hdb disabled go. hdb is mainly a storage and
    > >>>>> backup drive, and I already made a backup of it a few weeks ago
    > >>>>> (not much changed that I can recall).


    > >>> Still no problems yet overnight with hdb unmounted in Debian/Linux.


    > >> Thats more likely to be due to an intermittent fault unless debian is so
    > >> ****ed that it cant report which hd* has a problem with some bad sectors.


    > > Oh look, I found something new in my recent dmesg (wished it had dates
    > > and time stamps) showed even with unmounted hdb drive even though my
    > > PC
    > > isn't pausing from what I can tell remotely (ssh since I am at work):
    > > hdb: drive_cmd: status=0x7d { DriveReady DeviceFault SeekComplete
    > > DataRequest CorrectedError Error }
    > > hdb: drive_cmd: error=0x7d { DriveStatusError UncorrectableError
    > > SectorIdNotFound AddrMarkNotFound }, LBAsect=226327933, sector=0 ide:
    > > failed opcode was: 0xb0
    > > hdb: status error: status=0x7d { DriveReady DeviceFault SeekComplete
    > > DataRequest CorrectedError Error }
    > > hdb: status error: error=0x7d { DriveStatusError UncorrectableError
    > > SectorIdNotFound AddrMarkNotFound }, LBAsect=226327933, sector=0 ide:
    > > failed opcode was: 0xb0
    > > hdb: drive not ready for command


    > > Nothing for hda (good). If this gets a lot worse, I am going to shutdown the PC.


    > > I have a question, where do smartctl's test results go?


    > Nowhere, they come from the drive. You can pipe them wherever you want in the usual way.


    Are you saying that when I run these long tests, the output of smartctl -a
    shows their results? If so, then I was expecting something. Heh.


    > > I have never seen any results from them in the past, including dmesg:


    > > # smartctl -t long /dev/hdb
    > > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > > Allen
    > > Home page is http://smartmontools.sourceforge.net/
    > >
    > > === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    > > Warning: device does not support Self-Test functions.
    > >
    > > Sending command: "Execute SMART Extended self-test routine immediately
    > > in off-line mode".
    > > Drive command "Execute SMART Extended self-test routine immediately in
    > > off-line mode" successful.
    > > Testing has begun.


    > > According to man, it says it can take 10+ minutes but I see nothing.
    > >


    > Just run smartctl yourself.


    I did run "smartctl -t long /dev/hdb" and waited for a long time, but
    never got anything after it. No test results.

  4. Re: Bad disk controller? Dying HDD?

    In comp.sys.ibm.pc.hardware.storage Ant wrote:
    >> > Nothing for hda (good). If this gets a lot worse, I am going to shutdown the PC.

    >
    >> > I have a question, where do smartctl's test results go?

    >
    >> Nowhere, they come from the drive. You can pipe them wherever you want in the usual way.

    >
    > Are you saying when I run these long tests, smartctl -a are the results
    > from them? If so, then I was expecting something. Heh.
    >


    Smartctl is displaying the results. You might also display the logs,
    assuming your drives support them:

    smartctl -l selftest /dev/hd?

    From smartctl -h
    -l TYPE, --log=TYPE
    Show device log. TYPE: error, selftest, selective, directory,
    background, scttemp[sts,hist]

    Also, I'd physically disconnect the drive that seems to be failing
    (hdb): don't just umount it, pull the IDE cable. As long as the cable
    is plugged in and the drive is powered up, it's on the bus.

    Jerry
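
The sequence Jerry describes can be sketched as follows. The extended self-test runs inside the drive, so `smartctl -t long` returns at once, and any result has to be read back later from the self-test log, provided the drive keeps one. The helper names are hypothetical, and the sketch only builds and prints the command lines instead of running them, so it can be tried on a machine without smartctl installed.

```python
import shlex
from typing import List


def start_long_test_cmd(device: str) -> List[str]:
    """Command that asks the drive to begin an extended self-test."""
    return ["smartctl", "-t", "long", device]


def read_selftest_log_cmd(device: str) -> List[str]:
    """Command that reads the drive's self-test log, if it keeps one."""
    return ["smartctl", "-l", "selftest", device]


def read_error_log_cmd(device: str) -> List[str]:
    """Command that reads the drive's error log, if it keeps one."""
    return ["smartctl", "-l", "error", device]


device = "/dev/hdb"  # the suspect drive from this thread
for cmd in (
    start_long_test_cmd(device),
    read_selftest_log_cmd(device),
    read_error_log_cmd(device),
):
    # Print only; run the commands by hand as root once the test has had time to finish.
    print(shlex.join(cmd))
```

Passing these lists to `subprocess.run` would execute them; that step is left out here because it needs root and a real drive.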

  5. Re: Bad disk controller? Dying HDD?

    On 13 Aug 2008 09:34:21 GMT, Arno Wagner put finger
    to keyboard and composed:

    >In comp.sys.ibm.pc.hardware.storage Ant wrote:
    >> Hi. My old mini-tower PC's HDDs are acting crazy today. I came home and
    >> noticed my second drive wasn't there so I checked dmesg (deleted normal
    >> logs before hdb):

    >
    >From the SMART status, the Seagate (first drive) has read and
    >seek errors, that could well be connected. The Quantum looks
    >fine in SMART, but it is older (i.e. less reliable SMART
    >implementation) and has nasty SectorIDNotFound errors in the
    >log.
    >
    >So, what is going on indeed. I think either hdb is dying
    >and messing with the bus as it does so, or you have a different
    >problem. Attributes 1 and 7 on hda do not look good either.
    >The errors in the log for hda could indeed be a controller
    >issue, but neither could the errors for hdb or the SMART
    >attributes 1 and 7 for hda. I would suspect that your PSU is
    >going south and that unclean power is causing the problem.
    >
    >Arno


    High raw "seek error rate" and "read error rate" figures for Seagate
    drives appear to be normal. By my reckoning, the "seek error rate"
    parameter appears to be a seek count, not an error, and not a rate.

    - Franc Zabkar
    --
    Please remove one 'i' from my address when replying by email.
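
Franc's reading can be illustrated with a small sketch. It assumes an interpretation often reported for Seagate drives, which this thread does not confirm: that the 48-bit raw value of "seek error rate" packs an error count into its upper 16 bits and a running seek count into its lower 32 bits. The function name and both sample values are hypothetical and are not taken from the drives discussed here.

```python
from typing import Tuple


def split_raw_48(raw: int) -> Tuple[int, int]:
    """Split a 48-bit raw value into (upper 16 bits, lower 32 bits).

    Under the assumed interpretation, the upper part would be an error
    count and the lower part a count of seeks performed.
    """
    upper = (raw >> 32) & 0xFFFF
    lower = raw & 0xFFFFFFFF
    return upper, lower


# Hypothetical raw values for illustration only.
print(split_raw_48(21474836))           # large number, yet upper part is 0
print(split_raw_48((2 << 32) | 1000))   # upper part 2, lower part 1000
```

If that assumption holds, a large raw figure on its own says little about errors, which would fit Franc's observation that high raw values look normal on these drives.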

  6. Re: Bad disk controller? Dying HDD?

    Franc Zabkar wrote in news:4uk6a41i4d4gcfksg7vt3lhordrrkgbnhn@4ax.com
    > On 13 Aug 2008 09:34:21 GMT, Arno Wagner put finger
    > to keyboard and composed:
    >
    > > In comp.sys.ibm.pc.hardware.storage Ant wrote:
    > > > Hi. My old mini-tower PC's HDDs are acting crazy today. I came home and
    > > > noticed my second drive wasn't there so I checked dmesg (deleted normal
    > > > logs before hdb):

    > >
    > > From the SMART status, the Seagate (first drive) has read and
    > > seek errors, that could well be connected. The Quantum looks
    > > fine in SMART, but it is older (i.e. less reliable SMART
    > > implementation) and has nasty SectorIDNotFound errors in the log.
    > >
    > > So, what is going on indeed. I think either hdb is dying
    > > and messing with the bus as it does so, or you have a different
    > > problem. Attributes 1 and 7 on hda do not look good either.
    > > The errors in the log for hda could indeed be a controller
    > > issue, but neither could the errors for hdb or the SMART
    > > attributes 1 and 7 for hda. I would suspect that your PSU is
    > > going south and that unclean power is causing the problem.
    > >
    > > Arno


    > High raw "seek error rate" and "read error rate" figures for Seagate
    > drives appear to be normal.


    You told him that before, Franc.
    He doesn't hear you, you are in *his* killfile.

    > By my reckoning, the "seek error rate" parameter appears to be a seek
    > count, not an error, and not a rate.


    Someone put that one to bed too.
    But then he is probably in *your* killfile
    and you didn't hear him. Right, Franc?


    >
    > - Franc Zabkar


  7. Re: Bad disk controller? Dying HDD?

    Ant wrote:
    >>>>>>>>> Hi. My old mini-tower PC's HDDs are acting crazy today. I came
    >>>>>>>>> home and noticed my second drive wasn't there so I checked
    >>>>>>>>> dmesg (deleted normal logs before hdb):
    >>>
    >>>>>>>>> SMART Attributes Data Structure revision number: 7
    >>>>>>>>> Vendor Specific SMART Attributes with Thresholds:
    >>>>>>>>> ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
    >>>>>>>>>   1 Raw_Read_Error_Rate     0x0029 100   100   020    Pre-fail Offline -           0
    >>>>>>>>>   3 Spin_Up_Time            0x0027 089   076   020    Pre-fail Always  -           1489
    >>>>>>>>>   4 Start_Stop_Count        0x0032 099   099   008    Old_age  Always  -           1198
    >>>>>>>>>   5 Reallocated_Sector_Ct   0x0033 100   100   020    Pre-fail Always  -           0
    >>>>>>>>>   7 Seek_Error_Rate         0x000b 100   100   023    Pre-fail Always  -           0
    >>>>>>>>>   9 Power_On_Hours          0x0012 007   007   001    Old_age  Always  -           61139
    >>>>>>>>>  11 Calibration_Retry_Count 0x0013 100   100   020    Pre-fail Always  -           0
    >>>>>>>>>  12 Power_Cycle_Count       0x0032 099   099   008    Old_age  Always  -           1191
    >>>>>>>>>  13 Read_Soft_Error_Rate    0x000b 100   100   023    Pre-fail Always  -           0
    >>>>>>>>> 199 UDMA_CRC_Error_Count    0x001a 200   200   000    Old_age  Always  -           0
    >>>>>>>>>
    >>>>>>>>> Warning: device does not support Error Logging
    >>>>>>>>> SMART Error Log Version: 0
    >>>>>>>>> No Errors Logged
    >>>>>>>>>
    >>>>>>>>> Warning: device does not support Self Test Logging
    >>>>>>>>> SMART Self-test log structure revision number 0
    >>>>>>>>> Warning: ATA Specification requires self-test log structure revision number = 1
    >>>>>>>>> No self-tests have been logged. [To run self-tests, use: smartctl -t]
    >>>>>>>>> Device does not support Selective Self Tests/Logging
    >>>>>>>>> What's going on with my HDDs?
    >>>>>>>> Something's died, Jim. You into necrophilia ?
    >>>>>>>> P
    >>>>>>>
    >>>>>>>>> I am not a hardware and Linux/Debian expert. My hardware
    >>>>>>>>> specifications can be found here:
    >>>>>>>>> http://alpha.zimage.com/~ant/antfarm.../computers.txt
    >>>>>>>>> (secondary/backup computer). For now, I am going to unmount
    >>>>>>>>> /dev/hdb to see if hdb is causing all this madness.
    >>>
    >>>>>>>> There's certainly quite a few bad sectors on that drive.
    >>>
    >>>>>>> How can you tell?
    >>>
    >>>>>> Thats the reallocated sector count in the SMART report.
    >>>
    >>>>> So I have 100 bad sectors? Am I reading that right?

    >
    >>>> No, 20, the last number, the raw value. Still very high.

    >
    >> Sorry, I ****ed that up, it's actually got no reallocated sectors.

    >
    > Ah.
    >
    >
    >>> Can you please kindly explain what these raw values mean?

    >
    >> That varies with the parameter.

    >
    > OK.
    >
    >
    >>> Are those 20 bad sectors or something else?

    >
    >> Its actually got 0 bad sectors.

    >
    > OK. So, it's not a bad sector problem then.
    >
    >
    >>>>>>> I am so used to chkdsk, scandisk, pretty images, etc.
    >>>
    >>>>>> A Jap would at least have the decency to disembowel itself.
    >>>
    >>>>>>>> That shouldnt have produced the errors with the other drive tho
    >>>>>>>> so
    >>>>>>>> something else has died too.
    >>>
    >>>>>>> True. So far, unmounting hdb hasn't resulted any symptoms.
    >>>>>>> However,
    >>>>>>> I haven't done much on it since I went to eat dinner and do
    >>>>>>> other things.
    >>>>>
    >>>>>>>> The system appears to be pretty comprehensively ****ed.
    >>>>>>
    >>>>>>>
    >>>>>>
    >>>>>>>> It would be worth trying a spare power supply, it might be
    >>>>>>>> something as basic as that.
    >>>>>>
    >>>>>>> OK, I will keep that in mind. My PSU in that box isn't that
    >>>>>>> old. I had to replace it with a new one on 5/14/2007 according
    >>>>>>> to http://alpha.zimage.com/~ant/antfarm/about/toys.html:
    >>>>>>> "Replaced the dead Antec PSU in Linux/Debian box with a Fortron
    >>>>>>> FSP650-80GLC PSU (650 watts)."
    >>>>>>
    >>>>>>>> Worth checking for bad caps on the motherboard too. These are
    >>>>>>>> the usually blue or black plastic covered post like things
    >>>>>>>> that stick up vertically from the motherboard surface. The
    >>>>>>>> tops should be flat. If any have bulged or have leaked, thats
    >>>>>>>> a bad cap and for someone like you the only
    >>>>>>>> real fix is to replace the motherboard. May be better to just
    >>>>>>>> bin the PC tho.
    >>>>>>
    >>>>>>> OK, I will check again. The last time I opened this PC was in
    >>>>>>> late January 2008, and I had several kernel panics and PC
    >>>>>>> lockups due to
    >>>>>>> a bad 512 MB Kingston memory (took a day to figure that out).
    >>>>>>> This could be related, but then I don't recall any funny odors,
    >>>>>>> saw any coloring on hardwares, bad caps, etc. Even my computer
    >>>>>>> friend who build computers looked and didn't see anything back
    >>>>>>> then and told me it
    >>>>>>> was bad RAM. Hmm.
    >>>>>>
    >>>>>>> Let's see how having hdb disabled go. hdb is mainly a storage
    >>>>>>> and backup drive, and I already made a backup of it a few weeks
    >>>>>>> ago (not much changed that I can recall).

    >
    >>>>> Still no problems yet overnight with hdb unmounted in
    >>>>> Debian/Linux.

    >
    >>>> Thats more likely to be due to an intermittent fault unless debian
    >>>> is so ****ed that it cant report which hd* has a problem with some
    >>>> bad sectors.

    >
    >>> Oh look, I found something new in my recent dmesg (wished it had
    >>> dates
    >>> and time stamps) showed even with unmounted hdb drive even though my
    >>> PC
    >>> isn't pausing from what I can tell remotely (ssh since I am at
    >>> work):
    >>> hdb: drive_cmd: status=0x7d { DriveReady DeviceFault SeekComplete
    >>> DataRequest CorrectedError Error }
    >>> hdb: drive_cmd: error=0x7d { DriveStatusError UncorrectableError
    >>> SectorIdNotFound AddrMarkNotFound }, LBAsect=226327933, sector=0
    >>> ide:
    >>> failed opcode was: 0xb0
    >>> hdb: status error: status=0x7d { DriveReady DeviceFault SeekComplete
    >>> DataRequest CorrectedError Error }
    >>> hdb: status error: error=0x7d { DriveStatusError UncorrectableError
    >>> SectorIdNotFound AddrMarkNotFound }, LBAsect=226327933, sector=0
    >>> ide:
    >>> failed opcode was: 0xb0
    >>> hdb: drive not ready for command

    >
    >>> Nothing for hda (good). If this gets a lot worse, I am going to
    >>> shutdown the PC.

    >
    >>> I have a question, where do smartctl's test results go?

    >
    >> Nowhere, they come from the drive. You can pipe them wherever you
    >> want in the usual way.

    >
    > Are you saying when I run these long tests, smartctl -a are the
    > results from them? If so, then I was expecting something. Heh.
    >
    >
    >>> I have never seen any results from them in the past, including
    >>> dmesg:

    >
    >>> # smartctl -t long /dev/hdb
    >>> smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    >>> Allen
    >>> Home page is http://smartmontools.sourceforge.net/
    >>>
    >>> === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    >>> Warning: device does not support Self-Test functions.
    >>>
    >>> Sending command: "Execute SMART Extended self-test routine
    >>> immediately
    >>> in off-line mode".
    >>> Drive command "Execute SMART Extended self-test routine immediately
    >>> in
    >>> off-line mode" successful.
    >>> Testing has begun.

    >
    >>> According to man, it says it can take 10+ minutes but I see nothing.
    >>>

    >
    >> Just run smartctl yourself.

    >
    > I did run "smartctl -t long /dev/hdb" and waited for a long time, but
    > never got anything after it. No test results.


    See Jerry's reply.



  8. Re: Bad disk controller? Dying HDD?

    In comp.sys.ibm.pc.hardware.storage Ant wrote:
    > On 8/13/2008 2:34 AM PT, Arno Wagner typed:


    >> In comp.sys.ibm.pc.hardware.storage Ant wrote:
    >>> Hi. My old mini-tower PC's HDDs are acting crazy today. I came home and
    >>> noticed my second drive wasn't there so I checked dmesg (deleted normal
    >>> logs before hdb):

    >>
    >> From the SMART status, the Seagate (first drive) has read and
    >> seek errors, that could well be connected. The Quantum looks
    >> fine in SMART, but it is older (i.e. less reliable SMART
    >> implementation) and has nasty SectorIDNotFound errors in the
    >> log.


    > Thanks for the clarifications.



    >> So, what is going on indeed. I think either hdb is dying
    >> and messing with the bus as it does so, or you have a different
    >> problem. Attributes 1 and 7 on hda do not look good either.
    >> The errors in the log for hda could indeed be a controller
    >> issue, but neither could the errors for hdb or the SMART
    >> attributes 1 and 7 for hda. I would suspect that your PSU is
    >> going south and that unclean power is causing the problem.


    > Darn it, this Fortron FSP650-80GLC PSU (650 watts) isn't that old
    > either. Grr.


    I had one of these (out of one) die in a server after 1 year. I use
    Enermax now.

    > So far, no errors overnight after unmounting hdb...


    Well, it is possible that hda is fine.

    Arno

  9. Re: Bad disk controller? Dying HDD?

    In comp.sys.ibm.pc.hardware.storage Franc Zabkar wrote:
    > On 13 Aug 2008 09:34:21 GMT, Arno Wagner put finger
    > to keyboard and composed:


    >>In comp.sys.ibm.pc.hardware.storage Ant wrote:
    >>> Hi. My old mini-tower PC's HDDs are acting crazy today. I came home and
    >>> noticed my second drive wasn't there so I checked dmesg (deleted normal
    >>> logs before hdb):

    >>
    >>From the SMART status, the Seagate (first drive) has read and
    >>seek errors, that could well be connected. The Quantum looks
    >>fine in SMART, but it is older (i.e. less reliable SMART
    >>implementation) and has nasty SectorIDNotFound errors in the
    >>log.
    >>
    >>So, what is going on indeed. I think either hdb is dying
    >>and messing with the bus as it does so, or you have a different
    >>problem. Attributes 1 and 7 on hda do not look good either.
    >>The errors in the log for hda could indeed be a controller
    >>issue, but neither could the errors for hdb or the SMART
    >>attributes 1 and 7 for hda. I would suspect that your PSU is
    >>going south and that unclean power is causing the problem.
    >>
    >>Arno


    > High raw "seek error rate" and "read error rate" figures for Seagate
    > drives appear to be normal. By my reckoning, the "seek error rate"
    > parameter appears to be a seek count, not an error, and not a rate.


    Agreed for the raw value. I am actually concerned about the cooked
    values. They are a bit low, which may or may not indicate a problem.
    If removing hdb (SectorIdNotFound is a killer...) solves the
    issue, I would say that hda is likely fine but should be watched.


    Arno

  10. Re: Bad disk controller? Dying HDD?

    > >> > Nothing for hda (good). If this gets a lot worse, I am going to shutdown the PC.
    > >
    > >> > I have a question, where do smartctl's test results go?

    > >
    > >> Nowhere, they come from the drive. You can pipe them wherever you want in the usual way.

    > >
    > > Are you saying when I run these long tests, smartctl -a are the results
    > > from them? If so, then I was expecting something. Heh.
    > >


    > Smartctl is displaying the results. You might also display the logs,
    > assuming your drives support them:


    > smartctl -l TYPE /dev/hd?


    > From smartctl -h
    > -l TYPE, --log=TYPE
    > Show device log. TYPE: error, selftest, selective, directory,
    > background, scttemp[sts,hist]


    It doesn't look like there is logging for hdb. Is this logging kept in
    the HDD's SMART memory or something? Is this the same information I see
    when I run the smartctl command, or something different?


    # smartctl -l error /dev/hdb
    smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    Allen
    Home page is http://smartmontools.sourceforge.net/

    === START OF READ SMART DATA SECTION ===
    Warning: device does not support Error Logging
    SMART Error Log Version: 0
    No Errors Logged

    # smartctl -l selftest /dev/hdb
    smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    Allen
    Home page is http://smartmontools.sourceforge.net/

    === START OF READ SMART DATA SECTION ===
    Warning: device does not support Self Test Logging
    SMART Self-test log structure revision number 0
    Warning: ATA Specification requires self-test log structure revision
    number = 1
    No self-tests have been logged. [To run self-tests, use: smartctl -t]


    # smartctl -l selftest /dev/hda
    smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    Allen
    Home page is http://smartmontools.sourceforge.net/

    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline  Completed without error  00%        18951            -
    # 2  Extended offline  Completed without error  00%        18674            -
    # 3  Extended offline  Completed without error  00%        15957            -
    # 4  Extended offline  Completed without error  00%        14448            -


    # smartctl -l error /dev/hda
    smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    Allen
    Home page is http://smartmontools.sourceforge.net/

    === START OF READ SMART DATA SECTION ===
    SMART Error Log Version: 1
    No Errors Logged

    --

    # smartctl -t long /dev/hda
    smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    Allen
    Home page is http://smartmontools.sourceforge.net/

    === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    Sending command: "Execute SMART Extended self-test routine immediately
    in off-line mode".
    Drive command "Execute SMART Extended self-test routine immediately in
    off-line mode" successful.
    Testing has begun.
    Please wait 58 minutes for test to complete.
    Test will complete after Wed Aug 13 19:43:52 2008

    Use smartctl -X to abort test.

    Do I just run "smartctl -a /dev/hda" to see the results after an hour?


    > Also I'd physically disconnect the drive that seems to be failing,
    > hdb, not umount, pull the IDE cable. As long as the cable is plugged
    > in and it's powered up it's on the bus.


    OK. I will do that later (still at work). So far no new dmesg
    messages about HDDs.
    --
    "She's got ants in her pants." --unknown
    /\___/\
    / /\ /\ \ Ant @ http://antfarm.home.dhs.org (Personal Web Site)
    | |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net (down)
    \ _ / Please remove ANT if replying by e-mail.
    ( )

  11. Re: Bad disk controller? Dying HDD?

    > >> In comp.sys.ibm.pc.hardware.storage Ant wrote:
    > >>> Hi. My old mini-tower PC's HDDs are acting crazy today. I came home and
    > >>> noticed my second drive wasn't there so I checked dmesg (deleted normal
    > >>> logs before hdb):
    > >>
    > >> From the SMART status, the Seagate (first drive) has read and
    > >> seek errors, that could well be connected. The Quantum looks
    > >> fine in SMART, but it is older (i.e. less reliable SMART
    > >> implementation) and has nasty SectorIDNotFound errors in the
    > >> log.


    > > Thanks for the clarifications.



    > >> So, what is going on indeed. I think either hdb is dying
    > >> and messing with the bus as it does so, or you have a different
    > >> problem. Attributes 1 and 7 on hda do not look good either.
    > >> The errors in the log for hda could indeed be a controller
    > >> issue, but neither the errors for hdb nor the SMART
    > >> attributes 1 and 7 for hda could be. I would suspect that your PSU
    > >> is going south and that unclean power is causing the problem.


    > > Darn it, this Fortron FSP650-80GLC PSU (650 watts) isn't that old
    > > either. Grr.


    > I had one of these (of one) die in a server after 1 year. I use
    > Enermax now.


    Wow. I recall Fortron wasn't bad. I had generic, Antec, Enlight, and
    SeaSonic brands go bad on me after a few years.


    > > So far, no errors overnight after unmounting hdb overnight...


    > Well, it is possible that hda is fine.


    OK.
    --
    "She's got ants in her pants." --unknown
    /\___/\
    / /\ /\ \ Ant @ http://antfarm.home.dhs.org (Personal Web Site)
    | |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net (down)
    \ _ / Please remove ANT if replying by e-mail.
    ( )

  12. Re: Bad disk controller? Dying HDD?

    > In comp.sys.ibm.pc.hardware.storage Franc Zabkar wrote:
    > > On 13 Aug 2008 09:34:21 GMT, Arno Wagner put finger
    > > to keyboard and composed:


    > >>In comp.sys.ibm.pc.hardware.storage Ant wrote:
    > >>> Hi. My old mini-tower PC's HDDs are acting crazy today. I came home and
    > >>> noticed my second drive wasn't there so I checked dmesg (deleted normal
    > >>> logs before hdb):
    > >>
    > >>From the SMART status, the Seagate (first drive) has read and
    > >>seek errors, that could well be connected. The Quantum looks
    > >>fine in SMART, but it is older (i.e. less reliable SMART
    > >>implementation) and has nasty SectorIDNotFound errors in the
    > >>log.
    > >>
    > >>So, what is going on indeed. I think either hdb is dying
    > >>and messing with the bus as it does so, or you have a different
    > >>problem. Attributes 1 and 7 on hda do not look good either.
    > >>The errors in the log for hda could indeed be a controller
    > >>issue, but neither the errors for hdb nor the SMART
    > >>attributes 1 and 7 for hda could be. I would suspect that your PSU
    > >>is going south and that unclean power is causing the problem.
    > >>
    > >>Arno


    > > High raw "seek error rate" and "read error rate" figures for Seagate
    > > drives appear to be normal. By my reckoning, the "seek error rate"
    > > parameter appears to be a seek count, not an error, and not a rate.


    > Agreed for the raw value. I am actually concerned about the cooked
    > values. They are a bit low, which may or may not indicate a problem.
    > If removing hdb (SectorIdNotFound is a killer...) solves the
    > issue, I would say that hda is likely fine but should be watched.


    I did notice my PC feels a little smoother with hdb unmounted. I don't
    know if it is because of the problems it has or the way IDE/PATA works
    (again, not a hardware guy).
    --
    "She's got ants in her pants." --unknown
    /\___/\
    / /\ /\ \ Ant @ http://antfarm.home.dhs.org (Personal Web Site)
    | |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net (down)
    \ _ / Please remove ANT if replying by e-mail.
    ( )

  13. Re: Bad disk controller? Dying HDD?

    Ant wrote:
    >>>>> Nothing for hda (good). If this gets a lot worse, I am going to
    >>>>> shutdown the PC.
    >>>
    >>>>> I have a question, where do smartctl's test results go?
    >>>
    >>>> Nowhere, they come from the drive. You can pipe them wherever you
    >>>> want in the usual way.
    >>>
    >>> Are you saying when I run these long tests, smartctl -a are the
    >>> results from them? If so, then I was expecting something. Heh.
    >>>

    >
    >> Smartctl is displaying the results. You might also display the logs,
    >> assuming your drives support them:

    >
    >> smartctl -l TYPE /dev/hd?

    >
    >> From smartctl -h
    >> -l TYPE, --log=TYPE
    >> Show device log. TYPE: error, selftest, selective, directory,
    >> background, scttemp[sts,hist]


    > It doesn't look like there is logging for hdb.


    What you have below is what is logged.

    > Is this logging kept in HDD's SMART memory or something?


    Nope.

    > Is this the same information I see when I run the smartctl command


    Yes.

    > or something different?


    Nope.

    > # smartctl -l error /dev/hdb
    > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > Allen
    > Home page is http://smartmontools.sourceforge.net/


    > === START OF READ SMART DATA SECTION ===
    > Warning: device does not support Error Logging
    > SMART Error Log Version: 0
    > No Errors Logged


    > # smartctl -l selftest /dev/hdb
    > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > Allen
    > Home page is http://smartmontools.sourceforge.net/
    >
    > === START OF READ SMART DATA SECTION ===
    > Warning: device does not support Self Test Logging
    > SMART Self-test log structure revision number 0
    > Warning: ATA Specification requires self-test log structure revision
    > number = 1
    > No self-tests have been logged. [To run self-tests, use: smartctl -t]


    > # smartctl -l selftest /dev/hda
    > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > Allen
    > Home page is http://smartmontools.sourceforge.net/


    > === START OF READ SMART DATA SECTION ===
    > SMART Self-test log structure revision number 1
    > Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    > # 1  Extended offline  Completed without error  00%        18951            -
    > # 2  Extended offline  Completed without error  00%        18674            -
    > # 3  Extended offline  Completed without error  00%        15957            -
    > # 4  Extended offline  Completed without error  00%        14448            -


    That's all there is.

    > # smartctl -l error /dev/hda
    > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > Allen
    > Home page is http://smartmontools.sourceforge.net/
    >
    > === START OF READ SMART DATA SECTION ===
    > SMART Error Log Version: 1
    > No Errors Logged
    >
    > --
    >
    > # smartctl -t long /dev/hda
    > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > Allen
    > Home page is http://smartmontools.sourceforge.net/
    >
    > === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    > Sending command: "Execute SMART Extended self-test routine immediately
    > in off-line mode".
    > Drive command "Execute SMART Extended self-test routine immediately in
    > off-line mode" successful.
    > Testing has begun.
    > Please wait 58 minutes for test to complete.
    > Test will complete after Wed Aug 13 19:43:52 2008


    > Use smartctl -X to abort test.


    > Do I just run "smartctl -a /dev/hda" to see the results after an hour?


    No, what you have above is all there is.

    >> Also I'd physically disconnect the drive that seems to be failing,
    >> hdb, not umount, pull the IDE cable. As long as the cable is plugged
    >> in and it's powered up it's on the bus.

    >
    > OK. I will do that later (still at work). So far no new dmesg
    > messages about HDDs.


    Bet it's just an intermittent fault and you won't get anything useful until it returns.



  14. Re: Bad disk controller? Dying HDD?

    Ant wrote:
    >> In comp.sys.ibm.pc.hardware.storage Franc Zabkar
    >> wrote:
    >>> On 13 Aug 2008 09:34:21 GMT, Arno Wagner put finger
    >>> to keyboard and composed:

    >
    >>>> In comp.sys.ibm.pc.hardware.storage Ant wrote:
    >>>>> Hi. My old mini-tower PC's HDDs are acting crazy today. I came
    >>>>> home and noticed my second drive wasn't there so I checked dmesg
    >>>>> (deleted normal logs before hdb):
    >>>>
    >>>> From the SMART status, the Seagate (first drive) has read and
    >>>> seek errors, that could well be connected. The Quantum looks
    >>>> fine in SMART, but it is older (i.e. less reliable SMART
    >>>> implementation) and has nasty SectorIDNotFound errors in the
    >>>> log.
    >>>>
    >>>> So, what is going on indeed. I think either hdb is dying
    >>>> and messing with the bus as it does so, or you have a different
    >>>> problem. Attributes 1 and 7 on hda do not look good either.
    >>>> The errors in the log for hda could indeed be a controller
    >>>> issue, but neither the errors for hdb nor the SMART
    >>>> attributes 1 and 7 for hda could be. I would suspect that your PSU
    >>>> is going south and that unclean power is causing the problem.
    >>>>
    >>>> Arno

    >
    >>> High raw "seek error rate" and "read error rate" figures for Seagate
    >>> drives appear to be normal. By my reckoning, the "seek error rate"
    >>> parameter appears to be a seek count, not an error, and not a rate.

    >
    >> Agreed for the raw value. I am actually concerned about the cooked
    >> values. They are a bit low, which may or may not indicate a problem.
    >> If removing hdb (SectorIdNotFound is a killer...) solves the
    >> issue, I would say that hda is likely fine but should be watched.


    > I did notice my PC feels a little smoother with hdb unmounted.
    > I don't know if it is because of the problems it has


    It doesn't have any problems.

    > or the way IDE/PATA works (again, not a hardware guy).


    Nope.



  15. Re: Bad disk controller? Dying HDD?

    In comp.sys.ibm.pc.hardware.storage Rod Speed wrote:
    > Ant wrote:
    > >>>>> Nothing for hda (good). If this gets a lot worse, I am going to
    > >>>>> shutdown the PC.
    > >>>
    > >>>>> I have a question, where do smartctl's test results go?
    > >>>
    > >>>> Nowhere, they come from the drive. You can pipe them wherever you
    > >>>> want in the usual way.
    > >>>
    > >>> Are you saying when I run these long tests, smartctl -a are the
    > >>> results from them? If so, then I was expecting something. Heh.
    > >>>

    > >
    > >> Smartctl is displaying the results. You might also display the logs,
    > >> assuming your drives support them:

    > >
    > >> smartctl -l TYPE /dev/hd?

    > >
    > >> From smartctl -h
    > >> -l TYPE, --log=TYPE
    > >> Show device log. TYPE: error, selftest, selective, directory,
    > >> background, scttemp[sts,hist]


    > > It doesn't look like there is logging for hdb.


    > What you have below is what is logged.


    Oh. I was expecting a different log.


    > > Is this logging kept in HDD's SMART memory or something?


    > Nope.


    > > Is this the same information I see when I run the smartctl command


    > Yes.


    Ah.


    > > or something different?


    > Nope.


    That's what confused me. I was expecting something else. OK, good to
    know they're the same.


    > > # smartctl -l error /dev/hdb
    > > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > > Allen
    > > Home page is http://smartmontools.sourceforge.net/


    > > === START OF READ SMART DATA SECTION ===
    > > Warning: device does not support Error Logging
    > > SMART Error Log Version: 0
    > > No Errors Logged


    > > # smartctl -l selftest /dev/hdb
    > > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > > Allen
    > > Home page is http://smartmontools.sourceforge.net/
    > >
    > > === START OF READ SMART DATA SECTION ===
    > > Warning: device does not support Self Test Logging
    > > SMART Self-test log structure revision number 0
    > > Warning: ATA Specification requires self-test log structure revision
    > > number = 1
    > > No self-tests have been logged. [To run self-tests, use: smartctl -t]


    > > # smartctl -l selftest /dev/hda
    > > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > > Allen
    > > Home page is http://smartmontools.sourceforge.net/


    > > === START OF READ SMART DATA SECTION ===
    > > SMART Self-test log structure revision number 1
    > > Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    > > # 1  Extended offline  Completed without error  00%        18951            -
    > > # 2  Extended offline  Completed without error  00%        18674            -
    > > # 3  Extended offline  Completed without error  00%        15957            -
    > > # 4  Extended offline  Completed without error  00%        14448            -


    > That's all there is.


    Ah. Yeah, I was expecting a fancy technical listing.


    > > # smartctl -l error /dev/hda
    > > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > > Allen
    > > Home page is http://smartmontools.sourceforge.net/
    > >
    > > === START OF READ SMART DATA SECTION ===
    > > SMART Error Log Version: 1
    > > No Errors Logged
    > >
    > > --
    > >
    > > # smartctl -t long /dev/hda
    > > smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce
    > > Allen
    > > Home page is http://smartmontools.sourceforge.net/
    > >
    > > === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    > > Sending command: "Execute SMART Extended self-test routine immediately
    > > in off-line mode".
    > > Drive command "Execute SMART Extended self-test routine immediately in
    > > off-line mode" successful.
    > > Testing has begun.
    > > Please wait 58 minutes for test to complete.
    > > Test will complete after Wed Aug 13 19:43:52 2008


    > > Use smartctl -X to abort test.


    > > Do I just run "smartctl -a /dev/hda" to see the results after an hour?


    > No, what you have above is all there is.


    OK cool!


    > >> Also I'd physically disconnect the drive that seems to be failing,
    > >> hdb, not umount, pull the IDE cable. As long as the cable is plugged
    > >> in and it's powered up it's on the bus.

    > >
    > > OK. I will do that later (still at work). So far no new dmesg
    > > messages about HDDs.


    > Bet it's just an intermittent fault and you won't get anything useful until it returns.


    Yep. I did get that error that SMART status couldn't be read last night
    when all hell broke loose.
    --
    "She's got ants in her pants." --unknown
    /\___/\
    / /\ /\ \ Ant @ http://antfarm.home.dhs.org (Personal Web Site)
    | |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net (down)
    \ _ / Please remove ANT if replying by e-mail.
    ( )

  16. Re: Bad disk controller? Dying HDD?

    In comp.sys.ibm.pc.hardware.storage Ant wrote:
    >> >> > Nothing for hda (good). If this gets a lot worse, I am going to shutdown the PC.
    >> >
    >> >> > I have a question, where do smartctl's test results go?
    >> >
    >> >> Nowhere, they come from the drive. You can pipe them wherever you want in the usual way.
    >> >
    >> > Are you saying when I run these long tests, smartctl -a are the results
    >> > from them? If so, then I was expecting something. Heh.
    >> >


    >> Smartctl is displaying the results. You might also display the logs,
    >> assuming your drives support them:


    >> smartctl -l TYPE /dev/hd?


    >> From smartctl -h
    >> -l TYPE, --log=TYPE
    >> Show device log. TYPE: error, selftest, selective, directory,
    >> background, scttemp[sts,hist]


    > It doesn't look like there is logging for hdb. Is this logging kept in
    > the HDD's SMART memory or something? Is this the same information I see
    > when I run the smartctl command, or something different?


    smartctl -a shows the log if there is one. Your hdb
    probably does not have one, because it is too old.
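
A rough way to check this from the shell is to look for the "does not support ... Logging" warning that smartctl printed for hdb earlier in the thread. The sketch below assumes that wording (smartctl 5.38; other versions may phrase it differently), and `log_supported` is a made-up helper name, not part of smartmontools.

```shell
# Classify smartctl log output read from stdin.
# Assumption: an unsupported log produces a line such as
#   "Warning: device does not support Error Logging"
# as seen in this thread; anything else is treated as supported.
log_supported() {
    if grep -q 'does not support .* Logging'; then
        echo "unsupported"
    else
        echo "supported"
    fi
}

# Usage on a real system (device names are examples):
#   smartctl -l error    /dev/hdb | log_supported
#   smartctl -l selftest /dev/hdb | log_supported
```

On a drive that reports "supported" for the self-test log, a finished long test should show up as a new row in the `smartctl -l selftest` table.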

    Arno

  17. Re: Bad disk controller? Dying HDD?

    In comp.sys.ibm.pc.hardware.storage Ant wrote:
    >> >> In comp.sys.ibm.pc.hardware.storage Ant wrote:
    >> >>> Hi. My old mini-tower PC's HDDs are acting crazy today. I came home and
    >> >>> noticed my second drive wasn't there so I checked dmesg (deleted normal
    >> >>> logs before hdb):
    >> >>
    >> >> From the SMART status, the Seagate (first drive) has read and
    >> >> seek errors, that could well be connected. The Quantum looks
    >> >> fine in SMART, but it is older (i.e. less reliable SMART
    >> >> implementation) and has nasty SectorIDNotFound errors in the
    >> >> log.


    >> > Thanks for the clarifications.



    >> >> So, what is going on indeed. I think either hdb is dying
    >> >> and messing with the bus as it does so, or you have a different
    >> >> problem. Attributes 1 and 7 on hda do not look good either.
    >> >> The errors in the log for hda could indeed be a controller
    >> >> issue, but neither the errors for hdb nor the SMART
    >> >> attributes 1 and 7 for hda could be. I would suspect that your PSU
    >> >> is going south and that unclean power is causing the problem.


    >> > Darn it, this Fortron FSP650-80GLC PSU (650 watts) isn't that old
    >> > either. Grr.


    >> I had one of these (of one) die in a server after 1 year. I use
    >> Enermax now.


    > Wow. I recall Fortron wasn't bad. I had generic, Antec, Enlight, and
    > SeaSonic brands go bad on me after a few years.


    Fortron is reasonable. Antec is pretty bad in a pretty package.
    SeaSonic is reasonable for lower powers. Enermax, however, is
    very good and has large reserves. For example one Enermax 550W
    I have is sold as 750W by another "quality" brand with a different
    case and likely adjusted protection circuitry. The only thing
    with Enermax used to be that they were loud. Depending on the model
    they have solved this issue.

    BTW, my impressions about the different quality levels are from
    looking at the electronics and components used, and analysing
    failures. While I do not have a larger sample, better components
    and larger safety margins always pay off in increased reliability.

    Arno

  18. Re: Bad disk controller? Dying HDD?

    On 14 Aug 2008 00:36:19 GMT, Arno Wagner put finger
    to keyboard and composed:

    >> High raw "seek error rate" and "read error rate" figures for Seagate
    >> drives appear to be normal. By my reckoning, the "seek error rate"
    >> parameter appears to be a seek count, not an error, and not a rate.

    >
    >Agreed for the raw value. I am actually concerned about the cooked
    >values. They are a bit low, which may or may not indicate a problem.
    >If removing hdb (SectorIdNotFound is a killer...) solves the
    >issue, I would say that hda is likely fine but should be watched.
    >
    >
    >Arno


    Sorry, I should have looked more closely. It would appear then that
    the raw value and the normalised values are not directly related. The
    former is monotonically increasing according to my testing, but
    clearly the normalised values show a worst case figure which is less
    than the current value. I wish Seagate would come clean and release
    their SMART documentation, if only to dispel people's concerns over
    the large raw "error" numbers.
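
To compare the raw and normalised figures side by side, something like the following can pull attributes 1 and 7 out of the attribute table. This is only a sketch: it assumes the usual `smartctl -A` column layout (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value), and `show_attrs_1_7` is a hypothetical helper name.

```shell
# Print normalised value, worst, threshold and raw value for SMART
# attributes 1 (read error rate) and 7 (seek error rate).
# Reads "smartctl -A" style output on stdin. The column positions are an
# assumption based on the common table layout, not a guaranteed format.
show_attrs_1_7() {
    awk '$1 == 1 || $1 == 7 {
        print $2, "value=" $4, "worst=" $5, "thresh=" $6, "raw=" $10
    }'
}

# Usage on a real system (device name is an example):
#   smartctl -A /dev/hda | show_attrs_1_7
```

The columns to watch are the normalised value and the worst value against the threshold: smartctl treats an attribute as failing once the normalised value drops to the threshold or below, whatever the raw number says.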

    - Franc Zabkar
    --
    Please remove one 'i' from my address when replying by email.

  19. Re: Bad disk controller? Dying HDD?

    On Wed, 13 Aug 2008 13:28:20 -0500, Ant rearranged some electrons to say:

    > In comp.sys.ibm.pc.hardware.storage Trevor Hemsley
    > wrote:
    >> On Wed, 13 Aug 2008 13:57:01 UTC in comp.os.linux.hardware, Ant
    >> wrote:

    >
    >> > Darn it, this Fortron FSP650-80GLC PSU (650 watts) isn't that old
    >> > either. Grr.

    >
    >> I've never heard of Fortron but that doesn't mean that they aren't any
    >> good. What I have seen before is IDE devices on the same cable
    >> interfering with each other when one of them is going south. You could
    >> try removing the 2nd device from the primary cable and attaching it to
    >> another one (if there is one) and see if that eliminates the errors on
    >> one or both of the drives.

    >
    > Is unmounting hdb via umount command valid?


    umount doesn't normally power down the drive, it only unmounts the file
    system.
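
A small sketch of that distinction: the helper below only tells you whether a file system on the disk is still mounted, which is all that umount changes. `is_mounted` is a made-up name, and the second argument (an alternative mounts table) exists only so the function can be tried without touching real disks.

```shell
# Report whether any partition of a disk appears in a mounts table.
# $1: device prefix, e.g. /dev/hdb
# $2: file in /proc/mounts format (defaults to /proc/mounts)
is_mounted() {
    dev="$1"
    table="${2:-/proc/mounts}"
    if grep -q "^$dev" "$table"; then
        echo "mounted"
    else
        echo "not mounted"
    fi
}

# Usage on a real system (device name is an example):
#   umount /dev/hdb1
#   is_mounted /dev/hdb        # expected to report "not mounted"
#   smartctl -i /dev/hdb       # the drive should still answer
```

If the drive still answers after the umount, it is still powered and on the cable, so it can still disturb the other device on that channel.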

  20. Re: Bad disk controller? Dying HDD?

    On Aug 14, 7:01 am, Arno Wagner wrote:
    > In comp.sys.ibm.pc.hardware.storage Franc Zabkar wrote:
    >
    >
    >
    > > On 14 Aug 2008 00:36:19 GMT, Arno Wagner put finger
    > > to keyboard and composed:
    > >>> High raw "seek error rate" and "read error rate" figures for Seagate
    > >>> drives appear to be normal. By my reckoning, the "seek error rate"
    > >>> parameter appears to be a seek count, not an error, and not a rate.

    >
    > >>Agreed for the raw value. I am actually concerned about the cooked
    > >>values. They are a bit low, which may or may not indicate a problem.
    > >>If removing hdb (SectorIdNotFound is a killer...) solves the
    > >>issue, I would say that hda is likely fine but should be watched.

    >
    > >>Arno

    > > Sorry, I should have looked more closely. It would appear then that
    > > the raw value and the normalised values are not directly related. The
    > > former is monotonically increasing according to my testing, but
    > > clearly the normalised values show a worst case figure which is less
    > > than the current value. I wish Seagate would come clean and release
    > > their SMART documentation, if only to dispel people's concerns over
    > > the large raw "error" numbers.

    >
    > And agreed again. At least there is a cooked value that has
    > somewhat constant and known semantics.
    >
    > Arno


    I don't know if this has been mentioned before, but a good insurance
    policy is to have two drives and replace the older one every three
    years. Reinstall the OS etc. on the new drive; the remaining old
    drive is then used for backup and experimentation. Prorated over a
    three-year span, it is not that expensive. In the present instance
    I would buy a new drive immediately.

    John Culleton
