Harddisk failures & logging? - SUN

  1. Harddisk failures & logging?

    Our small network (running Solaris 8) has recently had 2 harddisks
    fail, shortly after turning on logging on several of the partitions on
    the drives. Could this be related?


    Both drives failed with "Media Errors" on the same slice - 3 (/var).
    On both drives the / partition slice 0 also became non-writeable.


    We also saw a complaint on both systems that it could not "roll" the
    log file on the / partition???


    All attempts to repair the disks fail. fsck on the / partition runs
    quite quickly but fails, complaining that it cannot write to the
    partition. format (format or analyze) on the drives will not work
    because of the same problem: it cannot write to the drive.


    Any ideas? Anything other than fsck available to do some better
    low-level checks of the disks? Or allow the disks to become writeable
    again?


    Thanks,
    Mike


  2. Re: Harddisk failures & logging?

    On 4 Jan 2006 19:17:42 -0800, "Mike" wrote:

    >Our small network (running Solaris 8) has recently had 2 harddisks
    >fail, shortly after turning on logging on several of the partitions on
    >the drives. Could this be related?
    >
    >
    >Both drives failed with "Media Errors" on the same slice - 3 (/var).
    >On both drives the / partition slice 0 also became non-writeable.
    >
    >
    >We also saw a complaint on both systems that it could not "roll" the
    >log file on the / partition???
    >
    >
    >All attempts to repair the disks fails. fsck on the / partition runs
    >quite quickly but fails complaining that it can not write to the
    >partition. format (format or analyze) on the drives will not work
    >because of the same problem - can not write to the drive.
    >
    >
    >Any ideas? Anything other than fsck available to do some better
    >low-level checks of the disks? Or allow the disks to become writeable
    >again?
    >
    >
    >Thanks,
    > Mike


    The specifics are going to depend on whether you have ATA or SCSI/FC
    disks, so I'll be generic in my response.

    Most likely you have a scenario where the disks' spare (reserved)
    sector area is full. If that is the case, the disks shouldn't be used
    any more, because you are getting data corruption every time you get a
    media error.

    fsck is more of a file system test, it isn't a disk diagnostic.

    A safe way to find out how bad off you are is to use dd, as in
    dd if=/dev/pathofquestionabledisk of=/dev/null

    This will read the entire disk, or at least up to the point where you
    hit another error, and put good info in your log.
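
    As a rough sketch of the dd read test above, wrapped in a small
    function: the function name scan_disk and the device name in the
    example are hypothetical, and the 1 MB block size is only an assumption
    to speed up the read. Nothing is written to the disk.

```shell
# Read-only scan of a disk: copy every block to /dev/null.
# dd stops at the first unreadable block and exits non-zero.
scan_disk() {
    disk="$1"
    # bs=1024k reads 1 MB at a time; the default 512-byte block is very slow.
    if dd if="$disk" of=/dev/null bs=1024k; then
        echo "read completed without errors: $disk"
    else
        echo "read error on $disk; check the system log for details" >&2
        return 1
    fi
}

# Example (hypothetical device name; substitute the suspect disk):
# scan_disk /dev/rdsk/c1t1d0s2
```

    The record counts dd prints when it stops give an approximate position
    of the first bad block.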

    If you want to run data validation tests, disk drive self-tests,
    etc., note that some of them are constrained because one of the disks
    is a boot disk, which you can't unmount.

    One software package you could look at ($90.00) is
    http://www.santools.com/smart/unix/manual. There is a Solaris version,
    but do your own research to see if it is appropriate for you.



  3. Re: Harddisk failures & logging?

    Mike wrote:
    > Our small network (running Solaris 8) has recently had 2 harddisks
    > fail, shortly after turning on logging on several of the partitions on
    > the drives. Could this be related?


    Seems odd.

    > Both drives failed with "Media Errors" on the same slice - 3 (/var).
    > On both drives the / partition slice 0 also became non-writeable.


    Could this be a cable problem? Termination issue?

    > We also saw a complaint on both systems that it could not "roll" the
    > log file on the / partition???


    No idea. You could try SunVTS.

    >
    > All attempts to repair the disks fails. fsck on the / partition runs
    > quite quickly but fails complaining that it can not write to the
    > partition. format (format or analyze) on the drives will not work
    > because of the same problem - can not write to the drive.


    I think it is impossible on late versions of Solaris to run fsck on a
    mounted file system, and even if it can be done, it would not be too
    sensible.

    Why not boot from CD, then run fsck on the disk? That way the disk will
    be writeable, since it is not mounted.

    --
    Dave K

    http://www.southminster-branch-line.org.uk/

    Please note my email address changes periodically to avoid spam.
    It is always of the form: month-year@domain. Hitting reply will work
    for a couple of months only. Later set it manually. The month is
    always written in 3 letters (e.g. Jan, not January etc)

  4. Re: Harddisk failures & logging?

    Mike wrote:
    >
    > Any ideas? Anything other than fsck available to do some better
    > low-level checks of the disks? Or allow the disks to become writeable
    > again?


    Try smartmontools. You can run diagnostics on the disks and read
    their internal error logs.

    http://sourceforge.net/projects/smartmontools
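
    A minimal sketch of what that might look like with smartctl, the
    command-line tool from smartmontools. The function name smart_report
    and the device name in the example are hypothetical, and depending on
    the disk type you may need to add a -d option (for example -d scsi).

```shell
# Query a disk's health status and internal logs with smartctl.
smart_report() {
    disk="$1"
    if ! command -v smartctl >/dev/null 2>&1; then
        echo "smartctl not found; install smartmontools first" >&2
        return 127
    fi
    smartctl -H "$disk"           # overall health assessment
    smartctl -l error "$disk"     # the drive's internal error log
    smartctl -l selftest "$disk"  # results of earlier self-tests
}

# Example (hypothetical device name; substitute the suspect disk):
# smart_report /dev/rdsk/c1t1d0s2
```

    A self-test can be started with smartctl -t short on the same device,
    and its result then shows up in the self-test log.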

    Regards,

    David Mathog
    mathog@caltech.edu

  5. Re: Harddisk failures & logging?

    Systems are Blade 2000s with 73GB fiber channel drives.

    The systems have dual drives and we perform disk-to-disk backups
    frequently. So we were able to boot off the second drive to try to work
    on the primary drives.

    Even when running fsck and format against the primary drive from the
    backup drive, all attempts to troubleshoot or analyze the drives fail.

    Guess we will look at trying the dd and smartmontools options to see if
    we can check into them a little more.


  6. Re: Harddisk failures & logging?

    The systems are Blade 2000s and we are using 73GB fiber channel drives.

    We have all systems with dual drives and do frequent disk-to-disk
    backups. So we were able to boot off the backup disk with no problems.
    Then from the backup disk we tried both fsck and format commands on the
    unmounted partitions. Both failed. Even with the drives unmounted they
    were still marked as non-writeable.

    Will try the dd and smartmontools to see if they will give us more
    info.

    Also curious, how/where can I look at the logs?


    Mike

