Harddisk failures & logging? - SUN
-
Harddisk failures & logging?
Our small network (running Solaris 8) has recently had 2 hard disks
fail, shortly after turning on logging on several of the partitions on
the drives. Could this be related?
Both drives failed with "Media Errors" on the same slice, 3 (/var).
On both drives the / partition (slice 0) also became non-writeable.
We also saw a complaint on both systems that it could not "roll" the
log file on the / partition???
All attempts to repair the disks fail. fsck on the / partition runs
quite quickly but fails, complaining that it cannot write to the
partition. format (its format or analyze commands) will not work on
the drives because of the same problem: it cannot write to the drive.
Any ideas? Anything other than fsck available to do some better
low-level checks of the disks? Or allow the disks to become writeable
again?
Thanks,
Mike
-
Re: Harddisk failures & logging?
On 4 Jan 2006 19:17:42 -0800, "Mike" wrote:
>Our small network (running Solaris 8) has recently had 2 harddisks
>fail, shortly after turning on logging on several of the partitions on
>the drives. Could this be related?
>
>Both drives failed with "Media Errors" on the same slice - 3 (/var).
>On both drives the / partition slice 0 also became non-writeable.
>
>We also saw a complaint on both systems that it could not "roll" the
>log file on the / partition???
>
>All attempts to repair the disks fails. fsck on the / partition runs
>quite quickly but fails complaining that it can not write to the
>partition. format (format or analyze) on the drives will not work
>because of the same problem - can not write to the drive.
>
>Any ideas? Anything other than fsck available to do some better
>low-level checks of the disks? Or allow the disks to become writeable
>again?
>
>Thanks,
> Mike
The specifics are going to depend on whether you have ATA or SCSI/FC
disks, so I'll be generic in my response.
Most likely you have a scenario where the disks' spare (reserved)
sector area is full. If that is the case the disks shouldn't be used
any more, because you are getting data corruption every time you get a
media error.
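To make the spare-sector point concrete, here is a toy model in Python. It is only an illustration under simplifying assumptions: the class and method names are hypothetical, and real drive firmware is far more involved. The idea it shows is that a drive quietly relocates bad sectors into a fixed pool of spares, and once that pool is used up, new bad sectors surface to the host as media errors.

```python
class ToyDisk:
    """Hypothetical toy model of a drive's spare (reserved) sector pool."""

    def __init__(self, spare_sectors: int) -> None:
        self.spares_left = spare_sectors
        self.remapped = {}  # bad sector number -> index of the spare used
        self.next_spare = 0

    def report_bad_sector(self, lba: int) -> str:
        """Simulate the drive discovering that sector `lba` has gone bad."""
        if lba in self.remapped:
            return "already remapped"
        if self.spares_left > 0:
            # Spare available: relocate silently, the host never notices.
            self.remapped[lba] = self.next_spare
            self.next_spare += 1
            self.spares_left -= 1
            return "remapped"
        # Pool exhausted: the failure is now passed up as a media error.
        return "media error"


disk = ToyDisk(spare_sectors=2)
print([disk.report_bad_sector(n) for n in (10, 20, 30)])
```

With only two spares, the third bad sector cannot be relocated and is reported as a media error, which is the situation the reply suspects.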
fsck is a file system check; it isn't a disk diagnostic.
A safe thing to do, which will tell you how bad off you are, is to use
dd, as in: dd if=/dev/pathofquestionabledisk of=/dev/null
This will read the entire disk, or at least up to the point where you hit
another error, and put good info in your log.
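A plain dd run stops at the first unreadable block. The following is a minimal Python sketch of the same read-only idea that keeps going past errors and reports where they occurred. It assumes the device can be opened read-only by path, that reads at block-size-aligned offsets are accepted, and that end of media shows up as an empty read; `scan_device` and its parameters are hypothetical names. The consecutive-error cap keeps it from looping forever if the end of the device is reported as an error instead.

```python
import os
from typing import List


def scan_device(path: str, block_size: int = 1 << 20,
                max_consecutive_errors: int = 64) -> List[int]:
    """Read `path` front to back and discard the data, like dd ... of=/dev/null.

    Returns the byte offsets of blocks that could not be read.
    """
    bad_offsets: List[int] = []
    consecutive_errors = 0
    fd = os.open(path, os.O_RDONLY)  # read-only: nothing is ever written
    try:
        offset = 0
        while consecutive_errors < max_consecutive_errors:
            try:
                data = os.pread(fd, block_size, offset)
            except OSError:
                # Unreadable block: note where it was and skip to the next one.
                bad_offsets.append(offset)
                consecutive_errors += 1
                offset += block_size
                continue
            if not data:  # empty read: assumed to mean end of the device
                break
            consecutive_errors = 0
            offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets
```

Usage would be something like `print(scan_device("/dev/pathofquestionabledisk"))`, with the placeholder path replaced by the real raw device. As with dd, each failed read should also be reported by the disk driver to the system log.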
If you want to run some data validation tests, disk drive self-tests,
etc., then some tests are constrained because one of the disks is a
boot disk, which you can't unmount to do those tests.
One software package you could look at ($90.00) is at
http://www.santools.com/smart/unix/manual. There is a Solaris version,
but do your own research to see if it is appropriate for you.
-
Re: Harddisk failures & logging?
Mike wrote:
> Our small network (running Solaris 8) has recently had 2 harddisks
> fail, shortly after turning on logging on several of the partitions on
> the drives. Could this be related?
Seems odd.
> Both drives failed with "Media Errors" on the same slice - 3 (/var).
> On both drives the / partition slice 0 also became non-writeable.
Could this be a cable problem? Termination issue?
> We also saw a complaint on both systems that it could not "roll" the
> log file on the / partition???
No idea. You could try SunVTS.
>
> All attempts to repair the disks fails. fsck on the / partition runs
> quite quickly but fails complaining that it can not write to the
> partition. format (format or analyze) on the drives will not work
> because of the same problem - can not write to the drive.
I think it is impossible on recent versions of Solaris to run fsck on a
mounted file system, and even if it can be done, it would not be very
sensible.
Why not boot from CD, then run fsck on the disk? That way the disk will
be writeable, since it is not mounted.
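Because fsck should only repair an unmounted file system, it can help to check the mount table first. Below is a minimal Python sketch under stated assumptions: it expects the whitespace-separated layout of Solaris `/etc/mnttab` (first field the device, second the mount point), and it assumes the usual Solaris convention that fsck is given the raw `/dev/rdsk/...` name while the mount table lists the block `/dev/dsk/...` name. The function names and the device names in the sample are hypothetical.

```python
from typing import Dict


def mounted_devices(mnttab_text: str) -> Dict[str, str]:
    """Map each mounted device to its mount point (first two fields per line)."""
    mounts: Dict[str, str] = {}
    for line in mnttab_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            mounts[fields[0]] = fields[1]
    return mounts


def safe_to_fsck(device: str, mnttab_text: str) -> bool:
    """True if `device` does not appear in the mount table."""
    # Compare against the block-device name that the mount table uses.
    if device.startswith("/dev/rdsk/"):
        device = "/dev/dsk/" + device[len("/dev/rdsk/"):]
    return device not in mounted_devices(mnttab_text)


sample = "/dev/dsk/c1t0d0s0\t/\tufs\trw,logging\t1136430000\n"
print(safe_to_fsck("/dev/rdsk/c1t1d0s0", sample))  # other disk: not mounted
print(safe_to_fsck("/dev/rdsk/c1t0d0s0", sample))  # mounted as /
```

On a live system the text would come from reading `/etc/mnttab`. Booting from CD, as suggested, avoids the question entirely because nothing on the suspect disk is mounted.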
--
Dave K
http://www.southminster-branch-line.org.uk/
-
Re: Harddisk failures & logging?
Mike wrote:
>
> Any ideas? Anything other than fsck available to do some better
> low-level checks of the disks? Or allow the disks to become writeable
> again?
Try smartmontools. You can run diagnostics on the disks and read
their internal error logs.
http://sourceforge.net/projects/smartmontools
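smartmontools' `smartctl -H <device>` asks the drive for its overall health self-assessment. The following is a small Python sketch that extracts the verdict from that output. It assumes the wording used by smartctl for ATA drives ("SMART overall-health self-assessment test result:") and for SCSI/FC drives ("SMART Health Status:"). That wording may differ between versions, and how well smartctl supports Fibre Channel drives on Solaris 8 is something to check for yourself. `smart_health` is a hypothetical helper name.

```python
from typing import Optional

# Assumed smartctl wording for the health verdict line.
HEALTH_PREFIXES = (
    "SMART overall-health self-assessment test result:",  # ATA drives
    "SMART Health Status:",  # SCSI / FC drives
)


def smart_health(smartctl_output: str) -> Optional[str]:
    """Return the drive's self-reported health verdict, or None if not found."""
    for raw_line in smartctl_output.splitlines():
        line = raw_line.strip()
        for prefix in HEALTH_PREFIXES:
            if line.startswith(prefix):
                return line[len(prefix):].strip()
    return None


print(smart_health("SMART Health Status: OK\n"))
```

A verdict other than PASSED/OK, or no verdict at all, would be a reason to look at the drive's own error log with smartctl's other options.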
Regards,
David Mathog
mathog@caltech.edu
-
Re: Harddisk failures & logging?
The systems are Blade 2000s and we are using 73GB Fibre Channel drives.
All our systems have dual drives and we do frequent disk-to-disk
backups, so we were able to boot off the backup disk with no problems.
Then, from the backup disk, we tried both the fsck and format commands on the
unmounted partitions. Both failed. Even with the drives unmounted they
were still marked as non-writeable.
We will try dd and smartmontools to see if they give us more
info.
Also, I'm curious: how/where can I look at the logs?
Mike
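On Solaris, syslog normally writes kernel disk-driver warnings to `/var/adm/messages`. The following is a minimal Python sketch for pulling the disk-error lines out of such a file. The search strings are assumptions based on the "Media Errors" mentioned in this thread and on typical SCSI driver wording, so adjust them to what your system actually logs. `disk_error_lines` is a hypothetical helper name.

```python
from typing import Iterable, List, Sequence

# Assumed search terms (matched case-insensitively); adjust as needed.
DEFAULT_PATTERNS = ("media error", "error for command")


def disk_error_lines(lines: Iterable[str],
                     patterns: Sequence[str] = DEFAULT_PATTERNS) -> List[str]:
    """Return the log lines that mention any of the given patterns."""
    hits: List[str] = []
    for line in lines:
        lowered = line.lower()
        if any(p in lowered for p in patterns):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    with open("/var/adm/messages") as log:
        for hit in disk_error_lines(log):
            print(hit)
```

Older entries may have been rotated into separate files alongside `messages`, so those are worth scanning too.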