IDing failed disk drive in Origin 200 server - SGI

  1. IDing failed disk drive in Origin 200 server

    First off, I'm a Linux/Solaris guy so please don't hate me for asking
    a basic question, but I have a failed disk in an O200 server and I
    need to figure out which one. The power light is blinking orange on
    the front of the unit. We have several external RAID units attached to
    the various SCSI busses, but I don't think those are the culprit.

    The SYSLOG messages are:

    Feb 2 22:35:04 6A:superior unix: dksc3d1s7: <6> retries exhausted
    Feb 2 22:39:04 6A:superior unix: dksc3d1s7: <6>[Alert] Media error<6>: (asc=0x0, asq=0x0)<6>, Block #-4096<6> (0)<6>
    Feb 2 22:39:04 6A:superior unix: dksc3d1s7: <6> retrying request
    Feb 2 22:39:04 6A:superior unix: dksc3d1s7: <6>[Alert] Media error<6>: (asc=0x0, asq=0x0)<6>, Block #-4096<6> (0)<6>
    Feb 2 22:39:04 6A:superior unix: dksc3d1s7: <6> retrying request
    Feb 2 22:39:04 6A:superior unix: dksc3d1s7: <6>[Alert] Media error<6>: (asc=0x0, asq=0x0)<6>, Block #-4096<6> (0)<6>
    Feb 2 22:39:04 6A:superior unix: dksc3d1s7: <6> retrying request
    Feb 2 22:39:04 6A:superior unix: dksc3d1s7: <6>[Alert] Media error<6>: (asc=0x0, asq=0x0)<6>, Block #-4096<6> (0)<6>

    There are tons of those.

    We then have this stuff:

    Feb 5 08:13:10 4A:superior unix: WARNING: ARP: got MAC address on eg for BCAST IP address 0.0.0.0
    Feb 5 15:08:11 6A:superior unix: ql3d1: <6>Selection timeout
    Feb 5 15:08:11 6A:superior unix: dksc3d1s7: <6>SCSI driver error: device does not respond to selection
    Feb 5 15:08:11 1A:superior unix: ALERT: I/O Error Detected. Shutting down filesystem: /mnts/iraid/
    Feb 5 15:08:11 1A:superior unix: ALERT: Please umount the filesystem, and rectify the problem(s)


    One of the external RAID units had a power-supply failure and lost
    power. I'm pretty positive that it was mounted under /mnts/iraid. I'm
    *not* sure if that is the dksc3d1s7 device though because I fixed all
    those issues and rebooted and I still have the orange light on the
    front of the server.


    Here are more boot messages

    Feb 5 15:18:16 5A:superior unix: NOTICE: Start mounting filesystem: /
    Feb 5 15:18:16 5A:superior unix: NOTICE: Starting XFS recovery on filesystem: / (dev: 0/226)
    Feb 5 15:18:16 5A:superior unix: NOTICE: Ending XFS recovery for filesystem: / (/hw/module/1/slot/MotherBoard/node/xtalk/8/pci/0/scsi_ctlr/0/target/1/lun/0/disk/partition/0/block)
    Feb 5 15:18:16 5A:superior unix: NOTICE: Start mounting filesystem: /mnts/int_stripe/
    Feb 5 15:18:16 5A:superior unix: NOTICE: Ending clean XFS mount for filesystem: /mnts/int_stripe/
    Feb 5 15:18:16 5A:superior unix: NOTICE: Start mounting filesystem: /mnts/ciprico/
    Feb 5 15:18:16 5A:superior unix: NOTICE: Ending clean XFS mount for filesystem: /mnts/ciprico/
    Feb 5 15:18:16 5A:superior unix: NOTICE: Start mounting filesystem: /mnts/iraid/
    Feb 5 15:18:16 5A:superior unix: NOTICE: Starting XFS recovery on filesystem: /mnts/iraid/ (dev: 0/259)
    Feb 5 15:18:16 5A:superior unix: NOTICE: Ending XFS recovery for filesystem: /mnts/iraid/ (/hw/module/1/slot/MotherBoard/node/xtalk/8/pci/6/scsi_ctlr/0/target/1/lun/0/disk/partition/7/block)


    Anyone have any clue where to begin?

    Thanks,
    Jeff

  2. Re: IDing failed disk drive in Origin 200 server

    Controller 3, disk 1, partition 7 (it's an option drive if it's
    partition 7). Lucky lucky guy, 'tis not your root drive :-)
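
    One cross-check that can be done from the logs alone: the boot
    messages in the original post give the hardware-graph path that
    /mnts/iraid was mounted from, and its target and partition fields can
    be compared with the "d1" and "s7" in dksc3d1s7. The sketch below is
    only an illustration in Python; the regex and the function name
    hwgraph_fields are made up for this example, and the path is the one
    from the log with the line wrap removed. The logs don't show which
    hwgraph path controller 3 maps to, so this only checks the two fields
    that can be compared.

```python
import re

# Pulls the numeric fields out of a hwgraph disk path as printed in the
# XFS mount messages. The pattern is an assumption based on the two paths
# shown in the boot log, not on any IRIX documentation.
HWGRAPH = re.compile(
    r"/pci/(\d+)/scsi_ctlr/(\d+)/target/(\d+)/lun/(\d+)/disk/partition/(\d+)/"
)


def hwgraph_fields(path):
    """Return the pci, scsi_ctlr, target, lun and partition numbers."""
    m = HWGRAPH.search(path)
    if m is None:
        raise ValueError("not a recognised hwgraph disk path: %r" % path)
    names = ("pci", "scsi_ctlr", "target", "lun", "partition")
    return dict(zip(names, (int(g) for g in m.groups())))


# Paths copied from the boot messages (line wraps rejoined).
IRAID_PATH = ("/hw/module/1/slot/MotherBoard/node/xtalk/8/pci/6/scsi_ctlr/0/"
              "target/1/lun/0/disk/partition/7/block")
ROOT_PATH = ("/hw/module/1/slot/MotherBoard/node/xtalk/8/pci/0/scsi_ctlr/0/"
             "target/1/lun/0/disk/partition/0/block")

if __name__ == "__main__":
    iraid = hwgraph_fields(IRAID_PATH)
    # dksc3d1s7 means disk (target) 1, partition 7.
    matches = iraid["target"] == 1 and iraid["partition"] == 7
    print("/mnts/iraid target/partition agree with dksc3d1s7:", matches)
```

    The root filesystem's path has partition 0, so it can't be the s7
    device named in the errors; /mnts/iraid is at least consistent with it.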

    To find the broken disk, identify the others by using the disk
    activity lights.

    cd to the mount point of one disk and run find . in that directory.
    The disk whose light is flashing is that drive. Run mount and it will
    give you the disk ID.
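
    If you'd rather script the same trick, here is a rough Python
    equivalent of running find . in a mount point. It assumes a Python 3
    interpreter is available on the box, which may well not be the case
    on IRIX; the function name and the max_entries cap are invented for
    this sketch. All it does is walk the directory tree so the drive has
    to read directory data, which is what makes the activity light blink.

```python
import os


def walk_mount_point(mount_point, max_entries=100000):
    """Walk the tree under mount_point, much like `find .` would.

    Returns the number of directory entries seen. The walk stops early
    once max_entries have been counted so it doesn't run forever on a
    huge filesystem.
    """
    seen = 0
    for _dirpath, dirnames, filenames in os.walk(mount_point):
        seen += len(dirnames) + len(filenames)
        if seen >= max_entries:
            break
    return seen
```

    For example, walk_mount_point("/mnts/ciprico") while watching the
    drive lights. Bear in mind that cached directory data may not cause
    any disk activity, so a second run right after the first can be
    quieter.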

    E.g. /dev/dsk/dks1d4s7 is controller 1, disk 4, partition 7.
    Controller 0 is the first controller and will be the internal
    controller that the system disk is on (by default, unless you've
    installed the system in a non-standard config).
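
    The naming scheme above is regular enough to decode mechanically. A
    minimal sketch in Python, assuming only the dks<controller>d<disk>s<partition>
    form described here plus the "dksc" spelling that shows up in the
    SYSLOG lines; the function name parse_dks is made up for this
    example, and other device name variants are deliberately not handled.

```python
import re

# dks1d4s7 -> controller 1, disk 4, partition 7. The optional "c" covers
# the "dksc3d1s7" form printed in the kernel messages.
DKS_NAME = re.compile(r"^dksc?(\d+)d(\d+)s(\d+)$")


def parse_dks(name):
    """Split a dks device name (with or without a /dev/dsk/ prefix)."""
    base = name.rsplit("/", 1)[-1]
    m = DKS_NAME.match(base)
    if m is None:
        raise ValueError("not a dks<ctlr>d<disk>s<part> name: %r" % name)
    controller, disk, partition = (int(g) for g in m.groups())
    return {"controller": controller, "disk": disk, "partition": partition}


if __name__ == "__main__":
    for dev in ("/dev/dsk/dks1d4s7", "dksc3d1s7"):
        print(dev, parse_dks(dev))
```

    Feeding it the device names from the mount output gives the
    controller and disk number to look for on each bus.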

    Well, hope this helps.

    l8erz

    *********************
    Khalid Schofield
    System Administrator / EM Technician
    Dept. Of Materials
    University Of Oxford
    Parks Road
    Oxford
    OX1 3PH

    Email: khalid.schofield@materials.ox.ac.uk
    Tel: 01865 273785
    Fax: 01865 283333
    Web: http://www-em.materials.ox.ac.uk/peo...eld/index.html


    On Thu, 5 Feb 2004, Jeff wrote:

    > [snip]

