Disks in error state - Veritas Volume Manager

Thread: Disks in error state

  1. Disks in error state


    Hi!

    We have a problem with Veritas 4.1 on Solaris 9. Our SAN environment consists
    of two IBM FastT900 arrays situated at two physically separated locations. All
    LUNs are mirrored, with one mirror on each FastT, for redundancy if one FastT fails.

    We use Qlogic HBA cards to connect the Solaris boxes to the SAN.

    Over the past week we rebuilt our SAN environment, one FastT at a time. The
    first one went smoothly: we disconnected it and DMP took care of everything,
    letting the other FastT take over.

    We upgraded the firmware, and when the FastT was back online we re-created
    the LUNs and extended some of them while we were at it (planning to do the
    same on the other FastT). Then we used VEA to replace the "failed" disks with
    the newly re-created ones.

    On this FastT we use LUNs 1-26. Previously we used the same LUN numbers on
    the other FastT as well, but now we changed them from 1-26 to 50-76 so that
    it is easier to tell which LUNs are on which FastT.

    The problem is that, after we fixed the other FastT and renumbered its LUNs
    from 1-26 to 50-76, all the disks on it remain in the error state.


    This shows the LUNs on the FastT we upgraded first:
    c7t4d0s2 auto:cdsdisk - - online
    c7t4d1s2 auto:cdsdisk - - online
    c7t4d2s2 auto:cdsdisk - - online
    c7t4d3s2 auto:cdsdisk - - online
    c7t4d4s2 auto:cdsdisk - - online

    This shows the LUNs on the FastT we upgraded last, the one giving us trouble:
    c7t2d50s2 auto - - error
    c7t2d51s2 auto - - error
    c7t2d52s2 auto - - error
    c7t2d53s2 auto - - error
    c7t2d54s2 auto - - error

    c7t4d0 and c7t2d50 are (or are supposed to be) a mirror pair, as are c7t4d1
    and c7t2d51, and so on.
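The pairing rule above (c7t4dN is meant to mirror c7t2d(N+50)) can be checked mechanically against the disk listing. The Python sketch below is illustrative only: it assumes the whitespace-separated DEVICE TYPE DISK GROUP STATUS column layout of `vxdisk list` seen in the listings above, and the names `parse_status`, `broken_pairs`, `FIRST`, `SECOND` and `OFFSET` are hypothetical, with the values taken from this post.

```python
import re

# Lines copied from the listings in the post
# (assumed columns: DEVICE TYPE DISK GROUP STATUS).
LISTING = """\
c7t4d0s2 auto:cdsdisk - - online
c7t4d1s2 auto:cdsdisk - - online
c7t4d2s2 auto:cdsdisk - - online
c7t4d3s2 auto:cdsdisk - - online
c7t4d4s2 auto:cdsdisk - - online
c7t2d50s2 auto - - error
c7t2d51s2 auto - - error
c7t2d52s2 auto - - error
c7t2d53s2 auto - - error
c7t2d54s2 auto - - error
"""

# Solaris-style device names such as c7t2d50s2: captures "c7t2" and 50.
DEVICE_RE = re.compile(r"^(c\d+t\d+)d(\d+)s\d+$")

FIRST = "c7t4"   # FastT upgraded first (hypothetical constant, value from the post)
SECOND = "c7t2"  # FastT upgraded last, renumbered to 50-76
OFFSET = 50      # c7t4dN is meant to mirror c7t2d(N + 50)


def parse_status(listing):
    """Map (controller+target, device number) to the STATUS text of each disk."""
    status = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or malformed line
        match = DEVICE_RE.match(fields[0])
        if match:  # also skips a DEVICE header line, if present
            status[(match.group(1), int(match.group(2)))] = " ".join(fields[4:])
    return status


def broken_pairs(status):
    """List mirror pairs where either side is not plain 'online'."""
    pairs = []
    for (target, lun), state in sorted(status.items()):
        if target != FIRST:
            continue
        partner = f"{SECOND}d{lun + OFFSET}"
        partner_state = status.get((SECOND, lun + OFFSET), "missing")
        if state != "online" or partner_state != "online":
            pairs.append((f"{FIRST}d{lun}", partner, state, partner_state))
    return pairs


if __name__ == "__main__":
    for first, second, s1, s2 in broken_pairs(parse_status(LISTING)):
        print(f"{first} ({s1}) <-> {second} ({s2})")
```

With the listing from the post it reports all five pairs, each with the c7t2 side in error. It only walks pairs from the c7t4 side, so a c7t2 device with no c7t4 partner would not be reported.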

    I've done the following:
    a) Asked Solaris to clear old disk devices (devfsadm -C -c disk)
    b) Got the HBA card to rediscover LUNs (/opt/JNIC146x/jnic146x_update_drv -a -r)
    c) Updated disk devices on Solaris (devfsadm -c disk)
    d) Labeled the new disks using 'format'
    e) Ran vxdctl to make vxconfigd aware of the newly added disks/LUNs
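For repeatability, the non-interactive steps above can be scripted in order. This is a minimal Python sketch, not the poster's procedure: step d) (`format`) is interactive and is left out, and `vxdctl enable` is an assumption, since the post only says "ran vxdctl" (`enable` is the usual subcommand that makes vxconfigd rescan devices). The final `vxdisk list` is added as a check. `RESCAN_STEPS` and `rescan` are hypothetical names; the default is a dry run that only prints.

```python
import shlex
import subprocess

# Non-interactive steps from the post, in order. Step d) ('format') is
# interactive and omitted. "vxdctl enable" is an assumption: the post only
# says "ran vxdctl".
RESCAN_STEPS = [
    ["devfsadm", "-C", "-c", "disk"],                   # a) clean up stale disk links
    ["/opt/JNIC146x/jnic146x_update_drv", "-a", "-r"],  # b) JNI HBA LUN rediscovery
    ["devfsadm", "-c", "disk"],                         # c) create links for new LUNs
    ["vxdctl", "enable"],                               # e) make vxconfigd rescan (assumed)
    ["vxdisk", "list"],                                 # check the resulting disk states
]


def rescan(dry_run=True):
    """Print (dry run) or execute each step, stopping at the first failure."""
    executed = []
    for cmd in RESCAN_STEPS:
        line = shlex.join(cmd)
        executed.append(line)
        if dry_run:
            print("would run:", line)
        else:
            # check=True raises CalledProcessError, which stops the sequence.
            subprocess.run(cmd, check=True)
    return executed


if __name__ == "__main__":
    rescan(dry_run=True)
```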

    I am able to format the disks, so the OS is aware of them, and Veritas
    should be too. Why do they remain in the error state?
    I've tried rebooting the Solaris box; the only result was that the old LUNs
    (1-26) were removed. The disks remain in the same state.

    I understand that it's hard to help because there are lots of combinations
    of HBAs, SAN controllers and OSes, and it's a long shot to ask for help
    here, but I would be very grateful if someone could try. Anything helps.

  2. Re: Disks in error state


    Correction: we don't use Qlogic HBA cards, we use JNI cards. (We use Qlogic
    on our Linux systems; I was in a hurry yesterday and mixed things up.)

    Anyway, for some reason our problems seem to be connected to DMP. Our SAN
    controller reports that some LUNs aren't on their preferred path; when we
    change this, only some of the LUNs leave the error state. It doesn't apply
    to all of them, though, so it seems there is some other problem as well.

    "Christian Nord" wrote:
    >Hi!
    >
    >We have a problem with Veritas 4.1 on Solaris 9. [...]



  3. Re: Disks in error state


    You've run into a "feature" in DMP since 4.0: a DMP device name does not
    have to match the hardware device name. You can run the command
    "vxdmpadm getsubpaths dmpnodename=<DMP node name>" to see which
    hardware device is actually mapped to the DMP device.
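A minimal Python sketch of that check, assuming VxVM's `vxdmpadm` is on the PATH and that its `getsubpaths` output is a header line starting with NAME, an optional separator line of `=` characters, and then one row per path with the path name in the first column (the exact columns vary by release). The helpers `subpath_names`, `dmp_node_matches` and `query` are hypothetical. `dmp_node_matches` reports whether the device named like the DMP node is actually among that node's subpaths; if it is not, the node is mapped to different hardware than its name suggests.

```python
import subprocess


def subpath_names(getsubpaths_output):
    """Return the path names (first column) from `vxdmpadm getsubpaths` output.

    Assumes a header line starting with NAME, an optional line of '='
    characters, then one row per path with the path name first.
    """
    names = []
    for line in getsubpaths_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME" or set(fields[0]) == {"="}:
            continue  # skip blank, header and separator lines
        names.append(fields[0])
    return names


def dmp_node_matches(dmpnode, getsubpaths_output):
    """True if the device with the DMP node's own name is one of its subpaths."""
    return dmpnode in subpath_names(getsubpaths_output)


def query(dmpnode):
    """Run vxdmpadm getsubpaths for one DMP node (needs VxVM on the host)."""
    result = subprocess.run(
        ["vxdmpadm", "getsubpaths", f"dmpnodename={dmpnode}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the affected host this would be used as `dmp_node_matches("c7t2d50s2", query("c7t2d50s2"))` for each device in the error state.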

    There is a tech note that explains how to clear the DMP database; that may
    fix it.

    Steve

    "Christian Nord" wrote:
    >Correction: we don't use Qlogic HBA cards, we use JNI cards. [...]


