fail to import disk group - Veritas Cluster Server


Thread: fail to import disk group

  1. fail to import disk group


    hi all,
    after VCS 4.1 was installed and configured as a two-node cluster (with VxFEN
    I/O fencing implemented), I disconnected the two heartbeat (HB) cables. The
    primary node panicked and rebooted, and all service groups running on the
    primary node failed over to the standby node, but VCS failed to import the
    disk groups on the standby node. I then tried to import a DG on the standby
    node by hand with "vxdg -fC import DG", which failed again. All the disk
    groups could be imported on the primary node once it booted up and rejoined
    the cluster. Any ideas? Thanks all in advance.



  2. Re: fail to import disk group

    Depends on a lot of things:

    1. Can both nodes see all the disks, including the disks from the
    diskgroup that did not import? (Run "vxdisk -o alldgs list" on both
    machines and compare, or post the output here.)

    2. The Service Group that includes the DiskGroup resource should be
    able to run on both nodes. Look at the SystemList for the Service
    Group (see the example commands after this list). It is always best to
    let VCS do the import (it uses the correct options).

    3. Obviously you tried doing it by hand. If you have fencing enabled
    and the diskgroup is not shared (not imported on both nodes at the
    same time), then keys will be put onto the disks as soon as they are
    imported. If you want to remove the keys, use the "-o clearreserve" option
    with the vxdg import command (just read the man page - pretty sure it is
    there). If that does not work, you can look at the keys on the disk(s)
    using

    /sbin/vxfenadm -r /dev/rdsk/c#t#d#s2

    and then kick the key off the disk with

    /sbin/vxfenadm -a -k TEMP /dev/rdsk/c#t#d#s2


    This will place a key called "TEMP" on the disk and, more importantly, kick
    the other keys off. You can then delete the TEMP key with the command

    vxfenadm -x -k TEMP /dev/rdsk/c#t#d#s2



    If you want further help, you will need to post some info here, but that
    should mostly cover it.
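
    A rough sketch of what I mean for points 2 and 3 (the group, resource and
    diskgroup names below are placeholders - substitute your own; check the
    hagrp(1M) and vxdg(1M) man pages for the exact syntax on your release):

    hagrp -display GROUP -attribute SystemList        (confirm the group can run on both nodes)
    hares -display DG_RESOURCE -attribute DiskGroup   (check which diskgroup the resource imports)
    vxdg -o clearreserve import DG                    (manual import that also clears stale SCSI-3 keys)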



  3. Re: fail to import disk group


    Thanks for the helpful reply. Here are my answers:
    >1. Can both nodes see all the disks ? including the disks from the
    >diskgroup that did not import ? (do "vxdisk -o alldg list" on both
    >machines and compare or post here).


    Yes, both nodes can see all the disks, and each diskgroup is mounted on only
    one node at a time (configured as a failover group). Actually, I can switch
    the resource groups between the two nodes with "hagrp -switch" and no problem
    occurs.


    >3. Obviously you tried doing it by hand. If you have fencing enabled
    >and the diskgroup is not shared (not imported on both nodes at the
    >same time), then keys will be put onto the disks as soon as they are
    >imported. If you want to remove the keys, use the "-o clearreserve" option
    >with the vxdg import command (just read the man page - pretty sure it is
    >there). If that does not work, you can look at the keys on the disk(s)
    >using


    I assumed fencing should handle this part, since it forces a node to panic
    and reboot when it loses both heartbeats. Right? So this clear-key step
    should not have to be done by hand.

    What information do you need in order to help further?

    Best thanks,
    Go


  4. Re: fail to import disk group

    Fencing prevents data loss by preventing access to a disk.

    In this sense, when a node panics, its keys will be left on the disks, and
    the other nodes in the cluster will be able to import the diskgroup (with a
    clearreserve).


    OK, so what is needed to continue?

    1. vxdisk -o alldgs list from both nodes.
    2. hastatus -sum from either node.

    That is a start; after seeing which machine sees what, we will be able to
    continue.



  5. Re: fail to import disk group


    Output here. Thanks for your help.

    Please ignore the resource status; some tests were being performed when the
    command output was captured.

    vxdisk -o alldgs list

    Node1:
    =========
    DEVICE TYPE DISK GROUP STATUS
    EMC0_0 auto:cdsdisk - (vxfencoorddg) online
    EMC0_1 auto:cdsdisk - (controlmdg01) online
    EMC0_2 auto:cdsdisk - (controlmdg01) online
    EMC0_3 auto:cdsdisk controlmdg0201 controlmdg02 online
    EMC0_4 auto:cdsdisk controlmdg0202 controlmdg02 online
    EMC0_5 auto:cdsdisk controlmdg0301 controlmdg03 online
    EMC0_6 auto:cdsdisk - (controlmdg04) online
    EMC0_7 auto:cdsdisk - (controlmdg05) online
    EMC0_8 auto:cdsdisk controlmdg0302 controlmdg03 online
    EMC0_9 auto:cdsdisk - (controlmdg04) online
    EMC0_10 auto:cdsdisk - (controlmdg05) online
    EMC0_11 auto:cdsdisk - (vxfencoorddg) online
    EMC0_12 auto:cdsdisk - (vxfencoorddg) online
    c0t0d0s2 auto:sliced rootdisk rootdg online
    c0t1d0s2 auto:sliced rootmirror rootdg online
    c0t4d0s2 auto:none - - online invalid
    c0t5d0s2 auto:none - - online invalid

    Node 2:
    ==============
    DEVICE TYPE DISK GROUP STATUS
    EMC0_0 auto:cdsdisk - (controlmdg02) online
    EMC0_1 auto:cdsdisk controlmdg0401 controlmdg04 online
    EMC0_2 auto:cdsdisk - (controlmdg03) online
    EMC0_3 auto:cdsdisk - (controlmdg02) online
    EMC0_4 auto:cdsdisk - (controlmdg03) online
    EMC0_5 auto:cdsdisk controlmdg0102 controlmdg01 online
    EMC0_6 auto:cdsdisk controlmdg0101 controlmdg01 online
    EMC0_7 auto:cdsdisk controlmdg0502 controlmdg05 online
    EMC0_8 auto:cdsdisk - (vxfencoorddg) online
    EMC0_9 auto:cdsdisk controlmdg0402 controlmdg04 online
    EMC0_10 auto:cdsdisk - (vxfencoorddg) online
    EMC0_11 auto:cdsdisk controlmdg0501 controlmdg05 online
    EMC0_12 auto:cdsdisk - (vxfencoorddg) online
    c0t0d0s2 auto:sliced rootdisk rootdg online
    c0t1d0s2 auto:sliced rootmirror rootdg online

    hastatus -sum
    ============

    -- SYSTEM STATE
    -- System       State      Frozen

    A  HKCTMP01     RUNNING    0
    A  HKCTMP02     RUNNING    0

    -- GROUP STATE
    -- Group  System      Probed  AutoDisabled  State

    B  CM1    HKCTMP01    Y       N             PARTIAL
    B  CM1    HKCTMP02    Y       N             OFFLINE
    B  CM2    HKCTMP01    Y       N             PARTIAL
    B  CM2    HKCTMP02    Y       N             OFFLINE
    B  CM3    HKCTMP01    Y       N             OFFLINE
    B  CM3    HKCTMP02    Y       N             PARTIAL
    B  CM4    HKCTMP01    Y       N             OFFLINE
    B  CM4    HKCTMP02    Y       N             PARTIAL
    B  EM     HKCTMP01    Y       N             OFFLINE|FAULTED
    B  EM     HKCTMP02    Y       N             PARTIAL

    -- RESOURCES FAILED
    -- Group  Type         Resource             System

    C  CM1    Application  APPLICATION_AGCM01   HKCTMP01
    C  CM2    Application  APPLICATION_AGCM02   HKCTMP01
    C  CM3    Application  APPLICATION_AGCM03   HKCTMP02
    C  CM4    Application  APPLICATION_AGCM04   HKCTMP02
    C  EM     Application  APPLICATION_AGEM01   HKCTMP02
    C  EM     Application  APPLICATION_EM       HKCTMP01


    Thanks
    Go



  6. Re: fail to import disk group

    OK - imported on node1:
    controlmdg02
    controlmdg03

    Imported on node2:
    controlmdg04
    controlmdg01
    controlmdg05

    Diskgroups not imported:
    vxfencoorddg

    ------- From the above, all is fine.
    The vxfencoorddg, I presume, is your coordinator diskgroup, which should
    never be imported.

    All the rest is imported somewhere.

    Both nodes see the same disks (you can verify by running
    /etc/vx/diag.d/vxdmpinq on each node and comparing the serial numbers, but
    it looks OK).
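
    For example, something like this on each node (the device name below is just
    a placeholder, and I am assuming vxdmpinq takes the raw device path as its
    argument):

    /etc/vx/diag.d/vxdmpinq /dev/rdsk/c0t0d0s2

    Compare the serial numbers reported on both nodes to confirm they are
    looking at the same LUNs.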



    OK, so obviously the problem has since been fixed. Sorry, I thought the
    problem was still there and needed fixing now.

    It is going to be difficult to determine what went wrong, but the best place
    to start looking is the VCS engine log (/var/VRTSvcs/log/engine_A.log). If
    you remember roughly which time and/or which diskgroup it was, search for
    that, and look for the online of the resource and where it failed.

    Then cut and paste the relevant lines from engine_A.log here so we can see.

    You will most likely find something there that explains it all.
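
    For instance, something along these lines (the diskgroup name is just an
    example taken from your earlier output):

    grep controlmdg02 /var/VRTSvcs/log/engine_A.log
    grep -i "DiskGroup" /var/VRTSvcs/log/engine_A.log | grep -i error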



  7. Re: fail to import disk group



    Here are the error messages from engine_A.log:
    =====

    2007/10/29 10:39:55 VCS WARNING V-16-10001-1016 (HKCTMP02) DiskGroup:CTMCM1_DGnline:vxdg
    import (clear flag) failed.
    2007/10/29 10:39:56 VCS ERROR V-16-10001-1003 (HKCTMP02) DiskGroup:CTMCM1_DGnline:**
    ERROR: vxdg import (force) failed on Disk Group controlmdg02
    2007/10/29 10:40:00 VCS ERROR V-16-10001-1004 (HKCTMP02) DiskGroup:CTMCM1_DGnline:**
    ERROR: vxdg import failed on Disk Group controlmdg02 after vxdctl enable
    VxVM vxdg ERROR V-5-1-587 Disk group controlmdg02: import failed: No valid
    disk found containing disk group
    VxVM vxdg ERROR V-5-1-587 Disk group controlmdg02: import failed: No valid
    disk found containing disk group
    VxVM vxdg ERROR V-5-1-587 Disk group controlmdg02: import failed: No valid
    disk found containing disk group
    2007/10/29 10:40:02 VCS WARNING V-16-10001-1001 (HKCTMP02) DiskGroup:CTMCM2_DGnline:vxdg
    import failed. Trying again with clear flag option
    2007/10/29 10:40:03 VCS WARNING V-16-10001-1016 (HKCTMP02) DiskGroup:CTMCM2_DGnline:vxdg
    import (clear flag) failed.
    2007/10/29 10:40:04 VCS ERROR V-16-10001-1003 (HKCTMP02) DiskGroup:CTMCM2_DGnline:**
    ERROR: vxdg import (force) failed on Disk Group controlmdg03
    2007/10/29 10:40:08 VCS ERROR V-16-10001-1004 (HKCTMP02) DiskGroup:CTMCM2_DGnline:**
    ERROR: vxdg import failed on Disk Group controlmdg03 after vxdctl enable
    VxVM vxdg ERROR V-5-1-587 Disk group controlmdg03: import failed: No valid
    disk found containing disk group
    VxVM vxdg ERROR V-5-1-587 Disk group controlmdg03: import failed: No valid
    disk found containing disk group
    VxVM vxdg ERROR V-5-1-587 Disk group controlmdg03: import failed: No valid
    disk found containing disk group
    2007/10/29 10:40:11 VCS WARNING V-16-10001-1001 (HKCTMP02) DiskGroup:CTMEM_DGnline:vxdg
    import failed. Trying again with clear flag option
    2007/10/29 10:40:11 VCS WARNING V-16-10001-1016 (HKCTMP02) DiskGroup:CTMEM_DGnline:vxdg
    import (clear flag) failed.
    2007/10/29 10:40:12 VCS ERROR V-16-10001-1003 (HKCTMP02) DiskGroup:CTMEM_DGnline:**
    ERROR: vxdg import (force) failed on

    ==============
    Thanks



  8. Re: fail to import disk group

    Strange that the clear flag failed !!!



    2007/10/29 10:39:55 VCS WARNING V-16-10001-1016 (HKCTMP02)
    DiskGroup:CTMCM1_DGnline:vxdg import (clear flag) failed.




    VCS does the import with the -o clearreserve flag


    OK, let me explain what it does.

    It will actually kick off the keys that are already on there, and then
    place new keys (its own) on the disks.


    The kicking off (a pre-emptive abort) is done by calling the command

    vxfenadm -a

    This is not a documented feature, but it will kick off any key, even if the
    key does not belong to the machine doing the kicking.


    This (putting on a key and letting the other node kick it off) is part
    of the vxfentsthdw command that should be run on all arrays on all disks
    that you plan to use.

    As this is an EMC, I'm pretty sure it will be supported hardware (only
    the old FC series Clariions are no longer supported).
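
    If you want to rerun vxfentsthdw, a minimal sketch (I am assuming the usual
    install path under /opt/VRTSvcs/vxfen/bin, which may differ on your release;
    note the test overwrites data on the disk it uses, so only point it at a
    disk you can afford to lose):

    /opt/VRTSvcs/vxfen/bin/vxfentsthdw

    It prompts for the two node names and a disk, then checks that each node can
    register, reserve and pre-empt SCSI-3 keys on that disk.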


    The only other thing that you can perhaps look at is the paths to the
    disks.

    The fencing commands work on PowerPath devices (4.1 and later), and so we
    trust PowerPath to do the right thing. Always make sure that PowerPath later
    than 5.0 is used (if it is used at all).

    But as for a clear reason why this failed - sorry, there is nothing more I
    can say here.

    All I can suggest is to perhaps get another EMC disk and test it
    (vxfentsthdw), or test by hand (putting a key on from one node and then
    trying to kick it off from the other node).
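
    A rough sketch of the by-hand test, using the same vxfenadm options quoted
    earlier in this thread (the device path and key names are placeholders -
    substitute a LUN you can afford to test on):

    /sbin/vxfenadm -r /dev/rdsk/c#t#d#s2              (node 1: read the existing keys)
    /sbin/vxfenadm -a -k KEYA /dev/rdsk/c#t#d#s2      (node 1: place a key)
    /sbin/vxfenadm -a -k KEYB /dev/rdsk/c#t#d#s2      (node 2: pre-emptive abort, should kick KEYA off)
    /sbin/vxfenadm -r /dev/rdsk/c#t#d#s2              (either node: verify only KEYB remains)
    /sbin/vxfenadm -x -k KEYB /dev/rdsk/c#t#d#s2      (node 2: remove the test key when done)

    If node 2 cannot kick node 1's key off, that points at the array or path
    handling of the SCSI-3 registrations rather than at VCS.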





  9. Re: fail to import disk group


    Hi Me,
    I've run the test (vxfentsthdw) on all the data disks and they all passed.
    Any ideas?
    Thanks much,
    Go
    Me wrote:
    >Strange that the clear flag failed !!!
    >
    >
    >
    >2007/10/29 10:39:55 VCS WARNING V-16-10001-1016 (HKCTMP02)
    >DiskGroup:CTMCM1_DGnline:vxdg import (clear flag) failed.
    >
    >
    >
    >
    >VCS does the import with the -o clearreserve flag
    >
    >
    >OK, let me explain what it does.
    >
    >It will actually kick off the keys that are on there already, and then
    >place new keys (it's own) on there.
    >
    >
    >The kicking off (pre-emptive abort) is done by calling the command
    >
    >vxfenadm -a
    >
    >
    >This is not a documented feature, but will kick off any key, even if the


    >key does not belong to the machine doing the kicking.
    >
    >
    >This (putting on a key and letting the other node kick it off) is part
    >of the vxfentsthdw command that should be run on all arrays on all disks


    >that you plan to use.
    >
    >As this is an EMC, I'm pretty sure it will be supported hardware (only
    >the old FC series Clariions are no longer supported).
    >
    >
    >The only other thing that you can perhaps look at, is the paths to the
    >disks.
    >
    >The fencing commands work on power devices (4.1 and later), and thus in


    >POwerPath we trust to do the right thing. Always make sure that
    >PowerPath later than 5.0 is used (if it is used at all)
    >
    >
    >BUT, for a clear reason. sorry, nothing I can say here.
    >
    >All I can suggest is to perhaps get another EMC disk, and test
    >(vxfentsthdw) or test by hand (putting on a key and then on the other
    >node trying to kick it off)
    >
    >
    >
    >
    >go wrote:
    >> here's the error message in engine_A.log
    >> =====
    >>
    >> 2007/10/29 10:39:55 VCS WARNING V-16-10001-1016 (HKCTMP02) DiskGroup:CTMCM1_DGnline:vxdg
    >> import (clear flag) failed.
    >> 2007/10/29 10:39:56 VCS ERROR V-16-10001-1003 (HKCTMP02) DiskGroup:CTMCM1_DGnline:**
    >> ERROR: vxdg import (force) failed on Disk Group controlmdg02
    >> 2007/10/29 10:40:00 VCS ERROR V-16-10001-1004 (HKCTMP02) DiskGroup:CTMCM1_DGnline:**
    >> ERROR: vxdg import failed on Disk Group controlmdg02 after vxdctl enable
    >> VxVM vxdg ERROR V-5-1-587 Disk group controlmdg02: import failed: No valid
    >> disk found containing disk group
    >> VxVM vxdg ERROR V-5-1-587 Disk group controlmdg02: import failed: No valid
    >> disk found containing disk group
    >> VxVM vxdg ERROR V-5-1-587 Disk group controlmdg02: import failed: No valid
    >> disk found containing disk group
    >> 2007/10/29 10:40:02 VCS WARNING V-16-10001-1001 (HKCTMP02) DiskGroup:CTMCM2_DGnline:vxdg
    >> import failed. Trying again with clear flag option
    >> 2007/10/29 10:40:03 VCS WARNING V-16-10001-1016 (HKCTMP02) DiskGroup:CTMCM2_DGnline:vxdg
    >> import (clear flag) failed.
    >> 2007/10/29 10:40:04 VCS ERROR V-16-10001-1003 (HKCTMP02) DiskGroup:CTMCM2_DGnline:**
    >> ERROR: vxdg import (force) failed on Disk Group controlmdg03
    >> 2007/10/29 10:40:08 VCS ERROR V-16-10001-1004 (HKCTMP02) DiskGroup:CTMCM2_DGnline:**
    >> ERROR: vxdg import failed on Disk Group controlmdg03 after vxdctl enable
    >> VxVM vxdg ERROR V-5-1-587 Disk group controlmdg03: import failed: No valid
    >> disk found containing disk group
    >> VxVM vxdg ERROR V-5-1-587 Disk group controlmdg03: import failed: No valid
    >> disk found containing disk group
    >> VxVM vxdg ERROR V-5-1-587 Disk group controlmdg03: import failed: No valid
    >> disk found containing disk group
    >> 2007/10/29 10:40:11 VCS WARNING V-16-10001-1001 (HKCTMP02) DiskGroup:CTMEM_DGnline:vxdg
    >> import failed. Trying again with clear flag option
    >> 2007/10/29 10:40:11 VCS WARNING V-16-10001-1016 (HKCTMP02) DiskGroup:CTMEM_DGnline:vxdg
    >> import (clear flag) failed.
    >> 2007/10/29 10:40:12 VCS ERROR V-16-10001-1003 (HKCTMP02) DiskGroup:CTMEM_DGnline:**
    >> ERROR: vxdg import (force) failed on
    >>
    >> ==============
    >> Thanks
    >>
    >> Me wrote:
    >>
    >>>OK - imported on node1
    >>>
    >>>controlmdg02
    >>>controlmdg03
    >>>
    >>>imported on node2
    >>>controlmdg04
    >>>controlmdg01
    >>>controlmdg05
    >>>
    >>>
    >>>
    >>>
    >>>diskgroups not imported
    >>>
    >>>vxfencoorddg
    >>>
    >>>
    >>>
    >>>
    >>>
    >>>------- from the above, all is fine
    >>>the vxfencoorddg, I presume, is your coordinator diskgroup, which should

    >>
    >>
    >>>never be imported.
    >>>
    >>>All the rest is imported somewhere.
    >>>
    >>>All the nodes see the same disks (can verify by doing a
    >>>/etc/vx/diag/d/vxdmpinq to see the serial numbers, but it's OK)
    >>>
    >>>
    >>>
    >>>OK, so obviously the problem was fixed. Sorry, thought that the problem

    >>
    >>
    >>>was still there and needed fixing now.
    >>>
    >>>
    >>>Going to be difficult to determine what went wrong, but the best place


    >>>to start looking then is to look at the VCS engine log
    >>>(/var/VRTSvcs/log/engine_A.log). If you remember around which time
    >>>and/or which diskgroup, search for that, and look for the online of the

    >>
    >>
    >>>resource, and where it failed.
    >>>
    >>>Then, cut and paste the lines from the engine_A.log here so we can see.
    >>>
    >>>You will most likely see something explaining it all there.
    >>>
    >>>
    >>>GO wrote:
    >>>
    >>>>output here. thanks for ur help
    >>>>
    >>>>pls ignore the resource status, some tests were performed while the command output was captured.
    >>>>
    >>>>vxdisk -alldgs list
    >>>>
    >>>>Node1:
    >>>>=========
    >>>>DEVICE TYPE DISK GROUP STATUS
    >>>>EMC0_0 auto:cdsdisk - (vxfencoorddg) online
    >>>>EMC0_1 auto:cdsdisk - (controlmdg01) online
    >>>>EMC0_2 auto:cdsdisk - (controlmdg01) online
    >>>>EMC0_3 auto:cdsdisk controlmdg0201 controlmdg02 online
    >>>>EMC0_4 auto:cdsdisk controlmdg0202 controlmdg02 online
    >>>>EMC0_5 auto:cdsdisk controlmdg0301 controlmdg03 online
    >>>>EMC0_6 auto:cdsdisk - (controlmdg04) online
    >>>>EMC0_7 auto:cdsdisk - (controlmdg05) online
    >>>>EMC0_8 auto:cdsdisk controlmdg0302 controlmdg03 online
    >>>>EMC0_9 auto:cdsdisk - (controlmdg04) online
    >>>>EMC0_10 auto:cdsdisk - (controlmdg05) online
    >>>>EMC0_11 auto:cdsdisk - (vxfencoorddg) online
    >>>>EMC0_12 auto:cdsdisk - (vxfencoorddg) online
    >>>>c0t0d0s2 auto:sliced rootdisk rootdg online
    >>>>c0t1d0s2 auto:sliced rootmirror rootdg online
    >>>>c0t4d0s2 auto:none - - online invalid
    >>>>c0t5d0s2 auto:none - - online invalid
    >>>>
    >>>>Node 2:
    >>>>==============
    >>>>DEVICE TYPE DISK GROUP STATUS
    >>>>EMC0_0 auto:cdsdisk - (controlmdg02) online
    >>>>EMC0_1 auto:cdsdisk controlmdg0401 controlmdg04 online
    >>>>EMC0_2 auto:cdsdisk - (controlmdg03) online
    >>>>EMC0_3 auto:cdsdisk - (controlmdg02) online
    >>>>EMC0_4 auto:cdsdisk - (controlmdg03) online
    >>>>EMC0_5 auto:cdsdisk controlmdg0102 controlmdg01 online
    >>>>EMC0_6 auto:cdsdisk controlmdg0101 controlmdg01 online
    >>>>EMC0_7 auto:cdsdisk controlmdg0502 controlmdg05 online
    >>>>EMC0_8 auto:cdsdisk - (vxfencoorddg) online
    >>>>EMC0_9 auto:cdsdisk controlmdg0402 controlmdg04 online
    >>>>EMC0_10 auto:cdsdisk - (vxfencoorddg) online
    >>>>EMC0_11 auto:cdsdisk controlmdg0501 controlmdg05 online
    >>>>EMC0_12 auto:cdsdisk - (vxfencoorddg) online
    >>>>c0t0d0s2 auto:sliced rootdisk rootdg online
    >>>>c0t1d0s2 auto:sliced rootmirror rootdg online
    >>>>
    >>>>hastatus -sum
    >>>>============
    >>>>
    >>>>-- SYSTEM STATE
    >>>>-- System State Frozen
    >>>>
    >>>>A HKCTMP01 RUNNING 0
    >>>>A HKCTMP02 RUNNING 0
    >>>>
    >>>>-- GROUP STATE
    >>>>-- Group System Probed AutoDisabled State
    >>>>
    >>>>B CM1 HKCTMP01 Y N PARTIAL
    >>>>B CM1 HKCTMP02 Y N OFFLINE
    >>>>B CM2 HKCTMP01 Y N PARTIAL
    >>>>B CM2 HKCTMP02 Y N OFFLINE
    >>>>B CM3 HKCTMP01 Y N OFFLINE
    >>>>B CM3 HKCTMP02 Y N PARTIAL
    >>>>B CM4 HKCTMP01 Y N OFFLINE
    >>>>B CM4 HKCTMP02 Y N PARTIAL
    >>>>B EM HKCTMP01 Y N OFFLINE|FAULTED
    >>>>B EM HKCTMP02 Y N PARTIAL
    >>>>
    >>>>-- RESOURCES FAILED
    >>>>-- Group Type Resource System
    >>>>
    >>>>C CM1 Application APPLICATION_AGCM01 HKCTMP01
    >>>>C CM2 Application APPLICATION_AGCM02 HKCTMP01
    >>>>C CM3 Application APPLICATION_AGCM03 HKCTMP02
    >>>>C CM4 Application APPLICATION_AGCM04 HKCTMP02
    >>>>C EM Application APPLICATION_AGEM01 HKCTMP02
    >>>>C EM Application APPLICATION_EM HKCTMP01
    >>>>
    >>>>Thanks
    >>>>Go
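    Once the underlying import problem is fixed, the FAULTED and PARTIAL states shown
    above are normally cleared through VCS rather than by hand. A sketch using the
    group and system names from this output (verify the syntax against the VCS 4.1
    command reference):

    # Clear the faulted resources in a group, then bring it online where wanted
    hagrp -clear EM -sys HKCTMP01
    hagrp -online EM -sys HKCTMP02

    # Or switch a partially-online group back to its preferred node
    hagrp -switch CM1 -to HKCTMP01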
    >>>>

  10. Re: fail to import disk group

    Then I'm outta ideas. sorry

    go wrote:
    > Hi Me,
    > i've done the test (vxfentsthdw) for all data disks and all passed. any
    > idea ?
    > thanks much
    >
    > Go
    > Me wrote:
    >
    >>Strange that the clear flag failed !!!
    >>
    >>
    >>
    >>2007/10/29 10:39:55 VCS WARNING V-16-10001-1016 (HKCTMP02) DiskGroup:CTMCM1_DG:online:vxdg import (clear flag) failed.
    >>
    >>
    >>
    >>
    >>VCS does the import with the -o clearreserve flag
    >>
    >>
    >>OK, let me explain what it does.
    >>
    >>It will actually kick off the keys that are on there already, and then
    >>place new keys (its own) on there.
    >>
    >>
    >>The kicking off (pre-emptive abort) is done by calling the command
    >>
    >>vxfenadm -a
    >>
    >>
    >>This is not a documented feature, but will kick off any key, even if the
    >>key does not belong to the machine doing the kicking.
    >>
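    Put as commands, the escalation visible in the agent log reads roughly as below.
    The flags are the ones discussed in this thread; the agent's real sequence may
    differ, so this is only a manual approximation:

    # Plain import first (VCS imports temporarily, without the autoimport flag)
    vxdg -t import controlmdg02

    # Retry with the reservation-clearing option mentioned above
    vxdg -o clearreserve -t import controlmdg02

    # Last resort: clear host locks and force, then rescan and try once more
    vxdg -tfC import controlmdg02
    vxdctl enable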
    >>
    >>This (putting on a key and letting the other node kick it off) is part
    >>of the vxfentsthdw command that should be run on all arrays on all disks
    >>that you plan to use.
    >>
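    For reference, vxfentsthdw is interactive and, importantly, destructive to data
    on the disk it tests, so it should only be pointed at a spare LUN. Roughly (the
    install path can differ between releases):

    # Run from one node as root; it prompts for the two system names and a
    # raw disk path, then exercises register / pre-empt between the nodes.
    # WARNING: it overwrites data on the disk being tested.
    /opt/VRTSvcs/vxfen/bin/vxfentsthdw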
    >>As this is an EMC, I'm pretty sure it will be supported hardware (only
    >>the old FC series Clariions are no longer supported).
    >>
    >>
    >>The only other thing that you can perhaps look at, is the paths to the
    >>disks.
    >>
    >>The fencing commands work on PowerPath devices (4.1 and later), and thus
    >>we trust PowerPath to do the right thing. Always make sure that PowerPath
    >>later than 5.0 is used (if it is used at all).
    >>
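    A quick way to confirm the PowerPath side of that, using standard powermt
    commands (not specific to this thread):

    # PowerPath version - should be 5.0 or later per the advice above
    powermt version

    # Per-LUN path state; dead or missing paths here would explain flaky
    # reservation behaviour
    powermt display dev=all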
    >>
    >>BUT, as for a clear reason why it failed: sorry, nothing I can say here.
    >>
    >>All I can suggest is to perhaps get another EMC disk, and test
    >>(vxfentsthdw) or test by hand (putting on a key and then on the other
    >>node trying to kick it off)
    >>

+ Reply to Thread