rdac failover failing for DS4300 SAN - Aix

  1. rdac failover failing for DS4300 SAN

    Hello

    I have the following SAN configuration, which I hoped would provide
    complete resilience for storage. I may be naive, but when I test the
    resilience, it fails under certain conditions.

    I have 3 pSeries 52A servers with 2 fibre HBAs each.
    These are connected to 2 identical switches, so that each server is
    connected to each switch.
    I have a DS4300 SAN with 2 controllers, each with 2 fibre ports.
    Each controller has one fibre connected to each switch.

    (There are 3 fibres from the servers into each switch and two out to
    the SAN)

    Essentially all components are duplicated in the network.

    To test the resilience I have been pulling fibres from the back of the
    SAN.
    If I pull both fibres from a single controller, there is no loss of
    service.
    If I pull 3 fibres from the SAN, service is lost. (I would hope this
    would survive.)
    With all fibres connected correctly, if I pull the power on one of the
    switches, service is also lost.

    I was hoping that a total switch failure would be handled. Is this
    unrealistic?
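
    The cabling described above can be checked on paper before suspecting
    the hardware. The sketch below is a minimal connectivity model, not an
    RDAC simulation. The labels (sw1, hba0, A1 and so on) are hypothetical,
    and it assumes there is no inter-switch link and that each HBA is zoned
    to every controller port on its own switch.

```python
from itertools import combinations

# Hypothetical labels for the components described in the post.
SWITCHES = ("sw1", "sw2")
# Each server has one HBA cabled to each switch.
HBA_TO_SWITCH = {"hba0": "sw1", "hba1": "sw2"}
# Each controller (A, B) has one port cabled to each switch.
PORT_TO_SWITCH = {"A1": "sw1", "A2": "sw2", "B1": "sw1", "B2": "sw2"}


def surviving_paths(dead_switches=(), pulled_ports=()):
    """Return the (hba, switch, port) paths that are still physically usable."""
    paths = []
    for hba, hba_switch in HBA_TO_SWITCH.items():
        for port, port_switch in PORT_TO_SWITCH.items():
            if hba_switch != port_switch:
                continue  # assumption: the two switches are not linked
            if hba_switch in dead_switches or port in pulled_ports:
                continue
            paths.append((hba, hba_switch, port))
    return paths


def controllers_reachable(paths):
    """Controller letters (first character of the port label) still reachable."""
    return sorted({port[0] for _, _, port in paths})


# Power off each switch in turn.
for switch in SWITCHES:
    paths = surviving_paths(dead_switches=[switch])
    print(switch, "down ->", controllers_reachable(paths))

# Pull every possible set of three controller fibres.
for pulled in combinations(PORT_TO_SWITCH, 3):
    paths = surviving_paths(pulled_ports=pulled)
    print("pulled", pulled, "->", len(paths), "path(s) left")
```

    Under these assumptions at least one physical path survives every test
    listed above, so the loss of service is more likely to come from how
    the host sees and uses the paths than from the cabling itself.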

    Background info
    I am running with AIX 5.3 ML04. rdac appears to be correctly
    installed.

    On each server, I have 4 dac devices configured and a single dar.

    By reference to the WWN,
    dac0 and dac2 refer to controller0 on the SAN
    dac1 and dac3 to controller1

    dar0 has the following:

    # lsattr -E -l dar0
    act_controller dac2,dac3 Active Controllers                          False
    aen_freq       600       Polled AEN frequency in seconds             True
    all_controller dac2,dac3 Available Controllers                       False
    autorecovery   no        Autorecover after failure is corrected      True
    balance_freq   600       Dynamic Load Balancing frequency in seconds True
    cache_size     128       Cache size for both controllers             False
    fast_write_ok  yes       Fast Write available                        False
    held_in_reset  none      Held-in-reset controller                    True
    hlthchk_freq   600       Health check frequency in seconds           True
    load_balancing no        Dynamic Load Balancing                      True
    switch_retries 5         Number of times to retry failed switches    True
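
    The dar0 attributes can be compared against the four dacs mentioned
    earlier. The following is a minimal sketch that parses a pasted excerpt
    of the listing. The function names and the CONFIGURED_DACS list are
    illustrative, and the excerpt is assumed to have one attribute per line
    in the order attribute, value, description, user-settable flag.

```python
# Excerpt of the lsattr listing from the post, one attribute per line.
LSATTR_OUTPUT = """\
act_controller dac2,dac3 Active Controllers False
all_controller dac2,dac3 Available Controllers False
autorecovery no Autorecover after failure is corrected True
load_balancing no Dynamic Load Balancing True
"""

# The post states that four dac devices are configured on each server.
CONFIGURED_DACS = ["dac0", "dac1", "dac2", "dac3"]


def parse_lsattr(text):
    """Map each attribute name to its value (the second field on the line)."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            attrs[fields[0]] = fields[1]
    return attrs


def dacs_outside_dar(attrs, configured):
    """Return the configured dacs that the dar does not list as available."""
    in_dar = set(attrs["all_controller"].split(","))
    return [dac for dac in configured if dac not in in_dar]


attrs = parse_lsattr(LSATTR_OUTPUT)
print("dacs not listed in dar0:", dacs_outside_dar(attrs, CONFIGURED_DACS))
```

    With the values shown, dac0 and dac1 are not part of dar0, so only two
    of the four configured paths are available to it for failover.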

    # fget_config -A -v

    ---dar0---

    User array name = 'U_Integration'
    dac2 ACTIVE dac3 ACTIVE

    Disk     DAC   LUN  Logical Drive
    utm             31
    hdisk2   dac2    0  DBPrimary1
    hdisk3   dac2    1  FlashPrimary1
    hdisk4   dac2    2  DBMirror1
    hdisk5   dac2    3  FlashMirror1
    hdisk6   dac2    4  DBPrimary2
    hdisk7   dac2    5  FlashPrimary2
    hdisk8   dac2    6  DBMirror2
    hdisk9   dac2    7  FlashMirror2
    hdisk10  dac2    8  DBPrimary3
    hdisk11  dac2    9  FlashPrimary3
    hdisk12  dac2   10  DBMirror3
    hdisk13  dac2   11  FlashMirror3
    hdisk14  dac2   12  DBPrimary4
    hdisk15  dac2   13  FlashPrimary4
    hdisk16  dac2   14  DBMirror4
    hdisk17  dac2   15  FlashMirror4
    hdisk18  dac2   16  SharedFileSystem
    hdisk19  dac2   17  Cluster_OCR
    hdisk20  dac2   18  Cluster_VD
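
    The listing above can be summarised by counting logical drives per dac.
    This is a minimal sketch that works on the pasted text only. The
    function name is illustrative, and it assumes each drive row has four
    whitespace-separated fields (disk, dac, LUN, logical drive name).

```python
from collections import Counter

# The fget_config listing from the post, pasted as plain text.
FGET_CONFIG_OUTPUT = """\
dac2 ACTIVE dac3 ACTIVE
utm 31
hdisk2 dac2 0 DBPrimary1
hdisk3 dac2 1 FlashPrimary1
hdisk4 dac2 2 DBMirror1
hdisk5 dac2 3 FlashMirror1
hdisk6 dac2 4 DBPrimary2
hdisk7 dac2 5 FlashPrimary2
hdisk8 dac2 6 DBMirror2
hdisk9 dac2 7 FlashMirror2
hdisk10 dac2 8 DBPrimary3
hdisk11 dac2 9 FlashPrimary3
hdisk12 dac2 10 DBMirror3
hdisk13 dac2 11 FlashMirror3
hdisk14 dac2 12 DBPrimary4
hdisk15 dac2 13 FlashPrimary4
hdisk16 dac2 14 DBMirror4
hdisk17 dac2 15 FlashMirror4
hdisk18 dac2 16 SharedFileSystem
hdisk19 dac2 17 Cluster_OCR
hdisk20 dac2 18 Cluster_VD
"""


def luns_per_dac(text):
    """Count the hdisk rows listed under each dac."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Drive rows look like: hdiskN dacN LUN name
        if len(fields) == 4 and fields[0].startswith("hdisk") and fields[1].startswith("dac"):
            counts[fields[1]] += 1
    return counts


counts = luns_per_dac(FGET_CONFIG_OUTPUT)
for dac in ("dac2", "dac3"):
    print(dac, counts[dac])
```

    Every hdisk in the listing appears under dac2 and none under dac3, so
    all I/O currently depends on whichever HBA and switch dac2 sits behind.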

    The switches, I think, are Brocade. They also switch access to a tape
    library on a separate zone, on one of the switches.

    I would appreciate any advice on further configuration or pointers to
    documentation, as I have found very little. In fact, an explanation of
    what's going on would be helpful.

    Rob


  2. Re: rdac failover failing for DS4300 SAN

    If I understand your configuration correctly, I would expect to see one
    dar and two dacs on each server, not four dacs, so something doesn't seem
    right to me. It sounds like each HBA is zoned to both controller A and
    controller B, and that's a no-no.

    If I were setting this up, I would cable both connections on controller A to
    one switch and both connections on controller B to the other switch.

    On the front end I would select the server with the heaviest SAN traffic and
    zone one of its HBAs to controller A's first SAN connection and zone its
    other HBA to controller B's first SAN connection. For the other two servers,
    I would zone one HBA on each server to controller A's second SAN connection
    and the other HBA to controller B's second SAN connection.

    This is how we set up our AIX servers on our DS4800.
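
    The cabling and zoning scheme described in this reply can be written out
    as a small model. This is only a sketch of that description: the server,
    HBA, switch and port labels are hypothetical, and the model assumes each
    HBA is cabled to the same switch as the controller port it is zoned to.

```python
# Hypothetical labels. Both ports of controller A go to one switch and
# both ports of controller B go to the other, as the reply suggests.
CABLING = {"A1": "sw1", "A2": "sw1", "B1": "sw2", "B2": "sw2"}
SERVERS = ["server1", "server2", "server3"]


def build_zones(servers, busiest):
    """Return (hba, controller_port) zone pairs for the scheme in the reply."""
    zones = []
    for server in servers:
        # The busiest server gets the first port on each controller to
        # itself; the other servers share the second port.
        port_number = "1" if server == busiest else "2"
        zones.append((f"{server}_hba0", f"A{port_number}"))
        zones.append((f"{server}_hba1", f"B{port_number}"))
    return zones


def controllers_seen(zones, server, dead_switch=None):
    """Controllers a server can still reach, optionally with one switch down."""
    seen = set()
    for hba, port in zones:
        if hba.startswith(server + "_") and CABLING[port] != dead_switch:
            seen.add(port[0])
    return sorted(seen)


zones = build_zones(SERVERS, busiest="server1")
for hba, port in zones:
    print(hba, "->", port, "via", CABLING[port])
for server in SERVERS:
    print(server, "with sw1 down sees", controllers_seen(zones, server, "sw1"))
```

    In this model each HBA is zoned to exactly one controller port, so each
    server sees one path per controller, and losing either switch still
    leaves every server a path to the controller on the other switch.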


    "openstream rob" wrote in message
    news:1192053037.886046.276670@k79g2000hse.googlegroups.com...



