IP Failover - Veritas Cluster Server



Thread: IP Failover

  1. IP Failover

    Hi!
    I have a two-node cluster on which I have configured IP failover.

    I have a service group called IPFail. Under this group I have two
    resources: IPFail_IP (of type IP) and IPFail_NIC (of type NIC). IPFail_IP
    is online on one system (system A) and offline on the other (system B).

    When system A crashes, IPFail_IP is automatically brought online on
    system B, and system B takes over A's IP address. This part works fine.

    However, when system A recovers, IPFail_IP is brought online on A again,
    causing a concurrency violation (since IPFail_IP was brought online on B
    after A crashed). I have added system A to the autostart list.
    Removing system A from the autostart list would solve the problem, but
    since A is the primary server, whenever A comes back online I want
    IPFail_IP to be online on A and offline on B (B is the standby).

    Is there a way to automate this? I hope my explanation is clear. Any
    help will be appreciated.

    Thanks & Regards
    Manik


  2. Re: IP Failover

    I have a similar situation. My understanding is that the service
    remains on the 'secondary' system, until it is 'forced' to fail back
    over to the 'primary' system.

    My guess is that it would be possible to run a cron job or something
    similar on the 'secondary' system to check the health of the 'primary'.
    If the 'primary' is healthy AND the 'secondary' is running the 'primary'
    services, the job could force an artificial 'failure' and have VCS
    move the 'primary' services back to the 'primary' system.

    The logic is there, just a little out of my league.

    .... Jack





  3. Re: IP Failover



    The two of you have different situations.

    Jack:
    *** NEVER USE CRON TO CONTROL AN HA PRODUCT! ***
    This can cause very bad things to happen.
    I had a customer set up a cron job to run "hastop -all -force" and
    "hastart" every 15 minutes because Oracle kept failing. The actual
    problem was that Oracle, while testing and updating a record, ran out of
    user licenses. So he "fixed" VCS, NOT Oracle, and you can guess how VCS
    responded.

    An "auto-fail-back" is not available in VCS out of the box. I would try
    using the triggers to make a "hagrp -switch" happen. Sorry, I can't
    write the agent for you, but I hope this helps some.
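
To make the trigger idea a bit more concrete, here is a minimal Python sketch of the decision a fail-back script might make before calling "hagrp -switch". It is only an illustration under assumptions: the node name sysA is made up, the function names are hypothetical, and how the script gets invoked (which trigger, with which arguments) is left out because that depends on your VCS version. Only the "hagrp -switch <group> -to <system>" command line itself is taken from VCS.

```python
import subprocess

GROUP = "IPFail"    # service group name from the original post
PRIMARY = "sysA"    # hypothetical node name for "system A"


def failback_command(group, primary, online_on, primary_running):
    """Return the hagrp argv that moves `group` back to `primary`, or None.

    online_on       -- names of the systems where the group is currently online
    primary_running -- True once the primary has rejoined the cluster
    """
    if not primary_running:
        return None          # primary is not back yet, nothing to do
    if not online_on:
        return None          # group is offline everywhere, leave it to autostart
    if primary in online_on:
        return None          # already on the primary, do not touch it
    return ["hagrp", "-switch", group, "-to", primary]


def run_failback(cmd):
    """Run the switch; assumes hagrp is on PATH and the caller has VCS rights."""
    if cmd is not None:
        subprocess.run(cmd, check=True)
```

The point of keeping the decision separate from the call is that the switch only ever runs once, in response to a cluster event, and only when the group is healthy on the standby. That is very different from a cron job that blindly restarts things on a timer.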

    Manik:

    I have to ask whether the IP used in the IP resource is the same as the
    address of the interface on the first node. If it is, A configures that
    address at boot regardless of VCS, which would explain the concurrency
    violation. Change it to a VIP (virtual IP) that is not permanently bound
    to an interface on either node. You may need to change your front-end
    server(s) or client(s) and application(s) to start using the VIP and not
    the main interface of a node.
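
A quick way to sanity-check this is to compare the address in the IP resource against each node's boot-time address. The sketch below does only that comparison; the function name and the 192.0.2.x addresses are hypothetical examples, not values from your cluster.

```python
import ipaddress


def is_dedicated_vip(service_ip, node_base_ips):
    """Return True if service_ip differs from every node's boot-time address.

    node_base_ips maps a node name to the address its interface gets at boot.
    """
    vip = ipaddress.ip_address(service_ip)
    return all(vip != ipaddress.ip_address(base)
               for base in node_base_ips.values())


# Hypothetical addresses from the documentation range 192.0.2.0/24.
nodes = {"sysA": "192.0.2.1", "sysB": "192.0.2.2"}
print(is_dedicated_vip("192.0.2.10", nodes))  # separate VIP
print(is_dedicated_vip("192.0.2.1", nodes))   # same as sysA's own address
```

If the second case matches your setup, the address in IPFail_IP is system A's own address rather than a floating one.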

