finding past failovers in engine_A.log - Veritas Cluster Server




  1. finding past failovers in engine_A.log


    Working with v4.0 on Solaris. I'm looking for a way to find the beginning
    and end of past failovers in the logs so I can determine when they happened
    and how long they took. I started by searching for "switch" as the start of a
    failover, but that only finds the cases where we switched manually, not
    failovers caused by faulted resources. Then I started looking for a "faulted"
    message immediately followed by an "offline", but that also turned up some
    lines that weren't really failovers. Does anyone know a good way to do this?
    I found a message from '02 that said to look for TAG_C, but that doesn't seem
    to correlate to anything.
    Thanks.

  2. Re: finding past failovers in engine_A.log

    If the failover was caused by a fault, look for the word "FAULT". If it
    was user-induced, look for the word "fired".

    Matthew Libhart wrote:
    > Working w/v4.0 on Solaris. I am looking to find a way to find the beginning
    > and end of past failovers in the logs so I can determine when and how long
    > they took. I started looking for "switch" as the start of a failover, but
    > that only finds it when we manually switch it, not failovers caused by faulted
    > resources. Then I started looking for a faulted immediately followed by
    > an offline, but that too resulted in finding some lines that weren't really
    > failovers. Anyone know a good way to do this. I found a message from '02
    > that said look for TAG_C, but that doesn't seem to correlate to anything.
    > Thanks.
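A minimal Python sketch of the keyword scan suggested above. It assumes each engine_A.log line starts with a `YYYY/MM/DD HH:MM:SS` timestamp (an assumption; check your own log), and `keyword_hits` is a hypothetical helper name. The match is case-sensitive, so "FAULT" will not match a lowercase "faulted", and every occurrence is reported whether or not a failover followed.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: each engine_A.log line begins with "YYYY/MM/DD HH:MM:SS".
TS_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$")


def keyword_hits(
    lines: Iterable[str], keywords: Tuple[str, ...] = ("FAULT", "fired")
) -> List[Tuple[str, str, str]]:
    """Return (timestamp, keyword, message) for each line containing a keyword.

    The search is case-sensitive and reports each line at most once.
    """
    hits = []
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a leading timestamp
        ts, msg = m.groups()
        for kw in keywords:
            if kw in msg:
                hits.append((ts, kw, msg))
                break  # report the line under the first matching keyword only
    return hits
```

For example, `with open("engine_A.log") as f: hits = keyword_hits(f)` gives a time-ordered list you can eyeball, but it will include every FAULT and every fired command, not only failovers.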


  3. Re: finding past failovers in engine_A.log


    Thanks for the suggestions, but I don't think they'll get me there.

    For user-induced failovers I can look for "switch". "fired" won't work because it
    appears all over, every time a resource is modified, for instance ("User blah
    fired command: hares -modify ...").
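One way to cut down the "fired" noise is to keep only the fired commands that are group switches. This is a rough sketch that assumes the message follows the "User <name> fired command: <cmd>" shape quoted above and that a manual failover shows up as `hagrp -switch`. Other ways of moving a group by hand (for example an explicit `hagrp -offline` followed by `hagrp -online`) are not caught. `manual_switches` is a hypothetical helper name.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: user commands are logged as "... User <name> fired command: <cmd>",
# following the example quoted in this thread; exact wording may vary by release.
FIRED_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}).*\bUser (\S+) fired command: (.*)$"
)


def manual_switches(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, user, command) for fired commands that switch a group."""
    result = []
    for line in lines:
        m = FIRED_RE.match(line.rstrip("\n"))
        if not m:
            continue
        ts, user, cmd = m.groups()
        # Drop "hares -modify" and other config changes; keep only group switches.
        if cmd.startswith("hagrp -switch"):
            result.append((ts, user, cmd))
    return result
```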

    The word FAULT also appears in many places where no failover occurred.
    If a DB2 resource goes offline, for instance, VCS records a FAULT, but its
    first reaction is to try to restart it. If the restart succeeds, no failover
    takes place, but you still see the word FAULT.
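To separate a FAULT that VCS recovered by restarting in place from one that led to a failover, one option is to check where the group next comes online. The sketch below works on already-parsed events: the `Event` fields and the `resource_faulted`/`group_online` labels are hypothetical, and turning real log lines into them is left out because the wording varies between releases. It calls a fault a failover only if the group comes online on a different system within a time window; the 30-minute default is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

TS_FMT = "%Y/%m/%d %H:%M:%S"  # assumed engine_A.log timestamp format


@dataclass
class Event:
    ts: str      # "YYYY/MM/DD HH:MM:SS" (zero-padded, so it sorts as a string)
    kind: str    # hypothetical labels: "resource_faulted" or "group_online"
    group: str
    system: str


def classify_faults(events: List[Event], window_s: int = 1800) -> List[Tuple[str, str, str]]:
    """Return (timestamp, group, verdict) for each resource fault.

    verdict is "failover" if the group's next online event inside the window
    is on a different system, otherwise "no failover".
    """
    events = sorted(events, key=lambda e: e.ts)
    out = []
    for i, ev in enumerate(events):
        if ev.kind != "resource_faulted":
            continue
        limit = datetime.strptime(ev.ts, TS_FMT) + timedelta(seconds=window_s)
        verdict = "no failover"
        for later in events[i + 1:]:
            if datetime.strptime(later.ts, TS_FMT) > limit:
                break  # outside the window: treat as unrelated
            if later.kind == "group_online" and later.group == ev.group:
                if later.system != ev.system:
                    verdict = "failover"
                break  # the first online event for this group decides
        out.append((ev.ts, ev.group, verdict))
    return out
```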

    I'm running 4.0 and looking at that notifier piece. The manual for the
    notifier even explicitly states that there is no event for "starting a
    failover because of a fault". You have to look for a "faulted->group_offline->group_online"
    sequence. You'd think it would be easier.
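The faulted->group_offline->group_online sequence can be tracked per group with a small state machine. This is a minimal sketch that assumes the log has already been reduced to `(timestamp, kind, group)` tuples in log order, with the illustrative kinds "faulted", "offline" and "online"; these labels and `find_failovers` are hypothetical, not VCS names. A group that comes back online without first going offline is treated as a local restart, and each duration runs from the fault to the final online.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

TS_FMT = "%Y/%m/%d %H:%M:%S"  # assumed timestamp layout


def find_failovers(events: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str, float]]:
    """Return (group, start_ts, seconds) for each faulted->offline->online run."""
    state: Dict[str, Tuple[str, str]] = {}  # group -> (stage, start timestamp)
    found = []
    for ts, kind, group in events:
        stage, start = state.get(group, ("idle", ""))
        if kind == "faulted":
            state[group] = ("faulted", ts)  # a new fault (re)starts the sequence
        elif kind == "offline" and stage == "faulted":
            state[group] = ("offline", start)
        elif kind == "online" and stage == "offline":
            secs = (datetime.strptime(ts, TS_FMT)
                    - datetime.strptime(start, TS_FMT)).total_seconds()
            found.append((group, start, secs))
            state[group] = ("idle", "")
        elif kind == "online":
            state[group] = ("idle", "")  # online without going offline: no failover
    return found
```

Note that the sketch ignores which system each event happened on; it only checks the order of the three states.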

    Me wrote:
    >If the failover was because of a fault, look for the word "FAULT". If it
    >was user induced, look for the word "fired"
    >
    >Matthew Libhart wrote:
    >> Working w/v4.0 on Solaris. I am looking to find a way to find the beginning
    >> and end of past failovers in the logs so I can determine when and how long
    >> they took. I started looking for "switch" as the start of a failover, but
    >> that only finds it when we manually switch it, not failovers caused by faulted
    >> resources. Then I started looking for a faulted immediately followed by
    >> an offline, but that too resulted in finding some lines that weren't really
    >> failovers. Anyone know a good way to do this. I found a message from '02
    >> that said look for TAG_C, but that doesn't seem to correlate to anything.
    >> Thanks.



  4. Re: finding past failovers in engine_A.log


    I take some of it back. I was looking at an old 2.0 cluster's messages because
    I didn't think they had changed the logging as much as they actually did. v4.0
    (and 3.5, I think) have a message that states "Group is faulted".
    I think this definitely marks the start of a failover. "Initiating switch"
    marks the beginning of a manual failover.
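With these markers as start points and the group coming back online as the end point, the duration of each failover can be computed straight from the log lines. This is a rough sketch. The regular expressions are hypothetical patterns modeled on the phrases in this thread: "Group is faulted" is assumed to carry the group name as "Group <name> is faulted", "Initiating switch" is assumed to read "Initiating switch of group <name>", and the "Group <name> is online" end message is purely an assumption. The `YYYY/MM/DD HH:MM:SS` timestamp format is assumed too. Check all of them against your own engine_A.log before trusting the numbers.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

TS_FMT = "%Y/%m/%d %H:%M:%S"  # assumed timestamp format
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$")

# Hypothetical message patterns; adjust them to the real log text.
START_PATTERNS = [
    ("fault", re.compile(r"Group (\S+) is faulted")),
    ("manual", re.compile(r"Initiating switch of group (\S+)")),
]
END_RE = re.compile(r"Group (\S+) is online")


def failover_windows(lines: Iterable[str]) -> List[Tuple[str, str, str, float]]:
    """Return (group, cause, start_ts, duration_s) for each start marker that is
    later followed by the same group coming online."""
    pending: Dict[str, Tuple[str, str]] = {}  # group -> (cause, start timestamp)
    windows = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        ts, msg = m.groups()
        for cause, pat in START_PATTERNS:
            s = pat.search(msg)
            if s and s.group(1) not in pending:
                pending[s.group(1)] = (cause, ts)  # keep the earliest start marker
                break
        e = END_RE.search(msg)
        if e and e.group(1) in pending:
            cause, start = pending.pop(e.group(1))
            secs = (datetime.strptime(ts, TS_FMT)
                    - datetime.strptime(start, TS_FMT)).total_seconds()
            windows.append((e.group(1), cause, start, secs))
    return windows
```

For example, `with open("engine_A.log") as f: print(failover_windows(f))`. Only the first start marker per group is kept until that group comes online, so a fault followed by a manual switch is counted as a single window.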
