finding past failovers in engine_A.log - Veritas Cluster Server
-
finding past failovers in engine_A.log
Working w/v4.0 on Solaris. I am looking to find a way to find the beginning
and end of past failovers in the logs so I can determine when and how long
they took. I started looking for "switch" as the start of a failover, but
that only finds it when we manually switch it, not failovers caused by faulted
resources. Then I started looking for a "faulted" immediately followed by
an "offline", but that too turned up some lines that weren't really
failovers. Does anyone know a good way to do this? I found a message from '02
that said look for TAG_C, but that doesn't seem to correlate to anything.
Thanks.
-
Re: finding past failovers in engine_A.log
If the failover was caused by a fault, look for the word "FAULT". If it
was user-induced, look for the word "fired".
Matthew Libhart wrote:
> Working w/v4.0 on Solaris. I am looking to find a way to find the beginning
> and end of past failovers in the logs so I can determine when and how long
> they took. I started looking for "switch" as the start of a failover, but
> that only finds it when we manually switch it, not failovers caused by faulted
> resources. Then I started looking for a faulted immediately followed by
> an offline, but that too resulted in finding some lines that weren't really
> failovers. Anyone know a good way to do this. I found a message from '02
> that said look for TAG_C, but that doesn't seem to correlate to anything.
> Thanks.
-
Re: finding past failovers in engine_A.log
Thanks for the suggestions, but I don't think they'll get me there.
For user-induced I can look for "switch". "fired" won't work because it
appears all over, every time a resource is modified for instance ("User blah
fired command: hares -modify ...").
The word FAULT also appears many places where a failover did not occur.
If a DB2 resource goes offline for instance, VCS records a FAULT, but its
first reaction is to try to restart it. If it's successful, no failover
takes place, but you still see the word FAULT.
I'm running 4.0, and looking at the notifier piece. The manual for the
notifier even explicitly states that there is no event for "starting a
failover because of a fault". You have to look for a "faulted->group_offline->group_online"
sequence. You'd think it'd be easier.
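
For what it's worth, the sequence check above can be sketched as a small state machine. This is only a rough illustration: it assumes the log has already been reduced to (timestamp, kind) pairs, and the kind names "faulted", "group_offline" and "group_online" as well as the function name find_failovers are my own placeholders, not anything VCS emits.

```python
def find_failovers(events):
    """Return (start, end) pairs for faulted -> group_offline -> group_online runs.

    `events` is an iterable of (timestamp, kind) tuples in log order.
    The kind strings are hypothetical labels, not real VCS event names.
    """
    IDLE, FAULTED, OFFLINE = 0, 1, 2
    state = IDLE
    start = None
    failovers = []
    for timestamp, kind in events:
        if kind == "faulted":
            # Keep the earliest fault as the start of a possible failover.
            if state == IDLE:
                start = timestamp
                state = FAULTED
        elif kind == "group_offline":
            if state == FAULTED:
                state = OFFLINE
        elif kind == "group_online":
            if state == OFFLINE:
                # Full sequence seen: record the failover window.
                failovers.append((start, timestamp))
            # A fault that never took the group offline (for example a
            # successful restart) is dropped here.
            state = IDLE
            start = None
    return failovers
```

With numeric timestamps, the duration of each failover is just end minus start for every pair returned.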
Me wrote:
>If the failover was because of a fault, look for the word "FAULT". If it
>was user induced, look for the word "fired"
>
>Matthew Libhart wrote:
>> Working w/v4.0 on Solaris. I am looking to find a way to find the beginning
>> and end of past failovers in the logs so I can determine when and how long
>> they took. I started looking for "switch" as the start of a failover, but
>> that only finds it when we manually switch it, not failovers caused by faulted
>> resources. Then I started looking for a faulted immediately followed by
>> an offline, but that too resulted in finding some lines that weren't really
>> failovers. Anyone know a good way to do this. I found a message from '02
>> that said look for TAG_C, but that doesn't seem to correlate to anything.
>> Thanks.
-
Re: finding past failovers in engine_A.log
I take some of it back. I was looking at an old 2.0 cluster's messages because
I didn't think they changed the logging as much as they really did. v4.0
(and 3.5 I think) have a message that states "Group is faulted".
I think this definitely marks the start of a failover. "Initiating switch"
marks the beginning of a manual failover.
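
A minimal sketch of pulling those two markers out of the log, in case it helps. It assumes only that the phrases "Group is faulted" and "Initiating switch" appear somewhere in the relevant lines; it makes no assumption about the timestamp or message-ID layout, and the function name find_failover_starts and the labels "fault" and "manual" are made up for illustration.

```python
# Marker phrases taken from the post above; the labels are hypothetical.
MARKERS = {
    "Group is faulted": "fault",
    "Initiating switch": "manual",
}


def find_failover_starts(lines):
    """Return (line_number, label, text) for each line containing a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for phrase, label in MARKERS.items():
            if phrase in line:
                hits.append((number, label, line.rstrip("\n")))
                break  # one label per line is enough
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_starts.py engine_A.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for number, label, text in find_failover_starts(log):
                print(f"{number}\t{label}\t{text}")
```

This only finds candidate start lines; pairing each one with the matching group online message is still needed to get a duration.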