We have a couple of VCS clusters running on Intel Linux machines: one in
the Netherlands using an MSA storage device for the cluster filesystem,
and another in the UK using an HDS 9980 SAN. Neither cluster is
experiencing any tangible problems, but both keep kicking out the
following series of messages for the diskgroup into the engine_A log at
random times throughout the day. The agent then tries to offline the
diskgroup, fails to do so, and the problem seemingly rectifies
itself...

2006/09/24 18:11:33 VCS ERROR V-16-2-13027 (nlnl3l16)
Resource(oraDCNL3P01_DG) - monitor procedure did not complete within
the expected time.
2006/09/24 18:17:37 VCS ERROR V-16-2-13210 (nlnl3l16) Agent is calling
clean for resource(oraDCNL3P01_DG) because 4 successive invocations of
the monitor procedure did not complete within the expected time.

2006/09/24 18:17:38 VCS INFO V-16-2-13068 (nlnl3l16)
Resource(oraDCNL3P01_DG) - clean completed successfully.

2006/09/24 18:18:40 VCS ERROR V-16-2-13077 (nlnl3l16) Agent is unable
to offline resource(oraDCNL3P01_DG). Administrative intervention may be

2006/09/24 18:18:40 VCS INFO V-16-6-15004 (nlnl3l16) hatrigger:Failed
to send trigger for resnotoff; script doesn't exist

2006/09/24 18:27:40 VCS INFO V-16-2-13026 (nlnl3l16)
Resource(oraDCNL3P01_DG) - monitor procedure finished successfully
after failing to complete within the expected time for (9) consecutive
2006/09/24 18:27:40 VCS INFO V-16-2-13082 (nlnl3l16)
Resource(oraDCNL3P01_DG) recovered from fault, on its own.

I don't understand what's causing this at all. Nothing out of the
ordinary appears to be going on at the times these messages occur,
and they happen quite regularly.

Has anyone seen anything like this before, or have any ideas as to what
might be causing it?

Many thanks in advance - Lee
