ScriptAgent not responsive every so often and is restarted in VCS 1.1.2
Every so often I see messages like the following:
TAG_B 2001/03/29 17:22:54 Thread(15) Error receiving from the engine. Agent is exiting
TAG_B 2001/04/01 00:47:35 Thread(4) Operation(1608) on resource(cboe_aq_risk1) timed out
TAG_B 2001/04/01 00:47:35 Thread(4) Cancelling thread (7)
TAG_B 2001/04/01 00:47:36 Thread(7) Got error(2) when deleting file(/var/VRTSvcs/log/tmp/Mobius-cboe_aq_risk1)
TAG_B 2001/04/01 00:47:36 Thread(11) E3003: Resource(cboe_aq_risk1) - monitor procedure did not complete within the expected time.
TAG_B 2001/04/04 17:29:33 Thread(1) Could open IPM connection to vcs on localhost
TAG_B 2001/04/04 18:01:40 Thread(15) Error receiving from the engine. Agent is exiting
TAG_B 2001/04/05 08:05:50 Thread(5) Resource(orbixd) in Offline state received Offline command
You'll notice the thread messages, and that the orbixd resource was
already offline when it received the Offline command. This is from an
agent we wrote called Mobius, which is just a script run by a copy of
the ScriptAgent.
In the engine log we see something like this:
TAG_E 2001/04/05 08:04:07 Initiating Offline of Resource orbixd on System pendragon0
TAG_C 2001/04/05 08:05:44 Agent Mobius not sending alive messages since Thu Apr 5 08:03:34 2001
TAG_D 2001/04/05 08:05:44 Agent /opt/VRTSvcs/bin/Mobius/MobiusAgent for resource type Mobius successfully started at Thu Apr 5 08:05:44 2001
TAG_E 2001/04/05 08:05:46 Resource cboe_aq_yet is online on pendragon0
TAG_E 2001/04/05 08:05:46 Resource cboe_aq_demo is online on pendragon0
TAG_E 2001/04/05 08:05:46 Resource cboe_aq_pfw is online on pendragon0
TAG_E 2001/04/05 08:05:46 Resource cboe_aq_any is online on pendragon0
TAG_E 2001/04/05 08:05:46 Resource cboe_aq_crash is online on pendragon0
TAG_E 2001/04/05 08:05:46 Resource cboe_aq_rmb is online on pendragon0
TAG_E 2001/04/05 08:05:46 Resource cboe_aq_spo is online on pendragon0
TAG_E 2001/04/05 08:05:46 Resource cboe_aq_rsc is online on pendragon0
TAG_E 2001/04/05 08:05:46 Resource cboe_rtps_aq is online on pendragon0
TAG_E 2001/04/05 08:05:46 Resource cboe_aq_risk1 is online on pendragon0
TAG_E 2001/04/05 08:05:47 Resource cboe_rtps_aqo is online on pendragon0
TAG_E 2001/04/05 08:05:47 Resource cboe_aq_ett is online on pendragon0
The agent stops responding and is restarted by VCS's had. The restarted
agent then runs a monitor on each resource, hence the multiple online
messages. VCS seems to restart the agent just fine, leaving the resources
under the agent *alone*, and we notice no service disruptions. I'm
wondering, though, if anyone else sees behavior like this?
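
To gauge how often this happens, the engine log can be scanned for the "not sending alive messages" entries. What follows is only a rough Python sketch, not anything shipped with VCS. It assumes each log entry sits on a single line in the layout shown in the excerpts above (tag, date, time, message) and that the "since" timestamp is in English ctime style. The function name find_silent_agents is made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpts above: "<tag> YYYY/MM/DD HH:MM:SS <message>"
LINE_RE = re.compile(r"^(\S+) (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
SILENT_RE = re.compile(r"Agent (\S+) not sending alive messages since (.+)$")


def find_silent_agents(lines):
    """Return (agent, last_alive, detected, silent_seconds) per matching entry."""
    events = []
    for line in lines:
        entry = LINE_RE.match(line.strip())
        if not entry:
            continue  # skip anything that does not look like a log entry
        _tag, stamp, message = entry.groups()
        silent = SILENT_RE.search(message)
        if not silent:
            continue
        detected = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
        # Collapse repeated spaces, then parse e.g. "Thu Apr 5 08:03:34 2001"
        since_text = " ".join(silent.group(2).split())
        last_alive = datetime.strptime(since_text, "%a %b %d %H:%M:%S %Y")
        gap = (detected - last_alive).total_seconds()
        events.append((silent.group(1), last_alive, detected, gap))
    return events


if __name__ == "__main__":
    sample = [
        "TAG_C 2001/04/05 08:05:44 Agent Mobius not sending alive messages "
        "since Thu Apr 5 08:03:34 2001",
        "TAG_E 2001/04/05 08:05:46 Resource cboe_aq_yet is online on pendragon0",
    ]
    for agent, last_alive, detected, gap in find_silent_agents(sample):
        print(f"{agent}: silent for {gap:.0f}s before detection at {detected}")
```

Fed the whole engine log, this would list every time had noticed an agent had gone quiet and how long the agent had been silent, which could then be lined up against the timeouts in the agent's own log.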
thx,
Greg Gallagher
Sr. UNIX Systems Administrator
First Options of Chicago