application agent died occasionally - Veritas Cluster Server



Thread: application agent died occasionally

  1. application agent died occasionally


    Hi guys,

    Occasionally the Application agent dies in our 4-node cluster running VCS 4.0.

    When the problem occurs, the Engine_A log says "Agent Application not sending alive messages since
    Sun May 13 06:51:11 2007", followed by "Agent /opt/VRTSvcs/bin/Application/ApplicationAgent
    for resource type Application successfully started at Sun May 13 06:53:21
    2007". From then on there are usually two Application agents running.
    One agent (say pid 123) is considered dead by HAD, and the other
    (say pid 456) is the one newly started by HAD.
    truss -p 123 shows the old agent doing nothing, so it really does seem to be dead.
    truss -p 456 shows the new agent continuing to communicate with HAD. The Application
    agent logs report nothing at the time.

    Any clue how to address this issue? Or is there a known bug related to this?

    Thanks.

    Eric

  2. Re: application agent died occasionally

    Is there a core file for the Application agent? (Check
    /opt/VRTSvcs/bin/Application for a core file.)

    Then run:


    file core


    The output should say that the core came from the ApplicationAgent.

    The next step depends on your operating system, but on Solaris

    pstack core

    will give you a stack trace from the core file.

    Post the stack here so we can see.
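
The checks above can be strung together in a small script. This is only a rough sketch: `list_cores` and `inspect_cores` are helper names made up for illustration (they are not VCS commands), the directory is the one mentioned above, and `pstack` is assumed to exist only on Solaris, so the script skips it elsewhere.

```shell
#!/bin/sh
# Sketch only: list_cores and inspect_cores are hypothetical helper names.

# Print every regular file in the given directory whose name starts with "core".
list_cores() {
    dir=$1
    for f in "$dir"/core*; do
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}

# Identify each core file and, where pstack is available, print its stack.
inspect_cores() {
    list_cores "$1" | while read -r core; do
        file "$core"                  # should name ApplicationAgent as the source
        if command -v pstack >/dev/null 2>&1; then
            pstack "$core"            # Solaris: stack trace from the core file
        fi
    done
}

# Default to the agent directory mentioned in this thread.
inspect_cores "${AGENT_DIR:-/opt/VRTSvcs/bin/Application}"
```

If nothing is printed, no core file was found in that directory.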



    OK, some explanation:

    The agents (including the ApplicationAgent) run in user space and thus
    share user space with other user programs. Anything (a program not
    behaving properly) can mess up the address space used by the
    ApplicationAgent. A stack trace and the core file will help here.

    I would also check the binary (compare its size and checksum with those
    on the other nodes in the cluster).
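
One possible way to do the size and checksum comparison is with the standard `cksum` utility, which prints a CRC and the size in bytes. The sketch below is an assumption about how you might script it: `fingerprint` and `same_binary` are made-up helper names, and the agent path is the one from the log message quoted in the question.

```shell
#!/bin/sh
# Sketch only: fingerprint and same_binary are hypothetical helper names.

# cksum prints "<crc> <size-in-bytes> <name>"; keep just the crc and the size.
fingerprint() {
    cksum "$1" | awk '{ print $1, $2 }'
}

# Succeeds when two files have the same crc and size.
same_binary() {
    [ "$(fingerprint "$1")" = "$(fingerprint "$2")" ]
}

# Example, to be run on each node (output lines are then compared by eye):
# fingerprint /opt/VRTSvcs/bin/Application/ApplicationAgent
```

Running `fingerprint` on every node (for example over a remote shell) and comparing the output lines should show quickly whether one node has a different binary.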

    There are no known issues with this.





