VCS unmount the external EMC storage - Veritas Cluster Server


  1. VCS unmount the external EMC storage


    Hi Gurus,

    I have a two-node SunF880 cluster running VCS with external EMC storage.

    Every so often, one of the servers unmounts the EMC file systems by itself
    for no apparent reason. This only happens on one server.

    Based on engine_A.log, here is the information:

    TAG_B 2003/10/21 14:05:16 (dbs01) VCS:13027:Resource(datadg) - monitor procedure
    did not complete within the expected time.
    TAG_D 2003/10/21 14:11:18 (dbs01) VCS:13080:Agent is calling clean for resource(datadg)
    because monitor did not complete within the expected time.
    TAG_D 2003/10/21 14:11:19 (dbs01) VCS:13068:Resource(datadg) - clean completed
    successfully.

    Appreciate any help
    Chee Kiong
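To see how much time passed on dbs01 between the monitor timeout and the clean call, the three log entries above can be parsed. This is a minimal Python sketch, assuming each wrapped entry is rejoined onto one line in the `TAG date time (host) VCS:id:message` layout shown in the post; `parse_entries` and `seconds_between` are hypothetical helpers, not part of VCS.

```python
from datetime import datetime

# The three engine_A.log entries quoted above, each rejoined onto one line.
LOG = """\
TAG_B 2003/10/21 14:05:16 (dbs01) VCS:13027:Resource(datadg) - monitor procedure did not complete within the expected time.
TAG_D 2003/10/21 14:11:18 (dbs01) VCS:13080:Agent is calling clean for resource(datadg) because monitor did not complete within the expected time.
TAG_D 2003/10/21 14:11:19 (dbs01) VCS:13068:Resource(datadg) - clean completed successfully.
"""


def parse_entries(text):
    """Split each log line into (timestamp, host, message_id, message)."""
    entries = []
    for line in text.splitlines():
        parts = line.split(" ", 4)
        if len(parts) < 5:
            continue  # skip blank or truncated lines
        _tag, day, clock, host, rest = parts
        stamp = datetime.strptime(f"{day} {clock}", "%Y/%m/%d %H:%M:%S")
        # rest looks like "VCS:13027:Resource(datadg) - monitor ..."
        _prefix, msg_id, message = rest.split(":", 2)
        entries.append((stamp, host.strip("()"), int(msg_id), message))
    return entries


def seconds_between(entries, first_id, second_id):
    """Seconds from the first entry with first_id to the first with second_id."""
    start = next(e[0] for e in entries if e[2] == first_id)
    end = next(e[0] for e in entries if e[2] == second_id)
    return (end - start).total_seconds()


if __name__ == "__main__":
    entries = parse_entries(LOG)
    # 13027: monitor did not complete in time; 13080: agent calling clean
    print(seconds_between(entries, 13027, 13080))  # 362.0
```

For the quoted entries this prints 362.0: about six minutes passed between the logged monitor timeout and the agent calling clean, and clean itself finished one second later.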

  2. Re: VCS unmount the external EMC storage

    Hi,

    The cause is that the monitor for the disk group reached its timeout (the MonitorTimeout attribute). The default timeout is 60 seconds. Is your system heavily loaded? This can be an issue for all
    monitors in VCS. Are there other disk groups in the configuration besides this one? Do they face the same problem on that node? Are there any other resources that have the same type of
    problem?

    /P




  3. Re: VCS unmount the external EMC storage


    Hi Peter,

    I am running Oracle 9i RAC on the two nodes. Only one node has this problem
    of unmounting by itself: all the disk groups on this node get unmounted. The
    second node does not have this problem.

    Appreciate your advice.

    Thanks
    Chee Kiong




  4. Re: VCS unmount the external EMC storage

    Hi,

    Have you checked the load on node 1 when the Oracle service is running on that node? Have you tried running all services on node 2? Use vmstat or sar to monitor the load on the nodes.

    /Peter
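Once vmstat output has been captured on each node, busy intervals can be picked out mechanically. This is a minimal Python sketch, assuming the Solaris vmstat layout (first column `r`, the run queue; last column `id`, CPU idle %). The sample text is made up for illustration, and the thresholds (run queue longer than the CPU count, or idle below 10%) are a rule of thumb, not a VCS requirement; `busy_samples` is a hypothetical helper.

```python
# Made-up sample in the Solaris vmstat layout (not from the poster's nodes).
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 5120000 2048000 0 2 0 0 0 0 0 0 0 0 0 410 380 220 2 1 97
 14 3 0 5100000 1900000 0 40 8 0 0 0 0 60 55 0 0 2900 9100 4100 70 28 2
"""


def busy_samples(vmstat_text, ncpus, min_idle=10):
    """Return (run_queue, idle_pct) for samples that look overloaded.

    Assumes the Solaris vmstat layout: first column r (runnable threads),
    last column id (CPU idle %).
    """
    flagged = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # header lines start with "kthr" or "r"
        run_queue, idle = int(fields[0]), int(fields[-1])
        if run_queue > ncpus or idle < min_idle:
            flagged.append((run_queue, idle))
    return flagged


if __name__ == "__main__":
    print(busy_samples(SAMPLE, ncpus=8))  # [(14, 2)]
```

On Solaris the first data line of vmstat is an average since boot, so when comparing node 1 and node 2 it is the later interval samples that show load at the time of a timeout.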




  5. Re: VCS unmount the external EMC storage


    Node 1 was very heavily loaded. But when node 1 unmounted the EMC storage, all
    services were connected to node 2. Node 2 does not have this problem,
    even though its load is also very heavy.

    This is quite strange.

    Thanks
    Chee Kiong




  6. Re: VCS unmount the external EMC storage


    Are there any system differences between node 1 and node 2? When the monitor does
    not complete within the expected time, it means that the script is not returning
    an exit code within the MonitorTimeout value. If the system is under heavy
    load, things can get backed up waiting for their turn to complete. A simple
    solution would be to extend the MonitorTimeout and MonitorInterval attributes
    for that resource type.
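The rule that a timeout means the monitor script did not return an exit code within MonitorTimeout can be mimicked in a few lines of Python. This is only an illustration of the idea, not how the VCS agent framework is implemented: `run_monitor` is hypothetical, `cmd` stands in for the monitor script, and `monitor_timeout` for the MonitorTimeout attribute in seconds.

```python
import subprocess
import sys


def run_monitor(cmd, monitor_timeout):
    """Run a monitor command and return its exit code, or "TIMEOUT".

    Hypothetical stand-in for an agent's monitor call.
    """
    try:
        completed = subprocess.run(cmd, timeout=monitor_timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising, so an
        # overrunning monitor leaves no exit code to report.
        return "TIMEOUT"
    return completed.returncode


if __name__ == "__main__":
    # A monitor that answers at once with exit code 110, the value VCS
    # script-based monitors use to report "online".
    quick = [sys.executable, "-c", "import sys; sys.exit(110)"]
    # A monitor stuck for 5 s, e.g. waiting on an overloaded system.
    stuck = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_monitor(quick, monitor_timeout=10))  # 110
    print(run_monitor(stuck, monitor_timeout=1))   # TIMEOUT
```

The stuck monitor is not "failing" in the sense of reporting offline; it simply never answers in time, which is the situation the 13027 message describes.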

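    haconf -makerw
    hatype -modify DiskGroup MonitorInterval 120
    hatype -modify DiskGroup MonitorTimeout 120
    haconf -dump -makero

    Here DiskGroup assumes datadg is a DiskGroup resource; substitute the type
    reported by "hares -value datadg Type". The defaults are usually 60 and 60.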

    Otherwise, you need to examine the system state at the time of the timeouts
    to see whether the system is in fact too busy to reply to the agent. This is
    important because if the system gets to the point where HAD (the VCS engine)
    is swapped out, GAB could panic the box.


