URGENT : http error and STOPPING|PARTIAL status - Veritas Cluster Server


  1. URGENT : http error and STOPPING|PARTIAL status


    Hi,

    Nagios suddenly emailed me that http is critical. I checked on one of the
    nodes, where all service groups had been running fine, and got this:

    hastatus -sum

    -- SYSTEM STATE
    -- System          State      Frozen

    A  server1         RUNNING    0
    A  server2         RUNNING    0

    -- GROUP STATE
    -- Group           System     Probed  AutoDisabled  State

    B  ClusterService  server1    Y       N             OFFLINE
    B  ClusterService  server2    Y       N             ONLINE
    B  bb              server1    Y       N             STOPPING|PARTIAL
    B  bb              server2    Y       N             OFFLINE

    -- RESOURCES OFFLINING
    -- Group           Type       Resource  System      IState

    F  bb              DiskGroup  bbdg      server1     W_OFFLINE_PROPAGATE



    Entries from the messages file on server1:

    Jan 18 23:22:51 server1 genunix: [ID 408789 kern.warning] WARNING: ce1: fault detected external to device; service degraded
    Jan 18 23:22:51 server1 genunix: [ID 451854 kern.warning] WARNING: ce1: xcvr addr:0x01 - link down
    Jan 18 23:22:58 server1 llt: [ID 120420 kern.notice] LLT:10032: link 2 (ce1) node 1 inactive 8 sec (2421280/13593146)
    Jan 18 23:22:59 server1 llt: [ID 120420 kern.notice] LLT:10032: link 2 (ce1) node 1 inactive 9 sec (2421280/13593150)
    Jan 18 23:23:00 server1 llt: [ID 120420 kern.notice] LLT:10032: link 2 (ce1) node 1 inactive 10 sec (2421280/13593154)
    Jan 18 23:23:01 server1 llt: [ID 120420 kern.notice] LLT:10032: link 2 (ce1) node 1 inactive 11 sec (2421280/13593158)
    Jan 18 23:23:02 server1 llt: [ID 120420 kern.notice] LLT:10032: link 2 (ce1) node 1 inactive 12 sec (2421280/13593162)
    Jan 18 23:23:03 server1 llt: [ID 120420 kern.notice] LLT:10032: link 2 (ce1) node 1 inactive 13 sec (2421280/13593168)
    Jan 18 23:23:04 server1 llt: [ID 120420 kern.notice] LLT:10032: link 2 (ce1) node 1 inactive 14 sec (2421280/13593172)
    Jan 18 23:23:05 server1 llt: [ID 120420 kern.notice] LLT:10032: link 2 (ce1) node 1 inactive 15 sec (2421280/13593176)
    Jan 18 23:23:06 server1 llt: [ID 120420 kern.notice] LLT:10032: link 2 (ce1) node 1 inactive 16 sec (2421280/13593180)
    Jan 18 23:23:06 server1 llt: [ID 106513 kern.notice] LLT:10033: link 2 (ce1) node 1 expired
    Jan 18 23:32:39 server1 genunix: [ID 408789 kern.notice] NOTICE: ce1: fault cleared external to device; service available
    Jan 18 23:32:39 server1 genunix: [ID 451854 kern.notice] NOTICE: ce1: xcvr addr:0x01 - link up 1000 Mbps full duplex
    Jan 18 23:32:42 server1 llt: [ID 465730 kern.notice] LLT:10024: link 2 (ce1) node 1 active


    What could be the issue ?

  2. Re: URGENT : http error and STOPPING|PARTIAL status

    OK, there are a couple of things here.

    1. There is a DiskGroup resource (bbdg) that will not go offline. This is
    normally because either the disks are not available or vxconfigd is not
    running.

    2. One of the private links (ce1) is going up and down. That is hardware
    related, not VCS. It puts the systems into jeopardy, and that could
    affect the onlining/offlining of resources.
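
    To see how often the private link is flapping, the LLT state changes can
    be pulled out of the messages file. The following is only a rough sketch:
    the function name llt_link_events is made up for this example, and it
    assumes one unwrapped syslog line per message in the same format as the
    log excerpt in the first post.

```shell
# Print one line per LLT link state change (expired/active) read from stdin.
# Assumption: each syslog message is on a single line, formatted like the
# excerpt in the original post. llt_link_events is a hypothetical helper name.
llt_link_events() {
    awk '/ llt: / && / link [0-9]+ \(/ && ($NF == "expired" || $NF == "active") {
        # Locate the "link N (device)" triple, wherever it sits in the line.
        for (i = 1; i <= NF; i++) {
            if ($i == "link") { link = $(i + 1); dev = $(i + 2) }
        }
        # Output: month day time "link" N (device) state
        print $1, $2, $3, "link", link, dev, $NF
    }'
}
```

    Feed it the messages file on stdin, for example
    llt_link_events < /var/adm/messages (the path is an assumption, use
    wherever syslog writes on your system). The per-second "inactive" lines
    are skipped on purpose, so only the expired and active transitions show.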



    --- What would I suggest?

    hagrp -flush bb -sys server1

    then

    hagrp -offline bb -sys server1


    That should do it.
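
    To confirm the group has actually left STOPPING|PARTIAL afterwards, a
    small filter over the hastatus -sum output can list every group that is
    not cleanly ONLINE or OFFLINE. This is a sketch, not part of VCS: the
    function name groups_in_transition is made up, and it assumes the "B"
    group lines look like the ones in the first post, with the state in the
    last column.

```shell
# List "group system state" for every group line in `hastatus -sum` output
# (read from stdin) whose state is anything other than ONLINE or OFFLINE.
# Assumption: group lines start with "B" and end with the state column.
# groups_in_transition is a hypothetical helper name.
groups_in_transition() {
    awk '$1 == "B" && $NF != "ONLINE" && $NF != "OFFLINE" { print $2, $3, $NF }'
}
```

    Run it as hastatus -sum | groups_in_transition. With the output from the
    first post it reports bb on server1 as STOPPING|PARTIAL. Empty output
    means every group is plainly ONLINE or OFFLINE on each system.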




    Upendra wrote:
    > Hi
    >
    > Suddenly my nagios sent me email, http is critical, when I checked on one
    > of node where all service groups were running fine,
    >
    > What could be the issue ?

