WebSphere Portal - High Availability - Websphere

  1. WebSphere Portal - High Availability

    *Environment*

    1. WAS-ND for Portal Cluster Deployment Manager V 6.0.2.23 (with WAS JAVA-SDK version 6.0.2.25 installed)
    2. WebSphere Portal version 6.0.1.3 (no WCM installed) (Two-node cluster)
    3. Portal is running on each node on WebSphere Application Server version 6.0.2.23 (with WAS JAVA-SDK version 6.0.2.25 installed)
    4. Portal is running on each node on a Solaris 10 platform (Release 08/2007)
    5. IBM HTTP Server V6 installed on each portal cluster node, configured with the DMGR and managed by it. (Each HTTP server can forward requests to the two portal nodes)
    6. Remote Oracle Database Server 10g (10.2.0.3.0)
    7. Remote SunOne LDAP Directory Version 5


    *Problem Description*

    Suppose the portal server on node 1 is still running (i.e., the OS shows that the portal process is still alive on the system),
    but when we access the portal node 1 URL in a web browser, we get no response, or the browser shows one of the following:
    "Internal server error"
    or
    "Page cannot be found"

    (i.e., the portal process on node 1 is frozen/hung).

    Will the HTTP server continue forwarding requests to the frozen portal node 1? Or is the HTTP server smart enough to detect that the portal process on node 1 is hung, and start forwarding all requests only to the working portal node 2?

  2. Re: WebSphere Portal - High Availability

    It depends on what the problem is with the server. If the server is hanging due to a CPU spike, then most likely you won't see a failover. This document explains the process in more detail.

    http://www-01.ibm.com/support/docvie...id=swg21219808

    Regards,
    Brian

  3. Re: WebSphere Portal - High Availability

    I understand from this document that if I set a suitable ServerIOTimeout value, the problem will be solved: the HTTP Server will mark the cluster member down because it received no response to its request from that cluster member, even if the cluster member's portal Java process is still running on the system.

    Please correct me if I'm wrong.

  4. Re: WebSphere Portal - High Availability

    It doesn't quite work the way you would initially think. If your portal app is hung, the application server can technically still be running. As long as the HTTP plugin can open sockets to the HTTP transports on the application server, the HTTP server won't mark that cluster member down. The plugin is watching the application server, not necessarily the applications running on that server.
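
    To illustrate the distinction between "a socket can be opened" and "the application actually replies", here is a minimal Python sketch. It is not how the plugin itself is implemented; the function name `probe`, the timeout values, and the three status labels are assumptions made up for this example.

```python
import socket


def probe(host, port, path="/", connect_timeout=5.0, read_timeout=10.0):
    """Classify a server as 'down', 'hung', or 'responding'.

    'down'       -> a TCP connection could not be opened at all
    'hung'       -> the socket opened, but no HTTP reply arrived in time
    'responding' -> the first bytes of an HTTP reply were received
    """
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError:
        return "down"  # refused, unreachable, or connect timed out

    try:
        sock.settimeout(read_timeout)
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        first_bytes = sock.recv(1024)
    except socket.timeout:
        return "hung"  # socket is open, but nothing came back
    except OSError:
        return "down"  # connection was reset while talking
    finally:
        sock.close()

    # An empty or non-HTTP reply is treated as a hang in this sketch.
    return "responding" if first_bytes.startswith(b"HTTP/") else "hung"
```

    A check that only opens the socket would report the "hung" case as healthy, which is the situation described above.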

    Brian

  5. Re: WebSphere Portal - High Availability

    Thanks for the explanation, Brian. That clarifies the point exactly.
    So is there any way, using the HTTP plugin or anything else, to detect the case where the application server still accepts socket connections from the HTTP plugin but the portal application is not responding? In other words, is there any way to monitor the status of the portal application, or of any other application running on the application server?

  6. Re: WebSphere Portal - High Availability

    I would use a separate monitoring tool such as the following:

    Tivoli Composite Application Manager for WebSphere
    http://www-01.ibm.com/software/tivol...mgr-websphere/

    CA Wily Introscope
    http://www.wilytech.com/solutions/pr...P_IBM_WAS.html


    Regards,
    Brian

  7. Re: WebSphere Portal - High Availability

    A few additional suggestions for ways to monitor for a hang that we use:

    1. Monitor ESTABLISHED connections to the wc_defaulthost or wc_defaulthost_secure port (depending on the type of traffic).
    - When a clone is hung, requests will still be sent by IHS but not handled, so connections pile up. Set a threshold for
    the highest count you expect even at peak times, and notify if it is exceeded. Perl, bash, etc. should do the trick.

    2. Enable /server-status in IHS. Write something to parse the number of "W" (Sending Reply) entries. This counts the number
    of connections from IHS to all the clones. It doesn't narrow down which clone is having the problem, but
    it can give a warning that something is wrong.

    3. If you have a URL testing application, or can write a simple one, create a 'plain' page and hit it. If the JVM is
    hung, you won't even be able to get this page back, so it's a good enough test. We just test the portal
    login screen. If it spins, it's hung.
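
    The first two checks above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the helper names are invented, the netstat parsing assumes output where the state is the last column and the local address is the first address-like column (as in typical `netstat -an` output on Linux and Solaris), and the scoreboard parsing assumes the machine-readable `/server-status?auto` output of mod_status. The port number and thresholds are things you would supply yourself.

```python
def count_established(netstat_output, local_port):
    """Count ESTABLISHED connections whose local address ends in local_port.

    Accepts both 'ip:port' (Linux style) and 'ip.port' (Solaris style).
    """
    suffixes = (f":{local_port}", f".{local_port}")
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields or fields[-1] != "ESTABLISHED":
            continue
        # Assumption: the first field containing '.' or ':' is the local address.
        local = next((f for f in fields if "." in f or ":" in f), "")
        if local.endswith(suffixes):
            count += 1
    return count


def count_sending_reply(status_auto_text):
    """Count 'W' (Sending Reply) slots in the mod_status Scoreboard line."""
    for line in status_auto_text.splitlines():
        if line.startswith("Scoreboard:"):
            return line.split(":", 1)[1].count("W")
    return 0  # no Scoreboard line found


def over_threshold(value, threshold):
    """Return True when the measured value exceeds the peak-time threshold."""
    return value > threshold
```

    In practice you would feed these functions the captured output of netstat and of the status page on a schedule, and send a notification when `over_threshold` returns True.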
