Plugin fails to mark a cluster member down - Websphere



Thread: Plugin fails to mark a cluster member down

  1. Plugin fails to mark a cluster member down

    Hi everyone,
    First of all, here is my question: how can I configure the plug-in (HTTP server) to mark an app server down (unavailable) if it is in a HUNG state or cannot answer requests?


    I have a big problem with an IBM_HTTP_SERVER/1.3.28.1 plug-in setup.
    I am using 1 HTTP server and 2 WAS 5.1 app servers:
    PLUGIN: --------------------System Information-----------------------
    PLUGIN: Bld version: 5.1.0
    PLUGIN: Bld date: Mar 21 2006, 16:31:30
    PLUGIN: Webserver: IBM_HTTP_SERVER/1.3.28.1 Apache/1.3.28 (Unix)
    PLUGIN: Hostname = xxxxxxx.yy.zz
    PLUGIN: NOFILES = hard: INFINITE, soft: 2000
    PLUGIN: MAX COREFILE SZ = hard: INFINITE, soft: 1073741312
    PLUGIN: DATA = hard: INFINITE, soft: 134217728
    PLUGIN: --------------------------------------------------------------

    I want to load balance between the 2 application servers using the plugin-cfg.xml file as follows:





    The problem is that the plug-in doesn't fail over.
    My scenario:
    I wrote an application that sends requests to the plug-in, which should load balance them. That works fine! But when an app server gets into a HUNG state, the plug-in does not recognize that the app server is no longer answering requests. It should mark the server DOWN, but that is NOT happening with the default plugin-cfg settings:
    RetryInterval="60"
    ConnectTimeout="0"
    MaxConnections="-1"
    ServerIOTimeout="0"
    What is wrong with these settings?
    Then I read some IBM docs and modified the plug-in configuration like this:
    RetryInterval="300"
    ConnectTimeout="10"
    MaxConnections="50"
    ServerIOTimeout="300"

    And now the plug-in marks the HUNG server down, BUT even while it is marked down it still gets requests, and this is absurd. Why does the app server get requests from the plug-in if it is marked down?


    Please help me as soon as possible, it is very important!




  2. Re: Plugin fails to mark a cluster member down

    barnikam@yahoo.com wrote:

    > And now the plug-in marks the HUNG server down, BUT even while it is marked down it still gets requests, and this is absurd. Why does the app server get requests from the plug-in if it is marked down?
    >


    If your webserver uses multiple child processes, I don't think the
    plug-in can share the fact that the server is down among them.

    This might confuse a quick artificial test.

    (MaxConnections definitely applies to each child process, not aggregate)
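
The per-process effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the plug-in's actual code: `ChildProcess`, `route`, and the server names `app1`/`app2` are hypothetical, and the model assumes simple round-robin selection where a process only learns that a server is hung by sending it a request that fails.

```python
# Toy model (hypothetical names): each web server child process keeps its
# OWN view of which app servers are marked down; nothing is shared.
class ChildProcess:
    def __init__(self, servers):
        self.servers = list(servers)
        self.marked_down = set()  # private to this process
        self._next = 0            # round-robin position

    def route(self, hung):
        """Route one request round-robin, skipping servers this process
        has marked down.

        `hung` is the set of servers that are really not answering.
        Returns (servers that received the request, server that answered).
        """
        attempts = []
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.marked_down:
                continue  # this process already knows the server is down
            attempts.append(server)
            if server in hung:
                self.marked_down.add(server)  # only this process notices
                continue  # fail over to the next server
            return attempts, server
        return attempts, None  # no server answered
```

With two `ChildProcess` instances and `app2` hung, the first process marks `app2` down after one failed request, but the second process still sends its next request to `app2`, because its own `marked_down` set is empty. From the outside, that looks like a "marked down" server still receiving traffic.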

  3. Re: Plugin fails to mark a cluster member down

    Hi,

    thanks for the answer! In the end I figured out myself that the multi-process IHS was the problem. Now I have updated to IHS 2.0 (multi-threaded), BUT now 2 processes are created when the browser sends a request to the plug-in. Why 2? Why not only one?
    If there are 2 processes, then I have the old problem again: MaxConnections applies to each process! I want to have only 1 process. How can I configure it that way?
    I don't understand this behavior!
    Please help.
    Here is my httpd.conf (is this OK when using IHS 2.0?):

    #MinSpareServers 5
    #MaxSpareServers 10
    #StartServers 5
    #MaxClients 150
    #MaxRequestsPerChild 500

    StartServers 1
    MinSpareServers 1
    MaxSpareServers 1
    MaxClients 150
    MaxRequestsPerChild 1024


  4. Re: Plugin fails to mark a cluster member down


    ThreadLimit 250
    ServerLimit 1
    StartServers 1
    MaxClients 250
    MinSpareThreads 250
    MaxSpareThreads 250
    ThreadsPerChild 250
    MaxRequestsPerChild 0


    This will start just one worker process and allow up to 250 concurrent
    connections (threads). Also:
    The use of the MaxConnections parameter in the WebSphere plug-in
    configuration is most effective when IBM HTTP Server 2.0 and above is used
    and there is a single IHS child process. However, there are other tradeoffs:
    - linuxthreads (traditional pthread library on Linux): ThreadsPerChild greater
      than about 100 results in high CPU overhead
    - SSL on any platform: ThreadsPerChild greater than about 100 results in high
      CPU overhead
    - WebSphere 5.x plug-in has a file descriptor limitation which will be
      encountered on Linux and Solaris if ThreadsPerChild is greater than 500

    See: http://publib.boulder.ibm.com/httpse...rformance.html
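
As a rough check of the arithmetic behind the settings above, here is a small sketch. The function names are hypothetical, and it assumes the usual worker MPM relationship: each child runs ThreadsPerChild threads, the number of children is bounded by ServerLimit, and MaxConnections is enforced separately in every child process.

```python
def worker_children(max_clients, threads_per_child, server_limit):
    """Upper bound on worker MPM child processes for the given directives."""
    # MaxClients threads need MaxClients / ThreadsPerChild children
    # (rounded up), and ServerLimit caps the number of children.
    needed = -(-max_clients // threads_per_child)  # ceiling division
    return min(needed, server_limit)


def backend_connection_cap(children, max_connections):
    """Aggregate connections the plug-in may open to ONE app server.

    Returns None when there is no limit (MaxConnections="-1").
    """
    if max_connections < 0:
        return None
    return children * max_connections
```

With `MaxClients 250`, `ThreadsPerChild 250`, and `ServerLimit 1`, `worker_children(250, 250, 1)` gives a single child, so `MaxConnections="50"` really means at most 50 connections per app server. With two children, the same setting would allow up to `backend_connection_cap(2, 50)`, i.e. 100.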

    Sunit





  5. Re: Plugin fails to mark a cluster member down

    Hello,

    Thanks for the answers, they were helpful. I will try it on AIX using IHS 2.0.
    I have one last question: in IHS 1.3.x, httpd.conf contains this module:



    In IHS 2.0 exists only ?? or both and ?
    If both, then how can be configured the ?

    Thx, Barni

  6. Re: Plugin fails to mark a cluster member down

    See http://httpd.apache.org/docs/2.0/mod/prefork.html




  7. Re: Plugin fails to mark a cluster member down

    barnikam@yahoo.com wrote:
    > Hello,
    >
    > Thanks for the answers, they were helpful. I will try it on AIX using IHS 2.0.
    > I have one last question: in IHS 1.3.x, httpd.conf contains this module:
    >
    >

    >
    > In IHS 2.0 exists only ?? or both and ?
    > If both, then how can be configured the ?
    >
    > Thx, Barni


    Only the worker MPM is provided with IHS 2.x/6.x for unix operating
    systems. You can't use prefork.

  8. Re: Plugin fails to mark a cluster member down

    Also, be aware of an issue I am seeing. My application responds in under a second unless a JVM hangs or is down. Then a live request is held for as long as the ServerIOTimeout value allows (ServerIOTimeout=1 here) before the plug-in decides to send it to a known good server. (I don't like using a real request as a health check.)

    When a JVM in a cluster is down or hung and the plug-in marks it as down, I see the active connections to that server drop, since the load has been reduced. This is good: it proves that the plug-in can still connect and is not reaching max connections, that ConnectTimeout=1 does not come into play, and that the connections are working fine.

    As hardware and apps get faster and faster, we need to push vendors to let us set timeout values closer to the response times of good requests. That means driving them to allow timeouts in microseconds, in line with the response times we see running apps on Linux hardware. I think the first step is to architect the plug-in so that the health check does not affect a real request. IBM says to fix the reason the app server crashed or hung, but if app servers and code could be developed so that failures never occur, I wouldn't need the timeout configuration values in the first place. I think the first step for my issue is to find out whether an open-source plug-in or router exists that doesn't have this problem. Does anyone know of one?
