lwps; how many are too many? - SUN


  1. lwps; how many are too many?

    Fellow admins; please help with troubleshooting an issue:

    I have an intermittent problem on a production server (Sun 420R with
    4 x 450 MHz CPUs and 4 GB RAM). The server runs Oracle Applications,
    with Java and Apache. The issue is that Oracle Applications users
    report that the server is hung: they cannot perform any tasks, and it
    appears that there is no connection to the DB.

    Each time this happens, the server itself appears to be fine. I've
    been running top, along with a script I cobbled together to gather
    stats on what's going on at that moment (iostat, vmstat, df -k, mount,
    etc.). top shows plenty of free memory, very little user CPU time, no
    (or VERY little) I/O wait, and very high CPU idle. The other stats I
    gather at the time bear this out, and the default sar stats show
    nothing out of the ordinary.
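
    A minimal sketch of such a snapshot script in Python, assuming the
    standard Solaris tools are on the PATH. The command list, the
    `snapshot` function name and the interval/count arguments are
    illustrative choices, not taken from the original script; the idea is
    to record every tool's output in one timestamped pass and keep going
    when one of them is missing or hangs.

```python
import subprocess
import time

# Commands to capture while the hang is in progress. The tools are the
# ones mentioned in the post plus ps; the arguments are assumptions.
DEFAULT_COMMANDS = [
    ["vmstat", "5", "3"],
    ["iostat", "-xn", "5", "3"],
    ["df", "-k"],
    ["mount"],
    ["ps", "-eo", "user,pid,nlwp,comm"],
]


def snapshot(commands=DEFAULT_COMMANDS, timeout=60):
    """Run each command once and return {command line: output}.

    A missing or hung command is recorded instead of aborting, so one bad
    tool does not cost you the rest of the data.
    """
    results = {"timestamp": time.strftime("%Y-%m-%d %H:%M:%S")}
    for argv in commands:
        key = " ".join(argv)
        try:
            proc = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout)
            results[key] = proc.stdout + proc.stderr
        except FileNotFoundError:
            results[key] = "<command not found>"
        except subprocess.TimeoutExpired:
            results[key] = "<timed out after %d s>" % timeout
    return results
```

    Run it by hand (or from cron) as soon as users report the hang and save
    the returned dictionary to a timestamped file, so there is something to
    compare against a snapshot taken while everything is healthy.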

    In every case, a reboot takes care of the issue for a while, until the
    next occurrence. I know, I know: find and fix the problem, don't simply
    reboot. Unfortunately, I've inherited this problem; rebooting was the
    only solution the previous admin seemed to use. Since this is our
    production server, management and the user community want it back
    NOW, never mind the troubleshooting. To quote Yossarian: "That's some
    catch, that catch-22."

    The database it connects to is on another server, and connectivity to
    it is never an issue; the DB looks OK and remains available for
    direct login.

    I've been looking into vmstat (which seems to be more powerful than
    top) and am curious about the lightweight process (lwp) statistics.
    Currently the system has about 380 lwps, 237 of them owned by oracle;
    the rest are owned by root and daemon.

    Of all the processes, 2 of the 3 Java processes have the most lwps:
    25 for one PID and 14 for another. The remaining processes have one
    or two each, and a couple have 4 or 5. Again, these figures are from
    the currently running server, with no user problems reported yet; I
    have yet to get a snapshot of the lwps while it is hanging.

    I poked around Google and came across an article about thread-related
    hangs in the JVM whose symptoms mirror what our users see: the app
    simply hangs. Unfortunately, I'm lost in the jargon; it seems to
    assume a level of knowledge that I don't have. It does mention patch
    108993, which we have on our test server but not on production. I'll
    investigate that and apply it in the next available maintenance window
    (or reboot, whichever comes first).
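
    Since the test server has patch 108993 and production does not, it may
    be worth diffing the full patch lists of the two machines rather than
    checking patches one at a time. A rough Python sketch, assuming the
    `Patch: NNNNNN-RR ...` line format printed by `showrev -p`; the
    function names are hypothetical.

```python
import re

# Assumed shape of a `showrev -p` line:
#   Patch: NNNNNN-RR Obsoletes: ... Requires: ... Packages: ...
PATCH_RE = re.compile(r"^Patch:\s+(\d{6})-(\d{2,})")


def installed_patches(showrev_output):
    """Map base patch ID -> highest installed revision."""
    patches = {}
    for line in showrev_output.splitlines():
        m = PATCH_RE.match(line.strip())
        if m:
            base, rev = m.group(1), int(m.group(2))
            patches[base] = max(rev, patches.get(base, -1))
    return patches


def missing_or_older(reference, target):
    """Patches in `reference` (e.g. test) that `target` (e.g. production)
    lacks entirely or has only at a lower revision.

    Each result is (patch ID, reference revision, target revision or None).
    """
    ref, tgt = installed_patches(reference), installed_patches(target)
    return sorted(
        (base, rev, tgt.get(base))
        for base, rev in ref.items()
        if tgt.get(base, -1) < rev
    )
```

    Save `showrev -p` output from each host to a file and pass the test
    server's text as `reference` and production's as `target`.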

    Can anyone tell me, in plain English, whether 380 lwps on a server
    with 4 CPUs and only about 115 running processes is OK? Regarding the
    Java processes with 25 lwps, is there any rule of thumb for the
    maximum number of lwps a process SHOULD have? Any other pointers for
    identifying our issue?

    Thanks in advance for reading this far, and double thanks if you can
    shed any light, or point me in the right direction.

    Sincerely,

    Joe D.


  2. Re: lwps; how many are too many?

    You might want to check the network connections on the server to see
    how many are established, idle, or in a SYN state. I remember having a
    similar issue with the Sun LDAP server, which I fixed by installing a
    patch.
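
    A small Python sketch of that check: it tallies `netstat -an` lines by
    their last column, the TCP state. The state names below are assumed to
    match Solaris `netstat -an` output; other systems spell some of them
    differently (Linux, for example, prints SYN_RECV), so treat the set as
    an assumption to adjust for your platform.

```python
from collections import Counter

# Assumed TCP state names; Solaris-style spellings plus Linux's SYN_RECV.
TCP_STATES = {
    "CLOSED", "IDLE", "BOUND", "LISTEN", "SYN_SENT", "SYN_RCVD",
    "SYN_RECV", "ESTABLISHED", "CLOSE_WAIT", "FIN_WAIT_1", "FIN_WAIT_2",
    "CLOSING", "LAST_ACK", "TIME_WAIT",
}


def tcp_state_counts(netstat_output):
    """Count connections per TCP state; the state is the last column.

    Header, separator and UDP lines are skipped because their last field
    is not one of the names in TCP_STATES.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts
```

    A tally taken during a hang means far more when it can be compared with
    one taken while the application is healthy.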

    BT



  3. Re: lwps; how many are too many?


    Joe D. wrote:
    > Fellow admins; please help with troubleshooting an issue:

    [snip]
    > I've been looking into vmstat (which seems to be more powerful than
    > top) and am curious about the lightweight process (lwp) statistics.
    > Currently the system has about 380 lwps, 237 of them owned by oracle;
    > the rest are owned by root and daemon.
    >
    > Of all the processes, 2 of the 3 Java processes have the most lwps:
    > 25 for one PID and 14 for another. The remaining processes have one
    > or two each, and a couple have 4 or 5. Again, these figures are from
    > the currently running server, with no user problems reported yet; I
    > have yet to get a snapshot of the lwps while it is hanging.

    [snip]
    > Can anyone tell me, in plain English, whether 380 lwps on a server
    > with 4 CPUs and only about 115 running processes is OK? Regarding the
    > Java processes with 25 lwps, is there any rule of thumb for the
    > maximum number of lwps a process SHOULD have? Any other pointers for
    > identifying our issue?


    380 lwps (threads) on your system is not excessive.
    25 threads in a single process is also not excessive.
    The problem must be a bug somewhere. The sheer number
    of threads/lwps you have is not the problem.

    A thousand or more threads in a single process would
    definitely be excessive, as would thousands of processes.
    I have run tests with tens of thousands of threads on sufficiently
    large machines running Solaris with no problems, but I don't
    recommend programming real applications that way.
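
    To put numbers on that: 380 lwps across about 115 processes averages
    roughly 3.3 lwps per process, and the busiest JVM at 25 is far below the
    thousand-thread mark given above. A trivial Python sketch for flagging
    processes that do cross that mark; the threshold is the rule of thumb
    from this reply (not a Solaris limit), and the function name and input
    format (a dict of PID to lwp count) are assumptions.

```python
def flag_heavy_processes(nlwp_by_pid, limit=1000):
    """Return (pid, nlwp) pairs at or above `limit`, heaviest first.

    The default of 1000 is a rough rule of thumb, not an enforced limit.
    """
    heavy = [(pid, n) for pid, n in nlwp_by_pid.items() if n >= limit]
    return sorted(heavy, key=lambda item: item[1], reverse=True)
```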

    Roger Faulkner
    Sun Microsystems


  4. Re: lwps; how many are too many?

    [Followup set to comp.unix.solaris]

    On 2006-03-27, Joe D. wrote:

    > Each time this happens, the server itself appears to be fine. I've
    > been running top, along with a script I cobbled together to gather
    > stats on what's going on at that moment (iostat, vmstat, df -k, mount,
    > etc.). top shows plenty of free memory, very little user CPU time, no
    > (or VERY little) I/O wait, and very high CPU idle. The other stats I
    > gather at the time bear this out, and the default sar stats show
    > nothing out of the ordinary.
    >
    > In every case, a reboot takes care of the issue for a while, until the
    > next occurrence. I know, I know: find and fix the problem, don't simply
    > reboot. Unfortunately, I've inherited this problem; rebooting was the
    > only solution the previous admin seemed to use. Since this is our
    > production server, management and the user community want it back
    > NOW, never mind the troubleshooting. To quote Yossarian: "That's some
    > catch, that catch-22."
    >
    > The database it connects to is on another server, and connectivity to
    > it is never an issue; the DB looks OK and remains available for
    > direct login.


    Try sending a SIGQUIT to the JVM; this should cause it to write a
    thread dump to stdout (you will have to find out where that is going
    yourself), which will tell you exactly what it is up to.
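
    A Python sketch of the same step, equivalent to `kill -QUIT <pid>` from
    the shell: pick out the Java processes from `ps -eo pid,comm` output,
    then send SIGQUIT to the one you want. The helper names are
    hypothetical. Make sure the PID really is a JVM; a process that does
    not handle SIGQUIT will normally be terminated by it.

```python
import os
import signal


def java_pids(ps_output):
    """PIDs whose command name is 'java', from `ps -eo pid,comm` output.

    The command may be printed as a full path, so compare its basename.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 1)
        if len(fields) == 2 and os.path.basename(fields[1].strip()) == "java":
            pids.append(int(fields[0]))
    return pids


def request_thread_dump(pid):
    """Send SIGQUIT to `pid`; a JVM responds by printing a thread dump."""
    os.kill(pid, signal.SIGQUIT)
```

    Taking two or three dumps a few seconds apart while the application is
    hung makes it easier to spot threads that stay stuck in the same place.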

    Ceri
    --
    That must be wonderful! I don't understand it at all.
    -- Moliere
