CPU pegged; any suggestions?
I have a 280R with 6 GB of RAM and two 900 MHz processors, running Solaris 8
and BEA WebLogic as a web server. The web developer recently increased the
memory allotted to the web server, since we had added memory to the machine
a while back. Prior to that increase, CPU stats showed very little in
the way of activity; I'm talking 25-30% usr at most for very brief
periods, little kernel activity, and the rest idle.
Now, since the web server memory increase, we're experiencing very HIGH usr
stats: averaging 95-99% usr, with the remainder split between kernel and idle.
Sometimes it shifts to maybe 80/20 usr/kernel, but not for long.
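For reference, those usr/kernel/idle numbers come from the stock Solaris
tools; a quick way to watch them live (the 5-second interval is arbitrary):

  mpstat 5        # per-CPU usr/sys/wt/idl, one line per CPU per interval
  vmstat 5        # system-wide CPU split, plus run queue and paging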
I've been trying to see if there's a bottleneck somewhere, but the usual
tools don't show anything. I found the following link and checked through
all the OS-related things. iostat shows good response time on the disk
(there's very little I/O on this machine, in fact), and netstat shows no
collisions.
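In case anyone wants to retrace those checks, they were roughly:

  iostat -xn 5    # per-device service times; watch the %b and asvc_t columns
  netstat -i 5    # per-interface packet counts, errors, and collisions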
There were some suggestions about OS kernel parameter tuning; here's
what I have on the server now. These are the recommendations from the
article at the link above, with the exception of tcp_conn_hash_size
(32768 vs. the article's recommended value).
panda (/) # tail /etc/system
* End MDD database info (do not edit)
set rlim_fd_cur = 8192
set rlim_fd_max = 8182
set tcp:tcp_conn_hash_size = 32768
set autoup = 900
set tune_t_fsflushr = 1
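One thing worth double-checking, since it may just be a typo above:
rlim_fd_max reads 8182, which is below rlim_fd_cur's 8192, and the hard
limit is normally set at or above the soft limit (8192 was probably
intended). To confirm what a running process actually got (the pid here
is a stand-in for the WebLogic JVM):

  plimit 1234 | grep nofiles    # descriptor limits of a running process
  ulimit -n                     # soft limit for the current shell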
Below are the current TCP settings; I'm not sure whether our server
qualifies as the 'high throughput java server' the tuning article describes:
panda (/) # ndd -get /dev/tcp tcp_xmit_lowat 4096
panda (/) # ndd -get /dev/tcp tcp_xmit_hiwat 16384
panda (/) # ndd -get /dev/tcp tcp_recv_hiwat 24576
panda (/) # ndd -get /dev/tcp tcp_cwnd_max 1048576
panda (/) # ndd -get /dev/tcp tcp_rexmit_interval_min 400
panda (/) # ndd -get /dev/tcp tcp_rexmit_interval_max 60000
panda (/) # ndd -get /dev/tcp tcp_rexmit_interval_initial 3000
panda (/) # ndd -get /dev/tcp tcp_time_wait_interval 240000
panda (/) # ndd -get /dev/tcp tcp_keepalive_interval 7200000
panda (/) # ndd -get /dev/tcp tcp_conn_req_max_q 128
panda (/) # ndd -get /dev/tcp tcp_conn_req_max_q0 1024
panda (/) # ndd -get /dev/tcp tcp_ip_abort_interval 480000
panda (/) # ndd -get /dev/tcp tcp_smallest_anon_port 32768
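A side note for anyone copying these: ndd settings don't survive a reboot,
so any change has to be reapplied at boot, typically from an init script
(the script name below is just a common convention, not something Solaris
ships):

  ndd -set /dev/tcp tcp_conn_req_max_q 1024    # example: raise the listen backlog

with the same line added to something like /etc/init.d/nddconfig so it's
rerun at boot.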
Adding to my general confusion is the comment in said article that "A
well-tuned application under full load (0% idle) should fall within 80%
to 90% usr, and 20% to 10% sys times, respectively. A smaller
percentage value for sys reflects more time for user code and fewer
preemptions, which result in greater throughput for a Java application."
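For comparing against that 80/90 guidance, sar gives the same usr/sys
split over a longer window:

  sar -u 5 12    # %usr/%sys/%wio/%idle, twelve 5-second samples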
I guess I should state that there have been no complaints about response
time, so I'm wondering if I should be concerned at all. From my own
personal POV as the administrator, I'd like to see the usr stats in
the 80s. I'm noticing very slow response from administrative commands
like lpstat -o; everything else seems fine on the server. I'm going to
have the web admin reduce the amount of memory she allocated to the
processes, to see if that brings us in line with 80-90% usr CPU usage.
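For completeness, a per-process view should confirm the usr time really
is all the web server's JVM (prstat is standard on Solaris 8):

  prstat -s cpu -n 10 5    # top 10 processes by CPU, refreshed every 5 seconds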
I guess the bottom-line question is: should I be concerned? Do we
simply have a (by accident) VERY well-tuned server? (I doubt it.) Has
anyone run into this? Any suggestions appreciated.
Re: CPU pegged; any suggestions?
If you're seeing no degradation in performance, you're likely
well-tuned. Did you recently adjust the file descriptor limits
(rlim_fd_cur and rlim_fd_max)? That would cause an increase in CPU use,
since you're allowing more simultaneous network connections into the web
server.

If you've got low I/O wait, low kernel/sys numbers, and some spare idle,
running usr in the 80s just means you've opened your web server up to
fully utilize system resources. You can tune it back some if you find
command responsiveness too slow, by reducing max connections via the web
server config or through the file descriptor limits in /etc/system. But
higher utilization, if everything is running right, normally translates
into faster web server performance.
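If you want to see how close the web server actually runs to its
descriptor limit before cutting anything, you can count its open fds
(again, 1234 standing in for your WebLogic pid):

  ls /proc/1234/fd | wc -l    # open descriptors, to compare against rlim_fd_*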
Re: CPU pegged; any suggestions?
Thanks for the reply. I didn't change any of the file descriptors
(oddly enough, though, I was looking into that for another reason; I
must have been on the right track! : ) ...

Apparently, the issue was due to a setting on the web side: the web
admin had increased the maximum message size in the web server without
a corresponding adjustment on the OS side. So essentially, as I
understand it, the web server was trying to force 5 lbs. of s**t into a
2 lb. bag (one of my favorite metaphors; couldn't resist trotting it
out here).
She decided to lower the size, and my stats are now well in line.
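(The other direction, had we wanted to keep the larger message size,
would presumably have been to raise the matching TCP buffers on the OS
side; a sketch only, since the right values depend on the actual message
size:

  ndd -set /dev/tcp tcp_xmit_hiwat 65536
  ndd -set /dev/tcp tcp_recv_hiwat 65536 )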
Hopefully, she and I can work together to avoid such mismatches in the
future. Thanks again for posting, and for the advice. This newsgroup is
the best!