current cpu usage - Aix

This is a discussion on current cpu usage - Aix ; Hello, I have very stupid question... For every unix I worked with, I haven't got this problem - now I do :-) How can I check what process _now_ consumes most CPU on machine? There are many of them - ...

+ Reply to Thread
Results 1 to 18 of 18

Thread: current cpu usage

  1. current cpu usage

    Hello,

    I have very stupid question... For every unix I worked with,
    I haven't got this problem - now I do :-)

    How can I check what process _now_ consumes most CPU on machine?

    There are many of them - topas, nmon, ps - but all of them
    shows the CPU usage but from the startup of the machine.

    This is not what I want. I have 100% CPU usage (which I can
    see using sar, nmon etc for example) but I don't know,
    which process generate this load

    regards,

    Sierp

  2. Re: current cpu usage

    On Jul 12, 12:11 pm, Sierp wrote:
    > Hello,
    >
    > I have very stupid question... For every unix I worked with,
    > I haven't got this problem - now I do :-)
    >
    > How can I check what process _now_ consumes most CPU on machine?
    >
    > There are many of them - topas, nmon, ps - but all of them
    > shows the CPU usage but from the startup of the machine.
    >
    > This is not what I want. I have 100% CPU usage (which I can
    > see using sar, nmon etc for example) but I don't know,
    > which process generate this load
    >
    > regards,
    >
    > Sierp


    You can still use topas etc.. - the culprit will be the process whose
    total consumed CPU time is growing fastest.

    Although, another explanation that is incredibly unlikely is this;

    http://www.cs.huji.ac.il/~dants/pape...07Security.pdf



  3. Re: current cpu usage

    Ok, they can all display current % of total CPU, so are you saying
    that when running "topas" or nmon ("t" subcommand) that you still
    cannot see what proc is chewing your CPU ?

    You may have a run away shell script is a tight loop and the PIDs are
    being forked/execed/exiting before any data is able to make it into
    one of those tools, try pprof to generate a report and look in the
    pprof.famind to see if this is the case. Its nicley tabulated so you
    can trace a PID back via all its PPIDs to the top level.

    http://publib.boulder.ibm.com/infoce...mds4/pprof.htm

    HTH
    Mark Taylor


  4. Re: current cpu usage

    MattWarren pisze:
    > You can still use topas etc.. - the culprit will be the process whose
    > total consumed CPU time is growing fastest.


    belive me or not, I tried it

    In man of topas I can read

    % CPU Utilization
    The average CPU utilization of the process over the
    monitoring interval. The
    first time a process is shown, this value is the
    average CPU utilization over
    the lifetime of the process.

    but I have 100% of CPU usage for more than 3 hours, and I can't
    tell, which process is doing it. The biggest value in topas
    is less then 2%

    Sierp

  5. Re: current cpu usage

    On Jul 12, 2:40 pm, Sierp wrote:
    ....
    > tell, which process is doing it. The biggest value in topas
    > is less then 2%
    >
    > Sierp


    Please provide an output of a

    $ vmstat 1 15

    Hajo





  6. Re: current cpu usage

    Sierp writes:

    > MattWarren pisze:
    >> You can still use topas etc.. - the culprit will be the process whose
    >> total consumed CPU time is growing fastest.

    >
    > belive me or not, I tried it
    >
    > In man of topas I can read
    >
    > % CPU Utilization
    > The average CPU utilization of the process over the
    > monitoring interval. The
    > first time a process is shown, this value is the
    > average CPU utilization over
    > the lifetime of the process.
    >
    > but I have 100% of CPU usage for more than 3 hours, and I can't
    > tell, which process is doing it. The biggest value in topas
    > is less then 2%
    >
    > Sierp


    Do you have APAR IY90522 installed ? ( It's delivered in 5.3 ML 05 SP 05 ).

    It's notional title is "topas doesn't calculate CPU percentage per
    process properly".

    --
    #include /* I don't speak for IBM ... */
    /* Heck, I don't even speak for myself */
    /* Don't believe me ? Ask my wife :-) */
    Richard D. Latham lathamr@us.ibm.com

  7. Re: current cpu usage

    Richard D. Latham pisze:
    > Do you have APAR IY90522 installed ? ( It's delivered in 5.3 ML 05 SP 05 ).
    >
    > It's notional title is "topas doesn't calculate CPU percentage per
    > process properly".


    yes, I do
    My AIX's are 5300-05-05

    regards,

    Sierp

  8. Re: current cpu usage

    Hajo Ehlers pisze:
    > Please provide an output of a
    >
    > $ vmstat 1 15


    currently the load is lower, so I can't check it right now.
    But for example I have todays sar (every 10 minutes)

    00:00:01 %usr %sys %wio %idle physc %entc
    [...]
    06:10:03 60 40 0 0 0.10 99.9
    06:20:03 59 40 0 0 0.10 99.9
    06:30:05 59 40 0 0 0.10 99.9
    06:40:02 51 46 0 2 0.10 99.9
    06:50:02 46 50 0 4 0.10 99.9
    07:00:03 46 50 0 4 0.10 99.9
    07:10:01 50 47 0 2 0.10 99.9
    07:20:03 50 48 0 3 0.10 99.9
    07:30:01 53 46 0 2 0.10 99.9
    07:40:01 49 48 0 3 0.10 99.9
    07:50:02 51 47 0 2 0.10 99.9
    08:00:02 47 50 0 3 0.10 99.9
    08:10:03 59 41 0 0 0.10 99.9
    08:20:04 55 44 0 1 0.10 99.9
    08:30:04 55 45 0 1 0.10 99.9
    08:40:03 53 46 0 1 0.10 99.9
    08:50:01 52 46 0 1 0.10 99.9
    09:00:03 51 47 0 2 0.10 99.9
    09:10:01 53 46 0 2 0.10 99.9

    regards,

    Sierp

  9. Re: current cpu usage

    Hajo Ehlers pisze:
    >> tell, which process is doing it. The biggest value in topas
    >> is less then 2%

    >
    > Please provide an output of a
    >
    > $ vmstat 1 15


    ok, problem recreated

    System configuration: lcpu=2 mem=4096MB ent=0.10

    kthr memory page faults cpu
    ----- ----------- ------------------------ ------------ -----------------------
    r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
    6 0 752486 75094 0 0 0 0 0 0 23 12874 510 61 39 0 0 0.10 100.3
    5 0 751251 76329 0 0 0 0 0 0 45 9937 520 59 40 1 0 0.10 99.5
    8 0 757150 70430 0 0 0 0 0 0 20 7478 407 48 52 0 0 0.10 99.8
    8 0 757527 70051 0 0 0 0 0 0 60 9758 513 38 62 0 0 0.10 99.9
    5 0 757726 69852 0 0 0 0 0 0 17 9241 378 43 57 0 0 0.10 99.9
    6 0 751607 75971 0 0 0 0 0 0 16 13350 387 56 43 0 0 0.10 99.9
    12 0 756936 70641 0 0 0 0 0 0 25 9173 496 49 51 0 0 0.10 99.9
    7 0 761475 66102 0 0 0 0 0 0 19 6526 432 31 68 0 0 0.10 99.8
    8 0 764562 63015 0 0 0 0 0 0 50 5036 552 51 49 0 0 0.10 99.9
    12 0 763980 63597 0 0 0 0 0 0 30 4593 507 37 63 0 0 0.10 99.9
    10 0 763937 63640 0 0 0 0 0 0 18 11010 435 45 55 0 0 0.10 99.9
    7 0 758608 68967 0 0 0 0 0 0 13 4427 391 46 54 0 0 0.10 99.9
    6 0 752040 75535 0 0 0 0 0 0 23 12348 493 59 41 0 0 0.10 99.9
    7 0 754525 73050 0 0 0 0 0 0 17 10439 444 48 52 0 0 0.10 99.9
    7 0 757294 70280 0 0 0 0 0 0 22 13114 467 43 57 0 0 0.10 99.9


    when I'm looking at nmon I can see (a little of garbage)

    ┌─nmon────────#=PURR Stats───────Host=xxxxxx──── ────Refresh=2 secs───
    │ CPU-Utilisation-Small-View ───────────EntitledCPU= 0.10 UsedCPU= 0.105
    │Logical CPUs 0----------25-----------50----------75----------100

    │CPU User% Sys% Wait% Idle%| | | | |

    │ 0 43.3 56.7 0.0 0.0|UUUUUUUUUUUUUUUUUUUUUsssssssssssssssssssssssss sss>

    │ 1 41.9 58.1 0.0 0.0|UUUUUUUUUUUUUUUUUUUUssssssssssssssssssssssssss sss>

    │EntitleCapacity/VirtualCPU +-----------|------------|-----------|------------+

    │EC+ 65.7 34.3 0.0 0.0|UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUssssssssssssss sss|

    │ VP 6.9 3.6 0.0 0.0|UUUs---------------------------------------------|

    │EC= 104.8% VP= 10.5% +--Capped---|------------|-----------100% VP=1 CPU+

    │ Top-Processes-(229) ─────Mode=3 [1=Basic 2=CPU 3=Perf 4=Size 5=I/O 6=Cmds]──
    │ PID %CPU Size Res Res Res Char RAM Paging Command

    │ Used KB Set Text Data I/O Use io other repage

    │ 1368126 1.5 42548 42560 12 42548 885K 1% 0 0 0 emagent

    │ 1568900 1.4 42300 42312 12 42300 849K 1% 0 0 0 emagent

    │ 893050 0.8 97172 82032 66504 15528 4095 2% 0 0 0 oracle

    │ 1564834 0.4 17552 15324 12932 2392 173K 0% 0 326 0 sqlplus

    │ 913508 0.2 27936 16048 14604 1444 535 0% 0 215 0 tnslsnr

    │ 765962 0.2 88464 68328 66504 1824 0 2% 0 0 0 oracle

    │ 921836 0.2 9344 9528 328 9200 1596 0% 0 0 0 nmon_aix53

    │ 1036490 0.2 88516 68980 66504 2476 0 2% 0 0 0 oracle

    │ 962678 0.1 85648 68536 66504 2032 0 2% 0 0 0 oracle

    │ 499790 0.1 68268 34588 76 34512 0 1% 0 0 0 java

    │ 549038 0.1 91196 75476 66504 8972 2690 2% 0 0 0 oracle

    │ 1208488 0.0 20700 7840 6100 1740 0 0% 0 0 0 mstragent

    │ 1118228 0.0 14468 6320 6100 220 0 0% 0 0 0 mstragent
    [...]

    regards,

    Sierp

  10. Re: current cpu usage

    See my post above, run pprof or one of the other cpu trace profilers
    and you will find your culprit ..

    Rgds
    Mark Taylor



  11. Re: current cpu usage

    Mark Taylor pisze:
    > Ok, they can all display current % of total CPU, so are you saying
    > that when running "topas" or nmon ("t" subcommand) that you still
    > cannot see what proc is chewing your CPU ?


    nope

    > You may have a run away shell script is a tight loop and the PIDs are
    > being forked/execed/exiting before any data is able to make it into
    > one of those tools, try pprof to generate a report and look in the
    > pprof.famind to see if this is the case. Its nicley tabulated so you
    > can trace a PID back via all its PPIDs to the top level.
    >
    > http://publib.boulder.ibm.com/infoce...mds4/pprof.htm


    is there any documents describing how to read this outputs? ;-)

    Sierp

  12. Re: current cpu usage

    > >http://publib.boulder.ibm.com/infoce...ndex.jsp?topic...
    >
    > is there any documents describing how to read this outputs? ;-)
    >
    > Sierp


    There is a little bit of info here, but from the report you should be
    able to work it out from the headings in each report (if you
    understand process metrics that is)

    http://publib.boulder.ibm.com/infoce...easure_cpu.htm

    So, you may want to start here

    http://publib.boulder.ibm.com/infoce...d/cpu_perf.htm

    HTH
    Mark Taylor


  13. Re: current cpu usage

    Also, check this redbook out

    http://www.redbooks.ibm.com/abstract...6039.html?Open

    Rgds
    Mark Taylor


  14. Re: current cpu usage

    Mark Taylor pisze:
    >>> http://publib.boulder.ibm.com/infoce...ndex.jsp?topic...

    >> is there any documents describing how to read this outputs? ;-)

    >
    > There is a little bit of info here, but from the report you should be
    > able to work it out from the headings in each report (if you
    > understand process metrics that is)


    well... not that easy :-)
    of course, I can see much more I've seen before.

    ie I can see something like this


    1.064 93.546 0.526 663622 1 721021 0 AA 2 .. tnslsnr
    1.109 1.200 0.008 966868 663622 2269199 721021 FX 3 ... tnslsnr
    1.129 1.130 0.001 995356 966868 2236659 2269199 FE 4 .... tnslsnr
    1.139 1.819 0.056 995356 966868 2236659 2269199 EX 4 .... oracleDB
    4.848 4.938 0.007 995358 663622 2236661 721021 FX 3 ... tnslsnr
    4.868 4.878 0.001 950504 995358 2171127 2236661 FE 4 .... tnslsnr
    4.898 5.598 0.056 950504 995358 2171127 2236661 EX 4 .... oracleDB
    5.239 5.278 0.003 995360 663622 2236663 721021 FX 3 ... tnslsnr
    5.259 5.269 0.001 966870 995360 2269201 2236663 FE 4 .... tnslsnr
    5.348 5.948 0.051 966870 995360 2269201 2236663 EX 4 .... oracleDB
    5.398 95.438 0.003 663622 1 1601569 0 AA 2 .. tnslsnr
    6.858 6.888 0.003 950506 663622 2171129 721021 FX 3 ... tnslsnr
    6.878 6.888 0.001 995362 950506 2236665 2171129 FE 4 .... tnslsnr
    6.898 8.157 0.062 995362 950506 2236665 2171129 EX 4 .... oracleDB
    7.078 7.108 0.003 950508 663622 2171131 721021 FX 3 ... tnslsnr
    7.098 7.108 0.001 938232 950508 2011295 2171131 FE 4 .... tnslsnr
    7.118 8.158 0.060 938232 950508 2011295 2171131 EX 4 .... oracleDB
    7.658 7.698 0.004 966872 663622 2269203 721021 FX 3 ... tnslsnr
    7.688 7.698 0.001 950510 966872 2171133 2269203 FE 4 .... tnslsnr
    7.707 8.647 0.064 950510 966872 2171133 2269203 EX 4 .... oracleDB
    8.228 8.258 0.005 966874 663622 2269205 721021 FX 3 ... tnslsnr
    8.248 8.258 0.001 921796 966874 2330841 2269205 FE 4 .... tnslsnr
    8.277 9.048 0.065 921796 966874 2330841 2269205 EX 4 .... oracleDB
    10.747 10.787 0.003 938236 663622 2339051 721021 FX 3 ... tnslsnr

    Is this normal for tnslsnr?
    Also, a lot of this

    19.193 83.329 0.023 819374 786524 1564687 0 AA 1 . ksh
    19.194 19.205 0.001 647246 819374 2191425 1564687 FE 2 .. ksh
    19.214 19.225 0.002 647246 819374 2191425 1564687 EX 2 .. date
    19.225 19.254 0.002 647248 819374 2191427 1564687 FE 2 .. ksh
    19.235 19.245 0.001 995368 647248 2339059 2191427 FE 3 ... ksh
    19.245 19.265 0.010 937988 647248 2170881 2191427 FE 3 ... ksh
    19.254 19.255 0.002 995368 647248 2339059 2191427 EX 3 ... cat
    19.274 19.285 0.002 937988 647248 2170881 2191427 EX 3 ... grep
    19.275 19.295 0.002 647248 819374 2191427 1564687 EX 2 .. awk
    19.304 19.335 0.001 647250 819374 2191429 1564687 FE 2 .. ksh
    19.305 19.315 0.001 950514 647250 2142431 2191429 FE 3 ... ksh
    19.324 19.335 0.003 950514 647250 2142431 2191429 EX 3 ... env
    19.325 19.335 0.001 966884 647250 1126609 2191429 FE 3 ... ksh
    19.335 19.375 0.005 647250 819374 2191429 1564687 EX 2 .. awk
    19.344 19.355 0.001 966884 647250 1126609 2191429 EX 3 ... grep
    19.375 19.405 0.002 647252 819374 2191431 1564687 FE 2 .. ksh
    19.385 19.395 0.001 999658 647252 2146359 2191431 FE 3 ... ksh
    19.395 19.415 0.001 946208 647252 2302199 2191431 FE 3 ... ksh
    19.405 19.405 0.000 999658 647252 2146359 2191431 EX 3 ... env
    19.414 19.454 0.004 647252 819374 2191431 1564687 EX 2 .. awk
    19.435 19.435 0.000 946208 647252 2302199 2191431 EX 3 ... grep

    so some shell script is running I guess... the question is - is there a real
    problem? Few shell commands is not that bad. There is a lot of them,
    but not all the time - every 20seconds for 1,5s

    regards,

    Sierp

  15. Re: current cpu usage

    How many of those shell script paragraphs can you see in the
    pprof.famind ? The number of dots before the command name indicates
    the level of fork/exec .. do a ps -fp on all the top level PIDs to see
    what has fired them off to give you an indication of what is going
    on .. then if you understand what the scripts do then you can stop/
    kill them one at a time. If it is a shell script firing of fork/execs
    (cut,awk,sed,tr) in a tight loop then its gonna be pretty obviuos as
    the size of the paragraph for your top level PID is gonna be big and
    the levels of fork/exec will also be fairly deep ( 4+ etc ) ..

    The other thing would be smaller parapraghs of PID fork/exec levels
    but more of them, so run pprof for for 15 seconds every minute etc..



  16. Re: current cpu usage

    Mark Taylor pisze:
    > How many of those shell script paragraphs can you see in the
    > pprof.famind ?


    about 60 of them, durin 1s more over
    it runs every 30s

    Thanks for help, I belive I fill have to fight with it by myself now
    The worst thing is, that there is no good doc describing tools like
    this (or I can't find them)

    Usually there is only mentiond what the tool do or what is able to do

    regards,

    Sierp

  17. Re: current cpu usage

    did you check this redbook out ? chap 19 .. has some examples ..

    http://www.redbooks.ibm.com/abstract...6039.html?Open




  18. Re: current cpu usage

    > about 60 of them, durin 1s more over
    > it runs every 30s


    yeah, just take each one and work it back up via the parent pid and
    you should find your issue root cause.



+ Reply to Thread