Two CPUs permanently 100% wait - Aix

  1. Two CPUs permanently 100% wait

    For weeks our AIX machine, which runs Oracle, has shown 2 CPUs
    permanently at 100% wait. The other CPUs show 'normal' behaviour.
    The system seems to run fine, but I wonder what's causing this.
    Could it be because we're out of memory?
    What could we check to make sure everything is OK?
    The disks seem to be OK; none of them is always busy or the
    like.

    This is what Topas says:
    nmon v9a [H for help] Hostname=rz101e0 Refresh=2.0secs 09:12.11
    RS/6000 & pSeries Details
    Hardware-Type(NIM)=CHRP=Common H/W Reference Platform Bus-Type=PCI
    Logical partition=Dynamic
    CPU Architecture=PowerPC Implementation=RS64-IV or POWER4, 64 bit
    Machine has 16 CPUs (7 CPUs activated)
    CPU Level 1 Cache is Combined Instruction=65536 bytes & Data=32768 bytes
    Level 2 Cache size=1572864
    AIX 5.2.0.63 Kernel=Multi-Processor 64 bit
    uname=rz101e0 hostname=rz101e0
    CPU Utilisation
    +-------------------------------------------------+
    CPU  User%  Sys%  Wait%  Idle|0          |25         |50          |75       100|
      0   16.5  10.5    2.5  70.5|UUUUUUUUsssssW >
      1   28.5   7.0    5.0  59.5|UUUUUUUUUUUUUUsssWW >
      2   26.5   7.0    3.5  63.0|UUUUUUUUUUUUUsssW >
      3   24.4  13.9    2.5  59.2|UUUUUUUUUUUUssssssW >
      4   20.0  10.0    8.5  61.5|UUUUUUUUUUsssssWWWW >
      5    0.0   0.0  100.0   0.0|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>
      6    0.0   0.0  100.0   0.0|WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW>
    +-------------------------------------------------+
          33.4   8.1    3.8  54.7|UUUUUUUUUUUUUUUUssssW >
    +-------------------------------------------------+
    Memory Use   Physical    Virtual   Paging pages/sec      In     Out   VM parameters
    % Used          99.8%      62.1%   to Paging Space      6.0     0.0   numperm  11.4%
    % Free           0.2%      37.9%   to File System      43.9   215.0   minperm   4.8%
    MB Used     21981.6MB   7315.8MB   Page Scans        1445.4           maxperm  14.4%
    MB Free        34.3MB   4460.2MB   Page Cycles          0.0           minfree    480
    Total(MB)   22016.0MB  11776.0MB   Page Reclaim         0.0           maxfree    512
    Network I/O
    I/F Name  Recv=KB/s  Trans=KB/s  packin  packout  insize  outsize  Peak->Recv    Trans
    en0          3032.4      3375.6  5028.2   3958.0   617.5    873.3     25719.7  16882.9
    en1             0.0         0.0     0.5      0.0    48.0      0.0         1.2      2.2
    en2             0.0         0.0     0.0      0.0     0.0      0.0         0.2      0.0
    en3             0.9         0.2     9.0      2.0   103.4    118.0        11.0      1.8
    lo0           318.1       318.1   189.6    189.6  1718.2   1718.2       426.7    426.8
    Disk I/O all data is Kbytes per second
    DiskName  Busy   Read     Write|0          |25         |50          |75       100|
    hdisk0     13%    8.0   111.8KB|WWWWWWWR |R
    hdisk1     19%   16.0   111.8KB|WWWWWWWWWRR |
    hdisk220    2%    0.0    96.0KB|W
    hdisk274    1%    0.0    69.6KB|W
    hdisk328    2%    0.0    60.9KB|W
    hdisk7      1%   20.0   126.0KB|WR
    hdisk58     2%    0.0   117.7KB|W
    hdisk451    1%   16.0     0.0KB|R
    hdisk76     2%   29.9    93.8KB|WR
    hdisk112    2%    0.0   129.5KB|W
    hdisk166    2%    0.0   104.0KB|W
    hdisk169    1%    0.0    64.6KB|W
    hdisk184    1%    0.0    70.1KB|W
    hdisk510    3%    8.0     0.0KB|RR


  2. Re: Two CPUs permanently 100% wait

    To list the threads that are waiting on page-in, run:

    echo "th -w WPGIN" | kdb

    Then run filemon to see what your disk subsystems are doing, and fix
    the issue or your code.

    HTH
    Mark Taylor



  3. Re: Two CPUs permanently 100% wait

    On Oct 18, 3:21 am, Christoph wrote:
    > For weeks our AIX machine, which runs Oracle, has shown 2 CPUs
    > permanently at 100% wait. The other CPUs show 'normal' behaviour.
    > The system seems to run fine, but I wonder what's causing this.
    > Could it be because we're out of memory?
    > What could we check to make sure everything is OK?
    > The disks seem to be OK; none of them is always busy or the
    > like.


    I'm not sure that this is really a problem. As I understand it, the AIX
    dispatcher always tries to redispatch a thread on the same CPU it
    last ran on. The flip side of this, on a lightly loaded system such
    as yours, is that CPUs which haven't had threads dispatched onto
    them tend to stay that way. It looks like you're over-configured
    in the CPU department. This is a good thing!

    Regards,
    Jim Lane


  4. Re: Two CPUs permanently 100% wait

    > I'm not sure that this is really a problem. As I understand it the AIX
    > dispatcher tries to always redispatch a thread on the same CPU it
    > last ran on.


    This is true; it's called processor affinity, and getting back
    onto the same run queue is less costly than moving to another run
    queue. But I am not sure that is the explanation here.

    I can see that hdisk0 and hdisk1 are busy writing and there are
    page-ins from paging space. There are also filesystem writes and
    reads, so you could have a JFS locking issue. Run the kdb command I
    pasted above, plus filemon, to get the info you need to track down
    the culprit and decide whether it really is an issue. You may have a
    couple of procs/apps doing tiny writes to the same file in /tmp; you
    just won't know until you trace it a little further.
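
    The quick read of the nmon screen above (two CPUs pinned in wait,
    hdisk0 and hdisk1 busier than the rest) can be reproduced with a
    small script. This is only a sketch: the figures are copied by hand
    from the first post, and the thresholds and function names are
    made up for illustration, not taken from nmon or AIX.

```python
# Per-CPU rows copied by hand from the nmon screen in the first post:
# (cpu, user%, sys%, wait%, idle%)
CPU_ROWS = [
    (0, 16.5, 10.5, 2.5, 70.5),
    (1, 28.5, 7.0, 5.0, 59.5),
    (2, 26.5, 7.0, 3.5, 63.0),
    (3, 24.4, 13.9, 2.5, 59.2),
    (4, 20.0, 10.0, 8.5, 61.5),
    (5, 0.0, 0.0, 100.0, 0.0),
    (6, 0.0, 0.0, 100.0, 0.0),
]

# Disk rows from the same screen: (disk, busy%, read KB/s, write KB/s)
DISK_ROWS = [
    ("hdisk0", 13, 8.0, 111.8),
    ("hdisk1", 19, 16.0, 111.8),
    ("hdisk220", 2, 0.0, 96.0),
    ("hdisk274", 1, 0.0, 69.6),
    ("hdisk328", 2, 0.0, 60.9),
    ("hdisk7", 1, 20.0, 126.0),
    ("hdisk58", 2, 0.0, 117.7),
    ("hdisk451", 1, 16.0, 0.0),
    ("hdisk76", 2, 29.9, 93.8),
    ("hdisk112", 2, 0.0, 129.5),
    ("hdisk166", 2, 0.0, 104.0),
    ("hdisk169", 1, 0.0, 64.6),
    ("hdisk184", 1, 0.0, 70.1),
    ("hdisk510", 3, 8.0, 0.0),
]

# Hypothetical cut-offs chosen for this example only.
WAIT_THRESHOLD = 90.0
BUSY_THRESHOLD = 10


def cpus_stuck_in_wait(rows, threshold=WAIT_THRESHOLD):
    """Return the CPU numbers whose wait% is at or above the threshold."""
    return [cpu for cpu, _user, _sys, wait, _idle in rows if wait >= threshold]


def busiest_disks(rows, threshold=BUSY_THRESHOLD):
    """Return disk rows with busy% at or above the threshold, busiest first."""
    hot = [row for row in rows if row[1] >= threshold]
    return sorted(hot, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    print("CPUs stuck in wait:", cpus_stuck_in_wait(CPU_ROWS))
    for name, busy, read_kb, write_kb in busiest_disks(DISK_ROWS):
        print(f"{name}: {busy}% busy, read {read_kb} KB/s, write {write_kb} KB/s")
```

    A single snapshot like this only narrows down where to look; the
    kdb and filemon traces are still what tell you which threads and
    files are involved.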

    Rgds
    Mark Taylor

