AIX Oracle Slowness Issue - Aix


Thread: AIX Oracle Slowness Issue

  1. AIX Oracle Slowness Issue

    Hello,

    I'm running into an odd, sporadic issue: my p5 510 (1-way 1.9 GHz,
    5 GB of RAM, AIX 5.3 ML04) occasionally drops into an extremely slow
    state. It always happens on a weekday, always between 9:15 AM and
    10:00 AM, and the only fix seems to be a reboot. I try to run commands
    while it is in this state, but the response takes longer than I can
    afford to wait, since this is a production machine. Running the
    commands from the console is just as slow. I can ping the server,
    but my Oracle apps can't connect.

    I can't really tell whether the system is thrashing, but isn't taking
    forever to respond a tell-tale sign of that?

    The SGA size is 1.8GB.

    I am using CIO-mounted filesystems for Oracle. Since I'm using CIO,
    isn't tweaking the vmo values moot?

    I ran a test yesterday with our DBA: he ran several SQL statements
    that did reads and writes against our Oracle database, and vmstat
    showed periods of high 'po' values when they all ran at the same time.

    In the /etc/security/limits file, here is what I have:

    oracle:
    fsize = -1
    data = -1
    rss = -1
    stack = -1
    cpu = -1
    nofiles = -1

    This pretty much allows the oracle user unlimited use of resources,
    including memory. Oracle told me to do this. What do you guys think
    of putting a hard limit on rss (rss_hard) instead, so oracle can't
    usurp all of the memory?


    Here are some statistics from the system during the window I
    specified above.

    # vmstat -I 2

    kthr     memory         page                         faults          cpu
    -------- -------------- ---------------------------- --------------- -----------
     r  b  p      avm   fre   fi  fo  pi   po   fr    sr   in    sy   cs us sy id wa
     1  0  0  1269791  2305  271  25 150    0  183  5736  281  2081  616 22  3 37 39
     0  1  0  1269826  3601  250  73 174   60 1128 32260  400  4178  920 17  5 37 41
     2  1  0  1264021  8311  410   6 137    0    0     0  285  2690  729 17  2 39 41
     0  1  0  1262058  9109  410  32 154    0    0     0  341  2006  737 14  3 41 43
     1  1  0  1265977  4259  327   4 138    0    0     0  270  2932  594 22  3 37 38
     0  1  0  1270081  2322  301   8 162   27 1546 34085  577  4924 1136 17  5 38 40
     1  1  0  1270085  3823  310  22 164   44 1260 18072  442  3102  957 14  4 40 42
     1  1  0  1264271  8398  485  26 145    0    0     0  285  1333  648  9  2 43 45
     1  1  0  1262308  9653  206   7 148    0    0     0  256  1993  565 23  2 37 38
     1  0  0  1266088  5085  271  30 104    0    0     0  259  2841  563 39  3 28 31
     2  0  0  1264067  8189  561  11  78   31 1187 23368  352  6659  754 58  6 17 19
     5  0  0  1272173  3331  801   9  72  574 2266 32042  455  9520  720 42  7 24 27
     1  1  0  1266288  7477  807   6  62    0    0     0  280  2056  576 35  2 30 32
     1  1  0  1266024  6482  589   5  37    0    0     0  582  6605 1196 51  4 22 23
     2  0  0  1270404  2423  562  11  45  137  769  5268  648 10259 1271 52  7 20 21
     0  1  0  1274318  2226  682   6  48 1543 2633  3702  488  4457  585 39  7 27 28
     0  1  0  1269710  6955   60  17  40  129  129   151  546  9169 1115 28  6 54 12


    # svmon -G

                   size      inuse       free        pin    virtual
    memory      1261568    1245228      16340     132874    1188198
    pg space    3637248     254426

                   work       pers       clnt
    pin          132874          0          0
    in use      1024221          0     221007

    PageSize   PoolSize      inuse       pgsp        pin    virtual
    s   4 KB          -    1194956     238202     100378    1122966
    m  64 KB          -       3142       1014       2031       4077
    #
    # lsps -a

    Page Space  Physical Volume  Volume Group    Size  %Used  Active  Auto  Type
    paging00    hdisk3           oraclevg      8192MB      8     yes   yes    lv
    hd6         hdisk1           rootvg        6016MB      7     yes   yes    lv
    #

    # vmo -a
    cpu_scale_memp = 8
    data_stagger_interval = 161
    defps = 1
    force_relalias_lite = 0
    framesets = 2
    htabscale = n/a
    kernel_heap_psize = 4096
    large_page_heap_size = 0
    lgpg_regions = 0
    lgpg_size = 0
    low_ps_handling = 1
    lru_file_repage = 0
    lru_poll_interval = 10
    lrubucket = 131072
    maxclient% = 80
    maxfree = 1088
    maxperm = 968863
    maxperm% = 80
    maxpin = 1018607
    maxpin% = 80
    mbuf_heap_psize = 65536
    memory_affinity = 1
    memory_frames = 1261568
    memplace_data = 2
    memplace_mapped_file = 2
    memplace_shm_anonymous = 2
    memplace_shm_named = 2
    memplace_stack = 2
    memplace_text = 2
    memplace_unmapped_file = 2
    mempools = 1
    minfree = 960
    minperm = 242215
    minperm% = 20
    nokilluid = 0
    npskill = 36608
    npsrpgmax = 292864
    npsrpgmin = 219648
    npsscrubmax = 292864
    npsscrubmin = 219648
    npswarn = 146432
    num_spec_dataseg = 0
    numpsblks = 4685824
    page_steal_method = 0
    pagecoloring = n/a
    pinnable_frames = 1144691
    pta_balance_threshold = n/a
    relalias_percentage = 0
    rpgclean = 0
    rpgcontrol = 2
    scrub = 0
    scrubclean = 0
    soft_min_lgpgs_vmpool = 0
    spec_dataseg_int = 512
    strict_maxclient = 1
    strict_maxperm = 0
    v_pinshm = 0
    vm_modlist_threshold = -1
    vmm_fork_policy = 1
    vmm_mpsize_support = 1


    I just think we have too little physical memory for our workload, but I
    would love to hear what you guys think and how your AIX/Oracle
    environments are tuned.

    Thanks!


  2. Re: AIX Oracle Slowness Issue

    dawaves wrote:

    > Hello,
    >
    > I'm coming across this odd issue where every so often (really
    > sporadic), my p5 510 running 1.9Ghz 1-way with 5GB of RAM on AIX 5.3
    > ML04 will run in a super slow state. It always happens on a weekday,
    > always between 9:15AM - 10:00AM, and the only fix seems to be a reboot.
    > I try to run commands during this state but the response is too long
    > for me to wait since this is a Production machine. I run my commands
    > from the console too, but still takes too long. I can ping the server,
    > but my oracle apps can't connect.
    >
    > I can't really tell if the system is 'Thrashing', but isn't that a
    > tell-tale sign of a system that takes forever to respond?


    From
    http://publib16.boulder.ibm.com/pser...d/memperf5.htm
    and according to your vmstat output:

    ....
    If you notice that the system is paging out to paging space, it could
    be that the file repaging rate is higher than the computational
    repaging rate since the number of file pages in memory is below the
    maxperm value. So, in this case we can prevent computational pages from
    being paged out by lowering the maxperm value to something lower than
    the numperm value. Since the numperm value is approximately 36%, we
    could lower the maxperm value down to 30%. Therefore, the page
    replacement algorithm only steals file pages.
    ....

    The enhanced JFS file system uses client pages for its buffer cache,
    which are not affected by the maxperm and minperm threshold values. To
    establish hard limits on enhanced JFS file system cache, you can tune
    the maxclient parameter. This parameter represents the maximum number
    of client pages that can be used for buffer cache. To change this
    value, you can use the vmo -o maxclient command. The value for
    maxclient is shown as a percentage of real memory.

    ....

    Conclusion:
    Run 'vmstat -v', look at the numperm value, and set maxperm to a
    percentage lower than numperm. Also do not forget to set maxclient
    to a value lower than or equal to maxperm.
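
    The rule of thumb above can be sketched in Python. This is only an
    illustration, not an IBM tool: the sample text imitates the
    "<value> <name> percentage" lines that 'vmstat -v' prints, the 36%
    numperm figure is the one from the IBM excerpt quoted above, and the
    function names parse_percentages and check_candidate are made up for
    this example.

```python
import re

# Illustrative text in the "<value> <name> percentage" layout of `vmstat -v`.
# The numbers are assumptions for the example (36% numperm comes from the
# IBM excerpt quoted above), not measurements from the OP's machine.
SAMPLE = """\
 80.0 maxpin percentage
 20.0 minperm percentage
 80.0 maxperm percentage
 36.0 numperm percentage
 36.0 numclient percentage
 80.0 maxclient percentage
"""


def parse_percentages(text):
    """Return {name: value} for every '<number> <name> percentage' line."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\d.]+)\s+(\w+) percentage\s*$", line)
        if match:
            values[match.group(2)] = float(match.group(1))
    return values


def check_candidate(stats, maxperm, maxclient):
    """List the ways a candidate setting violates the rule of thumb."""
    problems = []
    if maxperm >= stats["numperm"]:
        problems.append("maxperm% is not below numperm%")
    if maxclient > maxperm:
        problems.append("maxclient% is above maxperm%")
    return problems


stats = parse_percentages(SAMPLE)
print("numperm% =", stats["numperm"])
print("maxperm%=30, maxclient%=30:", check_candidate(stats, 30, 30))
print("maxperm%=80, maxclient%=80:", check_candidate(stats, 80, 80))
```

    With the sample values, 30/30 passes both checks, while the default
    80/80 fails the first one because maxperm% is still above numperm%.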

    This should fix the immediate problem so that you can interact with
    the system. As the next step you should figure out what causes so
    much file I/O. I am not talking about Oracle working with its
    databases on the CIO-mounted filesystem (are they really CIO
    mounted?). It could be NFS, the creation of large temporary files,
    backups, or anything else that reads or writes many and/or large
    files.

    nmon is a nice and simple tool to check for programs that cause high
    I/O.

    hth
    Hajo


  3. Re: AIX Oracle Slowness Issue

    Hi,

    I think you're right about having too little memory.

    But if your SGA uses just 1.8 GB, what is using the other 2/3 of the
    memory? It is not file caching (see the high avm value).

    Try configuring WLM in passive mode (see /etc/wlm/current):
    One class for each oracle user.
    One class for each application user (SAP, WebSphere, ...).

    Please provide the output of "wlmstat".

    Andy






    On 14 Nov 2006 09:30:34 -0800, "dawaves" wrote:

    >I just think we have too little physical memory for our workload, but I
    >would love to hear what you guys think and how your AIX/Oracle
    >environments are tuned.
    >
    >Thanks!



    --
    Andreas Beckmann
    Andreas.Beckmann@muenster.de
    http://www.muenster.de/~andy

  4. Re: AIX Oracle Slowness Issue

    Sorry, me again.

    Please provide oracle version and output of "ipcs -ma" as well.

    Andy


    On Wed, 15 Nov 2006 08:47:52 +0100, Andreas Beckmann
    wrote:

    >Hi,
    >
    >I think you're right having to less memory.
    >
    >But if your SGA just uses 1.8 G, what is using the other 2/3 of it. It
    >is not file caching (see the high avm value).
    >
    >Try to config wlm in passive mode (see /etc/wlm/current):
    >One class for each oracle user.
    >One class for each application user (SAP, WebSphere,...).
    >
    >Please provide the output of "wlmstat".
    >
    >Andy
    >
    >
    >
    >
    >
    >
    >On 14 Nov 2006 09:30:34 -0800, "dawaves" wrote:
    >
    >>I just think we have too little physical memory for our workload, but I
    >>would love to hear what you guys think and how your AIX/Oracle
    >>environments are tuned.
    >>
    >>Thanks!



    --
    Andreas Beckmann
    Andreas.Beckmann@muenster.de
    http://www.muenster.de/~andy

  5. Re: AIX Oracle Slowness Issue

    Andreas Beckmann wrote:
    > Hi,
    >
    > I think you're right having to less memory.


    I disagree.
    >
    > But if your SGA just uses 1.8 G, what is using the other 2/3 of it. It
    > is not file caching (see the high avm value).


    If computational pages are getting paged out, they will use paging
    space, so the avm number will increase.

    The vmstat output from the OP clearly shows:

    Paging occurs (pi/po are non-zero) BUT free memory does NOT go down
    to zero, and file cache thrashing also occurs (fi/fo are non-zero).
    The min/maxperm and maxclient parameters seem to be unchanged.

    This means that the machine is configured mainly as a file server
    and not as a DB server.
    So IMHO the OP should change the parameters named in my previous post.
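
    A minimal Python sketch of this reading of the vmstat output: it
    parses a few rows copied from the original post and flags the
    intervals with page-outs to paging space. The column list follows
    the 'vmstat -I' header in that post; the names sy_calls, parse_rows
    and paging_out are made up here for illustration.

```python
# Column order of `vmstat -I` as shown in the original post. The first
# "sy" (system calls) is renamed sy_calls so it does not clash with the
# CPU "sy" column; that name is an assumption of this sketch.
COLUMNS = ["r", "b", "p", "avm", "fre", "fi", "fo", "pi", "po",
           "fr", "sr", "in", "sy_calls", "cs", "us", "sy", "id", "wa"]

# Four rows copied from the original post, with wrapped lines rejoined.
ROWS = """\
1 0 0 1269791 2305 271 25 150 0 183 5736 281 2081 616 22 3 37 39
0 1 0 1269826 3601 250 73 174 60 1128 32260 400 4178 920 17 5 37 41
2 1 0 1264021 8311 410 6 137 0 0 0 285 2690 729 17 2 39 41
0 1 0 1274318 2226 682 6 48 1543 2633 3702 488 4457 585 39 7 27 28
"""


def parse_rows(text):
    """Turn whitespace-separated vmstat rows into dicts keyed by column."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a full numeric row.
        if len(fields) != len(COLUMNS):
            continue
        if not all(field.isdigit() for field in fields):
            continue
        samples.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return samples


def paging_out(samples):
    """Return the intervals in which pages went out to paging space."""
    return [s for s in samples if s["po"] > 0]


samples = parse_rows(ROWS)
for s in paging_out(samples):
    print(f"po={s['po']} pi={s['pi']} fre={s['fre']} fi={s['fi']} fo={s['fo']}")
```

    Two of the four sample rows have po greater than zero, and in both
    of them fre stays above 2000 pages while fi and fo are non-zero.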

    regards
    Hajo


  6. Re: AIX Oracle Slowness Issue

    dawaves wrote:
    > Hello,
    >
    > I'm coming across this odd issue where every so often (really
    > sporadic), my p5 510 running 1.9Ghz 1-way with 5GB of RAM on AIX 5.3
    > ML04 will run in a super slow state. It always happens on a weekday,
    > always between 9:15AM - 10:00AM, and the only fix seems to be a reboot.
    > I try to run commands during this state but the response is too long
    > for me to wait since this is a Production machine. I run my commands
    > from the console too, but still takes too long. I can ping the server,
    > but my oracle apps can't connect.


    Feel free to follow the advice given by the other guys but also answer
    the following questions:

    1. Has anything changed on the system recently i.e. new fix installed?
    2. Have the Oracle DBA's set any new or changed any jobs in cron that
    run around that time?
    3. Have you added any other cron or at jobs to run around that time?
    4. Do any jobs run over the weekend that are running into Monday and
    hanging i.e. backup jobs or scripts to monitor things such as backup?

    Has this problem always happened or is it recent?

    Thanks

  7. Re: AIX Oracle Slowness Issue

    Hello,

    Regarding the question in my original post:
    "I am using CIO mounted filesystems for Oracle. Since I'm using CIO,
    isn't tweaking the vmo values moot?"

    There are no cron jobs that run during that time, nor do any cron
    jobs fail. The DBAs have not done any tweaking either.

    I think it has something to do with giving the "oracle" user "-1"
    values in the /etc/security/limits file for pretty much everything.
    'rss' is set to "-1", which means the oracle user can essentially
    usurp all available memory for its own processes outside the scope
    of the SGA.


    Silver Mane wrote:
    > dawaves wrote:
    > > Hello,
    > >
    > > I'm coming across this odd issue where every so often (really
    > > sporadic), my p5 510 running 1.9Ghz 1-way with 5GB of RAM on AIX 5.3
    > > ML04 will run in a super slow state. It always happens on a weekday,
    > > always between 9:15AM - 10:00AM, and the only fix seems to be a reboot.
    > > I try to run commands during this state but the response is too long
    > > for me to wait since this is a Production machine. I run my commands
    > > from the console too, but still takes too long. I can ping the server,
    > > but my oracle apps can't connect.

    >
    > Feel free to follow the advice given by the other guys but also answer
    > the following questions:
    >
    > 1. Has anything changed on the system recently i.e. new fix installed?
    > 2. Have the Oracle DBA's set any new or changed any jobs in cron that
    > run around that time?
    > 3. Have you added any other cron or at jobs to run around that time?
    > 4. Do any jobs run over the weekend that are running into Monday and
    > hanging i.e. backup jobs or scripts to monitor things such as backup?
    >
    > Has this problem always happened or is it recent?
    >
    > Thanks



  8. Re: AIX Oracle Slowness Issue

    Hi Hajo,

    You're right that the machine is configured as a file server
    (minperm%=20) and numclient is approx. 17%, so the VMM is working as
    it should.
    Changing minperm%/maxclient% to 3 and 8 will help a lot.

    But my question is:
    Who is using the remaining 50% of the memory?
    The machine would be at 98% computational if there were no caching,
    and that is in my opinion too much.
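
    The percentages can be checked with a little arithmetic on the
    'svmon -G' counters from the original post. The Python sketch below
    assumes those counters are in 4 KB pages and, as a simplification,
    that the whole 1.8 GB SGA sits in working-storage pages; it ignores
    the possibility that part of the SGA is paged out.

```python
PAGE_KB = 4  # assumption: the svmon -G summary counters are 4 KB pages

# Counters copied from the `svmon -G` output in the original post.
svmon = {"size": 1261568, "inuse": 1245228, "work": 1024221, "clnt": 221007}
SGA_MB = 1.8 * 1024  # SGA size stated by the OP


def pages_to_mb(pages):
    """Convert a count of 4 KB pages to megabytes."""
    return pages * PAGE_KB / 1024


def percent_of_ram(pages):
    """Express a page count as a percentage of real memory."""
    return 100.0 * pages / svmon["size"]


for name in ("inuse", "work", "clnt"):
    print(f"{name:5s} {pages_to_mb(svmon[name]):8.1f} MB "
          f"{percent_of_ram(svmon[name]):5.1f}% of RAM")

# Working-storage memory that the SGA alone does not account for.
non_sga_work_mb = pages_to_mb(svmon["work"]) - SGA_MB
print(f"work pages beyond the SGA: {non_sga_work_mb:.0f} MB")
```

    Under those assumptions, working storage is about 81% of RAM and
    client (file cache) pages about 17.5%, which together make up the
    roughly 98% in use. Somewhat more than 2 GB of working storage is
    not explained by the SGA.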

    Andy




    On 15 Nov 2006 01:27:55 -0800, "Hajo Ehlers"
    wrote:

    >Andreas Beckmann wrote:
    >> Hi,
    >>
    >> I think you're right having to less memory.

    >
    >I disagree.
    >>
    >> But if your SGA just uses 1.8 G, what is using the other 2/3 of it. It
    >> is not file caching (see the high avm value).

    >
    >If computational pages getting paged out then they will use paging
    >space. So the avm number will increase.
    >
    >The vmstat output from the OP shows clearly:
    >
    >Paging occurs( pi/po not equal zero ) BUT free memory DOES NOT goes
    >down zero
    >and also file cache trashing occurs ( fi/fo not equal zero )
    >The min/maxperm and maxclient parameter seems tobe unchanged.
    >
    >This means that the machine is mainly configured as a file server and
    >not as DB server.
    >So imho the OP should change the parameter named in my previous post.
    >
    >regards
    >Hajo



    --
    Andreas Beckmann
    Andreas.Beckmann@muenster.de
    http://www.muenster.de/~andy

  9. Re: AIX Oracle Slowness Issue

    Hi,

    I am seeing an issue on my AIX 5300-08 system with 80896 MB of real memory. It runs an Oracle database on raw devices.

    Sometimes I see node evictions due to high memory utilization, where numperm% falls below minperm%, causing more paging.

    Here are the vmo settings and vmstat output:

    # vmo -a
    cpu_scale_memp = 8
    data_stagger_interval = 161
    defps = 1
    force_relalias_lite = 0
    framesets = 2
    htabscale = n/a
    kernel_heap_psize = 4096
    kernel_psize = 16777216
    large_page_heap_size = 0
    lgpg_regions = 0
    lgpg_size = 0
    low_ps_handling = 1
    lru_file_repage = 0
    lru_poll_interval = 10
    lrubucket = 131072
    maxclient% = 3
    maxfree = 9152
    maxperm = 997713
    maxperm% = 5
    maxpin = 16698695
    maxpin% = 80
    mbuf_heap_psize = 4096
    memory_affinity = 0
    memory_frames = 20709376
    memplace_data = 2
    memplace_mapped_file = 2
    memplace_shm_anonymous = 2
    memplace_shm_named = 2
    memplace_stack = 2
    memplace_text = 2
    memplace_unmapped_file = 2
    mempools = 3
    minfree = 960
    minperm = 199541
    minperm% = 1
    nokilluid = 0
    npskill = 281600
    npsrpgmax = 2252800
    npsrpgmin = 1689600
    npsscrubmax = 2252800
    npsscrubmin = 1689600
    npswarn = 1126400
    num_spec_dataseg = 0
    numpsblks = 36044800
    page_steal_method = 1
    pagecoloring = n/a
    pinnable_frames = 18876995
    psm_timeout_interval = 5000
    pta_balance_threshold = n/a
    relalias_percentage = 0
    rpgclean = 0
    rpgcontrol = 2
    scrub = 0
    scrubclean = 0
    soft_min_lgpgs_vmpool = 0
    spec_dataseg_int = 512
    strict_maxclient = 0
    strict_maxperm = 0
    v_pinshm = 0
    vm_modlist_threshold = -1
    vmm_fork_policy = 1
    vmm_mpsize_support = 0
    wlm_memlimit_nonpg = 1



    # vmstat -v
    20709376 memory pages
    19954294 lruable pages
    1109952 free pages
    3 memory pools
    1831723 pinned pages
    80.0 maxpin percentage
    1.0 minperm percentage
    5.0 maxperm percentage
    11.4 numperm percentage
    2293811 file pages
    0.0 compressed percentage
    0 compressed pages
    11.4 numclient percentage
    3.0 maxclient percentage
    2293811 client pages
    0 remote pageouts scheduled
    15187 pending disk I/Os blocked with no pbuf
    0 paging space I/Os blocked with no psbuf
    2228 filesystem I/Os blocked with no fsbuf
    3 client filesystem I/Os blocked with no fsbuf
    70 external pager filesystem I/Os blocked with no fsbuf
    0 Virtualized Partition Memory Page Faults
    0.00 Time resolving virtualized partition memory page faults
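
    As a quick cross-check of these figures, here is a small Python
    sketch. It assumes memory_frames counts 4 KB frames and only looks
    at this one 'vmstat -v' snapshot; the helper name position is made
    up for the example.

```python
# Values copied from the vmo -a and vmstat -v output above.
memory_frames = 20709376          # assumed to be 4 KB frames
minperm, maxperm, numperm = 1.0, 5.0, 11.4

# 4 KB frames -> MB
real_mb = memory_frames * 4 // 1024


def position(numperm, minperm, maxperm):
    """Say where numperm% sits relative to the minperm%/maxperm% band."""
    if numperm < minperm:
        return "below minperm"
    if numperm > maxperm:
        return "above maxperm"
    return "between minperm and maxperm"


print("real memory (MB):", real_mb)
print("numperm% is", position(numperm, minperm, maxperm))
```

    The frame count works out to the 80896 MB mentioned above. In this
    particular snapshot numperm% (11.4) is above maxperm% (5.0) rather
    than below minperm% (1.0), so it may be worth capturing 'vmstat -v'
    again while the problem is actually occurring.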

    Thanks in Advance!
