What's causing this System Slowdown??? - vmstat included - Aix


  1. What's causing this System Slowdown??? - vmstat included

    Our Environment:
    Hardware:
    P55A
    24 GB of memory
    SAN - fiber attached Hitachi Tagma
    Software:
    AIX 5.3 (5300-06-03-0732)
    DB2 9.1.3
    Tomcat 4.1.24 (services INTRA-net accesses only)
    Apache 2.52 (services INTRA-net accesses only)

    We had a severe system slowdown for about 15 minutes on 08/12.
    We had a vmstat running with a 30 second sample period.
    I have attached a snippet of the output of that vmstat.

    The only thing KNOWN to be out of the ordinary during that time was a
    new "validation" program that was being run. For each file in a
    directory, it was invoked to validate the contents of that file (each
    around 10 MB). The program would exit after validating the file, then
    be invoked again to handle the next one.
    Total read during the sample period was about 30 GB. The files were
    all on the SAN. The program made no "external" calls (i.e. no DB2, no
    other files, etc.), other than reading the file into memory at
    invocation.

    I'm primarily a developer, and have had little experience in system
    configuration or vmstat interpretation... Any insights from AIX gurus
    would be appreciated.

    Here is the snippet of the problem time...
    ....Sorry about the formatting, but the width of the input window is
    pretty small...
    I could email it to someone as an attachment and it would be easier to
    view.

    BTW, I pipe vmstat's output through an awk script to prepend a
    timestamp to each line.

    -tony
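
The timestamping step mentioned above can be sketched in Python. This is not the poster's awk script: the function name `prepend_timestamps` and the `clock` parameter are hypothetical, and the `MM/DD/YY:HH:MM:SS:` layout is an assumption inferred from the stamps in the output below.

```python
import io
import time

def prepend_timestamps(lines, out, clock=time.localtime):
    """Copy each line to `out`, prefixed with an MM/DD/YY:HH:MM:SS: stamp."""
    for line in lines:
        # Assumed stamp layout, inferred from the posted output.
        stamp = time.strftime("%m/%d/%y:%H:%M:%S", clock())
        out.write(f"{stamp}: {line.rstrip()}\n")
        out.flush()  # keep output current when used in a live pipeline

# Demonstration with a fixed clock so the result is reproducible.
fixed = lambda: time.struct_time((2008, 8, 12, 11, 1, 53, 1, 225, -1))
demo = io.StringIO()
prepend_timestamps(["r b avm fre\n", "2 1 3293107 106639\n"], demo, clock=fixed)
print(demo.getvalue(), end="")
```

Passing `sys.stdin` and `sys.stdout` instead of the demo objects would turn this into a filter that could sit behind `vmstat 30` in a pipeline.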

    08/12/08:11:01:53: kthr  memory         page                      faults            cpu
    08/12/08:11:01:53: ----- -------------- ------------------------- ----------------- -----------
    08/12/08:11:01:53:  r  b     avm    fre re  pi  po    fr    sr cy   in     sy    cs us sy id wa
    08/12/08:11:01:53:  2  1 3293107 106639  0   3   0     0     0  0 1524 248777 10468 49 13 35  3
    08/12/08:11:02:23:  3  1 3292328 117308  0   2   0     0     0  0 1261 207811  8150 37 11 48  4
    08/12/08:11:02:53:  2  1 3292060 108447  0   2   0     0     0  0 1185 208005  7921 38 11 48  3
    08/12/08:11:03:23:  5  3 3294551 105152  0  16   0     0     0  0 1361 274026  8826 55 19 25  2
    08/12/08:11:03:53:  5  1 3129244 261357  0   6   0     0     0  0 1219 208419  7604 39 10 49  2
    08/12/08:11:04:23:  2  1 3130064 260469  0   5   0     0     0  0 1281 210099  9071 40 11 45  4
    08/12/08:11:04:53:  2  1 3132265 259319  0   1   0     0     0  0 1374 205440  8436 38 11 48  3
    08/12/08:11:05:23:  2  1 3130582 252820  0   2   0     0     0  0 1165 206436 18703 38 12 46  3
    08/12/08:11:05:53:  2  1 3153178 227794  0   1   0     0     0  0 1342 214249 10364 41 12 43  4
    08/12/08:11:06:23:  2  1 3154147 215955  0   2   0     0     0  0 1267 216599  9438 43 12 42  3
    08/12/08:11:06:53:  2  1 3153881 217783  0   1   0     0     0  0 1124 192189  7660 34 10 52  3
    08/12/08:11:07:23:  3  1 3154512 205295  0   2   0     0     0  0 1350 200180  8326 39 11 46  4
    08/12/08:11:07:53:  2  1 3154275 208024  0   0   0     0     0  0 1085 181783  7097 31 10 55  4
    08/12/08:11:08:23:  2  1 3157057 197642  0   2   0     0     0  0 1300 231004  8249 43 16 29 12
    08/12/08:11:08:53:  7  1 3157289 185056  0   8   0     0     0  0 2180 312069 13348 72 16  7  5
    08/12/08:11:09:23:  6  1 3156596 168716  0   2   0     0     0  0 2070 285178 11812 66 15 13  5
    08/12/08:11:09:53:  5  1 3157545 158788  0   2   0     0     0  0 1854 268561 10453 61 14 19  6
    08/12/08:11:10:23:  2  0 3157122 148630  0   2   0     0     0  0 1471 242591 21023 54 15 24  7
    08/12/08:11:10:54:  2  1 3157218 139970  0   0   0     0     0  0 1216 177244  7919 31 10 49 11
    08/12/08:11:11:24:  3  1 3157290 140217  0   0   0     0     0  0 1469 208662  9086 42 12 39  7
    08/12/08:11:11:54: kthr  memory         page                      faults            cpu
    08/12/08:11:11:54: ----- -------------- ------------------------- ----------------- -----------
    08/12/08:11:11:54:  r  b     avm    fre re  pi  po    fr    sr cy   in     sy    cs us sy id wa
    08/12/08:11:11:54:  3  1 3157720 136993  0   7   0     0     0  0 1632 205286 10742 39 12 35 14
    08/12/08:11:12:24:  3  3 3157802   2990  0   1  36  1939  5302  0 1808 228485 10947 46 15 28 11
    08/12/08:11:12:54:  2  0 3157788  10763  0   3  21   375   858  0 1743 217114 12245 43 13 31 13
    08/12/08:11:13:24:  3  0 3157761   3486  0   2   0    68   147  0 1334 239891  8735 46 18 33  3
    08/12/08:11:13:55:  3  4 3159989   2363  0   9 176  7406 18336  0 1440 181831  6952 33 15 38 14
    08/12/08:11:14:24:  3 11 3179320   3294  0  19 374  9121 24184  0 1452 130179  5793 17 12 23 47
    08/12/08:11:14:54:  2  9 3180061   4809  0  18 425 15920 39474  0 1333 120019  4695 15 16 29 41
    08/12/08:11:15:24:  2 15 3183757   1542  0  19 400 12365 31747  0 1413 109558  5099 13 13 33 41
    08/12/08:11:15:55:  2 14 3199993   2410  0  24 372  9714 26028  0 1161  89996  3814 12 11 34 43
    08/12/08:11:16:25:  3 16 3182806   4828  0  21 363  5113 14519  0 1074  64791  2823  9  8 32 50
    08/12/08:11:16:56:  5 13 3194130   2573  0 211  46   812  1776  0 1243 242160 20758 35 17 19 28
    08/12/08:11:17:25: 11 14 3187688   1472  0  55 279  2634  7666  0 1204  68444  6329 17  9 26 48
    08/12/08:11:17:55:  2 25 3190256   4416  0  17 379  7627 18030  0 1119  46477  3082  9  9 22 59
    08/12/08:11:18:25:  1 26 3183827   4939  0  18 428 12375 35764  0 1054  30888  2278 10 12 34 45
    08/12/08:11:18:55:  2 19 3183589   9245  0  52 369 13818 34041  0 1019  77291  2970 18 18 15 48
    08/12/08:11:19:25:  1 10 3179271   3289  0  57 275 14297 35169  0 1291 115605  4003 17 16 29 38
    08/12/08:11:19:55:  2  4 3178155   2791  0 118 118  6986 17326  0 1051 123809  4449 16 10 41 33
    08/12/08:11:20:27:  1 10 3179367   1641  0  21 377 14535 34682  0 1242  89201  2980 14 14 31 41
    08/12/08:11:20:56:  1 14 3179329   2910  0  23 425 11814 30442  0  900  26656  1716  5 12 39 44
    08/12/08:11:21:28:  2 12 3182451   1520  0  19 385 11270 28834  0  998  30686  2234  6 10 40 43
    08/12/08:11:21:56: kthr  memory         page                      faults            cpu
    08/12/08:11:21:56: ----- -------------- ------------------------- ----------------- -----------
    08/12/08:11:21:56:  r  b     avm    fre re  pi  po    fr    sr cy   in     sy    cs us sy id wa
    08/12/08:11:21:56:  1 18 3180416   3491  0  23 442 10451 26235  0 1140  38194  2355  6 10 40 44
    08/12/08:11:22:26:  2 11 3179764   6732  0  62 285  8532 20268  0 1023 104247  4730 15 11 36 38
    08/12/08:11:22:56:  1  2 3179276   1697  0 162 114  4975 14011  0  900 140916 18617 14 11 52 24
    08/12/08:11:23:26:  1 13 3180782   4497  0  17 389 11109 26601  0 1295  26219  2629  7 12 39 42
    08/12/08:11:23:58:  1 13 3179567   4236  0  21 404  9003 22203  0 1170  42502  2672  9 14 31 46
    08/12/08:11:24:26:  1 22 3180279   4955  0  15 451  5972 15736  0 1095   8407  2453  3  7 41 49
    08/12/08:11:24:56:  2 26 3181779   1460  0  16 405  4940 12500  0  938  12303  2623  4  7 41 47
    08/12/08:11:25:26:  1 22 3180173   4909  0  29 444 10988 28151  0  985  21215  1937  5 11 40 44
    08/12/08:11:25:57:  1 19 3182919   3075  0  73 305 10123 26839  0  883  22817  2118  7 10 40 43
    08/12/08:11:26:26:  2  4 3177907   2500  0 218   0   150   172  0  823  68657  4001 16  6 50 28
    08/12/08:11:26:56:  1  1 3178997  10359  0 199   4   300   372  0  677  58815  3567  9  5 64 22
    08/12/08:11:27:26:  1  2 3182012   2678  0 167   3  1215  1586  0  960  99257  4900 17  8 53 22
    08/12/08:11:27:56:  2  2 3185450   3816  0  90  18  1815  4148  0  774  80650 18784 11  9 56 24
    08/12/08:11:28:26:  1  1 3183627   3481  0 112   1    93   274  0  746  96248  4114 18 12 51 19
    08/12/08:11:28:56:  2  1 3186212   2724  0 107  10   638  1950  0  940  85125  4097 30  6 46 18
    08/12/08:11:29:26:  2  3 3185499   2710  0 117  11   589  1163  0  777  87577  4282 38  5 32 25
    08/12/08:11:29:56:  1  2 3190047   3241  0 237   3   453   906  0  640  50321  3390  8  4 60 28
    08/12/08:11:30:26:  1  1 3195927   2615  0 174   3   465  1027  0  798 135787  4074 23 12 45 19
    08/12/08:11:30:56:  1  2 3199394   2781  0 194   1   638  2120  0  839 178701  8089 34  9 39 18
    08/12/08:11:31:26: 10  9 3194085   7780  0 258   7   576  1998  0 1324 179284  8888 32 10 25 32
    08/12/08:11:31:57: kthr  memory         page                      faults            cpu
    08/12/08:11:31:57: ----- -------------- ------------------------- ----------------- -----------
    08/12/08:11:31:57:  r  b     avm    fre re  pi  po    fr    sr cy   in     sy    cs us sy id wa
    08/12/08:11:31:57: 14  2 3205750   4711  0 172   4  1036  2450  0 1460 215655 11300 49 13 27 11
    08/12/08:11:32:27: 25  3 3199825   8531  0 107  11   829  1438  0 1931 301679 16571 80 17  1  1
    08/12/08:11:32:58: 22  1 3198273   5145  0  74  14   968  2160  0 2161 312359 12788 77 18  3  2
    08/12/08:11:33:28:  4  1 3197654   3771  0  46   8   423  3441  0 1460 262222  9683 53 19 24  4
    08/12/08:11:33:58:  2  1 3196895  10585  0  31   0   223   291  0 1152 200450  7812 43 13 40  5
    08/12/08:11:34:28:  2  1 3196478   4052  0  40   0   149   205  0 1189 168226  6658 32 10 48 10
    08/12/08:11:34:58:  2  1 3196308  13650  0  22   2   327   444  0 1111 178132  7631 40 11 45  4
    08/12/08:11:35:28:  3  1 3196660   3332  0  21   0   126   157  0 1103 172811  9124 43 12 41  5
    08/12/08:11:35:58:  2  1 3196863   3764  0  14   2   333   436  0  951 164650  6979 31 10 55  4
    08/12/08:11:36:28:  3  1 3196717   9457  0  16   1   205   253  0 1281 225138  8751 54 14 29  3
    08/12/08:11:36:58:  2  1 3192755   5072  0   9   0     0     0  0  979 178366  7478 36 11 51  2
    08/12/08:11:37:28:  3  1 3193788  10970  0  20   0   233   286  0 1241 212272  7893 48 14 36  3
    08/12/08:11:37:58:  2  1 3193739   4667  0  18   0   139   167  0 1163 226059 21807 41 15 40  4
    08/12/08:11:38:28:  4  1 3193988   3136  0  26   1   387   476  0 1410 237995  8810 50 18 29  3
    08/12/08:11:38:58:  2  1 3193739   2905  0  12   1   125   145  0 1205 171887  7939 34 11 47  8
    08/12/08:11:39:28:  2  1 3193270   3285  0   6   0   384   470  0 1011 163260  7038 31 11 53  5
    08/12/08:11:39:58:  2  1 3213221   3694  0  14   4   768  1447  0 1142 177081  9662 36 12 46  6
    08/12/08:11:40:28:  2  1 3213694   4683  0  26   3   393  2490  0 1062 143089  7920 38 11 47  4
    08/12/08:11:40:58:  3  1 3214250  12205  0  14   7   251  1472  0 1229 215161  8769 48 14 34  4
    08/12/08:11:41:28:  2  1 3214536  12063  0  11   0     0     0  0 1026 158716  7300 30 10 56  3
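
For anyone who wants to locate the problem window mechanically rather than by eye, here is a minimal Python sketch. It assumes each sample has been rejoined onto a single line (timestamp plus the 17 vmstat columns). The names `Sample`, `parse_row` and `paging_intervals` are made up for illustration, and the `po`/`wa` thresholds are arbitrary values, not AIX guidance.

```python
from typing import NamedTuple, Optional

class Sample(NamedTuple):
    stamp: str
    b: int    # threads blocked waiting on I/O
    fre: int  # free-list size in pages
    pi: int   # pages paged in from paging space
    po: int   # pages paged out to paging space
    wa: int   # CPU % spent waiting on I/O

def parse_row(line: str) -> Optional[Sample]:
    """Parse one timestamped vmstat data row; return None for header lines."""
    fields = line.split()
    if len(fields) != 18 or not fields[1].isdigit():
        return None
    # Column order: r b avm fre re pi po fr sr cy in sy cs us sy id wa
    n = [int(f) for f in fields[1:]]
    return Sample(fields[0].rstrip(":"), n[1], n[3], n[5], n[6], n[16])

def paging_intervals(samples, po_min=100, wa_min=30):
    """Keep samples with heavy page-out and high I/O wait (illustrative cutoffs)."""
    return [s for s in samples if s.po >= po_min and s.wa >= wa_min]

rows = [
    "08/12/08:11:11:54: r b avm fre re pi po fr sr cy in sy cs us sy id wa",
    "08/12/08:11:11:54: 3 1 3157720 136993 0 7 0 0 0 0 1632 205286 10742 39 12 35 14",
    "08/12/08:11:14:24: 3 11 3179320 3294 0 19 374 9121 24184 0 1452 130179 5793 17 12 23 47",
    "08/12/08:11:26:26: 2 4 3177907 2500 0 218 0 150 172 0 823 68657 4001 16 6 50 28",
]
samples = [s for s in map(parse_row, rows) if s is not None]
flagged = paging_intervals(samples)
for s in flagged:
    print(s.stamp, "po =", s.po, "wa =", s.wa, "fre =", s.fre)
```

Of the three data rows used here, only the 11:14:24 sample passes both cutoffs, which matches the point where `fre` has collapsed and `po` has jumped in the output above.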

  2. Re: What's causing this System Slowdown??? - vmstat included

    bennett.tony@con-way.com wrote:

    > We had a severe system slowdown for about 15 minutes on 08/12.
    > We had a vmstat running with a 30 second sample period.
    > I have attached a snippet of the output of that vmstat.
    >
    > The only thing KNOWN that was out of the ordinary during that time
    > was a new "validation" program was being run. For each file in a
    > directory, it was
    > invoked to validate the contents of the file (each around 10 meg).
    > The program would
    > exit after validating that file... then it would be invoked to handle
    > the next file.
    > Total read during the sample period was about 30 GB. The files were
    > all on the
    > SAN. The program made no "external" calls (i.e. no DB2, no other
    > files, etc.), other than reading the file into memory at invocation.
    >


    Free memory drops and your system starts to page out. It seems
    this memory is now used to cache file data.
    You may want to take a look at the vmo parameter lru_file_repage.
    How is it set? Also search the AIX documentation for the section
    "VMM page replacement tuning".
    There you will find:
    If the lru_file_repage parameter is set to 0, only file pages are stolen
    if the number of file pages in memory is greater than the value of the
    minperm parameter.

    To me it seems lru_file_repage on your system is set to 1. Then
    VMM will steal computational pages too... causing the page-outs.
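
The rule quoted above can be written as a small decision function. This is a deliberately simplified model based on one reading of the documentation, not the actual VMM algorithm: the function name `steal_targets` is hypothetical, the behaviour outside the minperm..maxperm band is an assumption, and the real stealer also weighs repage rates when `lru_file_repage` is 1. The percentages in the demo are the ones from the vmstat -v output posted later in this thread.

```python
def steal_targets(numperm, minperm, maxperm, lru_file_repage):
    """Return the page classes the stealer may take (simplified model)."""
    if numperm > maxperm:
        # Assumption: above maxperm the file cache is the only target.
        return {"file"}
    if numperm < minperm:
        # Assumption: below minperm both classes are candidates.
        return {"file", "computational"}
    if lru_file_repage == 0:
        # The documented case quoted above: only file pages are stolen.
        return {"file"}
    # lru_file_repage == 1: repage rates decide, so program pages may go too.
    return {"file", "computational"}

# numperm 67.9% with minperm 20% and maxperm 80%
for setting in (0, 1):
    print(setting, sorted(steal_targets(67.9, 20.0, 80.0, setting)))
```

With numperm sitting between minperm and maxperm, the setting of `lru_file_repage` is what determines whether computational pages are eligible in this model.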

  3. Re: What's causing this System Slowdown??? - vmstat included

    On Aug 14, 11:04 pm, Thomas Braunbeck
    wrote:
    > bennett.t...@con-way.com wrote:
    >
    > > We had a severe system slowdown for about 15 minutes on 08/12.
    > > We had a vmstat running with a 30 second sample period.
    > > I have attached a snippet of the output of that vmstat.
    > >
    > > The only thing KNOWN that was out of the ordinary during that time
    > > was a new "validation" program was being run. For each file in a
    > > directory, it was
    > > invoked to validate the contents of the file (each around 10 meg).
    > > The program would
    > > exit after validating that file... then it would be invoked to handle
    > > the next file.
    > > Total read during the sample period was about 30 GB. The files were
    > > all on the
    > > SAN. The program made no "external" calls (i.e. no DB2, no other
    > > files, etc.), other than reading the file into memory at invocation.
    >
    > Free memory drops and your system starts to page out. Seems
    > this memory is now used to cache file data.
    > You may take a look at the vmo parameter lru_file_repage.
    > How is it set? And search/read in the AIX Docu the section
    > "VMM page replacement tuning"
    > Here you may find:
    > If the lru_file_repage parameter is set to 0, only file pages are stolen
    > if the number of file pages in memory is greater than the value of the
    > minperm parameter.
    >
    > To me it seems lru_file_repage on your system is set to 1. Then
    > VMM will steal computational pages too... causing the page outs.


    Thanks for the response, Thomas.

    I can't seem to find any documentation on "lru_file_repage" in my man
    page for vmo...
    ...not to mention that I am not root on that system, and therefore
    can't run vmo.

    Are there any other commands I can run to get that info?

    As to the requested minperm...
    ...here's the vmstat -v output, which I think gives you the number you
    want:
    6291456 memory pages
    5990577 lruable pages
      53489 free pages
          1 memory pools
    1216582 pinned pages
       80.0 maxpin percentage
       20.0 minperm percentage
       80.0 maxperm percentage
       67.9 numperm percentage
    4071993 file pages
        0.0 compressed percentage
          0 compressed pages
       67.9 numclient percentage
       80.0 maxclient percentage
    4071985 client pages
          0 remote pageouts scheduled
    2022691 pending disk I/Os blocked with no pbuf
    1993637 paging space I/Os blocked with no psbuf
     802494 filesystem I/Os blocked with no fsbuf
       1836 client filesystem I/Os blocked with no fsbuf
     911534 external pager filesystem I/Os blocked with no fsbuf
          0 Virtualized Partition Memory Page Faults
       0.00 Time resolving virtualized partition memory page faults

    Thanks,
    -tony
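
To put the page counts above into more familiar units, here is a rough Python sketch. It assumes 4 KB pages (consistent with 6291456 memory pages on a 24 GB machine) and assumes that numperm is file pages divided by lruable pages. `parse_vmstat_v` is a hypothetical helper, and only a subset of the posted lines is embedded.

```python
PAGE_BYTES = 4096  # assumption: 4 KB base page size

VMSTAT_V = """\
6291456 memory pages
5990577 lruable pages
4071993 file pages
2022691 pending disk I/Os blocked with no pbuf
1993637 paging space I/Os blocked with no psbuf
802494 filesystem I/Os blocked with no fsbuf
"""

def parse_vmstat_v(text):
    """Map each 'value label' line of vmstat -v output to label -> number."""
    stats = {}
    for line in text.splitlines():
        value, _, label = line.strip().partition(" ")
        if label:
            stats[label.strip()] = float(value)
    return stats

stats = parse_vmstat_v(VMSTAT_V)
total_gib = stats["memory pages"] * PAGE_BYTES / 2**30
file_gib = stats["file pages"] * PAGE_BYTES / 2**30
# Assumed definition: numperm = file pages as a share of lruable pages.
numperm_pct = 100 * stats["file pages"] / stats["lruable pages"]

print(f"RAM: {total_gib:.1f} GiB, file cache: {file_gib:.1f} GiB")
print(f"file pages / lruable pages: {numperm_pct:.2f}%")
blocked = {k: int(v) for k, v in stats.items() if "blocked" in k}
print(blocked)
```

The computed ratio lands within a tenth of a percent of the reported 67.9 numperm figure, which suggests the assumed definition is at least close. The "blocked with no ..." counters are cumulative totals, so they say nothing on their own about when the blocking happened.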

  4. Re: What's causing this System Slowdown??? - vmstat included

    bennett.tony@con-way.com wrote:
    > On Aug 14, 11:04 pm, Thomas Braunbeck
    > wrote:
    >> bennett.t...@con-way.com wrote:


    >
    > I can't seem to find any documentation on "lru_file_repage" in my man
    > page for vmo....
    > ...not to mention that I am not ROOT on that system, and therefore
    > can't run vmo.
    >
    > Are there any other commands I can run to get that info...???
    >
    > As to the requested minperm...
    > ...here's vmstat -v output which I think gives you the number you
    > want:
    > 6291456 memory pages
    > 5990577 lruable pages
    > 53489 free pages
    > 1 memory pools
    > 1216582 pinned pages
    > 80.0 maxpin percentage
    > 20.0 minperm percentage
    > 80.0 maxperm percentage
    > 67.9 numperm percentage
    > 4071993 file pages
    > 0.0 compressed percentage
    > 0 compressed pages
    > 67.9 numclient percentage
    > 80.0 maxclient percentage
    > 4071985 client pages
    > 0 remote pageouts scheduled
    > 2022691 pending disk I/Os blocked with no pbuf
    > 1993637 paging space I/Os blocked with no psbuf
    > 802494 filesystem I/Os blocked with no fsbuf
    > 1836 client filesystem I/Os blocked with no fsbuf
    > 911534 external pager filesystem I/Os blocked with no fsbuf
    > 0 Virtualized Partition Memory Page Faults
    > 0.00 Time resolving virtualized partition memory page faults
    >
    > Thanks,
    > -tony


    Hello Tony,
    maybe the AIX you run is too old. Search here:
    http://publib.boulder.ibm.com/infoce...v5r3/index.jsp
    Sure, this will not help much if it is not a tunable on your system.
    In that case, look up minperm/maxperm/maxclient (vmo tunables minperm%,
    maxperm% and maxclient%). It would be useful to understand how VMM
    steals pages depending on these vmo parameters.

    Now, the above vmstat -v shows 67.9% of RAM is used for the file cache.
    With min/maxperm (and maxclient) set to 20/80, VMM may page out program
    data to be able to cache more file data (==> the po and later pi in the
    vmstat from your initial post). That vmstat shows an avm in the 3000000
    range (less than 13 GB RAM used for computational pages, if I remember
    correctly). So around 50% of the available RAM is needed for the program
    data. And with the file cache now at 67.9%, some of this got paged out
    (which may not be bad if this data is never used).

    The question is: is it OK that 16+ GB of RAM are used for the file cache?
    There is a database running on the system. Databases usually have their
    own data caching, and caching the data in the filesystem cache as well is
    not useful. But without all the details it is impossible to make any
    tuning suggestion. It is too easy to 'break' the performance for some
    other application....

    Now, you wrote that the slowdown happens when you run a program which
    reads and checks a number of big files. If this is your program (you
    have the source) and it is read-only, then it seems useless to have
    all this data cached in the file cache. Read "Direct I/O tuning" in
    the AIX documentation (search on the above link). Maybe changing this
    program to use direct I/O (and AIO) is an option.
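
The memory arithmetic in this reply can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes 4 KB pages, uses 3180000 as a round stand-in for the avm values seen during the slowdown, and combines it with the file page count from a vmstat -v that was taken at a different time.

```python
PAGE_BYTES = 4096          # assumption: 4 KB pages
memory_pages = 6291456     # "memory pages" from vmstat -v
file_pages = 4071993       # "file pages" from vmstat -v
avm_pages = 3180000        # round stand-in for avm during the slowdown

def gib(pages):
    """Convert a page count to GiB."""
    return pages * PAGE_BYTES / 2**30

computational_share = avm_pages / memory_pages
file_share = file_pages / memory_pages
# Pages that cannot all be resident at once if both demands are honoured.
overcommit_pages = avm_pages + file_pages - memory_pages

print(f"computational: {gib(avm_pages):.1f} GiB ({computational_share:.0%} of RAM)")
print(f"file cache:    {gib(file_pages):.1f} GiB ({file_share:.0%} of RAM)")
print(f"shortfall:     {gib(overcommit_pages):.1f} GiB")
```

Under these assumptions the two demands add up to more than physical memory, so some pages have to leave RAM. That is consistent with the page-outs in the original vmstat, though it does not by itself show which tunable is responsible.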



