AIX/RAC Slow response - Aix

  1. AIX/RAC Slow response

    I have the following environment:

    2 x P570's with 12X48 allocated to each of the RAC nodes. There is
    another LPAR on each node.
    Oracle 9.2.0.7
    DS8100 SAN (File systems are 200G, one is 400G, total of 11 file
    systems)
    AIX 5.3
    GPFS 2.3

    One of the nodes often goes into a hang state and responds slowly to
    commands like df and w; df is slower than most other commands.
    Meanwhile, the other node is happy as a clam. CPU usage on both nodes
    is around 30%.
    This is right after a migration, so the system has been in production
    for only a couple of days.

    Dumps and logs have been sent to IBM, but I wanted to know if there is
    something I could check that might help with troubleshooting this
    issue. Thanks.

    Regards,
    -Adnan

  2. Re: AIX/RAC Slow response

    Is RAC doing any reconfigurations during this time? What are some of
    the wait statistics on the RAC database? Is it always the same node?
    Have you thought about opening a TAR with Oracle? You probably already
    have some monitoring set up, but you could install Oracle's OSW (OS
    Watcher). It gathers system statistics that you can send to Oracle
    to help determine whether it's a RAC issue.

    HTH,
    Pete's

  3. Re: AIX/RAC Slow response

    Since you mention GPFS, run
    $ mmfsadm dump waiters
    on all nodes while the problem is occurring.

    Monitor the inode usage.
    Check the CPU usage of mmfsd.

    Also check the search path you are using. Remove any reference to
    unneeded search paths and redo the test.
    Verify whether a simple df is slow for all file systems or whether it
    hangs mostly on certain ones.

    Use nmon and mmpmon for monitoring.

    cheers
    Hajo
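
To check whether df is slow on every file system or only on some, each mount point can be timed separately. The following is a minimal sketch, not a tested AIX procedure: the function name `time_df` is hypothetical, and it assumes the local `date` supports `+%s` (epoch seconds), which should be verified on the AIX level in use.

```shell
#!/bin/sh
# time_df (hypothetical helper): run df against each mount point given as an
# argument and print the exit code and elapsed whole seconds for each one.
# Assumption: `date +%s` is available and prints seconds since the epoch.
time_df() {
    for fs in "$@"; do
        start=$(date +%s)
        df -k "$fs" >/dev/null 2>&1   # only the timing and exit code matter here
        rc=$?
        end=$(date +%s)
        echo "$fs rc=$rc elapsed=$((end - start))s"
    done
}

# Example call (the mount points are placeholders, not taken from the thread):
# time_df /gpfs1 /gpfs2
```

Running it on both nodes at the same moment, and capturing `mmfsadm dump waiters` whenever a mount point takes several seconds, would show whether the delay follows one file system or one node.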

  4. Re: AIX/RAC Slow response

    On Oct 14, 6:04 pm, "Pete's" wrote:
    > Is RAC doing any reconfigurations during this time? What are some of
    > the wait statistics on the RAC database? Is it always the same node?
    > Have you thought about opening a TAR with Oracle? You probably already
    > have some monitoring set up, but you could install Oracle's OSW (OS
    > Watcher). It gathers system statistics that you can send to Oracle
    > to help determine whether it's a RAC issue.
    >
    > HTH,
    > Pete's

    There is no reconfig taking place during this time. The problem lasts
    a couple of hours and then goes away for another three. It is not
    always the same node: sometimes node 1, sometimes node 2, at random.
    Opening a TAR is a good idea; I have already asked the DBAs to do
    that. I am also having them install OSW as you suggested. I am sure
    they'll ask for the output once the TAR is open. Thanks.

    Regards,
    -Adnan

  5. Re: AIX/RAC Slow response

    On Oct 15, 3:00 am, Hajo Ehlers wrote:
    > Since you mention GPFS, run
    > $ mmfsadm dump waiters
    > on all nodes while the problem is occurring.
    >
    > Monitor the inode usage.
    > Check the CPU usage of mmfsd.
    >
    > Also check the search path you are using. Remove any reference to
    > unneeded search paths and redo the test.
    > Verify whether a simple df is slow for all file systems or whether it
    > hangs mostly on certain ones.
    >
    > Use nmon and mmpmon for monitoring.
    >
    > cheers
    > Hajo

    Hi,

    mmfsadm doesn't reveal anything significant.
    There are no unneeded paths. It only happens on the file system on one
    of the nodes in the RAC/GPFS cluster. IBM had me run perfpmr, but that
    is of no use to them because it doesn't show any stress on the system.

    I have noticed one more thing: traffic on lo0 is high compared with
    the node that is behaving well.
    Also, when I try to connect to the system using PuTTY, my connection
    is sometimes rejected 4-5 times before I get in.
    I'm not sure whether that indicates anything.
    Here is the mmfs.cfg; I wonder whether anything needs to change here.

    clusterName XXXX.eocrac1_gpfs
    clusterId 13882357281488761952
    clusterType lc
    multinode yes
    autoload yes
    useDiskLease yes
    maxFeatureLevelAllowed 833
    tiebreakerDisks gpfs5nsd
    pagepool 100M
    [eocrac1_gpfs]

  6. Re: AIX/RAC Slow response

    Here's what fixed the problem for us. Thanks to all those who have
    helped. It is much appreciated.

    Regards,
    -Adnan

    Applies to:
    Oracle Server - Enterprise Edition - Version: 8.1.7.4 to 10.2.0.3
    AIX5L Based Systems (64-bit)

    Goal
    Is Patch 5496862 applicable on AIX 5.3 TL 06, TL 07, or TL 08?

    Solution
    Yes, Oracle patch 5496862 is required on AIX 5.3 TL 06, 07, or 08
    because the issue (I/O reading problems after installing IBM
    Technology Level 5) is a two-fold one.

    AIX 5300 TL 05 introduced functionality in its I/O module that checks
    selected bits in the I/O request. Oracle patch 5496862 is required to
    initialize the I/O request properly, together with APAR IY89080.

    In 5300-06, i.e. TL 06, APAR IY89080 is included in the maintenance/
    technology level. TL 07 and TL 08 also include the fix.

    Note: If the patch was applied to your Oracle installation before the
    upgrade to 5300-06, 07, or 08, it does not need to be re-applied. If
    the patch has not been applied, it needs to be (unless you are
    running 10.2.0.3).

    Patch 5496862 is required for AIX 5.3 ML 5 and higher.

    AIX 5L based systems have the mandatory patch 5496862 to fix I/O
    reading problems on version 5.3 Technology Level 5 (5300-05). Apply
    this patch if you are running any of the following Oracle Database
    versions:
    8i:    8.1.7.4
    9i:    9.0.1.5
    9iR2:  9.2.0.4 to 9.2.0.8
    10gR1: 10.1.0.4, 10.1.0.5
    10gR2: 10.2.0.1, 10.2.0.2, 10.2.0.3 (fix included in 10.2.0.3)

    This patch is required for all editions of Oracle RDBMS. There is no
    separate Patch 5496862 for 9.2.0.4. Instead, there is a merge Patch
    5565308 that includes Patch 5496862.
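
The version rules in the note above can be written down as a small lookup. This is only a sketch of the note's table, not an Oracle or IBM tool: the function name `patch_5496862_status` and its output words are hypothetical, and it only understands AIX 5.3 level strings of the form printed by `oslevel -r` (for example `5300-06`).

```shell
#!/bin/sh
# patch_5496862_status (hypothetical helper): given an AIX level string such as
# 5300-06 and an Oracle version such as 9.2.0.7, print one of:
#   required        patch 5496862 is needed according to the note
#   included        the fix is already part of this Oracle version
#   not-applicable  AIX 5.3 below TL 05
#   not-listed      Oracle version not mentioned in the note
#   unknown-os      level string is not an AIX 5.3 level
patch_5496862_status() {
    level=$1
    version=$2
    case $level in
        5300-*) tl=${level#5300-}; tl=${tl%%-*}; tl=${tl#0} ;;  # "06" -> "6"
        *) echo "unknown-os"; return 0 ;;
    esac
    case $tl in
        ''|*[!0-9]*) echo "unknown-os"; return 0 ;;
    esac
    if [ "$tl" -lt 5 ]; then
        echo "not-applicable"
        return 0
    fi
    case $version in
        8.1.7.4|9.0.1.5|9.2.0.[4-8]|10.1.0.[45]|10.2.0.[12]) echo "required" ;;
        10.2.0.3) echo "included" ;;
        *) echo "not-listed" ;;
    esac
}

# Example call on the node itself:
# patch_5496862_status "$(oslevel -r)" 9.2.0.7
```

A result of `required` only says the note applies; whether the patch is already installed still has to be checked in the Oracle home, for example in the output of `opatch lsinventory`. For 9.2.0.4 the note names merge Patch 5565308 instead.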


