AIX/RAC Slow response - Aix
AIX/RAC Slow response
I have the following environment:
2 x P570's with 12X48 allocated to each of the RAC nodes. There is
another LPAR on each node.
Oracle 9.2.0.7
DS8100 SAN (File systems are 200G, one is 400G, total of 11 file
systems)
AIX 5.3
GPFS 2.3
Often, one of the nodes goes into a hang state and starts responding
slowly to commands like df and w; df is slower than most other
commands. Meanwhile, the other node is happy as a clam. CPU usage on
both nodes is around 30%.
This is right after a migration, so the system has been up in
production for only a couple of days.
Dumps and logs have been sent to IBM but I wanted to know if I could
check for something that may provide help with troubleshooting this
issue. Thanks.
Regards,
-Adnan
-
Re: AIX/RAC Slow response
On Oct 14, 5:03 am, adnan...@gmail.com wrote:
> I have the following environment:
>
> 2 x P570's with 12X48 allocated to each of the RAC nodes. There is
> another LPAR on each node.
> Oracle 9.2.0.7
> DS8100 SAN (File systems are 200G, one is 400G, total of 11 file
> systems)
> AIX 5.3
> GPFS 2.3
>
> Often, one of the nodes goes into a hang state and starts giving a
> lagged response to commands like df and w. df shows more slowness than
> most other commands. While the other node is happy as a clam. CPU
> usage on both nodes are around 30%.
> This is after migration state so system has been up in production only
> for a couple of days.
>
> Dumps and logs have been sent to IBM but I wanted to know if I could
> check for something that may provide help with troubleshooting this
> issue. Thanks.
>
> Regards,
> -Adnan
Is RAC doing any reconfigurations during this time? What are some of
the wait statistics on the RAC database? Is it always the same node?
Have you thought about opening a TAR with Oracle? You probably already
have some monitoring set up, but you could install Oracle's OSW (OS
Watcher). It gathers system statistics that you can send in to Oracle
to help determine whether it's a RAC issue.
HTH,
Pete's
-
Re: AIX/RAC Slow response
On Oct 14, 12:03 pm, adnan...@gmail.com wrote:
> I have the following environment:
>
> 2 x P570's with 12X48 allocated to each of the RAC nodes. There is
> another LPAR on each node.
> Oracle 9.2.0.7
> DS8100 SAN (File systems are 200G, one is 400G, total of 11 file
> systems)
> AIX 5.3
> GPFS 2.3
>
> Often, one of the nodes goes into a hang state and starts giving a
> lagged response to commands like df and w. df shows more slowness than
> most other commands. While the other node is happy as a clam. CPU
> usage on both nodes are around 30%.
> This is after migration state so system has been up in production only
> for a couple of days.
>
> Dumps and logs have been sent to IBM but I wanted to know if I could
> check for something that may provide help with troubleshooting this
> issue. Thanks.
>
> Regards,
> -Adnan
Since you mention GPFS:
Run
$ mmfsadm dump waiters
on all nodes while the problem is occurring.
Monitor the inode usage.
Check the CPU usage of mmfsd.
Also check the PATH you are using. Remove any references to unneeded
search paths and redo the test.
Verify whether a simple df is slow for all file systems or whether it
hangs, more or less, only on certain file systems.
Use nmon and mmpmon for monitoring.
cheers
Hajo
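To make the "all file systems or only certain ones" check repeatable, here is a minimal sketch that times df separately for each mount point. The function name time_df and the mount points in the usage comment are made up for illustration, and it assumes the local date command accepts +%s (whole-second resolution only); where it does not, the SECONDS variable in ksh could be used instead.

```shell
#!/bin/sh
# Time `df -k` for each mount point passed as an argument, so that one
# slow file system stands out from the rest.
# Assumption: `date +%s` is supported by the local date command.
time_df() {
    for mp in "$@"; do
        start=$(date +%s)
        df -k "$mp" >/dev/null 2>&1
        rc=$?
        end=$(date +%s)
        echo "$mp rc=$rc elapsed=$((end - start))s"
    done
}

# Hypothetical mount points; substitute the real GPFS file systems.
# time_df /gpfs/oradata1 /gpfs/oradata2
```

Running this on both nodes during a slow period, alongside mmfsadm dump waiters, should show whether the delay follows one file system or the whole node.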
-
Re: AIX/RAC Slow response
On Oct 14, 6:04 pm, "Pete's" wrote:
> Is RAC doing any reconfigurations during this time? What are some of
> the wait statistics on the RAC database? Is it always the same node?
> Have you thought about opening a TAR w/ ORacle? You probably already
> have some monitoring setup, but, you could install Oracle's OSW(OS
> Watcher). It gathers system statistics that you can send in to Oracle
> to help determine if it's a RAC issue.
>
> HTH,
> Pete's
There is no reconfig taking place during this time. The problem lasts
for a couple of hours and then goes away for another 3 hours. It is not
always the same node: sometimes node 1, sometimes node 2, at random.
Opening the TAR is a good idea; I have already asked the DBAs to do
that. I am also having them install OSW as you suggested. I am sure
they'll ask for its output as a result of the TAR. Thanks.
Regards,
-Adnan
-
Re: AIX/RAC Slow response
On Oct 15, 3:00 am, Hajo Ehlers wrote:
> Since you mention GPFS.
> Run
> $ mmfsadm dump waiters
> during these time on all nodes
>
> Monitor the inode usage.
> Check the CPU usage for the mmfsd
>
> Also check the path you are using. Remove any reference to unneeded
> search paths and redo the test.
> Verify whether or not a simple df is slow for all filesystems or it
> hangs more or less on certain fs.
>
> Use nmon and mmpmon for monitoring.
>
> cheers
> Hajo
Hi,
mmfsadm doesn't reveal anything significant.
There are no unneeded paths. It only happens on the file system on one
of the nodes in the RAC/GPFS cluster. IBM had me run perfpmr, and that
is of no use to them as it doesn't show any stress on the system.
I have noticed one more thing: there is high traffic on lo0 compared
to the node that is doing well.
Also, when I try to connect to the system using PuTTY, there are
times when my connection gets rejected 4-5 times before I can connect.
Not sure if that indicates anything.
Here is the mmfs.cfg; I wonder if any changes are needed here.
clusterName XXXX.eocrac1_gpfs
clusterId 13882357281488761952
clusterType lc
multinode yes
autoload yes
useDiskLease yes
maxFeatureLevelAllowed 833
tiebreakerDisks gpfs5nsd
pagepool 100M
[eocrac1_gpfs]
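To put numbers on the lo0 observation, one option is to sample the loopback packet counters on both nodes and compare how fast they grow. The sketch below is only illustrative: the helper name if_packets is made up, and it assumes netstat -i prints the AIX-style columns ending in Ipkts Ierrs Opkts Oerrs Coll. Fields are counted from the right because the Address column can be empty on the loopback link line.

```shell
#!/bin/sh
# Print the input and output packet counts for one interface, reading
# `netstat -i` style text on stdin.
# Assumption: the last five columns are Ipkts Ierrs Opkts Oerrs Coll.
if_packets() {
    awk -v ifname="$1" '$1 == ifname && NF >= 7 { print $(NF-4), $(NF-2); exit }'
}

# Usage: take two samples a minute apart on each node and compare the
# growth between them.
# netstat -i | if_packets lo0; sleep 60; netstat -i | if_packets lo0
```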
-
Re: AIX/RAC Slow response
Here's what fixed the problem for us. Thanks to all those who have
helped. It is much appreciated.
Regards,
-Adnan
Applies to:
Oracle Server - Enterprise Edition - Version: 8.1.7.4 to 10.2.0.3
AIX5L Based Systems (64-bit)
Goal
Is Patch 5496862 applicable on AIX 5.3 TL 06 or TL 07 or TL 08 ?
Solution
Yes, Oracle Patch 5496862 is required on AIX 5.3 TL 06, 07, or 08,
because the issue (I/O reading problems after installing IBM Technology
Level 5) is twofold.
AIX 5300 TL 05 introduced functionality in its I/O module that checks
selected bits in the I/O request. Oracle Patch 5496862 was required to
initialize the I/O properly, along with APAR IY89080.
In 5300-06, i.e. TL 06, APAR IY89080 is included in the maintenance/
technology level. TL 07 and TL 08 also include the fix.
Note: If the patch was applied to your Oracle installation before the
upgrade to 5300-06, 07, or 08, it does not need to be re-applied. If
the patch has not been applied, it will need to be (unless you are
running 10.2.0.3).
Patch 5496862 is required for AIX 5.3 ML 5 and higher.
On AIX 5L Based Systems, Patch 5496862 is mandatory to fix I/O reading
problems on version 5.3 Technology Level 5 (5300-05), and it must be
applied when running the following Oracle Database versions:
8i: 8.1.7.4
9i: 9.0.1.5
9iR2: 9.2.0.4 to 9.2.0.8
10gR1: 10.1.0.4, 10.1.0.5
10gR2: 10.2.0.1, 10.2.0.2, 10.2.0.3 (fix included in 10.2.0.3)
This patch is required for all editions of Oracle RDBMS. There is no
separate Patch 5496862 for 9.2.0.4. Instead, there is a merge Patch
5565308 that includes Patch 5496862.
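The rules in the note above can be condensed into a small lookup. This is only a sketch of the note's wording, not an Oracle or IBM tool: the function name patch_advice is made up, and it assumes the AIX level is given in the 5300-NN form that oslevel -r prints. Whether the patch is already installed still has to be checked separately, for example with opatch lsinventory.

```shell
#!/bin/sh
# Summarize what the note says for a given AIX level and Oracle version.
# $1: AIX level in the form printed by `oslevel -r`, e.g. 5300-06
# $2: Oracle version, e.g. 9.2.0.7
patch_advice() {
    level=$1
    ora=$2
    case "$level" in
        5300-[0-9][0-9]) tl=${level#5300-} ;;
        *) echo "AIX level not covered by the note"; return 0 ;;
    esac
    tl=${tl#0}   # drop a leading zero so 06 compares as 6
    if [ "$tl" -lt 5 ]; then
        echo "note applies from TL 05 onward"
        return 0
    fi
    case "$ora" in
        10.2.0.3) echo "fix already included in 10.2.0.3" ;;
        9.2.0.4)  echo "apply merge Patch 5565308" ;;
        8.1.7.4|9.0.1.5|9.2.0.[5-8]|10.1.0.[45]|10.2.0.[12])
                  echo "apply Patch 5496862 if not already applied" ;;
        *)        echo "Oracle version not listed in the note" ;;
    esac
}

# Example: the Oracle version from this thread, on a hypothetical TL 06.
# patch_advice 5300-06 9.2.0.7
```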