I have an Oracle EBS system that has hung a couple of days
in a row. At first I saw new entries in /var/log/messages
and traced them to power management, so I planned to
change the BIOS settings the next time it happened.

The problem is that it's now starting to hang again, and the
only entries near all 3 hangs are about pam_unix. So I started
digging into how new sessions authenticate. Sure enough,
new cron jobs stop running several hours before anyone
notices a problem. Existing oracle sessions keep running
SQL but new ones hang at the creation point. New ssh
sessions hang just after asking for my password but
existing ones work - I have a few root shells running on the
console so I can do any sort of debugging I want.

The load average is consistent with my guess at the number
of hung sessions: load around 12-15 but CPU utilization
no more than 10%. Neither netstat nor lsof suggests there
are hung network sessions.
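
For anyone who wants to check the same thing: Linux counts
tasks in uninterruptible sleep (state D) toward the load
average, which would explain a high load with an idle CPU.
Below is a minimal sketch of how those tasks could be listed
from one of the root shells that still works. It assumes a
procps-style ps that accepts the wchan:32 width syntax; the
function names list_blocked and count_blocked are my own.

```shell
# List tasks in uninterruptible sleep (STAT starts with D), keeping
# the header row. WCHAN is the kernel function the task sleeps in.
list_blocked() {
    ps -eo pid,stat,wchan:32,args | awk 'NR == 1 || $2 ~ /^D/'
}

# Count D-state tasks from "pid stat wchan args" lines on stdin.
# A header row is ignored because "STAT" does not start with D.
count_blocked() {
    awk '$2 ~ /^D/ { n++ } END { print n + 0 }'
}
```

Running list_blocked shows the candidates, and
list_blocked | count_blocked gives a number to compare with
the load average. If most of them share one WCHAN value,
that is probably the place to look.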

The very puzzling part is /etc/nsswitch.conf has

passwd: files

as its only option. No NIS, LDAP or any other sort of
network authentication. No sign of disk problems during this
hang or the previous one, though there was a 30 second burp
on the FCAL line to the array an hour before the first hang
was noticed.
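
Since a login also looks up the shadow and group databases,
not just passwd, here is a rough sketch for checking whether
any entry in nsswitch.conf names a source other than files.
The function name non_file_sources is my own, and the parsing
is simplified: it assumes action items such as
[NOTFOUND=return] contain no spaces.

```shell
# Print nsswitch.conf entries that list any source other than "files".
# $1: path to the file to check (defaults to /etc/nsswitch.conf).
non_file_sources() {
    awk '
        /^[[:space:]]*#/ || NF < 2 { next }   # skip comments and blank lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # trailing comment ends the entry
                if ($i ~ /^\[/) continue      # skip action items
                if ($i != "files") { print; break }
            }
        }
    ' "${1:-/etc/nsswitch.conf}"
}
```

Calling non_file_sources with no argument reads the live
file. Entries such as hosts: files dns will show up too, so
the output still has to be read with the login path in mind.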

The only processes that used network authentication
were smbd and nmbd, so I ran chkconfig smbd off and
service smbd stop to rule that possibility out.

At this point I figure I'll need to reboot the box by the end
of the day and I really want a plan of action by then to
keep it from having the same problem again tomorrow.

Has anyone seen pam cause a hang with passwd: files?

Thanks in advance!