We're having a strange problem with our machines here. Every once in a
while, all the user-space processes appear to freeze. All kernel threads
continue to run and interrupt handlers continue to run, but the daemons
and other user-space processes just freeze. After a period of time
(which varies from 15 minutes up to 8 hours), everything starts running
again. This odd condition occurs on about half of our machines, and can
occur anywhere from once a day to once every couple of weeks.

I enabled the Magic-SysRq key on some of the boxes, and those keystrokes
work, which is how I determined that the kernel threads are still running.
I can ping the boxes, but cannot telnet into them. Yet any TCP/UDP
connections *through* the machine proceed just fine!! Any syslog messages
generated during the hung period get queued up until the machine
unfreezes; then everything gets logged to /var/log/messages at once, all
stamped with the time of the unfreeze!! It took me the LONGEST time to
realize that the messages were being accumulated over a period of time!!
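Because the syslog timestamps hide when each message was really generated, one way to pin down the freeze windows might be a tiny user-space heartbeat that stamps its own entries at the moment it runs and writes them straight to a file, bypassing syslog. The sketch below is an assumption, not something already running here. The log path, the 10-second interval and the 2x slack factor are all hypothetical. Since the heartbeat is itself a user-space process, it should stop along with everything else during a freeze. The gap between the last entry before the hang and the first entry after it then brackets the frozen period.

```python
import time


def find_gaps(timestamps, interval, slack=2.0):
    """Return (start, end) pairs where consecutive heartbeats are more than
    interval * slack seconds apart, i.e. windows in which this user-space
    process apparently was not scheduled."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > interval * slack:
            gaps.append((prev, cur))
    return gaps


def heartbeat(path, interval=10.0):
    """Append the wall-clock time to `path` every `interval` seconds.
    The timestamp is taken when the entry is created, not when some
    logging daemon finally gets around to writing it."""
    while True:
        with open(path, "a") as f:
            f.write(f"{time.time():.3f}\n")
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical log location; any local file outside syslog would do.
    heartbeat("/var/tmp/heartbeat.log")
```

After a freeze, read the file back into a list of floats and pass it to `find_gaps` with the same interval. Each returned pair is a window in which user space was not running, and it can be compared with the batch of messages that syslog dumped at the unfreeze.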

I'm completely stumped on what could be hanging the systems up, and what
causes it to recover eventually... has anyone ever seen anything like
this?? Are there any suggestions on how to proceed with debugging it??


