Hi,

We maintain a pair of squid-2.5.STABLE4 caches on Gentoo Linux. They are
identical in configuration and have been installed from Gentoo's Portage
tree. We brought them up one at a time, the first one about 3 weeks ago,
the second one about a week ago. Currently we have services failed over
to the first one exclusively because of problems that happen when the
second server is activated.

We are using squidGuard-1.2.0-r1 (also from Portage) for our
redirect_program and use it to block access to sites that are not on a
whitelist that we maintain. For sites that don't match, the users are
redirected to a simple HTTP error page on the 1st server. We run 'squid
-k reconfigure' every hour to reload this list of whitelist sites.

It seems that every hour when we reload the cache, that error conditions
occur on this second proxy. Here are entries from cache.log during one
such occurance:

2004/04/10 13:01:01| Restarting Squid Cache (version 2.5.STABLE4)...
2004/04/10 13:01:01| FD 31 Closing HTTP connection
2004/04/10 13:01:01| FD 32 Closing ICP connection
2004/04/10 13:01:01| FD 38 Closing SNMP socket
2004/04/10 13:01:01| Closing unlinkd pipe on FD 33
2004/04/10 13:01:01| User-Agent logging is disabled.
2004/04/10 13:01:01| Referer logging is disabled.
2004/04/10 13:01:01| DNS Socket created at 0.0.0.0, port 32854, FD 6
2004/04/10 13:01:01| Adding nameserver 127.0.0.1 from /etc/resolv.conf
2004/04/10 13:01:01| Adding nameserver 10.5.1.11 from /etc/resolv.conf
2004/04/10 13:01:01| helperOpenServers: Starting 20 'squidGuard' processes
2004/04/10 13:01:01| Unlinkd pipe opened on FD 33
2004/04/10 13:01:01| Accepting HTTP connections at 0.0.0.0, port 8080,
FD 31.
2004/04/10 13:01:01| Accepting ICP messages at 0.0.0.0, port 3130, FD 32.
2004/04/10 13:01:01| HTCP Disabled.
2004/04/10 13:01:01| Accepting SNMP messages on port 3401, FD 38.
2004/04/10 13:01:01| WCCP Disabled.
2004/04/10 13:01:01| Configuring Sibling 10.5.1.11/8080/3130
2004/04/10 13:01:01| Loaded Icons.
2004/04/10 13:01:01| Ready to serve requests.
2004/04/10 13:01:18| Failure Ratio at 1.01
2004/04/10 13:01:18| Going into hit-only-mode for 5 minutes...
2004/04/10 13:04:23| urlParse: Illegal character in hostname
'deniedbyproxy'
2004/04/10 13:04:23| urlParse: Illegal character in hostname
'deniedbyproxy'
2004/04/10 13:04:24| urlParse: Illegal character in hostname
'deniedbyproxy'
2004/04/10 13:04:25| urlParse: Illegal character in hostname
'deniedbyproxy'
2004/04/10 13:04:25| urlParse: Illegal character in hostname
'deniedbyproxy '
[...]

As this is happening, the users on this proxy begin to see pages
suddenly redirected to the whitelist error page - the one they are
redirected to when the site they are requesting is not on the whitelist.
This starts to appear for *any* page they visit, even the allowed sites.

Shortly after this happens, the cache shuts down and reloads itself:

2004/04/10 13:11:18| Failure Ratio at 2.83
2004/04/10 13:11:18| Going into hit-only-mode for 5 minutes...
2004/04/10 13:16:18| Failure Ratio at 2.44
2004/04/10 13:16:18| Going into hit-only-mode for 5 minutes...
2004/04/10 13:21:18| Failure Ratio at 1.47
2004/04/10 13:21:18| Going into hit-only-mode for 5 minutes...
2004/04/10 13:25:10| Preparing for shutdown after 302834 requests
[...]
2004/04/10 13:25:42| Squid Cache (Version 2.5.STABLE4): Exiting normally.
2004/04/10 13:25:44| Starting Squid Cache version 2.5.STABLE4 for
i686-pc-linux-gnu...
2004/04/10 13:25:44| Process ID 16189
2004/04/10 13:25:44| With 1024 file descriptors available

Another error we've seen is this (immediately after another reload):

2004/04/10 15:01:01| Ready to serve requests.
2004/04/10 15:01:03| parseHttpRequest: Unsupported method 'Pragma:'
2004/04/10 15:01:03| clientReadRequest: FD 69 Invalid Request
2004/04/10 15:01:05| parseHttpRequest: Unsupported method 'Pragma:'
2004/04/10 15:01:05| clientReadRequest: FD 58 Invalid Request
2004/04/10 15:01:07| parseHttpRequest: Unsupported method 'Pragma:'
2004/04/10 15:01:07| clientReadRequest: FD 42 Invalid Request

It's almost as if clients are passing HTML instead of URLs as the request.

So we're now left stumped as to what might cause this. It seems that in
our troubleshooting we found that if we restart the cache it will run
fine again until the next reload. Between the functioning and
non-functioning servers, the squid.conf files are identical, as are the
squidGuard.conf files. The only immediate difference I can see is that
when diffing the squidGuard .db files, they show up slightly different.
I suspected initially that they just may be different because they are
compiled binary files and the machines might make them differently.

We do suspect that the number of file descriptors (1024) is too small
for this cache's load and would like to set it higher. Might this cause
the errors we are seeing?

More detailed logs are available at the following URL:

http://ganymede.sento.com/squidlogs/cache.log

--
Darren Spruell