On Sun, Jul 27, 2008 at 05:26:24PM -0500, jbratton@rackspace.com wrote:
> Quoting Thomas Jacob:
>
>> That didn't seem to do it for us, the bind instance in question ran
>> for about 38 hours and then it refused to accept tcp connections again.

>
> Did it actually start building up a lot of TCP connections in SYN_RECV
> state again, or did it just crash?


It didn't crash (it has never done that for us so far), so it was
probably the SYN_RECV situation again, but I wasn't around myself for
this particular incident...
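
To have numbers next time, something like the following could be run from
cron to log how many connections sit in SYN_RECV on port 53. This is only a
rough sketch in Python, not something we actually run: the function name is
made up, it is Linux-only, and it assumes the usual /proc/net/tcp layout
(local address as hex ADDR:PORT in the second column, state in the fourth,
with 03 meaning SYN_RECV).

```python
# Sketch: count TCP sockets in SYN_RECV on a given local port (Linux only).
# Assumption: input is the text of /proc/net/tcp (or /proc/net/tcp6).

SYN_RECV = "03"  # kernel state code for SYN_RECV in /proc/net/tcp


def count_syn_recv(proc_text, port=53):
    """Return the number of sockets in SYN_RECV bound to the local port."""
    count = 0
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        # fields[1] looks like "0100007F:0035"; the port is hex after the colon
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port and fields[3] == SYN_RECV:
            count += 1
    return count
```

Feeding it the contents of /proc/net/tcp, e.g.
count_syn_recv(open("/proc/net/tcp").read()), gives the current count for
IPv4; /proc/net/tcp6 would be the IPv6 counterpart.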


> You can fix that with ulimit. Check out ulimit -n to see how many open
> files you currently allow, and try increasing it. I keep it set to 16384
> on my busier caches without any issues. Note that setting the limit with
> ulimit won't be persistent, you will want to change


Already did that. I simply put it into the startup scripts.
The list of upcoming fixes also seems to have several
entries that could be related to this problem:

http://www.isc.org/sw/bind/view/?release=9.4.2-P1#FIXES
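
For the archives, the open file limit change amounts to something like the
sketch below. We do it with a plain ulimit -n in the startup script; this is
just the same idea expressed in Python for a wrapper that would raise the
limit before starting the daemon. The function name is hypothetical and the
16384 default is simply the value suggested in the quoted mail.

```python
import resource


def raise_nofile_limit(wanted=16384):
    """Raise the soft open-file limit to `wanted`, never lowering it.

    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Without extra privileges the soft limit cannot exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    # Nothing to do if the current soft limit is already high enough.
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft, hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

The limit is inherited by child processes, so it only helps if it is raised
in the process that then starts named.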

Question: regarding your initial response about not having any problems
with a TCP queue size set to 1000, did you already have the increased
open file limit in place when you tried this? Or rather, did you
need to increase both the TCP queue size and the open file limit to get
to a stable situation?