On Fri, Jul 25, 2008 at 03:35:41PM -0500, Jason Bratton wrote:
> After posting my email, I finally figured out the problem. For some
> reason, I had to set tcp-listen-queue. I never had it set before, so
> something changed in the code, but yeah, that fixed it. I set both
> tcp-clients and tcp-listen-queue to 1000 and haven't had any problems
> like that since.


That didn't seem to do it for us. The bind instance in question ran
for about 38 hours and then refused to accept tcp connections again.

I found the following error messages in the logs at about the
time of the outage:

27-Jul-2008 15:35:12.234 resolver: notice: clients-per-query decreased to 17
27-Jul-2008 15:35:34.440 general: error: socket.c:1996: unexpected error:
27-Jul-2008 15:35:34.440 general: error: internal_accept: fcntl() failed: Too many open files
27-Jul-2008 15:35:34.452 general: error: socket.c:1996: unexpected error:
27-Jul-2008 15:35:34.452 general: error: internal_accept: fcntl() failed: Too many open files

Since there seem to be several messages on the list already about the
file handle limit being exceeded, I presume it's the same problem that
other people are having with the 9.5.0-P1 patch.

Does anybody have any suggestions as to what specifically I should be
looking at during the next outage?
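
Given the "Too many open files" errors, one candidate check is how many
descriptors the named process holds compared with its limit. Below is a
minimal sketch of that, assuming Linux with /proc mounted, a kernel that
exposes /proc/<pid>/limits, and enough privileges to read the target
process's fd directory. The function names are hypothetical helpers for
illustration and are not part of BIND.

```python
import os


def _to_int(value):
    # /proc/<pid>/limits prints "unlimited" when no limit is set.
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a limits file."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Row layout: Max open files <soft> <hard> files
            return _to_int(fields[3]), _to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd (one per open descriptor)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def report(pid):
    """Summarise descriptor usage for one process as a single line."""
    with open(f"/proc/{pid}/limits") as handle:
        soft, hard = parse_max_open_files(handle.read())
    return f"pid {pid}: {count_open_fds(pid)} open fds, soft={soft}, hard={hard}"
```

Calling print(report(pid)) with the pid of named, once while the server is
healthy and again during an outage, would show whether the descriptor count
is close to the soft limit when accepts start failing.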


> -- Jason
>
> Thomas Jacob wrote:
> > Hello list,
> >
> > We're having problems with the -P1 version, some time after
> > starting the server (could be minutes or hours), the tcp request
> > handler seems to get stuck, and all (or almost all) new requests
> > get stuck in the SYN_RECV tcp state. We haven't found out what
> > exactly triggers this yet, could be load, could be specific
> > types of queries.
> >
> > This seems to be the same problem as described
> > in the following post by Jason Bratton:
> >
> > http://marc.info/?l=bind-users&m=121628960603391&w=2
> >
> > The main difference should be that we're running
> > the version of bind that comes with Ubuntu 8.04 LTS x86_64,
> > and the problems happen when upgrading from
> > version bind9_9.4.2-10 to bind9_9.4.2-10ubuntu0.1, a diff
> > between these two shows the exact same -P1 patch as in the upstream
> > version.
> >
> > Our tcp related settings:
> >
> > transfers-out 100;
> > transfers-per-ns 100;
> > tcp-clients 5000;
> > recursive-clients 10000;
> >
> > Is anyone else seeing this? Is this really a bind bug? And if yes, is
> > there a workaround?
> >
> >
> > Regards,
> > Thomas
> >
