* JINMEI Tatuya / 神明達哉 [2008-08-15]:
> At Fri, 15 Aug 2008 10:27:13 +1000,
> Mark Andrews wrote:
>
> > > > > fctx 0x87b7b20(images.yandex.ru/A'): query
> > > > > fctx 0x87b7b20(images.yandex.ru/A'): done
> > > >
> > > > This seems to indicate that creating a query socket somehow failed.
> > > > Can you build BIND by hand to see whether you can reproduce the
> > > > problem with it? Then we may add an ad-hoc patch to provide more
> > > > detailed log information.
> > >
> > > Can you run sockstat and see whether there are a large number of
> > > listening UDP sockets from another process or processes that
> > > named may also be attempting to bind to (and failing) when
> > > sourcing the queries? I'm not sure how BIND determines (if it
> > > does at all) whether a port is free before attempting to bind to
> > > it when sourcing a query. I know you can specify port ranges to
> > > avoid. Maybe the issue is that the port is being used by another
> > > process, and eventually, after a retry or two, you source from a
> > > port that is not being consumed by another process and it works.

> >
> > The -P2's won't bind(2) to a port that is in use.

>
> And a bind(2) failure doesn't really explain why every recursive
> query failed. This could happen in theory, but since named retries
> bind(2) with different sockets many times before finally giving up,
> the chance of this happening for all queries should be extremely
> small in practice. Something unexpected must be going on behind the
> scenes.
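To make the argument above concrete, here is a minimal Python sketch of the retry pattern it describes: pick a random source port, try bind(2), and on EADDRINUSE close the socket and try another port. This is not BIND's code; the port range, the attempt limit and the names `open_query_socket` and `prob_all_fail` are hypothetical. The second function is the back-of-the-envelope estimate: assuming a fraction f of candidate ports is busy and attempts are independent, all k attempts fail with probability f**k.

```python
import errno
import random
import socket

# Hypothetical values for illustration; these are NOT BIND's actual
# port range or retry limit.
PORT_LOW, PORT_HIGH = 1024, 65535
MAX_ATTEMPTS = 10


def open_query_socket(host="0.0.0.0"):
    """Return a UDP socket bound to a randomly chosen source port.

    If bind(2) fails with EADDRINUSE (port already taken by another
    socket), close the socket and retry with a new random port, giving
    up after MAX_ATTEMPTS tries.
    """
    for _ in range(MAX_ATTEMPTS):
        port = random.randint(PORT_LOW, PORT_HIGH)
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise  # a different error: retrying won't help
    raise OSError(errno.EADDRINUSE, "no free source port after retries")


def prob_all_fail(f, k):
    """Probability that k independent attempts all hit a busy port,
    when a fraction f of the candidate ports is in use."""
    return f ** k
```

For example, even with half of all ports busy (f = 0.5), ten independent attempts all fail with probability 0.5**10, roughly 0.001, so a port clash alone is an unlikely explanation for every recursive query failing.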


I finally learned some more about the Cisco ASA and was able to
capture all packets to and from the name server. When the recursive
requests fail, there is no trace of communication on the ASA, not
even the first outgoing packet of the recursive request. It seems the
name server never sends the request (or the ASA drops it without
logging anything). What's going on? I can also see that when the
request does complete successfully (after 2-3 or 10+ tries), it
performs the complete recursive resolution; it doesn't succeed merely
because it has cached the authoritative name server(s).

For some reason I couldn't find sockstat on the (CentOS) box that runs
BIND...
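sockstat is a FreeBSD utility and is usually not installed on Linux; there, `ss -uanp` or `netstat -anup` list UDP sockets together with the owning process. As a dependency-free alternative, the following Python sketch reads the kernel's /proc/net/udp table directly. It is an illustration under stated assumptions: it covers IPv4 only (IPv6 sockets are listed in /proc/net/udp6), the function names are hypothetical, and mapping the inode column back to a process (by scanning /proc/<pid>/fd) is left out.

```python
import socket
import struct


def parse_proc_net_udp(text):
    """Return (address, port, inode) tuples for each socket listed in
    the contents of /proc/net/udp (IPv4 only)."""
    sockets = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        ip_hex, port_hex = fields[1].split(":")
        # The kernel prints the address as a 32-bit integer in host byte
        # order, so pack it back in native ("=") order to recover bytes.
        address = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
        port = int(port_hex, 16)
        inode = int(fields[9])
        sockets.append((address, port, inode))
    return sockets


def list_udp_sockets(path="/proc/net/udp"):
    """Read and parse the live IPv4 UDP socket table (Linux only)."""
    with open(path) as f:
        return parse_proc_net_udp(f.read())
```

`list_udp_sockets()` returns (address, port, inode) tuples; counting how many ports are bound gives a rough picture of how crowded the port space is on the name server.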

Regards,
Hans

PS: Other people with the same problem have contacted me off-list.