
Thread: max tcp sockets for bind 9.4.2-P1

  1. max tcp sockets for bind 9.4.2-P1

    Hello all,

    Like many of you, I recently upgraded all of our caching nameservers.
    Since we were already running BIND 9.4.2, I chose to upgrade to 9.4.2-P1.
    After the upgrade, I started receiving complaints that DNS queries which
    were truncated and retried over TCP were failing.

    It appears that BIND is limiting the number of open TCP connections to
    ~100 per IP address it listens on. For example, on one of our caching
    nameservers:

    cachens-4:~# netstat -an | grep tcp | grep 72.3.128.240 | wc -l
    99
    cachens-4:~# netstat -an | grep tcp | grep 72.3.128.241 | wc -l
    105

    From an rndc status:

    tcp clients: 0/1000

    Almost all (~99%) of the TCP connections in the above netstat output are
    in the SYN_RECV state. My guess would be customer servers that have bad firewall
    rules, but in any case, it's really not relevant to this particular
    problem because nothing has changed except for the upgrade from 9.4.2 to
    9.4.2-P1. I didn't change the named.conf or anything, and as you can see,
    tcp-clients is set to 1000.
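
    For reference, breaking those counts down by TCP state (same two
    addresses as above; the exact output will obviously vary) looks
    something like this:

    # count connections to each listen address, grouped by TCP state
    netstat -ant | grep 72.3.128.240 | awk '{print $6}' | sort | uniq -c
    netstat -ant | grep 72.3.128.241 | awk '{print $6}' | sort | uniq -c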

    Did something change in the source code that would cause this? I'm
    thinking of a listen() call with the backlog set to 100 that wasn't set
    up that way previously. Interestingly, the ARM specifies the default for
    tcp-clients as 100, but maybe that is just a coincidence.
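
    For what it's worth, something like the following should show whether
    that backlog value changed between the two trees (the path assumes the
    stock BIND 9.4 source layout; adjust it if your tree differs):

    # diff the low-level socket code of the two releases and show any
    # hunks that touch the listen() call
    diff -u bind-9.4.2/lib/isc/unix/socket.c \
            bind-9.4.2-P1/lib/isc/unix/socket.c | grep -B 2 -A 2 'listen('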

    FWIW, SOMAXCONN is set to 128 on my servers. Prior to this patch, I was
    using a Debian-packaged version of 9.4.2, so maybe they had it set
    higher? I looked through the source and the changes Debian made to
    9.4.2 and couldn't find anything to indicate that this is the case.
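
    In case it matters: on Linux the backlog passed to listen() is silently
    capped at net.core.somaxconn, and half-open (SYN_RECV) connections are
    limited separately by tcp_max_syn_backlog, so both are worth checking.
    These are generic commands, not output from my boxes:

    # kernel cap silently applied to any listen() backlog argument
    sysctl net.core.somaxconn
    # limit on half-open (SYN_RECV) connections per listening socket
    sysctl net.ipv4.tcp_max_syn_backlog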

    I'm open to suggestions! This is a Debian Etch box running kernel 2.6.18
    on x86_64. Thanks.

    -- Jason





  2. Re: max tcp sockets for bind 9.4.2-P1

    On Jul 17, 6:09 am, "Jason Bratton" wrote:
    > Hello all,
    >
    > Like many of you, I recently upgraded all of our caching nameservers.
    > Since we were already running BIND 9.4.2, I chose to upgrade to 9.4.2-P1.
    > After the upgrade, I started receiving complaints that DNS queries which
    > were truncated and retried over TCP were failing.


    I am experiencing a similar issue with a vendor-supplied BIND that
    includes the 9.4.2-P1 fixes:

    QDDNS 4.1 Build 6 - Lucent DNS Server (BIND 9.4.1-P1), Copyright (c)
    2008 Alcatel-Lucent
    + Includes security fixes from BIND 9.4.2-P1

    It all started with a complaint that a query was failing on one of our
    15 internal DNS servers. All 15 servers were recently deployed and
    were identical in configuration. When I looked into the issue, I
    noticed that the query generated a response which was truncated and
    then reattempted using TCP. I then tested queries against the
    problematic server using "dig +tcp" and discovered that all DNS
    queries using TCP were failing on this server. netstat showed lots of
    connections in SYN_RECV. Since we had seen the same symptoms before when
    our firewall team misconfigured rules, I checked whether that was the
    cause. I logged on to the problematic server and issued TCP queries
    against it locally. In doing so, I noticed something very strange. A
    "dig +tcp somehost.domain.com @127.0.0.1" would succeed with no issues,
    while a "dig +tcp somehost.domain.com @ip.of.the.server" would result
    in:

    ; <<>> DiG 9.4.1-P1 <<>> +tcp xxxx.xxxx.xxxx @xxx.xxx.xxx.xxx
    ; (1 server found)
    ;; global options: printcmd
    ;; connection timed out; no servers could be reached
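
    For anyone who wants to repeat the comparison, here is a rough sketch
    with a short timeout so a hung TCP connect fails fast. The hostname and
    the second address are placeholders for our internal names:

    # query the same name over TCP against loopback and against the
    # server's own interface address
    for ns in 127.0.0.1 192.0.2.10; do
        echo "== testing TCP against $ns =="
        dig +tcp +time=2 +tries=1 somehost.domain.com @"$ns"
    done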

    I am still waiting for the vendor to accept that this is not a firewall
    issue, since I can reproduce it by querying the server from itself.

