I have a problem with BIND's handling of forwarders. It manifests in at
least 9.2.1 and 9.2.3: the BIND software emits malformed queries.

I have not verified whether the issue is platform-specific. We're
running on Solaris 7.

I have packet traces showing that the BIND server sets one of the
must-be-zero bits in the DNS queries it sends to the forwarders.

The behavior is consistent and reproducible and appears to affect
every zone that uses multiple forwarders.

In the case at hand, we have a zone defined as type forward:

zone "245.141.in-addr.arpa" {
    type forward;
    forwarders {
        141.245.68.6;
        141.245.68.7;
        141.245.245.242;
        141.245.245.241;
    };
};

From my client, I query for a name within this zone:

C:\> nslookup 141.245.1.14

The server queries its cache, comes up empty and forwards a query
to each of the listed forwarders in turn.

The queries to the first three servers on the list are all malformed.
They have one of the must-be-zero bits in the "Z" field set. And they
contain a bizarre additional record. The target servers drop the
malformed query on the floor, the request times out and the next
forwarder is tried.

The query to the fourth and last server on the list is properly formed,
the target server responds and the result is passed back to the client.

If the order of the forwarders in the BIND config file (/etc/named.conf)
is changed, it is always the last listed server that gets the properly
formed query and the preceding servers that get the malformed queries.

The malformed queries are identical to the properly formed query with
the exception of the bit in the Z field, the value in the ARCOUNT field,
the presence of an additional records section and, of course, the
query ID.

Here is an example malformed query:

13:59:03.695868 O 10.40.96.14.33437 > 141.245.68.6.53: 64734+ [b2&3=0x110]
[1au] PTR? 14.1.245.141.in-addr.arpa. (54)

0000 fcde0110 00010000 00000001 02313401 .............14.
0010 31033234 35033134 3107696e 2d616464 1.245.141.in-add
0020 72046172 70610000 0c000100 00290800 r.arpa.......)..
0030 00008000 0000 ......

As I decode this packet, it's:

ID = 0xfcde
QR = 0 (it's a query)
Opcode = 0 (standard query)
AA = 0
TC = 0
RD = 1 (recursion desired)
RA = 0
Z = 1 (must be zero isn't)
RCODE = 0
QDCOUNT = 1 (One query)
ANCOUNT = 0
NSCOUNT = 0
ARCOUNT = 1 (An additional record in a query?)
Question section:
QNAME = 14.1.245.141.in-addr.arpa.
QTYPE = 12 (query for PTR records)
QCLASS = 1 (IN)
Additional record section:
QNAME = .
QTYPE = 41
QCLASS = 2048
TTL = 32768 (bytes 00 00 80 00 in the dump)
RDLENGTH = 0
RDATA = ""
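The hand decode above can be cross-checked mechanically. The following is a minimal sketch, not part of the original trace tooling: it parses the hex words from the dump using the RFC 1035 header layout (three-bit Z field) and reads the trailing record as an ordinary resource record with a 32-bit TTL. The names MALFORMED_HEX, read_name and decode_query are made up for this illustration. Type 41 is the value RFC 2671 assigns to the EDNS0 OPT pseudo-record; whether that is what BIND intends here is an assumption, so the sketch keeps the plain RR field names.

```python
import struct

# Hex words copied from the malformed-query dump above.
MALFORMED_HEX = (
    "fcde0110 00010000 00000001 02313401"
    "31033234 35033134 3107696e 2d616464"
    "72046172 70610000 0c000100 00290800"
    "00008000 0000"
)


def read_name(packet, offset):
    """Read an uncompressed domain name; return (name, next_offset)."""
    labels = []
    while True:
        length = packet[offset]
        offset += 1
        if length == 0:
            break
        labels.append(packet[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels) + ".", offset


def decode_query(packet):
    """Decode the header, the single question and, if present, one additional record."""
    ident, flags, qd, an, ns, ar = struct.unpack_from("!6H", packet, 0)
    result = {
        "id": ident,
        "qr": (flags >> 15) & 0x1,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 0x1,
        "tc": (flags >> 9) & 0x1,
        "rd": (flags >> 8) & 0x1,
        "ra": (flags >> 7) & 0x1,
        "z": (flags >> 4) & 0x7,  # three-bit Z field as laid out in RFC 1035
        "rcode": flags & 0xF,
        "qdcount": qd, "ancount": an, "nscount": ns, "arcount": ar,
    }
    qname, offset = read_name(packet, 12)
    qtype, qclass = struct.unpack_from("!2H", packet, offset)
    offset += 4
    result["question"] = (qname, qtype, qclass)
    if ar:
        name, offset = read_name(packet, offset)
        rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", packet, offset)
        result["additional"] = {
            "name": name, "type": rtype, "class": rclass,
            "ttl": ttl, "rdlength": rdlength,
        }
    return result


if __name__ == "__main__":
    for key, value in decode_query(bytes.fromhex(MALFORMED_HEX)).items():
        print(key, "=", value)
```

The sketch handles only what this trace needs: one question, at most one additional record, and no name compression.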

And here is the corresponding properly formed query:

13:59:05.705929 O 10.40.96.14.33437 > 141.245.245.241.53: 3746+
PTR? 14.1.245.141.in-addr.arpa. (43)

0000 0ea20100 00010000 00000000 02313401 .............14.
0010 31033234 35033134 3107696e 2d616464 1.245.141.in-add
0020 72046172 70610000 0c0001 r.arpa.....

As I decode this packet, it's:

ID = 0x0ea2
QR = 0 (it's a query)
Opcode = 0 (standard query)
AA = 0
TC = 0
RD = 1 (recursion desired)
RA = 0
Z = 0 (must be zero is)
RCODE = 0
QDCOUNT = 1 (One query)
ANCOUNT = 0
NSCOUNT = 0
ARCOUNT = 0 (No additional records this time)
Question section:
QNAME = 14.1.245.141.in-addr.arpa.
QTYPE = 12 (query for PTR records)
QCLASS = 1 (IN)
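To confirm that the two packets differ only in the query ID, the Z bit, ARCOUNT and the extra trailing record, the two dumps can be compared byte for byte. This is a small sketch under the assumption that the hex words above were transcribed correctly; the helper names differing_offsets and trailing_bytes are invented for the example.

```python
# Hex words copied from the two dumps above.
MALFORMED = bytes.fromhex(
    "fcde0110 00010000 00000001 02313401"
    "31033234 35033134 3107696e 2d616464"
    "72046172 70610000 0c000100 00290800"
    "00008000 0000"
)
WELL_FORMED = bytes.fromhex(
    "0ea20100 00010000 00000000 02313401"
    "31033234 35033134 3107696e 2d616464"
    "72046172 70610000 0c0001"
)


def differing_offsets(a, b):
    """Return the byte offsets at which a and b differ over their common length."""
    common = min(len(a), len(b))
    return [i for i in range(common) if a[i] != b[i]]


def trailing_bytes(a, b):
    """Return the bytes by which the longer packet extends past the shorter one."""
    longer = a if len(a) >= len(b) else b
    return longer[min(len(a), len(b)):]


if __name__ == "__main__":
    # Offsets 0-1 hold the ID, 2-3 the flags, 10-11 ARCOUNT.
    print("differing offsets:", differing_offsets(MALFORMED, WELL_FORMED))
    print("extra bytes:", trailing_bytes(MALFORMED, WELL_FORMED).hex())
```

With these inputs the differing offsets fall in the ID (bytes 0 and 1), the low flags byte (byte 3) and the low ARCOUNT byte (byte 11). The extra bytes are the 11-byte additional record.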

I am working under the assumption that this is intended behavior and that
BIND is trying to do some private negotiation to control the next-hop
forwarder's behavior on forwarded queries. I have been unable to locate
a document describing this behavior.

Rudimentary testing shows that BIND is willing to respond to these
"malformed" queries. The name server software used on the forwarders
for the 141.245 reverse zone is not so tolerant.
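The kind of rudimentary test described above could be scripted roughly as follows. This is a sketch only, not the test that was actually run: it rebuilds both query shapes from the traces with the standard struct module and sends one over UDP, returning None on timeout. The function names (encode_name, build_query, probe) and the default two-second timeout are assumptions made for the example.

```python
import socket
import struct


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(ident, qname, qtype=12, with_extra=True):
    """Build a query shaped like the traced ones.

    with_extra=True sets flag bit 0x0010 and appends the type-41 record
    seen in the malformed trace; with_extra=False gives the plain query.
    """
    flags = 0x0100  # RD = 1
    if with_extra:
        flags |= 0x0010
    header = struct.pack("!6H", ident, flags, 1, 0, 0, 1 if with_extra else 0)
    question = encode_name(qname) + struct.pack("!2H", qtype, 1)
    extra = b""
    if with_extra:
        # name ".", type 41, class 2048, TTL 0x00008000, RDLENGTH 0
        extra = encode_name(".") + struct.pack("!HHIH", 41, 2048, 0x8000, 0)
    return header + question + extra


def probe(server, packet, timeout=2.0):
    """Send packet to server on UDP port 53; return the reply, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
        return reply
```

Calling probe(address, build_query(0x1234, "14.1.245.141.in-addr.arpa.")) against a server, and then again with with_extra=False, would show whether that server answers one form and silently drops the other.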

Be warned: if you try to reproduce this problem yourself using the
servers mentioned above for the 141.245 reverse zone, you'll encounter
a different set of problems. We're running split DNS, and the servers
and forwarders shown here are all on the company intranet.

John Briggs