On Apr 16, 2008, at 2:48 PM, Fr34k wrote:

> One customer has a load balancer that, when queried for a hostname,
> responds with two IP addresses -- always in the same order, with a
> TTL of 0.
> BIND 9.4.2 will still round-robin the responses, however.
> Shouldn't the 0 TTL keep BIND from caching the responses and,
> therefore, duplicate the same order?

No. The DNS protocol makes no guarantee about the order in which
multiple records are returned, and this does not depend on what the
TTL is.

So, even with a TTL of zero, the only server that MAY always report
the records in a particular order is the authoritative server, in this
case their load balancer. Any other DNS server can return the records
in any order it likes. A zero TTL only means that the servers queried
for this DNS information must go back to an authoritative server for
every query. All the zero TTL accomplishes, then, is to ensure that
the authoritative server(s) work harder and that more network traffic
is generated.
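
To make the ordering point concrete, here is a toy Python model of a
server that rotates a record set on every response. It is only a
sketch, not BIND's actual code; the function name is made up and the
addresses are documentation-range placeholders. The "authoritative"
order never changes, yet successive answers still differ:

```python
from itertools import count


def make_rotating_responder(authoritative_order):
    """Return a function that answers each query with the same records,
    cyclically rotated by one position per query (a toy round-robin)."""
    counter = count()

    def answer():
        # Shift the starting point by one for every query received.
        n = next(counter) % len(authoritative_order)
        return authoritative_order[n:] + authoritative_order[:n]

    return answer


# Placeholder addresses; the upstream order is fixed.
fixed = ["192.0.2.10", "192.0.2.20"]
answer = make_rotating_responder(fixed)
responses = [answer() for _ in range(3)]
for r in responses:
    print(r)
```

The set of records is identical in every answer; only the order moves,
which is all a client is entitled to rely on.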

You might want to take a look at
http://www.isc.org/index.pl?/sw/bind/docs/bind-load-bal.php, and
suggest that your customer do this also.
This is a white paper discussing using DNS for load balancing, which
is what your customer is effectively trying to do, and some
alternative solutions to this balancing mechanism. Rather than
returning multiple "A" records, you might want to have them consider
using a different set of authoritative servers that can toggle which
"A" record is returned as needed. The white paper suggests using the
"lbnamed" DNS server, but I suspect that a standard BIND server could
be used just as easily, with the correct "A" record set by an external
dynamic DNS update process.
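
As a rough illustration of what such an external update process might
look like, the Python sketch below picks the first healthy address and
builds a command script of the kind accepted by nsupdate. Everything
named here (the functions, the host names, the health table, the
60-second TTL) is hypothetical, and how health is actually determined
is left out entirely:

```python
def pick_address(health):
    """Return the first address marked healthy, or None if there is none.

    `health` maps address -> bool; dict order is the preference order.
    """
    for address, healthy in health.items():
        if healthy:
            return address
    return None


def build_nsupdate_script(server, zone, name, address, ttl=60):
    """Build nsupdate input that replaces the A record for `name`."""
    lines = [
        f"server {server}",
        f"zone {zone}",
        f"update delete {name} A",                 # drop the old A record(s)
        f"update add {name} {ttl} A {address}",    # publish the chosen one
        "send",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical state: the first backend is down, the second is up.
health = {"192.0.2.10": False, "192.0.2.20": True}
chosen = pick_address(health)
if chosen is not None:
    script = build_nsupdate_script(
        "ns1.example.com", "example.com", "www.example.com.", chosen
    )
    print(script, end="")
```

The generated text could then be fed to nsupdate on standard input by
whatever job runs the health checks.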

As an aside, if their load balancer is the ONLY authoritative server,
what happens when it fails? Obviously, if the load balancer fails, no
one can reach the host(s) behind it. But if the load balancer also
provides the DNS information, then when it fails clients will get an
"unknown host" type of response rather than something indicating that
the host is unavailable. I would regard this as a situation where
their solution makes communication less robust rather than more
robust.
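
The difference between those two failure modes is visible to any
client program. The Python sketch below is only an illustration (the
function name and return strings are my own); it separates a name
resolution failure from a connection failure, with the resolver and
connector passed in so the logic can be exercised without a network:

```python
import socket


def classify_failure(host, port, resolve=socket.getaddrinfo,
                     connect=socket.create_connection, timeout=3.0):
    """Report whether a host fails at the DNS step or the connect step."""
    try:
        resolve(host, port)
    except socket.gaierror:
        # The name could not be resolved: what clients see when the
        # only authoritative DNS server is down.
        return "unknown host"
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # The name resolved, but the host refused or never answered.
        return "host unavailable"
    conn.close()
    return "ok"
```

With DNS served independently of the balancer, a dead balancer would
land in the second branch rather than the first.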