For the past few months, I have been trying (unsuccessfully to this point) to resolve a problem with a trio of caching-only name servers that we have in place. The general nature of the problem is as follows. A DHCP client originally gets an IP address on subnet A but, at some point prior to lease expiration, moves to subnet B, where it successfully obtains a new IP address. After the move to subnet B, one or more of our caching-only name servers still return the old IP address when the hostname is looked up. This behavior seems reasonable at first glance, since a caching-only server should retain the information it has in cache until the TTL expires or the cache is flushed. After digging into this further, I'm finding that the TTL for the hosts whose forward lookups return the wrong IP is set to 604800 seconds (168 hours, or 7 days). I determined this by dumping and viewing the cache. I've also discovered that the TTL for the reverse record for the same client is set to this high value.
This would be expected if this high value were the TTL configured for the domain, which is not the case here. The default TTL in our environment is set to 10800 seconds (3 hours). Thus, I'm a little baffled as to why the TTLs for some of these DHCP clients are being set to such a high value when other clients have their TTLs set to the 10800 value configured at the domain level. I've checked the registration at the object level (in our IP management application) and the TTL field is blank, which implies that the default TTL is in place.
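For anyone who wants to repeat the cache inspection, here is a minimal sketch of how a text dump of the cache could be scanned for A and PTR records whose TTL is above the 10800-second default. The line layout it assumes ("name ttl [class] type rdata", with ";" comment lines and "$" directives) is my assumption about the dump, not something I have verified against every BIND version, and the function name and sample hostnames are made up for illustration. Cached TTLs count down, so anything still above 10800 must have arrived with a higher TTL than the default.

```python
DEFAULT_TTL = 10800  # zone default from this post, in seconds


def suspicious_records(dump_text, limit=DEFAULT_TTL):
    """Return (name, ttl, type, rdata) for A/PTR records with TTL above limit.

    Assumes each record line looks like: name ttl [class] type rdata.
    Lines that do not match that shape are skipped rather than guessed at.
    """
    hits = []
    for line in dump_text.splitlines():
        line = line.strip()
        # Skip blanks, comment lines, and $DATE/$ORIGIN style directives.
        if not line or line.startswith((";", "$")):
            continue
        fields = line.split()
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        ttl = int(fields[1])
        rest = fields[2:]
        # The class field is optional in this assumed layout.
        if rest[0].upper() == "IN":
            rest = rest[1:]
        if len(rest) < 2:
            continue
        rtype = rest[0].upper()
        if rtype in ("A", "PTR") and ttl > limit:
            hits.append((fields[0], ttl, rtype, rest[1]))
    return hits
```

Feeding it the contents of the dump file returns only the records worth a closer look, which makes it easier to compare the three servers side by side.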
Aside from the above details, I can also note that the problematic lookups seem to involve the same DHCP clients. The only reason I know about these clients is that they are unable to SSH to some Unix boxes in a DMZ that restrict access to hosts for which they can perform both forward and reverse lookups. In this scenario, the forward lookup check fails because it returns the old IP address of the client. When this problem occurs, it tends to affect one or two of the caching servers but not all three. Furthermore, which of the three servers is affected appears to be somewhat random.
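To make the failure mode concrete, here is a rough sketch of the kind of forward and reverse consistency check I believe the DMZ hosts are applying. I do not know exactly how they implement it, so treat this as an assumption: the connecting IP is reverse-resolved to a name, the name is forward-resolved, and access is allowed only if the original IP is among the results. The function name and the injectable `reverse` and `forward` parameters are hypothetical; by default they use the standard `socket` resolver calls.

```python
import socket


def forward_confirmed(ip, reverse=None, forward=None):
    """Return True if ip -> name -> addresses leads back to ip."""
    # Default to the system resolver; callers may inject stand-ins for testing.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        name = reverse(ip)
        addrs = forward(name)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
    return ip in addrs
```

With a stale cached A record, the reverse lookup of the new subnet B address yields the hostname, but the forward lookup still yields the subnet A address, so the check returns False and the connection is refused.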

The caching servers in question all run Solaris 9 with BIND 9.3.2.

If anyone can provide some insight here, it would be much appreciated. I can provide additional information and/or elaborate on something as needed.

Bill Smith

ISS Server Systems Group
Johns Hopkins University Applied Physics Laboratory
11100 Johns Hopkins Road
Laurel, MD 20723
Phone: 443-778-5523
Web: http://www.jhuapl.edu