Greetings.

We are running two BIND 9.2.1 name servers on Solaris. We are having
trouble with a particular domain -- sbj.net. I know there is a problem
with the domain. The parent (.net) servers list ns1-auth.sprintlink.net and
ns1.corpranet.net as the authoritative servers for the
domain, whereas ns1.corpranet.net and ns1.positech.net are apparently
*supposed* to be the authoritative servers, and ns1-auth.sprintlink.net
indicates that it is *not* authoritative for sbj.net.
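
To make the mismatch concrete, here is a minimal Python sketch. The two
sets are typed in by hand from the description above (nothing is queried
live), and the names parent_ns, intended_ns and compare_delegation are
just illustrative:

```python
# NS set handed out on the delegation side for sbj.net, as described above.
parent_ns = {"ns1-auth.sprintlink.net", "ns1.corpranet.net"}
# NS set that is apparently *supposed* to be authoritative.
intended_ns = {"ns1.corpranet.net", "ns1.positech.net"}


def compare_delegation(parent, intended):
    """Return (common, parent_only, intended_only) as sorted lists."""
    return (
        sorted(parent & intended),
        sorted(parent - intended),
        sorted(intended - parent),
    )


common, parent_only, intended_only = compare_delegation(parent_ns, intended_ns)
print("listed on both sides:   ", common)
print("delegation side only:   ", parent_only)
print("intended set only:      ", intended_only)
```

Only ns1.corpranet.net appears in both sets; ns1-auth.sprintlink.net
appears only on the delegation side, and it is the one that says it is
not authoritative.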

If I flush the cache (rndc flush) on our servers, they will successfully
resolve the A record for sbj.net. A dump of the database at that point
shows that our servers have cached ns1.corpranet.net and ns1.positech.net
as nameservers for sbj.net:

----------------------------------------------------------------
; authauthority
sbj.net.        3554    NS      ns1.positech.net.
                3554    NS      ns1.corpranet.net.
; authanswer
                3554    A       69.27.136.10
; authanswer
www.sbj.net.    3554    CNAME   sbj.net.
----------------------------------------------------------------


After the NS records for sbj.net expire (about 1 hour later), our servers
return SERVFAIL for sbj.net. A packet capture shows that our servers
are returning SERVFAIL without even trying to query any other nameservers
for sbj.net. A dump of the database at that point shows that our servers
have cached ns1.corpranet.net and ns1-auth.sprintlink.net as nameservers
for sbj.net:

------------------------------------------------------------------
; glue
sbj.NET.        155685  NS      ns1.corpranet.net.
                155685  NS      ns1-auth.sprintlink.net.
; glue
sbs2003.NET.    149675  NS      ns1.sbs2003.net.
------------------------------------------------------------------
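
In case it helps anyone compare the two dumps, this is a rough Python
sketch for pulling the NS records and their trust labels out of dump
text. It assumes only the line layout visible in the excerpt above (a
"; trust" comment line, then records whose owner name may be omitted on
continuation lines); parse_ns is a made-up helper, not part of BIND:

```python
from collections import defaultdict

# Excerpt pasted from the second dump above.
DUMP = """\
; glue
sbj.NET. 155685 NS ns1.corpranet.net.
155685 NS ns1-auth.sprintlink.net.
; glue
sbs2003.NET. 149675 NS ns1.sbs2003.net.
"""


def parse_ns(dump):
    """Map lower-cased owner name -> list of (trust, ttl, nameserver)."""
    records = defaultdict(list)
    owner = None
    trust = "unknown"
    for line in dump.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(";"):
            # Comment line carries the trust label for the records below it.
            trust = line.lstrip("; ").strip()
            continue
        fields = line.split()
        # A line that starts with a TTL reuses the previous owner name.
        if not fields[0].isdigit():
            owner = fields[0].lower()
            fields = fields[1:]
        ttl, rtype, rdata = int(fields[0]), fields[1], fields[2]
        if rtype == "NS":
            records[owner].append((trust, ttl, rdata.rstrip(".")))
    return dict(records)


cache = parse_ns(DUMP)
for trust, ttl, ns in cache["sbj.net."]:
    print(f"{ns:28} trust={trust:8} ttl={ttl}s ({ttl / 3600:.1f} hours)")
```

Run against the excerpt, both sbj.net entries come back labelled "glue"
with a TTL of 155685 seconds, compared with 3554 seconds on the
"authauthority" records in the first dump.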


My questions are:

1) Why do our servers sometimes cache ns1.corpranet.net and
ns1.positech.net as the nameservers for sbj.net, and why do they sometimes
cache ns1.corpranet.net and ns1-auth.sprintlink.net instead? Why are they
not consistent?

2) *Should* our nameservers be caching ns1-auth.sprintlink.net as a
nameserver for sbj.net, since that server is lame for sbj.net?

3) If the answer to (2) is yes, is there any way to configure our servers
to keep them from caching lame servers (JUST the lame servers without
affecting caching for anything else)?

4) Why are our nameservers returning SERVFAIL when ns1-auth.sprintlink.net
is in the cache, since ns1.corpranet.net is also in the cache and is
authoritative for sbj.net? (In other words, why don't our servers go
ahead and try to query ns1.corpranet.net even though
ns1-auth.sprintlink.net is lame for sbj.net?)

I've googled and searched through the bind-users archives but have so far
not found the answers to my questions.

Thanks.

Ben Bridges
Network Engineer
SpringNet / City Utilities of Springfield, MO