I have two caching servers, res1 and res2, running BIND 9.2.3 on Red Hat Linux release 8.0 (Psyche). They sit inside a firewall and forward queries to four different caching servers outside it, as well as to some internal servers that are authoritative for internal zones.

Last week res2 started being slow and intermittently failing to resolve names. dig queries sent from res2 to the outside resolvers worked correctly, and dig queries sent from res2 to res1 also worked correctly. However, dig queries sent from res1 to res2 produced error messages like this:

;; Warning: ID mismatch: expected ID 3325, got 34596

with different IDs for different queries. It was late at night (I had been paged), so I went ahead and rebooted res2. This cleared up the issue.
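For context on what the warning is checking: every DNS message begins with a 12-byte header whose first 16 bits are the message ID (RFC 1035, section 4.1.1), and dig expects the reply's ID to match the ID it put in its query. The following is a minimal Python sketch of that comparison only, not dig's actual implementation; the helper names (`build_query_header`, `response_id`, `check_reply`) are hypothetical, and the warning string simply mimics the dig output quoted above.

```python
import struct
from typing import Optional


def build_query_header(query_id: int) -> bytes:
    # Hypothetical helper: pack a 12-byte DNS header (RFC 1035 4.1.1) as six
    # big-endian 16-bit fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    # Flags 0x0100 = standard query with RD (recursion desired) set.
    return struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)


def response_id(message: bytes) -> int:
    # The message ID is the first two bytes of any DNS message, big-endian.
    (msg_id,) = struct.unpack("!H", message[:2])
    return msg_id


def check_reply(expected_id: int, reply: bytes) -> Optional[str]:
    # Sketch of the check: return a dig-style warning if the reply's ID
    # differs from the ID we sent, otherwise None.
    got = response_id(reply)
    if got != expected_id:
        return f";; Warning: ID mismatch: expected ID {expected_id}, got {got}"
    return None


if __name__ == "__main__":
    # A reply carrying ID 34596 arriving for a query sent with ID 3325.
    print(check_reply(3325, build_query_header(34596)))
```

In other words, the warning means the reply that came back did not carry the ID of the query dig had just sent.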

Now, a week later, the same issue is occurring on res1. res1 is slow to respond to queries and is intermittently failing to resolve names. dig queries issued on res1 pointing to the outside resolvers work fine, and dig queries issued on res1 pointing to res2 also work fine. However, dig queries issued on res2 pointing to res1 produce the ID mismatch errors again.

I suspect that rebooting res1 would clear up the error again, but before I do that I want to try to work out what is going on.

Any advice?