At 02:27 AM 9/30/2004, Eddie wrote:
>Let me say thank you for everyone's help.
>Yesterday, I was sitting in the server room, and noticed the failed
>request count going up. Here we go again. This is the first time it's done
>this when I am on site so I can debug. About time.
>I dumped the packets and saw many requests to the root servers going out,
>but nothing returning. So this time, I did a tcpdump on the external
>side of the NAT/firewall (Linux) box. Strange, I saw tons of DNS requests
>to the root servers, all coming from the wrong server on the network.
>That's the backup server, it does not run bind. All it does is samba and
>ntpd. So I did a tcpdump on it and watched a bunch of stuff with the root
>servers sending data to it. On port 123 no less.
>I went into "I have been HACKED" panic mode and checked services and shut
>down programs. Still data. So I killed the network interface. Still

Port 123 is the NTP (ntpd) port. If there is still an issue related to NTP,
please post it in comp.protocols.time.ntp, though it sounds like bad memory.


>Anyway, after spending a good hour watching packets, I figured out that my
>NAT/firewall box has bad memory or some bug that, once a week, blows up
>the masq table and changes the "from" address of the DNS server to the
>backup server. So any DNS request sent from the DNS server is attributed
>to the backup server. This is the strangest thing I have ever seen.
>I rebooted the firewall and now all is happy, but I am changing out that
>computer with a nice 486 with no floating point bug.
>Thanks for all help. Not a Bind bug after all. This is sure going on my
>On Sun, 01 Dec 2002 22:45:20 -0800, Mark_Andrews wrote:
> >> My primary DNS server is up to date on the latest RH patches. It runs
> >> Bind 9.2.1. The backup DNS server has not been updated yet and runs
> >> 9.2.0. It suffers the same problem, but since it's not under load, the
> >> problem does not show itself until the primary DNS fails for a bit.
> >>
> >> As for making the root name servers mad, I did a packet capture when
> >> Bind is running correctly. Looking at it in ethereal, I see an A query
> >> to responds back with
> >> "Standard Query Response, Format Error"
> >>
> >> The request is made again, and this time it works. I see a lot of
> >> "Format" errors in my packet capture and this is when Bind is working.

> >
> > FORMERR's are responses to EDNS probes. Named re-tries w/o EDNS.
> >
> > Everything sounds like normal.
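
To illustrate the EDNS probe and fallback described above, here is a minimal
Python sketch. It is not named's actual code: the function names, the query ID
and the 4096-byte payload size are assumptions chosen for illustration. It
builds a query with or without an OPT record and decides whether a FORMERR
reply calls for a retry without EDNS.

```python
import struct

FORMERR = 1  # RCODE 1, "format error" (RFC 1035)


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype=1, edns=True, qid=0x1234, udp_size=4096):
    """Build a DNS query; qid and udp_size are illustrative defaults."""
    arcount = 1 if edns else 0
    # Header: ID, flags (0 = plain iterative query), QD/AN/NS/AR counts.
    header = struct.pack("!HHHHHH", qid, 0, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    if not edns:
        return header + question
    # OPT pseudo-record: root name, TYPE 41, CLASS carries the UDP payload
    # size, TTL 0 (extended RCODE, version, flags), empty RDATA.
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return header + question + opt


def rcode_of(message):
    """Return the 4-bit RCODE from the flags word of a DNS message."""
    flags = struct.unpack("!H", message[2:4])[0]
    return flags & 0x000F


def should_retry_without_edns(response, sent_edns):
    """A FORMERR answer to an EDNS query suggests retrying without EDNS."""
    return sent_edns and rcode_of(response) == FORMERR
```

A server that does not understand the OPT record answers FORMERR; the caller
then sends build_query(name, edns=False) and carries on, which is why the
errors show up in a capture even when everything is working.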
> >
> >> When Bind quit working last, I did a quick tcpdump and noticed that it
> >> was sending requests out, but nothing was coming back. I did not get a
> >> chance to do a packet capture or a little sniffing on the external side
> >> of the firewall, but the backup DNS server was running fine at the time
> >> so I don't think it's firewall or network related. It was just like the
> >> root name servers stopped talking to it. Restarting Bind fixed the
> >> problem. Next time it goes out, I will be ready.

> >
> > I was on a doubly NAT'd net the other day and observed the behaviour.
> > As this was in a hotel conference room it wasn't worth expending time
> > and effort to chase the problem down. Note however the first NAT box
> > was Linux based.
> >
> > Restarting named causes named to use a different source port which
> > would allow the NAT to clear state.
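
Why a new source port helps can be shown with a toy model. This is only a
sketch under simplifying assumptions and not how Linux masquerading is
implemented: the ToyNat class, its port numbering and the addresses are all
hypothetical. State is keyed on the inside address and source port, so a query
from a new source port gets a fresh entry instead of reusing a stale one.

```python
class ToyNat:
    """Toy source-NAT table keyed on (inside_ip, inside_port). Hypothetical."""

    def __init__(self, external_ip, first_port=61000):
        self.external_ip = external_ip
        self.next_port = first_port
        self.table = {}  # (inside_ip, inside_port) -> external port

    def outbound(self, inside_ip, inside_port):
        """Translate an outgoing flow, creating state on first use."""
        key = (inside_ip, inside_port)
        if key not in self.table:
            self.table[key] = self.next_port
            self.next_port += 1
        return self.external_ip, self.table[key]

    def inbound(self, external_port):
        """Find the inside endpoint that a reply on external_port belongs to."""
        for key, port in self.table.items():
            if port == external_port:
                return key
        return None


# Example addresses are made up for illustration.
nat = ToyNat("198.51.100.1")
old = nat.outbound("10.0.0.53", 53)     # named before the restart
new = nat.outbound("10.0.0.53", 40000)  # named after the restart, new port
```

Here old and new map to different external ports, so replies to the new
queries are matched against new state regardless of what happened to the old
entry.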
> >
> > I would be taking packet traces from the outside of the firewall next
> > time it fails.
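
Once such a trace has been taken and reduced to address/port tuples, a check
along these lines can show whether replies reach the outside of the firewall
at all. It is a sketch only: the Packet tuple, the function name and the
sample data are assumptions, and parsing the capture file is left out.

```python
from collections import namedtuple

# One observed UDP packet, reduced to its addressing information.
Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port")


def unanswered_queries(packets):
    """Return queries to port 53 that have no reply with the reverse tuple."""
    # For each reply (source port 53), record the query tuple it answers.
    answered = {
        (p.dst_ip, p.dst_port, p.src_ip, p.src_port)
        for p in packets
        if p.src_port == 53
    }
    return [p for p in packets if p.dst_port == 53 and tuple(p) not in answered]


# Made-up sample trace as seen on the outside interface.
trace = [
    Packet("198.51.100.1", 61000, "198.41.0.4", 53),   # query
    Packet("198.41.0.4", 53, "198.51.100.1", 61000),   # matching reply
    Packet("198.51.100.1", 61001, "192.33.4.12", 53),  # query, no reply seen
]
missing = unanswered_queries(trace)
```

If the outside trace shows replies for every query while named on the inside
sees none, the loss is happening in the NAT box rather than upstream.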
> >
> >> Thanks for the tip on the source rpms. When it dies again, I will try
> >> that.
> >>
> >> Thanks
> >> Ed
> >>
> >>

> > Mark
> > --
> > Mark Andrews, Internet Software Consortium 1 Seymour St., Dundas Valley,
> > NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET:
> >