Reachable and rejected - NTP
-
Reachable and rejected
I hope I didn't miss an easy answer while reading the FAQ, list archive,
and other documents online. I have some systems which are separated from
their time servers by a NAT proxy. Those which are not separated seem to
work just fine but those beyond the proxy don't keep time correctly. For
example, on one of them I got this output:
# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 server-1        172.16.2.5       2 u   52   64  377    2.022  -41630.  19.566
 server-2        172.16.2.5       2 u    6   64  377    2.121  -41601.  19.996
# ntpq -c as
ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 20192  9024   yes   yes  none    reject   reachable  2
  2 20193  9024   yes   yes  none    reject   reachable  2
Those time servers aren't ideal but they are beyond my control and these
are the only two I have available. The local firewall won't let me use
servers on the Internet.
What I haven't found while reading is how it is possible for a server to
be both reachable and rejected. Note that the reject condition is not
constant; the servers are accepted occasionally, but not for very long.
Can this situation be remedied?
--
Dave Close, Compata, Costa Mesa CA +1 714 434 7359
dave@compata.com dhclose@alumni.caltech.edu
"Quantum computing is a marvelous way to show the non-
intuitive nature of quantum mechanics." -Gordon Moore
--
Dave Close, Compata, Costa Mesa CA "Politics is the business of getting
dave@compata.com, +1 714 434 7359 power and privilege without
dhclose@alumni.caltech.edu possessing merit." - P. J. O'Rourke
-
Re: Reachable and rejected
On 2008-09-10, Dave Close wrote:
> I hope I didn't miss an easy answer while reading the FAQ, list
> archive, and other documents online. I have some systems which are
> separated from their time servers by a NAT proxy. Those which are not
> separated seem to work just fine but those beyond the proxy don't keep
> time correctly. For example, on one of them I got this output:
The system shown below has no problem polling the remote time servers.
So you can rule out NAT as a problem.
> # ntpq -p
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  server-1        172.16.2.5       2 u   52   64  377    2.022  -41630.  19.566
>  server-2        172.16.2.5       2 u    6   64  377    2.121  -41601.  19.996
This ntpd was 41.6 seconds away from those servers at the time this
billboard was taken. That is a very large offset.
I would check in the syslog and see if ntpd is having to step the clock.
If that is the case you need to fix whatever is causing this massive
drift.
--
Steve Kostecke
NTP Public Services Project - http://support.ntp.org/
-
Re: Reachable and rejected
Dave Close wrote:
> I hope I didn't miss an easy answer while reading the FAQ, list archive,
> and other documents online. I have some systems which are separated from
> their time servers by a NAT proxy. Those which are not separated seem to
> work just fine but those beyond the proxy don't keep time correctly. For
> example, on one of them I got this output:
>
> # ntpq -p
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  server-1        172.16.2.5       2 u   52   64  377    2.022  -41630.  19.566
>  server-2        172.16.2.5       2 u    6   64  377    2.121  -41601.  19.996
>
> # ntpq -c as
> ind assID status  conf reach auth condition  last_event cnt
> ===========================================================
>   1 20192  9024   yes   yes  none    reject   reachable  2
>   2 20193  9024   yes   yes  none    reject   reachable  2
>
> Those time servers aren't ideal but they are beyond my control and these
> are the only two I have available. The local firewall won't let me use
> servers on the Internet.
>
> What I haven't found while reading is how it is possible for a server to
> be both reachable and rejected. Note that the reject condition is not
> constant; the servers are accepted occasionally, but not for very long.
>
> Can this situation be remedied?
I would START by setting the correct time on each machine. You can
either start ntpd with the "-g" switch, or you can use ntpdate to set
the time. Without doing one or the other I doubt that your machine will
EVER synchronize.
Please try this, wait for at least 30 minutes, and then issue:
ntpq -p
and report your results.
-
Re: Reachable and rejected
Steve Kostecke writes:
>On 2008-09-10, Dave Close wrote:
>> I hope I didn't miss an easy answer while reading the FAQ, list
>> archive, and other documents online. I have some systems which are
>> separated from their time servers by a NAT proxy. Those which are not
>> separated seem to work just fine but those beyond the proxy don't keep
>> time correctly. For example, on one of them I got this output:
>The system shown below has no problem polling the remote time servers.
>So you can rule out NAT as a problem.
>> # ntpq -p
>>      remote           refid      st t when poll reach   delay   offset  jitter
>> ==============================================================================
>>  server-1        172.16.2.5       2 u   52   64  377    2.022  -41630.  19.566
>>  server-2        172.16.2.5       2 u    6   64  377    2.121  -41601.  19.996
>This ntpd was 41.6 seconds away from those servers at the time this
>billboard was taken. That is a very large offset.
>I would check in the syslog and see if ntpd is having to step the clock.
>If that is the case you need to fix whatever is causing this massive
>drift.
>--
>Steve Kostecke
>NTP Public Services Project - http://support.ntp.org/
I am having the same problem on SEVENTEEN machines, all of which are
behind the NAT, and I am NOT having the problem on dozens more which
are not behind it and are configured identically. These are all Fedora
machines which run ntpdate automatically as part of /etc/init.d/ntpd.
The example above is from a machine behind the NAT which had been
running for more than a week. The drift does not surprise me.
In desperation, I have changed several of the machines behind the
NAT to run ntpd -gq periodically, and stopped the ntpd daemon. Those
machines are tracking the correct time fairly closely, within less
than a second always. But I don't like this kludge and would love to
find the right solution.
--
Dave Close, Compata, Costa Mesa CA "There is no security on this earth.
dave@compata.com, +1 714 434 7359 There is only opportunity."
dhclose@alumni.caltech.edu -- Douglas MacArthur
-
Re: Reachable and rejected
On 2008-09-11, Dave Close wrote:
> Steve Kostecke writes:
>
>>This ntpd was 41.6 seconds away from those servers at the time
>>this billboard was taken. That is a very large offset.
>
>>I would check in the syslog and see if ntpd is having to step the
>>clock. If that is the case you need to fix whatever is causing this
>>massive drift.
>
> I am having the same problem on SEVENTEEN machines, all of which are
> behind the NAT, and I am NOT having the problem on dozens more which
> are not behind it and are configured identically.
I've run ntpd behind NAT without any special configuration and not had a
problem. Is this NAT box overloaded?
> These are all Fedora machines which run ntpdate automatically as part
> of /etc/init.d/ntpd. The example above is from a machine behind the
> NAT which had been running for more than a week. The drift does not
> surprise me.
Are there any clock reset messages from ntpd in the syslog?
ntpd should be stepping the clock once the offset exceeds 128ms (unless
you've disabled stepping).
BTW: There's often someone awake on #ntp at irc.freenode.net
--
Steve Kostecke
NTP Public Services Project - http://support.ntp.org/
-
Re: Reachable and rejected
>I am having the same problem on SEVENTEEN machines, all of which are
>behind the NAT, and I am NOT having the problem on dozens more which
>are not behind it and are configured identically.
I have no troubles running ntpd behind a NAT box.
What flavor of NAT box are you using? I wonder what it is doing
that is confusing things?
Note that the 377 under the reach column says that the packets
are getting through.
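
As a small illustration of why 377 means the packets are getting through: the reach column is an 8-bit shift register printed in octal, with one bit per poll. The Python sketch below unpacks it; the function name decode_reach is made up for this example and is not part of any NTP tool.

```python
from typing import List


def decode_reach(reach_octal: str) -> List[bool]:
    """Unpack ntpq's octal reach value into eight poll results, newest first."""
    value = int(reach_octal, 8)
    # Bit 0 holds the most recent poll; older polls sit in the higher bits.
    return [bool((value >> bit) & 1) for bit in range(8)]


history = decode_reach("377")
print(history)
print("all of the last eight polls answered:", all(history))
```

A value such as 376 would instead show that the most recent poll went unanswered while the seven before it succeeded.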
--
These are my opinions, not necessarily my employer's. I hate spam.
-
Re: Reachable and rejected
Dave Close wrote:
>
> What I haven't found while reading is how it is possible for a server to
> be both reachable and rejected. Note that the reject condition is not
That's quite easy, but I can't see a case which applies here. The
candidates I can think of are: a w32time server that has not recently
synchronised (NAT would make no difference); a kiss of death because of
an excess query rate from the router (I think the refid would be
special); or two servers whose error bands do not overlap (I can't
remember whether this would produce a special flag in the ntpq output).
> constant; the servers are accepted occasionally, but not for very long.
That does sound like kiss of death, except for the refid.
What would help is the result of running rv against each association ID,
to get the detailed state for the association.
-
Re: Reachable and rejected
David Woolley writes:
>Dave Close wrote:
>> What I haven't found while reading is how it is possible for a server to
>> be both reachable and rejected. Note that the reject condition is not
>That's quite easy, but I can't see a case which applies here (using a
>not recently synchronised w32time server (NAT would make no difference),
>kiss of death because of the excess query rate from the router (I think
>the refid would be special); the two servers have error bands that do
>not overlap (I can't remember whether this would produce a special flag
>in the ntpq output))
>> constant; the servers are accepted occasionally, but not for very long.
>That does sound like kiss of death, except for the refid.
>What would help is the result of running rv against each association ID,
>to get the detailed state for the association.
Ok, here goes. I hope this means something to someone else...
# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 server1         172.16.2.5       2 u   24   64  377    2.159  -51835.   8.729
 server2         172.16.2.5       2 u   46   64  377    2.200  -51822.  19.581
# ntpq -c as
ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 20192  9024   yes   yes  none    reject   reachable  2
  2 20193  9024   yes   yes  none    reject   reachable  2
# ntpq
ntpq> rv 20192
assID=20192 status=9024 reach, conf, 2 events, event_reach,
srcadr=taus-idc2.dom1.taus.us.thales, srcport=123,
dstadr=192.168.58.250, dstport=123, leap=00, stratum=2, precision=-7,
rootdelay=0.000, rootdispersion=14089.050, refid=172.16.2.5, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, flash=400 peer_dist,
keyid=0, ttl=0, offset=-51835.943, delay=2.159, dispersion=9.549,
jitter=10.021, reftime=cc74ddf3.bd083558 Fri, Sep 12 2008 5:24:19.738,
org=cc754432.c147ae14 Fri, Sep 12 2008 12:40:34.755,
rec=cc754466.9b66e174 Fri, Sep 12 2008 12:41:26.607,
xmt=cc754466.9ad85a70 Fri, Sep 12 2008 12:41:26.604,
filtdelay= 2.17 2.20 2.16 3.68 4.14 2.25 4.47 2.24,
filtoffset= -51850. -51846. -51835. -51848. -51831. -51832. -51829. -51846.,
filtdisp= 7.81 8.80 9.75 10.72 11.70 12.66 13.65 14.59
ntpq> rv 20193
assID=20193 status=9024 reach, conf, 2 events, event_reach,
srcadr=taus-idc1.dom1.taus.us.thales, srcport=123,
dstadr=192.168.58.250, dstport=123, leap=00, stratum=2, precision=-7,
rootdelay=0.000, rootdispersion=14089.630, refid=172.16.2.5, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, flash=400 peer_dist,
keyid=0, ttl=0, offset=-51822.970, delay=2.200, dispersion=12.060,
jitter=24.499, reftime=cc7533f1.d8de6ddf Fri, Sep 12 2008 11:31:13.847,
org=cc754459.c0418937 Fri, Sep 12 2008 12:41:13.751,
rec=cc75448d.9bb36a6e Fri, Sep 12 2008 12:42:05.608,
xmt=cc75448d.9b19fbd7 Fri, Sep 12 2008 12:42:05.605,
filtdelay= 2.34 2.64 4.18 3.24 2.71 2.40 2.93 2.20,
filtoffset= -51856. -51847. -51849. -51851. -51844. -51838. -51840. -51822.,
filtdisp= 7.81 8.76 9.72 10.69 11.65 12.63 13.59 14.56
--
Dave Close, Compata, Costa Mesa CA +1 714 434 7359
dave@compata.com dhclose@alumni.caltech.edu
"Technology has the shelf life of a banana." - Scott McNealy
-
Re: Reachable and rejected
On 2008-09-12, Dave Close wrote:
> # ntpq -p
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  server1         172.16.2.5       2 u   24   64  377    2.159  -51835.   8.729
>  server2         172.16.2.5       2 u   46   64  377    2.200  -51822.  19.581
Are server1 and server2 real NTP servers? What does their ntpq -p output
look like?
--
Steve Kostecke
NTP Public Services Project - http://support.ntp.org/
-
Re: Reachable and rejected
Dave Close wrote:
> David Woolley writes:
>
>> Dave Close wrote:
>
>>> What I haven't found while reading is how it is possible for a server to
>>> be both reachable and rejected. Note that the reject condition is not
>
>> That's quite easy, but I can't see a case which applies here (using a
>> not recently synchronised w32time server (NAT would make no difference),
>> kiss of death because of the excess query rate from the router (I think
>> the refid would be special); the two servers have error bands that do
>> not overlap (I can't remember whether this would produce a special flag
>> in the ntpq output))
>
>>> constant; the servers are accepted occasionally, but not for very long.
>
>> That does sound like kiss of death, except for the refid.
>
>> What would help is the result of running rv against each association ID,
>> to get the detailed state for the association.
>
> Ok, here goes. I hope this means something to someone else...
>
> # ntpq -p
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  server1         172.16.2.5       2 u   24   64  377    2.159  -51835.   8.729
>  server2         172.16.2.5       2 u   46   64  377    2.200  -51822.  19.581
> # ntpq -c as
> ind assID status  conf reach auth condition  last_event cnt
> ===========================================================
>   1 20192  9024   yes   yes  none    reject   reachable  2
>   2 20193  9024   yes   yes  none    reject   reachable  2
> # ntpq
> ntpq> rv 20192
> assID=20192 status=9024 reach, conf, 2 events, event_reach,
> srcadr=taus-idc2.dom1.taus.us.thales, srcport=123,
> dstadr=192.168.58.250, dstport=123, leap=00, stratum=2, precision=-7,
> rootdelay=0.000, rootdispersion=14089.050, refid=172.16.2.5, reach=377,
> unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, flash=400 peer_dist,
> keyid=0, ttl=0, offset=-51835.943, delay=2.159, dispersion=9.549,
> jitter=10.021, reftime=cc74ddf3.bd083558 Fri, Sep 12 2008 5:24:19.738,
> org=cc754432.c147ae14 Fri, Sep 12 2008 12:40:34.755,
> rec=cc754466.9b66e174 Fri, Sep 12 2008 12:41:26.607,
> xmt=cc754466.9ad85a70 Fri, Sep 12 2008 12:41:26.604,
> filtdelay= 2.17 2.20 2.16 3.68 4.14 2.25 4.47 2.24,
> filtoffset= -51850. -51846. -51835. -51848. -51831. -51832. -51829. -51846.,
> filtdisp= 7.81 8.80 9.75 10.72 11.70 12.66 13.65 14.59
> ntpq> rv 20193
> assID=20193 status=9024 reach, conf, 2 events, event_reach,
> srcadr=taus-idc1.dom1.taus.us.thales, srcport=123,
> dstadr=192.168.58.250, dstport=123, leap=00, stratum=2, precision=-7,
> rootdelay=0.000, rootdispersion=14089.630, refid=172.16.2.5, reach=377,
> unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, flash=400 peer_dist,
> keyid=0, ttl=0, offset=-51822.970, delay=2.200, dispersion=12.060,
> jitter=24.499, reftime=cc7533f1.d8de6ddf Fri, Sep 12 2008 11:31:13.847,
> org=cc754459.c0418937 Fri, Sep 12 2008 12:41:13.751,
> rec=cc75448d.9bb36a6e Fri, Sep 12 2008 12:42:05.608,
> xmt=cc75448d.9b19fbd7 Fri, Sep 12 2008 12:42:05.605,
> filtdelay= 2.34 2.64 4.18 3.24 2.71 2.40 2.93 2.20,
> filtoffset= -51856. -51847. -51849. -51851. -51844. -51838. -51840. -51822.,
> filtdisp= 7.81 8.76 9.72 10.69 11.65 12.63 13.59 14.56
The offset is large enough that ntpd would need several DAYS to work it off.
Try setting your clock to a reasonable approximation of the correct time
before starting ntpd. ntpd -g should do the job if you are running a
reasonably recent version. If your version is too old to support -g,
then use ntpdate to set the clock before starting ntpd.
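
For a rough sense of scale, the Python sketch below estimates how long it would take to slew away the offset reported in the rv output. It assumes a maximum slew rate of 500 ppm and the 128 ms step threshold mentioned earlier in the thread; the names are invented for this example, and the real clock discipline normally corrects more slowly than this best case.

```python
STEP_THRESHOLD_S = 0.128  # step threshold quoted earlier in the thread
MAX_SLEW_PPM = 500.0      # assumed upper bound on the slew rate


def slew_hours(offset_ms: float, slew_ppm: float = MAX_SLEW_PPM) -> float:
    """Best-case hours needed to remove offset_ms by slewing only."""
    offset_s = abs(offset_ms) / 1000.0
    return offset_s / (slew_ppm * 1e-6) / 3600.0


offset_ms = -51835.943  # offset from "rv 20192"
print("exceeds step threshold:", abs(offset_ms) / 1000.0 > STEP_THRESHOLD_S)
print(f"best-case slew time: {slew_hours(offset_ms):.1f} hours")
```

Even under these optimistic assumptions the correction takes more than a day, which is why setting the clock first is the practical choice.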
-
Re: Reachable and rejected
Dave Close wrote:
> dstadr=192.168.58.250, dstport=123, leap=00, stratum=2, precision=-7,
Precision -7 is poor, but possibly a contra-indication for the main
hypothesis as I believe that w32time is normally even worse, at -6.
> rootdelay=0.000, rootdispersion=14089.630, refid=172.16.2.5, reach=377,
Root dispersion is excessive. Combined with a stratum of 2, this is
indicative of your using a w32time server rather than an NTP server.
However, NAT should make no difference. By any chance were the machines
outside the NAT w32time clients, rather than NTP clients?
A real NTP server would not report root delay as zero unless it had a
directly connected reference clock. The reference ID indicates that
that is not the case, and the stratum is suggestive that it isn't the
case. Again it looks as though you are not talking to an NTP server and
probably talking to a w32time one.
What I think you can safely say is that you are not talking to an NTP
server.
(Basically, the rootdispersion is telling you that the server hasn't had
an update for so long that it can't be sure of its own time to better
than 14 seconds. NTP requires that, after adding extra errors for the
hop to the client, the server know its time to better than one second.)
> unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, flash=400 peer_dist,
peer_dist is telling you what I just said.
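
To make the peer_dist rejection concrete, here is a rough Python sketch of the distance check using the values from the rv 20193 output. The formula is a simplification, and the one-second limit is taken from the explanation above; the exact calculation and threshold differ between ntpd versions, and the function and constant names are made up for illustration.

```python
MAX_DISTANCE_MS = 1000.0  # assumed limit: "better than one second"


def root_distance_ms(rootdelay, rootdispersion, delay, dispersion, jitter):
    """Simplified error budget back to the primary source, in milliseconds."""
    # Half the accumulated round-trip delay plus every dispersion term.
    return (rootdelay + delay) / 2.0 + rootdispersion + dispersion + jitter


distance = root_distance_ms(
    rootdelay=0.000,
    rootdispersion=14089.630,
    delay=2.200,
    dispersion=12.060,
    jitter=24.499,
)
rejected = distance > MAX_DISTANCE_MS
print(f"root distance is about {distance:.1f} ms, rejected: {rejected}")
```

The rootdispersion term alone is about fourteen times the limit, so the small delay and jitter on the local hop are irrelevant to the outcome.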
> keyid=0, ttl=0, offset=-51822.970, delay=2.200, dispersion=12.060,
> jitter=24.499, reftime=cc7533f1.d8de6ddf Fri, Sep 12 2008 11:31:13.847,
reftime looks OK. I wonder if the server is actually reflecting your
rootdispersion. Again, if so, not an NTP server.
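
For anyone who wants to check a reftime by hand, the hexadecimal value is a 64-bit NTP timestamp: 32 bits of seconds since 1 January 1900 UTC, then 32 bits of binary fraction. The Python sketch below decodes it; decode_ntp_timestamp is a name invented for this example.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def decode_ntp_timestamp(hex_stamp: str) -> datetime:
    """Convert 'ssssssss.ffffffff' as printed by ntpq into a UTC datetime."""
    seconds_hex, fraction_hex = hex_stamp.split(".")
    seconds = int(seconds_hex, 16)
    # The fraction is a 32-bit binary fraction of one second.
    micros = round(int(fraction_hex, 16) / 2**32 * 1_000_000)
    return NTP_EPOCH + timedelta(seconds=seconds, microseconds=micros)


print(decode_ntp_timestamp("cc7533f1.d8de6ddf").isoformat())
```

The result is in UTC, while ntpq prints local time, so the decoded hour differs from the 11:31:13 shown above by the poster's time zone offset (presumably US Pacific time).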
> org=cc754459.c0418937 Fri, Sep 12 2008 12:41:13.751,
> rec=cc75448d.9bb36a6e Fri, Sep 12 2008 12:42:05.608,
> xmt=cc75448d.9b19fbd7 Fri, Sep 12 2008 12:42:05.605,
> filtdelay= 2.34 2.64 4.18 3.24 2.71 2.40 2.93 2.20,
> filtoffset= -51856. -51847. -51849. -51851. -51844. -51838. -51840. -51822.,
> filtdisp= 7.81 8.76 9.72 10.69 11.65 12.63 13.59 14.56
-
Re: Reachable and rejected
On 2008-09-12, Dave Close wrote:
> Steve Kostecke wrote:
>
>>Are server1 and server2 real NTP servers? What does their ntpq -p output
>>look like?
>
> I don't have access to these servers.
Try 'ntpq -pcrv server[1|2]' from another system.
--
Steve Kostecke
NTP Public Services Project - http://support.ntp.org/
-
Re: Reachable and rejected
Steve Kostecke writes:
>Are server1 and server2 real NTP servers? What does their ntpq -p output
>look like?
>Try 'ntpq -pcrv server[1|2]' from another system.
# ntpq -pcrv server1
server1: timed out, nothing received
***Request timed out
server1: timed out, nothing received
***Request timed out
# ntpq -pcrv server2
server2: timed out, nothing received
***Request timed out
server2: timed out, nothing received
***Request timed out
--
Dave Close, Compata, Costa Mesa CA +1 714 434 7359
dave@compata.com dhclose@alumni.caltech.edu
Ralph Waldo Emerson:
"A foolish consistency is the hobgoblin of little minds,
adored by little statesmen and philosophers and divines.
With consistency a great soul has simply nothing to do."
-
Re: Reachable and rejected
Richard B. Gilbert wrote:
> What, if anything, leads you to believe that "server1" or "server2" are
> actually running NTP, are connected to the network, etc, etc?
That they respond to NTP queries with well formed responses, even if the
response indicates the time is too unreliable to use and in other ways
looks like a failing SNTP implementation attempt, which is not directly
connected to a reference clock.
The lack of response here is consistent with something that is not the
reference implementation of NTP, but could be the result of network and
server security policies. To be honest, I would have been very
surprised if there had been a response, as the server is simply not
behaving in a way that is consistent with the reference implementation.
>
> What happens if you say:
> ping server1
> ping server2
Total waste of time. We already know that something responds to those
addresses and a ping failure is very likely in the modern, paranoid, world.
Some basic SNMP queries are much more likely to be useful, as, if there
is a response, it will tell us what OS we are dealing with. Although my
original thought was Windows, I think that would have produced a
precision of -6. -7 suggests something with a 100Hz clock interrupt
rate, which is the typical Unix rate.
We could be dealing with a router, an appliance time server, or a weird
choice of NTP software on Unix. Although I believe that NTP should
indicate an unsynchronised state if the incoming root dispersion goes
excessive, I have seen an example here that seemed to contradict this,
so it is even possible that the real culprit is the stratum one server.
However, I think that the zero root delay is a strong clue that this
is an SNTP server operating outside the scope of SNTP, and possibly not
handling root dispersion validly.
> ??
>
> Getting a response to ping will show that they are connected to the
> network, have network software installed, etc, etc. If they respond to
> ping but not to ntpq, that would suggest that ntpd is not running.
They respond to NTP client requests but not NTP management requests. No
need for the pings. Even the first word in the subject tells you that
they are responding to NTP!
-
Re: Reachable and rejected
David Woolley wrote:
> Richard B. Gilbert wrote:
>
>> What, if anything, leads you to believe that "server1" or "server2"
>> are actually running NTP, are connected to the network, etc, etc?
>
> That they respond to NTP queries with well formed responses, even if the
> response indicates the time is too unreliable to use and in other ways
> looks like a failing SNTP implementation attempt, which is not directly
> connected to a reference clock.
>
> The lack of response here is consistent with something that is not the
> reference implementation of NTP, but could be the result of network and
> server security policies. To be honest, I would have been very
> surprised if there had been a response, as the server is simply not
> behaving in a way that is consistent with the reference implementation.
>>
>> What happens if you say:
>> ping server1
>> ping server2
>
> Total waste of time. We already know that something responds to those
> addresses and a ping failure is very likely in the modern, paranoid, world.
>
> Some basic SNMP queries are much more likely to be useful, as, if there
> is a response, it will tell us what OS we are dealing with. Although my
> original thought was Windows, I think that would have produced a
> precision of -6. -7 suggests something with a 100Hz clock interrupt
> rate, which is the typical Unix rate.
>
> We could be dealing with a router, an appliance time server, or a weird
> choice of NTP software on Unix. Although I believe that NTP should
> indicate an unsynchronised state if the incoming root dispersion goes
> excessive, I have seen an example here that seemed to contradict this,
> so it is even possible that the real culprit is the stratum one server.
> However, I think that the zero root delay is a strong clue that this is
> an SNTP server operating outside the scope of SNTP, and possibly not
> handling root dispersion validly.
>> ??
>>
>> Getting a response to ping will show that they are connected to the
>> network, have network software installed, etc, etc. If they respond
>> to ping but not to ntpq, that would suggest that ntpd is not running.
>
> They respond to NTP client requests but not NTP management requests. No
> need for the pings. Even the first word in the subject tells you that
> they are responding to NTP!
I'm going to have my breakfast rather than look up the details but if
they don't respond to NTP management requests (ntpq or ntpdc), it
suggests that they have been configured not to!