NTP slow to start correction after a drift
Apologies for a long post, but I was unable to make it shorter.
I have been monitoring timekeeping performance on an environment which
contains 3 stratum 1 clocks and 4 Cisco routers running as stratum 2.
The stratum 1s use time which is derived originally from GPS, but fed
to the stratum 1 clocks via IRIG.
The monitoring is carried out from a single Solaris system which takes
time from all seven servers.
Normally all clocks show times within +/- 4ms, but every 7-8 days I
see an event where all 7 clocks drift out by about 10-18 ms over a
period of 2-3 hours before they are corrected.
I am interpreting this as being due to drift in the local clock on the
Solaris box which is doing the monitoring. If the time on the stratum 1
servers were drifting due to some common-mode problem with their time
reference, I would expect the stratum 2 servers to lag the stratum 1s.
I am concerned about the length of time it takes before NTP starts
correcting the local clock on the Solaris server.
I have a graph which you can see at
[url]http://www.flickr.com/photos/36096832@N00/2477948892/sizes/o/in/set-72157604959850048/[/url]
The above graph shows offset against time for all seven clocks. An
hour of steady-state operation is shown before the beginning of the
drift event; the system had been in steady state for some days prior
to the drift event.
The poll interval is initially 1024 seconds.
The drift event starts about an hour into the graph. The offset
increases by about 15ms in about 2 hours (roughly 2ppm), then a
correction is applied and the clock drifts back to zero offset at
about the 3.5 hour mark.
I am concerned that the drift went uncorrected for so long, and am
trying to understand the cause.
Is the clock-filter algorithm rejecting updated timestamps which are
not the lowest of the most recent eight? From my reading of the book
and the RFCs, this is what should happen, but that means that the
clock can drift significantly before a new timestamp passes through
the clock filter algorithm.
To illustrate, here are the timestamp values for the three stratum 1
clocks over the period of the drift and the beginning of the
correction. The time base is the same as that of the graph.
Stratum 1 A
Time      Offset     Delay     Dispersion
00:00:00  -0.000052  0.000600  0.000200  * Lowest delay of the most recent 8 values
00:17:04   0.000394  0.001850  0.000370
00:34:08  -0.000174  0.000630  0.000400
00:51:12   0.000908  0.000580  0.000890  * New lowest delay; drift begins about here
01:08:16   0.002661  0.000630  0.002180
01:25:20   0.004790  0.000750  0.003190
01:42:24   0.007350  0.000600  0.004120
01:59:28   0.010072  0.000610  0.004750
02:16:32   0.012666  0.000600  0.004910
02:33:36   0.015004  0.000610  0.004730
02:50:40   0.017115  0.000600  0.004390
02:59:12   0.018362  0.001970  0.003390
* The 0.000580 delay has now expired. There are three timestamps with
  0.000600 delays in the shift register; which is chosen?
  Whichever is chosen, the offset has drifted significantly since the
  last timestamp was passed from the clock filter.
03:06:00   0.017913  0.000600  0.001630  * Correction has begun
03:10:16   0.017275  0.000610  0.001080
03:14:05   0.015433  0.000580  0.002120  * New lowest delay
03:16:13   0.013812  0.000630  0.002610
Stratum 1 B
Time      Offset     Delay     Dispersion
00:12:47  -0.000637  0.010160  0.000260  * Lowest delay in shift register is 0.009900
00:29:51  -0.000810  0.010330  0.000320
00:46:55   0.000029  0.010180  0.000690
01:03:59   0.001683  0.010240  0.002000  Drift begins about here
01:21:03   0.003762  0.010220  0.003050
01:38:07   0.006200  0.010220  0.003940
01:55:11   0.008894  0.010130  0.004610  * New lowest delay
02:12:15   0.011507  0.010030  0.004880  * New lowest delay
02:29:19   0.013935  0.010190  0.004810
02:46:23   0.016025  0.010150  0.004430
02:54:55   0.016739  0.010210  0.002870
03:03:27   0.017224  0.010160  0.001850
03:07:43   0.016871  0.010380  0.000870  * Correction has begun
03:11:59   0.016221  0.010100  0.000850  * New lowest delay
03:14:07   0.014934  0.010240  0.001620
03:16:15   0.013274  0.010150  0.002430
Stratum 1 C (Selected as Sync Server during the whole of this time)
Time      Offset     Delay     Dispersion
00:01:52  -0.000076  0.009250  0.000200  * Lowest delay in shift register is 0.009090
00:18:56  -0.000287  0.009230  0.000310
00:36:00  -0.000091  0.009160  0.000150
00:53:04   0.001073  0.009310  0.001190
* Delay of 0.009090 expires, new lowest delay is 0.009160
  Drift begins about here
01:10:08   0.002899  0.009410  0.002400
01:27:12   0.005351  0.009630  0.003630
01:44:16   0.007630  0.009220  0.004070
02:01:20   0.010348  0.009250  0.004700
02:18:24   0.012981  0.009250  0.004910
02:35:28   0.015285  0.009200  0.004700
02:52:31   0.017373  0.009250  0.004360
* Delay of 0.009160 expires, new lowest delay is 0.009200
03:01:03   0.017929  0.009230  0.002690
03:06:32   0.018002  0.009190  0.001340  * New lowest delay
03:10:48   0.017277  0.009290  0.000990  * Correction has begun
03:13:14   0.016549  0.009190  0.001100
03:15:22   0.014858  0.009280  0.002150
Why is the polling interval maintained at 1024s for so long in the
presence of the drift?
Apart from reducing the maximum polling interval, what else could I do
to hasten the response to this kind of clock drift?
The offsets from the set of clocks normally remain within +/- 4ms,
which is sufficient for our needs, but a drift out beyond 15 ms is a
cause for concern. We are hoping to be able to maintain time to within
+/- 5ms of UTC on our NTP clients.
The drift rate seen here is about 2ppm. If the drift rate were about
6ppm and we saw the same slow response to the drift, the clock could
drift out by 50ms before the correction begins. This would definitely
be regarded as poor timekeeping, and would cause alarms to be raised.
I would be grateful for any comments or advice.
Regards,
Mike
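The drift figures quoted in the post above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original post: the function names are made up for illustration, and the two sample rows are taken from the Stratum 1 C table.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS timestamp to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def drift_rate_ppm(t1: str, offset1: float, t2: str, offset2: float) -> float:
    """Average frequency error in ppm implied by two offset samples (in seconds)."""
    elapsed = hms_to_seconds(t2) - hms_to_seconds(t1)
    return (offset2 - offset1) / elapsed * 1e6


def accumulated_offset_ms(rate_ppm: float, seconds: float) -> float:
    """Offset in ms built up by a clock left uncorrected at rate_ppm."""
    return rate_ppm * 1e-6 * seconds * 1e3


# Stratum 1 C rows at 00:53:04 and 02:52:31 from the table in the post.
rate = drift_rate_ppm("00:53:04", 0.001073, "02:52:31", 0.017373)
print(f"observed drift rate: {rate:.2f} ppm")
print(f"2 ppm for 2 h: {accumulated_offset_ms(2.0, 7200):.1f} ms")
print(f"6 ppm for 2 h: {accumulated_offset_ms(6.0, 7200):.1f} ms")
print(f"6 ppm for 3 h: {accumulated_offset_ms(6.0, 10800):.1f} ms")
```

The two table rows give a rate a little above 2 ppm, in line with the "roughly 2ppm" estimate. At 6 ppm the uncorrected offset after two to three hours brackets the 50ms figure mentioned in the post.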
Re: NTP slow to start correction after a drift
Mike K Smith <mks-usenet@dsl.pipex.com> writes:
[color=blue]
>Apologies for a long post, but I was unable to make it shorter.[/color]
[color=blue]
>I have been monitoring timekeeping performance on an environment which
>contains 3 stratum 1 clocks and 4 Cisco routers running as stratum 2.
>The stratum 1s use time which is derived originally from GPS, but fed
>to the stratum 1 clocks via IRIG.[/color]
[color=blue]
>The monitoring is carried out from a single Solaris system which takes
>time from all seven servers.[/color]
Why would you use a Solaris system? AFAIK its kernel timing routines are
primitive. Use a Linux/BSD system.
[color=blue]
>Normally all clocks show times within +/- 4ms, but every 7-8 days I
>see an event where all 7 clocks drift out by about 10-18 ms over a
>period of 2-3 hours before they are corrected.[/color]
Ye gads. With GPS time you should be within usec, not msec.
[color=blue]
>I am interpreting this as being due to drift in the local clock on the
>Solaris box which is doing trhe monitoring, I would expect the stratum
>2 servers to lag the stratum 1s if the time on the stratum 1 servers
>was drifting due to some common-mode problem with their time
>reference.[/color]
[color=blue]
>I am concerned about the length of time it takes before NTP starts
>correcting the local clock on the Solaris server.[/color]
[color=blue]
>I have a graph which you can see at
><[url]http://www.flickr.com/photos/36096832@N00/2477948892/sizes/o/in/[/url]
>set-72157604959850048/>[/color]
[color=blue]
>The above graph shows offset against time for all seven clocks. An
>hour of steady state operation is shown before the beginning of the
>drift event, the system has been in steady state for some days prior
>to the drift event.[/color]
[color=blue]
>The poll interval is initially 1024 seconds.[/color]
So nothing can be corrected in times less than many times 1024 sec (i.e.
hours).
ntp is designed to make sure that nothing happens on time scales shorter
than many times the poll interval, to maintain stability.
[color=blue]
>The drift event starts about an hour into the graph, the offset
>increases by about 15ms in about 2 hours (roughly 2ppm) then a
>correction is applied and the clock drifts back to zero offset at
>about the 3.5 hour mark.[/color]
[color=blue]
>I am concerned that the drift went uncorrected for so long, and am
>trying to understand the cause.[/color]
ntp design.
[color=blue]
>Is the clock-filter algorithm rejecting updated timestamps which are
>not the lowest of the most recent eight? From my reading of the book
>and the RFCs, this is what should happen, but that means that the
>clock can drift significantly before a new timestamp passes through
>the clock filter algorithm.[/color]
Yes. ntp only uses about 1/8 of the data, i.e. your effective time span is
about 3 hours, and ntp can only correct on time scales longer than that.
Design decision.
Re: NTP slow to start correction after a drift
Do you have the frequency data from the same period as the graph? What
happened to cause the frequency to be off all of a sudden?
Brian Utterback
Re: NTP slow to start correction after a drift
Hi Brian,
On 9 May, 19:06, Brian Utterback <brian.utterb...@sun.com> wrote:[color=blue]
> Do you have the frequency data from the same period as the graph? What
> happened to cause the frequency to be off all of a sudden?[/color]
Loopstats weren't enabled so I don't have the frequency data. I'm out
of the office today but will set up loopstats tomorrow. I should be
able to look at the frequency along with the peer data next time I see
this behaviour.
Mike
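Once loopstats is enabled, the frequency history can be scanned for sudden steps. The sketch below is an illustration only. It assumes the seven-field loopstats layout documented for ntpd 4.x (MJD, seconds past midnight, offset in s, frequency in ppm, jitter in s, wander in ppm, time constant); older xntpd builds may write fewer fields. The sample lines and the 0.5 ppm threshold are invented for the example and are not data from this thread.

```python
from typing import List, NamedTuple, Tuple


class LoopSample(NamedTuple):
    mjd: int
    seconds: float
    offset_s: float
    freq_ppm: float
    jitter_s: float
    wander_ppm: float
    time_constant: int


def parse_loopstats_line(line: str) -> LoopSample:
    """Parse one loopstats line, assuming the seven-field ntpd 4.x layout."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    return LoopSample(
        int(fields[0]), float(fields[1]), float(fields[2]), float(fields[3]),
        float(fields[4]), float(fields[5]), int(fields[6]),
    )


def frequency_steps(
    samples: List[LoopSample], threshold_ppm: float = 0.5
) -> List[Tuple[LoopSample, LoopSample]]:
    """Return consecutive pairs whose frequency differs by more than threshold_ppm."""
    return [
        (a, b)
        for a, b in zip(samples, samples[1:])
        if abs(b.freq_ppm - a.freq_ppm) > threshold_ppm
    ]


# Invented sample lines, for illustration only.
EXAMPLE_LINES = [
    "54596 3600.000 0.000100 12.500 0.000050 0.010 10",
    "54596 4624.000 0.002000 12.600 0.000060 0.012 10",
    "54596 5648.000 0.004000 14.500 0.000070 0.500 10",
]
samples = [parse_loopstats_line(line) for line in EXAMPLE_LINES]
steps = frequency_steps(samples)
for before, after in steps:
    print(f"{before.seconds:.0f}s -> {after.seconds:.0f}s: "
          f"{before.freq_ppm} -> {after.freq_ppm} ppm")
```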
Re: NTP slow to start correction after a drift
On 9 May, 16:46, Unruh <unruh-s...@physics.ubc.ca> wrote:
[color=blue]
> Why would you use a solaris system? AFAIK its kernel timeing routines are
> primative. Use a Linux/BSD system.[/color]
This is an existing system which I don't have the means to change even
if I felt that Solaris were somehow intrinsically inferior to Linux or
BSD. I have worked with Solaris for a long time.
[color=blue][color=green]
> >Normally all clocks show times within +/- 4ms, but every 7-8 days I
> >see an event where all 7 clocks drift out by about 10-18 ms over a
> >period of 2-3 hours before they are corrected.[/color]
>
> Yee gads. With GPS time you should be withing usec, not msec.[/color]
The median time for each clock measured over the course of a week has
an offset within microseconds. The 1% and 99% centiles are around -4ms
and +4ms, again measured over a week.
I'll try to look into the causes of dispersion later; slow drift
correction is a bigger and more immediate problem.
[color=blue]
>[color=green]
> >I am interpreting this as being due to drift in the local clock on the
> >Solaris box which is doing trhe monitoring, I would expect the stratum
> >2 servers to lag the stratum 1s if the time on the stratum 1 servers
> >was drifting due to some common-mode problem with their time
> >reference.
> >I am concerned about the length of time it takes before NTP starts
> >correcting the local clock on the Solaris server.
> >I have a graph which you can see at
> ><[url]http://www.flickr.com/photos/36096832@N00/2477948892/sizes/o/in/[/url]
> >set-72157604959850048/>
> >The above graph shows offset against time for all seven clocks. An
> >hour of steady state operation is shown before the beginning of the
> >drift event, the system has been in steady state for some days prior
> >to the drift event.
> >The poll interval is initially 1024 seconds.[/color]
>
> So nothing can be corrected in times less than may times 1024 sec ( ie
> hours).
> ntp is designed to make sure tht nothing happends on time scales shorter
> than many times the poll interval to maintian stability.[/color]
I knew that NTP is biased towards long-term stability, but I hadn't
realised that it was quite that inflexible. I had expected that the
poll interval would decrease more rapidly in the event of drift.
[color=blue][color=green]
> >The drift event starts about an hour into the graph, the offset
> >increases by about 15ms in about 2 hours (roughly 2ppm) then a
> >correction is applied and the clock drifts back to zero offset at
> >about the 3.5 hour mark.
> >I am concerned that the drift went uncorrected for so long, and am
> >trying to understand the cause.[/color]
>
> ntp design.
>[color=green]
> >Is the clock-filter algorithm rejecting updated timestamps which are
> >not the lowest of the most recent eight? From my reading of the book
> >and the RFCs, this is what should happen, but that means that the
> >clock can drift significantly before a new timestamp passes through
> >the clock filter algorithm.[/color]
>
> Yes. ntp only uses about 1/8 of the data. Ie your actual time span is about
> 3 hours. and ntp can only correct on time scales longer than that. Design
> decision.[/color]
Thanks for the comments. As with the use of Solaris, I don't have the
option to throw out NTP and replace it with something else, so I have
to try to make the best use of it.
Looks like I should be reducing maxpoll. I guess the design of NTP is
optimised for clocks with predictable drift rates, and a sudden
variation in drift rate takes longer to correct.
I would appreciate comments from other regulars who are more closely
linked with the development and maintenance of NTP, too.
Thanks,
Mike
Re: NTP slow to start correction after a drift
Mike K Smith wrote:[color=blue]
> Looks like I should be reducing maxpoll. I guess the design of NTP is
> optimised for clocks with predictable drift rates, and a sudden
> variation in drift rate takes longer to correct.[/color]
You DO know that NTPD adjusts the poll interval to fit the current
conditions??? It will increase the poll interval to MAXPOLL only when
the clock is stable and very close to being correct. The default values
of MINPOLL and MAXPOLL are correct for all but the weirdest cases.
Are you operating your machines in a controlled (temperature)
environment? If the temperature bounces around, so will your clock.
NTPD will correct it but if the temperature drops five degrees in five
minutes when the air conditioning kicks in, NTPD may have a little
difficulty keeping up.
Re: NTP slow to start correction after a drift
Mike K Smith wrote:
[color=blue]
> Looks like I should be reducing maxpoll. I guess the design of NTP is[/color]
As I understand it, the loop time constant determines the poll interval,
but the poll interval doesn't constrain the loop time constant, so
reducing maxpoll will not make the system significantly more responsive
to anything except a complete failure of a time source.
[color=blue]
> optimised for clocks with predictable drift rates, and a sudden
> variation in drift rate takes longer to correct.
>[/color]
Re: NTP slow to start correction after a drift
David and others,
The adaptive poll algorithm evolved over many years and many variations.
A summary follows.
1. The poll will not be less than the maximum of the peer poll and
minpoll. The maximum poll will not be greater than maxpoll. This is to
protect the network.
2. The time constant will not be less than minpoll nor greater than
maxpoll of the system peer. This is to allow the user to constrain the
time constant for some purpose. Note that if the maxpoll is different as
the system peer changes, some swish and sway must be expected. That's
why the ACTS driver is disabled if other peers are active.
3. Subject to the above constraints, a jiggle counter increments by the
value of the time constant when the current clock offset is less than
twice the clock jitter and otherwise decrements by twice this value. If
the jiggle counter exceeds +30, the time constant increments by one. If
it falls below -30 it decrements by one and in both cases the jiggle
counter is reset to zero.
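The three rules above can be sketched as a small state machine. This is a reading of the description in this post, not code taken from ntpd. The class and field names are invented, and the default bounds of 6 and 10 are ntpd's usual minpoll and maxpoll (64 s and 1024 s).

```python
from dataclasses import dataclass

JIGGLE_LIMIT = 30  # threshold quoted in the post


@dataclass
class PollState:
    time_constant: int  # log2 of the interval in seconds (10 means 1024 s)
    minpoll: int = 6
    maxpoll: int = 10
    jiggle: int = 0

    def update(self, offset: float, jitter: float) -> int:
        """Apply one clock update and return the resulting time constant."""
        if abs(offset) < 2 * jitter:
            # Offset is within twice the jitter: count towards a longer interval.
            self.jiggle += self.time_constant
            if self.jiggle > JIGGLE_LIMIT:
                self.jiggle = 0
                self.time_constant = min(self.time_constant + 1, self.maxpoll)
        else:
            # Offset is large: count twice as fast towards a shorter interval.
            self.jiggle -= 2 * self.time_constant
            if self.jiggle < -JIGGLE_LIMIT:
                self.jiggle = 0
                self.time_constant = max(self.time_constant - 1, self.minpoll)
        return self.time_constant


# A clock sitting at 1024 s with the counter at zero, then two updates
# whose offset exceeds twice the jitter (values are illustrative).
state = PollState(time_constant=10)
first = state.update(offset=0.005, jitter=0.001)
second = state.update(offset=0.008, jitter=0.001)
print(first, second, 2 ** second)
```

Starting from a counter of zero at a time constant of 10, two out-of-bounds updates are enough to step the time constant down by one, halving the interval. If the counter starts out positive, this model needs more updates before it crosses -30.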
The design is intended to
1. Always poll at twice the Nyquist rate with any time constant. Serious
audiophiles and DSP engineers will recognize the need for this.
2. The poll value here is the exponent of two to yield the actual poll
interval. This is chosen to match the Allan deviation characteristic
which results in straight lines in log-log coordinates.
3. The time constant increases slowly to higher intervals and decreases
quickly to lower intervals.
4. The time constant adapts more rapidly at higher polls and more slowly
at the lower intervals.
I hope this explains the behavior you report.
Dave
Re: NTP slow to start correction after a drift
On 12 May, 15:16, "Richard B. Gilbert" <rgilber...@comcast.net> wrote:[color=blue]
> Mike K Smith wrote:[/color]
[color=blue][color=green]
> > Looks like I should be reducing maxpoll. I guess the design of NTP is
> > optimised for clocks with predictable drift rates, and a sudden
> > variation in drift rate takes longer to correct.[/color]
>
> You DO know that NTPD adjusts the poll interval to fit the current
> conditions??? It will increase the poll interval to MAXPOLL only when
> the clock is stable and very close to being correct. The default values
> of MINPOLL and MAXPOLL are correct for all but the weirdest cases.[/color]
I know that ntpd adjusts the poll interval to fit the current
conditions, but I am describing a case where the current conditions
changed. The clock had been stable for around a week, and the polling
interval had increased to 1024 seconds. Then something changed. It
looks like the clock started drifting by about 2ppm, and the poll
interval didn't change for three hours, allowing a 15ms offset to build
up before ntpd began to correct the drift.
I initiated this thread to help me understand why ntpd took so long to
respond. I had expected to see the poll interval decrease and the
offset swing back towards zero after the first couple of polls showed
the increased offset.
[color=blue]
> Are you operating your machines in a controlled (temperature)
> environment? If the temperature bounces around, so will your clock.
> NTPD will correct it but if the temperature drops five degrees in five
> minutes when the air conditioning kicks in, NTPD may have a little
> difficulty keeping up.[/color]
The systems are in air-conditioned equipment rooms, so I wasn't
expecting frequency changes due to temperature.
Re: NTP slow to start correction after a drift
Mike K Smith wrote:[color=blue]
> The systems are in air-conditioned equipment rooms, so I wasn't
> expecting frequency changes due to temperature.[/color]
Do you monitor the temperature? Many data centers have a clock-driven
chart recorder that records the temperature and humidity. If the
temperature changes, the clock WILL be affected.
I can't tell you why NTPD took "so long" to jump on a 15 millisecond
error; that's a problem for the mathematicians/control systems theory guys.
If you need synchronization and/or accuracy closer than that, you may need:
a. Better environmental control/monitoring, or
b. A better clock (OCXO, TCXO). This could get expensive; computer
clocks use basically the same mechanism as a cheap "quartz" wristwatch
but lack the temperature control that usually keeps the wristwatch
somewhere near 98.6 degrees Fahrenheit. Would you be upset if your
wristwatch gained or lost thirty seconds per month?
c. A different tool than NTPD for the job. Some people advocate a tool
called "chrony", something with which I have no experience!
Re: NTP slow to start correction after a drift
Mike K Smith <mks-usenet@dsl.pipex.com> writes:
[color=blue]
>On 12 May, 15:16, "Richard B. Gilbert" <rgilber...@comcast.net> wrote:[color=green]
>> Mike K Smith wrote:[/color][/color]
[color=blue][color=green][color=darkred]
>> > Looks like I should be reducing maxpoll. I guess the design of NTP is
>> > optimised for clocks with predictable drift rates, and a sudden
>> > variation in drift rate takes longer to correct.[/color]
>>
>> You DO know that NTPD adjusts the poll interval to fit the current
>> conditions??? It will increase the poll interval to MAXPOLL only when
>> the clock is stable and very close to being correct. The default values
>> of MINPOLL and MAXPOLL are correct for all but the weirdest cases.[/color][/color]
[color=blue]
>I know that ntpd adjusts the poll interval to fit the current
>conditions, but I am describing a case where the current conditions
>changed. The clock had been stable for around a week, and the polling
>interval had increased to 1024 seconds, then something changed. It
>looks like the clock started drifting by about 2ppm, the poll interval
>didn't change for three hours causing a 15ms offset before beginning
>to correct the drift.[/color]
With a poll interval of 1024 s the effective sampling interval is about
8000 s (after the clock filter, which throws away about 7 out of 8 data
points). That is about 2 hours, so it is impossible for the system to even
recognize that something has happened in less than about 2 hours. Only then
can it start correcting and start trying to reduce the poll interval. Why
does it throw away all that data? It is believed that the gain from using
the minimum-delay sample out of 8 outweighs the loss in responsiveness and
in accuracy. (The procedure is meant to get rid of data which might have a
large asymmetric drift.) This means that if the clock is 10 ms out and the
delay is 0.1 ms, the sample may still be thrown out because that 0.1 ms is
greater than the 0.095 ms achieved 7 poll intervals ago, despite the fact
that the data shows incontrovertibly that the clock is having far more
problems than could ever be hidden in the delay.
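The minimum-delay selection described above can be sketched with the Stratum 1 A rows from the first post. This is a simplified model, not ntpd's clock filter. It keeps the last eight samples, picks the one with the lowest delay, and passes it on only if it is newer than the last sample passed on. Ties are broken in favour of the most recent sample, which is an assumption. Dispersion and sample age are ignored. All names are invented for the example.

```python
from collections import deque
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    time: str
    offset: float
    delay: float


class MinDelayFilter:
    """Simplified shift register that selects the lowest-delay sample."""

    def __init__(self, size: int = 8) -> None:
        self._register = deque(maxlen=size)  # holds (sequence, sample) pairs
        self._seq = 0
        self._last_used_seq = -1

    def add(self, sample: Sample) -> Optional[Sample]:
        """Add a sample; return the sample passed on, or None if nothing is."""
        self._register.append((self._seq, sample))
        self._seq += 1
        # Scan newest first so that ties on delay go to the most recent sample.
        seq, best = min(reversed(self._register), key=lambda item: item[1].delay)
        if seq <= self._last_used_seq:
            return None  # the best sample was already used
        self._last_used_seq = seq
        return best


# Stratum 1 A rows from 00:00:00 to 02:59:12 (time, offset, delay).
TABLE_A = [
    Sample("00:00:00", -0.000052, 0.000600),
    Sample("00:17:04", 0.000394, 0.001850),
    Sample("00:34:08", -0.000174, 0.000630),
    Sample("00:51:12", 0.000908, 0.000580),
    Sample("01:08:16", 0.002661, 0.000630),
    Sample("01:25:20", 0.004790, 0.000750),
    Sample("01:42:24", 0.007350, 0.000600),
    Sample("01:59:28", 0.010072, 0.000610),
    Sample("02:16:32", 0.012666, 0.000600),
    Sample("02:33:36", 0.015004, 0.000610),
    Sample("02:50:40", 0.017115, 0.000600),
    Sample("02:59:12", 0.018362, 0.001970),
]

clock_filter = MinDelayFilter()
passed: List[Sample] = []
for row in TABLE_A:
    result = clock_filter.add(row)
    if result is not None:
        passed.append(result)

for s in passed:
    print(s.time, s.offset)
```

In this model the register starts empty, so the first row is passed on trivially. After the 00:51:12 sample nothing gets through until the 02:59:12 poll, when the low-delay sample ages out of the register.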
[color=blue]
>I initiated this thread to help me understand why ntpd took so long to
>respond. I had expected to see the poll interval decrease and the
>offset swing back towards zero after the first couple of polls showed
>the increased offset.[/color]
[color=blue][color=green]
>> Are you operating your machines in a controlled (temperature)
>> environment? If the temperature bounces around, so will your clock.
>> NTPD will correct it but if the temperature drops five degrees in five
>> minutes when the air conditioning kicks in, NTPD may have a little
>> difficulty keeping up.[/color][/color]
[color=blue]
>The systems are in air-conditioned equipment rooms, I wasn't expecting
>to frequency changes due to temperature.[/color]
Re: NTP slow to start correction after a drift
Bill,
You seem to have a tack up your tail about the clock filter algorithm.
First, you didn't respond to my message about sampling at twice the
Nyquist rate, even if a burst of seven samples is lost.
Second, look at the clock filter algorithm code and comments. Samples
older than the Allan intercept (default 2000 s) are effectively
discarded. Thus, only the latest sample is used and the next older used
only to compute the peer jitter.
Third, if you recall my recent message about the poll algorithm, you
know the jiggle counter is reduced if the (combined) clock offset
exceeds twice the clock jitter. With the constants revealed in my prior
message, and if the clock frequency is yanked 1 PPM by a Grue, all it
takes is two samples and the poll interval/time constant drops by half.
Dave