NTP slow to start correction after a drift - NTP


  1. NTP slow to start correction after a drift

    Apologies for a long post, but I was unable to make it shorter.

    I have been monitoring timekeeping performance on an environment which
    contains 3 stratum 1 clocks and 4 Cisco routers running as stratum 2.
    The stratum 1s use time which is derived originally from GPS, but fed
    to the stratum 1 clocks via IRIG.

    The monitoring is carried out from a single Solaris system which takes
    time from all seven servers.

    Normally all clocks show times within +/- 4ms, but every 7-8 days I
    see an event where all 7 clocks drift out by about 10-18 ms over a
    period of 2-3 hours before they are corrected.

    I am interpreting this as being due to drift in the local clock on the
    Solaris box which is doing the monitoring. I would expect the stratum
    2 servers to lag the stratum 1s if the time on the stratum 1 servers
    was drifting due to some common-mode problem with their time
    reference.

    I am concerned about the length of time it takes before NTP starts
    correcting the local clock on the Solaris server.

    I have a graph which you can see at
    <http://www.flickr.com/photos/3609683...92/sizes/o/in/
    set-72157604959850048/>

    The above graph shows offset against time for all seven clocks. An
    hour of steady state operation is shown before the beginning of the
    drift event; the system had been in steady state for some days prior
    to the drift event.

    The poll interval is initially 1024 seconds.

    The drift event starts about an hour into the graph, the offset
    increases by about 15ms in about 2 hours (roughly 2ppm) then a
    correction is applied and the clock drifts back to zero offset at
    about the 3.5 hour mark.
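
The "roughly 2ppm" figure follows directly from the numbers quoted above. A minimal sketch of the arithmetic, where the function name is only an illustrative choice:

```python
def drift_rate_ppm(offset_change_s: float, elapsed_s: float) -> float:
    """Average frequency error, in parts per million, implied by an offset change."""
    return offset_change_s / elapsed_s * 1e6


# 15 ms of accumulated offset over about 2 hours, as read off the graph.
rate = drift_rate_ppm(0.015, 2 * 3600)
print(f"{rate:.2f} ppm")  # prints "2.08 ppm"
```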

    I am concerned that the drift went uncorrected for so long, and am
    trying to understand the cause.

    Is the clock-filter algorithm rejecting updated timestamps which are
    not the lowest of the most recent eight? From my reading of the book
    and the RFCs, this is what should happen, but that means that the
    clock can drift significantly before a new timestamp passes through
    the clock filter algorithm.
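
The reading of the clock filter described above can be sketched as follows. This is a deliberately simplified model, not the ntpd implementation: it keeps the eight most recent samples and passes a new sample only when that sample has the strictly lowest delay in the register (ties go to the older sample, which is an assumption). The class and field names are hypothetical; the data are the Stratum 1 A times, offsets and delays listed in this post, and the first sample passes only because the model's register starts empty.

```python
from collections import deque
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    time: str
    offset: float
    delay: float


class MinDelayFilter:
    """Simplified model: pass a sample only if it is the lowest-delay entry."""

    def __init__(self, size: int = 8) -> None:
        self.register: deque = deque(maxlen=size)  # oldest sample drops off the left

    def update(self, sample: Sample) -> Optional[Sample]:
        self.register.append(sample)
        # min() returns the first (oldest) entry on ties.
        best = min(self.register, key=lambda s: s.delay)
        return sample if best is sample else None


STRATUM_1_A = [
    Sample("00:00:00", -0.000052, 0.000600), Sample("00:17:04", 0.000394, 0.001850),
    Sample("00:34:08", -0.000174, 0.000630), Sample("00:51:12", 0.000908, 0.000580),
    Sample("01:08:16", 0.002661, 0.000630), Sample("01:25:20", 0.004790, 0.000750),
    Sample("01:42:24", 0.007350, 0.000600), Sample("01:59:28", 0.010072, 0.000610),
    Sample("02:16:32", 0.012666, 0.000600), Sample("02:33:36", 0.015004, 0.000610),
    Sample("02:50:40", 0.017115, 0.000600), Sample("02:59:12", 0.018362, 0.001970),
    Sample("03:06:00", 0.017913, 0.000600), Sample("03:10:16", 0.017275, 0.000610),
    Sample("03:14:05", 0.015433, 0.000580), Sample("03:16:13", 0.013812, 0.000630),
]

clock_filter = MinDelayFilter()
passed = [s.time for s in STRATUM_1_A if clock_filter.update(s) is not None]
print(passed)  # ['00:00:00', '00:51:12', '03:14:05']
```

Under this model nothing passes the filter between 00:51:12 and 03:14:05, which is the kind of gap the question is about. Whether ntpd really behaves this way is exactly what is being asked.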

    To illustrate, here are the timestamp values for the three stratum 1
    clocks over the period of the drift and the beginning of the
    correction. The time base is the same as that of the graph.

    Stratum 1 A
    Time      Offset     Delay     Dispersion  Notes
    00:00:00  -0.000052  0.000600  0.000200    * Lowest delay of the most recent 8 values
    00:17:04   0.000394  0.001850  0.000370
    00:34:08  -0.000174  0.000630  0.000400
    00:51:12   0.000908  0.000580  0.000890    * New lowest delay - drift begins about here
    01:08:16   0.002661  0.000630  0.002180
    01:25:20   0.004790  0.000750  0.003190
    01:42:24   0.007350  0.000600  0.004120
    01:59:28   0.010072  0.000610  0.004750
    02:16:32   0.012666  0.000600  0.004910
    02:33:36   0.015004  0.000610  0.004730
    02:50:40   0.017115  0.000600  0.004390
    02:59:12   0.018362  0.001970  0.003390
      * The 0.000580 delay has now expired. There are three timestamps with
        0.000600 delays in the shift register; which is chosen?
        Whichever is chosen, the offset has drifted significantly since the
        last timestamp was passed from the clock filter.
    03:06:00   0.017913  0.000600  0.001630    * Correction has begun
    03:10:16   0.017275  0.000610  0.001080
    03:14:05   0.015433  0.000580  0.002120    * New lowest delay
    03:16:13   0.013812  0.000630  0.002610


    Stratum 1 B
    Time      Offset     Delay     Dispersion  Notes
    00:12:47  -0.000637  0.010160  0.000260    * Lowest delay in shift register is 0.009900
    00:29:51  -0.000810  0.010330  0.000320
    00:46:55   0.000029  0.010180  0.000690
    01:03:59   0.001683  0.010240  0.002000    Drift begins about here
    01:21:03   0.003762  0.010220  0.003050
    01:38:07   0.006200  0.010220  0.003940
    01:55:11   0.008894  0.010130  0.004610    * New lowest delay
    02:12:15   0.011507  0.010030  0.004880    * New lowest delay
    02:29:19   0.013935  0.010190  0.004810
    02:46:23   0.016025  0.010150  0.004430
    02:54:55   0.016739  0.010210  0.002870
    03:03:27   0.017224  0.010160  0.001850
    03:07:43   0.016871  0.010380  0.000870    * Correction has begun
    03:11:59   0.016221  0.010100  0.000850    * New lowest delay
    03:14:07   0.014934  0.010240  0.001620
    03:16:15   0.013274  0.010150  0.002430


    Stratum 1 C (Selected as Sync Server during the whole of this time)
    Time      Offset     Delay     Dispersion  Notes
    00:01:52  -0.000076  0.009250  0.000200    * Lowest delay in shift register is 0.009090
    00:18:56  -0.000287  0.009230  0.000310
    00:36:00  -0.000091  0.009160  0.000150
    00:53:04   0.001073  0.009310  0.001190
      * Delay of 0.009090 expires, new lowest delay is 0.009160
        Drift begins about here
    01:10:08   0.002899  0.009410  0.002400
    01:27:12   0.005351  0.009630  0.003630
    01:44:16   0.007630  0.009220  0.004070
    02:01:20   0.010348  0.009250  0.004700
    02:18:24   0.012981  0.009250  0.004910
    02:35:28   0.015285  0.009200  0.004700
    02:52:31   0.017373  0.009250  0.004360
      * Delay of 0.009160 expires, new lowest delay is 0.009200
    03:01:03   0.017929  0.009230  0.002690
    03:06:32   0.018002  0.009190  0.001340    * New lowest delay
    03:10:48   0.017277  0.009290  0.000990    * Correction has begun
    03:13:14   0.016549  0.009190  0.001100
    03:15:22   0.014858  0.009280  0.002150

    Why is the polling interval maintained at 1024s for so long in the
    presence of the drift?
    Apart from reducing the maximum polling interval, what else could I do
    to hasten the response to this kind of clock drift?

    The offsets from the set of clocks normally remain within +/- 4ms,
    which is sufficient for our needs, but a drift out beyond 15 ms is a
    cause for concern. We are hoping to be able to maintain time to within
    +/- 5ms of UTC on our NTP clients.

    The drift rate seen here is about 2ppm. If the drift rate were about
    6ppm and we saw the same slow response to the drift, the clock could
    drift out by 50ms before the correction begins. This would definitely
    be regarded as poor timekeeping, and would cause alarms to be raised.
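
The 50ms estimate can be checked with a short calculation. This is only a sketch: it assumes the clock runs uncorrected at a constant rate for somewhere between 2 and 2.5 hours, which is roughly what the graph suggests, and the function name is hypothetical.

```python
def accumulated_offset_ms(rate_ppm: float, uncorrected_s: float) -> float:
    """Offset (ms) built up by a constant frequency error left uncorrected."""
    return rate_ppm * 1e-6 * uncorrected_s * 1e3


for rate_ppm in (2.0, 6.0):
    for hours in (2.0, 2.5):
        offset = accumulated_offset_ms(rate_ppm, hours * 3600)
        print(f"{rate_ppm:.0f} ppm for {hours} h -> {offset:.1f} ms")
```

At 6ppm this gives about 43 ms after 2 hours and 54 ms after 2.5 hours, consistent with the "about 50ms" figure.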

    I would be grateful for any comments or advice.

    Regards,

    Mike




  2. Re: NTP slow to start correction after a drift

    Mike K Smith writes:

    >Apologies for a long post, but I was unable to make it shorter.


    >I have been monitoring timekeeping performance on an environment which
    >contains 3 stratum 1 clocks and 4 Cisco routers running as stratum 2.
    >The stratum 1s use time which is derived originally from GPS, but fed
    >to the stratum 1 clocks via IRIG.


    >The monitoring is carried out from a single Solaris system which takes
    >time from all seven servers.


    Why would you use a Solaris system? AFAIK its kernel timing routines are
    primitive. Use a Linux/BSD system.


    >Normally all clocks show times within +/- 4ms, but every 7-8 days I
    >see an event where all 7 clocks drift out by about 10-18 ms over a
    >period of 2-3 hours before they are corrected.


    Yee gads. With GPS time you should be within usec, not msec.


    >I am interpreting this as being due to drift in the local clock on the
    >Solaris box which is doing the monitoring, I would expect the stratum
    >2 servers to lag the stratum 1s if the time on the stratum 1 servers
    >was drifting due to some common-mode problem with their time
    >reference.


    >I am concerned about the length of time it takes before NTP starts
    >correcting the local clock on the Solaris server.


    >I have a graph which you can see at
    ><http://www.flickr.com/photos/3609683...92/sizes/o/in/
    >set-72157604959850048/>


    >The above graph shows offset against time for all seven clocks. An
    >hour of steady state operation is shown before the beginning of the
    >drift event, the system has been in steady state for some days prior
    >to the drift event.


    >The poll interval is initially 1024 seconds.


    So nothing can be corrected in times less than many times 1024 sec (i.e.
    hours).
    ntp is designed to make sure that nothing happens on time scales shorter
    than many times the poll interval, to maintain stability.



    >The drift event starts about an hour into the graph, the offset
    >increases by about 15ms in about 2 hours (roughly 2ppm) then a
    >correction is applied and the clock drifts back to zero offset at
    >about the 3.5 hour mark.


    >I am concerned that the drift went uncorrected for so long, and am
    >trying to understand the cause.


    ntp design.


    >Is the clock-filter algorithm rejecting updated timestamps which are
    >not the lowest of the most recent eight? From my reading of the book
    >and the RFCs, this is what should happen, but that means that the
    >clock can drift significantly before a new timestamp passes through
    >the clock filter algorithm.


    Yes. ntp only uses about 1/8 of the data. Ie your actual time span is about
    3 hours. and ntp can only correct on time scales longer than that. Design
    decision.



  3. Re: NTP slow to start correction after a drift

    Do you have the frequency data from the same period as the graph? What
    happened to cause the frequency to be off all of a sudden?

    Brian Utterback

  4. Re: NTP slow to start correction after a drift

    Hi Brian,

    On 9 May, 19:06, Brian Utterback wrote:
    > Do you have the frequency data from the same period as the graph? What
    > happened to cause the frequency to be off all of a sudden?


    Loopstats weren't enabled so I don't have the frequency data. I'm out
    of the office today but will set up loopstats tomorrow. I should be
    able to look at the frequency along with the peer data next time I see
    this behaviour.
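
Once loopstats are being collected, the frequency values can be pulled out with a few lines of script. The sketch below assumes the usual loopstats layout, in which the first four whitespace-separated fields are the Modified Julian Day, seconds past midnight, clock offset in seconds and frequency in PPM; later fields differ between ntpd versions, so check this against your own files. The example lines are made up for illustration and are not data from this system.

```python
from typing import NamedTuple


class LoopSample(NamedTuple):
    mjd: int          # Modified Julian Day
    seconds: float    # seconds past midnight (UTC)
    offset_s: float   # clock offset, seconds
    freq_ppm: float   # frequency correction, PPM


def parse_loopstats_line(line: str) -> LoopSample:
    """Parse the first four fields of a loopstats line; ignore any others."""
    fields = line.split()
    return LoopSample(int(fields[0]), float(fields[1]),
                      float(fields[2]), float(fields[3]))


# Hypothetical lines, for illustration only.
EXAMPLE_LINES = [
    "54596 3600.000 0.000150 12.345",
    "54596 4624.000 0.002600 12.345",
    "54596 5648.000 0.004800 12.401",
]

samples = [parse_loopstats_line(line) for line in EXAMPLE_LINES]
for prev, cur in zip(samples, samples[1:]):
    step = cur.freq_ppm - prev.freq_ppm
    print(f"t={cur.seconds:.0f}s offset={cur.offset_s * 1e3:.2f}ms "
          f"freq change={step:+.3f}ppm")
```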

    Mike



  5. Re: NTP slow to start correction after a drift

    On 9 May, 16:46, Unruh wrote:

    > Why would you use a Solaris system? AFAIK its kernel timing routines are
    > primitive. Use a Linux/BSD system.

    This is an existing system which I don't have the means to change even
    if I felt that Solaris were somehow intrinsically inferior to Linux or
    BSD. I have worked with Solaris for a long time.

    > >Normally all clocks show times within +/- 4ms, but every 7-8 days I
    > >see an event where all 7 clocks drift out by about 10-18 ms over a
    > >period of 2-3 hours before they are corrected.

    >
    > Yee gads. With GPS time you should be within usec, not msec.

    The median time for each clock measured over the course of a week has
    an offset within microseconds. The 1% and 99% centiles are around -4ms
    and +4ms, again measured over a week.

    I'll try to look into the causes of dispersion later, slow drift
    correction is a bigger and more immediate problem.

    >
    > >I am interpreting this as being due to drift in the local clock on the
    > >Solaris box which is doing the monitoring, I would expect the stratum
    > >2 servers to lag the stratum 1s if the time on the stratum 1 servers
    > >was drifting due to some common-mode problem with their time
    > >reference.
    > >I am concerned about the length of time it takes before NTP starts
    > >correcting the local clock on the Solaris server.
    > >I have a graph which you can see at
    > ><http://www.flickr.com/photos/3609683...92/sizes/o/in/
    > >set-72157604959850048/>
    > >The above graph shows offset against time for all seven clocks. An
    > >hour of steady state operation is shown before the beginning of the
    > >drift event, the system has been in steady state for some days prior
    > >to the drift event.
    > >The poll interval is initially 1024 seconds.

    >
    > So nothing can be corrected in times less than many times 1024 sec (i.e.
    > hours).
    > ntp is designed to make sure that nothing happens on time scales shorter
    > than many times the poll interval, to maintain stability.


    I knew that NTP is biased towards long-term stability, but I hadn't
    realised that it was quite that inflexible. I had expected that the
    poll interval would decrease more rapidly in the event of drift.

    > >The drift event starts about an hour into the graph, the offset
    > >increases by about 15ms in about 2 hours (roughly 2ppm) then a
    > >correction is applied and the clock drifts back to zero offset at
    > >about the 3.5 hour mark.
    > >I am concerned that the drift went uncorrected for so long, and am
    > >trying to understand the cause.

    >
    > ntp design.
    >
    > >Is the clock-filter algorithm rejecting updated timestamps which are
    > >not the lowest of the most recent eight? From my reading of the book
    > >and the RFCs, this is what should happen, but that means that the
    > >clock can drift significantly before a new timestamp passes through
    > >the clock filter algorithm.

    >
    > Yes. ntp only uses about 1/8 of the data. Ie your actual time span is about
    > 3 hours. and ntp can only correct on time scales longer than that. Design
    > decision.


    Thanks for the comments. As with the use of Solaris, I don't have the
    option to throw out NTP and replace it with something else, so I have
    to try to make the best use of it.

    Looks like I should be reducing maxpoll. I guess the design of NTP is
    optimised for clocks with predictable drift rates, and a sudden
    variation in drift rate takes longer to correct.

    I would appreciate comments from other regulars who are more closely
    linked with the development and maintenance of NTP, too.

    Thanks,

    Mike





  6. Re: NTP slow to start correction after a drift

    Mike K Smith wrote:
    > On 9 May, 16:46, Unruh wrote:
    >
    >> Why would you use a Solaris system? AFAIK its kernel timing routines are
    >> primitive. Use a Linux/BSD system.

    > This is an existing system which I don't have the means to change even
    > if I felt that Solaris were somehow intrinsically inferior to Linux or
    > BSD. I have worked with Solaris for a long time.
    >
    >>> Normally all clocks show times within +/- 4ms, but every 7-8 days I
    >>> see an event where all 7 clocks drift out by about 10-18 ms over a
    >>> period of 2-3 hours before they are corrected.

    >> Yee gads. With GPS time you should be within usec, not msec.

    > The median time for each clock measured over the course of a week has
    > an offset within microseconds. The 1% and 99% centiles are around -4ms
    > and +4ms, again measured over a week.
    >
    > I'll try to look into the causes of dispersion later, slow drift
    > correction is a bigger and more immediate problem.
    >
    >>> I am interpreting this as being due to drift in the local clock on the
    >>> Solaris box which is doing the monitoring, I would expect the stratum
    >>> 2 servers to lag the stratum 1s if the time on the stratum 1 servers
    >>> was drifting due to some common-mode problem with their time
    >>> reference.
    >>> I am concerned about the length of time it takes before NTP starts
    >>> correcting the local clock on the Solaris server.
    >>> I have a graph which you can see at
    >>> <http://www.flickr.com/photos/3609683...92/sizes/o/in/
    >>> set-72157604959850048/>
    >>> The above graph shows offset against time for all seven clocks. An
    >>> hour of steady state operation is shown before the beginning of the
    >>> drift event, the system has been in steady state for some days prior
    >>> to the drift event.
    >>> The poll interval is initially 1024 seconds.

    >> So nothing can be corrected in times less than many times 1024 sec (i.e.
    >> hours).
    >> ntp is designed to make sure that nothing happens on time scales shorter
    >> than many times the poll interval, to maintain stability.

    >
    > I knew that NTP is biased towards long-term stability, but I hadn't
    > realised that it was quite that inflexible. I had expected that the
    > poll interval would decrease more rapidly in the event of drift.
    >
    >>> The drift event starts about an hour into the graph, the offset
    >>> increases by about 15ms in about 2 hours (roughly 2ppm) then a
    >>> correction is applied and the clock drifts back to zero offset at
    >>> about the 3.5 hour mark.
    >>> I am concerned that the drift went uncorrected for so long, and am
    >>> trying to understand the cause.

    >> ntp design.
    >>
    >>> Is the clock-filter algorithm rejecting updated timestamps which are
    >>> not the lowest of the most recent eight? From my reading of the book
    >>> and the RFCs, this is what should happen, but that means that the
    >>> clock can drift significantly before a new timestamp passes through
    >>> the clock filter algorithm.

    >> Yes. ntp only uses about 1/8 of the data. Ie your actual time span is about
    >> 3 hours. and ntp can only correct on time scales longer than that. Design
    >> decision.

    >
    > Thanks for the comments. As with the use of Solaris, I don't have the
    > option to throw out NTP and replace it with something else, so I have
    > to try to make the best use of it.
    >
    > Looks like I should be reducing maxpoll. I guess the design of NTP is
    > optimised for clocks with predictable drift rates, and a sudden
    > variation in drift rate takes longer to correct.
    >

    You DO know that NTPD adjusts the poll interval to fit the current
    conditions??? It will increase the poll interval to MAXPOLL only when
    the clock is stable and very close to being correct. The default values
    of MINPOLL and MAXPOLL are correct for all but the weirdest cases.

    Are you operating your machines in a controlled (temperature)
    environment? If the temperature bounces around, so will your clock.
    NTPD will correct it but if the temperature drops five degrees in five
    minutes when the air conditioning kicks in, NTPD may have a little
    difficulty keeping up.

  7. Re: NTP slow to start correction after a drift

    Mike K Smith wrote:

    > Looks like I should be reducing maxpoll. I guess the design of NTP is


    As I understand it, the loop time constant determines the poll interval,
    but the poll interval doesn't constrain the loop time constant, so
    reducing maxpoll will not make the system significantly more responsive
    to anything except a complete failure of a time source.

    > optimised for clocks with predictable drift rates, and a sudden
    > variation in drift rate takes longer to correct.
    >


  8. Re: NTP slow to start correction after a drift

    David and others,

    The adaptive poll algorithm evolved over many years and many variations.
    A summary follows.

    1. The poll will not be less than the maximum of the peer poll and
    minpoll. The maximum poll will not be greater than maxpoll. This is to
    protect the network.

    2. The time constant will not be less than minpoll nor greater than
    maxpoll of the system peer. This is to allow the user to constrain the
    time constant for some purpose. Note that if the maxpoll is different as
    the system peer changes, some swish and sway must be expected. That's
    why the ACTS driver is disabled if other peers are active.

    3. Subject to the above constraints, a jiggle counter increments by the
    value of the time constant when the current clock offset is less than
    twice the clock jitter and otherwise decrements by twice this value. If
    the jiggle counter exceeds +30, the time constant increments by one. If
    it falls below -30 it decrements by one and in both cases the jiggle
    counter is reset to zero.
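
The jiggle-counter rule in point 3 can be sketched as follows. This is a reading of the description above, not the ntpd source: the class name is hypothetical, the minpoll/maxpoll values of 6 and 10 are the usual defaults (64 s and 1024 s), and the offset and jitter values fed in are arbitrary.

```python
class PollAdapter:
    """Jiggle-counter rule as described in the post (a sketch only)."""

    def __init__(self, time_constant: int, minpoll: int = 6,
                 maxpoll: int = 10, limit: int = 30) -> None:
        self.tc = time_constant
        self.minpoll = minpoll
        self.maxpoll = maxpoll
        self.limit = limit
        self.counter = 0

    def update(self, offset: float, jitter: float) -> int:
        """Apply one clock update and return the resulting time constant."""
        if abs(offset) < 2 * jitter:
            self.counter += self.tc        # quiet: creep upward
        else:
            self.counter -= 2 * self.tc    # disturbed: fall twice as fast
        if self.counter > self.limit:
            self.tc = min(self.tc + 1, self.maxpoll)
            self.counter = 0
        elif self.counter < -self.limit:
            self.tc = max(self.tc - 1, self.minpoll)
            self.counter = 0
        return self.tc


adapter = PollAdapter(time_constant=6)

steps_up = 0
while adapter.tc < 10:                      # offset well inside 2 x jitter
    adapter.update(offset=0.0001, jitter=0.001)
    steps_up += 1

steps_down = 0
while adapter.tc > 9:                       # offset well outside 2 x jitter
    adapter.update(offset=0.005, jitter=0.001)
    steps_down += 1

print(steps_up, steps_down)  # prints "19 2"
```

In this model it takes 19 quiet updates to climb from 6 to 10, but only 2 disturbed updates to drop from 10 to 9, which matches the "increases slowly, decreases quickly" intent.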

    The design is intended to

    1. Always poll at twice the Nyquist rate with any time constant. Serious
    audiophiles and DSP engineers will recognize the need for this.

    2. The poll value here is the exponent of two to yield the actual poll
    interval. This is chosen to match the Allan deviation characteristic
    which results in straight lines in log-log coordinates.

    3. The time constant increases slowly to higher intervals and decreases
    quickly to lower intervals.

    4. The time constant adapts more rapidly at higher polls and more slowly
    at the lower intervals.
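
As a concrete illustration of point 2, the poll value is an exponent of two, so each step of one doubles or halves the interval. A minimal sketch, assuming the usual default range of 6 to 10:

```python
def poll_interval_s(poll_exponent: int) -> int:
    """Poll interval in seconds for a given poll exponent."""
    return 2 ** poll_exponent


for poll in range(6, 11):
    print(f"poll={poll:2d} -> {poll_interval_s(poll):4d} s")
```

This prints intervals from 64 s up to the 1024 s seen in the original report.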

    I hope this explains the behavior you report.

    Dave

    David Woolley wrote:

    > Mike K Smith wrote:
    >
    >> Looks like I should be reducing maxpoll. I guess the design of NTP is

    >
    >
    > As I understand it, the loop time constant determines the poll interval,
    > but the poll interval doesn't constrain the loop time constant, so
    > reducing maxpoll will not make the system significantly more responsive
    > to anything except a complete failure of a time source.
    >
    >> optimised for clocks with predictable drift rates, and a sudden
    >> variation in drift rate takes longer to correct.
    >>


  9. Re: NTP slow to start correction after a drift

    On 12 May, 15:16, "Richard B. Gilbert" wrote:
    > Mike K Smith wrote:


    > > Looks like I should be reducing maxpoll. I guess the design of NTP is
    > > optimised for clocks with predictable drift rates, and a sudden
    > > variation in drift rate takes longer to correct.

    >
    > You DO know that NTPD adjusts the poll interval to fit the current
    > conditions??? It will increase the poll interval to MAXPOLL only when
    > the clock is stable and very close to being correct. The default values
    > of MINPOLL and MAXPOLL are correct for all but the weirdest cases.


    I know that ntpd adjusts the poll interval to fit the current
    conditions, but I am describing a case where the current conditions
    changed. The clock had been stable for around a week, and the polling
    interval had increased to 1024 seconds, then something changed. It
    looks like the clock started drifting by about 2ppm, the poll interval
    didn't change for three hours causing a 15ms offset before beginning
    to correct the drift.
    I initiated this thread to help me understand why ntpd took so long to
    respond. I had expected to see the poll interval decrease and the
    offset swing back towards zero after the first couple of polls showed
    the increased offset.

    > Are you operating your machines in a controlled (temperature)
    > environment? If the temperature bounces around, so will your clock.
    > NTPD will correct it but if the temperature drops five degrees in five
    > minutes when the air conditioning kicks in, NTPD may have a little
    > difficulty keeping up.


    The systems are in air-conditioned equipment rooms, I wasn't expecting
    to see frequency changes due to temperature.

  10. Re: NTP slow to start correction after a drift

    Mike K Smith wrote:
    > On 12 May, 15:16, "Richard B. Gilbert" wrote:
    >> Mike K Smith wrote:

    >
    >>> Looks like I should be reducing maxpoll. I guess the design of NTP is
    >>> optimised for clocks with predictable drift rates, and a sudden
    >>> variation in drift rate takes longer to correct.

    >> You DO know that NTPD adjusts the poll interval to fit the current
    >> conditions??? It will increase the poll interval to MAXPOLL only when
    >> the clock is stable and very close to being correct. The default values
    >> of MINPOLL and MAXPOLL are correct for all but the weirdest cases.

    >
    > I know that ntpd adjusts the poll interval to fit the current
    > conditions, but I am describing a case where the current conditions
    > changed. The clock had been stable for around a week, and the polling
    > interval had increased to 1024 seconds, then something changed. It
    > looks like the clock started drifting by about 2ppm, the poll interval
    > didn't change for three hours causing a 15ms offset before beginning
    > to correct the drift.
    > I initiated this thread to help me understand why ntpd took so long to
    > respond. I had expected to see the poll interval decrease and the
    > offset swing back towards zero after the first couple of polls showed
    > the increased offset.
    >
    >> Are you operating your machines in a controlled (temperature)
    >> environment? If the temperature bounces around, so will your clock.
    >> NTPD will correct it but if the temperature drops five degrees in five
    >> minutes when the air conditioning kicks in, NTPD may have a little
    >> difficulty keeping up.

    >
    > The systems are in air-conditioned equipment rooms, I wasn't expecting
    > to see frequency changes due to temperature.


    Do you monitor the temperature? Many data centers have a clock driven
    chart recorder that records the temperature and humidity. If the
    temperature changes the clock WILL be affected.

    I can't tell you why NTPD took "so long" to jump on a 15 millisecond
    error, that's a problem for the mathematicians/control systems theory guys.

    If you need synchronization and/or accuracy closer than that, you may need:
    a. Better environmental control/monitoring, or
    b. A better clock (OCXO, TCXO). This could get expensive; computer
    clocks use basically the same mechanism as a cheap "quartz" wristwatch
    but lack the temperature control that usually keeps the wristwatch
    somewhere near 98.6 degrees Fahrenheit. Would you be upset if your
    wristwatch gained or lost thirty seconds per month?
    c. A different tool than NTPD for the job. Some people advocate a tool
    called "chrony", something with which I have no experience!
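
To put the wristwatch comparison on the same scale as the rest of the thread, thirty seconds per month can be converted to parts per million. A small sketch, assuming a 30-day month; the function name is hypothetical:

```python
SECONDS_PER_DAY = 86_400


def error_ppm(error_s: float, days: float) -> float:
    """Average frequency error in PPM for a clock off by error_s after `days`."""
    return error_s / (days * SECONDS_PER_DAY) * 1e6


watch_ppm = error_ppm(30, 30)
print(f"{watch_ppm:.2f} ppm")  # prints "11.57 ppm"
```

That is several times larger than the roughly 2ppm change discussed in this thread.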

  11. Re: NTP slow to start correction after a drift

    Mike K Smith writes:

    >On 12 May, 15:16, "Richard B. Gilbert" wrote:
    >> Mike K Smith wrote:


    >> > Looks like I should be reducing maxpoll. I guess the design of NTP is
    >> > optimised for clocks with predictable drift rates, and a sudden
    >> > variation in drift rate takes longer to correct.

    >>
    >> You DO know that NTPD adjusts the poll interval to fit the current
    >> conditions??? It will increase the poll interval to MAXPOLL only when
    >> the clock is stable and very close to being correct. The default values
    >> of MINPOLL and MAXPOLL are correct for all but the weirdest cases.


    >I know that ntpd adjusts the poll interval to fit the current
    >conditions, but I am describing a case where the current conditions
    >changed. The clock had been stable for around a week, and the polling
    >interval had increased to 1024 seconds, then something changed. It
    >looks like the clock started drifting by about 2ppm, the poll interval
    >didn't change for three hours causing a 15ms offset before beginning
    >to correct the drift.


    With a poll interval of 1024 the actual poll is about 8000 sec (after the
    clock filter, which throws away about 7 out of 8 data points). That is about
    2 hours, so it is impossible for the system to even recognize that
    something has happened in less than about 2 hours. It can then try to start
    correcting and start to try to reduce the poll interval. Why does it throw
    away all that data? It is believed that the gain in using the minimum delay
    out of 8 is more than the loss in responsiveness, and in accuracy. (The
    procedure is to try to get rid of data which might have a large asymmetric
    drift.) This means that if the clock is 10ms out and the delay is .1ms, it
    may still be thrown out since that .1 ms is greater than the .095 ms
    achieved 7 poll intervals ago, despite the fact that the data shows
    incontrovertibly that the clock is having far more problems than could
    ever be hidden in the delay.
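
The "about 8000 sec" figure, and what it would mean for the event in this thread, can be worked through as below. This takes the argument above at face value (only about one sample in eight reaches the loop); the 2 ppm drift rate is the one reported in the original post, and the constant names are illustrative.

```python
POLL_INTERVAL_S = 1024   # poll interval reported in the thread
FILTER_STAGES = 8        # samples held in the clock filter
DRIFT_PPM = 2.0          # drift rate reported in the thread

effective_interval_s = POLL_INTERVAL_S * FILTER_STAGES
effective_interval_h = effective_interval_s / 3600
offset_ms = DRIFT_PPM * 1e-6 * effective_interval_s * 1e3

print(f"{effective_interval_s} s = {effective_interval_h:.2f} h")  # 8192 s = 2.28 h
print(f"offset accumulated at {DRIFT_PPM} ppm: {offset_ms:.1f} ms")  # 16.4 ms
```

On that assumption the accumulated offset comes out at about 16 ms, which is in the same range as the 15-18 ms observed.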

    >I initiated this thread to help me understand why ntpd took so long to
    >respond. I had expected to see the poll interval decrease and the
    >offset swing back towards zero after the first couple of polls showed
    >the increased offset.


    >> Are you operating your machines in a controlled (temperature)
    >> environment? If the temperature bounces around, so will your clock.
    >> NTPD will correct it but if the temperature drops five degrees in five
    >> minutes when the air conditioning kicks in, NTPD may have a little
    >> difficulty keeping up.


    >The systems are in air-conditioned equipment rooms, I wasn't expecting
    >to see frequency changes due to temperature.


  12. Re: NTP slow to start correction after a drift

    Bill,

    You seem to have a tack up your tail about the clock filter algorithm.
    First, you didn't respond to my message about sampling at twice the
    Nyquist rate, even if a burst of seven samples is lost.

    Second, look at the clock filter algorithm code and comments. Samples
    older than the Allan intercept (default 2000 s) are effectively
    discarded. Thus, only the latest sample is used and the next older used
    only to compute the peer jitter.

    Third, if you recall my recent message about the poll algorithm, you
    know the jiggle counter is reduced if the (combined) clock offset
    exceeds twice the clock jitter. With the constants revealed in my prior
    message, and if the clock frequency is yanked 1 PPM by a Grue, all it
    takes is two samples and the poll interval/time constant drops by half.
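
The two-sample claim can be checked against the constants given in the earlier message. This sketch assumes the time constant equals the poll exponent of 10 (1024 s) and that the jiggle counter starts at zero; neither is stated explicitly, and the variable names are illustrative.

```python
POLL_EXPONENT = 10     # 2**10 = 1024 s, the interval reported in the thread
JIGGLE_LIMIT = 30      # threshold from the earlier message
FREQ_STEP_PPM = 1.0    # the hypothetical 1 PPM frequency step

poll_interval_s = 2 ** POLL_EXPONENT
# Offset that builds up over one poll interval after the frequency step.
offset_per_poll_ms = FREQ_STEP_PPM * 1e-6 * poll_interval_s * 1e3
# The counter is only reduced when the offset exceeds twice the clock jitter.
max_jitter_ms = offset_per_poll_ms / 2

counter = 0
samples_needed = 0
while counter >= -JIGGLE_LIMIT:
    counter -= 2 * POLL_EXPONENT   # each out-of-bounds sample subtracts 2 x tc
    samples_needed += 1

print(f"offset after one poll: {offset_per_poll_ms:.3f} ms")
print(f"jitter must be below:  {max_jitter_ms:.3f} ms")
print(f"samples needed:        {samples_needed}")   # 2
```

So two samples are enough, provided each one shows an offset greater than twice the clock jitter; with about 1 ms of offset per poll at 1 PPM, that requires the jitter to be below roughly 0.5 ms.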

    Dave

    Unruh wrote:

    > Mike K Smith writes:
    >
    >
    >>On 12 May, 15:16, "Richard B. Gilbert" wrote:
    >>
    >>>Mike K Smith wrote:

    >
    >
    >>>>Looks like I should be reducing maxpoll. I guess the design of NTP is
    >>>>optimised for clocks with predictable drift rates, and a sudden
    >>>>variation in drift rate takes longer to correct.
    >>>
    >>>You DO know that NTPD adjusts the poll interval to fit the current
    >>>conditions??? It will increase the poll interval to MAXPOLL only when
    >>>the clock is stable and very close to being correct. The default values
    >>>of MINPOLL and MAXPOLL are correct for all but the weirdest cases.

    >
    >
    >>I know that ntpd adjusts the poll interval to fit the current
    >>conditions, but I am describing a case where the current conditions
    >>changed. The clock had been stable for around a week, and the polling
    >>interval had increased to 1024 seconds, then something changed. It
    >>looks like the clock started drifting by about 2ppm, the poll interval
    >>didn't change for three hours causing a 15ms offset before beginning
    >>to correct the drift.

    >
    >
    > with a poll interval of 1024 the actual poll is about 8000 sec ( after the
    > clock filter which throws away about 7 out of 8 data points). That is about
    > 2 hours, so it is impossible for the system to even recognize that
    > something has happened in less than about 2 hours. It can then try to start
    > correcting and start to try to reduce the poll interval. Why does it throw
    > away all that data? It is believed that the gain in using the minimum delay
    > out of 8 is more than the loss in responsiveness, and in accuracy. (The
    > procedure is to try to get rid of data which might have a large asymmetric
    > drift. ) This means that if the clock is 10ms out and the delay is .1ms, it
    > may still be thrown out since that .1 ms is greater than the .095 ms
    > achieved 7 poll intervals ago, despite the fact that the data shows
    > incontrovertibly that the clock is having far more problems than could
    > ever be hidden in the delay.
    >
    >
    >>I initiated this thread to help me understand why ntpd took so long to
    >>respond. I had expected to see the poll interval decrease and the
    >>offset swing back towards zero after the first couple of polls showed
    >>the increased offset.

    >
    >
    >>>Are you operating your machines in a controlled (temperature)
    >>>environment? If the temperature bounces around, so will your clock.
    >>>NTPD will correct it but if the temperature drops five degrees in five
    >>>minutes when the air conditioning kicks in, NTPD may have a little
    >>>difficulty keeping up.

    >
    >
    >>The systems are in air-conditioned equipment rooms, I wasn't expecting
    >>to see frequency changes due to temperature.

