Time slew doesn't seem to work - NTP


  1. Time slew doesn't seem to work


    Hi,

    I've started ntpd with the -x option and defined at run-time (using ntpdc) 3
    servers. The client machine has an offset of +/- 2s with the ntp servers.
    In the NTP log file I find the following statements (extracted out of a
    total of 98):

    9 Apr 07:46:13 ntpd[19257]: time slew 1.781571 s
    9 Apr 08:01:16 ntpd[19257]: time slew 1.781200 s
    9 Apr 08:17:21 ntpd[19257]: time slew 1.781085 s
    9 Apr 08:32:33 ntpd[19257]: time slew 1.781807 s
    9 Apr 08:48:37 ntpd[19257]: time slew 1.782273 s
    9 Apr 09:04:38 ntpd[19257]: time slew 1.781004 s
    9 Apr 09:19:42 ntpd[19257]: time slew 1.781344 s
    9 Apr 09:34:46 ntpd[19257]: time slew 1.780407 s
    9 Apr 09:49:50 ntpd[19257]: time slew 1.778824 s

    The times don't seem to converge.
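    As a rough check on the excerpt above, the log lines can be parsed to see how far apart the messages are and how much the reported offset changes. This is only a sketch: the line layout is assumed from the nine lines quoted here, and the helper names are made up for illustration.

```python
# Log lines copied verbatim from the post above.
LOG = """\
9 Apr 07:46:13 ntpd[19257]: time slew 1.781571 s
9 Apr 08:01:16 ntpd[19257]: time slew 1.781200 s
9 Apr 08:17:21 ntpd[19257]: time slew 1.781085 s
9 Apr 08:32:33 ntpd[19257]: time slew 1.781807 s
9 Apr 08:48:37 ntpd[19257]: time slew 1.782273 s
9 Apr 09:04:38 ntpd[19257]: time slew 1.781004 s
9 Apr 09:19:42 ntpd[19257]: time slew 1.781344 s
9 Apr 09:34:46 ntpd[19257]: time slew 1.780407 s
9 Apr 09:49:50 ntpd[19257]: time slew 1.778824 s
"""


def seconds_of_day(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_slew_lines(text):
    """Return (seconds_of_day, offset_s) pairs for each 'time slew' line."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        # Assumed layout: day, month, HH:MM:SS, ntpd[pid]:, "time", "slew", value, "s"
        if len(parts) >= 8 and parts[4:6] == ["time", "slew"]:
            entries.append((seconds_of_day(parts[2]), float(parts[6])))
    return entries


# All quoted lines fall on the same day, so seconds-of-day differences suffice.
entries = parse_slew_lines(LOG)
intervals = [later[0] - earlier[0] for earlier, later in zip(entries, entries[1:])]
offsets = [offset for _, offset in entries]

print("intervals between messages (s):", intervals)
print(f"offset change over {entries[-1][0] - entries[0][0]} s: "
      f"{offsets[-1] - offsets[0]:+.6f} s")
```

    On these nine lines the messages are 903 to 965 seconds apart, and the reported offset moves by less than 3 ms over roughly two hours, which matches the impression that nothing converges.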

    When I shut down the ntp daemon and try to slew the time using ntpdate with
    the -B option it does work. The time difference with the ntp servers
    gradually declines.

    We use Suse SLES10 (kernel version: 2.6.16).

    Does anybody have an idea on what's going wrong?

    Thanks,
    Jan



  2. Re: Time slew doesn't seem to work

    jkvbe wrote:
    > Hi,
    >
    > I've started ntpd with the -x option and defined at run-time (using ntpdc) 3
    > servers. The client machine has an offset of +/- 2s with the ntp servers.
    > In the NTP log file I find the following statements (extracted out of a
    > total of 98):
    >
    > 9 Apr 07:46:13 ntpd[19257]: time slew 1.781571 s
    > 9 Apr 08:01:16 ntpd[19257]: time slew 1.781200 s
    > 9 Apr 08:17:21 ntpd[19257]: time slew 1.781085 s
    > 9 Apr 08:32:33 ntpd[19257]: time slew 1.781807 s
    > 9 Apr 08:48:37 ntpd[19257]: time slew 1.782273 s
    > 9 Apr 09:04:38 ntpd[19257]: time slew 1.781004 s
    > 9 Apr 09:19:42 ntpd[19257]: time slew 1.781344 s
    > 9 Apr 09:34:46 ntpd[19257]: time slew 1.780407 s
    > 9 Apr 09:49:50 ntpd[19257]: time slew 1.778824 s
    >
    > The times don't seem to converge.
    >
    > When I shut down the ntp daemon and try to slew the time using ntpdate with
    > the -B option it does work. The time difference with the ntp servers
    > gradually declines.
    >
    > We use Suse SLES10 (kernel version: 2.6.16).
    >
    > Does anybody have an idea on what's going wrong?
    >
    > Thanks,
    > Jan
    >
    >


    Something is VERY wrong there. It looks as if NTPD is making a massive
    correction every fifteen minutes or so!

    If you reboot without running NTPD, and set the time manually, how badly
    does it drift? If it gains or loses more than something like 43 seconds
    per day, NTPD will not work until you get your hardware fixed. Gaining
    or losing 1 or 2 seconds per day without NTPD is the expected level of
    performance for a typical computer clock. (You get the finest hardware
    that $2 US can buy!)



  3. Re: Time slew doesn't seem to work

    jkvbe wrote:
    > Hi,
    >
    > I've started ntpd with the -x option and defined at run-time (using ntpdc) 3
    > servers. The client machine has an offset of +/- 2s with the ntp servers.
    > In the NTP log file I find the following statements (extracted out of a
    > total of 98):
    >
    > 9 Apr 07:46:13 ntpd[19257]: time slew 1.781571 s
    > 9 Apr 08:01:16 ntpd[19257]: time slew 1.781200 s
    > 9 Apr 08:17:21 ntpd[19257]: time slew 1.781085 s
    > 9 Apr 08:32:33 ntpd[19257]: time slew 1.781807 s
    > 9 Apr 08:48:37 ntpd[19257]: time slew 1.782273 s
    > 9 Apr 09:04:38 ntpd[19257]: time slew 1.781004 s
    > 9 Apr 09:19:42 ntpd[19257]: time slew 1.781344 s
    > 9 Apr 09:34:46 ntpd[19257]: time slew 1.780407 s
    > 9 Apr 09:49:50 ntpd[19257]: time slew 1.778824 s
    >
    > The times don't seem to converge.
    >
    > When I shut down the ntp daemon and try to slew the time using ntpdate with
    > the -B option it does work. The time difference with the ntp servers
    > gradually declines.
    >
    > We use Suse SLES10 (kernel version: 2.6.16).
    >
    > Does anybody have an idea on what's going wrong?
    >
    > Thanks,
    > Jan

    Two things:
    (1) Try running with time stepping enabled on that system (i.e. don't
    use the '-x' flag) to see how well the system keeps time. What kind of
    offset do you have after 1 or 2 hours of operation?
    (2) Check your drift value when running with time stepping disabled
    (also check it with time stepping enabled). You can do this with 'ntpq
    -crv' where 'frequency' is the drift value or you can dump the drift
    file (probably /var/lib/ntp/drift). Note that the drift file is only
    updated once every hour or so.
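    For the drift-file half of check (2), a small sketch along these lines could be used. It assumes the drift file holds the frequency in PPM as its first whitespace-separated field; the 95% cut-off for "at or near the limit" and the function names are hypothetical choices, not anything ntpd defines.

```python
import os
import tempfile

NTPD_MAX_FREQ_PPM = 500.0   # frequency limit mentioned elsewhere in this thread
NEAR_LIMIT_FRACTION = 0.95  # hypothetical cut-off for "at or near" the limit


def read_drift_ppm(path):
    """Return the first number in a drift file, taken to be the frequency in PPM."""
    with open(path) as drift_file:
        return float(drift_file.read().split()[0])


def drift_is_pinned(ppm, limit=NTPD_MAX_FREQ_PPM, fraction=NEAR_LIMIT_FRACTION):
    """True when the drift value sits at or near the +/- limit."""
    return abs(ppm) >= fraction * limit


if __name__ == "__main__":
    # Demo with a made-up value in a temporary file; on a real system pass
    # the actual drift file path (the post suggests /var/lib/ntp/drift).
    with tempfile.NamedTemporaryFile("w", suffix=".drift", delete=False) as tmp:
        tmp.write("-499.871\n")
    try:
        ppm = read_drift_ppm(tmp.name)
        print(f"drift {ppm:+.3f} PPM, pinned: {drift_is_pinned(ppm)}")
    finally:
        os.remove(tmp.name)
```

    Since the drift file is only rewritten about once an hour, the value read this way can lag behind what 'ntpq -crv' reports.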

    I encountered a problem on linux 2.6.18 in which disabling of time
    stepping (using either '-x' or 'tinker step 0') caused the drift value
    to run at or near +/-500ppm and subsequently caused a time offset of
    several milliseconds. If I allow time steps on that same system, it
    runs with a drift <100ppm and maintains an offset <1ms. I am using an
    IRIG time source, so I expect high accuracy. In my system, a time step
    is never needed (i.e. the offset never grows larger than 128ms),
    regardless of whether time stepping is enabled or disabled. This
    doesn't change the fact that it runs like crap with time stepping disabled.

    Andy

  4. Re: Time slew doesn't seem to work

    The 15-minute correction is due to the default configuration for
    "stepout". In my experience, behaviour like this is caused either by
    another piece of software also disciplining the clock or by a bad drift
    file; in the latter case, erasing the file and restarting NTP should help.

    HTH

  5. Re: Time slew doesn't seem to work

    "Richard B. Gilbert" writes:

    >jkvbe wrote:
    >> Hi,
    >>
    >> I've started ntpd with the -x option and defined at run-time (using ntpdc) 3
    >> servers. The client machine has an offset of +/- 2s with the ntp servers.
    >> In the NTP log file I find the following statements (extracted out of a
    >> total of 98):
    >>
    >> 9 Apr 07:46:13 ntpd[19257]: time slew 1.781571 s
    >> 9 Apr 08:01:16 ntpd[19257]: time slew 1.781200 s
    >> 9 Apr 08:17:21 ntpd[19257]: time slew 1.781085 s
    >> 9 Apr 08:32:33 ntpd[19257]: time slew 1.781807 s
    >> 9 Apr 08:48:37 ntpd[19257]: time slew 1.782273 s
    >> 9 Apr 09:04:38 ntpd[19257]: time slew 1.781004 s
    >> 9 Apr 09:19:42 ntpd[19257]: time slew 1.781344 s
    >> 9 Apr 09:34:46 ntpd[19257]: time slew 1.780407 s
    >> 9 Apr 09:49:50 ntpd[19257]: time slew 1.778824 s
    >>
    >> The times don't seem to converge.
    >>
    >> When I shut down the ntp daemon and try to slew the time using ntpdate with
    >> the -B option it does work. The time difference with the ntp servers
    >> gradually declines.
    >>
    >> We use Suse SLES10 (kernel version: 2.6.16).
    >>
    >> Does anybody have an idea on what's going wrong?
    >>
    >> Thanks,
    >> Jan
    >>
    >>


    >Something is VERY wrong there. It looks as if NTPD is making a massive
    >correction every fifteen minutes or so!


    >If you reboot without running NTPD, and set the time manually, how badly
    >does it drift? If it gains or loses more than something like 43 seconds
    >per day, NTPD will not work until you get your hardware fixed. Gaining
    >or losing 1 or 2 seconds per day without NTPD is the expected level of
    >performance for a typical computer clock. (You get the finest hardware
    >that $2 US can buy!)


    Well, no. 1 or 2 sec is 10-20 PPM, which is on the good side. 43 sec per day
    is about 500 PPM, which is definitely on the high side. 5-10 sec per day is
    more typical. Note that chrony (on Linux) will fix 43 s/day. (It will use the
    fast slew, i.e. changing the tick size, as well as the slow slew.) ntp, as a
    design decision, made 500 PPM the most it would ever do. Not that I
    advise a computer with 500 PPM freq error. Something is wrong and is liable
    to be wrong in more places than just the clock.
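    The conversion between seconds per day and PPM used above is plain arithmetic (one day is 86,400 seconds). A minimal sketch, with function names chosen only for illustration:

```python
SECONDS_PER_DAY = 86_400


def drift_to_ppm(seconds_per_day):
    """Clock error in seconds per day expressed as parts per million."""
    return seconds_per_day / SECONDS_PER_DAY * 1_000_000


def ppm_to_drift(ppm):
    """Frequency error in PPM expressed as seconds gained or lost per day."""
    return ppm * SECONDS_PER_DAY / 1_000_000


for s in (1, 2, 5, 10, 43):
    print(f"{s:>3} s/day = {drift_to_ppm(s):6.1f} PPM")
print(f"500 PPM = {ppm_to_drift(500):.1f} s/day")
```

    This gives about 11.6 and 23.1 PPM for 1 and 2 s/day, and 43.2 s/day for exactly 500 PPM, so the round figures quoted in the thread are in the right range.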






  6. Re: Time slew doesn't seem to work


    >Well, no. 1 or 2 sec is 10-20PPM which is on the good side. 43 sec per day
    >is like 500PPM which is definitely on the high side. 5-10sec per day is
    >more typical. Note that chrony(on linux) will fix 43s/day. (It will use the fast
    >slew-- ie changing the tick size-- as well as the slow slew.) ntp as a
    >design decision decided that 500PPM was the max it would ever do. Not that I
    >advise a computer with 500PPM freq error. something is wrong and is liable
    >to be wrong in more places than just the clock.


    Don't overlook software when looking for things that can go wrong.

    --
    These are my opinions, not necessarily my employer's. I hate spam.


  7. Re: Time slew doesn't seem to work

    jkvbe wrote:
    >
    > I've started ntpd with the -x option and defined at run-time (using ntpdc) 3


    Are you sure that you are using an unmodified recent version? I seem to
    remember that even -x will step if the offset is more than 500ms. If
    not, you should address the question to SUSE.

    There isn't enough time for the clock to slew 2000ms in 15 minutes.


    > servers. The client machine has an offset of +/- 2s with the ntp servers.
    > In the NTP log file I find the following statements (extracted out of a
    > total of 98):


    > We use Suse SLES10 (kernel version: 2.6.16).


    Even if this is unmodified code, which version is it?

  8. Re: Time slew doesn't seem to work

    David Woolley writes:

    >jkvbe wrote:
    >>
    >> I've started ntpd with the -x option and defined at run-time (using ntpdc) 3


    >Are you sure that you are using an unmodified recent version. I seem to
    >remember that even -x will step if the offset is more than 500ms. If
    >not you should address the question to SUSE.


    >There isn't enough time for the clock to slew 2000ms in 15 minutes.


    Not at 500PPM limit but if you use the tick adjustment, it is more than
    enough time. (The tick adjust limits out at 100,000PPM)



    >> servers. The client machine has an offset of +/- 2s with the ntp servers.
    >> In the NTP log file I find the following statements (extracted out of a
    >> total of 98):


    >> We use Suse SLES10 (kernel version: 2.6.16).


    >Even if this is unmodified code, which version is it?


  9. Re: Time slew doesn't seem to work

    Unruh wrote:
    >
    > Not at 500PPM limit but if you use the tick adjustment, it is more than
    > enough time. (The tick adjust limits out at 100,000PPM)
    >

    I believe ntpd assumes that it is constant. Having a large tickadj
    causes poor resolution when using the user space discipline.

    I suspect that Dr Mills would say that a high slew rate also compromises
    the system behaviour when you cascade multiple strata.

  10. Re: Time slew doesn't seem to work

    David Woolley writes:

    >Unruh wrote:
    >>
    >> Not at 500PPM limit but if you use the tick adjustment, it is more than
    >> enough time. (The tick adjust limits out at 100,000PPM)
    >>

    >I believe ntpd assumes that it is constant. Having a large tickadj
    >causes poor resolution when using the user space discipline.


    Yes, ntpd does limit all slews to 500PPM (the limit on the freq adjust
    parameter in adjtimex on linux as well). I was just saying that a slew of 2
    sec in 15min IS possible, although ntpd will not do it.
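    The two figures being compared here follow from dividing the offset by the slew rate. A small sketch of that arithmetic, using the 2 s offset, the 15-minute (900 s) interval, and the 500 PPM and 100,000 PPM rates quoted in this thread; the function names are illustrative only:

```python
def slew_duration_s(offset_s, rate_ppm):
    """Seconds needed to remove offset_s when slewing at rate_ppm."""
    return abs(offset_s) / (rate_ppm / 1_000_000)


def correctable_offset_s(interval_s, rate_ppm):
    """Largest offset that can be slewed away within interval_s at rate_ppm."""
    return interval_s * rate_ppm / 1_000_000


for rate in (500, 100_000):
    print(f"{rate:>7} PPM: 2 s takes {slew_duration_s(2.0, rate):7.0f} s, "
          f"15 min corrects {correctable_offset_s(900, rate):6.2f} s")
```

    At 500 PPM a 2 s offset needs about 4000 s (roughly 67 minutes), and 15 minutes only removes 0.45 s; at 100,000 PPM the same offset is gone in about 20 s.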


    >I suspect that Dr Mills would say that a high slew rate also compromises
    >the system behaviour when you cascade multiple strata.


    Not sure why, but maybe.


  11. Re: Time slew doesn't seem to work

    David,

    The original model implemented in the Alpha kernel does not step the
    clock backward unless the step is greater than two seconds. Rather, it
    stops the clock and advances one microsecond at each read. This applies
    whether NTP slews or steps. Various ports of that code have broken this
    model in every possible way.

    The 500-PPM slew once was common in the ubiquitous Unix kernel. The
    value was chosen as a compromise between short slew time for relatively
    small adjustments and moderate resolution during the slew interval. This
    works out to 5 microseconds per tick with a 100-Hz clock and a 5-us
    jitter. In truth this could be changed to anything you want, as long as
    the value is fixed.

    Some kernelmongers, including SGI and Linux, have put up fancy code
    designed to reduce the slew time for large adjustments. This inserts an
    additional pole in the clock discipline impulse response which results
    in unstable behavior for adjustments over half a second or so.

    The default step threshold is 128 ms; the -x command line option sets it
    to 600 s and does nothing else. The 600-s value was chosen as the
    expected accuracy with eyeball and wristwatch. If the extra pole is not
    there, the original response is preserved over that range and largely
    independent of the slew value itself.

    Say you change from 5 us per tick to 1 ms per tick or 100 ms/s. This
    would amortize a 600-s adjustment in almost two hours and reduce the
    resolution to 1 ms. If your extended network requires synchronization to
    better than one second, in all but the last second of that slew the
    network would not be synchronized.
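    The per-tick figures in the paragraphs above can be reproduced with a few lines of arithmetic. This is a sketch under the stated numbers only (100-Hz clock, 500 PPM, 1 ms per tick, 600 s adjustment); the function names are hypothetical.

```python
def slew_per_tick_us(rate_ppm, clock_hz):
    """Microseconds of adjustment applied per tick at a given slew rate."""
    tick_us = 1_000_000 / clock_hz
    return tick_us * rate_ppm / 1_000_000


def rate_ppm_from_tick(adjust_us_per_tick, clock_hz):
    """Slew rate in PPM; microseconds per second is numerically the same as PPM."""
    return adjust_us_per_tick * clock_hz


def amortize_minutes(offset_s, rate_ppm):
    """Minutes needed to slew offset_s away at rate_ppm."""
    return offset_s / (rate_ppm / 1_000_000) / 60


print(f"500 PPM at 100 Hz: {slew_per_tick_us(500, 100):.1f} us per tick")
fast_ppm = rate_ppm_from_tick(1000, 100)  # 1 ms per tick
print(f"1 ms per tick at 100 Hz: {fast_ppm:.0f} PPM")
print(f"600 s at that rate: {amortize_minutes(600, fast_ppm):.0f} minutes")
```

    With these inputs, 1 ms per tick corresponds to 100,000 PPM (100 ms/s), and the 600-s adjustment takes about 100 minutes.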

    Dave

    David Woolley wrote:
    > Unruh wrote:
    >
    >>
    >> Not at 500PPM limit but if you use the tick adjustment, it is more than
    >> enough time. (The tick adjust limits out at 100,000PPM)
    >>

    > I believe ntpd assumes that it is constant. Having a large tickadj
    > causes poor resolution when using the user space discipline.
    >
    > I suspect that Dr Mills would say that a high slew rate also compromises
    > the system behaviour when you cascade multiple strata.


  12. Re: Time slew doesn't seem to work

    "David L. Mills" writes:

    >David,


    >The original model implemented in the Alpha kernel does not step the
    >clock backward unless the step is greater than two seconds. Rather, it
    >stops the clock and advances one microsecond at each read. This applies
    >whether NTP slews or steps. Various ports of that code have broken this
    >model in every possible way.


    >The 500-PPM slew once was common in the ubiquitous Unix kernel. The
    >value was chosen as a compromise between short slew time for relatively
    >small adjustments and moderate resolution during the slew interval. This
    >works out to 5 microseconds per tick with a 100-Hz clock and a 5-us
    >jitter. In truth this could be changed to anything you want, as long as
    >the value is fixed.


    OK, so there was no magic in that 500PPM limit. Is there a difference
    between the tick size adjustment and the frequency adjustment
    (CPU-counter-to-time conversion factor)?

    >Some kernelmongers, including SGI and Linux, have put up fancy code
    >designed to reduce the slew time for large adjustments. This inserts an
    >additional pole in the clock discipline impulse response which results
    >in unstable behavior for adjustments over half a second or so.


    That is of course not good. I am a bit uncertain why that instability would
    depend on amplitude. Is the response non-linear? For linear
    responses the amplitude should not matter. But from your words it sounds
    like the response is amplitude dependent, which would of course be
    non-linear. And if it is non-linear, once it gets near (1 sec?) the right
    time, the linear stable response should take over.


    By the way, do you happen to know how the Linux kernel implements the
    slewing for the adjtime system call (adjtimex ADJ_OFFSET_SINGLESHOT)?





    >The default step threshold is 128 ms; the -x command line option sets it
    >to 600 s and does nothing else. The 600-s value was chosen as the
    >expected accuracy with eyeball and wristwatch. If the extra pole is not
    >there, the original response is preserved over that range and largely
    >independent of the slew value itself.


    >Say you change from 5 us per tick to 1 ms per tick or 100 ms/s. This
    >would amortize a 600-s adjustment in almost two hours and reduce the
    >resolution to 1 ms. If your extended network requires synchronization to
    >better than one second, in all but the last second of that slew the
    >network would not be synchronized.


    This paragraph confuses me. If the clock is 600s out, it is way out no
    matter how you slew it back. What do you mean "reduce the resolution to
    1ms"? The resolution is still 1usec and the accuracy is a few hundreds of
    seconds. And if the clock is out by 600s the network will not be
    synchronised to true time. Or perhaps you mean something other than that
    by "network would not be synchronized".

    Thanks




    >Dave


    >David Woolley wrote:
    >> Unruh wrote:
    >>
    >>>
    >>> Not at 500PPM limit but if you use the tick adjustment, it is more than
    >>> enough time. (The tick adjust limits out at 100,000PPM)
    >>>

    >> I believe ntpd assumes that it is constant. Having a large tickadj
    >> causes poor resolution when using the user space discipline.
    >>
    >> I suspect that Dr Mills would say that a high slew rate also compromises
    >> the system behaviour when you cascade multiple strata.


  13. Re: Time slew doesn't seem to work

    David L. Mills wrote:

    > The default step threshold is 128 ms; the -x command line option sets it
    > to 600 s and does nothing else. The 600-s value was chosen as the


    Oops, I thought the units there were ms, which invalidates some of what
    I said. I was probably partly confused by the disabling of the kernel
    discipline at 0.5 seconds.


    > expected accuracy with eyeball and wristwatch. If the extra pole is not


    With a radio controlled wrist watch, 200ms is easily possible. I'd
    suggest that 10 minutes is about the 95 percentile for when the person
    setting the time doesn't care about the time. In that context, my
    practical experience is that most times are set within 5 minutes. As
    someone who regularly uses public transport, I would be disappointed if
    I could only get the time to within 30 seconds by reading my wrist watch.

    600 seconds is 60% of the drop dead value, so I'm not clear why the two
    weren't made the same.

  14. Re: Time slew doesn't seem to work


    >OK, so there was no magic in that 500PPM limit. Is there a difference
    >between the tick size adjustment and the frequency adjustment
    >(CPU-counter-to-time conversion factor).


    Limiting the slew rate to something like that means that
    software that is timing things with code like:
    grab time, do something, grab time, subtract
    gets a sane answer if it happens to be running while somebody
    adjusts the time.
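    The "grab time, do something, grab time, subtract" pattern, and how far off its answer can be while the clock is being slewed, might be sketched as below. The error bound simply assumes the clock runs fast or slow by the slew rate for the whole interval; the function names are made up for illustration.

```python
import time


def timed(fn):
    """Grab time, do something, grab time, subtract."""
    # time.monotonic() is used so that steps of the wall clock do not matter.
    start = time.monotonic()
    fn()
    return time.monotonic() - start


def slew_error_s(true_interval_s, slew_ppm):
    """Error in a measured interval if the clock is slewed at slew_ppm throughout."""
    return true_interval_s * slew_ppm / 1_000_000


elapsed = timed(lambda: sum(range(100_000)))
print(f"measured {elapsed:.6f} s")

for ppm in (500, 10_000, 100_000):
    print(f"{ppm:>7} PPM: a 10 s interval is off by {slew_error_s(10, ppm) * 1000:.0f} ms")
```

    Under that assumption a 10 s measurement is off by 5 ms at 500 PPM, 100 ms at 10,000 PPM and a full second at 100,000 PPM.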

    --
    These are my opinions, not necessarily my employer's. I hate spam.


  15. Re: Time slew doesn't seem to work

    hal-usenet@ip-64-139-1-69.sjc.megapath.net (Hal Murray) writes:


    >>OK, so there was no magic in that 500PPM limit. Is there a difference
    >>between the tick size adjustment and the frequency adjustment
    >>(CPU-counter-to-time conversion factor).


    >Limiting the slew rate to something like that means that
    >software that is timing things with code like:
    > grab time, do something, grab time, subtract
    >gets a sane answer if it happens to be running while somebody
    >adjusts the time.


    Do you know any code that cares if that is wrong by 10% (which would be
    100000PPM)? I.e., is a 10% error insane?

    Is 1% (10000PPM)?
    I.e., .05% seems a bit extreme for that.



    >--
    >These are my opinions, not necessarily my employer's. I hate spam.



  16. Re: Time slew doesn't seem to work

    Unruh wrote:

    >
    > Do you know any code that cares if that is wrong by 10% (which would be
    > 100000PPM) Ie, is 10% error insane?
    >

    RTP.

    Anything measuring speeds based on crossing starting and ending thresholds.

  17. Re: Time slew doesn't seem to work

    >Do you know any code that cares if that is wrong by 10% (which would be
    >100000PPM) Ie, is 10% error insane?


    >Is 1% (10000PPM)?
    >Ie, .05% seems a bit extreme for that.


    I used to do a lot of performance measurements.

    For the stuff I was doing, 10% is easy to spot. 1% is borderline.

    --
    These are my opinions, not necessarily my employer's. I hate spam.


  18. Re: Time slew doesn't seem to work

    On Apr 9, 4:40 pm, andy.heltenNOS...@dot21rts.com (Andy) wrote:

    >
    > Two things:
    > (1) Try running with time stepping enabled on that system (i.e. don't
    > use the '-x' flag) to see how well the system keeps time. What kind of
    > offset do you have after 1 or 2 hours of operation?
    > (2) Check your drift value when running with time stepping disabled
    > (also check it with time stepping enabled). You can do this with 'ntpq
    > -crv' where 'frequency' is the drift value or you can dump the drift
    > file (probably /var/lib/ntp/drift). Note that the drift file is only
    > updated once every hour or so.
    >
    > I encountered a problem on linux 2.6.18 in which disabling of time
    > stepping (using either '-x' or 'tinker step 0') caused the drift value
    > to run at or near +/-500ppm and subsequently caused a time offset of
    > several milliseconds. If I allow time steps on that same system, it
    > runs with a drift <100ppm and maintains an offset <1ms. I am using an
    > IRIG time source, so I expect high accuracy. In my system, a time step
    > is never needed (i.e. the offset never grows larger than 128ms),
    > regardless of whether time stepping is enabled or disabled. This
    > doesn't change the fact that it runs like crap with time stepping disabled.
    >
    > Andy


    Note that my set-up was an experiment to check how time slew
    works. What I still don't get is why it works when using ntpdate.
    Also strange is the fact that ntpq actually reports the node as
    synched with an offset of > 1.6s.

    Jan
