tinker step 0 (always slew) and kernel time discipline - NTP


Thread: tinker step 0 (always slew) and kernel time discipline

  1. Re: tinker step 0 (always slew) and kernel time discipline

    Please read driver1.html and what it says about the 'prefer' keyword.

    H

  2. Re: tinker step 0 (always slew) and kernel time discipline

    In article ,
    Richard B. Gilbert wrote:

    > This is just what happened. Both servers seem to have been serving their
    > unsynchronized local clocks. The servers diverged and the client had no


    From what he said, I think that they *were* configured with a real server
    as well, but that at least one real server was given a wrong address,
    and maybe the stratum of the local clock reference wasn't set to 10 or so.

    One of the points that I keep making is that configuring a local clock
    is something that should only be done after considerable thought,
    because, amongst other things, it prevents downstream ntpd's detecting that
    the server in question is hopelessly unsynchronised. Unfortunately,
    most distributors seem to be enamoured with the idea and configure it
    in by default.

    If the servers in question had not had a local clock configured, and had
    a bad real reference configured, they would either never have claimed to be
    valid sources, or would have been detected as invalid within about a day of
    losing their good server. If time is important to the application, it
    could have used the ntp_adjtime system call, or local equivalent, to
    detect whether there was a validly synchronised time, and alarmed at that
    point.
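
    The "within about a day" figure can be checked with a rough calculation.
    The sketch below assumes the nominal constants from the NTP specification
    (dispersion growing at 15 ppm and a 1 s distance threshold) and ignores
    network delay; the names PHI and MAX_DISTANCE are chosen here for
    illustration, and a particular ntpd build may use different values.

```python
# Rough estimate of how long a server that has lost its real reference
# (and has no local clock driver) stays acceptable to its clients.
# Assumed constants, not taken from the thread:
PHI = 15e-6          # dispersion growth rate in s/s (15 ppm)
MAX_DISTANCE = 1.0   # distance threshold in seconds

def seconds_until_rejected(initial_distance=0.0):
    """Time for the accumulated dispersion to reach the threshold."""
    remaining = max(MAX_DISTANCE - initial_distance, 0.0)
    return remaining / PHI

hours = seconds_until_rejected() / 3600.0
print(f"rejected after roughly {hours:.1f} hours")
```

    Under these assumptions the answer comes out at a little under 19 hours,
    which is consistent with the estimate in the post.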

  3. Re: tinker step 0 (always slew) and kernel time discipline

    In article ,
    Richard B. Gilbert wrote:

    > The telephone companies tend to be very aware of time and timing. The
    > time division multiplexing of T1 and T3 lines requires splitting the
    > second very precisely. Cellular phones also require very precise


    That can be true of the public network, although even then it may
    be more so for the more engineering oriented layers, like bearer
    synchronisation, than the more commercial oriented layers, like
    call detail recording.

    However, it is very definitely not true of most PABX systems, which
    typically have wristwatch and eyeball set times and run in local
    time, with no automatic daylight saving switch, and have no high quality
    frequency standard.

    At least one PABX type consistently gets the start time wrong, even if
    its clock is set properly.

  4. Re: tinker step 0 (always slew) and kernel time discipline

    Message-ID:
    From: Joe Harvell

    > Documentation about the "prefer" keyword states that if the prefer

    * peer survives the selection algorithm, then it will always survive the
    * clustering algorithm. Additionally, if the prefer peer does survive,

    That should say "intersection", not "selection"; the clustering algorithm
    is part of the selection algorithm. However it does have the effect
    you are quoting.

    * then the combining algorithm is not used, and the "prefer" peer's clock
    * is supposed to be the only clock used when correcting the local clock.

    > So if the two servers have non-intersecting error bounds, then I would

    * expect both peers (including the "prefer" peer) to not survive the
    * selection algorithm, and the local clock should free run.

    Looking at the 4.2.0 code, I would agree. However, just after a step
    event, one server will become eligible before the other, so there will
    be a time when you only have one server to choose from.

    Also, 800 seconds in 22 days is close to 500ppm, which may mean that
    one of your servers is on its frequency correction end stop, in which
    case there is a 50% chance that you will be unable to track it purely
    by slewing.
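
    The arithmetic behind "close to 500ppm" is short enough to spell out. This
    is only an average over the whole period, using the figures quoted in the
    thread; the function name is made up for the example.

```python
# Average rate of divergence implied by the figures in the thread.
DIVERGENCE_S = 800.0     # maximum separation reported, in seconds
INTERVAL_DAYS = 22       # period over which it accumulated
FREQ_LIMIT_PPM = 500.0   # frequency correction limit cited in the post

def drift_ppm(divergence_s, days):
    """Average frequency error in parts per million."""
    return divergence_s / (days * 86400) * 1e6

rate = drift_ppm(DIVERGENCE_S, INTERVAL_DAYS)
print(f"average drift: {rate:.0f} ppm")
print(f"fraction of the {FREQ_LIMIT_PPM:.0f} ppm limit: {rate / FREQ_LIMIT_PPM:.2f}")
```

    The average works out to roughly 420 ppm. Since the divergence grew over
    time, the instantaneous rate could have sat at the limit for part of the
    period.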

    Generally, it would be much easier to work out exactly what was happening
    if we had the syslog entries and ntpq peer command results at frequent
    intervals. Doing ntpq rv on the client and the server associations might
    help even more.

    Some issues from other threads:

    The reason that 512ms is an overflow is that the kernel uses scaled
    integers, accepting measured offsets with a basic resolution of 1
    microsecond, but needs an additional 12 bits of precision to perform
    the filter calculations. One suspects that the microsecond resolution
    might have been compromised if this had not been possible.
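
    The overflow figure follows from the word size. A minimal sketch, assuming
    the kernel holds the scaled offset in a signed 32-bit integer (the post
    does not state the word size, so that part is an assumption):

```python
WORD_BITS = 32        # assumed: signed 32-bit integer
FRACTION_BITS = 12    # extra precision mentioned in the post

def max_offset_us(word_bits=WORD_BITS, fraction_bits=FRACTION_BITS):
    """Largest offset in microseconds that still fits after scaling."""
    return (2 ** (word_bits - 1) - 1) >> fraction_bits

limit = max_offset_us()
print(f"largest representable offset: {limit} us (~{limit / 1000:.0f} ms)")
```

    With those assumptions the ceiling is 2^19 microseconds, about 524 ms, so
    a 512 ms limit sits just under the point where the scaled value would
    overflow.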



  5. Re: tinker step 0 (always slew) and kernel time discipline

    Harlan,

    The patch does not affect the reported behavior.

    Dave

    Harlan Stenn wrote:
    > Joe,
    >
    > Dave's patch will be in the next ntp-dev tarball snapshot roll, which will
    > happen automatically in about 14 hours' time, and could happen sooner if I
    > "push".
    >
    > If you want to see the patch and apply it yourself, in a little while it
    > will be in both the bk-ntp-dev-send@ mailing list archives and in (what I
    > trust will be) the suitably-commented changeset visible in the ntp-dev tree
    > at http://ntp.bkbits.com .
    >
    > H


  6. Re: tinker step 0 (always slew) and kernel time discipline

    Joe,

    The intended application and behavior scenario of the prefer keyword
    are carefully explained on the "Mitigation Rules and the prefer Keyword"
    page in the documentation. Your expectation goes beyond the intent
    stated in that page.

    Dave

    Joe Harvell wrote:

    > Thanks for the information about fallback local reference clocks. This
    > will help me when I talk to whomever is responsible for the NTP
    > configuration in this lab.
    >
    >> Normally, I believe, if you have just two servers and they have non-
    >> intersecting error bounds, they will both be rejected and the system
    >> will free run. However, I think that prefer confuses the issue,
    >> by not allowing the preferred one to be discarded. I have a feeling
    >> this is actually done by saying that the system stops discarding when
    >> it would discard that one. I suppose that the other one could still
    >> be in contention at that point.
    >>

    >
    > I agree that the "prefer" keyword doesn't make sense in the case of an
    > NTP client without a physical clock.
    >
    > Documentation about the "prefer" keyword states that if the prefer peer
    > survives the selection algorithm, then it will always survive the
    > clustering algorithm. Additionally, if the prefer peer does survive,
    > then the combining algorithm is not used, and the "prefer" peer's clock
    > is supposed to be the only clock used when correcting the local clock.
    >
    > So if the two servers have non-intersecting error bounds, then I would
    > expect both peers (including the "prefer" peer) to not survive the
    > selection algorithm, and the local clock should free run.
    >
    > I wonder if whoever decided to use the "prefer" keyword thought it would
    > serve the same purpose as putting the fallback reference clocks at
    > different strata.
    >
    > I'm not sure I can readily reproduce the scenario without the "prefer"
    > keyword being used. But I can look into doing this if it would help.
    >
    >
    >> The expected behaviour is that this has happened because one is giving
    >> a false time and the other is giving UTC time. The remaining servers
    >> will also give UTC time, so the bad one will get voted out.
    >>
    >> I don't think prefer is intended to deal with broken clocks, only with
    >> more accurate ones.


  7. Re: tinker step 0 (always slew) and kernel time discipline

    David,

    The default behavior, against my better judgement, is to accept
    synchronization from a single survivor, which is the default "tos
    minsane" behavior. As it says in the comments in ntp_proto.c, careful
    operators will set that value to something higher, assuming at least
    that number of servers is available. The default is set at one, as most
    casual operators would scream a violation of the Principle of Least
    Astonishment if the daemon did not latch on to a single server. Das Buch
    contains an extensive discussion on these issues.

    Dave

    David Woolley wrote:

    > Message-ID:
    > From: Joe Harvell
    >
    >>Documentation about the "prefer" keyword states that if the prefer

    >
    > * peer survives the selection algorithm, then it will always survive the
    > * clustering algorithm. Additionally, if the prefer peer does survive,
    >
    > That should say "intersection", not "selection"; the clustering algorithm
    > is part of the selection algorithm. However it does have the effect
    > you are quoting.
    >
    > * then the combining algorithm is not used, and the "prefer" peer's clock
    > * is supposed to be the only clock used when correcting the local clock.
    >
    >
    >>So if the two servers have non-intersecting error bounds, then I would

    >
    > * expect both peers (including the "prefer" peer) to not survive the
    > * selection algorithm, and the local clock should free run.
    >
    > Looking at the 4.2.0 code, I would agree. However, just after a step
    > event, one server will become eligible before the other, so there will
    > be a time when you only have one server to choose from.
    >
    > Also, 800 seconds in 22 days is close to 500ppm, which may mean that
    > one of your servers is on its frequency correction end stop, in which
    > case there is a 50% chance that you will be unable to track it purely
    > by slewing.
    >
    > Generally, it would be much easier to work out exactly what was happening
    > if we had the syslog entries and ntpq peer command results at frequent
    > intervals. Doing ntpq rv on the client and the server associations might
    > help even more.
    >
    > Some issues from other threads:
    >
    > The reason that 512ms is an overflow is that the kernel uses scaled
    > integers, accepting measured offsets with a basic resolution of 1
    > microsecond, but needs an additional 12 bits of precision to perform
    > the filter calculations. One suspects that the microsecond resolution
    > might have been compromised if this had not been possible.
    >
    >


  8. Re: tinker step 0 (always slew) and kernel time discipline


    >> The telephone companies tend to be very aware of time and timing. The
    >> time division multiplexing of T1 and T3 lines requires splitting the
    >> second very precisely. Cellular phones also require very precise

    >
    >That can be true of the public network, although even then it may
    >be more so for the more engineering oriented layers, like bearer
    >synchronisation, than the more commercial oriented layers, like
    >call detail recording.
    >
    >However, it is very definitely not true of most PABX systems, which
    >typically have wristwatch and eyeball set times and run in local
    >time, with no automatic daylight saving switch, and have no high quality
    >frequency standard.


    There are two separate issues: time and frequency.

    Anybody know if I can get a good frequency off a DSL line?
    Assume I'm willing to hack a wire into my modem/router.

    If so, it might be a nice/cheap way to get a stable clock
    to use with a NTP box. (Handwave, PLLs and such. Not a hard
    problem.)

    --
    The suespammers.org mail server is located in California. So are all my
    other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
    commercial e-mail to my suespammers.org address or any of my other addresses.
    These are my opinions, not necessarily my employer's. I hate spam.


  9. Re: tinker step 0 (always slew) and kernel time discipline

    Hal Murray wrote:
    > Anybody know if I can get a good frequency off a DSL line?
    > Assume I'm willing to hack a wire into my modem/router.


    Since the advent of Ethernet or IP DSLAMs they are no longer necessarily
    synchronised with the rest of the network. Only a DSLAM which has an ATM
    (or in any case SDH/SONET encapsulated) uplink necessarily is.
    Ethernet/IP DSLAMs may still have BITS interfaces but their use would be
    optional (and needed only if this DSLAM is being used for things like
    circuit emulation on G.shdsl access lines).

    So: the clock frequency you would derive from a DSL line may or may not
    be synchronised with the (presumably high-quality) network clock. Unless
    you have detailed knowledge of your operator's network you cannot be
    certain.

    Regards, Jan

  10. Re: tinker step 0 (always slew) and kernel time discipline

    In article ,
    hmurray@suespammers.org (Hal Murray) wrote:

    > >> The telephone companies tend to be very aware of time and timing. The
    > >> time division multiplexing of T1 and T3 lines requires splitting the
    > >> second very precisely. Cellular phones also require very precise

    > >
    > >That can be true of the public network, although even then it may
    > >be more so for the more engineering oriented layers, like bearer
    > >synchronisation, than the more commercial oriented layers, like
    > >call detail recording.
    > >
    > >However, it is very definitely not true of most PABX systems, which
    > >typically have wristwatch and eyeball set times and run in local
    > >time, with no automatic daylight saving switch, and have no high quality
    > >frequency standard.

    >
    > There are two separate issues: time and frequency.
    >
    > Anybody know if I can get a good frequency off a DSL line?
    > Assume I'm willing to hack a wire into my modem/router.
    >
    > If so, it might be a nice/cheap way to get a stable clock
    > to use with a NTP box. (Handwave, PLLs and such. Not a hard
    > problem.)


    I assume that DSL follows a public standard, and that this standard will
    specify how good the clocks must be.

    For comparison, broadband cable modems follow DOCSIS 2.0, available from
    .

    My understanding is that both are timed protocols of some kind, but I
    don't know the details.

    Joe Gwinn

  11. Re: tinker step 0 (always slew) and kernel time discipline

    Joe,

    First of all, you misunderstand what the prefer keyword is for and what
    it is intended to do. It is not applicable to your scenario. As for
    jerking back and forth every 15 minutes, something is seriously broken
    with the hardware, either a stuck bit or kernel problem. Consider the
    step actions as a temporal canary. Considering the rather large number
    of servers around here and the national labs, if a step ever occurs, the
    hardware is to blame.

    Second, you apparently are using two servers that diverge widely about
    their times. The clients will be most confused as to which of the
    servers is trustable. This is not a step problem, it is a fatal
    condition for the applications. If the divergence is due to configuring
    both servers with the local clock driver, this violates the principle
    that all servers cling to the same timescale, UTC or synthetic. If you
    really need to have redundant servers that cling to the same synthetic
    timescale, configure both servers in orphan mode and symmetric active
    mode with each other. Do not use the local clock driver.

    A better choice is to have three servers configured as above. If one of
    them sails to the sunset, a majority clique is still possible. With only
    two servers, if one of them sails away, the clients cannot form a
    majority clique and will conclude neither of them is sane.
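
    The two-versus-three argument can be illustrated with a toy model. The
    sketch below is a simplified interval-overlap check, not ntpd's actual
    selection code: each server is reduced to an offset and an error bound,
    and the function name and sample values are invented for the example.

```python
def majority_clique(servers):
    """Return the names of the largest group of servers whose intervals
    share a common point, or [] when that group is not a strict majority.

    servers: dict mapping name -> (offset_s, error_bound_s)
    """
    intervals = {n: (o - e, o + e) for n, (o, e) in servers.items()}
    best = []
    # The largest overlapping group always contains some interval's lower
    # endpoint, so testing only those points is sufficient.
    for low, _ in intervals.values():
        group = [n for n, (lo, hi) in intervals.items() if lo <= low <= hi]
        if len(group) > len(best):
            best = group
    return sorted(best) if 2 * len(best) > len(servers) else []

# Hypothetical offsets: server "b" has sailed away by 400 s.
two = {"a": (0.0, 0.1), "b": (400.0, 0.1)}
three = {"a": (0.0, 0.1), "b": (400.0, 0.1), "c": (0.02, 0.1)}
print(majority_clique(two))
print(majority_clique(three))
```

    With two servers that disagree, neither forms a majority and the result is
    empty; adding a third server that agrees with one of them lets that pair
    outvote the stray.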

    Above all, if you are serious about the integrity of the time function
    and believe in Lamport's happens-before relation, as interpreted by NTP,
    take very seriously the topics discussed in the white papers linked from
    the NTP project page. Also, there should be no excuse for not detecting
    and responding to a scenario where servers can show serious disagreement
    without being reported to your beeper. That's how the NIST servers are
    monitored.

    Dave

    Joe Harvell wrote:
    > David L. Mills wrote:
    >
    >
    >>
    >> 5. If for some reason the server(s) are not reachable at startup and
    >> the applications must start, then I would assume the applications
    >> would fail, since the time is not synchronized. If the applications
    >> use the NTP system primitives, the synchronization condition is
    >> readily apparent in the return code. Since they can't run anyway,
    >> there is no harm in stepping the clock, no matter what the initial
    >> offset. Forcing a slew in this case would seem highly undesirable,
    >> unless the application can tolerate large differences between clocks
    >> and, in that case, using ntpd is probably a poor choice in the first
    >> place.
    >>

    >
    > I agree that the condition of no time servers reachable on startup is
    > the most common case where a large offset will eventually be observed.
    > I agree that the application should detect this and fail before starting
    > up. I am concerned about clock and network failure scenarios that cause
    > an NTP client to see two different NTP servers with very different times.
    >
    > This actually happened in a testbed for our application. NTP stats show
    > that over the course of 22 days, the offsets of two configured NTP
    > servers (both ours) serving one of our NTP clients started diverging up
    > to a maximum distance of 800 seconds. During this time, our NTP client
    > stepped its clock forward 940 times and backwards 803 times, with
    > increasing magnitudes up to ~400 seconds. The problem went away when
    > someone "added an IP address to the configuration of one of the NTP
    > servers." (I am still trying to determine exactly what happened). The
    > ntp.conf files of the NTP client, the stats, and a nice graph of the
    > offsets are at http://dingo.dogpad.net/ntpProblem/.
    >
    > I concede that only having 2 NTP servers for our host made this problem
    > more likely to occur. But considering the mayhem caused by jerking the
    > clock back and forth every 15 minutes for 22 days, I think it is worth
    > investigating whether to eliminate stepping altogether.
    >
    > I still don't understand why the clock was being stepped back and
    > forth. One of the NTP servers showed up with 80f4 (unreachable) status
    > every 15 minutes for the entire 22 days, but with 90f4 (reject) and 96f4
    > (sys.peer) in between. Oddly, this server was one of two servers, but
    > the *other* server was the preferred peer. I wonder why this peer would
    > ever be selected as the sys.peer since the prefer peer was only reported
    > unreachable 10 times over this 22 day period. Would this be because the
    > selection algorithm finds no intersection?
    >
    > Maybe the behavior I saw was a bug, and not the expected consequence of
    > a failure scenario in which 2 NTP servers have diverging clocks.
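
    The status words quoted above (80f4, 90f4, 96f4) can be pulled apart
    mechanically. The bit layout used below is an assumption based on the
    peer status word as commonly described for ntpq, and only the selection
    codes named in the post are given labels; check the ntpq documentation
    for the version in use before relying on it.

```python
# Bit layout assumed from ntpd's control-message peer status word.
CONFIGURED = 0x80   # association comes from the configuration file
REACHABLE = 0x10    # peer has answered recently
SELECT_NAMES = {0: "reject", 6: "sys.peer"}  # only the codes named in the post

def decode_peer_status(word):
    """Split a 16-bit peer status word into its fields."""
    high = (word >> 8) & 0xFF
    select = high & 0x07
    return {
        "configured": bool(high & CONFIGURED),
        "reachable": bool(high & REACHABLE),
        "select": SELECT_NAMES.get(select, f"code {select}"),
        "event_count": (word >> 4) & 0x0F,
        "event_code": word & 0x0F,
    }

for text in ("80f4", "90f4", "96f4"):
    print(text, decode_peer_status(int(text, 16)))
```

    Read this way, 80f4 is a configured peer with the reachable bit clear,
    90f4 is reachable but rejected, and 96f4 is reachable and selected as
    sys.peer, which matches the labels given in the post.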



  12. Re: tinker step 0 (always slew) and kernel time discipline

    Harlan,

    My original recommendation used a "disable kernel" as a very practical
    workaround, so the patch would not be needed. As always, the patch is
    documented in the source.

    Dave

    Harlan Stenn wrote:

    > Joe,
    >
    > Dave's patch will be in the next ntp-dev tarball snapshot roll, which will
    > happen automatically in about 14 hours' time, and could happen sooner if I
    > "push".
    >
    > If you want to see the patch and apply it yourself, in a little while it
    > will be in both the bk-ntp-dev-send@ mailing list archives and in (what I
    > trust will be) the suitably-commented changeset visible in the ntp-dev tree
    > at http://ntp.bkbits.com .
    >
    > H



  13. Mail from field tampered in: Re: tinker step 0 (always slew) and kernel time discipline

    Listkeepers,

    This and a previous message were sent by me from mills@udel.edu but
    apparently overwritten as user@domain.invalid. What's going on?

    Dave

    user@domain.invalid wrote:
    > Harlan,
    >
    > My original recommendation used a "disable kernel" as a very practical
    > workaround, so the patch would not be needed. As always, the patch is
    > documented in the source.
    >
    > Dave
    >
    > Harlan Stenn wrote:
    >
    >> Joe,
    >>
    >> Dave's patch will be in the next ntp-dev tarball snapshot roll, which
    >> will
    >> happen automatically in about 14 hours' time, and could happen sooner
    >> if I
    >> "push".
    >>
    >> If you want to see the patch and apply it yourself, in a little while it
    >> will be in both the bk-ntp-dev-send@ mailing list archives and in (what I
    >> trust will be) the suitably-commented changeset visible in the ntp-dev
    >> tree
    >> at http://ntp.bkbits.com .
    >>
    >> H

    >
    >


  14. Re: Mail from field tampered in: Re: tinker step 0 (always slew) and kernel time discipline

    Dave,

    This has been going on for quite a while with your messages. It looks
    like it is going out via the newsgroup so it must depend on your
    configuration on your end. How are you sending your messages and with
    what software? This message went out with your correct email. They are
    not always that way.

    Danny

    David L. Mills wrote:
    > Listkeepers,
    >
    > This and a previous message were sent by me from mills@udel.edu but
    > apparently overwritten as user@domain.invalid. What's going on?
    >
    > Dave
    >
    > user@domain.invalid wrote:
    >> Harlan,
    >>
    >> My original recommendation used a "disable kernel" as a very practical
    >> workaround, so the patch would not be needed. As always, the patch is
    >> documented in the source.
    >>
    >> Dave

    _______________________________________________
    questions mailing list
    questions@lists.ntp.isc.org
    https://lists.ntp.isc.org/mailman/listinfo/questions


  15. Re: Mail from field tampered in: Re: tinker step 0 (always slew) and kernel time discipline

    Danny,

    Thanks for the check. I see a message I sent to this list earlier today
    appears with the expected sender field and was sent using exactly the
    same programs and infrastructure. I've copied this message to my
    official mailbox so I can verify the headers. Right now I am suspecting
    tampering at the campus news.udel.edu NNTP server.

    Dave

    Danny Mayer wrote:
    > Dave,
    >
    > This has been going on for quite a while with your messages. It looks
    > like it is going out via the newsgroup so it must depend on your
    > configuration on your end. How are you sending your messages and with
    > what software? This message went out with your correct email. They are
    > not always that way.
    >
    > Danny
    >
    > David L. Mills wrote:
    >
    >>Listkeepers,
    >>
    >>This and a previous message were sent by me from mills@udel.edu but
    >>apparently overwritten as user@domain.invalid. What's going on?
    >>
    >>Dave
    >>
    >>user@domain.invalid wrote:
    >>
    >>>Harlan,
    >>>
    >>>My original recommendation used a "disable kernel" as a very practical
    >>>workaround, so the patch would not be needed. As always, the patch is
    >>>documented in the source.
    >>>
    >>>Dave



  16. Re: tinker step 0 (always slew) and kernel time discipline



    David Woolley wrote:
    > In article ,
    > Joe Harvell wrote:
    >
    >> This actually happened in a testbed for our application. NTP stats show

    > * that over the course of 22 days, the offsets of two configured NTP
    > * servers (both ours) serving one of our NTP clients started diverging
    > * up to a maximum distance of 800 seconds. During this time, our NTP
    >
    > This could only happen if either the implementation was broken, or
    > they were mis-using the local clock pseudo reference clock.


    David:

    I tracked down the configuration of the NTP servers 192.168.0.1 and 192.168.0.2. Their normal NTP configuration is shown below. The problem occurred during a system upgrade on 16 Aug when the ntp.conf file of 192.168.0.1 was accidentally truncated (empty). This was fixed on 8 Sep to the normal configuration shown below.

    I don't understand why sysmgr0 or sysmgr1 would ever look at the time from 192.168.0.1 since it should have shown it was unsynced. I suspect it has to do with the "prefer" keyword. Should I file a bug report?

    ==============================================================================
    192.168.0.1
    ==============================================================================

    # BEGIN NTP SERVERS
    server ntp-3
    server ntp-2
    server ntp
    # END NTP SERVERS

    # BEGIN NTP PEERS
    # END NTP PEERS

    # BEGIN NTP OPTIONS
    driftfile /var/opt/NTP/ntp.drift
    statsdir /var/opt/NTP/ntpstats
    filegen peerstats file peerstatistics_log type week enable
    # END NTP OPTIONS


    ==============================================================================
    192.168.0.2
    ==============================================================================

    # BEGIN NTP SERVERS
    server ntp-3
    server ntp-2
    server ntp
    # END NTP SERVERS

    # BEGIN NTP PEERS
    # END NTP PEERS

    # BEGIN NTP OPTIONS
    driftfile /var/opt/NTP/ntp.drift
    statsdir /var/opt/NTP/ntpstats
    filegen peerstats file peerstatistics_log type week enable
    # END NTP OPTIONS
    #

    > If the
    > servers were using a proper reference clock as their primary source,
    > root dispersion would have exceeded its maximum value when the
    > error was probably a lot less than a second and the servers would have
    > been rejected completely.
    >


  17. Re: tinker step 0 (always slew) and kernel time discipline

    Joe,

    I went thru the entire thread and I did not see an ntp.conf file that shows
    where the "prefer" keyword you mention below is used.

    I also notice your posted ntp.conf files do not use the iburst option, and
    that can be of great help at initial startup.

    H
    --
    >>> In article , Joe Harvell writes:


    Joe> I don't understand why sysmgr0 or sysmgr1 would ever look at the time
    Joe> from 192.168.0.1 since it should have shown it was unsynced. I suspect
    Joe> it has to do with the "prefer" keyword. Should I file a bug report?


  18. Re: tinker step 0 (always slew) and kernel time discipline

    Sorry. The details of the original problem (along with the ntp.conf
    files of sysmgr0 and sysmgr1) are in a previous thread. The subject of
    that thread is "2NTP Servers with diverging clocks and how to avoid
    stepping backwards in time (repost)"

    There is a link to the ntp.conf, peerstats, loopstats, etc. for sysmgr0
    and sysmgr1.

    Harlan Stenn wrote:
    > Joe,
    >
    > I went thru the entire thread and I did not see an ntp.conf file that shows
    > where the "prefer" keyword you mention below is used.
    >
    > I also notice your posted ntp.conf files do not use the iburst option, and
    > that can be of great help at initial startup.
    >
    > H
    > --
    >>>> In article , Joe Harvell writes:

    >
    > Joe> I don't understand why sysmgr0 or sysmgr1 would ever look at the time
    > Joe> from 192.168.0.1 since it should have shown it was unsynced. I suspect
    > Joe> it has to do with the "prefer" keyword. Should I file a bug report?
    >


  19. Re: tinker step 0 (always slew) and kernel time discipline

    Joe,

    I found the link.

    I assume you have seen the prefer.html man page.

    From what I can see, the 'prefer' keyword should not let you sync to an
    unsync'd peer.

    If you think that is happening, could you please also include the output of
    ntpq -p on the affected client machine?

    H

  20. Re: tinker step 0 (always slew) and kernel time discipline

    Harlan:

    Yes, that is what I think is happening.

    I will see if I can reproduce the problem so I can get ntpq -p output.
    I think the stats field from the peerstats files (already captured and
    available at the link) should have all that information, though.

    Harlan Stenn wrote:
    > Joe,
    >
    > I found the link.
    >
    > I assume you have seen the prefer.html man page.
    >
    > From what I can see, the 'prefer' keyword should not let you sync to an
    > unsync'd peer.
    >
    > If you think that is happening, could you please also include the output of
    > ntpq -p on the affected client machine?
    >
    > H

