Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync - NTP




  1. Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync

    I am currently using NTP 4.2.2p1.

    My problem is with the management of the "unsync" flag in the kernel.
    This is visible in the "status" line of:

    ntpdc> kerninfo
    pll offset: 4.7e-05 s
    pll frequency: -62.146 ppm
    maximum error: 16.384 s
    estimated error: 16.384 s
    status: 0041 pll unsync
    pll time constant: 2
    precision: 1e-06 s
    frequency tolerance: 512 ppm
    ntpdc>


    Older versions of NTP used to manage this flag. This is my understanding
    of how things used to work (based on observation, not understanding):

    * Kernel would bootup and by default the status would have the "unsync"
    bit set.
    * Then NTP would be started.
    * NTP would take a few minutes to obtain PLL lock with multiple time
    sources.
    * Then select a preferred source as candidate to configure the kernel.
    * NTP would then configure the kernel PLL to obtain convergence.
    *** Once convergence was complete the 0x40 UNSYNC bitwise flag would be
    reset in the kernel by NTP. ***
    * NTP would continue to monitor/manage/update the kernel PLL.

    It appears that somewhere between versions 4.2.0.a.20040617 and 4.2.2p1
    the penultimate item in the list above stopped happening.



    Question 1) Can someone confirm whether the "UNSYNC" status flag held inside
    the kernel is arbitrary, i.e. it's just an informational flag and is
    independent of the operation / function of NTP?


    Question 2) Am I correctly interpreting the purpose of the UNSYNC flag?
    I have a periodic script that checks whether adjtimex reports the
    nominal status of 0x01 (PLL), as opposed to anything else (for
    example 0x41 = PLL|UNSYNC). This used to provide me with a mechanism
    to alert me to problems with NTP configuration when a host became UNSYNC,
    so that as administrator I could investigate why it had become unsync.
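The status value printed by kerninfo and adjtimex is a bit mask, so 0x41 is simply 0x01 (PLL) plus 0x40 (UNSYNC). A minimal Python sketch of decoding it, assuming the STA_* bit values from the Linux `<linux/timex.h>` header (only the low eight bits are listed); `decode_status` and `is_nominal` are hypothetical helper names:

```python
from typing import List

# Low-order STA_* status bits as defined in <linux/timex.h>
STA_BITS = {
    0x0001: "PLL",       # STA_PLL: phase-locked loop enabled
    0x0002: "PPSFREQ",   # STA_PPSFREQ
    0x0004: "PPSTIME",   # STA_PPSTIME
    0x0008: "FLL",       # STA_FLL
    0x0010: "INS",       # STA_INS: insert leap second
    0x0020: "DEL",       # STA_DEL: delete leap second
    0x0040: "UNSYNC",    # STA_UNSYNC: clock unsynchronized
    0x0080: "FREQHOLD",  # STA_FREQHOLD
}
STA_PLL = 0x0001
STA_UNSYNC = 0x0040


def decode_status(status: int) -> List[str]:
    """Names of the known STA_* bits set in a status word, lowest bit first."""
    return [name for bit, name in sorted(STA_BITS.items()) if status & bit]


def is_nominal(status: int) -> bool:
    """PLL enabled and UNSYNC clear (hypothetical check for a monitoring script)."""
    return bool(status & STA_PLL) and not status & STA_UNSYNC


print(decode_status(0x41))  # ['PLL', 'UNSYNC']
```

`is_nominal` tests only the two bits that matter here, so it is slightly more tolerant than comparing the whole word with 0x01: it still passes if some other informational bit happens to be set.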


    Question 3) If NTP exits/crashes, does the kernel automatically re-arm
    the UNSYNC flag if the PLL data has not been updated within a specified
    period of time (say, within 3 minutes)? I.e. will the kernel fail safe
    back to UNSYNC if it can clearly observe that no application has called
    the appropriate NTP API to keep the UNSYNC status flag cleared? Is there
    a sort of watchdog that does the correct thing in the case of failure?


    When I googled this problem I found a suggestion that the "enable kernel"
    command can do the trick. I do use NTP keys between my external data
    sources, and when I tried this command in ntpdc it asked me for a key.

    ntpdc> enable kernel
    Keyid: 1
    MD5 Password:
    ***Permission denied
    ntpdc>


    Both systems run ntp as non-root, and both have the appropriate Linux kernel capability bit CAP_SYS_TIME set:

    # cat /proc/3268/status
    CapInh: 0000000002000000
    CapPrm: 0000000002000000
    CapEff: 0000000002000000

    I guess that in order to configure the PLL, ntp is going to need that capability anyway.
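The capability masks in /proc/<pid>/status are hexadecimal bit sets. A short sketch, assuming CAP_SYS_TIME is capability number 25 as in `<linux/capability.h>`, confirms that 0x02000000 is exactly that one bit; `has_capability` and `capabilities_from_status` are hypothetical helpers:

```python
CAP_SYS_TIME = 25  # capability number from <linux/capability.h>


def has_capability(mask_hex: str, cap: int) -> bool:
    """Test one capability bit in a CapEff/CapPrm/CapInh hex mask."""
    return bool((int(mask_hex, 16) >> cap) & 1)


def capabilities_from_status(text: str) -> dict:
    """Collect the Cap* lines of /proc/<pid>/status into {name: hex mask}."""
    caps = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.startswith("Cap"):
            caps[name] = value.strip()
    return caps


caps = capabilities_from_status(
    "CapInh: 0000000002000000\nCapPrm: 0000000002000000\nCapEff: 0000000002000000"
)
print(has_capability(caps["CapEff"], CAP_SYS_TIME))  # True: 0x02000000 == 1 << 25
```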



    So I'm at a bit of a loss as to what causes the UNSYNC flag to stick
    long after both NTP and the kernel have obtained a good enough PLL sync to
    believe they are "in-step".


    Thanks,

    Darryl

  2. Re: Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync

    darryl-mailinglists@netbauds.net (Darryl Miles) writes:

    >I am currently using NTP 4.2.2p1.


    >My problem is with the management of the "unsync" flag in the kernel.
    >This is visible in the "status" line of:


    The kernel manages that flag. It has problems, for example the 11-minute
    write-to-RTC, which can mess up any attempt to maintain RTC statistics and
    drift. I.e. it is better to have it off.

    >ntpdc> kerninfo
    >pll offset: 4.7e-05 s
    >pll frequency: -62.146 ppm
    >maximum error: 16.384 s
    >estimated error: 16.384 s
    >status: 0041 pll unsync
    >pll time constant: 2
    >precision: 1e-06 s
    >frequency tolerance: 512 ppm
    >ntpdc>



    >Older versions of NTP used to manage this flag. This is my understanding
    >of how things used to work (based on observation, not understanding):


    > * Kernel would bootup and by default the status would have the "unsync"
    >bit set.
    > * Then NTP would be started.
    > * NTP would take a few minutes to obtain PLL lock with multiple time
    >sources.
    > * Then select a preferred source as candidate to configure the kernel.
    > * NTP would then configure the kernel PLL to obtain convergence.
    > *** Once convergence was complete the 0x40 UNSYNC bitwise flag would be
    >reset in the kernel by NTP. ***


    ntp has nothing to do with the kernel. The kernel is Linus Torvalds'
    business, not David Mills' (much to the latter's annoyance when the kernel
    people mess up the kernel timekeeping).

    As far as I know, the only purpose of that flag is to turn on the 11-minute RTC
    procedure (every 11 minutes the kernel resets the RTC to the current system
    time), which is a very inaccurate procedure.


    > * NTP would continue to monitor/manage/update the kernel PLL.


    >It appears that somewhere between versions 4.2.0.a.20040617 and 4.2.2p1
    >the penultimate item in the list above stopped happening.




    >Question 1) Can someone confirm whether the "UNSYNC" status flag held inside
    >the kernel is arbitrary, i.e. it's just an informational flag and is
    >independent of the operation / function of NTP?



    >Question 2) Am I correctly interpreting the purpose of the UNSYNC flag?
    >I have a periodic script that checks whether adjtimex reports the
    >nominal status of 0x01 (PLL), as opposed to anything else (for
    >example 0x41 = PLL|UNSYNC). This used to provide me with a mechanism
    >to alert me to problems with NTP configuration when a host became UNSYNC,
    >so that as administrator I could investigate why it had become unsync.


    A far better idea is to monitor the offset from the ntp servers to let you
    know if there is a clock problem.



    >Question 3) If NTP exits/crashes, does the kernel automatically re-arm
    >the UNSYNC flag if the PLL data has not been updated within a specified
    >period of time (say, within 3 minutes)? I.e. will the kernel fail safe
    >back to UNSYNC if it can clearly observe that no application has called
    >the appropriate NTP API to keep the UNSYNC status flag cleared? Is there
    >a sort of watchdog that does the correct thing in the case of failure?


    I do not think so.



    >When I googled this problem I found a suggestion that the "enable kernel"
    >command can do the trick. I do use NTP keys between my external data
    >sources, and when I tried this command in ntpdc it asked me for a key.


    >ntpdc> enable kernel
    >Keyid: 1
    >MD5 Password:
    >***Permission denied
    >ntpdc>



    >Both systems run ntp as non-root, and both have the appropriate Linux kernel capability bit CAP_SYS_TIME set:


    ># cat /proc/3268/status
    >CapInh: 0000000002000000
    >CapPrm: 0000000002000000
    >CapEff: 0000000002000000


    >I guess that in order to configure the PLL, ntp is going to need that capability anyway.




    >So I'm at a bit of a loss as to what causes the UNSYNC flag to stick
    >long after both NTP and the kernel have obtained a good enough PLL sync to
    >believe they are "in-step".


    Leave it unsynced. It serves no useful purpose AFAIK. hwclock is a much better
    tool for setting the RTC, and does a much better job of it (including
    determining the drift of the RTC and compensating for it).




    >Thanks,


    >Darryl


  3. Re: Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync

    Unruh wrote:
    > darryl-mailinglists@netbauds.net (Darryl Miles) writes:
    >
    >> I am currently using NTP 4.2.2p1.


    Rather old.

    >
    >> My problem is with the management of the "unsync" flag in the kernel.
    >> This is visible in the "status" line of:

    >
    > The kernel manages that flag. It has problems, for example the 11-minute
    > write-to-RTC, which can mess up any attempt to maintain RTC statistics and
    > drift. I.e. it is better to have it off.


    ntpd manages that flag. The kernel only changes it on a manual time set
    or when the estimated error overflows.
    >
    >> ntpdc> kerninfo
    >> pll offset: 4.7e-05 s
    >> pll frequency: -62.146 ppm
    >> maximum error: 16.384 s
    >> estimated error: 16.384 s
    >> status: 0041 pll unsync
    >> pll time constant: 2
    >> precision: 1e-06 s
    >> frequency tolerance: 512 ppm
    >> ntpdc>

    >
    >
    >> Older versions of NTP used to manage this flag. This is my understanding
    >> of how things used to work (based on observation, not understanding):

    >
    >> * Kernel would bootup and by default the status would have the "unsync"
    >> bit set.


    Status is preset to just UNSYNC.

    >> * Then NTP would be started.
    >> * NTP would take a few minutes to obtain PLL lock with multiple time
    >> sources.


    ntpd obtains PLL lock with a mix of all the good time sources, unless
    you force it to use one. It chooses the reference source long before it
    has tight PLL lock.

    >> * Then select a preferred source as candidate to configure the kernel.


    ntpd's source selection doesn't affect what is fed to the kernel.
    Unless one specifically inhibits it, that is a mix of all the valid time
    sources.

    >> * NTP would then configure the kernel PLL to obtain convergence.
    >> *** Once convergence was complete the 0x40 UNSYNC bitwise flag would be
    >> reset in the kernel by NTP. ***


    ntpd releases UNSYNC as soon as it starts disciplining the kernel, which
    is long before the PLL has stabilised.

    >
    > ntp has nothing to do with the kernel. The kernel is Linus Torvalds'
    > business, not David Mills' (much to the latter's annoyance when the kernel
    > people mess up the kernel timekeeping).


    This part of the kernel was contributed by the NTP project! The UNSYNC
    flag is not masked out in the API, so is controlled by ntpd.

    >
    > As far as I know, the only purpose of that flag is to turn on the 11-minute RTC
    > procedure (every 11 minutes the kernel resets the RTC to the current system
    > time), which is a very inaccurate procedure.


    It might not be optimal, but it does take steps to improve accuracy.
    >
    >
    >> * NTP would continue to monitor/manage/update the kernel PLL.

    >
    >> It appears that somewhere between versions 4.2.0.a.20040617 and 4.2.2p1
    >> the penultimate item in the list above stopped happening.

    >
    >
    >
    >> Question 1) Can someone confirm whether the "UNSYNC" status flag held inside
    >> the kernel is arbitrary, i.e. it's just an informational flag and is
    >> independent of the operation / function of NTP?


    As noted, I believe it controls setting of the RTC clock every
    11 minutes. It doesn't look like it affects the behaviour of the kernel
    discipline. But then you could have looked at the code, the same as I did.
    >
    >
    >> Question 2) Am I correctly interpreting the purpose of the UNSYNC flag?
    >> I have a periodic script that checks whether adjtimex reports the
    >> nominal status of 0x01 (PLL), as opposed to anything else (for
    >> example 0x41 = PLL|UNSYNC). This used to provide me with a mechanism
    >> to alert me to problems with NTP configuration when a host became UNSYNC,
    >> so that as administrator I could investigate why it had become unsync.

    >
    > A far better idea is to monitor the offset from the ntp servers to let you
    > know if there is a clock problem.


    UNSYNC will tell you when you haven't had updates.
    >
    >
    >
    >> Question 3) If NTP exits/crashes, does the kernel automatically re-arm
    >> the UNSYNC flag if the PLL data has not been updated within a specified
    >> period of time (say, within 3 minutes)? I.e. will the kernel fail safe
    >> back to UNSYNC if it can clearly observe that no application has called
    >> the appropriate NTP API to keep the UNSYNC status flag cleared? Is there
    >> a sort of watchdog that does the correct thing in the case of failure?


    UNSYNC gets set (2.4 kernel) when you manually set the time or the
    estimated error overflows. In your case, the estimated error is at the
    end stop. That could be cause or effect, as setting the time manually
    also forces the error to maximum.
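Taken together with Mills's later remark that the kernel increases the error when the daemon stops updating it, this behaves like a slow watchdog. The sketch below models it, assuming (as the 16.384 s end stop and the 512 ppm frequency tolerance in the kerninfo output suggest) that the kernel adds the tolerance, 512 µs, to the maximum error once per second and sets UNSYNC when the end stop is exceeded. This is a model, not kernel code; the exact behaviour differs between kernel versions.

```python
from typing import Tuple

STA_PLL = 0x0001
STA_UNSYNC = 0x0040
TOLERANCE_US = 512            # assumed: 512 ppm tolerance -> 512 us added per second
PHASE_LIMIT_US = 16_384_000   # the 16.384 s end stop seen in kerninfo


def one_second(maxerror_us: int, status: int) -> Tuple[int, int]:
    """Model one second with no update from ntpd (assumed kernel behaviour)."""
    maxerror_us += TOLERANCE_US
    if maxerror_us > PHASE_LIMIT_US:
        maxerror_us = PHASE_LIMIT_US   # clamp at the end stop...
        status |= STA_UNSYNC           # ...and flag the clock unsynchronized
    return maxerror_us, status


def seconds_until_unsync(maxerror_us: int, status: int = STA_PLL) -> int:
    """Count modelled seconds until UNSYNC would be set."""
    seconds = 0
    while not status & STA_UNSYNC:
        maxerror_us, status = one_second(maxerror_us, status)
        seconds += 1
    return seconds


print(seconds_until_unsync(0))  # 32001
```

Starting from zero, the end stop is crossed after 16.384 s / 512 µs per second = 32000 s plus one tick, a little under nine hours. If this model holds, the answer to Question 3 is yes, but on a scale of hours rather than minutes.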
    >
    >> When I googled this problem I found a suggestion that the "enable kernel"
    >> command can do the trick. I do use NTP keys between my external data
    >> sources, and when I tried this command in ntpdc it asked me for a key.


    The kernel discipline is enabled by default, provided that you don't
    have configuration parameters that are incompatible with it. If you did have
    incompatible parameters, I don't believe you would get a PLL state.

    >
    >> ntpdc> enable kernel
    >> Keyid: 1
    >> MD5 Password:
    >> ***Permission denied
    >> ntpdc>

    >
    >
    >> Both systems run ntp as non-root, and both have the appropriate Linux kernel capability bit CAP_SYS_TIME set:


    I'm not sure if that mode is supported in the official code.

    >
    >> # cat /proc/3268/status
    >> CapInh: 0000000002000000
    >> CapPrm: 0000000002000000
    >> CapEff: 0000000002000000

    >
    >> I guess that in order to configure the PLL, ntp is going to need that capability anyway.

    >
    >
    >
    >> So I'm at a bit of a loss as to what causes the UNSYNC flag to stick
    >> long after both NTP and the kernel have obtained a good enough PLL sync to
    >> believe they are "in-step".

    >
    > Leave it unsynced. It serves no useful purpose AFAIK. hwclock is a much better
    > tool for setting the RTC, and does a much better job of it (including
    > determining the drift of the RTC and compensating for it).


    Being unsynced indicates a problem. The end stop estimated errors also
    indicate a problem. If you don't want the 11 minute mode, build a
    kernel without it.

  4. Re: Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync

    Darryl,

    It doesn't make sense to manage that bit. Use the maximum error
    statistic instead.

    From the time the client is first started until it sets the clock, this
    statistic will be large (~16 s), as it is in your example. Once the clock
    is set, this statistic is set to the synchronization distance determined
    by the daemon.

    If the daemon crashes or loses all sources, the kernel will increase the
    distance as required by the specification. Application programs can
    establish their own bound (~1 s) above which they consider the clock
    unsynchronized. The problem with managing the bit is that the kernel
    doesn't know your particular bound.
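A minimal sketch of that approach, assuming the `name: value` layout of the ntpdc kerninfo output quoted in this thread and a hypothetical application bound of 1 s: parse the output and compare "maximum error" against the bound. The helper names are illustrative, not part of any NTP tool.

```python
from typing import Dict


def parse_kerninfo(text: str) -> Dict[str, str]:
    """Split 'name: value' lines of ntpdc kerninfo output into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the "ntpdc>" prompt
            fields[name.strip()] = value.strip()
    return fields


def max_error_seconds(fields: Dict[str, str]) -> float:
    """Turn a value such as '16.384 s' into 16.384."""
    return float(fields["maximum error"].split()[0])


def clock_within_bound(fields: Dict[str, str], bound_s: float = 1.0) -> bool:
    """True if the kernel's maximum error is below the application's own bound."""
    return max_error_seconds(fields) < bound_s
```

For the output in the original post the maximum error is 16.384 s, so `clock_within_bound` returns False for any bound below that, which matches the alarm state discussed in the replies.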

    Dave



  5. Re: Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync

    David L. Mills wrote:
    >
    > It doesn't make sense to manage that bit. Use the maximum error
    > statistic instead.


    Whilst that is a good suggestion, those statistics are also showing an
    alarm state in this case.

  6. Re: Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync

    Thanks for your replies.


    Unruh wrote:
    > A far better idea is to monitor the offset from the ntp servers to let you
    > know if there is a clock problem.


    I'd appreciate a tool for that. "/usr/sbin/ntpdc -check 0.0000:0000:0000
    -print" that takes various parameters for your acceptable accuracy and
    returns a zero/non-zero exit status. That might also dump data like
    adjtimex -print and indicate items of concern to the administrator.

    The params 0.0000:0000:0000 would be some description of acceptable
    accuracy in terms of offset/error/whatever ntp groks. I
    wouldn't know what to put!


    > Leave it unsynced. It serves no useful purpose AFAIK. hwclock is a much better
    > tool for setting the RTC, and does a much better job of it (including
    > determining the drift of the RTC and compensating for it).


    On a personal observation note, I'm not sure I agree that the hwclock/drift
    file is even good for managing the hardware RTC. While the machine is
    switched on we have NTP; while the machine is switched off the
    internal/component temperature is vastly different, so any drift
    estimate maintained over time whilst powered up might not be in the
    right ballpark. You do maintain different drift data for powered-up
    and powered-down periods, don't you?



    David Woolley wrote:
    > Being unsynced indicates a problem. The end stop estimated errors also
    > indicate a problem. If you don't want the 11 minute mode, build a
    > kernel without it.


    Ah ha, now I see. Yes, the maximum error / estimated error of my
    systems does appear to be at a 16bit unsigned integer endstop:

    >>> ntpdc> kerninfo
    >>> pll offset: 4.7e-05 s
    >>> pll frequency: -62.146 ppm
    >>> maximum error: 16.384 s
    >>> estimated error: 16.384 s
    >>> status: 0041 pll unsync
    >>> pll time constant: 2
    >>> precision: 1e-06 s
    >>> frequency tolerance: 512 ppm
    >>> ntpdc>



    The above data is for a running system that (as far as I can tell) has
    plenty of reachability to a diverse set of systems and is in step.

    ntpdc> peers
    remote           local            st poll reach   delay    offset    disp
    =========================================================================
    =80.85.129.25    xxx.yy.0.137      3 1024    77 0.01630  0.000452 0.28458
    *82.133.58.132   xxx.yy.0.137      2 1024   377 0.02780 -0.001716 0.13663
    =127.127.1.0     127.0.0.1        10   64   377 0.00000  0.000000 0.03059
    -xxx.yy.0.191    xxx.yy.0.137      2 1024   376 0.00435  0.001452 0.16240
    =86.59.99.138    xxx.yy.0.137      3 1024   377 0.03571  0.001900 0.12178
    =83.170.75.28    xxx.yy.0.137      3 1024   377 0.00484 -0.000715 0.13660
    +xxx.yy.0.240    xxx.yy.0.137      3 1024   357 0.00031  0.006866 0.14854
    ^xxx.yy.0.255    xxx.yy.0.137     16   64     0 0.00000  0.000000 4.00000  (BROADCAST addr)
    +zz.xxx.83.153   xxx.yy.0.137     16 1024     0 0.00000  0.000000 3.99217
    =84.54.128.8     xxx.yy.0.137      2 1024   377 0.06641  0.002147 0.12181
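The reach column is an 8-bit shift register printed in octal: each poll shifts it left by one and a response sets the lowest bit, so the lowest bit is the most recent poll. A small sketch for reading the values in the table (helper names are hypothetical):

```python
def reach_successes(reach_octal: str) -> int:
    """Number of responses among the last eight polls (reach is printed in octal)."""
    return bin(int(reach_octal, 8) & 0xFF).count("1")


def last_poll_ok(reach_octal: str) -> bool:
    """The lowest bit records the most recent poll."""
    return bool(int(reach_octal, 8) & 1)


for reach in ("377", "376", "357", "77", "0"):
    print(reach, reach_successes(reach), last_poll_ok(reach))
```

For the table above this gives 8, 7, 7, 6 and 0 successes for reach values 377, 376, 357, 77 and 0; only 376 and 0 have a failed most recent poll.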

    ntpdc> sysinfo
    system peer: unused.foobar.com
    system peer mode: sym_passive
    leap indicator: 00
    stratum: 3
    precision: -20
    root distance: 0.05646 s
    root dispersion: 0.05922 s
    reference ID: [xx.yy.0.191]
    reference time: cc5effc0.8730a3ef Tue, Aug 26 2008 23:18:40.528
    system flags: auth monitor ntp kernel stats
    jitter: 0.000320 s
    stability: 0.000 ppm
    broadcastdelay: 0.003998 s
    authdelay: 0.000003 s

    # uname -a
    Linux host1.foobar.com 2.6.18-53.1.21.el5xen #1 SMP Tue May 20 10:03:27 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux


    So why might the kernel maximum/estimated error be at the end stop?





    David L. Mills wrote:
    > From the time the client is first started until it sets the clock, this
    > statistic will be large (~16 s), as it is in your example. Once the clock
    > is set, this statistic is set to the synchronization distance determined
    > by the daemon.


    Right so that is what is meant to happen, but it is not taking place for me.


    > If the daemon crashes or loses all sources, the kernel will increase the
    > distance as required by the specification. Application programs can
    > establish their own bound (~1 s) above which they consider the clock
    > unsynchronized. The problem with managing the bit is that the kernel
    > doesn't know your particular bound.


    Which agrees with my suggestion of some parameters for "ntpdc" to allow
    configuration of my bounds with an accuracy check.


    Darryl

  7. Re: Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync

    David,

    The bit is never set, so the system calls never show error.

    Dave

    David Woolley wrote:
    > David L. Mills wrote:
    >
    >>
    >> It doesn't make sense to manage that bit. Use the maximum error
    >> statistic instead.

    >
    >
    > Whilst that is a good suggestion, those statistics are also showing an
    > alarm state in this case.


  8. Re: Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync

    David L. Mills wrote:
    > David,
    >
    > The bit is never set, so the system calls never show error.


    That conflicts with the evidence presented by the questioner. I think
    it is true that ntpd never sets it in the kernel (although 4.2.4p4, which
    is more recent than his, does set it in the user space copy). However,
    the kernel does set it, as I already noted, on startup, when the time is
    set manually, and when the estimated error hits its end stop.

    However, that is largely irrelevant, as one could rephrase the question
    as: earlier versions of ntpd used to set the estimated error to some
    low value when started, so why is his version leaving it at 16+ seconds?

    (I suspect user error.)

    > David Woolley wrote:
    >> David L. Mills wrote:
    >>
    >>>
    >>> It doesn't make sense to manage that bit. Use the maximum error
    >>> statistic instead.

    >>
    >>
    >> Whilst that is a good suggestion, those statistics are also showing
    >> an alarm state in this case.


  9. Re: Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync

    David,

    The NTP development version on the web (p125) does not set the
    STA_UNSYNC bit anywhere. A grep for this bit shows only legacy means for
    ntpdc to clear it. While the production version on the web is dated one
    day before the development version, its ntp_loopfilter.c file is dated
    February 2007 and does set it.

    Unfortunately, the production version and stable version are on two
    different tracks and with different heritage of individual modules. I
    would hope that a release would have been synchronized to the
    development version of the same date, but this is not
    the case. Accordingly, you can't believe anything I say, nor can I fix
    anything you report, unless you are using a relatively recent
    development version. This holds true for all presumed features, bugs and
    documentation.

    As some of you know, I have been working full time since June 2007
    cleaning up the code, aligning to the NTPv4 specification, adding new
    features and rewriting much of the web documentation. The core protocol
    modules in the production version date from late 2006 and early 2007, so
    most of the work reported to this list and the hackers list is not in
    the production version. So, if you suspect I have done something evil
    and are using the production version, I can't help you.

    Dave



  10. Re: Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync

    On 2008-08-27, David L. Mills wrote:

    > David Woolley wrote:
    >
    >> David L. Mills wrote:
    >>
    >>> The bit is never set, so the system calls never show error.

    >>
    >> That conflicts with the evidence presented by the questioner. I think
    >> it is true that ntpd never sets it in the kernel (although 4.2.4p4, which
    >> is more recent than his, does set it in the user space copy).

    >
    > The NTP development version on the web (p125) does not set the
    > STA_UNSYNC bit anywhere. A grep for this bit shows only legacy means for
    > ntpdc to clear it. While the production version on the web is dated one
    > day before the development version, its ntp_loopfilter.c file is dated
    > February 2007 and does set it.


    [snip]

    > As some of you know, I have been working full time since June 2007
    > cleaning up the code, aligning to the NTPv4 specification, adding new
    > features and rewriting much of the web documentation. The core protocol
    > modules in the production version date from late 2006 and early 2007, so
    > most of the work reported to this list and the hackers list is not in
    > the production version. So, if you suspect I have done something evil
    > and are using the production version, I can't help you.


    Both the NTP-stable and NTP-Dev releases are given equal billing on the
    NTP Project download page (http://www.ntp.org/downloads.html) and the
    NTP Public Services Project download page
    (http://support.ntp.org/download).

    All NTP releases are announced in a variety of ways which are detailed
    at http://support.ntp.org/bin/view/Main...eNotifications. We do
    not yet offer a dedicated release announcements mailing list but would
    consider doing so if there was sufficient interest.

    Notifications for every NTP-dev release are sent to the hackers@ mailing
    list at the time of the release. These notifications contain change
    information as well as download links.

    We have made an effort to ensure that all releases are well publicized.
    But we can't control what version of NTP is shipped with the many OSes
    which are out there. Plus, quite a few people prefer to stick with the
    software versions which are pre-packaged for and shipped with their
    particular OS.

    For BSD users FreshPorts currently lists ntp-devel 4.2.5p122
    and ntp 4.2.4p4 at http://www.freshports.org/net/ntp-devel/ and
    http://www.freshports.org/net/ntp/ respectively.

    Debian does not ship any ntp-dev packages so I have set up a build
    system for Debian packages of ntp-dev (against the current Debian
    stable release "etch" on x86). These packages are available from
    http://packages.ntp.org/debian.

    If there is sufficient interest "we" could look at leveraging
    the openSUSE Build Service, https://build.opensuse.org/, for
    building packages for other Linux OSes and architectures. A list
    of the supported RPM-based OSes is available about half-way down
    http://en.opensuse.org/Build_Service...package_how_to.
    I may need assistance from someone well versed in building RPM packages
    and setting up RPM archives.

    --
    Steve Kostecke
    NTP Public Services Project - http://support.ntp.org/

  11. Re: Linux NTP Kernel unsync flag remains long after NTP & Kernel have PLL sync

    On 2008-08-26, Darryl Miles wrote:

    > Unruh wrote:
    >
    > > A far better idea is to monitor the offset from the ntp servers to
    > > let you know if there is a clock problem.

    >
    > I'd appreciate a tool for that. "/usr/sbin/ntpdc -check
    > 0.0000:0000:0000 -print"


    ntpq is the preferred monitoring tool.

    ntpq -c"rv 0 offset" will tell you the current offset of your ntpd.

    > that takes various parameters for your acceptable accuracy and returns
    > with zero/non-zero exit status. That might also dump data like
    > adjtimex -print and indicate items of concern to the administrator.


    Collecting information from all those sources is the job for a script.
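A checker like the one Darryl describes could be as small as the Python sketch below. It makes several assumptions. It expects ntpq "rv 0 offset" output as its input. It treats a leap indicator of 3 as "not synchronised"; this is the top two bits of the status word, which ntpq prints as sync_alarm. It compares |offset| in milliseconds against a bound supplied by the caller. The script name and the bound are hypothetical, and this is not a tool that ships with NTP.

```python
import re
import sys


def check(rv_text: str, max_offset_ms: float) -> int:
    """Return 0 if ntpd looks synchronised within max_offset_ms, else 1.

    rv_text is assumed to be ntpq "rv 0 offset" output: a line with
    status=XXXX followed by offset=N (milliseconds).
    """
    pairs = dict(re.findall(r"(\w+)=([^,\s]+)", rv_text))
    status = int(pairs["status"], 16)   # system status word, printed in hex
    leap = (status >> 14) & 0x3          # leap indicator; 3 is shown as sync_alarm
    offset = float(pairs["offset"])      # ntpq reports offset in milliseconds
    if leap == 3:
        print("ERROR: ntpd is not synchronised (sync_alarm)")
        return 1
    if abs(offset) > max_offset_ms:
        print(f"ERROR: offset {offset} ms is outside +/-{max_offset_ms} ms")
        return 1
    print(f"OK: offset {offset} ms")
    return 0


def main() -> None:
    # Hypothetical entry point: ntpq -c "rv 0 offset" | python3 check_ntp.py 5
    sys.exit(check(sys.stdin.read(), float(sys.argv[1])))
```

If main() is wired up as the script's entry point, the pipeline in its comment exits with status 0 only when ntpd is synchronised and within 5 ms.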

    --
    Steve Kostecke
    NTP Public Services Project - http://support.ntp.org/

  12. Re: Linux NTP Kernel unsync flag remains long after NTP&Kernel have PPL sync

    Steve Kostecke wrote:
    > On 2008-08-26, Darryl Miles wrote:
    >
    >> Unruh wrote:
    >>
    >>> A far better idea is to monitor the offset from the ntp servers to
    >>> let you know if there is a clock problem.

    >> I'd appreciate a tool for that. "/usr/sbin/ntpdc -check
    >> 0.0000:0000:0000 -print"

    >
    > ntpq is the preferred monitoring tool.
    >
    > ntpq -c"rv 0 offset" will tell you the current offset of your ntpd.


    So it is the "offset" that I should look at! Thanks, I wasn't sure.

    I guess an offset of 0.0000 is perfect ?

    Now how do I tell the difference between an offset being reported as
    0.0000 due to no sync and an offset being reported as 0.0000 due to a
    perfect sync ?

    I'm trying to establish that whoever creates such a
    tool/script/whatever, accepting my simple bound requirements, has
    taken into account all the failure scenarios that I, or the community,
    can think of.

    Then make it really easy for them to ask the NTP sub-system to report on
    its well-being with respect to its primary function.


    How about implementing a new ntpq/ntpdc command, "summary", with simple
    "key: value" output, simple "WARNING:", "ERROR:" and "FATAL:" reporting
    of concerns, and a final "Overall Status: GOOD" line?

    Another new command could verify/check accuracy against a bounds
    specification and, again, report what is inside and what is outside that bound.
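No such "summary" command exists in ntpq or ntpdc. The Python sketch below only illustrates the proposed output format. Its inputs are assumptions: the leap indicator, the offset in milliseconds, and the kernel status word. The warning and error thresholds are also assumptions. The 0x0040 bit it tests is STA_UNSYNC from Linux's <linux/timex.h>.

```python
def summary(leap: int, offset_ms: float, kernel_status: int,
            warn_ms: float = 1.0, error_ms: float = 10.0) -> str:
    """Build a 'key: value' report in the style proposed above.

    The thresholds are hypothetical defaults; nothing here is an
    existing ntpq/ntpdc feature.
    """
    lines = [f"leap: {leap}",
             f"offset_ms: {offset_ms}",
             f"kernel_status: {kernel_status:04x}"]
    overall = "GOOD"
    if leap == 3:                                  # sync_alarm
        lines.append("ERROR: ntpd is not synchronised (sync_alarm)")
        overall = "BAD"
    if abs(offset_ms) > error_ms:
        lines.append(f"ERROR: offset exceeds {error_ms} ms")
        overall = "BAD"
    elif abs(offset_ms) > warn_ms:
        lines.append(f"WARNING: offset exceeds {warn_ms} ms")
        if overall == "GOOD":
            overall = "WARNING"
    if kernel_status & 0x0040:                     # STA_UNSYNC
        lines.append("WARNING: kernel STA_UNSYNC flag is set")
        if overall == "GOOD":
            overall = "WARNING"
    lines.append(f"Overall Status: {overall}")
    return "\n".join(lines)
```

For example, `summary(0, 0.148, 0x0041)` ends with the lines `WARNING: kernel STA_UNSYNC flag is set` and `Overall Status: WARNING`.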


    Then all that is left is to publish a paragraph in a man page with a
    few examples of possible "bound requirement specifications" and what
    they might mean to a system in real life.

    A system admin wanting to monitor their NTP and kernel clock state
    (as judged by NTP) would then only need to consult the documentation
    and copy'n'paste from an example. That would be ideal.


    >> that takes various parameters for your acceptable accuracy and returns
    >> with zero/non-zero exit status. That might also dump data like
    >> adjtimex -print and indicate items of concern to the administrator.

    >
    > Collecting information from all those sources is the job for a script.


    No problem with the mechanism to do it, but it's a job for an NTP
    grokker, and maybe something to be shipped as part of the NTP suite,
    i.e. not something a system administrator wants to make up on the spot
    and so easily get wrong.


    Darryl

  13. Re: Linux NTP Kernel unsync flag remains long after NTP&Kernel have PPL sync

    Darryl Miles wrote:
    >I guess an offset of 0.0000 is perfect ?


    Yes.

    >Now how do I tell the difference between an offset being reported as
    >0.0000 due to no sync and an offset being reported as 0.0000 due to a
    >perfect sync ?


    Look at the output of that command while (say) NTP is starting up and
    not yet synchronised:

    assID=0 status=c011 sync_alarm, sync_unspec, 1 event, event_restart,
    offset=0.000

    compared to normal running:

    assID=0 status=0644 leap_none, sync_ntp, 4 events, event_peer/strat_chg,
    offset=0.148

    Dave
    --
    ** Dave Holland ** Systems Support -- Infrastructure Management **
    ** 01223 496923 ** The Sanger Institute, Hinxton, Cambridge, UK **

  14. Re: Linux NTP Kernel unsync flag remains long after NTP&Kernel have PPL sync

    On 2008-08-28, Dave Holland wrote:

    > Darryl Miles wrote:
    >
    >>I guess an offset of 0.0000 is perfect ?

    >
    > Yes.


    Remember that these stats are just a snapshot. The real indicator of
    clock stability is to summarize the stats over a long period of time.

    The peer.awk utility in the scripts directory may be used for this
    purpose. For example, the system at my desk shows:

    $ awk -f peer.awk /var/log/ntpstats/peerstats

    ident           cnt    mean     rms     max   delay    dist    disp
    ===================================================================
    192.168.19.4     66  -1.412   1.410   3.703   1.339  21.965  17.137
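If the scripts directory isn't at hand, a rough Python equivalent of this kind of summary is sketched below. It assumes the usual peerstats record layout: MJD, seconds past midnight, peer address, status, then offset, delay, dispersion and jitter in seconds. It reports offsets in milliseconds. It is not a port of peer.awk. The spread is computed as the standard deviation about the mean, which at least fits the sample above: rms (1.410) is smaller than |mean| (1.412), so it cannot be an rms about zero.

```python
import math
from collections import defaultdict


def summarize_peerstats(lines):
    """Per-peer (count, mean, std deviation, max |offset|), all offsets in ms.

    Assumes peerstats records: MJD, seconds, peer address, status,
    offset, delay, dispersion, jitter (times in seconds).
    """
    offsets = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue                          # skip short or malformed records
        offsets[fields[2]].append(float(fields[4]) * 1e3)
    result = {}
    for peer, xs in offsets.items():
        n = len(xs)
        mean = sum(xs) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
        result[peer] = (n, mean, std, max(abs(x) for x in xs))
    return result
```

An open file object can be passed straight in, e.g. `summarize_peerstats(open("/var/log/ntpstats/peerstats"))`.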

    >>Now how do I tell the difference between an offset being reported as
    >>0.0000 due to no sync and an offset being reported as 0.0000 due to a
    >>perfect sync ?

    >
    > Look at the output of that command while (say) NTP is starting up and
    > not yet synchronised:
    >
    > assID=0 status=c011 sync_alarm, sync_unspec, 1 event, event_restart,
    > offset=0.000
    >
    > compared to normal running:
    >
    > assID=0 status=0644 leap_none, sync_ntp, 4 events, event_peer/strat_chg,


    The status bits are decoded on that line:

    0xxx == leap_none
    x6xx == sync_ntp
    xx4x == 4 events
    xxx4 == event_peer/strat_chg
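The decoding above can be done mechanically. The system status word is laid out as in RFC 1305, Appendix B:

- the top two bits are the leap indicator;
- the next six bits are the clock source;
- then come a four-bit event counter and a four-bit event code.

So a leading hex digit of c means a leap indicator of 3, which ntpq prints as sync_alarm. A minimal Python sketch follows. Its name tables hold only the labels that appear in this thread; any other value is shown numerically.

```python
# Name tables limited to the labels seen in this thread.
LEAP = {0: "leap_none", 3: "sync_alarm"}
SOURCE = {0: "sync_unspec", 6: "sync_ntp"}
EVENT = {1: "event_restart", 4: "event_peer/strat_chg"}


def decode_status(word: int) -> dict:
    """Split a 16-bit NTP system status word into its four fields."""
    leap = (word >> 14) & 0x3      # bits 15-14: leap indicator
    source = (word >> 8) & 0x3F    # bits 13-8: clock source
    count = (word >> 4) & 0xF      # bits 7-4: event counter
    code = word & 0xF              # bits 3-0: last event code
    return {
        "leap": LEAP.get(leap, f"leap={leap}"),
        "source": SOURCE.get(source, f"source={source}"),
        "events": count,
        "last_event": EVENT.get(code, f"event={code}"),
    }
```

Applied to the two samples, `decode_status(0xc011)` yields sync_alarm, sync_unspec, 1 event, event_restart, and `decode_status(0x0644)` yields leap_none, sync_ntp, 4 events, event_peer/strat_chg.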

    --
    Steve Kostecke
    NTP Public Services Project - http://support.ntp.org/

  15. Re: Linux NTP Kernel unsync flag remains long after NTP&Kernel have PPL sync

    Hello Darryl,

    On Tuesday, August 26, 2008 at 22:24:37 +0000, Darryl Miles wrote:

    > I'm not sure I agree that hwclock/drift file is even good for managing
    > the hardware RTC. While the machine is switched on we have NTP, while
    > the machine is switched off the internal/component temperature is
    > vastly different so any drift estimation maintained over time whilst
    > powered up might not be in the right ballpark


    Of course: hwclock needs the RTC's switched-off-overnight drift rate in
    its /etc/adjtime file, in order to initialise the system clock as well
    as possible at startup the next morning, before NTP takes control.


    > you do maintain different drift data whilst powered up and powered
    > down don't you ?


    Outside of some specific cases (like 24/7 servers), hwclock generally
    doesn't really care about the drift rate of the RTC while powered up.
    Maintaining two rates is of course possible, but delicate. In many cases
    there is only one important drift rate: the power-down rate.


    One basic method to get and use the power-down RTC drift rate is to call
    "hwclock --systohc --nodrift" during shutdown, with --nodrift to prevent
    an untimely recalculation of the rate. Then after startup, as soon as
    the system clock is tightly synced, call "hwclock --systohc" to evaluate
    the RTC drift since the last shutdown. However, this method cannot
    possibly work with the eleven-minute mode.
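The arithmetic behind the second step is simple, provided the RTC was exact when it was set at shutdown. The power-down drift rate is the error the RTC has accumulated by the time the system clock is synced, divided by the days it ran unattended. The Python sketch below covers only that calculation. It is not hwclock's own code, which also keeps its correction factor in /etc/adjtime.

```python
def rtc_drift_s_per_day(rtc_time: float, true_time: float, last_set: float) -> float:
    """RTC drift in seconds per day since it was last set.

    rtc_time:  RTC reading now (Unix seconds)
    true_time: NTP-synced system time now (Unix seconds)
    last_set:  system time when the RTC was set at shutdown (Unix seconds)
    Positive means the RTC runs fast. Assumes the RTC was exact when set.
    """
    elapsed_days = (true_time - last_set) / 86400.0
    if elapsed_days <= 0:
        raise ValueError("the RTC must have been set before true_time")
    return (rtc_time - true_time) / elapsed_days
```

For example, an RTC that reads 0.6 s fast after a 12-hour power-down gives a rate of 1.2 s/day.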


    Serge.
    --
    Serge dot Bets at laposte dot net

  16. Re: Linux NTP Kernel unsync flag remains long after NTP&Kernel have PPL sync

    Darryl Miles wrote:

    > Now how do I tell the difference between an offset being reported as
    > 0.0000 due to no sync and an offset being reported as 0.0000 due to a
    > perfect sync ?
    >

    Perfect sync might be associated with a Gaussian distribution of offsets
    centred around zero, although a systematic error could also produce the
    same result. Every offset being zero indicates that something is
    broken: it is too perfect. (Note that the statistics of offsets might
    not be Gaussian.)
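One crude way to act on this is to look at the spread of a run of offsets as well as its mean. Zero spread over many samples is the "too perfect" case. A near-zero mean with some spread is consistent with good sync but, as noted, cannot rule out a systematic error. The Python sketch below works under those assumptions: offsets in milliseconds, and at least one sample.

```python
import statistics


def describe_offsets(offsets_ms):
    """Return (mean, population std deviation, suspicious) for a run of offsets.

    'suspicious' is a heuristic: more than one sample and zero spread,
    i.e. values too perfect to be live measurements.
    """
    mean = statistics.fmean(offsets_ms)
    spread = statistics.pstdev(offsets_ms)
    suspicious = len(offsets_ms) > 1 and spread == 0.0
    return mean, spread, suspicious
```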

  17. Re: Linux NTP Kernel unsync flag remains long after NTP&Kernel have PPL sync

    Steve Kostecke wrote:

    >
    > ntpq -c"rv 0 offset" will tell you the current offset of your ntpd.



    Careful. That is "Offset", not the common sense meaning of offset,
    which might be put something like: the true error between the local
    clock and true time.

    To the extent that it is possible to say that "Offset" accurately
    represents offset, ntpd's algorithms are flawed, and need improving
    until "Offset" again consists only of measurement noise. Whether they
    are actually flawed is the basis of the long-running dispute between
    Unruh and Dave Mills.

  18. Re: Linux NTP Kernel unsync flag remains long after NTP&Kernel have PPL sync

    Steve Kostecke wrote:
    > On 2008-08-28, Dave Holland wrote:
    >> Darryl Miles wrote:
    >>
    >>> I guess an offset of 0.0000 is perfect ?

    >> Yes.

    >
    > Remember that these stats are just a snapshot. The real indicator of
    > clock stability is to summarize the stats over a long period of time.


    Yes, and I still think that the NTP daemon is in the correct position
    to make that judgment.

    All that remains is to feed NTP a command detailing your
    bounds/requirements and let it come back with an opinion on them.



    >>> Now how do I tell the difference between an offset being reported as
    >>> 0.0000 due to no sync and an offset being reported as 0.0000 due to a
    >>> perfect sync ?

    >> Look at the output of that command while (say) NTP is starting up and
    >> not yet synchronised:
    >>
    >> assID=0 status=c011 sync_alarm, sync_unspec, 1 event, event_restart,
    >> offset=0.000
    >>
    >> compared to normal running:
    >>
    >> assID=0 status=0644 leap_none, sync_ntp, 4 events, event_peer/strat_chg,


    So "sync_ntp" is the important one here ?

    And the state of convergence is the current estimated "offset" ?

    I think what I want is both "sync_ntp" and a small enough offset to be
    happy.



    This still does not explain why kernel info for me is at the endstops:

    ntpdc> kerninfo
    pll offset: 4294.97 s
    pll frequency: -62.504 ppm
    maximum error: 16.384 s
    estimated error: 16.384 s
    status: 0041 pll unsync
    pll time constant: 6
    precision: 1e-06 s
    frequency tolerance: 512 ppm


    The systems appear to be keeping time but are not happy to reduce their
    estimate of possible error.
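The "status: 0041" line is the kernel's clock status word. Decode it with the STA_* bit values from Linux's <linux/timex.h>. The set bits are 0x0001 (STA_PLL) and 0x0040 (STA_UNSYNC), which is exactly the "pll unsync" that ntpdc prints. A small Python sketch follows. The kerninfo parser assumes the "status: XXXX ..." line shape shown above.

```python
# STA_* status bits as defined in Linux <linux/timex.h>.
STA_FLAGS = [
    (0x0001, "STA_PLL"), (0x0002, "STA_PPSFREQ"), (0x0004, "STA_PPSTIME"),
    (0x0008, "STA_FLL"), (0x0010, "STA_INS"), (0x0020, "STA_DEL"),
    (0x0040, "STA_UNSYNC"), (0x0080, "STA_FREQHOLD"),
    (0x0100, "STA_PPSSIGNAL"), (0x0200, "STA_PPSJITTER"),
    (0x0400, "STA_PPSWANDER"), (0x0800, "STA_PPSERROR"),
    (0x1000, "STA_CLOCKERR"), (0x2000, "STA_NANO"),
    (0x4000, "STA_MODE"), (0x8000, "STA_CLK"),
]


def decode_kernel_status(status: int) -> list[str]:
    """Names of the STA_* bits set in a kernel status word."""
    return [name for bit, name in STA_FLAGS if status & bit]


def kerninfo_status(text: str) -> int:
    """Extract the hex status word from ntpdc 'kerninfo' output."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "status:":
            return int(fields[1], 16)
    raise ValueError("no 'status:' line found")
```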



    Thanks for all your pointers in this matter,

    Darryl

  19. Re: Linux NTP Kernel unsync flag remains long after NTP&Kernel have PPL sync

    On 2008-08-28, Serge Bets wrote:

    > One basic method to get and use the power-down RTC drift rate is to call
    > "hwclock --systohc --nodrift" during shutdown, with --nodrift to prevent
    > an untimely recalculation of the rate. Then after startup, as soon as
    > the system clock is tightly synced, call "hwclock --systohc" to evaluate
    > the RTC drift since the last shutdown. However, this method cannot
    > possibly work with the eleven-minute mode.


    Why does any of this matter when you have NTP to set the clock at start
    up?

    --
    Steve Kostecke
    NTP Public Services Project - http://support.ntp.org/

  20. Re: Linux NTP Kernel unsync flag remains long after NTP&Kernel have PPL sync

    On 2008-08-28, David Woolley wrote:

    > Steve Kostecke wrote:
    >
    >> ntpq -c"rv 0 offset" will tell you the current offset of your ntpd.

    >
    > Careful. That is "Offset", not the common sense meaning of offset,
    > which might be put something like: the true error between the local
    > clock and true time.


    Let's see ... there's the offsets shown with:

    ntpq -c"rv 0 offset"

    ntpq -p

    ntpdc -c kern | grep offset

    Take your pick ...

    Running all of those commands together against one of my systems
    produces the following (I've combined the two ntpq invocations):

    $ ntpq -pc"rv 0 offset" edge_box | awk '/offset=/ { print }; \
    /^\*/ { print $1 " " $9}'; ntpdc -c kern edge_box | grep offset
    offset=0.421
    *ntp.cox.net 0.264
    pll offset: 0.000407 s
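The three numbers are not in the same unit. ntpq reports offsets in milliseconds, while ntpdc's kerninfo prints the pll offset in seconds. The Python sketch below normalises the sample output above to milliseconds. It assumes exactly the line shapes shown, for example that the system peer's line starts with "*".

```python
def offsets_in_ms(output: str) -> dict[str, float]:
    """Normalise the offsets printed by the combined command to milliseconds.

    Assumes ntpq prints offsets in ms and ntpdc's 'pll offset' is in seconds.
    """
    result = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("offset="):                   # ntpq rv, already ms
            result["ntpq rv"] = float(line.split("=", 1)[1])
        elif line.startswith("pll offset:"):             # ntpdc kern, seconds
            result["ntpdc kern"] = float(line.split()[2]) * 1e3
        elif line.startswith("*"):                       # ntpq -p system peer, ms
            peer, value = line.split()
            result[f"ntpq -p {peer[1:]}"] = float(value)
    return result
```

On the sample it gives 0.421, 0.264 and roughly 0.407 ms respectively.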

    --
    Steve Kostecke
    NTP Public Services Project - http://support.ntp.org/
