Odd (mis)behavior when reference clock fails - NTP

Thread: Odd (mis)behavior when reference clock fails

  1. Odd (mis)behavior when reference clock fails


    [demime 1.01d removed an attachment of type multipart/signed]

  2. Re: Odd (mis)behavior when reference clock fails

    On Tue, 16 Sep 2008, Kevin Oberman wrote:

    > We have a fairly large "mesh" of NTP servers spread across the
    > US. Almost all have PPS reference clocks and are quite
    > accurate. Recently one of the reference clocks located across the county
    > seems to have failed. Such is life.
    >
    > The problem is that the system's time started drifting and eventually
    > became far enough out of sync with the mesh to be marked as a bad
    > ticker.
    >
    > The only way I could get the clock to slew or step the time was to edit
    > the configuration and comment out the reference clock and PPS. It looks
    > like the system will only use the time from a reference clock when and if
    > the clock is configured, even if it can't be read.
    >
    > Is there any way to "fix" this?


    What is it that you consider broken? Please clarify.
    I've re-read this several times, and don't see the problem.
    A reference clock broke. It was disregarded because it chimed
    badly.
    You expected something different?

    A clock must be configured to be used, yes. Sad but true.

    > --
    > R. Kevin Oberman, Network Engineer
    > Energy Sciences Network (ESnet)
    > Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
    > E-mail: oberman@es.net Phone: +1 510 486-8634
    > Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
    >


  3. Re: Odd (mis)behavior when reference clock fails

    hundoj@comcast.net (Rob Neal) writes:

    >On Tue, 16 Sep 2008, Kevin Oberman wrote:


    >> We have a fairly large "mesh" of NTP servers spread across the
    >> US. Almost all have PPS reference clocks and are quite
    >> accurate. Recently one of the reference clocks located across the county
    >> seems to have failed. Such is life.
    >>
    >> The problem is that the system's time started drifting and eventually
    >> became far enough out of sync with the mesh to be marked as a bad
    >> ticker.
    >>
    >> The only way I could get the clock to slew or step the time was to edit
    >> the configuration and comment out the reference clock and PPS. It looks
    >> like the system will only use the time from a reference clock when and if
    >> the clock is configured, even if it can't be read.
    >>
    >> Is there any way to "fix" this?


    > What is it that you consider broken? Please clarify.
    > I've re-read this several times, and don't see the problem.
    > A reference clock broke. It was disregarded because it chimed
    > badly.
    > You expected something different?


    A hardware clock broke. The computer which was using that hardware clock
    insisted on using that hardware clock even though it gave no time. It acted
    as a server, and eventually its time drifted so badly everyone else saw it
    as a bad chimer.
    It seems to have had other server lines in the /etc/ntp.conf, but ignored
    them in favour of a non-working refclock.

    That is how I interpret what he said, but I may be wrong as well.


    > A clock must be configured to be used, yes. Sad but true.



  4. Re: Odd (mis)behavior when reference clock fails


    [demime 1.01d removed an attachment of type multipart/signed]

  5. Re: Odd (mis)behavior when reference clock fails

    Firstly, the original of this thread root has been demimed out of
    existence by the mail to news gateway. I thought the official line is
    that what went out to the mailing list was the same as that which went to
    the newsgroup.

    All that remains is:

    > [demime 1.01d removed an attachment of type multipart/signed]


    This can be confirmed on Google groups.

    Rob Neal wrote:
    > On Tue, 16 Sep 2008, Kevin Oberman wrote:
    >
    >> We have a fairly large "mesh" of NTP servers spread across the
    >> US. Almost all have PPS reference clocks and are quite
    >> accurate. Recently one of the reference clocks located across the county
    >> seems to have failed. Such is life.
    >>
    >> The problem is that the system's time started drifting and eventually
    >> became far enough out of sync with the mesh to be marked as a bad
    >> ticker.
    >>
    >> The only way I could get the clock to slew or step the time was to edit
    >> the configuration and comment out the reference clock and PPS. It looks
    >> like the system will only use the time from a reference clock when and if
    >> the clock is configured, even if it can't be read.


    That should not be the case. Are you sure that the clock had stopped
    responding and stopped providing a PPS signal? If it is still providing
    PPS this will be used, and other clocks only to resolve the second
    ambiguity.

    Another thing to check is whether there was a local clock configured.
    This can compromise fault recovery.

    What we really need is the contents of the configuration file and the
    result of running ntpq peers. We may then ask you for the result of
    running ntpq rv on the system and on each of its associations.
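
The diagnostics requested above can be gathered in one pass. What follows is a minimal Python sketch, not something from the original post. It assumes ntpq is on the PATH and relies only on the standard ntpq invocations "-p", "-c rv" and "-c associations"; the helper names and the association ID in the usage line are hypothetical.

```python
import shutil
import subprocess


def ntpq_commands(assoc_ids=()):
    """Build the ntpq invocations asked for in the post above."""
    cmds = [
        ["ntpq", "-p"],                   # peer billboard
        ["ntpq", "-c", "rv"],             # system variables
        ["ntpq", "-c", "associations"],   # lists the association IDs
    ]
    # One "rv <assocID>" query per association the caller is interested in.
    for assoc_id in assoc_ids:
        cmds.append(["ntpq", "-c", f"rv {assoc_id}"])
    return cmds


def collect(assoc_ids=()):
    """Run each command and return its stdout keyed by the command line.

    Returns an empty dict when ntpq is not installed.
    """
    if shutil.which("ntpq") is None:
        return {}
    results = {}
    for cmd in ntpq_commands(assoc_ids):
        done = subprocess.run(cmd, capture_output=True, text=True, check=False)
        results[" ".join(cmd)] = done.stdout
    return results
```

Calling collect() with no arguments gathers the billboard, the system variables and the association list; the IDs printed by the last of these can then be passed back in, for example collect([12345]), to read the variables of individual associations.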

    Please reply in plain text, or directly to the newsgroup, otherwise
    neither I nor the originator of NTP will see your reply.

  6. Re: Odd (mis)behavior when reference clock fails

    On 2008-09-19, David Woolley wrote:

    > Firstly, the original of this thread root has been demimed out of
    > existence by the mail to news gateway. I thought the official line is
    > that what went out to the mailing list was the same as that which went to
    > the newsgroup.


    Thank-you so much for your sarcasm. It goes so far in helping to make
    things work better.

    All articles which are posted to the news-group are forwarded to the
    mailing list after MIME stripping (see the further discussion below).

    All messages which are posted to the mailing list, with a very small
    number of exceptions, are injected to the news group.

    > All that remains is:
    >
    > > [demime 1.01d removed an attachment of type multipart/signed]


    This occurred because the OP insists on digitally signing his mailing list
    posts.

    In the past we have received numerous loud complaints which voiced
    strong objections to certain types of content crossing the gateway from
    mail to news. Vociferous complaints have been made about virtually every
    form of content which is not plain text.

    We have also received numerous complaints about undesirable content
    (e.g. MI5 complaints) propagating from news to mail.

    We have made an effort to ensure that _only_ the content and formatting
    which has been deemed acceptable by the participants in this news group
    is allowed to propagate from mail to news.

    Unfortunately, in the _very_ few cases where something has not been processed
    correctly we receive more complaints.

    So far no one, not even _you_, has come forward with any vaguely
    constructive suggestions. Nor has anyone displayed an interest in doing
    anything more than mashing their reply button.

    > Please reply in plain text,


    I have directly contacted the OP each time that he sends a signed message
    and asked that he resend the message without the signature. To this date
    he has failed to do so.

    --
    Steve Kostecke
    NTP Public Services Project - http://support.ntp.org/

  7. Re: Odd (mis)behavior when reference clock fails

    Dave,

    The gateway chokes on signed mail and all of my mail is signed. As a
    result, I guess none of my messages will ever make it to the news
    group. :-(

    I'm going to TRY to get my mailer to not sign this, but it may not make
    it either.
    --
    R. Kevin Oberman, Network Engineer
    Energy Sciences Network (ESnet)
    Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
    E-mail: oberman@es.net Phone: +1 510 486-8634
    Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
    > From: David Woolley
    > Date: Fri, 19 Sep 2008 08:08:38 +0100
    > Sender: questions-bounces+oberman=es.net@lists.ntp.org
    >
    >
    > Firstly, the original of this thread root has been demimed out of
    > existence by the mail to news gateway. I thought the official line is
    > that what went out to the mailing list was the same as that which went to
    > the newsgroup.
    >
    > All that remains is:
    >
    > > [demime 1.01d removed an attachment of type multipart/signed]

    >
    > This can be confirmed on Google groups.
    >
    > Rob Neal wrote:
    > > On Tue, 16 Sep 2008, Kevin Oberman wrote:
    > >
    > >> We have a fairly large "mesh" of NTP servers spread across the
    > >> US. Almost all have PPS reference clocks and are quite
    > >> accurate. Recently one of the reference clocks located across the county
    > >> seems to have failed. Such is life.
    > >>
    > >> The problem is that the system's time started drifting and eventually
    > >> became far enough out of sync with the mesh to be marked as a bad
    > >> ticker.
    > >>
    > >> The only way I could get the clock to slew or step the time was to edit
    > >> the configuration and comment out the reference clock and PPS. It looks
    > >> like the system will only use the time from a reference clock when and if
    > >> the clock is configured, even if it can't be read.

    >
    > That should not be the case. Are you sure that the clock had stopped
    > responding and stopped providing a PPS signal? If it is still providing
    > PPS this will be used, and other clocks only to resolve the second
    > ambiguity.
    >
    > Another thing to check is whether there was a local clock configured.
    > This can compromise fault recovery.
    >
    > What we really need is the contents of the configuration file and the
    > result of running ntpq peers. We may then ask you for the result of
    > running ntpq rv on the system and on each of its associations.
    >
    > Please reply in plain text, or directly to the newsgroup, otherwise
    > neither I nor the originator of NTP will see your reply.
    >
    > _______________________________________________
    > questions mailing list
    > questions@lists.ntp.org
    > https://lists.ntp.org/mailman/listinfo/questions
    >


  8. Re: Odd (mis)behavior when reference clock fails

    > From: Steve Kostecke
    > Date: 19 Sep 2008 12:21:19 GMT
    > Sender: questions-bounces+oberman=es.net@lists.ntp.org
    >
    >
    > On 2008-09-19, David Woolley wrote:
    >
    > > Firstly, the original of this thread root has been demimed out of
    > > existence by the mail to news gateway. I thought the official line is
    > > that what went out to the mailing list was the same as that which went to
    > > the newsgroup.

    >
    > Thank-you so much for your sarcasm. It goes so far in helping to make
    > things work better.
    >
    > All articles which are posted to the news-group are forwarded to the
    > mailing list after MIME stripping (see the further discussion below).
    >
    > All messages which are posted to the mailing list, with a very small
    > number of exceptions, are injected to the news group.
    >
    > > All that remains is:
    > >
    > > > [demime 1.01d removed an attachment of type multipart/signed]

    >
    > This occurred because the OP insists on digitally signing his mailing list
    > posts.
    >
    > In the past we have received numerous loud complaints which voiced
    > strong objections to certain types of content crossing the gateway from
    > mail to news. Vociferous complaints have been made about virtually every
    > form of content which is not plain text.
    >
    > We have also received numerous complaints about undesirable content
    > (e.g. MI5 complaints) propagating from news to mail.
    >
    > We have made an effort to ensure that _only_ the content and formatting
    > which has been deemed acceptable by the participants in this news group
    > is allowed to propagate from mail to news.
    >
    > Unfortunately, in the _very_ few cases where something has not been processed
    > correctly we receive more complaints.
    >
    > So far no one, not even _you_, has come forward with any vaguely
    > constructive suggestions. Nor has anyone displayed an interest in doing
    > anything more than mashing their reply button.
    >
    > > Please reply in plain text,

    >
    > I have directly contacted the OP each time that he sends a signed message
    > and asked that he resend the message without the signature. To this date
    > he has failed to do so.
    >
    > --
    > Steve Kostecke
    > NTP Public Services Project - http://support.ntp.org/
    >


    Sorry, but my e-mail system is configured to sign ALL messages. This is
    simply the policy. I just sent out one that I tried to over-ride that. I
    don't know if it worked as it's not supposed to be "over-rideable". I'm
    doing the same with this one, so it may or may not go out.

    I don't deal with Usenet server software and have not for about 15
    years, so I can't claim to know what "state-of-the-art" is, but dumping
    a message because of it being MIME, a standard that has been supported
    in most mail software for over 15 years, when the type of the mail
    "part" is text/plain, baffles me. If the gateway wants to dump the
    signature, that's no big deal, but I don't know why it would dump the
    text/plain part. The whole idea of signed clear-text was that it would
    "just work" with all software, whether it "knew" anything about PGP or
    even MIME.

    Nonetheless, it's a volunteer effort and I can't really complain
    unless I'm willing to do it myself, and I'm not. I guess I'll just have
    to live with a lot of folks not seeing my occasional messages.
    --
    R. Kevin Oberman, Network Engineer
    Energy Sciences Network (ESnet)
    Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
    E-mail: oberman@es.net Phone: +1 510 486-8634
    Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751

  9. Re: Odd (mis)behavior when reference clock fails

    Kevin Oberman wrote:
    > Dave,
    >
    > The gateway chokes on signed mail and all of my mail is signed. As a
    > result, I guess none of my messages will ever make it to the news
    > group. :-(
    >
    > I'm going to TRY to get my mailer to not sign this, but it may not
    > make it either.



    That one looks OK to me Kevin. Why not use the Usenet group if the mail
    gateway is a problem for you?

    Cheers,
    David



  10. Re: Odd (mis)behavior when reference clock fails

    On 2008-09-19, Kevin Oberman wrote:

    > The gateway chokes on signed mail and all of my mail is signed. As a
    > result, I guess none of my messages will ever make it to the news
    > group. :-(
    >
    > I'm going to TRY to get my mailer to not sign this, but it may not make
    > it either.


    I have replied to Kevin via e-mail and we will resolve this issue
    involving his signed messages privately.

    --
    Steve Kostecke
    NTP Public Services Project - http://support.ntp.org/

  11. Re: Odd (mis)behavior when reference clock fails

    Unruh wrote:
    > hundoj@comcast.net (Rob Neal) writes:
    >
    >>On Tue, 16 Sep 2008, Kevin Oberman wrote:

    >
    >>> We have a fairly large "mesh" of NTP servers spread across the
    >>> US. Almost all have PPS reference clocks and are quite
    >>> accurate. Recently one of the reference clocks located across the county
    >>> seems to have failed. Such is life.
    >>>
    >>> The problem is that the system's time started drifting and eventually
    >>> became far enough out of sync with the mesh to be marked as a bad
    >>> ticker.
    >>>
    >>> The only way I could get the clock to slew or step the time was to edit
    >>> the configuration and comment out the reference clock and PPS. It looks
    >>> like the system will only use the time from a reference clock when and
    >>> if the clock is configured, even if it can't be read.
    >>>
    >>> Is there any way to "fix" this?

    >
    >> What is it that you consider broken? Please clarify.
    >> I've re-read this several times, and don't see the problem.
    >> A reference clock broke. It was disregarded because it chimed
    >> badly.
    >> You expected something different?

    >
    > A hardware clock broke. The computer which was using that hardware clock
    > insisted on using that hardware clock even though it gave no time. It
    > acted as a server, and eventually its time drifted so badly everyone else
    > saw it as a bad chimer.
    > It seems to have had other server lines in the /etc/ntp.conf, but ignored
    > them in favour of a non-working refclock.
    >
    > That is how I interpret what he said, but I may be wrong as well.


    This is also how I understand this.

    Maybe the problem occurred because either the refclock did not report its
    failure state correctly, or ntpd's refclock driver did not pass the fail
    state on to the NTP kernel, so the refclock was not discarded after it
    failed.

    It would be helpful to know the exact NTP version, and which hardware clock
    and refclock driver was used.

    Martin
    --
    Martin Burnicki

    Meinberg Funkuhren
    Bad Pyrmont
    Germany

  12. Re: Odd (mis)behavior when reference clock fails

    > From: Martin Burnicki
    > Date: Tue, 23 Sep 2008 09:34:06 +0200
    > Sender: questions-bounces+oberman=es.net@lists.ntp.org
    >
    >
    > Unruh wrote:
    > > hundoj@comcast.net (Rob Neal) writes:
    > >
    > >>On Tue, 16 Sep 2008, Kevin Oberman wrote:

    > >
    > >>> We have a fairly large "mesh" of NTP servers spread across the
    > >>> US. Almost all have PPS reference clocks and are quite
    > >>> accurate. Recently one of the reference clocks located across the county
    > >>> seems to have failed. Such is life.
    > >>>
    > >>> The problem is that the system's time started drifting and eventually
    > >>> became far enough out of sync with the mesh to be marked as a bad
    > >>> ticker.
    > >>>
    > >>> The only way I could get the clock to slew or step the time was to edit
    > >>> the configuration and comment out the reference clock and PPS. It looks
    > >>> like the system will only use the time from a reference clock when and
    > >>> if the clock is configured, even if it can't be read.
    > >>>
    > >>> Is there any way to "fix" this?

    > >
    > >> What is it that you consider broken? Please clarify.
    > >> I've re-read this several times, and don't see the problem.
    > >> A reference clock broke. It was disregarded because it chimed
    > >> badly.
    > >> You expected something different?

    > >
    > > A hardware clock broke. The computer which was using that hardware clock
    > > insisted on using that hardware clock even though it gave no time. It
    > > acted as a server, and eventually its time drifted so badly everyone else
    > > saw it as a bad chimer.
    > > It seems to have had other server lines in the /etc/ntp.conf, but ignored
    > > them in favour of a non-working refclock.
    > >
    > > That is how I interpret what he said, but I may be wrong as well.

    >
    > This is also how I understand this.
    >
    > Maybe the problem occurred because either the refclock did not report its
    > failure state correctly, or ntpd's refclock driver did not pass the fail
    > state on to the NTP kernel, so the refclock was not discarded after it
    > failed.
    >
    > It would be helpful to know the exact NTP version, and which hardware clock
    > and refclock driver was used.


    Martin,

    It's 4.2.4p4 running on FreeBSD 7.0. The reference clock is an EndRun
    Tech CDMA clock using the TrueTime driver. When the system was running,
    ntpq claimed no successful polls of the reference clock or the PPS. It
    was getting good responses from other systems, but not syncing to
    them. The offset started small after the clock failed, about .003, and
    steadily grew to over 5 ms. The reference clock always showed a zero
    reachability, delay and offset and .001 jitter.

    Here is my configuration:
    server 127.127.5.1 prefer minpoll 4 maxpoll 4
    fudge 127.127.5.1 refid CDMA
    fudge 127.127.5.1 time1 .011
    server 127.127.22.1 minpoll 4 maxpoll 4
    fudge 127.127.22.1 flag3 1
    peer time1-owamp.es.net iburst key 2
    peer time2-owamp.es.net iburst key 2
    peer time3-owamp.es.net iburst key 2
    peer time4-owamp.es.net iburst key 2
    peer time5-owamp.es.net iburst key 2
    peer time6-owamp.es.net iburst key 2
    peer time7-owamp.es.net iburst key 2
    peer time8-owamp.es.net iburst key 2
    peer time9-owamp.es.net iburst key 2
    peer time10-owamp.es.net iburst key 2
    peer time11-owamp.es.net iburst key 2
    peer time12-owamp.es.net iburst key 2

    All peers are identical systems with CDMA clocks. All are firewalled so
    that they are not publicly visible.

    Here is the ntpq -p output after restoring the reference clock to the
    config and letting it run for a few minutes. Drift is already
    significant!
    # ntpq -p
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     TRUETIME(1)     .CDMA.           0 l    -   16    0    0.000    0.000   0.001
     PPS(1)          .PPS.            0 l    -   16    0    0.000    0.000   0.001
    -time1-owamp.es. .PPS.            1 u   17   64  177    2.058  -10.335   0.038
    *time2-owamp.es. .PPS.            1 u   49   64  177   24.556  -10.408   0.020
    -time3-owamp.es. .PPS.            1 u   63   64  176   55.640  -10.337   0.049
    +time4-owamp.es. .PPS.            1 u   59   64  176   20.770  -10.405   0.058
    +time5-owamp.es. .PPS.            1 u   45   64  177   23.907  -10.406   0.014
    -time6-owamp.es. .PPS.            1 u   46   64  177   14.790  -10.340   0.062
    -time7-owamp.es. .PPS.            1 u   50   64   73   25.160  -10.381   0.022
    -time8-owamp.es. .PPS.            1 u   27   64  177   27.378  -10.388   0.054
    -time9-owamp.es. .PPS.            1 u   43   64  177   75.571  -10.118   0.067
    +time10-owamp.es .PPS.            1 u   47   64  177   24.068  -10.401   0.048
    -time11-owamp.es .PPS.            1 u   35   64  177   74.542  -10.314   0.035
    -time12-owamp.es .PPS.            1 u   49   64  176    7.224  -10.361   0.036
    --
    R. Kevin Oberman, Network Engineer
    Energy Sciences Network (ESnet)
    Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
    E-mail: oberman@es.net Phone: +1 510 486-8634
    Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751

  13. Re: Odd (mis)behavior when reference clock fails

    On 2008-09-23, Kevin Oberman wrote:

    > [---=| Quote block shrinked by t-prot: 40 lines snipped |=---]
    >
    >> It would be helpful to know the exact NTP version, and which hardware clock
    >> and refclock driver was used.

    >
    > It's 4.2.4p4 running on FreeBSD 7.0. The reference clock is an EndRun
    > Tech CDMA clock using the TrueTime driver. When the system was running,
    > ntpq claimed no successful polls of the reference clock or the PPS. It
    > was getting good responses from other systems, but not syncing to
    > them.


    The ntpq peer billboard you posted shows that ntpd _has_ chosen another
    system as the sys_peer. See below.

    > The offset started small after the clock failed, about .003, and
    > steadily grew to over 5 ms. The reference clock always showed a zero
    > reachability, delay and offset and .001 jitter.


    ntpd has not received any data from the ref-clock. That's one problem.
    You may want to check the CDMA clock to make sure that it is actually
    working.

    The increasing drift is another issue.

    > Here is the ntpq -p output after restoring the reference clock to the
    > config and letting it run for a few minutes. Drift is already
    > significant!
    > # ntpq -p
    >      remote           refid      st t when poll reach   delay   offset  jitter
    > ==============================================================================
    >  TRUETIME(1)     .CDMA.           0 l    -   16    0    0.000    0.000   0.001
    >  PPS(1)          .PPS.            0 l    -   16    0    0.000    0.000   0.001
    > -time1-owamp.es. .PPS.            1 u   17   64  177    2.058  -10.335   0.038
    > *time2-owamp.es. .PPS.            1 u   49   64  177   24.556  -10.408   0.020
    > -time3-owamp.es. .PPS.            1 u   63   64  176   55.640  -10.337   0.049
    > +time4-owamp.es. .PPS.            1 u   59   64  176   20.770  -10.405   0.058


    Is there something about this system which is different from the other
    time servers?

    --
    Steve Kostecke
    NTP Public Services Project - http://support.ntp.org/

  14. Re: Odd (mis)behavior when reference clock fails

    > From: Steve Kostecke
    > Date: 23 Sep 2008 16:07:44 GMT
    > Sender: questions-bounces+oberman=es.net@lists.ntp.org
    >
    >
    > On 2008-09-23, Kevin Oberman wrote:
    >
    > > [---=| Quote block shrinked by t-prot: 40 lines snipped |=---]
    > >
    > >> It would be helpful to know the exact NTP version, and which hardware clock
    > >> and refclock driver was used.

    > >
    > > It's 4.2.4p4 running on FreeBSD 7.0. The reference clock is an EndRun
    > > Tech CDMA clock using the TrueTime driver. When the system was running,
    > > ntpq claimed no successful polls of the reference clock or the PPS. It
    > > was getting good responses from other systems, but not syncing to
    > > them.

    >
    > The ntpq peer billboard you posted shows that ntpd _has_ chosen another
    > system as the sys_peer. See below.


    Yes, that is quite clear.

    > > The offset started small after the clock failed, about .003, and
    > > steadily grew to over 5 ms. The reference clock always showed a zero
    > > reachability, delay and offset and .001 jitter.

    >
    > ntpd has not received any data from the ref-clock. That's one problem.
    > You may want to check the CDMA clock to make sure that it is actually
    > working.


    It is NOT working. That is what started this whole thing. The clock
    failed and time started drifting even though it had lots of peers with
    working clocks. (The system in question is about 5000 kilometers away
    from me.) Except for the time drift, ntp seemed to be working fine. It
    just is not drifting the system's time and I don't understand why.

    > The increasing drift is another issue.
    >
    > > Here is the ntpq -p output after restoring the reference clock to the
    > > config and letting it run for a few minutes. Drift is already
    > > significant!
    > > # ntpq -p
    > >      remote           refid      st t when poll reach   delay   offset  jitter
    > > ==============================================================================
    > >  TRUETIME(1)     .CDMA.           0 l    -   16    0    0.000    0.000   0.001
    > >  PPS(1)          .PPS.            0 l    -   16    0    0.000    0.000   0.001
    > > -time1-owamp.es. .PPS.            1 u   17   64  177    2.058  -10.335   0.038
    > > *time2-owamp.es. .PPS.            1 u   49   64  177   24.556  -10.408   0.020
    > > -time3-owamp.es. .PPS.            1 u   63   64  176   55.640  -10.337   0.049
    > > +time4-owamp.es. .PPS.            1 u   59   64  176   20.770  -10.405   0.058

    >
    > Is there something about this system which is different from the other
    > time servers?


    As stated, all of the servers are identical in terms of hardware and
    software and configuration. The only difference in the ntp.conf is that
    each system is missing the entry for itself.
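
The "every host lists all the others" layout described above can be generated rather than maintained by hand. This is a minimal Python sketch under stated assumptions: the timeN-owamp.es.net naming pattern, the iburst option and key 2 come from the configuration posted earlier in the thread, while the function name, the mesh size of 13, and the idea that this host is number 13 are hypothetical.

```python
def peer_lines(self_index, mesh_size, key=2):
    """Return ntp.conf peer lines for every mesh member except this host.

    self_index and mesh_size are 1-based host numbers; both are
    illustrative parameters, not values confirmed by the thread.
    """
    if not 1 <= self_index <= mesh_size:
        raise ValueError("self_index must be between 1 and mesh_size")
    return [
        f"peer time{i}-owamp.es.net iburst key {key}"
        for i in range(1, mesh_size + 1)
        if i != self_index  # a host does not peer with itself
    ]


if __name__ == "__main__":
    # Hypothetical: a 13-host mesh, generating the config for host 13.
    print("\n".join(peer_lines(self_index=13, mesh_size=13)))
```

With those assumed values the output is the same twelve peer lines that appear in the posted configuration.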
    --
    R. Kevin Oberman, Network Engineer
    Energy Sciences Network (ESnet)
    Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
    E-mail: oberman@es.net Phone: +1 510 486-8634
    Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751

  15. Re: Odd (mis)behavior when reference clock fails

    On 2008-09-23, Kevin Oberman wrote:

    >"Steve Kostecke" wrote:
    >
    >> On 2008-09-23, Kevin Oberman wrote:
    >>
    >> > [---=| Quote block shrinked by t-prot: 40 lines snipped |=---]
    >> >
    >> >> It would be helpful to know the exact NTP version, and which
    >> >> hardware clock and refclock driver was used.
    >> >
    >> > It's 4.2.4p4 running on FreeBSD 7.0. The reference clock is an
    >> > EndRun Tech CDMA clock using the TrueTime driver. When the system
    >> > was running, ntpq claimed no successful polls of the reference
    >> > clock or the PPS. It was getting good responses from other systems,
    >> > but not syncing to them.

    >>
    >> The ntpq peer billboard you posted shows that ntpd _has_ chosen
    >> another system as the sys_peer. See below.

    >
    > Yes, that is quite clear.


    The sys_peer is the time source that this ntpd is "synced" to.
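
The tally character and the reach column of the billboard can be decoded mechanically. The following is a minimal Python sketch, assuming whitespace-separated rows like the ones posted in this thread; the function names are hypothetical, and the tally meanings follow the ntpq documentation. The reach column is an octal shift register of the last eight polls, so 177 means seven of the eight slots hold a successful poll and 0 means none do.

```python
# Tally codes as documented for the ntpq peers billboard.
TALLY = {
    " ": "rejected",
    "x": "falseticker",
    ".": "excess",
    "-": "outlier",
    "+": "candidate",
    "#": "selected",
    "*": "sys_peer",
    "o": "pps_peer",
}

# A few rows copied from the billboard posted in this thread.
ROWS = [
    " TRUETIME(1) .CDMA. 0 l - 16 0 0.000 0.000 0.001",
    " PPS(1) .PPS. 0 l - 16 0 0.000 0.000 0.001",
    "-time1-owamp.es. .PPS. 1 u 17 64 177 2.058 -10.335 0.038",
    "*time2-owamp.es. .PPS. 1 u 49 64 177 24.556 -10.408 0.020",
    "-time7-owamp.es. .PPS. 1 u 50 64 73 25.160 -10.381 0.022",
]


def parse_row(line):
    """Split one billboard row into the fields used below."""
    tally = line[0]                 # first character is the tally code
    fields = line[1:].split()
    return {
        "remote": fields[0],
        "status": TALLY.get(tally, "unknown"),
        # reach is octal; count how many of the last 8 polls succeeded
        "reach_hits": bin(int(fields[6], 8)).count("1"),
        "offset_ms": float(fields[8]),
    }


def sys_peer(rows):
    """Return the parsed row marked '*', or None if nothing is selected."""
    for row in map(parse_row, rows):
        if row["status"] == "sys_peer":
            return row
    return None


if __name__ == "__main__":
    for row in map(parse_row, ROWS):
        print(row)
    print("synced to:", sys_peer(ROWS)["remote"])
```

Run against the rows above, this reports zero successful polls for both TRUETIME(1) and PPS(1) and picks time2-owamp.es. as the sys_peer, which matches the reading given in the thread.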

    >> >      remote           refid      st t when poll reach   delay   offset  jitter
    >> > ==============================================================================
    >> >  TRUETIME(1)     .CDMA.           0 l    -   16    0    0.000    0.000   0.001
    >> >  PPS(1)          .PPS.            0 l    -   16    0    0.000    0.000   0.001
    >> > -time1-owamp.es. .PPS.            1 u   17   64  177    2.058  -10.335   0.038
    >> > *time2-owamp.es. .PPS.            1 u   49   64  177   24.556  -10.408   0.020
    >> > -time3-owamp.es. .PPS.            1 u   63   64  176   55.640  -10.337   0.049
    >> > +time4-owamp.es. .PPS.            1 u   59   64  176   20.770  -10.405   0.058

    >>
    >> Is there something about this system which is different from the other
    >> time servers?

    >
    > As stated, all of the servers are identical in terms of hardware and
    > software and configuration. The only difference in the ntp.conf is that
    > each system is missing the entry for itself.


    It is possible that there may be a problem with this hardware. Does the
    clock always drift in the same direction? Do you see periodic clock
    steps? You may need to adjust the kernel's tick.

    --
    Steve Kostecke
    NTP Public Services Project - http://support.ntp.org/

  16. Re: Odd (mis)behavior when reference clock fails

    Steve Kostecke writes:

    >On 2008-09-23, Kevin Oberman wrote:


    >>"Steve Kostecke" wrote:
    >>
    >>> On 2008-09-23, Kevin Oberman wrote:
    >>>
    >>> > [---=| Quote block shrinked by t-prot: 40 lines snipped |=---]
    >>> >
    >>> >> It would be helpful to know the exact NTP version, and which
    >>> >> hardware clock and refclock driver was used.
    >>> >
    >>> > It's 4.2.4p4 running on FreeBSD 7.0. The reference clock is an
    >>> > EndRun Tech CDMA clock using the TrueTime driver. When the system
    >>> > was running, ntpq claimed no successful polls of the reference
    >>> > clock or the PPS. It was getting good responses from other systems,
    >>> > but not syncing to them.
    >>>
    >>> The ntpq peer billboard you posted shows that ntpd _has_ chosen
    >>> another system as the sys_peer. See below.

    >>
    >> Yes, that is quite clear.


    >The sys_peer is the time source that this ntpd is "synced" to.


    >>> >      remote           refid      st t when poll reach   delay   offset  jitter
    >>> > ==============================================================================
    >>> >  TRUETIME(1)     .CDMA.           0 l    -   16    0    0.000    0.000   0.001
    >>> >  PPS(1)          .PPS.            0 l    -   16    0    0.000    0.000   0.001
    >>> > -time1-owamp.es. .PPS.            1 u   17   64  177    2.058  -10.335   0.038
    >>> > *time2-owamp.es. .PPS.            1 u   49   64  177   24.556  -10.408   0.020
    >>> > -time3-owamp.es. .PPS.            1 u   63   64  176   55.640  -10.337   0.049
    >>> > +time4-owamp.es. .PPS.            1 u   59   64  176   20.770  -10.405   0.058
    >>>
    >>> Is there something about this system which is different from the other
    >>> time servers?

    >>
    >> As stated, all of the servers are identical in terms of hardware and
    >> software and configuration. The only difference in the ntp.conf is that
    >> each system is missing the entry for itself.


    >It is possible that there may be a problem with this hardware. Does the
    >clock always drift in the same direction? Do you see periodic clock
    >steps? You may need to adjust the kernel's tick.


    That is precisely what ntp is supposed to do for you.
    Unless you are suggesting that the clock drift is greater than 500 PPM. He
    could look at the drift rate (run adjtimex -p and look at the frequency).
    The quoted one is in weird units.
    It looks like frequency/14555 is PPM.
    If this is near 500 then you are in trouble and have very bad hardware.
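
    For reference, a minimal sketch of the unit conversion, assuming the
    Linux adjtimex(2) convention in which the raw frequency is ppm with a
    16-bit fractional part (one raw unit is 2^-16 ppm). Under that
    assumption the divisor is 65536 rather than the 14555 guessed above.
    The helper names and the sample raw value are made up for illustration.

```python
# Assumption: raw kernel frequency follows the Linux adjtimex(2) scaling,
# i.e. ppm with a 16-bit fractional part (1 raw unit = 2**-16 ppm).
SCALE = 1 << 16  # 65536 raw units per ppm

NTPD_MAX_PPM = 500.0  # frequency correction limit discussed in this thread


def raw_freq_to_ppm(raw_freq: int) -> float:
    """Convert a raw scaled frequency value to parts per million."""
    return raw_freq / SCALE


def within_ntpd_range(ppm: float, limit: float = NTPD_MAX_PPM) -> bool:
    """True if the drift is small enough for ntpd to discipline."""
    return abs(ppm) < limit


if __name__ == "__main__":
    raw = 6132597  # hypothetical raw value, not taken from the thread
    ppm = raw_freq_to_ppm(raw)
    print(f"{raw} raw units = {ppm:.3f} ppm, "
          f"correctable: {within_ntpd_range(ppm)}")
```

    On systems without adjtimex, ntptime reports the frequency directly
    in ppm, so no conversion should be needed there.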



    >--
    >Steve Kostecke
    >NTP Public Services Project - http://support.ntp.org/


  17. Re: Odd (mis)behavior when reference clock fails

    >It's 4.2.4p4 running on FreeBSD 7.0. The reference clock is a EndRun
    >Tech CDMA clock using the TrueTime driver. When the system was running,
    >ntpq claimed no successful polls of the reference clock or the PPS. It
    >was getting good responses from other systems, but not syncing to
    >them. The offset started small after the clock failed, about .003, and
    >steadily grew to over 5 ms. The reference clock always showed a zero
    >reachability, delay and offset and .001 jitter.


    Do you have a good collection of old log files?

    How about deleting the drift file and restarting ntpd?

    I've seen ntpd get confused when the drift on the system it's running
    on changes by a big step. I can't think of why a clock dying would
    cause that sort of change, but I don't have any better suggestions.

    I have a Linux box that changes by almost 200 PPM when I turn a
    program on/off. That program does a lot of serial port activity.
    (It's polling a UPS as fast as it can.) I haven't tracked it down.
    I hope FreeBSD doesn't have quirks like that.

    --
    These are my opinions, not necessarily my employer's. I hate spam.


  18. Re: Odd (mis)behavior when reference clock fails

    > From: Unruh
    > Date: Tue, 23 Sep 2008 20:05:48 GMT
    > Sender: questions-bounces+oberman=es.net@lists.ntp.org
    >
    >
    > Steve Kostecke writes:
    >
    > >On 2008-09-23, Kevin Oberman wrote:

    >
    > >>"Steve Kostecke" wrote:
    > >>
    > >>> On 2008-09-23, Kevin Oberman wrote:
    > >>>
    > >>> > [---=| Quote block shrinked by t-prot: 40 lines snipped |=---]
    > >>> >
    > >>> >> It would be helpful to know the exact NTP version, and which
    > >>> >> hardware clock and refclock driver was used.
    > >>> >
    > >>> > It's 4.2.4p4 running on FreeBSD 7.0. The reference clock is a
    > >>> > EndRun Tech CDMA clock using the TrueTime driver. When the system
    > >>> > was running, ntpq claimed no successful polls of the reference
    > >>> > clock or the PPS. It was getting good responses from other systems,
    > >>> > but not syncing to them.
    > >>>
    > >>> The ntpq peer billboard you posted shows that ntpd _has_ chosen
    > >>> another system as the sys_peer. See below.
    > >>
    > >> Yes, that is quite clear.

    >
    > >The sys_peer is the time source that this ntpd is "synced" to.

    >
    > >>> > remote           refid  st t when poll reach   delay   offset jitter
    > >>> >=====================================================================
    > >>> > TRUETIME(1)      .CDMA.  0 l    -   16     0   0.000    0.000  0.001
    > >>> > PPS(1)           .PPS.   0 l    -   16     0   0.000    0.000  0.001
    > >>> >-time1-owamp.es.  .PPS.   1 u   17   64   177   2.058  -10.335  0.038
    > >>> >*time2-owamp.es.  .PPS.   1 u   49   64   177  24.556  -10.408  0.020
    > >>> >-time3-owamp.es.  .PPS.   1 u   63   64   176  55.640  -10.337  0.049
    > >>> >+time4-owamp.es.  .PPS.   1 u   59   64   176  20.770  -10.405  0.058
    > >>>
    > >>> Is there something about this system which is different from the other
    > >>> time servers?
    > >>
    > >> As stated, all of the servers are identical in terms of hardware and
    > >> software and configuration. The only differences in the ntp.conf is that
    > >> each system is missing the entry for itself.

    >
    > >It is possible that there may be a problem with this hardware. Does the
    > >clock always drift in the same direction? Do you see periodic clock
    > >steps? You may need to adjust the kernel's tick.

    >
    > That is precisely what ntp is supposed to do for you.
    > Unless you are suggesting that the clock drift is greater than 500 PPM. He
    > could look at the drift rate (run adjtimex -p and look at the frequency).
    > The quoted one is in weird units.
    > It looks like frequency/14555 is PPM.
    > If this is near 500 then you are in trouble and have very bad hardware.


    I still suspect that you are missing some things I mentioned about the
    problem. First, the system syncs to ntp just fine as long as I comment
    the reference clocks (truetime and PPS) out. If I simply list all of the
    remote ntp servers, it works fine. When the CDMA clock was receiving
    time, all was well, too.

    Time is very stable and I see the drift at 93.576 ppm. It's a FreeBSD
    system, so no adjtimex command. I used ntptime to get the frequency
    information.

    Here is a simple table of operation:

    Refclock     refclock       refclock not
    working      in ntp.conf    in ntp.conf
    YES          syncs          syncs
    NO           drifts         syncs

    The only case of failure is when the reference clock is in the ntp.conf
    file and the reference clock is not providing good time. When I say "not
    providing good time", I mean it is providing time that is pretty close,
    but it is marked "No Satellite Lock". It may be that it is, somehow,
    still syncing to the time from that clock, but it is claiming not to
    have read the clock at all.
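
    The "not read at all" claim comes from the reach column of the ntpq
    billboard quoted above. Reach is an 8-bit shift register printed in
    octal, with one bit per poll and the most recent poll in the lowest
    bit. A small sketch of how to decode it (the function name is
    hypothetical; the values are the ones from the billboard):

```python
from typing import List


def decode_reach(reach_octal: str) -> List[bool]:
    """Return the last eight poll results, most recent first.

    ntpq prints the reachability register in octal; each bit records
    whether one of the last eight polls produced a valid response.
    """
    value = int(reach_octal, 8) & 0xFF
    return [bool((value >> bit) & 1) for bit in range(8)]


if __name__ == "__main__":
    # Reach values copied from the billboard in this thread.
    for name, reach in [("TRUETIME(1)", "0"),
                        ("time2-owamp.es.", "177"),
                        ("time3-owamp.es.", "176")]:
        polls = decode_reach(reach)
        print(f"{name:16s} reach={reach:>3s} "
              f"successes={sum(polls)}/8 latest_ok={polls[0]}")
```

    A reach of 0 means none of the last eight polls succeeded, while 177
    and 176 mean seven of the last eight did, which fits a daemon that
    had been restarted shortly before the billboard was captured.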

    I am very suspicious as the offset I was seeing was only about 5 ms last
    week and is 10 ms today. (I have been running with the reference clock
    commented out of ntp.conf for that time.)
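
    To put those numbers in perspective, here is a rough sketch of the
    arithmetic. It assumes "last week" means seven days and treats the
    5 ms to 10 ms change as a steady rate; both are simplifications, and
    the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def accumulated_offset(rate_ppm: float, elapsed_s: float) -> float:
    """Offset in seconds built up by a constant frequency error."""
    return rate_ppm * 1e-6 * elapsed_s


def implied_rate_ppm(offset_change_s: float, elapsed_s: float) -> float:
    """Average frequency error that would explain an offset change."""
    return offset_change_s / elapsed_s * 1e6


if __name__ == "__main__":
    # Uncorrected drift at the 93.576 ppm reported above.
    per_day = accumulated_offset(93.576, SECONDS_PER_DAY)
    print(f"uncorrected drift: {per_day:.3f} s/day")

    # Offset grew from about 5 ms to about 10 ms in roughly a week.
    residual = implied_rate_ppm(0.010 - 0.005, SECONDS_PER_WEEK)
    print(f"implied residual rate: {residual:.4f} ppm")
```

    If those assumptions hold, the residual error is several orders of
    magnitude smaller than the raw oscillator drift, so the frequency
    correction itself appears to be largely in place and the growing
    offset is a small leftover rather than a free-running clock.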

    Another thought...could it be PPS that is causing it? After all, the pin
    on the bulkhead connector is still getting the PPS signal. I am using the
    kernel PPS implementation, so could that be training the kernel even
    though ntp is not using it?
    --
    R. Kevin Oberman, Network Engineer
    Energy Sciences Network (ESnet)
    Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
    E-mail: oberman@es.net Phone: +1 510 486-8634
    Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751

  19. Re: Odd (mis)behavior when reference clock fails

    On 2008-09-23, Unruh wrote:

    > That is precisely what ntp is supposed to do for you.
    > Unless you are suggesting that the clock drift is greater than 500PPM.


    Unidirectional drift and periodic clock steps are a good indicator that
    the clock drift exceeds ntpd's correction capabilities. In these cases
    adjusting the tick to bring the clock drift within +/- 500PPM is
    sometimes the solution.
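
    As a sketch of the arithmetic behind a tick adjustment: assuming a
    nominal tick of 10000 microseconds (a 100 Hz tick, as on many Linux
    systems; other platforms may differ), each unit of tick changes the
    clock rate by 100 PPM. The function below is hypothetical and only
    splits a required correction into a coarse tick change plus the
    residual left for ntpd.

```python
from typing import Tuple


def split_correction(correction_ppm: float,
                     nominal_tick_us: int = 10_000) -> Tuple[int, float]:
    """Split a required rate correction into a tick value and a residual.

    correction_ppm is positive when the clock must be sped up.
    Returns (new_tick_us, residual_ppm); the residual is what ntpd
    would still have to correct after the tick change.
    """
    ppm_per_unit = 1e6 / nominal_tick_us   # 100 ppm per unit at 10000 us
    tick_delta = round(correction_ppm / ppm_per_unit)
    residual_ppm = correction_ppm - tick_delta * ppm_per_unit
    return nominal_tick_us + tick_delta, residual_ppm


if __name__ == "__main__":
    # Hypothetical clock that needs +730 ppm, beyond the 500 ppm limit.
    tick, residual = split_correction(730.0)
    print(f"tick={tick} us, residual for ntpd={residual:+.1f} ppm")
```

    With these assumed numbers the residual falls well inside the
    +/- 500 PPM range, which is the point of the adjustment.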

    As suggested elsewhere in this thread, stopping ntpd, deleting the
    drift file, and starting ntpd is another potential solution.

    --
    Steve Kostecke
    NTP Public Services Project - http://support.ntp.org/

  20. Re: Odd (mis)behavior when reference clock fails

    hal-usenet@ip-64-139-1-69.sjc.megapath.net (Hal Murray) writes:

    >>It's 4.2.4p4 running on FreeBSD 7.0. The reference clock is a EndRun
    >>Tech CDMA clock using the TrueTime driver. When the system was running,
    >>ntpq claimed no successful polls of the reference clock or the PPS. It
    >>was getting good responses from other systems, but not syncing to
    >>them. The offset started small after the clock failed, about .003, and
    >>steadily grew to over 5 ms. The reference clock always showed a zero
    >>reachability, delay and offset and .001 jitter.


    >Do you have a good collection of old log files?


    >How about deleting the drift file and restarting ntpd?


    >I've seen ntpd get confused when the drift on the system it's running
    >on changes by a big step. I can't think of why a clock dying would
    >cause that sort of change, but I don't have any better suggestions.


    >I have a Linux box that changes by almost 200 PPM when I turn a
    >program on/off. That program does a lot of serial port activity.
    >(It's polling a UPS as fast as it can.) I haven't tracked it down.


    Why would you poll a UPS as fast as you can? This sounds like the old
    joke: "Doctor, I have a real headache." "What do you do?" "Before it
    hurts I bash my head against a brick wall."

    It sounds like you are losing timer interrupts.
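
    A back-of-the-envelope sketch of what that would mean, assuming the
    whole 200 PPM change comes from missed ticks. The timer frequencies
    used here are assumptions, since the post does not say what HZ the
    Linux box runs at.

```python
def lost_ticks_per_second(slowdown_ppm: float, hz: int) -> float:
    """Ticks lost per second needed to slow the clock by slowdown_ppm."""
    return slowdown_ppm * 1e-6 * hz


def seconds_between_lost_ticks(slowdown_ppm: float, hz: int) -> float:
    """Average interval between lost ticks for the given slowdown."""
    return 1.0 / lost_ticks_per_second(slowdown_ppm, hz)


if __name__ == "__main__":
    for hz in (100, 250, 1000):  # assumed timer frequencies
        interval = seconds_between_lost_ticks(200.0, hz)
        print(f"HZ={hz}: one lost tick every {interval:.1f} s")
```

    Lost interrupts can only make the clock run slow, so the direction of
    the 200 PPM change would be one way to check this explanation.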


    >I hope FreeBSD doesn't have quirks like that.


    >--
    >These are my opinions, not necessarily my employer's. I hate spam.


