Why false tickers one day, and not the next day? - NTP

Thread: Why false tickers one day, and not the next day?

  1. Re: Why false tickers one day, and not the next day?

    In article <923d72cd-b5ea-4a7b-945a-59ad9a310ce4@i12g2000prf.googlegroups.com>,
    dromedaryl@yahoo.com wrote:

    > Node-A's ntp.conf:
    > server 210.173.160.27 # external NTP server
    > server node-B iburst # other node
    > server 127.127.1.1
    > fudge 127.127.1.1 stratum 9
    > driftfile /etc/ntp.drift


    > Node-B's ntp.conf:
    > server 210.173.160.27 # external NTP server
    > server node-A iburst # other node


    I don't have time to analyze this very carefully, but initial responses are:

    The correct way of doing this is to use peer, not server.

    > server 127.127.1.1


    However, if you have more than one machine with a local clock, they
    should be in a strict hierarchy, so node A should not peer with node
    B. Circularity can result in a mutual appreciation society.

    If you must have circularity and local clocks, it is essential
    that each server sees more real clocks than local clocks, so that the local
    clock derived time can always be outvoted. I think it is advisable to
    outvote the local clock, even without circularity.
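
    As a rough sketch of that kind of strict hierarchy (only the external
    server 210.173.160.27 comes from your configs; the extra external
    servers are placeholders you would need to fill in, and the fudged
    stratum is just an example):

    # node-A: the only node with a local clock, kept far below real sources
    server 210.173.160.27 iburst        # external NTP server
    server <external-server-2> iburst   # placeholder for another real server
    server <external-server-3> iburst   # placeholder for another real server
    server 127.127.1.1                  # local clock, last resort only
    fudge 127.127.1.1 stratum 10        # high stratum so any real source wins
    driftfile /etc/ntp.drift

    # node-B: no local clock; it follows the real servers and node-A
    server 210.173.160.27 iburst
    server <external-server-2> iburst   # placeholder
    server <external-server-3> iburst   # placeholder
    server node-A iburst
    driftfile /etc/ntp.drift

    With more real sources than local clocks visible on each node, the
    local clock on node-A can always be outvoted, and there is no
    circularity between the two nodes.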

    Note: you should ask yourself whether you really expect to lose the
    external time servers for long enough that the two local servers will
    drift apart by an unacceptable amount. That typically means several
    hours to over a day of downtime. If not, local clocks offer no
    benefit and introduce risks.

    > fudge 127.127.1.1 stratum 11



    > There is nothing logged by ntpd.


    You have a problem with your logging.

    > The nodes' drifts are high:
    > # cat /etc/ntp.drift
    > node-A: 499.206
    > node-B: 497.070


    These are end-stop values. It might be worth checking whether this is
    simply the result of the blind leading the blind, or whether the
    software clocks really are this bad when uncorrected. If the clocks
    are good, you may need to repair the drift files to stand some chance
    of getting a good lock.
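
    One way to do that, sketched for FreeBSD (this assumes the stock
    /etc/rc.d/ntpd script and that stepping the clock with ntpdate is
    acceptable on these machines):

    # /etc/rc.d/ntpd stop
    # ntpdate -b 210.173.160.27     # step the clock close to true time
    # echo "0.0" > /etc/ntp.drift   # or simply delete the file
    # /etc/rc.d/ntpd start

    The drift file holds a single frequency correction in parts per
    million, so once it is stuck near the 500 ppm end stop, clearing it
    lets ntpd measure the frequency again from scratch.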

    > The nodes and the external time server are in Asia. I have an
    > identically setup cluster in North America using the same Asian time
    > server, and that cluster has no problem keeping the Asian server as a


    The root delay will be larger, which means that the local-clock-derived
    times can differ more from the real time before the real time gets
    voted out. The local clock mechanism was designed for cases where the
    software clock was disciplined, but not by NTP. As a result, local
    clocks report a zero root dispersion, whereas their true root
    dispersion is unbounded.


  2. Why false tickers one day, and not the next day?

    I'm trying to determine why I'm having a problem with ntpd marking all
    the servers it contacts as false tickers one day, while the next day
    everything is okay. The explanation is at the top of this posting,
    with the output of various ntpq runs below.

    The setup is a two node FreeBSD cluster, node-A and node-B. They are
    on the same subnet and switch.

    Node-A's ntp.conf:
    server 210.173.160.27 # external NTP server
    server node-B iburst # other node
    server 127.127.1.1
    fudge 127.127.1.1 stratum 9
    driftfile /etc/ntp.drift

    Node-B's ntp.conf:
    server 210.173.160.27 # external NTP server
    server node-A iburst # other node
    server 127.127.1.1
    fudge 127.127.1.1 stratum 11
    driftfile /etc/ntp.drift

    The idea is that as the cluster expands, node-A and node-B will be
    time servers for the new nodes. Node-A's local clock has a lower
    stratum value than node-B's so in the case that the cluster loses
    connection to the external server, node-A is the preferred chimer for
    the cluster. If node-A loses its connection to the external server
    (but not to node-B), node-A will use node-B as its server, and vice
    versa.

    What's happening is that things go as expected for a short time with
    node-A and node-B using the external time server as their system peer,
    and using each other as candidate peers.

    But within a few minutes, the external time server gets marked as a
    false ticker by both nodes, and both nodes mark each other as false
    tickers.

    There is nothing logged by ntpd.

    The nodes' drifts are high:
    # cat /etc/ntp.drift
    node-A: 499.206
    node-B: 497.070

    The nodes and the external time server are in Asia. I have an
    identically set up cluster in North America using the same Asian time
    server, and that cluster has no problem keeping the Asian server as a
    peer, despite having a delay of about 120 msecs, nearly 100 times
    higher than the Asian cluster's delay to the time server.

    The next day, after restarting ntpd on the nodes and resetting the
    time on all nodes with ntpdate, everything worked as expected: the
    time synced properly, there were no false tickers, and the nodes'
    drifts were under 30.0. No network changes were made.

    Any idea on what's going on here? What would cause all the servers to
    be marked as false tickers, and then be fine the next day? Is there
    a way to configure ntpd so this won't happen?

    Here's the output of a number of sequential calls to "ntpq -p":

    Just after starting up ntpd:

    node-A:      remote           refid      st t when poll reach   delay   offset  jitter
    node-A: ==============================================================================
    node-A:  210.173.160.27  210.173.176.251  2 u  112  256   17    1.447   -2.810  78.540
    node-A:  node-B          .INIT.          16 u  273  512    0    0.000    0.000 4000.00
    node-A: *LOCAL(1)        LOCAL(1)         9 l   33   64   77    0.000    0.000   0.002
    node-B: ==============================================================================
    node-B:  210.173.160.27  210.173.176.251  2 u  101  256   17    1.358   -4.869  76.916
    node-B:  node-A          .INIT.          16 u  265  512    0    0.000    0.000 4000.00
    node-B: *LOCAL(1)        LOCAL(1)        11 l   36   64   77    0.000    0.000   0.002

    A few minutes later, node-A has the external server as its system
    peer and node-B as a candidate peer. But node-B marks the external
    server as a false ticker and uses node-A as its system peer:

    node-A:      remote           refid      st t when poll reach   delay   offset  jitter
    node-A: ==============================================================================
    node-A: *210.173.160.27  210.173.176.251  2 u  290  512   37    1.447   -2.810 137.124
    node-A: +node-B          LOCAL(1)        12 u  181 1024    1    0.109  -12.652   0.128
    node-A:  LOCAL(1)        LOCAL(1)         9 l   14   64  377    0.000    0.000   0.002
    node-B: ==============================================================================
    node-B: x210.173.160.27  210.173.176.251  2 u  278  512   37    1.358   -4.869 133.904
    node-B: *node-A          LOCAL(1)        10 u  172 1024    1    0.113   12.824   0.099
    node-B:  LOCAL(1)        LOCAL(1)        11 l   16   64  377    0.000    0.000   0.002

    A few minutes later. This looks great, as it's exactly what's expected:

    node-A:      remote           refid      st t when poll reach   delay   offset  jitter
    node-A: ==============================================================================
    node-A: *210.173.160.27  210.173.176.251  2 u  492  512   37    1.447   -2.810 137.124
    node-A: +node-B          LOCAL(1)        12 u  383 1024    1    0.109  -12.652   0.128
    node-A:  LOCAL(1)        LOCAL(1)         9 l   24   64  377    0.000    0.000   0.002
    node-B: ==============================================================================
    node-B: *210.173.160.27  210.173.176.251  2 u  480  512   37    1.358   -4.869 133.904
    node-B: +node-A          LOCAL(1)        10 u  374 1024    1    0.113   12.824   0.099
    node-B:  LOCAL(1)        LOCAL(1)        11 l   24   64  377    0.000    0.000   0.002

    A few minutes later, everything becomes a false ticker. The offset
    to the external server has increased dramatically:

    node-A:      remote           refid      st t when poll reach   delay   offset  jitter
    node-A: ==============================================================================
    node-A: x210.173.160.27  210.173.176.251  2 u   28   64  377    1.455  -572.95 354.508
    node-A: xnode-B          LOCAL(1)        12 u  624 1024    1    0.109  -12.652   0.128
    node-A: *LOCAL(1)        LOCAL(1)         9 l    2   64  377    0.000    0.000   0.002
    node-B: ==============================================================================
    node-B: x210.173.160.27  210.173.176.251  2 u   15   64  377    1.529  -561.00 345.641
    node-B: xnode-A          LOCAL(1)        10 u  614 1024    1    0.113   12.824   0.099
    node-B: *LOCAL(1)        LOCAL(1)        11 l    9   64  377    0.000    0.000   0.002

    It also appears that when node-B polls the external server and
    decides to mark it as a false ticker, it also decides to change
    node-A from a candidate to a false ticker, despite not having
    polled it.

    node-A:      remote           refid      st t when poll reach   delay   offset  jitter
    node-A: ==============================================================================
    node-A: x210.173.160.27  210.173.176.251  2 u   32   64  377    1.455  -572.95 134.136
    node-A: xnode-B          LOCAL(1)        12 u   52   64    3    0.109  -12.652   4.786
    node-A: *LOCAL(1)        LOCAL(1)         9 l   66   64  377    0.000    0.000   0.002
    node-B: ==============================================================================
    node-B: x210.173.160.27  210.173.176.251  2 u   14   64  377    1.529  -561.00 130.750
    node-B: xnode-A          LOCAL(1)        10 u   41   64    3    0.113   12.824   4.833
    node-B: *LOCAL(1)        LOCAL(1)        11 l   10   64  377    0.000    0.000   0.002

    Thanks for any help.

    DD

  3. Re: Why false tickers one day, and not the next day?


    >The nodes' drifts are high:
    ># cat /etc/ntp.drift
    >node-A: 499.206
    >node-B: 497.070


    500 ppm is the limit.

    >The next day, after restarting ntpd on the nodes and resetting
    >the time on all nodes with ntpdate, everything worked as
    >expected with the time syncing properly, no false tickers, and the
    >nodes' drifts are under 30.0. No network changes were made.


    There is/was some case where ntpd would get confused and bang
    its head against the limits. It would often recover if you rebooted
    the system or maybe just restarted ntpd.

    I think something in that area was fixed a while ago, but I
    don't remember the details and I could easily be wrong.

    I'm pretty sure you aren't the first person to ask a question like
    that.

    What version of ntpd are you using? Can you easily upgrade to
    a recent ntp-dev?

    Have you seen that more than once?
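
    For reference, one way to check what the running daemon reports
    (assuming ntpq can talk to the local ntpd):

    # ntpq -c rv

    and look at the version= field in the output.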

    --
    These are my opinions, not necessarily my employer's. I hate spam.


  4. Re: Why false tickers one day, and not the next day?

    On Dec 17, 9:04 pm, hal-use...@ip-64-139-1-69.sjc.megapath.net (Hal
    Murray) wrote:
    > I think something in that area was fixed a while ago, but I
    > don't remember the details and I could easily be wrong.
    >
    > I'm pretty sure you aren't the first person to ask a question like
    > that.
    >
    > What version of ntpd are you using? Can you easily upgrade to
    > a recent ntp-dev?


    I've been running 4.2.0. I just built 4.2.4 and I will see how that
    works.

    > Have you seen that more than once?


    Yes, numerous times.

    DD

  5. Re: Why false tickers one day, and not the next day?

    dromedaryl@yahoo.com wrote:
    > On Dec 17, 9:04 pm, hal-use...@ip-64-139-1-69.sjc.megapath.net (Hal
    > Murray) wrote:
    >> I think something in that area was fixed a while ago, but I
    >> don't remember the details and I could easily be wrong.
    >>
    >> I'm pretty sure you aren't the first person to ask a question like
    >> that.
    >>
    >> What version of ntpd are you using? Can you easily upgrade to
    >> a recent ntp-dev?

    >
    > I've been running 4.2.0. I just built 4.2.4 and I will see how that
    > works.
    >
    >> Have you seen that more than once?

    >
    > Yes, numerous times.


    One thing I would recommend is that you change your configuration:

    > Node-A's ntp.conf:
    > server 210.173.160.27 # external NTP server
    > server node-B iburst # other node
    > server 127.127.1.1
    > fudge 127.127.1.1 stratum 9
    > driftfile /etc/ntp.drift
    >
    > Node-B's ntp.conf:
    > server 210.173.160.27 # external NTP server
    > server node-A iburst # other node
    > server 127.127.1.1
    > fudge 127.127.1.1 stratum 11
    > driftfile /etc/ntp.drift


    You are pointing at a single external server AND the other node as a
    server. In this kind of configuration you should use peer for the
    other node. Nominally this is a two-server configuration, which is
    the worst sort; in reality you have only one real server, but ntpd
    has no way of telling that. What I would recommend is to have three
    external servers and use peer for the other node:

    peer node-B iburst

    and it should work the way you expect. Peers expect to work at the
    same stratum.
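
    As a sketch of what node-A might then look like (the extra external
    servers are placeholders, not servers I know anything about, and I
    have left the local clock lines out of the sketch; node-B would be
    the mirror image, peering back to node-A):

    server 210.173.160.27 iburst        # external NTP server
    server <external-server-2> iburst   # placeholder
    server <external-server-3> iburst   # placeholder
    peer node-B iburst
    driftfile /etc/ntp.drift

    With three external sources a single bad or unreachable server can be
    outvoted, and the peer line lets the two nodes back each other up at
    the same stratum if the outside world disappears.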

    Try that and let us know.

    Danny
