NTP sync problems - NTP

  1. NTP sync problems

    Hello,

    My name is Martin Tengklint and I have an NTP problem that I have
    solved. However, I cannot explain why it didn't work with the original
    configuration. Is there anyone here that can help me understand the
    logic of NTP in my case explained below?

    The topology looks like this:

    Ext.NTP Server A
            |
            |
    Ext.NTP Server B                Ext.NTP Server C
            |                               |
            |-------------------------------|
                            |
                            |
                      NTP Server D
                            |
                            |
                      NTP Client E
    The problem is that my NTP client E rejected its selected NTP server
    D, so it stopped syncing and its offset started drifting. I think I
    have traced the lack of sync to an excessively large "root
    dispersion" value sent by NTP server D. Its value is 1991 ms, as
    seen below:

    # ntpq -c"rv 51316"
    status=9014 reach, conf, 1 event, event_reach,
    srcadr=cliente, srcport=123, dstadr=169.254.5.34, dstport=123,
    leap=00, stratum=2, precision=-16, rootdelay=1.785,
    rootdispersion=1991.028, refid=10.112.1.14, reach=377, unreach=0,
    hmode=3, pmode=4, hpoll=6, ppoll=6, flash=00 ok, keyid=0,
    offset=3466396.411, delay=0.567, dispersion=0.956, jitter=37.305,
    reftime=cc0328d1.feabf9bf Wed, Jun 18 2008 9:25:21.994,
    org=cc0329cb.5b962c81 Wed, Jun 18 2008 9:29:31.357,
    rec=cc031c40.f62d86e1 Wed, Jun 18 2008 8:31:44.961,
    xmt=cc031c40.f5f9b77c Wed, Jun 18 2008 8:31:44.960,
    filtdelay=     0.57    0.53    0.57    0.52    0.56    0.68    0.52    1.11,
    filtoffset= 3466396 3466359 3466320 3466282 3466235 3466198 3466160 3466123,
    filtdisp=      0.03    0.98    1.95    2.93    3.92    4.86    5.81    6.77

    Looking at the output of ntpq -c "as" on client E, the server is in
    condition "reject", most likely because of the high root dispersion.
    Correct?

    # ntpq -c"as"

    ind assID status  conf reach auth condition  last_event cnt
    ===========================================================
      1 51316  9014   yes   yes  none    reject   reachable  1

    The problem occurs when NTP server D syncs with the external NTP
    server C (stratum 1), which uses its own system clock as reference.

    On NTP Server D:

    # ntpq -c "as"
    ind assID status  conf reach auth condition  last_event cnt
    ===========================================================
      1 62852  9414   yes   yes  none  candidat   reachable  1
      2 62853  9614   yes   yes  none  sys.peer   reachable  1

    Upon looking in more detail at the two associations above:

    # ntpq -c "rv 62853"
    status=9614 reach, conf, sel_sys.peer, 1 event, event_reach,
    srcadr=10.112.1.14, srcport=123, dstadr=10.112.2.90, dstport=123,
    leap=00, stratum=1, precision=-17, rootdelay=0.000,
    rootdispersion=10.284, refid=LCL, reach=377, unreach=0, hmode=3,
    pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0, offset=-1128.193,
    delay=1.226, dispersion=14.849, jitter=224.514,
    reftime=cc12fb96.0a522000 Mon, Jun 30 2008 9:28:38.040,
    org=cc12fbad.30179000 Mon, Jun 30 2008 9:29:01.187,
    rec=cc12fbae.5110fdd4 Mon, Jun 30 2008 9:29:02.316,
    xmt=cc12fbae.50bd8b10 Mon, Jun 30 2008 9:29:02.315,
    filtdelay= 1.23 1.40 1.68 1.50 1.19 1.28 1.10 1.27,
    filtoffset= -1128.1 -903.68 -1144.7 -1133.5 -814.17 -1125.2 -1125.2 -921.92,
    filtdisp= 0.04 15.38 30.73 46.10 61.46 76.82 92.21 107.59

    # ntpq -c "rv 62852"
    status=9414 reach, conf, sel_candidat, 1 event, event_reach,
    srcadr=10.112.1.13, srcport=123, dstadr=10.112.2.90, dstport=123,
    leap=00, stratum=2, precision=-17, rootdelay=6.454,
    rootdispersion=15.533, refid=10.109.1.164, reach=377, unreach=0,
    hmode=3, pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0,
    offset=1147.347, delay=1.298, dispersion=14.874, jitter=0.641,
    reftime=cc12f9fa.ed579000 Mon, Jun 30 2008 9:21:46.927,
    org=cc12fbd3.785bc000 Mon, Jun 30 2008 9:29:39.470,
    rec=cc12fbd2.52cdc1fb Mon, Jun 30 2008 9:29:38.323,
    xmt=cc12fbd2.52726f6f Mon, Jun 30 2008 9:29:38.322,
    filtdelay= 1.30 1.15 1.47 1.24 1.29 2.20 1.54 1.45,
    filtoffset= 1147.35 1147.99 1371.63 1132.04 1143.24 1460.54 1150.79 1150.61,
    filtdisp= 0.04 15.41 30.79 46.18 61.57 76.91 92.26 107.63

    I can see that the selected one (NTP server C, i.e. assID 62853) has
    a refid of LCL (meaning it is syncing to its local system clock?),
    while the other one, the candidate (NTP server B, stratum 2), has
    NTP server A as its refid, meaning that it syncs to NTP server A.

    Again, when NTP server D primarily syncs with NTP server C, the
    "root dispersion" apparently gets too high, while having NTP server
    D sync with NTP server B fixes the problem.

    My question is: why does the root dispersion become too high when
    syncing to an external server that uses its own local system clock
    as reference (i.e. NTP server C)?

    Many thanks in advance!

    /eztoril

  2. Re: NTP sync problems

    martin.tengklint@spray.se wrote:
    >
    > The topology looks like this:
    >
    > Ext.NTP Server A
    >         |
    >         |
    > Ext.NTP Server B                Ext.NTP Server C
    >         |                               |
    >         |-------------------------------|
    >                         |
    >                         |
    >                   NTP Server D
    >                         |
    >                         |
    >                   NTP Client E
    >
    > The problem is that my NTP client E rejected its selected NTP server
    > D, which lead to not syncing, leading to offset drifting on NTP Client
    > E. I think I have located the lack of sync to a too large "root
    > dispersion" value sent from the NTP server D. Its value is 1991 as
    > seen below:
    >
    > # ntpq -c"rv 51316"
    > status=9014 reach, conf, 1 event, event_reach,
    > srcadr=cliente, srcport=123, dstadr=169.254.5.34, dstport=123,
    > leap=00, stratum=2, precision=-16, rootdelay=1.785,
    > rootdispersion=1991.028, refid=10.112.1.14, reach=377, unreach=0,


    Yup. rootdispersion is high enough for rejection.
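
To make the rejection concrete, here is a minimal Python sketch that plugs the values from the quoted "rv 51316" output into a simplified root-distance formula. The formula follows the shape of the one in RFC 5905 but leaves out the term that grows with the age of the last update, and the 1000 ms limit is an assumption (some ntpd releases use 1500 ms); the function name is made up for illustration.

```python
def root_distance_ms(rootdelay, rootdisp, delay, disp, jitter):
    """Simplified root synchronization distance in milliseconds.

    Assumption of this sketch: the age-dependent term of the RFC 5905
    formula is ignored, so the real value would be slightly larger.
    """
    return (rootdelay + delay) / 2.0 + rootdisp + disp + jitter


# Values copied from the "rv 51316" output on client E (all in ms).
distance = root_distance_ms(
    rootdelay=1.785, rootdisp=1991.028, delay=0.567, disp=0.956, jitter=37.305
)

# Assumed selection limit; treat the exact value as configuration-dependent.
MAX_DISTANCE_MS = 1000.0

print(f"root distance = {distance:.3f} ms")
if distance < MAX_DISTANCE_MS:
    print("server is within the distance limit")
else:
    print("server exceeds the distance limit and is not selectable")
```

The root dispersion alone (1991 ms) already exceeds either limit, so the other terms hardly matter here.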

    > hmode=3, pmode=4, hpoll=6, ppoll=6, flash=00 ok, keyid=0,
    > offset=3466396.411, delay=0.567, dispersion=0.956, jitter=37.305,
    > reftime=cc0328d1.feabf9bf Wed, Jun 18 2008 9:25:21.994,
    > org=cc0329cb.5b962c81 Wed, Jun 18 2008 9:29:31.357,
    > rec=cc031c40.f62d86e1 Wed, Jun 18 2008 8:31:44.961,
    > xmt=cc031c40.f5f9b77c Wed, Jun 18 2008 8:31:44.960,
    > filtdelay=     0.57    0.53    0.57    0.52    0.56    0.68    0.52    1.11,
    > filtoffset= 3466396 3466359 3466320 3466282 3466235 3466198 3466160 3466123,


    This offset (about 3466 seconds) exceeds the panic threshold (1000
    seconds by default), so unless this is the first adjustment after
    startup and ntpd was started with -g, ntpd will abort if it accepts
    this offset.
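
As a rough illustration of the thresholds involved, the following Python sketch classifies an offset the way ntpd's defaults would treat it: slew below the 128 ms step threshold, step above it, and panic above 1000 s. The function name is hypothetical, and the sketch ignores the -g exception and the stepout timer.

```python
STEP_THRESHOLD_S = 0.128    # ntpd default step threshold
PANIC_THRESHOLD_S = 1000.0  # ntpd default panic threshold


def classify_offset(offset_ms):
    """Return 'slew', 'step' or 'panic' for an offset given in ms."""
    offset_s = abs(offset_ms) / 1000.0
    if offset_s > PANIC_THRESHOLD_S:
        return "panic"
    if offset_s > STEP_THRESHOLD_S:
        return "step"
    return "slew"


# Offset of server D as seen from client E (from the rv output).
print(classify_offset(3466396.411))
# Offset of server C as seen from server D.
print(classify_offset(-1128.193))
```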

    > filtdisp=      0.03    0.98    1.95    2.93    3.92    4.86    5.81    6.77
    >
    > Upon looking at ntpq -c "as" command on the Client E, the server is in
    > condition reject, most likely due to the high root dispersion.
    > Correct?
    >
    > # ntpq -c"as"
    >
    > ind assID status  conf reach auth condition  last_event cnt
    > ===========================================================
    >   1 51316  9014   yes   yes  none    reject   reachable  1
    >
    > The problem exists when having the NTP server D to sync with an
    > external NTP server C (stratum 1) having its own system clock as
    > reference.
    >
    > On NTP Server D:
    >
    > # ntpq -c "as"
    > ind assID status  conf reach auth condition  last_event cnt
    > ===========================================================
    >   1 62852  9414   yes   yes  none  candidat   reachable  1
    >   2 62853  9614   yes   yes  none  sys.peer   reachable  1
    >
    > Upon looking in more detail at the two associations above:
    >
    > # ntpq -c "rv 62853"
    > status=9614 reach, conf, sel_sys.peer, 1 event, event_reach,
    > srcadr=10.112.1.14, srcport=123, dstadr=10.112.2.90, dstport=123,
    > leap=00, stratum=1, precision=-17, rootdelay=0.000,
    > rootdispersion=10.284, refid=LCL, reach=377, unreach=0, hmode=3,
    > pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0, offset=-1128.193,
    > delay=1.226, dispersion=14.849, jitter=224.514,
    > reftime=cc12fb96.0a522000 Mon, Jun 30 2008 9:28:38.040,
    > org=cc12fbad.30179000 Mon, Jun 30 2008 9:29:01.187,
    > rec=cc12fbae.5110fdd4 Mon, Jun 30 2008 9:29:02.316,
    > xmt=cc12fbae.50bd8b10 Mon, Jun 30 2008 9:29:02.315,
    > filtdelay= 1.23 1.40 1.68 1.50 1.19 1.28 1.10 1.27,
    > filtoffset= -1128.1 -903.68 -1144.7 -1133.5 -814.17 -1125.2 -1125.2 -921.92,
    > filtdisp= 0.04 15.38 30.73 46.10 61.46 76.82 92.21 107.59
    >
    > # ntpq -c "rv 62852"
    > status=9414 reach, conf, sel_candidat, 1 event, event_reach,
    > srcadr=10.112.1.13, srcport=123, dstadr=10.112.2.90, dstport=123,
    > leap=00, stratum=2, precision=-17, rootdelay=6.454,
    > rootdispersion=15.533, refid=10.109.1.164, reach=377, unreach=0,
    > hmode=3, pmode=4, hpoll=10, ppoll=10, flash=00 ok, keyid=0,
    > offset=1147.347, delay=1.298, dispersion=14.874, jitter=0.641,
    > reftime=cc12f9fa.ed579000 Mon, Jun 30 2008 9:21:46.927,
    > org=cc12fbd3.785bc000 Mon, Jun 30 2008 9:29:39.470,
    > rec=cc12fbd2.52cdc1fb Mon, Jun 30 2008 9:29:38.323,
    > xmt=cc12fbd2.52726f6f Mon, Jun 30 2008 9:29:38.322,
    > filtdelay= 1.30 1.15 1.47 1.24 1.29 2.20 1.54 1.45,
    > filtoffset= 1147.35 1147.99 1371.63 1132.04 1143.24 1460.54 1150.79 1150.61,


    Note that the two servers differ by more than two seconds. I'm not sure
    why they aren't both rejected as falsetickers. (In systems with LCL
    clocks, it is important to be able to outvote the local clock with
    enough real clocks, and one is far too few to do that!)
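
The size of the disagreement can be checked with a short Python sketch. It builds a "correctness interval" (offset plus or minus a simplified root distance) for each server from the quoted rv values and tests whether the intervals overlap, which is the basic idea behind falseticker detection. The distance formula is a simplification that ignores the age-dependent term, and the helper names are made up for illustration.

```python
def correctness_interval(offset, rootdelay, rootdisp, delay, disp, jitter):
    """Return (low, high) in ms around the measured offset.

    Assumption: simplified root distance without the age-dependent term.
    """
    distance = (rootdelay + delay) / 2.0 + rootdisp + disp + jitter
    return (offset - distance, offset + distance)


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Server C (assID 62853) and server B (assID 62852) as seen from D.
server_c = correctness_interval(
    offset=-1128.193, rootdelay=0.000, rootdisp=10.284,
    delay=1.226, disp=14.849, jitter=224.514,
)
server_b = correctness_interval(
    offset=1147.347, rootdelay=6.454, rootdisp=15.533,
    delay=1.298, disp=14.874, jitter=0.641,
)

offset_gap_ms = 1147.347 - (-1128.193)

print(f"C interval: {server_c[0]:.1f} .. {server_c[1]:.1f} ms")
print(f"B interval: {server_b[0]:.1f} .. {server_b[1]:.1f} ms")
print(f"offset gap: {offset_gap_ms:.2f} ms")
print("intervals overlap" if overlaps(server_c, server_b) else "intervals do not overlap")
```

With only two sources whose intervals do not intersect, there is no majority to decide which one is right.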

    I think rv 0 on D would be instructive, but it looks to me as though D
    is either rejecting both C and B, or it is trying to jump between them
    and the resulting huge jitter is causing the root dispersion to go
    through the roof. (Rather than jumping, it may be using one and
    rejecting the other in its popcorn filter.)

    > filtdisp= 0.04 15.41 30.79 46.18 61.57 76.91 92.26 107.63
    >
    > ...I can see that the one selected (NTP server C, i.e. AssId: 62853)
    > has a ref.id of LCL (meaning it is syncing to its local system clock?)


    LCL is local clock, which means that any reference clock it actually has
    is broken.

    Both are selected. The one with the lowest stratum gets to donate its
    stratum and quality data, but they are both survivors, and both will be
    used to calculate the time.

    I would consider a server claiming to sync to LCL and having stratum 1
    to be badly misconfigured. Undisciplined local clocks should always
    have the highest stratum that just works, so that they are the last
    choice and don't propagate too far. The default for LCL is maybe OK if
    the machine is accurately synchronised by some non-NTP means and steps
    are taken to disable NTP if that source fails. Going lower than the
    default really is a bad idea, and the fact that it is lower than your
    non-LCL server is why you have the anomaly here.
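
The effect of the stratum on which survivor becomes the system peer can be sketched in Python as follows. This is a deliberately simplified ranking (lower stratum first, distance as tie-breaker); the real ntpd metric combines stratum and distance in a weighted sum. The class and function names are hypothetical, the distances are approximate values derived from the quoted rv output, and stratum 10 is only an illustrative "high" value.

```python
from dataclasses import dataclass


@dataclass
class Survivor:
    name: str
    stratum: int
    distance_ms: float  # approximate root distance, illustrative only


def pick_sys_peer(survivors):
    """Simplified ranking: lowest stratum wins, distance breaks ties."""
    return min(survivors, key=lambda s: (s.stratum, s.distance_ms))


# As configured in the thread: C advertises stratum 1 from its local clock.
as_configured = [Survivor("C", 1, 250.26), Survivor("B", 2, 34.92)]

# Hypothetical fix: C's local clock is pushed to a high stratum.
with_high_lcl_stratum = [Survivor("C", 10, 250.26), Survivor("B", 2, 34.92)]

print(pick_sys_peer(as_configured).name)
print(pick_sys_peer(with_high_lcl_stratum).name)
```

In the first case the free-running clock on C wins despite its much larger distance; in the second case B, which traces back to A, is preferred.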


    > while the other one, the candidate (NTP server B, stratum 2) is having
    > NTP server A as ref.id, meaning syncing it syncs to NTP server A.
    >
    > Again, when having NTP server D to primarily sync with NTP server C,
    > the "root dispersion" apparently gets too high, while having the NTP
    > server D to sync with NTP server B is fixing the problem.
    >
    > My question is why the root dispersion becomes too high upon syncing
    > to an external server having its own local system clock as reference
    > (i.e. NTP server C)?


    Because C and B are not getting times traceable to the same source and
    there isn't an X and Y synchronised to the same source as B, to outvote C.

  3. Re: NTP sync problems

    On Jul 1, 10:57 pm, David Woolley wrote:
    > Because C and B are not getting times traceable to the same source and
    > there isn't an X and Y synchronised to the same source as B, to outvote C.


    Ok, thanks for the quick reply!

    Just to clarify further, please correct me if I'm wrong:

    Because B and C do not get their time from the same source, NTP on D
    has difficulty choosing between these two time sources (as seen, B
    and C differ by more than 2 seconds). They are both survivors and
    both are used in the time calculation, because there is nothing to
    outvote C.

    The one with the lowest stratum (i.e. C) gets to donate its quality
    data, including a huge jitter, which makes the root dispersion go
    through the roof. And a high root dispersion value makes NTP on E
    reject NTP on D.

    Correct?

    BR,
    Martin

  4. Re: NTP sync problems

    On Jul 2, 10:07 am, martin.tengkl...@spray.se wrote:
    > Ok, thanks for the quick reply!
    >
    > Just to clarify further, please correct me if I'm wrong:
    >
    > Because B and C do not get their time from the same source, NTP on D
    > has difficulty choosing between these two time sources (as seen, B
    > and C differ by more than 2 seconds). They are both survivors and
    > both are used in the time calculation, because there is nothing to
    > outvote C.
    >
    > The one with the lowest stratum (i.e. C) gets to donate its quality
    > data, including a huge jitter, which makes the root dispersion go
    > through the roof. And a high root dispersion value makes NTP on E
    > reject NTP on D.
    >
    > Correct?
    >
    > BR,
    > Martin

    Additional question: as seen in the logs, server B has quite low
    jitter while server C has huge jitter.
    Why is that? Is it because of a shaky local clock on server C, or is
    it because server C lacks a reliable source?
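
One way to look at this is to recompute a jitter figure from the filtoffset registers in the rv output on D. The Python sketch below uses an RMS formula in the style of RFC 5905, taking the newest sample as the reference (an assumption; ntpd picks its reference sample differently), so it is not expected to reproduce the reported values. The function name is made up for illustration.

```python
import math


def rms_jitter(offsets):
    """RMS difference (ms) between the newest sample and the older ones.

    Assumption: offsets[0] is the newest sample and serves as reference.
    """
    newest = offsets[0]
    others = offsets[1:]
    return math.sqrt(sum((o - newest) ** 2 for o in others) / len(others))


# filtoffset registers from "rv 62853" (server C) and "rv 62852" (server B).
filtoffset_c = [-1128.1, -903.68, -1144.7, -1133.5, -814.17, -1125.2, -1125.2, -921.92]
filtoffset_b = [1147.35, 1147.99, 1371.63, 1132.04, 1143.24, 1460.54, 1150.79, 1150.61]

print(f"C: RMS over all samples = {rms_jitter(filtoffset_c):.1f} ms")
print(f"B: RMS over all samples = {rms_jitter(filtoffset_b):.1f} ms")

# Difference between the two newest samples of each register.
print(f"C: newest pair differs by {abs(-1128.193 - (-903.68)):.3f} ms")
print(f"B: newest pair differs by {abs(filtoffset_b[0] - filtoffset_b[1]):.3f} ms")
```

Over all eight samples the two registers show a spread of the same order, and the reported jitter values (224.514 and 0.641) are close to the difference between the two newest samples in each register. That would be consistent with the jumps of roughly 224 ms and 314 ms originating at least partly in D's own clock rather than in C alone, although the logs in this thread are not enough to confirm it.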

    Thanks in advance!

    BR,
    Martin
