Is this normal behavior? - NTP



Thread: Is this normal behavior?

  1. Is this normal behavior?

    I'm not very knowledgeable about NTP, but I'm suspicious of its
    implementation on one of my servers. I have one server running the
    Debian sarge package of NTP:
    cn2:/var/log# ntpq
    ntpq> version
    ntpq 4.2.0a@1:4.2.0a+stable-2-r Fri Aug 26 10:30:19 UTC 2005 (1)
    ntpq>

    I've tried to set this server up as a timeserver for my network, using
    tock.usno.mil, a time server at my institution (Johns Hopkins
    University) and the pool timeservers:
    ntpq> peers
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    +jhname.hcf.jhu. 128.4.1.1        2 u   60   64  177    3.313  835.818 545.232
    *ntp1.usno.navy. .USNO.           1 u   60   64  177    8.567  827.174 551.616
    +trane.wu-wien.a 195.13.1.153     3 u   57   64  177  125.292  841.188 548.251
    +221-15-178-69.g 140.142.16.34    2 u   50   64  177  107.300  1212.00 395.490
    +tock.jrc.us     207.168.62.76    2 u   58   64  177   16.942  1020.49 432.251
     LOCAL(0)        LOCAL(0)        13 l   55   64  177    0.000    0.000   0.002
    ntpq>

    I notice the problem here, and also when I run 'watch ntpq -p'. Seldom
    is my reachability 377, and it frequently and inexplicably drops to 1 as
    I'm watching it. Ten minutes ago reachability was 177, as shown above.
    Watching it in the last minute, it dropped from 377 to 1 on all sources.
    Now it's:
    Every 2s: ntpq -p                                  Fri Dec  8 10:35:18 2006

         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     jhname.hcf.jhu. 128.4.1.1        2 u   53   64    3    2.707  660.049 176.587
     ntp1.usno.navy. .USNO.           1 u   52   64    3    8.904  663.193 181.873
     trane.wu-wien.a 195.13.1.153     3 u   49   64    3  125.979  681.070 186.182
     221-15-178-69.g 140.142.16.34    2 u   51   64    3  104.207  485.281 183.834
     tock.jrc.us     207.168.62.76    2 u   50   64    3   16.912  668.565 186.520
     LOCAL(0)        LOCAL(0)        13 l   49   64    3    0.000    0.000   0.002

    Is this normal behavior for NTP, to frequently lose the ability to reach
    a timeserver? If not, how can I troubleshoot it further?

    Here are the syslog entries pertaining to ntp for just one hour this
    morning:
    Dec 8 09:03:06 cn2 ntpd[16955]: time reset +3.120367 s
    Dec 8 09:07:21 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    Dec 8 09:08:25 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    Dec 8 09:09:31 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    Dec 8 09:23:33 cn2 ntpd[16955]: time reset +3.503628 s
    Dec 8 09:27:51 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    Dec 8 09:28:56 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    Dec 8 09:36:18 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
    Dec 8 09:36:25 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    Dec 8 09:40:40 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
    Dec 8 09:41:45 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    Dec 8 09:41:47 cn2 ntpd[16955]: synchronized to 69.178.15.221, stratum
    2
    Dec 8 09:43:46 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    Dec 8 09:49:21 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    Dec 8 09:49:21 cn2 ntpd[16955]: time reset +4.512453 s
    Dec 8 09:53:41 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    Dec 8 09:54:39 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
    Dec 8 09:54:44 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    Dec 8 10:01:11 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    Dec 8 10:02:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    Dec 8 10:09:40 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    Dec 8 10:10:51 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    Dec 8 10:10:51 cn2 ntpd[16955]: time reset +3.950081 s
    Dec 8 10:15:08 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    Dec 8 10:16:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1

    These time resets seem rather large to me. Is this normal, too?

    Here's the output of the peer.awk script:
    cn2:/var/log/ntpstats# gawk -f /home/kevinz/ntp-4.2.2p4/scripts/stats/peer.awk peerstats
           ident     cnt     mean      rms      max    delay     dist     disp
    ==========================================================================
    67.128.71.75     715  2364.263  996.393 2364.263   16.751  947.994   96.315
    127.127.1.0      844     0.000    0.000    0.000    0.000    0.990    0.957
    192.5.41.41      728  2321.937 1023.300 2964.896    7.907  944.371   94.581
    137.208.3.51     675  2500.134  995.134 3013.699  124.989  503.205   44.951
    128.220.2.7      721  2405.904  982.298 2712.599    2.564  940.869   95.476
    69.178.15.221    722  2395.872 1002.130 2911.691  104.730  997.703   95.423
    cn2:/var/log/ntpstats#

    Are there any other diagnostics that I could run to help identify any
    problem?

    Thank you all for your help, advice and suggestions.

    -Kevin

    Kevin Zembower
    Internet Services Group manager
    Center for Communication Programs
    Bloomberg School of Public Health
    Johns Hopkins University
    111 Market Place, Suite 310
    Baltimore, Maryland 21202
    410-659-6139
    _______________________________________________
    questions mailing list
    questions@lists.ntp.isc.org
    https://lists.ntp.isc.org/mailman/listinfo/questions

  2. Re: Is this normal behavior?

    The behavior you see is not normal.

    I forget the 'sign' convention for steps, so I'm not sure whether you
    are gaining or losing time.

    Either way, again, not normal.

    Please see:

    http://ntp.isc.org/Support/TroubleshootingNTP

    in particular, the first two subchapters (known hardware issues and known OS
    issues).

    H

  3. Re: Is this normal behavior?

    Zembower, Kevin wrote:

    > I'm not very knowledgeable about NTP, but I'm suspicious of its
    > implementation on one of my servers. I have one server running the
    > Debian sarge package of NTP:
    > cn2:/var/log# ntpq
    > ntpq> version
    > ntpq 4.2.0a@1:4.2.0a+stable-2-r Fri Aug 26 10:30:19 UTC 2005 (1)
    > ntpq>
    >
    > I've tried to set this server up as a timeserver for my network, using
    > tock.usno.mil, a time server at my institution (Johns Hopkins
    > University) and the pool timeservers:
    > ntpq> peers
    >      remote           refid      st t when poll reach   delay   offset  jitter
    > ==============================================================================
    > +jhname.hcf.jhu. 128.4.1.1        2 u   60   64  177    3.313  835.818 545.232
    > *ntp1.usno.navy. .USNO.           1 u   60   64  177    8.567  827.174 551.616
    > +trane.wu-wien.a 195.13.1.153     3 u   57   64  177  125.292  841.188 548.251
    > +221-15-178-69.g 140.142.16.34    2 u   50   64  177  107.300  1212.00 395.490
    > +tock.jrc.us     207.168.62.76    2 u   58   64  177   16.942  1020.49 432.251
    >  LOCAL(0)        LOCAL(0)        13 l   55   64  177    0.000    0.000   0.002
    > ntpq>


    If you are going to use pool servers, you will be much better off using
    the "US" sub pool. The round trip delays to Europe are far too large
    for really good time keeping.

    >
    > I notice the problem here, and also when I run 'watch ntpq -p'. Seldom
    > is my reachability 377, and it frequently and inexplicably drops to 1
    > as I'm watching it. Ten minutes ago reachability was 177, as shown
    > above. Watching it in the last minute, it dropped from 377 to 1 on all
    > sources.
    > Now it's:
    > Every 2s: ntpq -p                                Fri Dec  8 10:35:18 2006
    >
    >      remote           refid      st t when poll reach   delay   offset  jitter
    > ==============================================================================
    >  jhname.hcf.jhu. 128.4.1.1        2 u   53   64    3    2.707  660.049 176.587
    >  ntp1.usno.navy. .USNO.           1 u   52   64    3    8.904  663.193 181.873
    >  trane.wu-wien.a 195.13.1.153     3 u   49   64    3  125.979  681.070 186.182
    >  221-15-178-69.g 140.142.16.34    2 u   51   64    3  104.207  485.281 183.834
    >  tock.jrc.us     207.168.62.76    2 u   50   64    3   16.912  668.565 186.520
    >  LOCAL(0)        LOCAL(0)        13 l   49   64    3    0.000    0.000   0.002
    >

    Next, you do seem to have a reachability problem. This is NOT normal
    behavior. Since all the configured servers, including the local clock,
    exhibit the problem it is clearly not a problem with the servers. I'm
    inclined to suspect a bug in the version you are running, or a problem
    with your O/S. The fact that the local clock is affected suggests that
    the problem is internal to your system.

    > Is this normal behavior for NTP, to frequently lose the ability to reach
    > a timeserver? If not, how can I troubleshoot it further?
    >
    > Here are the syslog entries pertaining to ntp for just one hour this
    > morning:
    > Dec 8 09:03:06 cn2 ntpd[16955]: time reset +3.120367 s
    > Dec 8 09:07:21 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    > Dec 8 09:08:25 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    > Dec 8 09:09:31 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 09:23:33 cn2 ntpd[16955]: time reset +3.503628 s
    > Dec 8 09:27:51 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    > Dec 8 09:28:56 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 09:36:18 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
    > Dec 8 09:36:25 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 09:40:40 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
    > Dec 8 09:41:45 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 09:41:47 cn2 ntpd[16955]: synchronized to 69.178.15.221, stratum
    > 2
    > Dec 8 09:43:46 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 09:49:21 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    > Dec 8 09:49:21 cn2 ntpd[16955]: time reset +4.512453 s
    > Dec 8 09:53:41 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    > Dec 8 09:54:39 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
    > Dec 8 09:54:44 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 10:01:11 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    > Dec 8 10:02:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 10:09:40 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    > Dec 8 10:10:51 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 10:10:51 cn2 ntpd[16955]: time reset +3.950081 s
    > Dec 8 10:15:08 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    > Dec 8 10:16:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    >
    > These time resets seem rather large to me. Is this normal, too?


    The large resets are not normal either. These are usually observed when
    the system in question is losing clock interrupts. Both Windows and
    Linux have been known to exhibit this problem. In the case of Linux
    there is a kernel parameter called "HZ" which, if set to 250 or 1000,
    increases the probability that interrupts can be masked or disabled for
    two consecutive clock "ticks". The fix is to set HZ to 100.
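    As a sketch, you can check the compiled-in tick rate and see why lost
    ticks add up so quickly; the /boot config-file path is an assumption
    that varies by distribution, and the awk line is only back-of-envelope
    arithmetic, not anything ntpd reports:

```shell
# Compiled-in tick rate (the config file path is distribution-dependent):
grep 'CONFIG_HZ' "/boot/config-$(uname -r)" 2>/dev/null

# At HZ=1000 each missed interrupt loses 1 ms of wall time, so losing
# about 3 ticks per second alone accounts for ~3000 ppm of apparent drift:
awk 'BEGIN { hz = 1000; lost_per_sec = 3; printf "%.0f ppm\n", lost_per_sec / hz * 1e6 }'
```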
    >
    > Here's the output of the peer.awk script:
    > cn2:/var/log/ntpstats# gawk -f /home/kevinz/ntp-4.2.2p4/scripts/stats/peer.awk peerstats
    >        ident     cnt     mean      rms      max    delay     dist     disp
    > ==========================================================================
    > 67.128.71.75     715  2364.263  996.393 2364.263   16.751  947.994   96.315
    > 127.127.1.0      844     0.000    0.000    0.000    0.000    0.990    0.957
    > 192.5.41.41      728  2321.937 1023.300 2964.896    7.907  944.371   94.581
    > 137.208.3.51     675  2500.134  995.134 3013.699  124.989  503.205   44.951
    > 128.220.2.7      721  2405.904  982.298 2712.599    2.564  940.869   95.476
    > 69.178.15.221    722  2395.872 1002.130 2911.691  104.730  997.703   95.423


    These numbers are NOT good.

    The first thing to do is to fix HZ if you are running Linux. If you are
    not running Linux please tell us what you ARE running.

    If fixing HZ does not improve performance, repost with more details as
    to hardware, O/S, network, etc.

  4. Re: Is this normal behavior?

    In article <2E8AE992B157C0409B18D0225D0B476304C57680@XCH-VN01.sph.ad.jhsph.edu>,
    kzembowe@jhuccp.org (Zembower, Kevin) wrote:

    > Dec 8 09:03:06 cn2 ntpd[16955]: time reset +3.120367 s
    > Dec 8 09:23:33 cn2 ntpd[16955]: time reset +3.503628 s


    You have a serious problem with your machine running slow. On Linux this
    is often due to lost clock interrupts as a result of using a higher HZ
    figure in the kernel than the disk driver can support. It could also
    mean a broken motherboard clock, the effects of power management, a wrong
    value having been calculated for the CPU frequency, etc. The fact that
    you report high but intermediate offsets tends to rule out the
    possibility that you have conflicting clock synchronisation software.

    > *ntp1.usno.navy. .USNO. 1 u 60 64 177 8.567 827.174
    > 551.616


    Do you meet the rules of engagement for using a stratum-one server
    (although this one tends to be overloaded, and not particularly good as
    a result)? In any case, note that the offset has already reached 827 ms.

    > +trane.wu-wien.a 195.13.1.153 3 u 57 64 177 125.292 841.188
    > 548.251
    > +221-15-178-69.g 140.142.16.34 2 u 50 64 177 107.300 1212.00
    > 395.490


    These two servers are too far away to be useful, given that you can
    achieve single-figure delays to other servers.

    > I notice the problem here, and also when I run 'watch ntpq -p'. Seldom is
    > my reachability 377, and it frequently and inexplicably drops to 1 as I'm


    This is because the offset becomes unacceptably high, and a step is
    initiated, before reachability ever gets to 377. Whenever the clock is
    stepped (which is never desirable, after the initial synchronisation)
    the states of the servers are discarded and ntpd starts over (but with
    updated frequency and offset estimates).

    > Is this normal behavior for NTP, to frequently lose the ability to reach
    > a timeserver? If not, how can I troubleshoot it further?


    What's probably happening here is that each server is rejected
    in turn. Server hopping does happen, but not like this.

    > These time resets seem rather large to me. Is this normal, too?


    This is the fundamental symptom.

    > Are there any other diagnostics that I could run to help identify any
    > problem?


    Check if the rate of loss correlates with any form of system activity
    (particularly IDE disks).

    Disable any power management features.

    Make sure that HZ=100 or rebuild the kernel to make it so.

    Check the clock behaviour running MS-DOS or the oldest available Windows
    (basically to avoid all device activity and use quite large ticks). If
    it loses at more than 450 ppm, get it working in that environment before
    running the normal system. (Actually, you can correct pure frequency
    errors of more than this, but a good machine should be within about
    20 ppm and the worst I've seen is about 300 ppm, so this large an error
    probably indicates a system that is too unreliable for the job.)

    Check the frequency correction. If it is not on the 500 ppm end stop,
    it may indicate that your time loss is intermittent.

    If you meet the conditions for using stratum-one public servers, it
    would probably be a good idea to dedicate a machine to being the site
    stratum-two server. This can be of relatively low specification (well,
    actually very low), which means that it is much less likely to suffer
    from the more technical causes of this sort of problem.

    Read the recent thread that concluded that a power-management-related
    parameter can sometimes avoid this problem.

  5. Re: Is this normal behavior?

    In article ,
    Richard B. Gilbert wrote:

    > Next, you do seem to have a reachability problem. This is NOT normal


    He doesn't have a reachability problem. He has a step reset problem.

    The servers are perfectly reachable when he tries to access them, but
    the reachability data is cleared on each clock step.
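    For reference, the 'reach' column is an eight-bit shift register printed
    in octal, one bit per poll, so any value can be decoded by hand. A small
    sketch (bash arithmetic assumed; the value 177 is taken from the
    original post):

```shell
# Decode ntpq's octal 'reach' value into the last eight poll results
# (1 = reply received, 0 = missed; oldest poll on the left).
reach=177                      # octal, as printed by 'ntpq -p'
dec=$((8#$reach))              # bash octal-to-decimal conversion
bits=""
for i in 7 6 5 4 3 2 1 0; do
    bits="$bits$(( (dec >> i) & 1 ))"
done
echo "reach $reach (octal) -> $bits"
```

    So 177 decodes to 01111111: seven replies in a row after one gap. After
    a step the register is cleared, which is why a single successful poll
    afterwards shows as reach=1.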

  6. Re: Is this normal behavior?

    David Woolley wrote:
    > In article ,
    > Richard B. Gilbert wrote:
    >
    >
    >>Next, you do seem to have a reachability problem. This is NOT normal

    >
    >
    > He doesn't have a reachability problem. He has a step reset problem.
    >
    > The servers are perfectly reachable when he tries to access them, but
    > the reachability data is cleared on each clock step.


    Okay, I've never observed this stepping problem since I run my server
    on Solaris rather than Linux.

  7. Re: Is this normal behavior?

    Zembower, Kevin wrote:
    > I'm not very knowledgeable about NTP, but I'm suspicious of its
    > implementation on one of my servers. I have one server running the
    > Debian sarge package of NTP:
    > cn2:/var/log# ntpq
    > ntpq> version
    > ntpq 4.2.0a@1:4.2.0a+stable-2-r Fri Aug 26 10:30:19 UTC 2005 (1)
    > ntpq>
    >
    > I've tried to set this server up as a timeserver for my network, using
    > tock.usno.mil, a time server at my institution (Johns Hopkins
    > University) and the pool timeservers:
    > ntpq> peers
    >      remote           refid      st t when poll reach   delay   offset  jitter
    > ==============================================================================
    > +jhname.hcf.jhu. 128.4.1.1        2 u   60   64  177    3.313  835.818 545.232
    > *ntp1.usno.navy. .USNO.           1 u   60   64  177    8.567  827.174 551.616
    > +trane.wu-wien.a 195.13.1.153     3 u   57   64  177  125.292  841.188 548.251
    > +221-15-178-69.g 140.142.16.34    2 u   50   64  177  107.300  1212.00 395.490
    > +tock.jrc.us     207.168.62.76    2 u   58   64  177   16.942  1020.49 432.251
    >  LOCAL(0)        LOCAL(0)        13 l   55   64  177    0.000    0.000   0.002
    > ntpq>
    >
    > I notice the problem here, and also when I run 'watch ntpq -p'. Seldom
    > is my reachability 377, and it frequently and inexplicably drops to 1
    > as I'm watching it. Ten minutes ago reachability was 177, as shown
    > above. Watching it in the last minute, it dropped from 377 to 1 on all
    > sources.
    > Now it's:
    > Every 2s: ntpq -p                                Fri Dec  8 10:35:18 2006
    >
    >      remote           refid      st t when poll reach   delay   offset  jitter
    > ==============================================================================
    >  jhname.hcf.jhu. 128.4.1.1        2 u   53   64    3    2.707  660.049 176.587
    >  ntp1.usno.navy. .USNO.           1 u   52   64    3    8.904  663.193 181.873
    >  trane.wu-wien.a 195.13.1.153     3 u   49   64    3  125.979  681.070 186.182
    >  221-15-178-69.g 140.142.16.34    2 u   51   64    3  104.207  485.281 183.834
    >  tock.jrc.us     207.168.62.76    2 u   50   64    3   16.912  668.565 186.520
    >  LOCAL(0)        LOCAL(0)        13 l   49   64    3    0.000    0.000   0.002
    >
    > Is this normal behavior for NTP, to frequently lose the ability to reach
    > a timeserver? If not, how can I troubleshoot it further?



    Hell no! You seem to have a serious network problem of some sort. Can
    you ping these servers? At a time when the problem is manifesting itself?

    It has nothing to do with your present problem but the pool has
    presented you with servers several thousand miles away from you. The
    delays are such that it is most unlikely that these servers will ever be
    selected. I believe that the pool allows you to specify servers in the
    US (other areas as well, but you, being in the US, should be using
    nearby servers). You should find out how, and configure accordingly. I
    think you just stick a "us" in front of "pool.ntp.org". If you want to
    play guessing games you might try us.pool.ntp.org.
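    As a sketch, the relevant /etc/ntp.conf lines might look like the
    following; the numbered us-pool hostnames are an assumption on my part
    (check pool.ntp.org for the current naming), and 'iburst' just speeds
    up initial synchronisation:

```
# Hypothetical US-pool configuration; hostnames assumed, verify first.
server 0.us.pool.ntp.org iburst
server 1.us.pool.ntp.org iburst
server 2.us.pool.ntp.org iburst
```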

    >
    > Here are the syslog entries pertaining to ntp for just one hour this
    > morning:
    > Dec 8 09:03:06 cn2 ntpd[16955]: time reset +3.120367 s
    > Dec 8 09:07:21 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    > Dec 8 09:08:25 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    > Dec 8 09:09:31 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 09:23:33 cn2 ntpd[16955]: time reset +3.503628 s
    > Dec 8 09:27:51 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    > Dec 8 09:28:56 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 09:36:18 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
    > Dec 8 09:36:25 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 09:40:40 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
    > Dec 8 09:41:45 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 09:41:47 cn2 ntpd[16955]: synchronized to 69.178.15.221, stratum
    > 2
    > Dec 8 09:43:46 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 09:49:21 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    > Dec 8 09:49:21 cn2 ntpd[16955]: time reset +4.512453 s
    > Dec 8 09:53:41 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    > Dec 8 09:54:39 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
    > Dec 8 09:54:44 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 10:01:11 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    > Dec 8 10:02:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 10:09:40 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    > Dec 8 10:10:51 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    > Dec 8 10:10:51 cn2 ntpd[16955]: time reset +3.950081 s
    > Dec 8 10:15:08 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    > Dec 8 10:16:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    >
    > These time resets seem rather large to me. Is this normal, too?


    That's not normal either. Nor is the fact that your system is "clock
    hopping".

    >
    > Here's the output of the peer.awk script:
    > cn2:/var/log/ntpstats# gawk -f /home/kevinz/ntp-4.2.2p4/scripts/stats/peer.awk peerstats
    >        ident     cnt     mean      rms      max    delay     dist     disp
    > ==========================================================================
    > 67.128.71.75     715  2364.263  996.393 2364.263   16.751  947.994   96.315
    > 127.127.1.0      844     0.000    0.000    0.000    0.000    0.990    0.957
    > 192.5.41.41      728  2321.937 1023.300 2964.896    7.907  944.371   94.581
    > 137.208.3.51     675  2500.134  995.134 3013.699  124.989  503.205   44.951
    > 128.220.2.7      721  2405.904  982.298 2712.599    2.564  940.869   95.476
    > 69.178.15.221    722  2395.872 1002.130 2911.691  104.730  997.703   95.423


    Excuse me while I barf! I can't tell what's wrong but something is FUBAR!

    What kind of internet connection do you have? Tin cans and string?


  8. Re: Is this normal behavior?

    In article <45E8AAC2.2050905@comcast.net>, "Richard B. Gilbert"
    writes:
    >Zembower, Kevin wrote:
    >> I notice the problem here, and also when I run 'watch ntpq -p'. Seldom
    >> is my reachability 377, and it frequently and inexplicably drops to 1
    >> as I'm watching it. Ten minutes ago reachability was 177, as shown
    >> above. Watching it in the last minute, it dropped from 377 to 1 on
    >> all sources.

    [snip]
    >>
    >> Is this normal behavior for NTP, to frequently lose the ability to reach
    >> a timeserver? If not, how can I troubleshoot it further?

    >
    >
    >Hell no! You seem to have a serious network problem of some sort. Can
    >you ping these servers? At a time when the problem is manifesting itself?


    Uh, you're suggesting that his network is so bad that it takes back
    packets that have already been received? Obviously this has nothing to
    do with network connectivity, the explanation is below.

    >> Dec 8 09:49:21 cn2 ntpd[16955]: time reset +4.512453 s
    >> Dec 8 09:53:41 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    >> Dec 8 09:54:39 cn2 ntpd[16955]: synchronized to 128.220.2.7, stratum 2
    >> Dec 8 09:54:44 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    >> Dec 8 10:01:11 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    >> Dec 8 10:02:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    >> Dec 8 10:09:40 cn2 ntpd[16955]: synchronized to 67.128.71.75, stratum 2
    >> Dec 8 10:10:51 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    >> Dec 8 10:10:51 cn2 ntpd[16955]: time reset +3.950081 s
    >> Dec 8 10:15:08 cn2 ntpd[16955]: synchronized to LOCAL(0), stratum 13
    >> Dec 8 10:16:14 cn2 ntpd[16955]: synchronized to 192.5.41.41, stratum 1
    >>
    >> These time resets seem rather large to me. Is this normal, too?

    >
    >That's not normal either.


    Moreover, it's the reason for the 'reach' register dropping to 1 - every
    time ntpd does a reset, it effectively starts over with its
    calculations, intentionally resetting all previous state including
    'reach'.

    And of course the resets reveal the real problem here - your (Kevin's)
    clock is drifting like crazy. The fact that the resets are large is one
    problem indicator, but in normal operation there shouldn't be any resets
    at all, and the thing to take special note of is the relation between
    the size of the resets and the interval between them.

    In this case, the last reset was +~4 seconds, 21.5 minutes after the
    previous one, which means that your clock is slow by 4/1290, or about
    3100 ppm (parts per million) - way beyond the 500 ppm limit within
    which ntpd can operate without resets. You may have a hardware problem,
    but it's probably more likely that you're losing clock interrupts.
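    That arithmetic can be checked directly; nothing here is ntpd-specific,
    it's just the step size divided by the interval between steps, using
    the exact +3.950081 s step from the log:

```shell
# Drift rate implied by a +3.950081 s step occurring 21.5 minutes
# (1290 s) after the previous one: step / interval, in parts per million.
awk 'BEGIN { step = 3.950081; interval = 1290; printf "%.0f ppm\n", step / interval * 1e6 }'
```

    This prints roughly 3062 ppm, consistent with the ~3100 ppm figure
    above, which used the rounded 4 s step.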

    A reportedly common case of this is running Linux with a high "clock
    rate", a.k.a. "HZ", like 1000, which seems to be popular in recent
    distributions. Dropping it down to a more traditional 100, or at least
    250, is said to help. Another cause may be high disk I/O activity on an
    OS where disk drivers lock out clock interrupts for long periods;
    making sure to use DMA (if supported by driver and hardware, of course)
    can help in this case.

    --Per Hedeland
    per@hedeland.org

