High NTP drift values, time resets and hwclock command - NTP



  1. High NTP drift values, time resets and hwclock command

    Hello,

    I know that high NTP drift values and time resets are a frequently
    discussed topic. Nevertheless, I need some help with my specific NTP
    configuration problem.

    I have a multi-blade shelf where I try to synchronize the time on each
    blade to one reliable time reference.

    Let's say two blades are connected to the outside world and can, in
    theory, reach external NTP servers. The other blades behind them can
    only get the time from these two servers.
    Because I don't know which one is up and running, I simply configured
    both NTP servers for the blades behind.

    The two NTP servers that reach the external NTP servers peer with each
    other, so that each can synchronize to its partner if it cannot reach
    the external servers itself.
    Additionally, I want the other blades to synchronize their time to the
    front servers even if these servers do not have a valid external time
    signal. Therefore the following configuration applies to the front
    servers:

         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     127.127.1.0     LOCAL(0)        11 l   10   64  377    0.000    0.000   0.001
    +10.168.105.52   192.168.130.172  4 u  134  256  376    0.215   -1.613   0.231
    *192.168.130.172 139.21.3.139     3 u   80  256  377    0.204   -1.643   0.125
    +192.168.130.173 139.21.3.139     3 u   24  256  377    0.174   -1.309   0.090
    -192.168.130.178 139.21.3.139     3 u   29  256  377    0.196   -0.683   0.087
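
The "reach" column in the table above is an 8-bit shift register printed in octal: each bit records whether one of the last eight polls was answered, with the newest poll in the lowest bit. A minimal Python sketch for decoding it follows; the function name decode_reach is only illustrative and is not part of any NTP tool.

```python
def decode_reach(reach_octal):
    """Return the last eight poll results, most recent first.

    True means a reply was received for that poll.
    """
    value = int(reach_octal, 8) & 0xFF
    # Bit 0 is the most recent poll, bit 7 the oldest one still remembered.
    return [bool((value >> bit) & 1) for bit in range(8)]


for reach in ("377", "376"):
    history = decode_reach(reach)
    print(reach, "".join("1" if ok else "0" for ok in history))
```

Read this way, 377 means all of the last eight polls were answered, while the 376 shown for 10.168.105.52 means the most recent poll has no reply recorded.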


    The other front server has a similar configuration:

         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     127.127.1.0     LOCAL(0)        12 l   14   64  377    0.000    0.000   0.001
    -10.168.105.60   192.168.130.172  4 u  156  256  377    0.136    1.909   0.249
    *192.168.130.172 139.21.3.139     3 u  440 1024  377    0.255    0.364   0.530
    +192.168.130.173 139.21.3.139     3 u  313 1024  377    0.195    0.632   0.425
    +192.168.130.178 139.21.3.139     3 u  424 1024  377    0.219    1.249   0.659


    The servers behind the front servers all have the following
    configuration:

         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    *10.168.105.60   192.168.130.172  4 u  150  256  377    0.122   25.306  17.234
    +10.168.105.52   192.168.130.172  4 u  160  256  377    0.109   32.352  23.496

    I know that having only 2 servers is not really optimal, but I need the
    redundancy for the servers behind the front servers, and I know that
    configuring a shared IP is not allowed because it can seriously
    confuse NTP.

    Additionally I observed time resets:

    Jul 26 12:16:41 sb1-1 ntpd[10225]: time reset +0.481624 s
    Jul 26 12:18:00 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3
    Jul 26 12:36:16 sb1-1 ntpd[10225]: time reset -0.197015 s
    Jul 26 12:37:35 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3
    Jul 26 13:07:42 sb1-1 ntpd[10225]: time reset +0.263151 s
    Jul 26 13:09:02 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3

    My ntp.conf file of the front server looks as follows:

    restrict default ignore
    tinker panic 0
    restrict 127.0.0.1
    # allow access from dependent blades via internal ip addresses
    restrict 10.168.105.0 mask 255.255.248.0 nomodify notrap
    server 127.127.1.0
    fudge 127.127.1.0 stratum 11
    driftfile /etc/ntp.drift
    broadcastdelay 0.008
    peer 10.168.105.52
    restrict 10.168.105.52
    server 192.168.130.172 prefer iburst
    restrict 192.168.130.172 nomodify
    server 192.168.130.173 iburst
    restrict 192.168.130.173 nomodify
    server 192.168.130.178 iburst
    restrict 192.168.130.178 nomodify


    and for the servers behind
    restrict default ignore
    tinker panic 0
    restrict 127.0.0.1
    server 10.168.105.60 iburst
    restrict 10.168.105.60 nomodify
    server 10.168.105.52 iburst
    restrict 10.168.105.52 nomodify
    driftfile /etc/ntp.drift
    broadcastdelay 0.008


    So can anybody tell me whether the configuration still contains NTP
    don'ts? I sometimes have trouble with high NTP drift values.
    Maybe the local clock is bad.
    Additionally, I run hwclock --systohc every 10 minutes from a cron job
    to update the hardware clock periodically, so that it is still valid if
    a blade is not shut down properly and the shutdown script is not
    executed at all. Could the hwclock call confuse NTP?

    Thank you for your answers.


  2. Re: High NTP drift values, time resets and hwclock command

    Sergio Ferruchi wrote:
    > My ntp.conf file of the front server looks as follows:
    >
    > restrict default ignore
    > tinker panic 0
    > restrict 127.0.0.1
    > # allow access from dependent blades via internal ip addresses
    > restrict 10.168.105.0 mask 255.255.248.0 nomodify notrap
    > server 127.127.1.0
    > fudge 127.127.1.0 stratum 11
    > driftfile /etc/ntp.drift
    > broadcastdelay 0.008
    > peer 10.168.105.52
    > restrict 10.168.105.52
    > server 192.168.130.172 prefer iburst
    > restrict 192.168.130.172 nomodify
    > server 192.168.130.173 iburst
    > restrict 192.168.130.173 nomodify
    > server 192.168.130.178 iburst
    > restrict 192.168.130.178 nomodify
    >
    >
    > and for the servers behind
    > restrict default ignore
    > tinker panic 0
    > restrict 127.0.0.1
    > server 10.168.105.60 iburst
    > restrict 10.168.105.60 nomodify
    > server 10.168.105.52 iburst
    > restrict 10.168.105.52 nomodify
    > driftfile /etc/ntp.drift
    > broadcastdelay 0.008
    >



    Lose all restrict statements. You can add them back in when you have
    things working to your satisfaction and when you understand what each
    one is doing.

    Why do you have broadcastdelay specified? You didn't configure a
    broadcast client so what's it for? Why are you using tinker panic 0
    instead of using the -g option in the startup?

    >
    > So can anybody tell me whether the configuration still contains NTP do
    > nots, because I have sometimes throubles with high ntp drift values.
    > Maybe the local clock is bad.


    You need to start with a simple ntp.conf file and then add complexity
    only if you really need it. In your case you are using private addresses
    so you really don't need to bother with restrict statements in the first
    place unless you cannot trust other people in your local network in
    which case you have a social/HR problem rather than a technical one.

    > Additionally I use a hwclock -systohc command every 10 minutes by a
    > cron job to update the hwclock periodically to be able to have a valid
    > hardware clock in case of the blade is not shutdown properly and
    > shutdown script is not executed at all. Could it be that hwclock call
    > can confuse the NTP protocol?
    >


    Never do this. You need to leave it to NTP to control the clock. It
    knows better what it's doing. It may very well be the main cause of your
    problems. Who advised you to do this?

    Danny


  3. Re: High NTP drift values, time resets and hwclock command

    Danny Mayer wrote:
    > Sergio Ferruchi wrote:
    >
    >>My ntp.conf file of the front server looks as follows:
    >>
    >>restrict default ignore


    >
    >>Additionally I use a hwclock -systohc command every 10 minutes by a
    >>cron job to update the hwclock periodically to be able to have a valid
    >>hardware clock in case of the blade is not shutdown properly and
    >>shutdown script is not executed at all. Could it be that hwclock call
    >>can confuse the NTP protocol?
    >>

    >
    >
    > Never do this. You need to leave it to NTP to control the clock. It
    > knows better what it's doing. It may very well be the main cause of your
    > problems. Who advised you to do this?
    >


    He's talking about the hardware clock, the one that runs on a battery
    when the power is off and is used, or can be used, for an initial
    approximation of the correct time. AFAIK ntpd does not control that and
    setting it should not affect ntpd.

    OTOH setting it every ten minutes seems like overkill; if it gains or
    loses a significant amount of time in ten minutes, it's going to be
    REALLY wrong after the power has been off for an hour or two.

  4. Re: High NTP drift values, time resets and hwclock command

    On 2006-07-26, Sergio Ferruchi wrote:

    > The server behind the front server all have following configuration:
    > remote refid st t when poll reach delay offset jitter
    >================================================== ======================
    > *10.168.105.60 192.168.130.172 4 u 150 256 377 0.122 25.306 17.234
    > +10.168.105.52 192.168.130.172 4 u 160 256 377 0.109 32.352 23.496


    This is only a snapshot of the current peer stats. You need to watch
    this over time to see how it changes.

    I'd be a bit concerned about a 25ms offset to a time server in the same
    rack.
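
Watching the peer stats over time is easier with a small script. The following Python sketch parses lines in the `ntpq -p` layout quoted above and flags peers whose offset exceeds a limit. The 10 ms limit and the names Peer, parse_peer and suspicious are assumptions chosen for illustration, not values taken from ntpd.

```python
from typing import NamedTuple


class Peer(NamedTuple):
    tally: str        # selection flag in column 0: '*', '+', '-', ' ' ...
    remote: str
    offset_ms: float
    jitter_ms: float


def parse_peer(line):
    """Parse one peer line of `ntpq -p` output (header lines excluded)."""
    tally = line[0]
    # Columns: remote refid st t when poll reach delay offset jitter
    fields = line[1:].split()
    return Peer(tally, fields[0], float(fields[8]), float(fields[9]))


def suspicious(lines, limit_ms=10.0):
    """Return the peers whose absolute offset is above limit_ms."""
    return [p for p in map(parse_peer, lines) if abs(p.offset_ms) > limit_ms]


SAMPLE = [
    "*10.168.105.60 192.168.130.172 4 u 150 256 377 0.122 25.306 17.234",
    "+10.168.105.52 192.168.130.172 4 u 160 256 377 0.109 32.352 23.496",
]

for peer in suspicious(SAMPLE):
    print(peer.remote, peer.offset_ms, peer.jitter_ms)
```

Running something like this from cron against fresh `ntpq -p` output, and logging the result, would show whether the large offsets persist or only appear shortly after a step.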

    > Jul 26 12:16:41 sb1-1 ntpd[10225]: time reset +0.481624 s
    > Jul 26 12:18:00 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
    > stratum 3
    > Jul 26 12:36:16 sb1-1 ntpd[10225]: time reset -0.197015 s
    > Jul 26 12:37:35 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
    > stratum 3
    > Jul 26 13:07:42 sb1-1 ntpd[10225]: time reset +0.263151 s
    > Jul 26 13:09:02 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
    > stratum 3


    This means that 'sb1-1' has drifted more than the default step threshold
    (128ms). Does this occur only on the "clients" or the "servers"?

    If the resets (steps, actually) were always in the same direction and of
    roughly the same magnitude it could be a matter of a tick adjustment.
    But since the steps are divergent it's likely something else.
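
The reasoning above can be checked mechanically. This Python sketch pulls the step sizes out of the quoted syslog lines, compares them with the 128 ms step threshold, and tests whether they all point in the same direction. The regular expression is only matched against the "time reset" message format shown in this thread.

```python
import re

STEP_THRESHOLD_S = 0.128  # default ntpd step threshold mentioned above
RESET_RE = re.compile(r"time reset ([+-]\d+\.\d+) s")

LOG = """\
Jul 26 12:16:41 sb1-1 ntpd[10225]: time reset +0.481624 s
Jul 26 12:36:16 sb1-1 ntpd[10225]: time reset -0.197015 s
Jul 26 13:07:42 sb1-1 ntpd[10225]: time reset +0.263151 s
"""


def step_offsets(log_text):
    """Return the signed step sizes (seconds) found in the log text."""
    return [float(m.group(1)) for m in RESET_RE.finditer(log_text)]


def same_direction(offsets):
    """True if every step has the same sign."""
    return all(o > 0 for o in offsets) or all(o < 0 for o in offsets)


offsets = step_offsets(LOG)
print(offsets)
print("all above threshold:", all(abs(o) > STEP_THRESHOLD_S for o in offsets))
print("same direction:", same_direction(offsets))
```

For the three resets quoted here, every step exceeds 128 ms but the signs differ, which is why a simple tick adjustment is unlikely to be the fix.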

    What OS / (kernel) version are you running? Does the hardware have any
    sort of power-management, variable processor speed, etc. ?

    > My ntp.conf file of the front server looks as follows:


    aka "servers"

    > driftfile /etc/ntp.drift


    Some people feel that daemons have no business writing to /etc and
    should use /var. But this is not a problem.

    > broadcastdelay 0.008


    Unneeded but not a problem.

    > tinker panic 0


    This command modifies the ntpd panic threshold (which is normally 1000
    seconds). Setting this to 0 disables the panic sanity check and a clock
    offset of any value will be accepted.

    Why do you feel you need this?
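
As a rough illustration of how the two thresholds interact, here is a deliberately simplified Python model. It assumes the 128 ms step threshold mentioned earlier and the 1000 s panic default from ntpd's documentation, and it ignores the stepout delay, the -g and -x startup options, and everything else the real clock discipline does. The function name react_to_offset is made up for this sketch.

```python
STEP_THRESHOLD_S = 0.128    # above this, ntpd steps instead of slewing
PANIC_THRESHOLD_S = 1000.0  # documented default of "tinker panic"


def react_to_offset(offset_s, panic_s=PANIC_THRESHOLD_S, step_s=STEP_THRESHOLD_S):
    """Simplified model: what happens for a given clock offset in seconds."""
    magnitude = abs(offset_s)
    # "tinker panic 0" is modelled as panic_s == 0: the sanity check is off.
    if panic_s > 0 and magnitude > panic_s:
        return "exit"
    if magnitude > step_s:
        return "step"
    return "slew"


for offset in (0.05, 0.481624, 5000.0):
    print(offset, react_to_offset(offset), react_to_offset(offset, panic_s=0))
```

With panic set to 0, even a wildly wrong offset is stepped without complaint, so a broken upstream server can drag the clock anywhere. The -g option relaxes the check only for the first adjustment after startup.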

    > server 127.127.1.0
    > fudge 127.127.1.0 stratum 11


    You've correctly fudged the LocalCLK to a reasonable stratum. You may
    wish to fudge the LocalCLK on the two (front) servers to different
    strata (i.e. one to 11 and the other to 12) so that the clients will
    follow one of the (front) servers.

    > restrict default ignore
    > restrict 127.0.0.1
    > # allow access from dependent blades via internal ip addresses
    > restrict 10.168.105.0 mask 255.255.248.0 nomodify notrap


    OK. You may wish to review the explanation of 'nomodify' at
    http://ntp.isc.org/Support/AccessRestrictions.
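
One detail worth checking in that restrict line: 10.168.105.0 is not the base address of a 255.255.248.0 network. The Python sketch below uses the standard ipaddress module to show which range the address and mask describe once the host bits are cleared. Whether ntpd clears them in the same way is an assumption here, not something confirmed in this thread.

```python
import ipaddress

# strict=False clears the host bits instead of raising an error.
net = ipaddress.ip_network("10.168.105.0/255.255.248.0", strict=False)

print(net)               # network in prefix notation
print(net[0], net[-1])   # first and last address of the range

for host in ("10.168.105.52", "10.168.105.60", "10.168.112.1"):
    print(host, ipaddress.ip_address(host) in net)
```

The range works out to 10.168.104.0 through 10.168.111.255, so both front servers are covered. Writing the base address 10.168.104.0 in ntp.conf would avoid any ambiguity.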

    > peer 10.168.105.52
    > restrict 10.168.105.52


    OK

    > server 192.168.130.172 prefer iburst
    > restrict 192.168.130.172 nomodify
    > server 192.168.130.173 iburst
    > restrict 192.168.130.173 nomodify
    > server 192.168.130.178 iburst
    > restrict 192.168.130.178 nomodify


    OK

    > and for the servers behind


    aka "clients"

    > driftfile /etc/ntp.drift
    > broadcastdelay 0.008
    > tinker panic 0


    See my comments above.

    > restrict default ignore
    > restrict 127.0.0.1
    > server 10.168.105.60 iburst
    > restrict 10.168.105.60 nomodify
    > server 10.168.105.52 iburst
    > restrict 10.168.105.52 nomodify


    OK

    --
    Steve Kostecke
    NTP Public Services Project - http://ntp.isc.org/

  5. Re: High NTP drift values, time resets and hwclock command

    First of all, thank you all for answering my questions.

    Danny Mayer Part

    > In your case you are using private addresses
    > so you really don't need to bother with restrict statements in the first
    > place unless you cannot trust other people in your local network in
    > which case you have a social/HR problem rather than a technical one.


    OK, but I don't think I have an access permission problem. The clients
    can access the servers. I just want to block access from servers or
    clients that should not reach them.

    > Never do this. You need to leave it to NTP to control the clock. It
    > knows better what it's doing. It may very well be the main cause of your
    > problems. Who advised you to do this?


    The problem was that some clients were never shut down for some reason.
    That's why the hardware clock was never updated, and with every startup
    a totally wrong timestamp was set.


    Richard B. Gilbert

    > OTOH setting it every ten minutes seems like overkill; if it gains or
    > loses a significant amount of time in ten minutes, it's going to be
    > REALLY wrong after the power has been off for an hour or two.


    OK, I will reconsider the interval.

    Steve Kostecke Part

    > I'd be a bit concerned about a 25ms offset to a time server in the same
    > rack.


    OK. After I added some more external servers the offset became much
    smaller (1 ms). It seems to be stable now.
    The drift value is now around 70 PPM.

    >
    > > Jul 26 12:16:41 sb1-1 ntpd[10225]: time reset +0.481624 s
    > > Jul 26 12:18:00 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
    > > stratum 3
    > > Jul 26 12:36:16 sb1-1 ntpd[10225]: time reset -0.197015 s
    > > Jul 26 12:37:35 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
    > > stratum 3
    > > Jul 26 13:07:42 sb1-1 ntpd[10225]: time reset +0.263151 s
    > > Jul 26 13:09:02 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
    > > stratum 3

    >
    > This means that 'sb1-1' has drifted more than the default step threshold
    > (128ms). Does this occur only on the "clients" or the "servers"?


    It appears on the servers and on the clients, but I assume the reason
    was the artificially high ntp.drift value determined on the NTP
    servers.

    > What OS / (kernel) version are you running? Does the hardware have any
    > sort of power-management, variable processor speed, etc. ?


    It is a 2.6.10 kernel.

    I am currently requesting information on the power management
    configuration of the blades.
    Can I check it with a command?
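
One possible way to look for variable processor speed is to read the kernel's cpufreq files under /sys. The Python sketch below assumes the usual sysfs layout of 2.6 kernels and that a cpufreq driver is loaded. If the directory is missing, it simply reports nothing, which does not prove that no power management is active. The function name cpufreq_settings is made up for this example.

```python
from pathlib import Path


def cpufreq_settings(cpu_dir="/sys/devices/system/cpu/cpu0/cpufreq"):
    """Return the cpufreq values found for one CPU, or {} if none exist."""
    base = Path(cpu_dir)
    names = ("scaling_governor", "scaling_cur_freq",
             "scaling_min_freq", "scaling_max_freq")
    result = {}
    for name in names:
        entry = base / name
        if entry.is_file():
            result[name] = entry.read_text().strip()
    return result


settings = cpufreq_settings()
print(settings if settings else "no cpufreq interface found")
```

If scaling_min_freq and scaling_max_freq differ, the CPU clock can change at run time, which is one of the things the question about power management was probing for.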

    > Some people feel that daemons have no business writing to /etc and
    > should use /var. But this is not a problem.


    In principle you are right. But it is a file that is read when ntpd
    starts; maybe that is why it is often placed in /etc.

    >
    > > broadcastdelay 0.008

    >
    > Unneeded but not a problem.


    Ok I will delete it.

    >
    > > tinker panic 0

    >
    > This command modifies the ntpd panic threshold (which is normally 1000
    > seconds). Setting this to 0 disables the panic sanity check and a clock
    > offset of any value will be accepted.
    >
    > Why do you feel you need this?


    The problem was that when ntpdate fails, the server would not be able
    to adapt to big offsets. But you are right, I will consider using the
    -g option.


    > > server 127.127.1.0
    > > fudge 127.127.1.0 stratum 11

    >
    > You've correctly fudged the LocalCLK to a reasonable stratum. You may
    > wish to fudge the LocalCLK on the two (front) servers to different
    > strata (i.e. one to 11 and the other to 12) so that the clients will
    > follow one of the (front) servers.


    Actually, that is already the case: I used 11 for one server and 12
    for the other. What I observed was that this sometimes did not make
    the clients synchronize to the lower-stratum server. Maybe the filter
    algorithm chose the other one for other reasons. Unfortunately I do
    not really understand how the filter mechanism works exactly.
    Is there a simple description that I can understand? ;-)
    I will try 11 and 13 instead. Maybe the difference between 11 and 12
    is too small.

    I think the main problem was caused by the local clock configuration,
    but I need it to be fault tolerant during temporary NTP server
    outages. It ensures that the clients and servers follow the same time.
    What sometimes makes it unstable is when only one external NTP server
    is responding. Then I think the local clock on both servers decides
    not to trust the one external NTP server. Is this possible, or can it
    only happen if the one external server goes insane?
    I also wonder why the drift value is 500 PPM in such cases. Is it
    possible that the combination of local clock, peer, and only one
    available external server is the main reason for the problem?
    How can a drift value be determined so wrongly, and is it possible
    that the drift value is not corrected automatically once more than
    one server is available again?
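
To put the drift numbers into perspective: 500 PPM is the largest frequency correction ntpd will apply, so a drift value of exactly 500 usually suggests that the loop ran into its limit rather than measured the oscillator. The short Python sketch below converts PPM into seconds gained or lost per day. The helper names are only illustrative.

```python
SECONDS_PER_DAY = 86_400
MAX_FREQ_PPM = 500.0  # ntpd's frequency correction limit


def drift_seconds_per_day(ppm):
    """Clock error accumulated in one day at a constant frequency error."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def is_pegged(ppm):
    """True if the drift value sits at ntpd's correction limit."""
    return abs(ppm) >= MAX_FREQ_PPM


for ppm in (70.0, 500.0):
    print(f"{ppm:.0f} PPM -> {drift_seconds_per_day(ppm):.3f} s/day, "
          f"pegged: {is_pegged(ppm)}")
```

At 70 PPM the clock would be off by about 6 seconds per day if left alone; at 500 PPM it is about 43 seconds per day.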

    Again, many thanks for your contributions.

