High NTP drift values, time resets and hwclock command - NTP
High NTP drift values, time resets and hwclock command
Hello,
I know there are already many threads about high NTP drift values and
time resets.
Nevertheless, I need some help with my specific NTP configuration
problem.
I have a multi-blade shelf where I am trying to synchronize the time on
each blade to one reliable time reference.
Two blades are connected to the outside world and can, in theory, reach
external NTP servers. The blades behind them can only get the time from
these two servers.
Because I don't know which one is up and running, I configured both
NTP servers for the blades behind.
The two front servers also peer with each other, so that one of them can
still synchronize through its partner if it cannot reach the external
servers itself.
Additionally, I want the other blades to synchronize their time to the
front servers even if these servers do not have a valid external time
signal. The first front server therefore shows the following peers:
 remote           refid           st t when poll reach    delay   offset  jitter
================================================================================
 127.127.1.0      LOCAL(0)        11 l   10   64   377    0.000    0.000   0.001
+10.168.105.52    192.168.130.172  4 u  134  256   376    0.215   -1.613   0.231
*192.168.130.172  139.21.3.139     3 u   80  256   377    0.204   -1.643   0.125
+192.168.130.173  139.21.3.139     3 u   24  256   377    0.174   -1.309   0.090
-192.168.130.178  139.21.3.139     3 u   29  256   377    0.196   -0.683   0.087
The other front server has a similar configuration:
 remote           refid           st t when poll reach    delay   offset  jitter
================================================================================
 127.127.1.0      LOCAL(0)        12 l   14   64   377    0.000    0.000   0.001
-10.168.105.60    192.168.130.172  4 u  156  256   377    0.136    1.909   0.249
*192.168.130.172  139.21.3.139     3 u  440 1024   377    0.255    0.364   0.530
+192.168.130.173  139.21.3.139     3 u  313 1024   377    0.195    0.632   0.425
+192.168.130.178  139.21.3.139     3 u  424 1024   377    0.219    1.249   0.659
The servers behind the front servers all show the following peers:
 remote           refid           st t when poll reach    delay   offset  jitter
================================================================================
*10.168.105.60    192.168.130.172  4 u  150  256   377    0.122   25.306  17.234
+10.168.105.52    192.168.130.172  4 u  160  256   377    0.109   32.352  23.496
I know that having only two servers is not optimal, but I need the
redundancy for the servers behind the front servers, and I know that
configuring a shared IP is not an option because it can seriously
confuse ntpd.
Additionally, I observed time resets:
Jul 26 12:16:41 sb1-1 ntpd[10225]: time reset +0.481624 s
Jul 26 12:18:00 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3
Jul 26 12:36:16 sb1-1 ntpd[10225]: time reset -0.197015 s
Jul 26 12:37:35 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3
Jul 26 13:07:42 sb1-1 ntpd[10225]: time reset +0.263151 s
Jul 26 13:09:02 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3
My ntp.conf file of the front server looks as follows:
restrict default ignore
tinker panic 0
restrict 127.0.0.1
# allow access from dependent blades via internal ip addresses
restrict 10.168.105.0 mask 255.255.248.0 nomodify notrap
server 127.127.1.0
fudge 127.127.1.0 stratum 11
driftfile /etc/ntp.drift
broadcastdelay 0.008
peer 10.168.105.52
restrict 10.168.105.52
server 192.168.130.172 prefer iburst
restrict 192.168.130.172 nomodify
server 192.168.130.173 iburst
restrict 192.168.130.173 nomodify
server 192.168.130.178 iburst
restrict 192.168.130.178 nomodify
and for the servers behind
restrict default ignore
tinker panic 0
restrict 127.0.0.1
server 10.168.105.60 iburst
restrict 10.168.105.60 nomodify
server 10.168.105.52 iburst
restrict 10.168.105.52 nomodify
driftfile /etc/ntp.drift
broadcastdelay 0.008
So can anybody tell me whether the configuration still contains any NTP
don'ts? I sometimes have trouble with high NTP drift values.
Maybe the local clock is bad.
Additionally, a cron job runs hwclock --systohc every 10 minutes to
update the hardware clock periodically, so that the hardware clock is
still valid if the blade is not shut down properly and the shutdown
script is never executed. Could the hwclock call confuse ntpd?
Thank you for your answers.
-
Re: High NTP drift values, time resets and hwclock command
Sergio Ferruchi wrote:
> My ntp.conf file of the front server looks as follows:
>
> restrict default ignore
> tinker panic 0
> restrict 127.0.0.1
> # allow access from dependent blades via internal ip addresses
> restrict 10.168.105.0 mask 255.255.248.0 nomodify notrap
> server 127.127.1.0
> fudge 127.127.1.0 stratum 11
> driftfile /etc/ntp.drift
> broadcastdelay 0.008
> peer 10.168.105.52
> restrict 10.168.105.52
> server 192.168.130.172 prefer iburst
> restrict 192.168.130.172 nomodify
> server 192.168.130.173 iburst
> restrict 192.168.130.173 nomodify
> server 192.168.130.178 iburst
> restrict 192.168.130.178 nomodify
>
>
> and for the servers behind
> restrict default ignore
> tinker panic 0
> restrict 127.0.0.1
> server 10.168.105.60 iburst
> restrict 10.168.105.60 nomodify
> server 10.168.105.52 iburst
> restrict 10.168.105.52 nomodify
> driftfile /etc/ntp.drift
> broadcastdelay 0.008
>
Lose all restrict statements. You can add them back in when you have
things working to your satisfaction and when you understand what each
one is doing.
Why do you have broadcastdelay specified? You didn't configure a
broadcast client so what's it for? Why are you using tinker panic 0
instead of using the -g option in the startup?
>
> So can anybody tell me whether the configuration still contains NTP do
> nots, because I have sometimes throubles with high ntp drift values.
> Maybe the local clock is bad.
You need to start with a simple ntp.conf file and then add complexity
only if you really need it. In your case you are using private addresses
so you really don't need to bother with restrict statements in the first
place unless you cannot trust other people in your local network in
which case you have a social/HR problem rather than a technical one.
> Additionally I use a hwclock -systohc command every 10 minutes by a
> cron job to update the hwclock periodically to be able to have a valid
> hardware clock in case of the blade is not shutdown properly and
> shutdown script is not executed at all. Could it be that hwclock call
> can confuse the NTP protocol?
>
Never do this. You need to leave it to NTP to control the clock. It
knows better what it's doing. It may very well be the main cause of your
problems. Who advised you to do this?
Danny
-
Re: High NTP drift values, time resets and hwclock command
Danny Mayer wrote:
> Sergio Ferruchi wrote:
>
>>My ntp.conf file of the front server looks as follows:
>>
>>restrict default ignore
>
>>Additionally I use a hwclock -systohc command every 10 minutes by a
>>cron job to update the hwclock periodically to be able to have a valid
>>hardware clock in case of the blade is not shutdown properly and
>>shutdown script is not executed at all. Could it be that hwclock call
>>can confuse the NTP protocol?
>>
>
>
> Never do this. You need to leave it to NTP to control the clock. It
> knows better what it's doing. It may very well be the main cause of your
> problems. Who advised you to do this?
>
He's talking about the hardware clock, the one that runs on a battery
when the power is off and is used, or can be used, for an initial
approximation of the correct time. AFAIK ntpd does not control that and
setting it should not affect ntpd.
OTOH setting it every ten minutes seems like overkill; if it gains or
loses a significant amount of time in ten minutes, it's going to be
REALLY wrong after the power has been off for an hour or two.
-
Re: High NTP drift values, time resets and hwclock command
On 2006-07-26, Sergio Ferruchi wrote:
> The server behind the front server all have following configuration:
>  remote           refid           st t when poll reach    delay   offset  jitter
> ================================================================================
> *10.168.105.60    192.168.130.172  4 u  150  256   377    0.122   25.306  17.234
> +10.168.105.52    192.168.130.172  4 u  160  256   377    0.109   32.352  23.496
This is only a snapshot of the current peer stats. You need to watch
this over time to see how it changes.
I'd be a bit concerned about a 25ms offset to a time server in the same
rack.
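Watching the billboard over time is easier with a small script. The following is only a sketch: it parses saved `ntpq -p` text and lists peers whose offset exceeds a limit. The 5 ms limit and the names `parse_peers` and `OFFSET_LIMIT_MS` are assumptions chosen for illustration, not ntpd settings.

```python
# Sketch: flag peers with a large offset in saved "ntpq -p" output.
# OFFSET_LIMIT_MS is an arbitrary illustrative limit, not an ntpd parameter.
OFFSET_LIMIT_MS = 5.0

BILLBOARD = """\
 remote           refid           st t when poll reach    delay   offset  jitter
================================================================================
*10.168.105.60    192.168.130.172  4 u  150  256   377    0.122   25.306  17.234
+10.168.105.52    192.168.130.172  4 u  160  256   377    0.109   32.352  23.496
"""

def parse_peers(text):
    """Return one dict per peer row; header and separator lines are skipped."""
    peers = []
    for line in text.splitlines():
        fields = line[1:].split()  # column 0 holds the tally code
        if len(fields) != 10:
            continue
        try:
            offset, jitter = float(fields[8]), float(fields[9])
        except ValueError:
            continue  # the header row also has ten fields
        peers.append({
            "tally": line[0],
            "remote": fields[0],
            "offset_ms": offset,
            "jitter_ms": jitter,
        })
    return peers

peers = parse_peers(BILLBOARD)
flagged = [p["remote"] for p in peers if abs(p["offset_ms"]) > OFFSET_LIMIT_MS]
print(flagged)
```

Run against snapshots taken a few minutes apart, this shows whether the offsets settle or keep wandering.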
> Jul 26 12:16:41 sb1-1 ntpd[10225]: time reset +0.481624 s
> Jul 26 12:18:00 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3
> Jul 26 12:36:16 sb1-1 ntpd[10225]: time reset -0.197015 s
> Jul 26 12:37:35 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3
> Jul 26 13:07:42 sb1-1 ntpd[10225]: time reset +0.263151 s
> Jul 26 13:09:02 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3
This means that 'sb1-1' has drifted more than the default step threshold
(128ms). Does this occur only on the "clients" or the "servers"?
If the resets (steps, actually) were always in the same direction and of
roughly the same magnitude it could be a matter of a tick adjustment.
But since the steps are divergent it's likely something else.
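The "same direction" check can be done mechanically on the syslog. This is a minimal sketch that assumes the log lines have exactly the "time reset +0.481624 s" form quoted above; the helper names are made up for illustration.

```python
import re

STEP_THRESHOLD_S = 0.128  # ntpd's default step threshold

LOG = """\
Jul 26 12:16:41 sb1-1 ntpd[10225]: time reset +0.481624 s
Jul 26 12:18:00 sb1-1 ntpd[10225]: synchronized to 192.168.130.172, stratum 3
Jul 26 12:36:16 sb1-1 ntpd[10225]: time reset -0.197015 s
Jul 26 13:07:42 sb1-1 ntpd[10225]: time reset +0.263151 s
"""

RESET_RE = re.compile(r"time reset ([+-]\d+\.\d+) s")

def step_offsets(log_text):
    """Extract the signed step sizes (in seconds) from ntpd log lines."""
    return [float(m.group(1)) for m in RESET_RE.finditer(log_text)]

def same_direction(offsets):
    """True if every step has the same sign, which would hint at a tick problem."""
    return all(o > 0 for o in offsets) or all(o < 0 for o in offsets)

offsets = step_offsets(LOG)
print(offsets)
print(same_direction(offsets))
```

For the three resets quoted here the signs are mixed, which matches the reading that a simple tick adjustment is not the whole story.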
What OS / (kernel) version are you running? Does the hardware have any
sort of power-management, variable processor speed, etc. ?
> My ntp.conf file of the front server looks as follows:
aka "servers"
> driftfile /etc/ntp.drift
Some people feel that daemons have no business writing to /etc and
should use /var. But this is not a problem.
> broadcastdelay 0.008
Unneeded but not a problem.
> tinker panic 0
This command modifies the ntpd panic threshold (which is normally 1000
seconds). Setting this to 0 disables the panic sanity check and a clock
offset of any value will be accepted.
Why do you feel you need this?
> server 127.127.1.0
> fudge 127.127.1.0 stratum 11
You've correctly fudged the LocalCLK to a reasonable stratum. You may
wish to fudge the LocalCLK on the two (front) servers to different
strata (i.e. one to 11 and the other to 12) so that the clients will
follow one of the (front) servers.
> restrict default ignore
> restrict 127.0.0.1
> # allow access from dependent blades via internal ip addresses
> restrict 10.168.105.0 mask 255.255.248.0 nomodify notrap
OK. You may wish to review the explanation of 'nomodify' at
http://ntp.isc.org/Support/AccessRestrictions.
> peer 10.168.105.52
> restrict 10.168.105.52
OK
> server 192.168.130.172 prefer iburst
> restrict 192.168.130.172 nomodify
> server 192.168.130.173 iburst
> restrict 192.168.130.173 nomodify
> server 192.168.130.178 iburst
> restrict 192.168.130.178 nomodify
OK
> and for the servers behind
aka "clients"
> driftfile /etc/ntp.drift
> broadcastdelay 0.008
> tinker panic 0
See my comments above.
> restrict default ignore
> restrict 127.0.0.1
> server 10.168.105.60 iburst
> restrict 10.168.105.60 nomodify
> server 10.168.105.52 iburst
> restrict 10.168.105.52 nomodify
OK
--
Steve Kostecke
NTP Public Services Project - http://ntp.isc.org/
-
Re: High NTP drift values, time resets and hwclock command
First of all, thank you all for answering my questions.
Danny Mayer Part
> In your case you are using private addresses
> so you really don't need to bother with restrict statements in the first
> place unless you cannot trust other people in your local network in
> which case you have a social/HR problem rather than a technical one.
OK, but I don't think I have an access permission problem. The clients
can access the servers. I just want to block access from servers or
clients that should not reach them.
> Never do this. You need to leave it to NTP to control the clock. It
> knows better what it's doing. It may very well be the main cause of your
> problems. Who advised you to do this?
The problem was that some clients were never shut down cleanly for some
reason. That is why the hardware clock was never updated, and a totally
wrong time stamp was set at every startup.
Richard B. Gilbert
> OTOH setting it every ten minutes seems like overkill; if it gains or
> loses a significant amount of time in ten minutes, it's going to be
> REALLY wrong after the power has been off for an hour or two.
OK, I will reconsider the interval.
Steve Kostecke Part
> I'd be a bit concerned about a 25ms offset to a time server in the same
> rack.
OK. After I added some more external servers, the offset is now much
smaller (1 ms). It seems to be stable now.
The drift value is now around 70 PPM.
>
> > Jul 26 12:16:41 sb1-1 ntpd[10225]: time reset +0.481624 s
> > Jul 26 12:18:00 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
> > stratum 3
> > Jul 26 12:36:16 sb1-1 ntpd[10225]: time reset -0.197015 s
> > Jul 26 12:37:35 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
> > stratum 3
> > Jul 26 13:07:42 sb1-1 ntpd[10225]: time reset +0.263151 s
> > Jul 26 13:09:02 sb1-1 ntpd[10225]: synchronized to 192.168.130.172,
> > stratum 3
>
> This means that 'sb1-1' has drifted more than the default step threshold
> (128ms). Does this occur only on the "clients" or the "servers"?
It appears on the servers and on the clients, but I assume the reason
was the artificially high ntp.drift value measured on the NTP server.
> What OS / (kernel) version are you running? Does the hardware have any
> sort of power-management, variable processor speed, etc. ?
It is a 2.6.10 kernel.
I am currently requesting information on the power management
configuration of the blades.
Can I check it with a command?
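One possible check, sketched under the assumption that the kernel was built with cpufreq support and exposes it through sysfs: if the directory below exists, variable processor speed is available, and `scaling_governor` names the active policy. The path and the helper name `read_cpufreq` are assumptions; whether the directory is present depends on the kernel configuration and the blade hardware.

```python
from pathlib import Path

# Assumed sysfs location of the cpufreq interface for the first CPU.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")

FILES = ("scaling_governor", "scaling_cur_freq",
         "scaling_min_freq", "scaling_max_freq")

def read_cpufreq(base=CPUFREQ_DIR):
    """Return the cpufreq settings found under `base`, or None if absent."""
    if not base.is_dir():
        return None  # no cpufreq interface: frequency scaling is not active
    info = {}
    for name in FILES:
        path = base / name
        if path.is_file():
            info[name] = path.read_text().strip()
    return info

print(read_cpufreq())
```

A result of `None` only means the interface was not found at that path; power management done by the BIOS or a management controller would not show up here.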
> Some people feel that daemons have no business writing to /etc and
> should use /var. But this is not a problem.
In principle you are right. But it is a file that is read when ntpd is
started; maybe that is why it is often written to /etc.
>
> > broadcastdelay 0.008
>
> Unneeded but not a problem.
OK, I will delete it.
>
> > tinker panic 0
>
> This command modifies the ntpd panic threshold (which is normally 1000
> seconds). Setting this to 0 disables the panic sanity check and a clock
> offset of any value will be accepted.
>
> Why do you feel you need this?
The problem was that when ntpdate fails, the server would not be able to
adapt to big offsets. But you are right, I will consider using the -g
option.
> > server 127.127.1.0
> > fudge 127.127.1.0 stratum 11
>
> You've correctly fudged the LocalCLK to a reasonable stratum. You may
> wish to fudge the LocalCLK on the two (front) servers to different
> strata (i.e. one to 11 and the other to 12) so that the clients will
> follow one of the (front) servers.
Actually, that is already the case: I used 11 for one server and 12 for
the other. What I observed was that this sometimes did not make the
clients synchronize to the lower stratum. Maybe the filter algorithm
chose the other one for other reasons. Unfortunately I do not really
understand how the filter mechanism works.
Is there a simple description that I could understand? ;-)
I will try 11 and 13 instead. Maybe the difference between 11 and 12 is
too small.
I think the main problem was caused by the local clock configuration,
but I need it for fault tolerance during temporary NTP server outages.
It ensures that the clients and servers follow the same time.
What sometimes makes it unstable is when only one external NTP server
is responding. Then, I think, the local clocks on both servers decide
not to trust the one external NTP server. Is this possible, or can this
only happen if the one external server goes insane?
I also wonder why the drift value is 500 PPM in such cases. Is it
possible that the combination of local clock + peer + only one external
server available is the main reason for the problem?
How can a drift value be determined so wrongly, and is it possible
that the drift value is not corrected automatically when more than one
server is available again?
Again, many thanks for your contributions.
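To put the drift figures from this thread in perspective, the arithmetic can be sketched as follows. Note that 500 PPM is the largest frequency correction ntpd will apply, so a drift value sitting at exactly 500 more likely means the loop has hit that limit than that the oscillator was measured at 500 PPM. The function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400
NTPD_MAX_PPM = 500.0      # ntpd's frequency correction limit
STEP_THRESHOLD_S = 0.128  # ntpd's default step threshold

def drift_seconds_per_day(ppm):
    """Clock error accumulated in one day by an uncorrected error of `ppm`."""
    return ppm * 1e-6 * SECONDS_PER_DAY

def seconds_until_step(ppm, threshold_s=STEP_THRESHOLD_S):
    """Time for an uncorrected error of `ppm` to reach the step threshold."""
    return threshold_s / (abs(ppm) * 1e-6)

print(drift_seconds_per_day(70))           # roughly 6 seconds per day
print(drift_seconds_per_day(NTPD_MAX_PPM)) # roughly 43 seconds per day
print(seconds_until_step(NTPD_MAX_PPM))    # a little over four minutes
```

An uncorrected error at the 500 PPM limit reaches the 128 ms step threshold within minutes, which is in the same range as the intervals between the resets quoted earlier, although that alone does not prove the cause.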