-
Time reset
The ntp log file shows when NTP steps the time. But then the potential harm
is already done. Especially if the time moves backward, our server might
have serious trouble. Is there a log event which indicates that the time is
going to be reset in order to enable us to take appropriate action before
the actual reset?
Thanks a lot,
Jan
-
Re: Time reset
jkvbe wrote:[color=blue]
> The ntp log file shows when NTP steps the time. But then the potential harm
> is already done. Especially if the time moves backward, our server might
> have serious trouble. Is there a log event which indicates that the time is
> going to be reset in order to enable us to take appropriate action before
> the actual reset?
>
> Thanks a lot,
>
> Jan
>
>[/color]
I don't know of any advance warning.
DOES the time step backward?
If ntpd is working properly it should NOT need to step the time at all
with the possible exception of a single step when ntpd is first started.
If ntpd is stepping time regularly, you have some other problem. If you
find and fix that problem, ntpd should stop stepping the time.
There are/were known issues with some Linux systems; during periods of
high disk usage, clock interrupts would be lost resulting in a FORWARD
step. AFAIK these issues were related to EIDE disks used in PIO mode
rather than DMA mode. ISTR reading that the problem has been fixed in
recent versions of Linux. YMMV
-
Re: Time reset
[color=blue]
>The ntp log file shows when NTP steps the time. But then the potential harm
>is already done. Especially if the time moves backward, our server might
>have serious trouble. Is there a log event which indicates that the time is
>going to be reset in order to enable us to take appropriate action before
>the actual reset?[/color]
I don't know of any way to get advance warning when ntpd is about to
step the time.
There are command line switches to prevent stepping and to allow
one step at startup time.
The disadvantage with preventing steps is that it might take a long
time to correct the time. But if you start with good time your clock
will never get off far enough to cause problems.
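To put a number on "a long time": ntpd slews at no more than about 500 ppm, so the minimum time to absorb an offset without stepping is easy to estimate. A minimal Python sketch; the function name is made up for illustration, and the 500 ppm ceiling is an assumption you should check against your ntpd build:

```python
MAX_SLEW_PPM = 500.0  # assumed ceiling on ntpd's slew rate

def slew_seconds(offset_seconds, slew_ppm=MAX_SLEW_PPM):
    """Return the minimum time needed to slew away an offset."""
    return abs(offset_seconds) / (slew_ppm * 1e-6)

# 128 ms is the default step threshold mentioned elsewhere in this thread.
for offset in (0.128, 1.0, 60.0):
    hours = slew_seconds(offset) / 3600
    print(f"{offset:7.3f} s offset -> {hours:6.2f} h of slewing")
```

So an offset at the step threshold is gone in a few minutes, while a full minute of error would take more than a day to slew away.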
Is there a wiki page on this topic?
--
These are my opinions, not necessarily my employer's. I hate spam.
-
Re: Time reset
Hal Murray wrote:[color=blue][color=green]
>> The ntp log file shows when NTP steps the time. But then the potential harm
>> is already done. Especially if the time moves backward, our server might
>> have serious trouble. Is there a log event which indicates that the time is
>> going to be reset in order to enable us to take appropriate action before
>> the actual reset?
>>[/color]
>
> I don't know of any way to get advance warning when ntpd is about to
> step the time.
>
> There are command line switches to prevent stepping and to allow
> one step at startup time.
>
> The disadvantage with preventing steps is that it might take a long
> time to correct the time. But if you start with good time your clock
> will never get off far enough to cause problems.
>
> Is there a wiki page on this topic?
>[/color]
Another disadvantage with preventing steps is that it isn't really a
supported mode (because it's a "tinker") and, as I've found, it doesn't
always work. When I disable time steps on a linux 2.6.18 kernel, the
drift value goes to +/-500 and can actually swap sign from one run to
the next. This happens even though a time step was never needed (i.e.
offset never went >128ms). With time steps enabled the drift value
settles <90ppm (and again, no step actually occurs).
From what I've been able to piece together, this different behavior
between step/!step is probably due to the kernel time discipline being
disabled with !step, coupled with a (potential) bug in linux that forces
NTP's "manual" adjustments to have a granularity of 1ms (i.e. somewhere
an adjustment is rounded up or down). I've not verified the bug is
present in my 2.6.18 linux kernel, so don't quote me on it. One might
ask why the kernel time discipline is preemptively disabled in this
manner -- maybe there is a good reason.
Our application also does not currently handle backward time steps. Our
workaround to the problematic !step is to realize, as others on this
list have pointed out, that a time step should never occur in a normally
functioning system. If a step does occur, we probably have bigger
problems than those caused by the step itself, such as: lost timer
interrupts, failing hardware, runaway process, kernel bug, NTP bug, etc.
Andy
-
Re: Time reset
"jkvbe" <jkvbe@NOSPAMyahoo.com> writes:
[color=blue]
>The ntp log file shows when NTP steps the time. But then the potential harm
>is already done. Especially if the time moves backward, our server might
>have serious trouble. Is there a log event which indicates that the time is
>going to be reset in order to enable us to take appropriate action before
>the actual reset?[/color]
On what kind of system? How big a step? ntp should NOT have to step the
time except maybe when it is started at boot. If it steps the time,
then there is something very wrong in your system. Find out what it is.
The only usable log event might be an offset larger than, say, 50 ms. Use
that as your warning.
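One way to act on that is to watch the loopstats file and flag any offset above the chosen limit. A rough Python sketch, not a tested monitoring tool: it assumes loopstats logging is enabled and that each line is laid out as day, seconds, offset (s), frequency (ppm), jitter, wander, time constant. The function name and the sample line are made up for illustration.

```python
def offset_warning(loopstats_line, threshold_s=0.050):
    """Return the offset in seconds if it exceeds the threshold, else None.

    Assumed layout: MJD, seconds past midnight, offset (s),
    frequency (ppm), jitter, wander, time constant.
    """
    fields = loopstats_line.split()
    offset = float(fields[2])
    return offset if abs(offset) > threshold_s else None

# Made-up sample line with a 62 ms offset, for illustration only.
sample = "54560 75440.031 0.062000000 13.778 0.000351733 0.013380 6"
print(offset_warning(sample))
```

The 50 ms default leaves some headroom below the 128 ms point at which ntpd would normally step.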
>Thanks a lot,
>Jan
-
Re: Time reset
andy.helten@dot21rts.com (Andy Helten) writes:
[color=blue]
>Hal Murray wrote:[color=green][color=darkred]
>>> The ntp log file shows when NTP steps the time. But then the potential harm
>>> is already done. Especially if the time moves backward, our server might
>>> have serious trouble. Is there a log event which indicates that the time is
>>> going to be reset in order to enable us to take appropriate action before
>>> the actual reset?
>>>[/color]
>>
>> I don't know of any way to get advance warning when ntpd is about to
>> step the time.
>>
>> There are command line switches to prevent stepping and to allow
>> one step at startup time.
>>
>> The disadvantage with preventing steps is that it might take a long
>> time to correct the time. But if you start with good time your clock
>> will never get off far enough to cause problems.
>>
>> Is there a wiki page on this topic?
>>[/color][/color]
[color=blue]
>Another disadvantage with preventing steps is that it isn't really a
>supported mode (because it's a "tinker") and, as I've found, it doesn't
>always work. When I disable time steps on a linux 2.6.18 kernel, the
>drift value goes to +/-500 and can actually swap sign from one run to
>the next. This happens even though a time step was never needed (i.e.
>offset never went >128ms). With time steps enabled the drift value
>settles <90ppm (and again, no step actually occurs).[/color]
That certainly sounds like a bug to me.
[color=blue][color=green]
>>From what I've been able to piece together, this different behavior[/color]
>between step/!step is probably due to the kernel time discipline being
>disabled with !step, coupled with a (potential) bug in linux that forces
>NTP's "manual" adjustments to have a granularity of 1ms (i.e. somewhere
>an adjustment is rounded up or down). I've not verified the bug is
>present in my 2.6.18 linux kernel, so don't quote me on it. One might
>ask why the kernel time discipline is preemptively disabled in this
>manner -- maybe there is a good reason.[/color]
AFAIK it is not the kernel that does the time step. Ie, the kernel
discipline is not what demands the step. Also, adjtime certainly does not
have a 1ms granularity.
[color=blue]
>Our application also does not currently handle backward time steps. Our
>workaround to the problematic !step is to realize, as others on this
>list have pointed out, that a time step should never occur in a normally
>functioning system. If a step does occur, we probably have bigger
>problems than those caused by the step itself, such as: lost timer
>interrupts, failing hardware, runaway process, kernel bug, NTP bug, etc.[/color]
Yup.
[color=blue]
>Andy[/color]
-
Re: Time reset
Unruh wrote:[color=blue]
> andy.helten@dot21rts.com (Andy Helten) writes:
>[color=green]
>> Another disadvantage with preventing steps is that it isn't really a
>> supported mode (because it's a "tinker") and, as I've found, it doesn't
>> always work. When I disable time steps on a linux 2.6.18 kernel, the
>> drift value goes to +/-500 and can actually swap sign from one run to
>> the next. This happens even though a time step was never needed (i.e.
>> offset never went >128ms). With time steps enabled the drift value
>> settles <90ppm (and again, no step actually occurs).
>>[/color]
>
> That certainly sounds like a bug to me.
>
>[/color]
Me too, but disabling time step is a tinker and tinkers are generally
use at your own risk. Besides, after much testing, I'm fairly certain
the problem is indeed with the kernel -- especially considering I did
not have this problem on an older kernel.
[color=blue][color=green][color=darkred]
>> >From what I've been able to piece together, this different behavior[/color]
>> between step/!step is probably due to the kernel time discipline being
>> disabled with !step, coupled with a (potential) bug in linux that forces
>> NTP's "manual" adjustments to have a granularity of 1ms (i.e. somewhere
>> an adjustment is rounded up or down). I've not verified the bug is
>> present in my 2.6.18 linux kernel, so don't quote me on it. One might
>> ask why the kernel time discipline is preemptively disabled in this
>> manner -- maybe there is a good reason.
>>[/color]
>
> AFAIK it is not the kernel that does the time step. Ie, the kernel
> discipline is not what demands the step. Also, adjtime certainly does not
> have a 1ms granularity.
>
>
>[/color]
That is also my understanding, that the kernel does not perform the time
step but it is the kernel that updates the system time every tick. My
understanding of the kernel time discipline is that NTP sets the size of
the update to account for the system clock drift. This mechanism is
apparently disabled when time stepping is disabled. Don't ask me why.
-
Re: Time reset
[color=blue][color=green]
>> That certainly sounds like a bug to me.[/color][/color]
[color=blue]
>Me too, but disabling time step is a tinker and tinkers are generally
>use at your own risk. Besides, after much testing, I'm fairly certain
>the problem is indeed with the kernel -- especially considering I did
>not have this problem on an older kernel.[/color]
What kernel version worked correctly? What version doesn't?
--
These are my opinions, not necessarily my employer's. I hate spam.
-
Re: Time reset
Hal Murray wrote:[color=blue][color=green][color=darkred]
>>> That certainly sounds like a bug to me.
>>>[/color][/color]
>
>[color=green]
>> Me too, but disabling time step is a tinker and tinkers are generally
>> use at your own risk. Besides, after much testing, I'm fairly certain
>> the problem is indeed with the kernel -- especially considering I did
>> not have this problem on an older kernel.
>>[/color]
>
> What kernel version worked correctly? What version doesn't?
>
>[/color]
Does _not_ work on RedHawk 4.2, linux 2.6.18.8:
Linux sbc1 2.6.18.8-RedHawk-4.2-trace #1 SMP PREEMPT Tue May 29 12:44:24
Does work on RedHat EL4, linux 2.6.9-5:
Linux ntp1 2.6.9-5.EL #1 Wed Jan 5 19:22:18 EST 2005 i686 i686 i386
GNU/Linux
Note that RedHawk is based on RedHat EL4 Update 4 (in other words, you
first install EL4 U4 and then install a small RedHawk upgrade).
Also note that this is not a new problem (OK, it's not that old either...):
https://lists.ntp.org/pipermail/questions/2008-March/017722.html
We plan to upgrade to the latest version of RedHawk. I'm not sure what
kernel version is in that release, but I do know it is based on RedHat EL5.
Andy
-
Re: Time reset
Andy Helten wrote:
>>> offset never went >128ms). With time steps enabled the drift value
>>> settles <90ppm (and again, no step actually occurs).
90 ppm is a relatively bad static frequency error; a good machine will be
around 10 ppm. That won't help a clean cold start.
I didn't check, but did you have the default minpoll and maxpoll values? A
high minpoll might make it difficult to get the loop to converge from
there without overflows.
[color=blue]
>
> That is also my understanding, that the kernel does not perform the time
> step but it is the kernel that updates the system time every tick. My
> understanding of the kernel time discipline is that NTP sets the size of[/color]
The kernel time discipline is turned off if you disable steps completely
(i.e. you set the minimum error for a step to be more than half a
second or you set it to zero).
-
Re: Time reset
David Woolley wrote:[color=blue]
> Andy Helten wrote:
>[color=green][color=darkred]
>>>> offset never went >128ms). With time steps enabled the drift value
>>>> settles <90ppm (and again, no step actually occurs).
>>>>[/color][/color]
>
> 90 ppm is a relatively bad static frequency error; a good machine will be
> around 10 ppm. That won't help a clean cold start.
>
> I didn't check, but did you have the default min and maxpoll values. A
> high minpoll might make it difficult to get the loop to converge from
> there without overflows.
>[/color]
My current problem is that drift settles at 82ppm (what I called <90 in
previous email) in one run and then 32ppm in another run (with a reboot
between). This is similar to the problem I had with stepping disabled
where drift would go from +500ppm in one run and then swing all the way
to -500ppm in another run (usually with a reboot between). I am not
going to spend another minute troubleshooting this problem until we get
an updated linux kernel. I will dig into it more deeply if the new
kernel exhibits this same drift instability.
Our system is considered "real-time" and thus has many constraints on
it, namely that it will run in an isolated environment with no Internet
connection. Our setup runs one machine with NTP as a local stratum 1
server using an IRIG-B time source. On that machine I have minpoll set
to the lowest (16 seconds). I had to do this so that NTP would begin
serving sync requests in a reasonable amount of time. Startup time is another
constraint and we have other boards running as NTP clients that must
sync with the NTP server before they can finish initialization. I don't
set maxpoll on the server because I've never caught the server changing
the polling interval from 16 seconds -- maybe it's a reference clock
feature.
All other boards in the system run as NTP clients and I use "minpoll 5
maxpoll 9" for them. I'm not 100% sure why I chose those values, but I
think the idea was to improve NTP reaction time to changes in the
"synchronization environment". I'm not sure whether those poll settings
achieve that, but it sounds like you are suggesting a lower minpoll may
speed convergence in cases of higher drift.
Andy
-
Re: Time reset
andy.helten@dot21rts.com (Andy Helten) writes:
[color=blue]
>David Woolley wrote:[color=green]
>> Andy Helten wrote:
>>[color=darkred]
>>>>> offset never went >128ms). With time steps enabled the drift value
>>>>> settles <90ppm (and again, no step actually occurs).
>>>>>[/color]
>>
>> 90 ppm is a relatively bad static frequency error; a good machine will be
>> around 10 ppm. That won't help a clean cold start.
>>
>> I didn't check, but did you have the default min and maxpoll values. A
>> high minpoll might make it difficult to get the loop to converge from
>> there without overflows.
>>[/color][/color]
[color=blue]
>My current problem is that drift settles at 82ppm (what I called <90 in
>previous email) in one run and then 32ppm in another run (with a reboot
>between). This is similar to the problem I had with stepping disabled
>where drift would go from +500ppm in one run and then swing all the way
>to -500ppm in another run (usually with a reboot between). I am not
>going to spend another minute troubleshooting this problem until we get
>an updated linux kernel. I will dig into it more deeply if the new
>kernel exhibits this same drift instability.[/color]
That is an incredibly unstable clock. It is hard to imagine that this is a
kernel problem. This is on one of your machines? It is not the server
connected to the IRIG-B is it?
[color=blue]
>Our system is considered "real-time" and thus has many constraints on
>it, namely that it will run in an isolated environment with no Internet
>connection. Our setup runs one machine with NTP as a local stratum 1
>server using an IRIG-B time source. On that machine I have minpoll set[/color]
No need for internet if you have a local clock.
[color=blue]
>to the lowest (16 seconds). I had to do this so that NTP would begin
>serving sync requests in a reasonable amount of time. Startup time is another
>constraint and we have other boards running as NTP clients that must
>sync with the NTP server before they can finish initialization. I don't
>set maxpoll on the server because I've never caught the server changing
>the polling interval from 16 seconds -- maybe it's a reference clock
>feature.[/color]
[color=blue]
>All other boards in the system run as NTP clients and I use "minpoll 5
>maxpoll 9" for them. I'm not 100% sure why I chose those values, but I
>think the idea was to improve NTP reaction time to changes in the
>"synchronization environment". I'm not sure whether those poll settings
>achieve that, but it sounds like you are suggesting a lower minpoll may
>speed convergence in cases of higher drift.[/color]
No. He meant that with a minpoll of, say, 8 or 10, settling down would
take a long time if the system did not start with a good drift value.
However, even minpoll 5 means one data sample every 4 hours roughly (since
ntp throws away roughly 7/8 of the samples in the clock_filter). That's a slow
convergence. And even minpoll 4, the minimum, is only one sample every 2
hrs.
-
Re: Time reset
Unruh wrote:[color=blue]
> andy.helten@dot21rts.com (Andy Helten) writes:
>[color=green]
>> My current problem is that drift settles at 82ppm (what I called <90 in
>> previous email) in one run and then 32ppm in another run (with a reboot
>> between). This is similar to the problem I had with stepping disabled
>> where drift would go from +500ppm in one run and then swing all the way
>> to -500ppm in another run (usually with a reboot between). I am not
>> going to spend another minute troubleshooting this problem until we get
>> an updated linux kernel. I will dig into it more deeply if the new
>> kernel exhibits this same drift instability.
>>[/color]
>
>
> That is an incredibly unstable clock. It is hard to imagine that this is a
> kernel problem. This is on one of your machines? It is not the server
> connected to the IRIG-B is it?
>
>[/color]
I'm fairly certain the board's oscillator is stable. I wrote a simple
Perl script that keyed off a PPS print from a GPS-to-IRIGB box. When the
PPS time was printed, I grabbed local system time as well as IRIGB time
from the local IRIGB PMC. Using this approach, the system oscillator's
drift (without NTP running) was measured to be within the +/-30ppm
oscillator specifications. This procedure was reliable over several
runs and was repeated on at least one other board with an IRIG-B receiver.
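The arithmetic behind that kind of measurement is short. This is not the Perl script itself, only a Python sketch of the same idea: take two pairs of (reference time, local time) readings and compare the elapsed intervals. The function name and the sample readings are hypothetical.

```python
def drift_ppm(ref_start, ref_end, local_start, local_end):
    """Frequency error of the local clock relative to the reference, in ppm."""
    ref_elapsed = ref_end - ref_start
    local_elapsed = local_end - local_start
    return (local_elapsed - ref_elapsed) / ref_elapsed * 1e6

# Hypothetical readings: over 1000 s of IRIG-B time the local clock gained 25 ms.
print(round(drift_ppm(0.0, 1000.0, 0.0, 1000.025), 3))
```

A positive result means the local clock runs fast. The longer the interval between the two readings, the less the read latency of each timestamp matters.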
Yes, there is a potential for problems in many different areas within
this setup, however, after much troubleshooting to isolate the problem,
the 2.6.18 kernel has always been involved in the non-working
configuration. An older kernel worked fine with the same IRIG-B driver,
the same version of NTP, but different hardware, so I haven't completely
exonerated the hardware. At any rate, this has been put on the back
burner until we can get the latest RedHawk release, which isn't due
until mid April.
[color=blue][color=green]
>> All other boards in the system run as NTP clients and I use "minpoll 5
>> maxpoll 9" for them. I'm not 100% sure why I chose those values, but I
>> think the idea was to improve NTP reaction time to changes in the
>> "synchronization environment". I'm not sure whether those poll settings
>> achieve that, but it sounds like you are suggesting a lower minpoll may
>> speed convergence in cases of higher drift.
>>[/color]
>
> No. He meant if you had minpoll say 8 or 10 it would make settling down
> long if the system did not start with a good drift value.
> However, even minpoll 5 means one data sample every 4 hours roughly (since
> ntp throws away roughly 7/8 of the samples in the clock_filter). That's a slow
> convergence. And even minpoll 4, the minimum, is only one sample every 2
> hrs.
>
>[/color]
Hmmm, clearly the more I learn about NTP, the less I know.
Andy
-
Re: Time reset
[color=blue]
>My current problem is that drift settles at 82ppm (what I called <90 in
>previous email) in one run and then 32ppm in another run (with a reboot
>between). This is similar to the problem I had with stepping disabled
>where drift would go from +500ppm in one run and then swing all the way
>to -500ppm in another run (usually with a reboot between). I am not
>going to spend another minute troubleshooting this problem until we get
>an updated linux kernel. I will dig into it more deeply if the new
>kernel exhibits this same drift instability.[/color]
I think we are talking about two different bugs here.
The different drifts on reboot are due to a quirk in the TSC
calibration code in the kernel. Grep your syslog for messages
like these:
Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
Those bottom bits jumping around correspond to the different
drift values.
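To see how much those bottom bits matter, the scatter can be converted to ppm. A small Python sketch using the five messages above as input; the function names are made up, and the regular expression assumes the message wording shown here:

```python
import re

LOG = """\
Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
"""

def detected_mhz(text):
    """Extract every 'Detected N MHz processor' value from syslog text."""
    return [float(m) for m in re.findall(r"Detected (\d+\.\d+) MHz processor", text)]

def spread_ppm(values):
    """Peak-to-peak spread relative to the mean, in ppm."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 1e6

speeds = detected_mhz(LOG)
print(f"{len(speeds)} boots, spread {spread_ppm(speeds):.1f} ppm")
```

For these five boots the spread comes out a bit under 40 ppm, which is the same order as the drift differences being discussed.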
If you only have one system, you can pick one and hack your
kernel to smash in a constant value at the right place.
Or you can add something like this to your boot line:
clocksource=acpi_pm
That's assuming your hardware has acpi and whatever.
I've been using it for a while. I haven't noticed any quirks,
but who knows.
--
These are my opinions, not necessarily my employer's. I hate spam.
-
Re: Time reset
Unruh wrote:[color=blue]
> andy.helten@dot21rts.com (Andy Helten) writes:
>
>
>[color=green]
>>David Woolley wrote:
>>[color=darkred]
>>>Andy Helten wrote:
>>>
>>>
>>>>>>offset never went >128ms). With time steps enabled the drift value
>>>>>>settles <90ppm (and again, no step actually occurs).
>>>>>>
>>>>>
>>>90 ppm is a relatively bad static frequency error; a good machine will be
>>>around 10 ppm. That won't help a clean cold start.
>>>
>>>I didn't check, but did you have the default min and maxpoll values. A
>>>high minpoll might make it difficult to get the loop to converge from
>>>there without overflows.
>>>[/color]
>>[/color]
>[color=green]
>>My current problem is that drift settles at 82ppm (what I called <90 in
>>previous email) in one run and then 32ppm in another run (with a reboot
>>between). This is similar to the problem I had with stepping disabled
>>where drift would go from +500ppm in one run and then swing all the way
>>to -500ppm in another run (usually with a reboot between). I am not
>>going to spend another minute troubleshooting this problem until we get
>>an updated linux kernel. I will dig into it more deeply if the new
>>kernel exhibits this same drift instability.[/color]
>
>
>
> That is an incredibly unstable clock. It is hard to imagine that this is a
> kernel problem. This is on one of your machines? It is not the server
> connected to the IRIG-B is it?
>
>[color=green]
>>Our system is considered "real-time" and thus has many constraints on
>>it, namely that it will run in an isolated environment with no Internet
>>connection. Our setup runs one machine with NTP as a local stratum 1
>>server using an IRIG-B time source. On that machine I have minpoll set[/color]
>
>
> No need for internet if you have a local clock.
>
>
>[color=green]
>>to the lowest (16 seconds). I had to do this so that NTP would begin
>>serving sync requests in a reasonable amount of time. Startup time is another
>>constraint and we have other boards running as NTP clients that must
>>sync with the NTP server before they can finish initialization. I don't
>>set maxpoll on the server because I've never caught the server changing
>>the polling interval from 16 seconds -- maybe it's a reference clock
>>feature.[/color]
>
>[color=green]
>>All other boards in the system run as NTP clients and I use "minpoll 5
>>maxpoll 9" for them. I'm not 100% sure why I chose those values, but I
>>think the idea was to improve NTP reaction time to changes in the
>>"synchronization environment". I'm not sure whether those poll settings
>>achieve that, but it sounds like you are suggesting a lower minpoll may
>>speed convergence in cases of higher drift.[/color]
>
>
> No. He meant if you had minpoll say 8 or 10 it would make settling down
> long if the system did not start with a good drift value.
> However, even minpoll 5 means one data sample every 4 hours roughly (since
> ntp throws away roughly 7/8 of the samples in the clock_filter). That's a slow
> convergence. And even minpoll 4, the minimum, is only one sample every 2
> hrs.
>
>[/color]
I must be missing something! Minpoll=5 means 2^5 seconds is the minimum
poll interval. How are you getting to every four hours from that? ISTR
that the default minpoll is 6 which gives 2^6 or 64 seconds.
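The arithmetic, as a quick Python sketch. The poll value is an exponent of two in seconds; the 1/8 kept fraction is only the figure claimed earlier in this thread for the clock filter, treated here as an assumption, and the function names are made up.

```python
def poll_interval_seconds(poll_exponent):
    """ntpd poll values are exponents of two, in seconds."""
    return 2 ** poll_exponent

def seconds_per_kept_sample(poll_exponent, kept_fraction=1 / 8):
    """Spacing between samples that survive the filter (kept_fraction assumed)."""
    return poll_interval_seconds(poll_exponent) / kept_fraction

for exponent in (4, 5, 6, 9, 10):
    print(exponent, poll_interval_seconds(exponent), seconds_per_kept_sample(exponent))
```

Even with seven of every eight samples discarded, minpoll 5 works out to one kept sample every 256 seconds, which is minutes rather than hours.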
If the server lines in ntp.conf include the "iburst" keyword, the
servers will be polled with an initial burst of eight requests sent at
two second intervals. This fills the pipeline and "pacifies" the
filter. Thereafter, ntpd adjusts the polling interval as it thinks
best. Normally the poll interval will increase to somewhere between 256
and 1024 seconds once the clock is synchronized. In general, the better
the network connection the higher the maximum poll interval.
It's interesting to watch the performance of ntpd improve as the network
quiets down during the hours when most people sleep!
-
Re: Time reset
On 3 apr, 23:10, "Richard B. Gilbert" <rgilber...@comcast.net> wrote:
....[color=blue]
>
> DOES the time step backward?
>
> If ntpd is working properly it should NOT need to step the time at all
> with the possible exception of a single step when ntpd is first started.
>
> If ntpd is stepping time regularly, you have some other problem. If you
> find and fix that problem, ntpd should stop stepping the time.
>
> There are/were known issues with some Linux systems; during periods of
> high disk usage, clock interrupts would be lost resulting in a FORWARD
> step. AFAIK these issues were related to EIDE disks used in PIO mode
> rather than DMA mode. ISTR reading that the problem has been fixed in
> recent versions of Linux. YMMV[/color]
I agree that ntpd should not be stepping time regularly and that it
points to a problem if it does. But we develop an
appliance and we don't control how customers deploy it. Given the
adverse effects of stepping time (especially if it moves backwards),
I would have liked to be protected against badly set-up NTP
infrastructure or compromised time servers.
Jan
-
Re: Time reset
jkvbe wrote:[color=blue]
> On 3 apr, 23:10, "Richard B. Gilbert" <rgilber...@comcast.net> wrote:
> ...
>[color=green]
>>DOES the time step backward?
>>
>>If ntpd is working properly it should NOT need to step the time at all
>>with the possible exception of a single step when ntpd is first started.
>>
>>If ntpd is stepping time regularly, you have some other problem. If you
>>find and fix that problem, ntpd should stop stepping the time.
>>
>>There are/were known issues with some Linux systems; during periods of
>>high disk usage, clock interrupts would be lost resulting in a FORWARD
>>step. AFAIK these issues were related to EIDE disks used in PIO mode
>>rather than DMA mode. ISTR reading that the problem has been fixed in
>>recent versions of Linux. YMMV[/color]
>
>
> I agree that ntpd should not be stepping time regularly and that it
> points to a problem if it happens regularly. But we develop an
> appliance and we don't control how customers deploy it. Given the
> adverse effects of stepping time (especially if it moves backwards),
> I would have liked to be protected against badly set-up NTP
> infrastructure or time servers that are compromised.
>
> Jan[/color]
It seems to me that, in the circumstance you describe, supplying correct
time is the customer's problem!
Having read this newsgroup for the last four or five years, I'm aware
that people do some very strange things with computer clocks. I'm
thinking, in particular, of at least one individual who deliberately set
his clock to an incorrect time in order to see if Ntpd would correct it.
Ntpd did so, of course, but he was not happy with the way it was done or
the amount of time it took!
If it's not under your control, it's not your responsibility! Your
instructions for the appliance should point this out pretty explicitly;
e.g. "IF YOUR TIME SERVERS CAUSE TIME TO STEP, THE FOLLOWING ADVERSE
CONSEQUENCES CAN BE EXPECTED TO OCCUR: <list of adverse consequences>
It is YOUR responsibility to ensure that this does not happen!"
The only halfway legitimate thing I can think of that would cause time
to step would be a leap second.
-
Re: Time reset
Richard B. Gilbert wrote:
[color=blue]
> that people do some very strange things with computer clocks. I'm
> thinking, in particular, of at least one individual who deliberately set
> his clock to an incorrect time in order to see if Ntpd would correct it.[/color]
Many people do this. It is the naive users' way of testing that ntpd
"works".
[color=blue]
> Ntpd did so, of course, but he was not happy with the way it was done or
> the amount of time it took![/color]
-
Re: Time reset
Hal wrote:
[color=blue][color=green]
>> My current problem is that drift settles at 82ppm (what I called <90 in
>> previous email) in one run and then 32ppm in another run (with a reboot
>> between). This is similar to the problem I had with stepping disabled
>> where drift would go from +500ppm in one run and then swing all the way
>> to -500ppm in another run (usually with a reboot between). I am not
>> going to spend another minute troubleshooting this problem until we get
>> an updated linux kernel. I will dig into it more deeply if the new
>> kernel exhibits this same drift instability.
>>[/color]
>
> I think we are talking about two different bugs here.
>
> The different drifts on reboot are due to a quirk in the tsc
> calibration code in the kernel. Grep your sys log for messages
> like these:
> Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
> Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
> Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
> Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
> Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
> Those bottom bits jumping around correspond to the different
> drift values.
>
> If you only have one system, you can pick one and hack your
> kernel to smash in a constant value at the right place.
>
> Or you can add something like this to your boot line:
> clocksource=acpi_pm
> That's assuming your hardware has acpi and whatever.
>
> I've been using it for a while. I haven't noticed any quirks,
> but who knows.
>
>[/color]
YES! The slight variation in measured CPU speed seems to explain my
continued drift instability (where "continued" means even with stepping
enabled). I was able to retrieve four CPU speed measurements that had
corresponding NTP loop logs. The table below shows the perfect
correlation between linux-measured CPU speed and NTP-measured drift.
Clearly the "real" CPU speed is somewhere around 2000.200 MHz.
measured CPU speed | measured drift
(MHz) | (ppm)
---------------------------------------
2000.153 | -23
2000.215 | 8
2000.321 | 61
2000.367 | 84
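As a cross-check on the table, the change in measured CPU speed can be converted to ppm and compared with the change in drift, both taken relative to the first row. A short Python sketch; it assumes the calibration error maps one-to-one onto clock frequency, and the names are made up for illustration.

```python
# (measured CPU speed in MHz, NTP drift in ppm), copied from the table above
runs = [(2000.153, -23), (2000.215, 8), (2000.321, 61), (2000.367, 84)]

base_mhz, base_drift = runs[0]

def predicted_shift_ppm(mhz):
    """Drift change expected if the calibration error maps straight to frequency."""
    return (mhz - base_mhz) / base_mhz * 1e6

for mhz, drift in runs[1:]:
    print(f"{mhz:.3f} MHz: predicted {predicted_shift_ppm(mhz):6.1f} ppm, "
          f"observed {drift - base_drift} ppm")
```

The predicted and observed shifts agree to within a ppm for all three rows, which is what you would expect if the drift differences come entirely from the calibration value.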
As I've stated before, I don't believe the oscillator is really this
unstable, but I could be wrong. After all, my CPU measurements varied
much more than yours, especially from one run to the next. However, I'm
still open to the possibility that linux's approach to speed measurement
is less than perfect (at least for my version of linux). These
measurements were on a core 2 duo (2 processors) running RedHawk linux
2.6.18.8. Hal, can you tell me which version of linux resulted in your
list of speed measurements?
I also wonder if the use of two processors has any impact on this
behavior. I tried forcing CPU affinity for the NTP process, but it
didn't have any effect on the measured drift value. This means that
either there truly is no difference between CPUs (as in different
speed/frequency characteristics) or I wasn't actually moving the process
between CPUs (using /proc/<pid>/affinity). I'm assuming both CPUs have
the same oscillator, so it makes sense that they would measure the same
drift.
Thanks,
Andy
-
Re: Time reset
[color=blue]
>As I've stated before, I don't believe the oscillator is really this
>unstable, but I could be wrong. After all, my CPU measurements varied
>much more than yours, especially from one run to the next. However, I'm
>still open to the possibility that linux's approach to speed measurement
>is less than perfect (at least for my version of linux). These
>measurements were on a core 2 duo (2 processors) running RedHawk linux
>2.6.18.8. Hal, can you tell me which version of linux resulted in your
>list of speed measurements?[/color]
Your crystal is probably fine.
At one point, I hacked my kernel to call the calibration routine
several times and printout the answer. A batch of answers from
the same time (and hopefully same temperature) had the same sort
of scatter.
I'm running 2.6.23 with a few local hacks. 2.6.19 has similar problems.
--
These are my opinions, not necessarily my employer's. I hate spam.