Time reset - NTP


Thread: Time reset

  1. Time reset


    The ntp log file shows when NTP steps the time. But then the potential harm
    is already done. Especially if the time moves backward, our server might
    have serious trouble. Is there a log event which indicates that the time is
    going to be reset in order to enable us to take appropriate action before
    the actual reset?

    Thanks a lot,

    Jan



  2. Re: Time reset

    jkvbe wrote:
    > The ntp log file shows when NTP steps the time. But then the potential harm
    > is already done. Especially if the time moves backward, our server might
    > have serious trouble. Is there a log event which indicates that the time is
    > going to be reset in order to enable us to take appropriate action before
    > the actual reset?
    >
    > Thanks a lot,
    >
    > Jan
    >
    >


    I don't know of any advance warning.

    DOES the time step backward?

    If ntpd is working properly it should NOT need to step the time at all
    with the possible exception of a single step when ntpd is first started.

    If ntpd is stepping time regularly, you have some other problem. If you
    find and fix that problem, ntpd should stop stepping the time.

    There are/were known issues with some Linux systems; during periods of
    high disk usage, clock interrupts would be lost resulting in a FORWARD
    step. AFAIK these issues were related to EIDE disks used in PIO mode
    rather than DMA mode. ISTR reading that the problem has been fixed in
    recent versions of Linux. YMMV
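    To see why lost interrupts produce a FORWARD step, here is a minimal Python
    model. It assumes a purely tick-driven clock (each timer interrupt advances
    time by 1/HZ seconds) and ntpd's default 128 ms step threshold. The HZ
    values are illustrative, and the model ignores the delay before ntpd
    actually decides to step.

```python
# Minimal model: a tick-driven clock advances by 1/HZ s per timer interrupt,
# so every lost interrupt leaves the clock 1/HZ s behind true time.
STEP_THRESHOLD_MS = 128  # ntpd's default step threshold


def lag_seconds(lost_ticks: int, hz: int) -> float:
    """How far behind true time the clock falls after missing `lost_ticks` interrupts."""
    return lost_ticks / hz


def ticks_to_force_step(hz: int, threshold_ms: int = STEP_THRESHOLD_MS) -> int:
    """Smallest number of lost ticks whose accumulated lag exceeds the step threshold.

    Integer arithmetic: lost * 1000 / hz > threshold_ms  <=>  lost > threshold_ms * hz / 1000.
    """
    return threshold_ms * hz // 1000 + 1


if __name__ == "__main__":
    for hz in (100, 250, 1000):
        n = ticks_to_force_step(hz)
        print(f"HZ={hz}: {n} lost ticks -> clock {lag_seconds(n, hz):.3f} s behind")
```

    Because the clock only ever falls behind in this model, the correction
    ntpd eventually applies is a step forward, matching the description above.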



  3. Re: Time reset


    >The ntp log file shows when NTP steps the time. But then the potential harm
    >is already done. Especially if the time moves backward, our server might
    >have serious trouble. Is there a log event which indicates that the time is
    >going to be reset in order to enable us to take appropriate action before
    >the actual reset?


    I don't know of any way to get advance warning when ntpd is about to
    step the time.

    There are command line switches to prevent stepping and to allow
    one step at startup time.

    The disadvantage with preventing steps is that it might take a long
    time to correct the time. But if you start with good time your clock
    will never get off far enough to cause problems.
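    How long is "a long time"? ntpd's documentation gives the maximum slew
    rate as 500 ppm (0.5 ms per second), so each second of offset takes at
    least 2000 s to slew out. A small Python sketch of that arithmetic follows.
    The switches are not named above (presumably ntpd's -x and -g); the 600 s
    figure is the step threshold -x is documented to set, used here only as an
    example offset.

```python
MAX_SLEW_PPM = 500  # maximum slew rate documented for ntpd: 500 ppm = 0.5 ms/s


def min_slew_seconds(offset_s: float, max_slew_ppm: int = MAX_SLEW_PPM) -> float:
    """Lower bound on the time needed to slew away `offset_s` seconds of error."""
    # At R ppm the clock can be corrected by at most R microseconds per second.
    return abs(offset_s) * 1_000_000 / max_slew_ppm


if __name__ == "__main__":
    for offset in (0.128, 1.0, 600.0):
        secs = min_slew_seconds(offset)
        print(f"{offset:>7} s offset -> at least {secs:,.0f} s ({secs / 86_400:.2f} days)")
```

    Slewing out a 600 s error at that rate would take 1,200,000 s, roughly two
    weeks, which is why starting with good time matters so much when steps are
    disabled.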

    Is there a wiki page on this topic?

    --
    These are my opinions, not necessarily my employer's. I hate spam.


  4. Re: Time reset


    Hal Murray wrote:
    >> The ntp log file shows when NTP steps the time. But then the potential harm
    >> is already done. Especially if the time moves backward, our server might
    >> have serious trouble. Is there a log event which indicates that the time is
    >> going to be reset in order to enable us to take appropriate action before
    >> the actual reset?
    >>

    >
    > I don't know of any way to get advance warning when ntpd is about to
    > step the time.
    >
    > There are command line switches to prevent stepping and to allow
    > one step at startup time.
    >
    > The disadvantage with preventing steps is that it might take a long
    > time to correct the time. But if you start with good time your clock
    > will never get off far enough to cause problems.
    >
    > Is there a wiki page on this topic?
    >


    Another disadvantage with preventing steps is that it isn't really a
    supported mode (because it's a "tinker") and, as I've found, it doesn't
    always work. When I disable time steps on a linux 2.6.18 kernel, the
    drift value goes to +/-500 and can actually swap sign from one run to
    the next. This happens even though a time step was never needed (i.e.
    offset never went >128ms). With time steps enabled the drift value
    settles <90ppm (and again, no step actually occurs).
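    The step/slew decision can be modelled as a simple rule on the measured
    offset. This Python sketch assumes ntpd's documented defaults (128 ms step
    threshold, 1000 s panic threshold) and treats a threshold of 0 as
    "disabled", as the tinker step 0 and tinker panic 0 settings do. It
    deliberately leaves out the stepout interval ntpd waits before stepping and
    the -g override of the panic check.

```python
STEP_THRESHOLD_S = 0.128    # ntpd default step threshold
PANIC_THRESHOLD_S = 1000.0  # ntpd default panic threshold


def action_for_offset(offset_s: float,
                      step_threshold_s: float = STEP_THRESHOLD_S,
                      panic_threshold_s: float = PANIC_THRESHOLD_S) -> str:
    """Classify an offset as 'slew', 'step' or 'panic' (simplified model).

    A threshold of 0 disables the corresponding check.
    """
    magnitude = abs(offset_s)
    if panic_threshold_s > 0 and magnitude > panic_threshold_s:
        return "panic"  # ntpd would normally exit rather than set the clock
    if step_threshold_s > 0 and magnitude > step_threshold_s:
        return "step"   # clock is set directly and may move backward
    return "slew"       # clock is gradually sped up or slowed down


if __name__ == "__main__":
    for offset in (0.05, 0.2, -0.2, 2000.0):
        print(offset, action_for_offset(offset), action_for_offset(offset, step_threshold_s=0))
```

    Since Andy's offsets never exceeded 128 ms, this rule says no step would
    occur with or without stepping enabled, which is consistent with his
    suspicion that the drift difference comes from somewhere else.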

    From what I've been able to piece together, this different behavior
    between step/!step is probably due to the kernel time discipline being
    disabled with !step, coupled with a (potential) bug in linux that forces
    NTP's "manual" adjustments to have a granularity of 1ms (i.e. somewhere
    an adjustment is rounded up or down). I've not verified the bug is
    present in my 2.6.18 linux kernel, so don't quote me on it. One might
    ask why the kernel time discipline is preemptively disabled in this
    manner -- maybe there is a good reason.
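    To illustrate why a 1 ms granularity would matter (Andy's unverified
    hypothesis, which Unruh disputes for adjtime below), here is a toy Python
    model in which every per-second frequency correction is rounded to a
    hypothetical granularity. It is not how Linux or ntpd actually apply
    corrections; the drift values are illustrative.

```python
def quantize_us(requested_us: int, granularity_us: int) -> int:
    """Round a requested adjustment to the nearest multiple of granularity_us (hypothetical)."""
    return round(requested_us / granularity_us) * granularity_us


def accumulated_error_us(seconds: int, drift_ppm: int, granularity_us: int) -> int:
    """Error left after `seconds` if each second's drift correction is quantized.

    A drift of N ppm needs a correction of N microseconds every second.
    """
    per_second_error = drift_ppm - quantize_us(drift_ppm, granularity_us)
    return seconds * per_second_error


if __name__ == "__main__":
    for granularity in (1, 1000):
        print(granularity, accumulated_error_us(3600, 30, granularity))
```

    With 1 us granularity a 30 ppm correction is applied exactly; with 1 ms
    granularity it rounds to zero and the clock drifts 108 ms in an hour, close
    to the 128 ms step threshold.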

    Our application also does not currently handle backward time steps. Our
    workaround to the problematic !step is to realize, as others on this
    list have pointed out, that a time step should never occur in a normally
    functioning system. If a step does occur, we probably have bigger
    problems than those caused by the step itself, such as: lost timer
    interrupts, failing hardware, runaway process, kernel bug, NTP bug, etc.

    Andy

  5. Re: Time reset

    "jkvbe" writes:


    >The ntp log file shows when NTP steps the time. But then the potential harm
    >is already done. Especially if the time moves backward, our server might
    >have serious trouble. Is there a log event which indicates that the time is
    >going to be reset in order to enable us to take appropriate action before
    >the actual reset?


    On what kind of system? How big a step? ntp should NOT have to step the
    time except perhaps when it is started at bootup. If it does step the time,
    then something is very wrong with your system. Find out what it is.
    The only log event might be noticing that the offset exceeds, say, 50ms.
    Use that as your warning.
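    A minimal Python sketch of that advice, assuming the classic "ntpq -p"
    column layout (remote, refid, st, t, when, poll, reach, delay, offset,
    jitter, with offset in milliseconds and the system peer marked by a
    leading '*'). The sample output is made up, and the 50 ms threshold is the
    figure suggested above. This is still not advance warning of a step; it
    only flags an offset that is heading toward the 128 ms step threshold.

```python
from typing import Optional


def system_peer_offset_ms(ntpq_output: str) -> Optional[float]:
    """Return the offset (ms) of the peer marked '*' in ntpq -p output, if any."""
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            fields = line[1:].split()
            # remote refid st t when poll reach delay offset jitter
            if len(fields) >= 10:
                return float(fields[8])
    return None


def offset_alarm(ntpq_output: str, threshold_ms: float = 50.0) -> bool:
    """True when the system peer's offset exceeds threshold_ms in magnitude."""
    offset = system_peer_offset_ms(ntpq_output)
    return offset is not None and abs(offset) > threshold_ms


# Hypothetical ntpq -p output, for illustration only.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.168.1.10    .IRIG.           1 u   33   64  377    0.512   62.300   0.045
+192.168.1.11    192.168.1.10     2 u   12   64  377    0.630   61.900   0.050
"""

if __name__ == "__main__":
    print(system_peer_offset_ms(SAMPLE), offset_alarm(SAMPLE))
```

    In practice the text would come from running ntpq -p periodically (for
    example via subprocess.run) and raising an alarm when offset_alarm()
    returns True.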


    >Thanks a lot,


    >Jan




  6. Re: Time reset

    andy.helten@dot21rts.com (Andy Helten) writes:


    >Hal Murray wrote:
    >>> The ntp log file shows when NTP steps the time. But then the potential harm
    >>> is already done. Especially if the time moves backward, our server might
    >>> have serious trouble. Is there a log event which indicates that the time is
    >>> going to be reset in order to enable us to take appropriate action before
    >>> the actual reset?
    >>>

    >>
    >> I don't know of any way to get advance warning when ntpd is about to
    >> step the time.
    >>
    >> There are command line switches to prevent stepping and to allow
    >> one step at startup time.
    >>
    >> The disadvantage with preventing steps is that it might take a long
    >> time to correct the time. But if you start with good time your clock
    >> will never get off far enough to cause problems.
    >>
    >> Is there a wiki page on this topic?
    >>


    >Another disadvantage with preventing steps is that it isn't really a
    >supported mode (because it's a "tinker") and, as I've found, it doesn't
    >always work. When I disable time steps on a linux 2.6.18 kernel, the
    >drift value goes to +/-500 and can actually swap sign from one run to
    >the next. This happens even though a time step was never needed (i.e.
    >offset never went >128ms). With time steps enabled the drift value
    >settles <90ppm (and again, no step actually occurs).


    That certainly sounds like a bug to me.



    >From what I've been able to piece together, this different behavior
    >between step/!step is probably due to the kernel time discipline being
    >disabled with !step, coupled with a (potential) bug in linux that forces
    >NTP's "manual" adjustments to have a granularity of 1ms (i.e. somewhere
    >an adjustment is rounded up or down). I've not verified the bug is
    >present in my 2.6.18 linux kernel, so don't quote me on it. One might
    >ask why the kernel time discipline is preemptively disabled in this
    >manner -- maybe there is a good reason.


    AFAIK it is not the kernel that does the time step. I.e., the kernel
    discipline is not what demands the step. Also, adjtime certainly does not
    have a 1ms granularity.



    >Our application also does not currently handle backward time steps. Our
    >workaround to the problematic !step is to realize, as others on this
    >list have pointed out, that a time step should never occur in a normally
    >functioning system. If a step does occur, we probably have bigger
    >problems than those caused by the step itself, such as: lost timer
    >interrupts, failing hardware, runaway process, kernel bug, NTP bug, etc.


    Yup.


    >Andy


  7. Re: Time reset



    Unruh wrote:
    > andy.helten@dot21rts.com (Andy Helten) writes:
    >
    >> Another disadvantage with preventing steps is that it isn't really a
    >> supported mode (because it's a "tinker") and, as I've found, it doesn't
    >> always work. When I disable time steps on a linux 2.6.18 kernel, the
    >> drift value goes to +/-500 and can actually swap sign from one run to
    >> the next. This happens even though a time step was never needed (i.e.
    >> offset never went >128ms). With time steps enabled the drift value
    >> settles <90ppm (and again, no step actually occurs).
    >>

    >
    > That certainly sounds like a bug to me.
    >
    >


    Me too, but disabling time step is a tinker and tinkers are generally
    used at your own risk. Besides, after much testing, I'm fairly certain
    the problem is indeed with the kernel -- especially considering I did
    not have this problem on an older kernel.

    >> From what I've been able to piece together, this different behavior
    >> between step/!step is probably due to the kernel time discipline being
    >> disabled with !step, coupled with a (potential) bug in linux that forces
    >> NTP's "manual" adjustments to have a granularity of 1ms (i.e. somewhere
    >> an adjustment is rounded up or down). I've not verified the bug is
    >> present in my 2.6.18 linux kernel, so don't quote me on it. One might
    >> ask why the kernel time discipline is preemptively disabled in this
    >> manner -- maybe there is a good reason.
    >>

    >
    > AFAIK it is not the kernel that does the time step. I.e., the kernel
    > discipline is not what demands the step. Also, adjtime certainly does not
    > have a 1ms granularity.
    >
    >
    >


    That is also my understanding, that the kernel does not perform the time
    step but it is the kernel that updates the system time every tick. My
    understanding of the kernel time discipline is that NTP sets the size of
    the update to account for the system clock drift. This mechanism is
    apparently disabled when time stepping is disabled. Don't ask me why.
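    As a rough model of that mechanism (not the kernel's actual code, which
    uses scaled fixed-point arithmetic), a frequency correction of N ppm can be
    applied by lengthening or shortening every tick by N parts per million.
    The Python sketch below assumes a nominal tick of 1/HZ seconds; HZ, the ppm
    values, and the sign convention (positive lengthens the tick) are
    assumptions for illustration.

```python
def adjusted_tick_ns(hz: int, correction_ppm: float) -> float:
    """Length of one tick, in ns, after applying a frequency correction.

    Nominal tick = 1e9 / HZ ns; a correction of N ppm changes it by N parts
    per million. Positive values lengthen the tick (clock runs faster).
    """
    nominal_ns = 1_000_000_000 // hz
    return nominal_ns + nominal_ns * correction_ppm / 1_000_000


def correction_per_second_us(hz: int, correction_ppm: float) -> float:
    """Total adjustment accumulated over one second of ticks, in microseconds."""
    nominal_ns = 1_000_000_000 // hz
    return hz * (adjusted_tick_ns(hz, correction_ppm) - nominal_ns) / 1000


if __name__ == "__main__":
    print(adjusted_tick_ns(1000, 50), correction_per_second_us(1000, 50))
```

    Whatever HZ is, the per-tick changes add up to N microseconds per second,
    which is exactly what an N ppm frequency correction means.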

  8. Re: Time reset


    >> That certainly sounds like a bug to me.


    >Me too, but disabling time step is a tinker and tinkers are generally
    >used at your own risk. Besides, after much testing, I'm fairly certain
    >the problem is indeed with the kernel -- especially considering I did
    >not have this problem on an older kernel.


    What kernel version worked correctly? What version doesn't?



  9. Re: Time reset


    Hal Murray wrote:
    >>> That certainly sounds like a bug to me.
    >>>

    >
    >
    >> Me too, but disabling time step is a tinker and tinkers are generally
    >> used at your own risk. Besides, after much testing, I'm fairly certain
    >> the problem is indeed with the kernel -- especially considering I did
    >> not have this problem on an older kernel.
    >>

    >
    > What kernel version worked correctly? What version doesn't?
    >
    >


    Does _not_ work on RedHawk 4.2, linux 2.6.18.8:
    Linux sbc1 2.6.18.8-RedHawk-4.2-trace #1 SMP PREEMPT Tue May 29 12:44:24

    Does work on RedHat EL4, linux 2.6.9-5:
    Linux ntp1 2.6.9-5.EL #1 Wed Jan 5 19:22:18 EST 2005 i686 i686 i386
    GNU/Linux


    Note that RedHawk is based on RedHat EL4 Update 4 (in other words, you
    first install EL4 U4 and then install a small RedHawk upgrade).

    Also note that this is not a new problem (OK, it's not that old either...):

    https://lists.ntp.org/pipermail/ques...ch/017722.html

    We plan to upgrade to the latest version of RedHawk. I'm not sure what
    kernel version is in that release, but I do know it is based on RedHat EL5.

    Andy

  10. Re: Time reset

    Andy Helten wrote:
    >


    >>> offset never went >128ms). With time steps enabled the drift value
    >>> settles <90ppm (and again, no step actually occurs).


    90ppm is a relatively bad static frequency error; a good machine will be
    around 10ppm. That won't help a clean cold start.

    I didn't check, but did you have the default minpoll and maxpoll values? A
    high minpoll might make it difficult to get the loop to converge from
    there without overflows.
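    Frequency errors like these are in parts per million (Andy's figure above
    is a drift that settles below 90ppm), and they are easier to judge once
    converted to time gained or lost per day by a free-running clock. A small
    Python sketch of that conversion:

```python
SECONDS_PER_DAY = 86_400


def seconds_per_day(frequency_error_ppm: float) -> float:
    """Time gained or lost per day by a free-running clock with this error."""
    return frequency_error_ppm * SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    for ppm in (10, 90, 500):
        print(f"{ppm:>3} ppm -> {seconds_per_day(ppm):.3f} s/day")
```

    That is about 0.9 s/day at 10 ppm, 7.8 s/day at 90 ppm, and 43.2 s/day at
    500 ppm, the largest frequency ntpd will apply.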

    >
    > That is also my understanding, that the kernel does not perform the time
    > step but it is the kernel that updates the system time every tick. My
    > understanding of the kernel time discipline is that NTP sets the size of


    The kernel time discipline is turned off if you disable steps completely
    (i.e. you set the minimum error for a step to be more than half a
    second or you set it to zero).
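    Expressed as a Python predicate, the rule described here looks like this.
    It encodes only what is stated above (a step threshold of zero, or of more
    than half a second, turns the kernel discipline off); the example
    thresholds are ntpd's 128 ms default and the 600 s that -x is documented to
    set.

```python
def kernel_discipline_enabled(step_threshold_s: float) -> bool:
    """True if the kernel time discipline stays on, per the rule above:
    a step threshold of 0 (never step) or above 0.5 s disables it."""
    return 0 < step_threshold_s <= 0.5


if __name__ == "__main__":
    for threshold in (0.128, 0.5, 0.0, 600.0):
        print(threshold, kernel_discipline_enabled(threshold))
```

    This fits Andy's observation that disabling steps changes the drift
    behaviour even when no step is ever needed: the switch also changes which
    discipline is in control.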

    >


  11. Re: Time reset


    David Woolley wrote:
    > Andy Helten wrote:
    >
    >>>> offset never went >128ms). With time steps enabled the drift value
    >>>> settles <90ppm (and again, no step actually occurs).
    >>>>

    >
    > 90ppm is a relatively bad static frequency error; a good machine will be
    > around 10ppm. That won't help a clean cold start.
    >
    > I didn't check, but did you have the default minpoll and maxpoll values? A
    > high minpoll might make it difficult to get the loop to converge from
    > there without overflows.
    >


    My current problem is that drift settles at 82ppm (what I called <90 in
    previous email) in one run and then 32ppm in another run (with a reboot
    between). This is similar to the problem I had with stepping disabled
    where drift would go from +500ppm in one run and then swing all the way
    to -500ppm in another run (usually with a reboot between). I am not
    going to spend another minute troubleshooting this problem until we get
    an updated linux kernel. I will dig into it more deeply if the new
    kernel exhibits this same drift instability.

    Our system is considered "real-time" and thus has many constraints on
    it, namely that it will run in an isolated environment with no Internet
    connection. Our setup runs one machine with NTP as a local stratum 1
    server using an IRIG-B time source. On that machine I have minpoll set
    to the lowest (16 seconds). I had to do this so that NTP would begin
    serving sync requests in a reasonable amount of time. Startup time is another
    constraint and we have other boards running as NTP clients that must
    sync with the NTP server before they can finish initialization. I don't
    set maxpoll on the server because I've never caught the server changing
    the polling interval from 16 seconds -- maybe it's a reference clock
    feature.

    All other boards in the system run as NTP clients and I use "minpoll 5
    maxpoll 9" for them. I'm not 100% sure why I chose those values, but I
    think the idea was to improve NTP reaction time to changes in the
    "synchronization environment". I'm not sure whether those poll settings
    achieve that, but it sounds like you are suggesting a lower minpoll may
    speed convergence in cases of higher drift.

    Andy

  12. Re: Time reset

    andy.helten@dot21rts.com (Andy Helten) writes:


    >David Woolley wrote:
    >> Andy Helten wrote:
    >>
    >>>>> offset never went >128ms). With time steps enabled the drift value
    >>>>> settles <90ppm (and again, no step actually occurs).
    >>>>>

    >>
    >> 90ppm is a relatively bad static frequency error; a good machine will be
    >> around 10ppm. That won't help a clean cold start.
    >>
    >> I didn't check, but did you have the default minpoll and maxpoll values? A
    >> high minpoll might make it difficult to get the loop to converge from
    >> there without overflows.
    >>


    >My current problem is that drift settles at 82ppm (what I called <90 in
    >previous email) in one run and then 32ppm in another run (with a reboot
    >between). This is similar to the problem I had with stepping disabled
    >where drift would go from +500ppm in one run and then swing all the way
    >to -500ppm in another run (usually with a reboot between). I am not
    >going to spend another minute troubleshooting this problem until we get
    >an updated linux kernel. I will dig into it more deeply if the new
    >kernel exhibits this same drift instability.



    That is an incredibly unstable clock. It is hard to imagine that this is a
    kernel problem. This is on one of your machines? It is not the server
    connected to the IRIG-B is it?

    >Our system is considered "real-time" and thus has many constraints on
    >it, namely that it will run in an isolated environment with no Internet
    >connection. Our setup runs one machine with NTP as a local stratum 1
    >server using an IRIG-B time source. On that machine I have minpoll set


    No need for internet if you have a local clock.


    >to the lowest (16 seconds). I had to do this so that NTP would begin
    >serving sync requests in a reasonable amount of time. Startup time is another
    >constraint and we have other boards running as NTP clients that must
    >sync with the NTP server before they can finish initialization. I don't
    >set maxpoll on the server because I've never caught the server changing
    >the polling interval from 16 seconds -- maybe it's a reference clock
    >feature.


    >All other boards in the system run as NTP clients and I use "minpoll 5
    >maxpoll 9" for them. I'm not 100% sure why I chose those values, but I
    >think the idea was to improve NTP reaction time to changes in the
    >"synchronization environment". I'm not sure whether those poll settings
    >achieve that, but it sounds like you are suggesting a lower minpoll may
    >speed convergence in cases of higher drift.


    No. He meant that if you had minpoll set to, say, 8 or 10, settling down
    would take a long time if the system did not start with a good drift value.
    However, even minpoll 5 means roughly one data sample every 4 hours (since
    ntp throws away roughly 7/8 of the samples in the clock_filter). That's a slow
    convergence. And even minpoll 4, the minimum, is only one sample every 2
    hrs.



  13. Re: Time reset


    Unruh wrote:
    > andy.helten@dot21rts.com (Andy Helten) writes:
    >
    >> My current problem is that drift settles at 82ppm (what I called <90 in
    >> previous email) in one run and then 32ppm in another run (with a reboot
    >> between). This is similar to the problem I had with stepping disabled
    >> where drift would go from +500ppm in one run and then swing all the way
    >> to -500ppm in another run (usually with a reboot between). I am not
    >> going to spend another minute troubleshooting this problem until we get
    >> an updated linux kernel. I will dig into it more deeply if the new
    >> kernel exhibits this same drift instability.
    >>

    >
    >
    > That is an incredibly unstable clock. It is hard to imagine that this is a
    > kernel problem. This is on one of your machines? It is not the server
    > connected to the IRIG-B is it?
    >
    >


    I'm fairly certain the board's oscillator is stable. I wrote a simple
    Perl script that keyed off a PPS print from a GPS-to-IRIGB box. When the
    PPS time was printed, I grabbed local system time as well as IRIGB time
    from the local IRIGB PMC. Using this approach, the system oscillator's
    drift (without NTP running) was measured to be within the +/-30ppm
    oscillator specifications. This procedure was reliable over several
    runs and was repeated on at least one other board with an IRIG-B receiver.
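    The Perl script itself is not shown. The idea can be sketched in Python:
    collect pairs of (reference time, local system time) at each PPS edge,
    then fit a straight line to the local-minus-reference offset against
    elapsed reference time; the slope, in parts per million, is the
    oscillator's frequency error. The function name and the data in the
    example are hypothetical.

```python
def drift_ppm(ref_times, local_times):
    """Least-squares slope of (local - reference) versus elapsed reference time, in ppm.

    Regressing the small offset against elapsed time, rather than raw
    timestamps against each other, keeps the sums well-conditioned.
    """
    if len(ref_times) != len(local_times) or len(ref_times) < 2:
        raise ValueError("need at least two matching samples")
    t0 = ref_times[0]
    xs = [ref - t0 for ref in ref_times]
    ys = [loc - ref for loc, ref in zip(local_times, ref_times)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx * 1_000_000


if __name__ == "__main__":
    # Synthetic data: local clock 3 s ahead and running 20 ppm fast.
    ref = [1_200_000_000.0 + 60.0 * i for i in range(11)]
    local = [r + 3.0 + 20e-6 * (r - ref[0]) for r in ref]
    print(round(drift_ppm(ref, local), 2))
```

    A result inside the oscillator's +/-30ppm specification, repeated across
    runs, is the kind of evidence described above for a stable crystal.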

    Yes, there is a potential for problems in many different areas within
    this setup, however, after much troubleshooting to isolate the problem,
    the 2.6.18 kernel has always been involved in the non-working
    configuration. An older kernel worked fine with the same IRIG-B driver,
    the same version of NTP, but different hardware, so I haven't completely
    exonerated the hardware. At any rate, this has been put on the back
    burner until we can get the latest RedHawk release, which isn't due
    until mid April.

    >> All other boards in the system run as NTP clients and I use "minpoll 5
    >> maxpoll 9" for them. I'm not 100% sure why I chose those values, but I
    >> think the idea was to improve NTP reaction time to changes in the
    >> "synchronization environment". I'm not sure whether those poll settings
    >> achieve that, but it sounds like you are suggesting a lower minpoll may
    >> speed convergence in cases of higher drift.
    >>

    >
    > No. He meant that if you had minpoll set to, say, 8 or 10, settling down
    > would take a long time if the system did not start with a good drift value.
    > However, even minpoll 5 means roughly one data sample every 4 hours (since
    > ntp throws away roughly 7/8 of the samples in the clock_filter). That's a slow
    > convergence. And even minpoll 4, the minimum, is only one sample every 2
    > hrs.
    >
    >


    Hmmm, clearly the more I learn about NTP, the less I know.

    Andy

  14. Re: Time reset


    >My current problem is that drift settles at 82ppm (what I called <90 in
    >previous email) in one run and then 32ppm in another run (with a reboot
    >between). This is similar to the problem I had with stepping disabled
    >where drift would go from +500ppm in one run and then swing all the way
    >to -500ppm in another run (usually with a reboot between). I am not
    >going to spend another minute troubleshooting this problem until we get
    >an updated linux kernel. I will dig into it more deeply if the new
    >kernel exhibits this same drift instability.


    I think we are talking about two different bugs here.

    The different drifts on reboot are due to a quirk in the TSC
    calibration code in the kernel. Grep your syslog for messages
    like these:
    Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
    Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
    Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
    Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
    Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
    Those bottom bits jumping around correspond to the different
    drift values.
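    A minimal Python sketch for doing that grep and turning the scatter into
    parts per million, using the log lines quoted above as input. It assumes
    the "Detected ... MHz processor." wording shown there; other kernel
    versions may word the message differently. When the TSC drives
    timekeeping, a calibration that is off by N ppm should show up as roughly
    N ppm of extra drift for ntpd to absorb.

```python
import re

DETECTED = re.compile(r"Detected ([0-9]+\.[0-9]+) MHz processor")


def calibrated_mhz(log_text: str) -> list:
    """Extract every calibrated CPU frequency (MHz) found in the log text."""
    return [float(m.group(1)) for m in DETECTED.finditer(log_text)]


def ppm_from_mean(values: list) -> list:
    """Deviation of each calibration from the mean, in parts per million."""
    mean = sum(values) / len(values)
    return [(v - mean) / mean * 1_000_000 for v in values]


LOG = """\
Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
"""

if __name__ == "__main__":
    devs = ppm_from_mean(calibrated_mhz(LOG))
    print([round(d, 1) for d in devs], "spread:", round(max(devs) - min(devs), 1), "ppm")
```

    For these five boots the spread comes to roughly 39 ppm, the same order of
    magnitude as the drift swings Andy reports between reboots.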

    If you only have one system, you can pick one and hack your
    kernel to smash in a constant value at the right place.

    Or you can add something like this to your boot line:
    clocksource=acpi_pm
    That's assuming your hardware has acpi and whatever.

    I've been using it for a while. I haven't noticed any quirks,
    but who knows.
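    To confirm which clocksource the running kernel actually uses, kernels
    with the generic clocksource framework expose it in sysfs. A minimal
    Python sketch, assuming the /sys/devices/system/clocksource/clocksource0/
    layout (current_clocksource and available_clocksource files) and a
    clocksource= parameter in /proc/cmdline; older or patched kernels may lack
    these files, in which case it simply reports nothing.

```python
from pathlib import Path
from typing import List, Optional, Tuple

SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def cmdline_clocksource(cmdline: str) -> Optional[str]:
    """Return the value of a clocksource= boot parameter, if present."""
    for param in cmdline.split():
        if param.startswith("clocksource="):
            return param.split("=", 1)[1]
    return None


def clocksource_info(base: Path = SYSFS_DIR) -> Tuple[Optional[str], List[str]]:
    """(current clocksource, available clocksources) read from sysfs, if present."""
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        return None, []
    return current, available


if __name__ == "__main__":
    try:
        boot = Path("/proc/cmdline").read_text()
    except OSError:
        boot = ""
    print("requested:", cmdline_clocksource(boot))
    print("current, available:", clocksource_info())
```

    If the requested clocksource is not in the available list, the kernel will
    typically fall back to another one, so checking both values after a reboot
    is worthwhile.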



  15. Re: Time reset

    Unruh wrote:
    > andy.helten@dot21rts.com (Andy Helten) writes:
    >
    >
    >
    >>David Woolley wrote:
    >>
    >>>Andy Helten wrote:
    >>>
    >>>
    >>>>>>offset never went >128ms). With time steps enabled the drift value
    >>>>>>settles <90ppm (and again, no step actually occurs).
    >>>>>>
    >>>>>
    >>>90ppm is a relatively bad static frequency error; a good machine will be
    >>>around 10ppm. That won't help a clean cold start.
    >>>
    >>>I didn't check, but did you have the default minpoll and maxpoll values? A
    >>>high minpoll might make it difficult to get the loop to converge from
    >>>there without overflows.
    >>>

    >>

    >
    >>My current problem is that drift settles at 82ppm (what I called <90 in
    >>previous email) in one run and then 32ppm in another run (with a reboot
    >>between). This is similar to the problem I had with stepping disabled
    >>where drift would go from +500ppm in one run and then swing all the way
    >>to -500ppm in another run (usually with a reboot between). I am not
    >>going to spend another minute troubleshooting this problem until we get
    >>an updated linux kernel. I will dig into it more deeply if the new
    >>kernel exhibits this same drift instability.

    >
    >
    >
    > That is an incredibly unstable clock. It is hard to imagine that this is a
    > kernel problem. This is on one of your machines? It is not the server
    > connected to the IRIG-B is it?
    >
    >
    >>Our system is considered "real-time" and thus has many constraints on
    >>it, namely that it will run in an isolated environment with no Internet
    >>connection. Our setup runs one machine with NTP as a local stratum 1
    >>server using an IRIG-B time source. On that machine I have minpoll set

    >
    >
    > No need for internet if you have a local clock.
    >
    >
    >
    >>to the lowest (16 seconds). I had to do this so that NTP would begin
    >>serving sync requests in a reasonable amount of time. Startup time is another
    >>constraint and we have other boards running as NTP clients that must
    >>sync with the NTP server before they can finish initialization. I don't
    >>set maxpoll on the server because I've never caught the server changing
    >>the polling interval from 16 seconds -- maybe it's a reference clock
    >>feature.

    >
    >
    >>All other boards in the system run as NTP clients and I use "minpoll 5
    >>maxpoll 9" for them. I'm not 100% sure why I chose those values, but I
    >>think the idea was to improve NTP reaction time to changes in the
    >>"synchronization environment". I'm not sure whether those poll settings
    >>achieve that, but it sounds like you are suggesting a lower minpoll may
    >>speed convergence in cases of higher drift.

    >
    >
    > No. He meant that if you had minpoll set to, say, 8 or 10, settling down
    > would take a long time if the system did not start with a good drift value.
    > However, even minpoll 5 means roughly one data sample every 4 hours (since
    > ntp throws away roughly 7/8 of the samples in the clock_filter). That's a slow
    > convergence. And even minpoll 4, the minimum, is only one sample every 2
    > hrs.
    >
    >


    I must be missing something! Minpoll=5 means 2^5 seconds is the minimum
    poll interval. How are you getting to every four hours from that? ISTR
    that the default minpoll is 6 which gives 2^6 or 64 seconds.
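    The arithmetic behind this disagreement is easy to check. The Python
    sketch below converts poll exponents to seconds and, taking Unruh's "ntp
    throws away roughly 7/8 of the samples" at face value as an assumption,
    computes how often a sample would actually be used.

```python
def poll_interval_s(exponent: int) -> int:
    """minpoll/maxpoll values are exponents: the interval is 2**exponent seconds."""
    return 2 ** exponent


def seconds_per_used_sample(exponent: int, one_in: int = 8) -> int:
    """Interval between samples actually used if only one in `one_in` survives."""
    return poll_interval_s(exponent) * one_in


if __name__ == "__main__":
    for exp in (4, 5, 6, 9, 10):
        print(f"poll {exp}: every {poll_interval_s(exp)} s, "
              f"one used sample per {seconds_per_used_sample(exp)} s")
```

    Even under that assumption, minpoll 5 gives one used sample about every
    256 s (a little over 4 minutes, not 4 hours), and minpoll 4 about every
    128 s.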

    If the server lines in ntp.conf include the "iburst" keyword, the
    servers will be polled with an initial burst of eight requests sent at
    two second intervals. This fills the pipeline and "pacifies" the
    filter. Thereafter, ntpd adjusts the polling interval as it thinks
    best. Normally the poll interval will increase to somewhere between 256
    and 1024 seconds once the clock is synchronized. In general, the better
    the network connection the higher the maximum poll interval.
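    A quick Python sketch of why iburst helps at startup: it compares when
    the eighth request goes out with the burst described above (eight requests
    two seconds apart) and with ordinary polling at the 64 s default minpoll.
    It counts send times only and ignores network delay and any additional
    polls ntpd may need before it selects a source.

```python
FILTER_STAGES = 8  # ntpd's clock filter holds the last eight samples


def send_times(count: int, spacing_s: int) -> list:
    """Send times (seconds from start) of `count` requests `spacing_s` apart."""
    return [i * spacing_s for i in range(count)]


if __name__ == "__main__":
    burst = send_times(FILTER_STAGES, 2)    # iburst: 2 s apart
    normal = send_times(FILTER_STAGES, 64)  # default minpoll 6 = 64 s
    print("iburst: filter full after", burst[-1], "s")
    print("normal polling: filter full after", normal[-1], "s")
```

    That is 14 s versus 448 s, which matters for a system like Andy's where
    clients must synchronize before they can finish initialization.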

    It's interesting to watch the performance of ntpd improve as the network
    quiets down during the hours when most people sleep!



  16. Re: Time reset

    On 3 apr, 23:10, "Richard B. Gilbert" wrote:
    ....
    >
    > DOES the time step backward?
    >
    > If ntpd is working properly it should NOT need to step the time at all
    > with the possible exception of a single step when ntpd is first started.
    >
    > If ntpd is stepping time regularly, you have some other problem. If you
    > find and fix that problem, ntpd should stop stepping the time.
    >
    > There are/were known issues with some Linux systems; during periods of
    > high disk usage, clock interrupts would be lost resulting in a FORWARD
    > step. AFAIK these issues were related to EIDE disks used in PIO mode
    > rather than DMA mode. ISTR reading that the problem has been fixed in
    > recent versions of Linux. YMMV


    I agree that ntpd should not be stepping time regularly and that it
    points to a problem if it happens regularly. But we develop an
    appliance and we don't control how customers deploy it. Given the
    adverse effects of stepping time (especially if it moves backwards),
    I would have liked to be protected against badly set-up NTP
    infrastructure or time servers that are compromised.
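    Nobody in the thread knows of a way to get advance warning, but an
    application can at least notice a step promptly. A minimal Python sketch,
    assuming the application can call a check periodically: it compares how
    far the wall clock (time.time) moved against the monotonic clock
    (time.monotonic), which a step does not affect. On Linux the monotonic
    clock follows ntpd's slewing but not steps, so ordinary slewing should not
    trigger it. The class name and the 128 ms tolerance are illustrative
    choices.

```python
import time
from typing import Callable


class StepDetector:
    """Detect wall-clock steps by comparing against a monotonic clock."""

    def __init__(self, tolerance_s: float = 0.128,
                 wall: Callable[[], float] = time.time,
                 mono: Callable[[], float] = time.monotonic):
        self.tolerance_s = tolerance_s
        self._wall = wall
        self._mono = mono
        self._last_wall = wall()
        self._last_mono = mono()

    def check(self) -> float:
        """Return the size of any step since the last check (0.0 if none).

        Negative values mean the wall clock moved backward.
        """
        wall_now, mono_now = self._wall(), self._mono()
        jump = (wall_now - self._last_wall) - (mono_now - self._last_mono)
        self._last_wall, self._last_mono = wall_now, mono_now
        return jump if abs(jump) > self.tolerance_s else 0.0


if __name__ == "__main__":
    detector = StepDetector()
    time.sleep(0.1)
    print("step since start:", detector.check())
```

    A negative return value is the case that matters for an appliance: the
    application could, for example, log an alarm or pause time-sensitive work
    when check() reports a backward step.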

    Jan

  17. Re: Time reset

    jkvbe wrote:
    > On 3 apr, 23:10, "Richard B. Gilbert" wrote:
    > ...
    >
    >>DOES the time step backward?
    >>
    >>If ntpd is working properly it should NOT need to step the time at all
    >>with the possible exception of a single step when ntpd is first started.
    >>
    >>If ntpd is stepping time regularly, you have some other problem. If you
    >>find and fix that problem, ntpd should stop stepping the time.
    >>
    >>There are/were known issues with some Linux systems; during periods of
    >>high disk usage, clock interrupts would be lost resulting in a FORWARD
    >>step. AFAIK these issues were related to EIDE disks used in PIO mode
    >>rather than DMA mode. ISTR reading that the problem has been fixed in
    >>recent versions of Linux. YMMV

    >
    >
    > I agree that ntpd should not be stepping time regularly and that it
    > points to a problem if it happens regularly. But we develop an
    > appliance and we don't control how customers deploy it. Given the
    > adverse effects of stepping time (especially if it moves backwards),
    > I would have liked to be protected against badly set-up NTP
    > infrastructure or time servers that are compromised.
    >
    > Jan


    It seems to me that, in the circumstance you describe, supplying correct
    time is the customer's problem!

    Having read this newsgroup for the last four or five years, I'm aware
    that people do some very strange things with computer clocks. I'm
    thinking, in particular, of at least one individual who deliberately set
    his clock to an incorrect time in order to see if Ntpd would correct it.
    Ntpd did so, of course, but he was not happy with the way it was done or
    the amount of time it took!

    If it's not under your control, it's not your responsibility! Your
    instructions for the appliance should point this out pretty explicitly;
    e.g. "IF YOUR TIME SERVERS CAUSE TIME TO STEP, THE FOLLOWING ADVERSE
    CONSEQUENCES CAN BE EXPECTED TO OCCUR:
    It is YOUR responsibility to ensure that this does not happen!"

    The only halfway legitimate thing I can think of that would cause time
    to step would be a leap second.


  18. Re: Time reset

    Richard B. Gilbert wrote:

    > that people do some very strange things with computer clocks. I'm
    > thinking, in particular, of at least one individual who deliberately set
    > his clock to an incorrect time in order to see if Ntpd would correct it.


    Many people do this. It is the naive users' way of testing that ntpd
    "works".

    > Ntpd did so, of course, but he was not happy with the way it was done or
    > the amount of time it took!


  19. Re: Time reset


    Hal wrote:

    >> My current problem is that drift settles at 82ppm (what I called <90 in
    >> previous email) in one run and then 32ppm in another run (with a reboot
    >> between). This is similar to the problem I had with stepping disabled
    >> where drift would go from +500ppm in one run and then swing all the way
    >> to -500ppm in another run (usually with a reboot between). I am not
    >> going to spend another minute troubleshooting this problem until we get
    >> an updated linux kernel. I will dig into it more deeply if the new
    >> kernel exhibits this same drift instability.
    >>

    >
    > I think we are talking about two different bugs here.
    >
    > The different drifts on reboot are due to a quirk in the TSC
    > calibration code in the kernel. Grep your syslog for messages
    > like these:
    > Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
    > Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
    > Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
    > Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
    > Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
    > Those bottom bits jumping around correspond to the different
    > drift values.
    >
    > If you only have one system, you can pick one and hack your
    > kernel to smash in a constant value at the right place.
    >
    > Or you can add something like this to your boot line:
    > clocksource=acpi_pm
    > That's assuming your hardware has acpi and whatever.
    >
    > I've been using it for a while. I haven't noticed any quirks,
    > but who knows.
    >
    >


    YES! The slight variation in measured CPU speed seems to explain my
    continued drift instability (where "continued" means even with stepping
    enabled). I was able to retrieve four CPU speed measurements that had
    corresponding NTP loop logs. The table below shows the perfect
    correlation between linux-measured CPU speed and NTP-measured drift.
    Clearly the "real" CPU speed is somewhere around 2000.200 MHz.


    measured CPU speed (MHz) | measured drift (ppm)
    -------------------------+---------------------
                    2000.153 |                  -23
                    2000.215 |                    8
                    2000.321 |                   61
                    2000.367 |                   84
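    These four points fit a straight line almost exactly. A short Python
    least-squares fit (ordinary linear regression, nothing NTP-specific)
    recovers both the slope, which should be close to 1e6 / 2000 = 500 ppm per
    MHz if the drift is purely calibration error, and the speed at which the
    drift would be zero.

```python
SPEED_MHZ = [2000.153, 2000.215, 2000.321, 2000.367]
DRIFT_PPM = [-23, 8, 61, 84]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    slope, intercept = fit_line(SPEED_MHZ, DRIFT_PPM)
    zero_drift_mhz = -intercept / slope
    print(f"slope: {slope:.1f} ppm/MHz, zero drift at {zero_drift_mhz:.3f} MHz")
```

    The fit gives a slope of 500 ppm/MHz and zero drift at about 2000.199 MHz,
    in line with the estimate of "somewhere around 2000.200 MHz" above.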


    As I've stated before, I don't believe the oscillator is really this
    unstable, but I could be wrong. After all, my CPU measurements varied
    much more than yours, especially from one run to the next. However, I'm
    still open to the possibility that linux's approach to speed measurement
    is less than perfect (at least for my version of linux). These
    measurements were on a Core 2 Duo (2 processors) running RedHawk linux
    2.6.18.8. Hal, can you tell me which version of linux resulted in your
    list of speed measurements?

    I also wonder if the use of two processors has any impact on this
    behavior. I tried forcing CPU affinity for the NTP process, but it
    didn't have any effect on the measured drift value. This means that
    either there truly is no difference between CPUs (as in different
    speed/frequency characteristics) or I wasn't actually moving the process
    between CPUs (using /proc//affinity). I'm assuming both CPUs have
    the same oscillator, so it makes sense that they would measure the same
    drift.

    Thanks,
    Andy

  20. Re: Time reset


    >As I've stated before, I don't believe the oscillator is really this
    >unstable, but I could be wrong. After all, my CPU measurements varied
    >much more than yours, especially from one run to the next. However, I'm
    >still open to the possibility that linux's approach to speed measurement
    >is less than perfect (at least for my version of linux). These
    >measurements were on a Core 2 Duo (2 processors) running RedHawk linux
    >2.6.18.8. Hal, can you tell me which version of linux resulted in your
    >list of speed measurements?


    Your crystal is probably fine.

    At one point, I hacked my kernel to call the calibration routine
    several times and print out the answer. A batch of answers from
    the same time (and hopefully same temperature) had the same sort
    of scatter.

    I'm running 2.6.23 with a few local hacks. 2.6.19 has similar problems.



