Re: high precision tracking: trying to understand sudden jumps - NTP

  1. Re: high precision tracking: trying to understand sudden jumps

    On Sun, 30 Mar 2008, starlight@binnacle.cx wrote:

    > At 04:51 PM 3/30/2008 -0700, Bill Unruh wrote:
    >> Are those on the same day?

    >
    > Yes, same day. Uncorrelated to anything I can identify
    > or each other. Same story on all the boxes. Running
    > a hefty multi-system compile with heavy NFS and Samba
    > traffic does not produce these events, though it disturbs
    > the Windows boxes slightly when CPU goes to 100%.
    >
    >> Which "linux" and which "windows" are those graphs since you
    >> have 2 linux and 2 windows clients.

    >
    > That's the dual-core AMD 2.4GHz Athlon Tyan mobo whitebox
    > running Centos 4.5 SMP kernel. Similar results on the
    > Dell Dimension 2400 2.4GHz Intel P4 running Centos 4.5
    > mono-processor kernel.
    >
    > Windows is a dual-core 3.4GHz Pentium D Tyan mobo whitebox
    > running 2003 R2 SP2 standard server.
    >
    >> As I said, seeing the
    >> peerstats files would be helpful (offset and roundtrip)

    >
    > Might try them later, but I can't believe a high-quality
    > SMC switch is causing multi-millisecond delays. Just not
    > possible. Pings are all about 400 microseconds, consistent
    > but slightly different on each system. Round trip is
    > 800 microseconds. Attaching the output from a bulk 'ntpq -p'
    > 'ntptrace' script I have below. Note that's 'ntptrace'
    > version 4.1 since the 4.2 script has useless offset info.


    I have had weird latencies on some switches here.
    And since all your machines are experiencing this, that switch is the only
    commonality (or the ntp server). Do you have the peerstats on the server as
    well, to make sure that there are not some weird delays there?
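To illustrate why a switch or server delay matters here, the following is a minimal sketch of the standard NTP on-wire offset and delay calculation. The `simulate` helper and all the numbers are hypothetical, chosen to match the 400 microsecond one-way figure quoted above; nothing here is measured data from the thread. It shows that a packet held up by 5 ms in one direction only would appear to the client as roughly a 2.5 ms offset spike, accompanied by a matching jump in the measured delay.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation from the four timestamps.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive (all in seconds).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(out_delay, back_delay, true_offset=0.0, server_hold=0.0):
    """Hypothetical helper: build the four timestamps for one exchange."""
    t1 = 100.0                              # read on the client clock
    t2 = t1 + out_delay + true_offset       # read on the server clock
    t3 = t2 + server_hold                   # read on the server clock
    t4 = t3 - true_offset + back_delay      # read on the client clock
    return ntp_offset_delay(t1, t2, t3, t4)


# Symmetric 400 us each way: no apparent offset, 800 us round trip.
quiet_offset, quiet_delay = simulate(400e-6, 400e-6)

# One packet held up for an extra 5 ms on the return leg only.
spike_offset, spike_delay = simulate(400e-6, 5.4e-3)

print(f"quiet: offset {quiet_offset * 1e6:8.1f} us, delay {quiet_delay * 1e6:8.1f} us")
print(f"spike: offset {spike_offset * 1e6:8.1f} us, delay {spike_delay * 1e6:8.1f} us")
```

Because the apparent offset error is half the asymmetry, an offset spike that coincides with a delay spike in peerstats points at the network path or the server rather than at the client clock.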



    >
    >> Also these graphs seem to have cut off the spikes. Are the
    >> spikes actually higher or is that an illusion?

    >
    > Higher. Sometimes 1ms, sometimes 5-6ms.
    >
    >> (Note the spikes are hundreds of usec, not many msec)

    >
    > That would be the ~1ms example, check out the other one.
    >


    I am also really really really disturbed that you have so many servers. You
    are trying to test out one specific server. The others are simply liable to
    confuse everything. For example ntp could for some bizarre reason, suddenly
    decide to use one of those other sites as the preferred server and give a
    glitch.

    And what are all those CDMA servers? Set your system up with one single
    source, the one you want to test.


    >
    >
    >
    >
    >      remote           refid      st t when poll reach   delay   offset  jitter
    > ==============================================================================
    > Endrun CDMA
    >  LOCAL(0)        LOCAL(0)        10 l   18   64  377    0.000   0.000   0.015
    > *HOPF_S(0)       .CDMA.           0 l    6   16  377    0.000   0.000   0.015
    > Centos 32
    > *eachna          .CDMA.           1 u    3   16  377    0.683  -0.004   0.009
    > -tock.usno.navy. .USNO.           1 u  452 1024  377   20.678   1.432   2.822
    > +navobs1.wustl.e .GPS.            1 u  479 1024  377   50.136  -1.513   0.164
    > +time.nist.gov   .ACTS.           1 u  471 1024  377   66.528  -1.708   0.156
    > -tick.ucla.edu   .GPS.            1 u  432 1024  377   87.372   3.296   0.085
    > Ultra 10
    > *172.29.87.3     .CDMA.           1 u   11   16  377    0.869  -0.016   0.042
    > 172.29.87.15: stratum 2, offset -0.000007, synch distance 0.00783
    > 172.29.87.3: stratum 1, offset -0.000018, synch distance 0.00038, refid 'CDMA'
    > Ultra 80
    > *172.29.87.3     .CDMA.           1 u    4   16  377    0.942  -0.012   0.012
    > 172.29.87.17: stratum 2, offset -0.000038, synch distance 0.00685
    > 172.29.87.3: stratum 1, offset -0.000017, synch distance 0.00038, refid 'CDMA'
    > 44p
    > *172.29.87.3     .CDMA.           1 u   13   16  377    0.809  -0.001   0.016
    > 172.29.87.13: stratum 2, offset -0.000014, synch distance 0.00627
    > 172.29.87.3: stratum 1, offset -0.000018, synch distance 0.00038, refid 'CDMA'
    > Centos 64
    > *172.29.87.3     .CDMA.           1 u   12   16  377    0.664   0.003   0.487
    > 172.29.87.19: stratum 2, offset -0.000009, synch distance 0.00720
    > 172.29.87.3: stratum 1, offset -0.000018, synch distance 0.00038, refid 'CDMA'
    > W2K3 64
    > *172.29.87.3     .CDMA.           1 u    4   16  377    0.734   0.053   0.014
    > 172.29.87.20: stratum 2, offset -0.000060, synch distance 0.00650
    > 172.29.87.3: stratum 1, offset -0.000019, synch distance 0.00038, refid 'CDMA'
    > XP 32 laptop
    > *172.29.87.3     .CDMA.           1 u    7   16  377    0.819   0.468   0.256
    > 172.29.87.12: stratum 2, offset -0.000173, synch distance 0.00655
    > 172.29.87.3: stratum 1, offset -0.000017, synch distance 0.00038, refid 'CDMA'
    >


    --
    William G. Unruh | Canadian Institute for| Tel: +1(604)822-3273
    Physics&Astronomy | Advanced Research | Fax: +1(604)822-5324
    UBC, Vancouver,BC | Program in Cosmology | unruh@physics.ubc.ca
    Canada V6T 1Z1 | and Gravity | www.theory.physics.ubc.ca/

  2. Re: high precision tracking: trying to understand sudden jumps

    Bill Unruh wrote:
    > On Sun, 30 Mar 2008, starlight@binnacle.cx wrote:
    >


    You appear to be quoting an off list reply with no indication of
    permission, although it is just possible that the email gateway
    forwarded it to email subscribers without forwarding it to the usenet
    group proper.

    Incidentally, what he's done is to run together the peers information
    from many machines, so there is only one CDMA source. On the other
    hand, it doesn't look like it is a CDMA appliance, or if it is, it has
    been badly implemented, as I would not expect to see a local clock
    driver on an appliance device.

    The delays are rather large for the paragon of perfection of a network
    that was described.

    He probably needs to be aware that normal applications on the Windows
    boxes will see times with a resolution that is rather poorer than can be
    seen by ntptrace, as ntptrace takes advantage of the ntpd tick
    interpolation, but normal applications will see times with a resolution
    of one clock tick.
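As a rough way to see the resolution an ordinary application gets, the sketch below compares the resolution Python advertises for its wall clock with the smallest step actually observed between successive readings. The helper name `observed_tick` is made up for this example, and the result depends heavily on the operating system and Python version, so current machines may show far finer steps than the Windows boxes discussed here.

```python
import time


def observed_tick(samples=50):
    """Smallest positive step seen between successive time.time() readings."""
    smallest = float("inf")
    for _ in range(samples):
        start = time.time()
        now = time.time()
        while now <= start:          # spin until the reported time advances
            now = time.time()
        smallest = min(smallest, now - start)
    return smallest


info = time.get_clock_info("time")
print("advertised resolution:", info.resolution, "s")
print("smallest observed step:", observed_tick(), "s")
```

If the observed step is on the order of a millisecond or more, offsets reported in microseconds by ntpq or ntptrace describe what ntpd can resolve, not what other applications on that box will see.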

  3. Re: high precision tracking: trying to understand sudden jumps

    On 2008-03-31, David Woolley wrote:

    > Bill Unruh wrote:
    >
    >> On Sun, 30 Mar 2008, starlight@binnacle.cx wrote:

    >
    > You appear to be quoting an off list reply with no indication of
    > permission, although it is just possible that the email gateway
    > forwarded it to email subscribers without forwarding it to the usenet
    > group proper.


    What you are suggesting is not possible.

    The Usenet news-group is just another subscriber to the questions list.

    --
    Steve Kostecke
    NTP Public Services Project - http://support.ntp.org/

  4. Re: high precision tracking: trying to understand sudden jumps

    Steve Kostecke wrote:
    > On 2008-03-31, David Woolley wrote:
    >
    >> Bill Unruh wrote:
    >>
    >>> On Sun, 30 Mar 2008, starlight@binnacle.cx wrote:

    >> You appear to be quoting an off list reply with no indication of
    >> permission, although it is just possible that the email gateway
    >> forwarded it to email subscribers without forwarding it to the usenet
    >> group proper.

    >
    > What you are suggesting is not possible.
    >
    > The Usenet news-group is just another subscriber to the questions list.


    It's certainly very possible that the missing article was private email
    only, although possibly by mistake. The mailing list doesn't seem to be
    a simple subscriber, as an example quoted before showed no sign of
    attachments in the usenet version, but the mail archive version that I
    was pointed to mentioned that attachments (a PGP signature) had been
    suppressed.

    I assume you mean the usenet gateway is a subscriber, as usenet groups
    can't subscribe to mailing lists on their own. In that case, it is at
    least theoretically possible that the gateway suppresses the message on
    the usenet side, but if it is an ordinary subscriber on the mailing list
    side, the message will still go to other mailing list subscribers. One
    obvious case in which this would happen is if there was a duplicate
    message ID.

    I haven't checked the mail archives, but I did check Google groups, and
    it hasn't seen the missing message.


  5. Re: high precision tracking: trying to understand sudden jumps

    David Woolley schrieb:
    > Bill Unruh wrote:
    >> On Sun, 30 Mar 2008, starlight@binnacle.cx wrote:
    >>

    >
    > You appear to be quoting an off list reply with no indication of
    > permission, although it is just possible that the email gateway
    > forwarded it to email subscribers without forwarding it to the usenet
    > group proper.
    >
    > Incidentally, what he's done is to run together the peers information
    > from many machines, so there is only one CDMA source. On the other
    > hand, it doesn't look like it is a CDMA appliance, or if it is, it has
    > been badly implemented, as I would not expect to see a local clock
    > driver on an appliance device.


    We have that in our NTP appliances as well. You can configure it to any stratum
    level you want and it is used as a last resort fallback in case the receiver
    lost reception and the (also configurable) so-called trust time has passed
    without the signal coming back. This results in the time server replying with
    stratum 12 (for example) after a while and ensures that everybody has the same
    time, although it might be wrong. If a user does not want that, they can simply
    set the local clock stratum to 15 and the server will not be accepted anymore.

    Can you please let me know why you consider this a "bad implementation"?

    Regards,
    Heiko

    >[...]


  6. Re: high precision tracking: trying to understand sudden jumps

    David Woolley writes:

    >Bill Unruh wrote:
    >> On Sun, 30 Mar 2008, starlight@binnacle.cx wrote:
    >>


    >You appear to be quoting an off list reply with no indication of
    >permission, although it is just possible that the email gateway
    >forwarded it to email subscribers without forwarding it to the usenet
    >group proper.


    He went off list because he was banned from the list for a day because of
    his attempts to post graphs online. (He did not realise, as he said in his
    post, that he could not post graphs online. He does now.)


    >Incidentally, what he's done is to run together the peers information
    >from many machines, so there is only one CDMA source. On the other
    >hand, it doesn't look like it is a CDMA appliance, or if it is, it has
    >been badly implemented, as I would not expect to see a local clock
    >driver on an appliance device.


    Ah, perhaps. Even then his list looks weird.


    >The delays are rather large for the paragon of perfection of a network
    >that was described.


    Yes, that was one reason I wanted to see his peerstats file as well.
    loopstats has gone through the clock_filter and the selection algorithm and
    gives a poor representation of what is actually on the net. But at 16sec
    poll it is a pretty large file for one day. But he can graph it.
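For a file that size, a short script can pick out the interesting samples before graphing. The sketch below assumes the usual ntpd peerstats layout (MJD, seconds past midnight, peer address, status word, offset, delay, dispersion, jitter) and flags samples whose delay is well above the median. The function names, the threshold factor, and the sample lines are all made up for illustration; they are not data from this thread.

```python
from statistics import median


def parse_peerstats(lines):
    """Yield (mjd, seconds, peer, offset, delay) from peerstats lines.

    Assumes the usual ntpd layout: MJD, seconds past midnight, peer
    address, status word, offset, delay, dispersion, jitter.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue                       # skip blank or truncated lines
        yield (int(fields[0]), float(fields[1]), fields[2],
               float(fields[4]), float(fields[5]))


def find_spikes(samples, factor=3.0):
    """Return samples whose delay exceeds `factor` times the median delay."""
    samples = list(samples)
    if not samples:
        return []
    typical = median(s[4] for s in samples)
    return [s for s in samples if s[4] > factor * typical]


# Made-up sample data for illustration only, not taken from the thread.
example = """\
54555 100.000 172.29.87.3 9614 -0.000004 0.000683 0.000038 0.000009
54555 116.000 172.29.87.3 9614 -0.000006 0.000690 0.000038 0.000010
54555 132.000 172.29.87.3 9614 0.002510 0.005820 0.000038 0.001200
54555 148.000 172.29.87.3 9614 -0.000005 0.000679 0.000038 0.000011
""".splitlines()

for mjd, sec, peer, offset, delay in find_spikes(parse_peerstats(example)):
    print(f"{mjd} {sec:9.3f} {peer}: delay {delay * 1e3:.3f} ms, "
          f"offset {offset * 1e3:+.3f} ms")
```

Running the same filter over the server's own peerstats for the same day would show whether the spikes line up in time across machines.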



    >He probably needs to be aware that normal applications on the Windows
    >boxes will see times with a resolution that is rather poorer than can be
    >seen by ntptrace, as ntptrace takes advantage of the ntpd tick
    >interpolation, but normal applications will see times with a resolution
    >of one clock tick.


  7. Re: high precision tracking: trying to understand sudden jumps

    Heiko Gerstung wrote:

    > time has passed without the signal coming back. This results in the time
    > server replying with stratum 12 (for example) after a while and ensures
    > that everybody has the same time, although it might be wrong. If a user
    > does not want that, they can simply set the local clock stratum to 15
    > and the server will not be accepted anymore.
    >
    > Can you please let me know why you consider this a "bad implementation"?


    Because the protocol fails to signal the loss of the time source
    properly when one has a local clock configured. As such, I believe that
    enabling a local clock should always be an opt in choice. Basically,
    when it falls back to the local clock, root dispersion goes to zero,
    when the true situation is that root dispersion is growing without bound.

    Things can go seriously wrong if there is more than one local clock
    source on a network, as it becomes possible for them to outvote the real
    time.

  8. Re: high precision tracking: trying to understand sudden jumps

    David Woolley wrote:
    > Heiko Gerstung wrote:
    >
    >> time has passed without the signal coming back. This results in the
    >> time server replying with stratum 12 (for example) after a while and
    >> ensures that everybody has the same time, although it might be wrong.
    >> If a user does not want that, they can simply set the local clock
    >> stratum to 15 and the server will not be accepted anymore.
    >>
    >> Can you please let me know why you consider this a "bad implementation"?

    >
    >
    > Because the protocol fails to signal the loss of the time source
    > properly when one has a local clock configured. As such, I believe that
    > enabling a local clock should always be an opt in choice. Basically,
    > when it falls back to the local clock, root dispersion goes to zero,
    > when the true situation is that root dispersion is growing without bound.
    >
    > Things can go seriously wrong if there is more than one local clock
    > source on a network, as it becomes possible for them to outvote the real
    > time.


    Local clock IS an opt in choice. If you don't configure it, it doesn't
    serve time. Stratum is taken into account in selecting a time source.
    I can't swear to it but I'd be surprised if three stratum 10 servers
    could out vote one stratum 2 server.


  9. Re: high precision tracking: trying to understand sudden jumps

    Richard B. Gilbert wrote:
    > Stratum is taken into account in selecting a time source.
    > I can't swear to it but I'd be surprised if three stratum 10 servers
    > could out vote one stratum 2 server.
    >

    At least for RFC1305, stratum is not considered (except inasmuch as
    refid is not checked for stratum 1) until after the intersection
    algorithm has removed false tickers.

    I believe there have been cases on the newsgroup in which people have
    peered systems using the local clock and then had the system form a
    clique which rejects real sources of time.

    The main thing that would probably mitigate this if the local
    clocks were in appliances is that the local clock gets a falsely narrow
    error tolerance band. With the peering configuration, the bands
    overlap, but with multiple appliances, they would have to drift at very
    similar rates to stay compatible. The narrow tolerance is why local
    clocks that agree make it particularly difficult for a good clock to be
    accepted, as the good clock has to be very close to the clique to not be
    rejected.
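The clique effect can be sketched with the classic Marzullo interval intersection, which finds the region covered by the largest number of sources. This is a simplified stand-in and not ntpd's actual selection code, which applies further conditions. The source names, offsets, and error bounds are hypothetical: three free-running local clocks that have drifted together and claim a 1 ms error band, against one real source with an honest 10 ms band.

```python
def marzullo(intervals):
    """Classic Marzullo: return (count, lo, hi) for the region covered by
    the largest number of [lo, hi] intervals."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))   # -1 marks an interval start
        edges.append((hi, +1))   # +1 marks an interval end
    edges.sort()                 # starts sort before ends at equal offsets

    best = count = 0
    best_lo = best_hi = None
    for i, (offset, kind) in enumerate(edges):
        count -= kind            # a start raises the count, an end lowers it
        if count > best:
            best = count
            best_lo = offset
            best_hi = edges[i + 1][0]
    return best, best_lo, best_hi


# Hypothetical sources as (name, offset, error_bound), in seconds.
sources = [
    ("local-a", 0.2500, 0.001),  # free-running local clocks that have
    ("local-b", 0.2505, 0.001),  # drifted together, each claiming a
    ("local-c", 0.2495, 0.001),  # falsely narrow error bound
    ("gps",     0.0000, 0.010),  # the only real source of time
]

intervals = [(off - err, off + err) for _, off, err in sources]
count, lo, hi = marzullo(intervals)

# Sources whose interval overlaps the winning region survive.
survivors = [name for name, off, err in sources
             if off - err <= hi and off + err >= lo]

print(f"{count} sources agree on [{lo:.4f}, {hi:.4f}]")
print("survivors:", survivors)
```

With these numbers the three local clocks form the majority and the real source is the one discarded. For it to survive, its interval would have to reach the narrow region the clique agrees on, which is the difficulty described above.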
