Frequency Error on Sun4v - NTP


Thread: Frequency Error on Sun4v

  1. Frequency Error on Sun4v

    We are testing Solaris 10 images on a variety of hardware. We discovered a
    problem which seems to only appear on T5120 (Sun4V architecture) and darned
    if I can figure out why.

    Every 7 - 12 minutes I get these log entries
    19 Jun 09:41:41 ntpd[21419]: system event 'event_peer/strat_chg' (0x04)
    status 'sync_alarm, sync_ntp, 15 events,event_clock_reset' (0xc6f5)
    19 Jun 09:41:41 ntpd[21419]: synchronized to 47.xxx.yyy.zzz, stratum 2
    19 Jun 09:41:41 ntpd[21419]: frequency error 512 PPM exceeds tolerance 500
    PPM
    19 Jun 09:41:41 ntpd[21419]: system event 'event_sync_chg' (0x03) status
    'leap_none, sync_ntp, 15 events, event_peer/strat_chg' (0x6f4)
    19 Jun 09:41:41 ntpd[21419]: system event 'event_peer/strat_chg' (0x04)
    status 'leap_none, sync_ntp, 15 events, event_sync_chg' (0x6f3)

    Drift file is pegged at 500.000

    # ntpq -c rv
    assID=0 status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg,
    version="ntpd 4.2.4p4@1.1520-o Wed Jun 18 14:32:44 EDT 2008 (2)",
    processor="sun4v", system="SunOS/5.10", leap=00, stratum=3,
    precision=-21, rootdelay=2.758, rootdispersion=951.915, peer=15662,
    refid=47.xxx.yyy.zzz,
    reftime=cc0515c0.4664ede2 Thu, Jun 19 2008 13:28:32.274, poll=6,
    clock=cc0515e5.eae3cd9e Thu, Jun 19 2008 13:29:09.917, state=4,
    offset=836.968, frequency=500.000, jitter=90.088, noise=217.608,
    stability=83.496, tai=0
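
    The readvar output above can also be checked mechanically. The following
    is a minimal sketch, not part of the original post: it parses the
    name=value pairs from the pasted text and flags a frequency that sits
    at the 500 PPM limit named in the log. The helper name parse_rv and the
    constant FREQ_LIMIT_PPM are made up for illustration.

```python
import re

# Text copied from the `ntpq -c rv` output in the post above.
RV_OUTPUT = """\
assID=0 status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg,
version="ntpd 4.2.4p4@1.1520-o Wed Jun 18 14:32:44 EDT 2008 (2)",
processor="sun4v", system="SunOS/5.10", leap=00, stratum=3,
precision=-21, rootdelay=2.758, rootdispersion=951.915, peer=15662,
refid=47.xxx.yyy.zzz,
reftime=cc0515c0.4664ede2 Thu, Jun 19 2008 13:28:32.274, poll=6,
clock=cc0515e5.eae3cd9e Thu, Jun 19 2008 13:29:09.917, state=4,
offset=836.968, frequency=500.000, jitter=90.088, noise=217.608,
stability=83.496, tai=0
"""

# Hypothetical constant: the tolerance quoted in the ntpd log message.
FREQ_LIMIT_PPM = 500.0


def parse_rv(text: str) -> dict:
    """Collect name=value pairs; a value is either quoted or a bare token."""
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', text)
    return {name: value.strip('"') for name, value in pairs}


rv = parse_rv(RV_OUTPUT)
frequency_ppm = float(rv["frequency"])  # ntpq reports frequency in PPM
offset_ms = float(rv["offset"])         # ntpq reports offset in milliseconds
pegged = abs(frequency_ppm) >= FREQ_LIMIT_PPM

print(f"frequency={frequency_ppm} PPM, offset={offset_ms} ms, pegged={pegged}")
```

    A frequency stuck at the limit together with a large offset, as here,
    suggests the clock error is beyond what ntpd can correct by slewing.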

    # cat ntp.conf
    # Server statements, of higher level strata NTP servers.
    server ntp-ott-1.ca.nortel.com
    server ntp-ott-2.ca.nortel.com
    server ntp-ott-3.ca.nortel.com
    # - Driftfile
    driftfile /etc/inet/ntpd.drift
    # - Logging options
    logconfig all
    logfile /var/adm/ntpd.log
    # Access Control:
    # o Default action: Don't trust anyone for time and modifications.
    restrict default nomodify nopeer
    # o Completely trust your own machine
    restrict 47.aaa.bbb.ccc # Restrict local host
    restrict 127.0.0.1 # Local host loopback


    I would be very appreciative if someone can help me identify the cause of
    the problem and suggest a solution.

    Thanks
    Conan



  2. Re: Frequency Error on Sun4v

    Conan,

    Have you seen http://support.ntp.org/Support and the 'Troubleshooting'
    section there?
    --
    Harlan Stenn
    http://ntpforum.isc.org - be a member!

  3. Re: Frequency Error on Sun4v

    Conan wrote:
    > We are testing Solaris 10 images on a variety of hardware. We discovered a
    > problem which seems to only appear on T5120 (Sun4V architecture) and darned
    > if I can figure out why.
    >
    > Every 7 - 12 minutes I get these log entries
    > 19 Jun 09:41:41 ntpd[21419]: system event 'event_peer/strat_chg' (0x04)
    > status 'sync_alarm, sync_ntp, 15 events,event_clock_reset' (0xc6f5)
    > 19 Jun 09:41:41 ntpd[21419]: synchronized to 47.xxx.yyy.zzz, stratum 2
    > 19 Jun 09:41:41 ntpd[21419]: frequency error 512 PPM exceeds tolerance 500
    > PPM
    > 19 Jun 09:41:41 ntpd[21419]: system event 'event_sync_chg' (0x03) status
    > 'leap_none, sync_ntp, 15 events, event_peer/strat_chg' (0x6f4)
    > 19 Jun 09:41:41 ntpd[21419]: system event 'event_peer/strat_chg' (0x04)
    > status 'leap_none, sync_ntp, 15 events, event_sync_chg' (0x6f3)
    >
    > Drift file is pegged at 500.000
    >
    > # ntpq -c rv
    > assID=0 status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg,
    > version="ntpd 4.2.4p4@1.1520-o Wed Jun 18 14:32:44 EDT 2008 (2)",
    > processor="sun4v", system="SunOS/5.10", leap=00, stratum=3,
    > precision=-21, rootdelay=2.758, rootdispersion=951.915, peer=15662,
    > refid=47.xxx.yyy.zzz,
    > reftime=cc0515c0.4664ede2 Thu, Jun 19 2008 13:28:32.274, poll=6,
    > clock=cc0515e5.eae3cd9e Thu, Jun 19 2008 13:29:09.917, state=4,
    > offset=836.968, frequency=500.000, jitter=90.088, noise=217.608,
    > stability=83.496, tai=0
    >
    > # cat ntp.conf
    > # Server statements, of higher level strata NTP servers.
    > server ntp-ott-1.ca.nortel.com
    > server ntp-ott-2.ca.nortel.com
    > server ntp-ott-3.ca.nortel.com
    > # - Driftfile
    > driftfile /etc/inet/ntpd.drift
    > # - Logging options
    > logconfig all
    > logfile /var/adm/ntpd.log
    > # Access Control:
    > # o Default action: Don't trust anyone for time and modifications.
    > restrict default nomodify nopeer
    > # o Completely trust your own machine
    > restrict 47.aaa.bbb.ccc # Restrict local host
    > restrict 127.0.0.1 # Local host loopback
    >
    >
    > I would be very appreciative if someone can help me identify the cause of
    > the problem and suggest a solution.
    >
    > Thanks
    > Conan
    >
    >


    It's almost certainly a hardware problem. Ntpd is telling you that the
    clock is gaining, or losing, more than about 43 seconds (500 parts per
    million) per day. 500 PPM is the maximum that ntpd can handle.
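
    The "about 43 seconds per day" figure follows directly from the
    definition of parts per million. A short sketch of the arithmetic,
    using the 500 PPM tolerance and the 512 PPM error from the log above
    (the function and variable names are illustrative only):

```python
# Convert a frequency error in parts per million into seconds of drift per day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def drift_seconds_per_day(ppm: float) -> float:
    """Seconds gained or lost per day by a clock that is off by `ppm`."""
    return ppm * 1e-6 * SECONDS_PER_DAY


NTPD_LIMIT_PPM = 500.0  # tolerance quoted in the ntpd log message
REPORTED_PPM = 512.0    # frequency error quoted in the ntpd log message

limit_drift = drift_seconds_per_day(NTPD_LIMIT_PPM)
reported_drift = drift_seconds_per_day(REPORTED_PPM)

print(f"{NTPD_LIMIT_PPM:.0f} PPM -> {limit_drift:.1f} s/day")
print(f"{REPORTED_PPM:.0f} PPM -> {reported_drift:.1f} s/day")
```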

  4. Re: Frequency Error on Sun4v

    [drift > 500 ppm]

    >It's almost certainly a hardware problem. Ntpd is telling you that the
    >clock is gaining, or losing, more than about 43 seconds (500 parts per
    >million) per day. 500 PPM is the maximum that ntpd can handle.


    It could easily be a software screwup.

    Unless you know it works on that particular type of hardware,
    I'd give software equal probability.

    --
    These are my opinions, not necessarily my employer's. I hate spam.


  5. Re: Frequency Error on Sun4v

    Hal Murray wrote:
    > [drift > 500 ppm]
    >
    >> It's almost certainly a hardware problem. Ntpd is telling you that the
    >> clock is gaining, or losing, more than about 43 seconds (500 parts per
    >> million) per day. 500 PPM is the maximum that ntpd can handle.

    >
    > It could easily be a software screwup.
    >
    > Unless you know it works on that particular type of hardware,
    > I'd give software equal probability.
    >



    Do not even think about using NTP on a T1000, T2000, or T5120 until
    you have the latest firmware patch installed (nah-nah, wasn't hardware
    OR software. Or maybe both?). There is a bug in all three that
    causes the firmware to report an incorrect clock frequency to the
    kernel on boot up. Interestingly, the bug in the T1000 and T2000 is
    different from the bug in the T5120. See my blog post at
    http://blogs.sun.com/blu/entry/sprea...um_emi_and_the for an
    explanation of the T5120 issue.
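
    To illustrate how a misreported clock frequency can turn into a large
    apparent drift, here is a rough sketch. Everything numeric in it is an
    assumption made for illustration: the nominal frequency, the modulation
    depth, and the triangular "down-spread" shape are not taken from the
    firmware or from the blog post. The only point is that if the kernel is
    told the nominal frequency while the oscillator actually averages
    something lower, the difference shows up as a constant frequency error.

```python
# Hypothetical figures, chosen only to make the arithmetic visible.
NOMINAL_HZ = 1.0e9   # assumed frequency reported to the kernel
SPREAD_DEPTH = 0.002  # assumed down-spread of 0.2% below nominal
STEPS = 1000          # samples across one modulation period


def triangle_down_spread(nominal_hz: float, depth: float, steps: int) -> list:
    """One period of a triangular sweep: nominal -> nominal*(1-depth) -> nominal."""
    half = steps // 2
    down = [nominal_hz * (1 - depth * i / half) for i in range(half)]
    up = [nominal_hz * (1 - depth * (half - i) / half) for i in range(half)]
    return down + up


samples = triangle_down_spread(NOMINAL_HZ, SPREAD_DEPTH, STEPS)
mean_hz = sum(samples) / len(samples)

# Error seen by a kernel that believes the clock runs at NOMINAL_HZ.
error_ppm = (mean_hz - NOMINAL_HZ) / NOMINAL_HZ * 1e6

print(f"mean frequency: {mean_hz:.0f} Hz")
print(f"apparent frequency error: {error_ppm:.1f} PPM")
```

    Under these assumed numbers the average sits half the spread depth
    below nominal, which is already twice the 500 PPM that ntpd tolerates.
    A modulation centred on the nominal frequency would average out to no
    error in this simple model.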


    Brian Utterback

  6. Re: Frequency Error on Sun4v

    Brian,

    Naughty Sun. Cheaper to wiggle the clock rather than shield the box. I
    can understand the need to do this for the CPU clock, but I thought the
    timer interrupt was driven by a different oscillator. The problem is
    that the phase jitter would interfere with the TSC (or equivalent) timer
    interpolation.

    By reporting the average, there is some averaging interval involved.
    That adds an extra pole to the impulse response and could result in
    ringing/overshoot. If the averaging interval is short, like one second,
    no problem. If much longer, there could be a problem. Sped sprectum
    (sic) is a good idea, and frequency modulation resulting in phase jitter
    should be filtered out by the NTP mitigation and discipline algorithms,
    as long as the resulting phase jitter samples are independent and
    identically distributed.

    Problems with Suns configured with slew-only and relatively large
    offsets have been reported previously. This suggests the Solaris
    adjtime() syscall has been modified from the original Unix design,
    which is a linear slew. In general, people seem to just put up with it.
    This problem should not occur with the kernel ntp_adjtime() call, but it
    is not used when slew-only is configured.
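
    For a sense of scale, a linear slew removes an offset at a constant
    rate, so the time needed is simply offset divided by rate. The sketch
    below assumes a slew rate of 500 PPM, borrowed from the ntpd limit
    mentioned earlier in the thread; the rate actually used by adjtime()
    on a given Solaris release is not stated here. The offset is the
    836.968 ms value from the ntpq output in the first post.

```python
def slew_duration_seconds(offset_s: float, slew_rate_ppm: float) -> float:
    """Time to remove `offset_s` when slewing linearly at `slew_rate_ppm`."""
    return abs(offset_s) / (slew_rate_ppm * 1e-6)


OFFSET_S = 0.836968       # offset=836.968 ms from the first post
ASSUMED_SLEW_PPM = 500.0  # assumption: slew at the ntpd frequency limit

duration = slew_duration_seconds(OFFSET_S, ASSUMED_SLEW_PPM)
print(f"{duration:.0f} s (about {duration / 60:.1f} minutes)")
```

    This assumes the clock has no frequency error of its own. A clock that
    is itself off by more than the slew rate, as in the log above, would
    never converge by slewing alone.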

    Dave

    Brian Utterback wrote:
    > Hal Murray wrote:
    >
    >> [drift > 500 ppm]
    >>
    >>> It's almost certainly a hardware problem. Ntpd is telling you that
    >>> the clock is gaining, or losing, more than about 43 seconds (500
    >>> parts per million) per day. 500 PPM is the maximum that ntpd can
    >>> handle.

    >>
    >>
    >> It could easily be a software screwup.
    >>
    >> Unless you know it works on that particular type of hardware,
    >> I'd give software equal probability.
    >>

    >
    >
    > Do not even think about using NTP on a T1000, T2000, or T5120 until you
    > have the latest firmware patch installed (nah-nah, wasn't hardware OR
    > software. Or maybe both?). There is a bug in all three that causes the
    > firmware to report an incorrect clock frequency to the kernel on boot
    > up. Interestingly, the bug in the T1000 and T2000 is different from the
    > bug in the T5120. See my blog post at
    > http://blogs.sun.com/blu/entry/sprea...um_emi_and_the for an
    > explanation of the T5120 issue.
    >
    >
    > Brian Utterback


  7. Re: Frequency Error on Sun4v

    Brian Utterback wrote:
    > Do not even think about using NTP on a T1000, T2000, or T5120 until
    > you have the latest firmware patch installed (nah-nah, wasn't
    > hardware OR software. Or maybe both?). There is a bug in all three
    > that causes the firmware to report an incorrect clock frequency to
    > the kernel on boot up. Interestingly, the bug in the T1000 and T2000
    > is different from the bug in the T5120. See my blog post at
    > http://blogs.sun.com/blu/entry/sprea...um_emi_and_the for an
    > explanation of the T5120 issue.


    Interesting write-up. Not sure I understand all of it of course, but
    interesting regardless. Since no good deed goes unpunished and since
    I have an insatiable curiosity, I will ask:

    *) is that just the T5120 or is it more generally the T5X20?

    *) is the T5X40 affected?

    sincerely,

    rick jones
    --
    oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  8. Re: Frequency Error on Sun4v

    >
    > Do not even think about using NTP on a T1000, T2000, or T5120 until
    > you have the latest firmware patch installed (nah-nah, wasn't hardware
    > OR software. Or maybe both?). There is a bug in all three that
    > causes the firmware to report an incorrect clock frequency to the
    > kernel on boot up. Interestingly, the bug in the T1000 and T2000 is
    > different from the bug in the T5120. See my blog post at
    > http://blogs.sun.com/blu/entry/sprea...um_emi_and_the for an
    > explanation of the T5120 issue.
    >
    >
    > Brian Utterback


    Thanks Brian.

    Can you provide a firmware rev number where the fix was first introduced?
    Is there a firmware tunable to disable frequency modulation?

    Thanks,

    Fran Horan
    JHU/APL



  9. Re: Frequency Error on Sun4v

    Rick Jones wrote:

    > *) is that just the T5120 or is it more generally the T5X20?
    >
    > *) is the T5X40 affected?


    It will affect any system that has spread-spectrum EMI mitigation. So
    I would guess that yes, they would all be affected, but I don't know
    for sure. I only had access to the one system and was not involved at
    all in the firmware fix.

    Brian Utterback

  10. Re: Frequency Error on Sun4v

    Fran Horan wrote:

    > Thanks Brian.
    >
    > Can you provide a firmware rev number where the fix was first introduced ?
    > Is there a firmware tunable to disable frequency modulation ?
    >
    > Thanks,
    >
    > Fran Horan
    > JHU/APL
    >
    >


    No, I don't know when it was introduced. But since the problem was
    that the spread-spectrum was not being taken into account, for any
    platform that has spread-spectrum support the question is not when it
    was introduced, but when it was fixed. All versions prior to the fixed
    version would have the problem.

    To answer your second question, there is a tunable and you can turn
    the spread-spectrum off, but I do not know what it is. Sorry.

    Brian Utterback
