Significant clock skew in cluster environment - NTP

  1. Significant clock skew in cluster environment

    Hello all.

    I'd appreciate some help. I'm the de facto admin for a small research
    cluster in an academic institution. All hosts are running GNU/Linux
    under a 2.6.22-family kernel. The ntp version in use is 4.2.4 p4. My
    campus runs two ntp servers. My cluster's headnode uses the two campus
    ntp servers as its sources. Internal cluster nodes then use the cluster
    headnode as their (only) ntp time source. The internal cluster nodes
    have no route to the internet, only the headnode does.

    I'm seeing a problem wherein internal cluster nodes develop significant
    clock skew over time. By "significant" I mean up to 700 seconds over
    two weeks of uptime. I am checking this using "ntpq -p" and looking at
    the offset field. The only thing I can think of is that some of the
    machines, including the headnode, are configured to use the Linux
    "ondemand" CPU frequency governor. These processors are older AMD
    Opteron 246/248 chips capable of dynamic frequency management. However,
    I also have nodes with older AMD Athlon processors that do not employ
    dynamic frequency management which also exhibit this phenomenon.

    Additionally, on the headnode I am seeing in the ntpd syslog output
    messages like:

    ntpd[5642]: frequency error 509 PPM exceeds tolerance 500 PPM

    But there are no such log entries on any of the internal nodes.
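
For scale, the figures quoted above can be turned into an average frequency error and compared with the 500 PPM tolerance in the syslog message. This is only a rough sketch: it assumes the 700 seconds built up steadily over the full two weeks with no correction applied, which the post does not confirm, and the function and constant names are illustrative.

```python
# Rough estimate of the average frequency error implied by an observed offset.
# Assumption: the offset accumulated at a constant rate over the whole interval.

SECONDS_PER_DAY = 86_400
NTPD_TOLERANCE_PPM = 500  # tolerance quoted in the ntpd syslog message


def drift_ppm(offset_seconds: float, elapsed_days: float) -> float:
    """Return the average frequency error in parts per million."""
    elapsed_seconds = elapsed_days * SECONDS_PER_DAY
    return offset_seconds / elapsed_seconds * 1_000_000


# Figures from the post: up to 700 seconds over two weeks of uptime.
observed = drift_ppm(700, 14)
print(f"average drift: {observed:.1f} PPM")
print("exceeds ntpd tolerance:", observed > NTPD_TOLERANCE_PPM)
```

Under that assumption, 700 seconds in 1,209,600 seconds is roughly 579 PPM, which is above the 500 PPM tolerance named in the log line.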

    Is there any issue with dynamic processor frequency control negatively
    affecting ntp?

    If this is not it, I can give the basic contents of my ntp.conf files.
    None of these machines are running onboard firewalls, and ntpd is being
    started through the init system.

    ---
    On the head node:

    Two sets of server directives in the form:

    server a.b.c.d iburst
    restrict a.b.c.d nomodify notrap nopeer noquery

    where a.b.c.d is one of the campus ntp servers' IP addresses.
    Thereafter there is:

    restrict default ignore
    restrict 127.0.0.1
    restrict h.i.j.k mask l.m.n.o nomodify nopeer notrap

    where h.i.j.k and l.m.n.o are correctly defined to allow all the
    internal cluster hosts to query this machine
    followed by:

    restrict h.i.j.p mask 255.255.255.255

    where p is the head node's internal cluster IP address.

    ---
    On the internal cluster nodes (all use the identical file):

    server h.i.j.p iburst

    where h.i.j.p is the headnode's IP address
    followed by:

    restrict default ignore
    restrict 127.0.0.1
    restrict h.i.j.p mask 255.255.255.255 nopeer
    ---

    Thanks for any help.
    --

    metallurgist@airpost.net


  2. Re: Significant clock skew in cluster environment


  3. Re: Significant clock skew in cluster environment

    metallurgist@airpost.net wrote:
    >
    > Is there any issue with dynamic processor frequency control negatively
    > affecting ntp?


    The TSC is used to interpolate, in some circumstances, and its
    calibration will be totally confused by variable clock rates.
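
Whether a given node is timing off the TSC at all can be checked from the kernel's clocksource entries in sysfs. The sketch below assumes the kernel exposes `/sys/devices/system/clocksource/clocksource0`; the helper names are illustrative, and the base directory is a parameter so the functions can be tried against a copy of the files.

```python
from pathlib import Path

# Standard sysfs location of the clocksource entries on Linux; assumed to exist.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current clocksource, list of available clocksources)."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def uses_tsc(base=CLOCKSOURCE_DIR):
    """True if the kernel is currently using the TSC as its clocksource."""
    current, _ = read_clocksource(base)
    return current == "tsc"
```

Calling `read_clocksource()` on the headnode and on one internal node would show whether the two groups of machines use the same clocksource.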

    >
    > If this is not it, I can give the basic contents of my ntp.conf files.
    > None of these machines are running onboard firewalls, and ntpd is being
    > started through the init system.


    This is not going to be an ntpd configuration problem.


  4. Re: Significant clock skew in cluster environment

    On Jun 24, 3:34 pm, David Woolley wrote:
    >
    > The TSC is used to interpolate, in some circumstances, and its
    > calibration will be totally confused by variable clock rates.


    That's not an issue anymore in the OP's kernel version.

    HTH
