NTP Sync Issues - NTP


Thread: NTP Sync Issues

  1. NTP Sync Issues

    We have 3 sites and we are experiencing some strange problems at one of
    them. We use NTP to keep the servers in time and this works fine for 2
    of the sites, but at the third site we get these errors in the log:

    ntpd[30062]: synchronized to server, stratum 3
    ntpd[30062]: no servers reachable
    ntpd[30062]: synchronized to server, stratum 3
    ntpd[30062]: time reset +2.119167 s
    ntpd[30062]: synchronized to server, stratum 3

    For some reason the servers at that site keep dropping back between 2
    and 3 seconds behind the other sites. Both the other sites work without
    any problem. We have run a packet capture at a working site and at the
    site with the problems and we don't see any differences other than the
    server frequently becoming unsynchronized. We have checked the main
    firewall and that is not blocking access, and the local firewalls are
    disabled. All our sites are connected via a dedicated link, and I have
    tried connecting to NTP servers at the other sites, but the problem
    persists. It looks like something local keeps changing the time, but I
    can't figure out what.



    The ntp.conf is the same for all sites except that the servers are
    different. I have tried using burst and iburst but that hasn't worked.



    # Permit time synchronization with our time source, but do not
    # permit the source to query or modify the service on this system.
    restrict default kod nomodify notrap nopeer noquery
    restrict -6 default kod nomodify notrap nopeer noquery

    # Permit all access over the loopback interface. This could
    # be tightened as well, but to do so would affect some of
    # the administrative functions.
    restrict 127.0.0.1
    restrict -6 ::1

    server ipAddres iburst
    restrict networkAddress mask networkMask
    server ipAddres iburst
    restrict networkAddress mask networkMask

    driftfile /var/lib/ntp/drift


    Any help would be greatly appreciated.

    Thanks

  2. Re: NTP Sync Issues

    Adam Johnson wrote:
    > We have 3 sites and we are experiencing some strange problems at one of
    > them. We use NTP to keep the servers in time and this works fine for 2
    > of the sites, but at the third site we get these errors in the log:
    >
    > ntpd[30062]: synchronized to server, stratum 3
    > ntpd[30062]: no servers reachable
    > ntpd[30062]: synchronized to server, stratum 3
    > ntpd[30062]: time reset +2.119167 s
    > ntpd[30062]: synchronized to server, stratum 3


    This happens either because of two conflicting time synchronisation
    mechanisms or lost interrupts.

    I guess from the process numbers in the messages that you are running
    some Unix-like system. People who fail to identify such systems are
    usually running Red Hat Linux. Linux (and Windows) are vulnerable to
    losing clock interrupts, especially when using IDE devices in non-DMA
    modes. Red Hat, in particular, tends to set the kernel interrupt rate to
    1000 Hz, which tends to exacerbate this.

    The typical cause on other Unix systems, e.g. SunOS and at least some
    versions of SCO Unix, is software that resets the software clock from
    the real time clock.

    Tickless Linux systems are too new for much experience of failure modes
    to have been gathered.

    What one can reasonably say is that it is an OS or hardware issue, not
    an NTP one.
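
    As a rough back-of-envelope illustration of the lost-interrupt theory
    (a sketch under a simplifying assumption, not a diagnosis): if every
    lost timer interrupt makes the clock fall behind by exactly one tick of
    1/HZ seconds, the step in the log corresponds to the following number
    of lost ticks. The function name ticks_lost is made up for this example.

```python
def ticks_lost(step_seconds: float, hz: int) -> int:
    """Number of timer ticks needed to account for a clock step.

    Assumes each lost interrupt costs the clock exactly one tick
    (1/hz seconds), which is a simplification.
    """
    return round(step_seconds * hz)


if __name__ == "__main__":
    step = 2.119167  # seconds, taken from the "time reset" log line
    for hz in (1000, 100):
        print(f"HZ={hz}: about {ticks_lost(step, hz)} lost ticks")
```

    Under that assumption, a step of a couple of seconds means thousands of
    missed ticks at 1000 Hz between corrections.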

  3. Re: NTP Sync Issues

    Adam Johnson wrote:
    > []
    > The ntp.conf is the same for all sites except the servers are different.
    > I have tried using burst and iburst but that hasnt worked.
    > []
    > server ipAddres iburst
    > restrict networkAddress mask networkMask
    > server ipAddres iburst
    > restrict networkAddress mask networkMask


    If the above accurately describes the REAL configuration, you have
    exactly two upstream servers, which is the worst possible configuration!
    When the two disagree, which one should ntpd believe?

    Four servers are the minimum for a robust configuration. Five, seven,
    and nine are the remaining "magic" numbers. Few sites actually need
    more than four or five upstream servers.

    DO NOT use burst! Burst was a special-purpose hack intended for sites
    that connect to a server by telephone two or three times a day. Iburst
    is good. Burst, except in the special circumstances it was designed for,
    places a heavy and unwarranted load on its servers!
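
    The "which one should ntpd believe?" problem can be illustrated with a
    toy majority vote. This is only a sketch of the idea and NOT ntpd's
    actual selection algorithm; the function name agreeing_majority, the
    server names and the tolerance value are all assumptions made for the
    example.

```python
from itertools import combinations


def agreeing_majority(offsets, tolerance):
    """Return the largest group of servers whose offsets all lie within
    `tolerance` seconds of each other, but only if that group is a strict
    majority of all servers. Otherwise return None.

    Toy stand-in for a clock selection step, not ntpd's real logic.
    """
    names = sorted(offsets)
    for size in range(len(names), 0, -1):
        if 2 * size <= len(names):
            break  # groups this small can no longer be a strict majority
        for group in combinations(names, size):
            values = [offsets[name] for name in group]
            if max(values) - min(values) <= tolerance:
                return list(group)
    return None


if __name__ == "__main__":
    # Hypothetical offsets in seconds; "d" and the second of two servers
    # play the role of the one that is badly wrong.
    print(agreeing_majority({"a": 0.001, "b": 2.1}, 0.128))
    print(agreeing_majority({"a": 0.001, "b": 0.003, "c": -0.002, "d": 2.1}, 0.128))
```

    With two servers that disagree there is no majority to side with, while
    with four servers the three that agree can outvote the odd one out.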

  4. Re: NTP Sync Issues

    Richard B. Gilbert wrote:

    > If the above accurately describes the REAL configuration, you have
    > exactly two upstream servers which is the worst possible configuration!
    > When the two disagree, which one should NTPD believe?
    >
    > Four servers are the minimum for a robust configuration. Five, seven,
    > and nine are the remaining "magic" numbers. Few sites actually need
    > more than four or five upstream servers.
    >
    > DO NOT use burst! Burst was a special purpose hack intended for sites
    > that connect to a server by telephone two or three times a day. Iburst
    > is good. Burst, except in the special circumstances it was designed for
    > places a heavy and unwarranted load on its servers!


    Note that none of these points is relevant to this issue unless you are
    getting both positive and negative steps and at least one of the servers
    is not really synchronised.

    As you say that the time only ever drops back, they do not apply here.

  5. Re: NTP Sync Issues

    Yes, you are right, we are running RHEL5. The strange thing is that when
    we try to sync the servers from a location that normally syncs correctly
    to the location with the issues, we get the same issues as the local
    servers are experiencing. You say that it could be conflicting time
    synchronisation mechanisms. Do you mean that the 2 upstream servers are
    conflicting, or that something other than NTP is causing this? Thank you
    for your help!

    Thanks

    Adam

    -----Original Message-----
    From: questions-bounces+a.johnson=wintoncapital.com@lists.ntp.org
    [mailto:questions-bounces+a.johnson=wintoncapital.com@lists.ntp.org] On
    Behalf Of David Woolley
    Sent: 08 June 2008 10:20
    To: questions@lists.ntp.org
    Subject: Re: NTP Sync Issues

    _______________________________________________
    questions mailing list
    questions@lists.ntp.org
    https://lists.ntp.org/mailman/listinfo/questions

  6. Re: NTP Sync Issues

    Adam Johnson wrote:
    []
    > You say that it could be
    > conflicting time synchronisation mechanisms, do you mean that the 2
    > upstream servers are conflicting or that something other than NTP is
    > causing this? Thank you for your help!
    >
    > Thanks
    >
    > Adam


    Adam,

    Having only two servers prevents NTP from working as designed, as it
    then doesn't have enough information to choose between them. Best to add
    another two or three, to make four or five servers available in total.

    Cheers,
    David



  7. Re: NTP Sync Issues

    Adam Johnson wrote:
    > Yes you are right we are running RHEL5. The strange thing is that when
    > we try to sync the servers from a location that syncs correctly normally
    > to the location with the issues then we get the same issues as the local


    I didn't understand that.

    > servers are experiencing. You say that it could be conflicting time
    > synchronisation mechanisms, do you mean that the 2 upstream servers are
    > conflicting or that something other than NTP is causing this? Thank you
    > for your help!


    I was actually referring to something other than NTP, although that is
    not generally an issue on Red Hat. Two servers only conflict if the
    server configuration is broken although, given the number of people who
    use the local clock without understanding the risks, that's possibly
    not that unlikely.

    In a correctly operating NTP system, you can rely on all servers and the
    client being within 1 second of some concept of true time. For public
    servers, and ones based on radio reference clocks, that time is UTC.

    The fix for this is to choose servers which are traceable to the same
    time source and to have enough independent ones that any rogue one is
    outvoted by the good ones.

    However, the result of having two servers on different times is either
    that both get ignored, or that the time hops backwards and forwards. As
    your time was always hopping backwards, the indications are that
    something other than ntpd is forcibly changing the clock. That would
    give steps at roughly equal intervals.

    More likely on Red Hat, given that your steps are always positive, is
    lost timer interrupts. Lost interrupts tend to be activity related, so
    the interval between steps and size of steps will be more variable. To
    fix that, make sure that IDE drivers use DMA and, if possible, rebuild
    the kernel with HZ set to 100.
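
    One way to tell the two explanations apart is to look at how regular
    the steps are. The following is a minimal sketch, assuming you have
    already extracted a timestamp (in seconds) for each "time reset" line
    from your own syslog; the names parse_step and interval_spread are
    invented for this example, and the regular expression only matches the
    message format shown earlier in this thread.

```python
import re
import statistics

RESET_RE = re.compile(r"ntpd\[\d+\]: time reset ([+-]\d+\.\d+) s")


def parse_step(line):
    """Return the signed step in seconds from a 'time reset' line, else None."""
    match = RESET_RE.search(line)
    return float(match.group(1)) if match else None


def interval_spread(step_times):
    """Coefficient of variation of the gaps between consecutive steps.

    `step_times` are the timestamps, in seconds, of each 'time reset'
    event. Values near 0 mean evenly spaced steps; larger values mean
    the spacing is irregular.
    """
    gaps = [later - earlier for earlier, later in zip(step_times, step_times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three step events")
    return statistics.pstdev(gaps) / statistics.mean(gaps)
```

    A spread close to zero would point towards something resetting the
    clock on a schedule, while a large spread together with varying step
    sizes would fit lost interrupts better.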

  8. Re: NTP Sync Issues - Solved

    We have figured out what the issue was. It turned out that an Altiris
    deployment server was, for some reason, conflicting with NTP and
    adjusting the time on the servers. Thanks to all who responded to my
    question.

    Thanks

    Adam
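
    For anyone chasing a similar problem, a generic way to catch something
    stepping the clock is to compare the wall clock against the monotonic
    clock, which is not affected by steps. This is a minimal sketch and not
    what was used here; find_clock_steps, sample_clocks and the 0.5 second
    threshold are assumptions for illustration.

```python
import time


def find_clock_steps(samples, threshold=0.5):
    """Find jumps of the wall clock relative to the monotonic clock.

    `samples` is an ordered list of (wall, monotonic) readings in seconds.
    Returns (index, jump) pairs where the wall clock moved by more than
    `threshold` seconds beyond what the monotonic clock accounts for.
    """
    steps = []
    for i in range(1, len(samples)):
        wall_delta = samples[i][0] - samples[i - 1][0]
        mono_delta = samples[i][1] - samples[i - 1][1]
        jump = wall_delta - mono_delta
        if abs(jump) > threshold:
            steps.append((i, jump))
    return steps


def sample_clocks(count, interval):
    """Take `count` paired readings, sleeping `interval` seconds between them."""
    readings = []
    for _ in range(count):
        readings.append((time.time(), time.monotonic()))
        time.sleep(interval)
    return readings
```

    This reports every step, including the ones ntpd makes itself, so the
    results need to be compared against the "time reset" lines in the ntpd
    log to find the steps that came from somewhere else.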

    -----Original Message-----
    From: questions-bounces+a.johnson=wintoncapital.com@lists.ntp.org
    [mailto:questions-bounces+a.johnson=wintoncapital.com@lists.ntp.org] On
    Behalf Of David Woolley
    Sent: 10 June 2008 07:48
    To: questions@lists.ntp.org
    Subject: Re: NTP Sync Issues

