x86 High Precision Event Timers support - Linux

Thread: x86 High Precision Event Timers support

  1. x86 High Precision Event Timers support

    Hello,

    As far as I understand (which is not very far, please do single out all
    inaccuracies) there is an effort in the x86 world to replace the legacy
    x86 timer infrastructure:

    o The PIT (Programmable Interval Timer) such as Intel's 8253 and 8254
    http://en.wikipedia.org/wiki/Intel_8253
    http://www.intel.com/design/archives.../docs/7178.htm

    o The RTC (Real-Time Clock)

    o The (Local??) APIC timer
    (I didn't find much information on this timer.)

    o The ACPI timer, also known as the PM clock
    (Any pointers?)

    Microsoft provides a rationale for the new infrastructure:
    http://www.microsoft.com/whdc/system/CEC/mm-timer.mspx

    Intel provides a spec:
    http://www.intel.com/hardwaredesign/hpetspec.htm


    As far as I understand, the HPET hardware is provided by the southbridge
    chipset? For example, Intel's ICH5.

    (Would the VIA VT82C686B provide an HPET block?)

    My understanding is that the BIOS is supposed to map the HPET addresses
    in memory, and provide the information through an ACPI table at
    boot-time? If the BIOS does not initialize the HPET hardware, the OS
    remains unaware that it is available.

    http://www.ussg.iu.edu/hypermail/lin...ndex.html#0222
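
    For what it's worth, my reading of that table's layout (ACPI signature
    "HPET"), sketched as a C struct from the spec linked above. I may well
    have some field sizes wrong, so treat it as a sketch rather than a
    reference:

        #include <stdint.h>

        /* ACPI Generic Address Structure, used for the HPET base address. */
        struct acpi_gas {
            uint8_t  address_space_id;     /* 0 = system memory */
            uint8_t  register_bit_width;
            uint8_t  register_bit_offset;
            uint8_t  access_size;          /* reserved in older ACPI revisions */
            uint64_t address;              /* physical base of the HPET block */
        } __attribute__((packed));

        /* The "HPET" description table the BIOS is supposed to publish. */
        struct acpi_hpet_table {
            char     signature[4];         /* "HPET" */
            uint32_t length;
            uint8_t  revision;
            uint8_t  checksum;
            char     oem_id[6];
            char     oem_table_id[8];
            uint32_t oem_revision;
            uint32_t creator_id;
            uint32_t creator_revision;
            uint32_t event_timer_block_id; /* vendor ID, counter size, comparators */
            struct acpi_gas base_address;  /* where the BIOS mapped the timer block */
            uint8_t  hpet_number;
            uint16_t minimum_tick;         /* minimum ticks in periodic mode */
            uint8_t  page_protection;
        } __attribute__((packed));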

    Is there, somewhere, a list of hardware with HPET support?

    Are there implementations that support more than 3 comparators?

    Regards.

  2. Re: x86 High Precision Event Timers support

    In comp.os.linux.development.system Spoon wrote in part:
    > As far as I understand (which is not very far, please do
    > single out all inaccuracies) there is an effort in the x86
    > world to replace the legacy x86 timer infrastructure:


    You forgot the venerable and still extremely precise RDTSC
    instruction available since the original Pentium to read the
    CPU's cycle counter. Typical overhead, 30 clocks vs interrupt
    latency of at least 100 clocks.

    Accuracy still depends on the clock generator. AFAIK,
    nanosleep(), gettimeofday() and friends use RDTSC to
    interpolate other clocks (APIC preferred over the PIT).
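
    Reading the counter is a one-instruction affair. A minimal sketch in
    GCC-style C, assuming an x86 target (RDTSC returns the 64-bit count
    in EDX:EAX):

        #include <stdint.h>

        /* Read the CPU's time-stamp counter. EDX:EAX holds the 64-bit count. */
        static inline uint64_t read_tsc(void)
        {
            uint32_t lo, hi;
            __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
            return ((uint64_t)hi << 32) | lo;
        }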

    -- Robert


  3. Re: x86 High Precision Event Timers support

    On Wed, 21 Jun 2006 13:49:29 GMT, Robert Redelmeier
    wrote:

    >In comp.os.linux.development.system Spoon wrote in part:
    >> As far as I understand (which is not very far, please do
    >> single out all inaccuracies) there is an effort in the x86
    >> world to replace the legacy x86 timer infrastructure:

    >
    >You forgot the venerable and still extremely precise RDTSC
    >instruction available since the original Pentium to read the
    >CPU's cycle counter. Typical overhead, 30 clocks vs interrupt
    >latency of at least 100 clocks.


    RDTSC is nice as long as you stay away from Geode processors, which
    seems to enter the SMM in more or less unpredictable ways. Also any
    processor doing some dynamic clock frequency changes in various power
    saving modes will cause problems.

    >Accuracy still depends on the clock generator. AFAIK,
    >nanosleep(), gettimeofday() and friends use RDTSC to
    >interpolate other clocks (APIC preferred over the PIT).


    The CPU clock frequency is quite temperature dependent. Unless you can
    check the time at least once a day from some reliable source, such as
    the CMOS clock, NTP or some GPS clock, quite significant cumulative
    errors will occur.
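
    (To put a rough number on it: assuming an illustrative 50 ppm frequency
    error, plausible for an uncompensated crystal over temperature, the drift
    is 86400 s/day * 50e-6, i.e. roughly 4.3 seconds per day.)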

    Paul


  4. Re: x86 High Precision Event Timers support

    Paul Keinanen wrote:

    > RDTSC is nice as long as you stay away from Geode processors, which
    > seems to enter the SMM in more or less unpredictable ways. Also any
    > processor doing some dynamic clock frequency changes in various power
    > saving modes will cause problems.


    Recent Intel CPUs run the RDTSC cyclecounter at a fixed frequency,
    regardless of temporary reductions in core frequency. Eventually, I
    suppose AMD will do the right thing too.

    --
    Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

  5. Re: x86 High Precision Event Timers support

    Robert Redelmeier wrote:

    > Spoon wrote:
    >
    >> As far as I understand (which is not very far, please do
    >> single out all inaccuracies) there is an effort in the x86
    >> world to replace the legacy x86 timer infrastructure:

    >
    > You forgot the venerable and still extremely precise RDTSC
    > instruction available since the original Pentium to read the
    > CPU's cycle counter. Typical overhead, 30 clocks vs interrupt
    > latency of at least 100 clocks.


    Which reminds me of Rich Brunner's excellent article:
    http://groups.google.com/group/fa.li...ae85a08ebd3aa4

    > Accuracy still depends on the clock generator. AFAIK,
    > nanosleep(), gettimeofday() and friends use RDTSC to
    > interpolate other clocks (APIC preferred over the PIT).


    I'm playing with the hrtimers infrastructure:
    http://www.tglx.de/hrtimers.html

    I *think* they use HPET, if they find it.
    http://www.tglx.de/projects/hrtimers....16-hrt6.patch


    I'm also wondering: Are there x86-based systems where a card equipped
    with several PITs (e.g. ADLINK's PCI-8554) is a necessity?

    http://www.adlinktech.com/PD/web/PD_detail.php?pid=27

  6. Re: x86 High Precision Event Timers support

    Niels Jørgen Kruse wrote:
    > Paul Keinanen wrote:
    >
    >
    >>RDTSC is nice as long as you stay away from Geode processors, which
    >>seems to enter the SMM in more or less unpredictable ways. Also any
    >>processor doing some dynamic clock frequency changes in various power
    >>saving modes will cause problems.

    >
    >
    > Recent Intel CPUs run the RDTSC cyclecounter at a fixed frequency,
    > regardless of temporary reductions in core frequency. Eventually, I
    > suppose AMD will do the right thing too.
    >

    It would be nice if they get around to supporting a high resolution
    timing interface that doesn't require a syscall, works in an SMP
    environment, and supports virtual timing as well as real wall clock
    timing. It's a known technique and has been around for decades.

    Also, Intel and AMD need to think about how these things virtualize before
    they put these kinds of things in, rather than five years after the fact.
    But that's only important if Intel and AMD think virtualization is an
    important part of their business strategy.

    --
    Joe Seigh

    When you get lemons, you make lemonade.
    When you get hardware, you make software.

  7. Re: x86 High Precision Event Timers support

    Niels Jørgen Kruse wrote:
    > Paul Keinanen wrote:
    >
    >>RDTSC is nice as long as you stay away from Geode processors, which
    >>seems to enter the SMM in more or less unpredictable ways. Also any
    >>processor doing some dynamic clock frequency changes in various power
    >>saving modes will cause problems.

    >
    > Recent Intel CPUs run the RDTSC cyclecounter at a fixed frequency,
    > regardless of temporary reductions in core frequency. Eventually, I
    > suppose AMD will do the right thing too.


    But since an OS or library that provides timing services cannot rely on
    running on a processor where the RDTSC frequency is fixed, this won't
    simplify any such OSes or libraries, until at some point it becomes
    practical to ignore older processors.

    --
    David Hopwood

  8. Re: x86 High Precision Event Timers support

    In comp.os.linux.development.system David Hopwood wrote in part:
    > But since an OS or library that provides timing services cannot
    > rely on running on a processor where the RDTSC frequency is fixed,
    > this won't simplify any such OSes or libraries, until at some
    > point it becomes practical to ignore older processors.


    This depends very much on the software quality requirements.
    Not everything is a big system that will be used for critical
    purposes. Everything is a compromise -- RDTSC is very fast
    and usually good. OS calls are almost always accurate,
    but slower and usually less precise.

    Horses for courses.
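
    If you want to put numbers on that trade-off on your own box, here is a
    rough micro-benchmark sketch (C, assuming GCC on x86; the printed cycle
    counts are whatever your machine reports, nothing more):

        #include <stdio.h>
        #include <stdint.h>
        #include <sys/time.h>

        static inline uint64_t rdtsc(void)
        {
            uint32_t lo, hi;
            __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
            return ((uint64_t)hi << 32) | lo;
        }

        int main(void)
        {
            enum { N = 100000 };
            struct timeval tv;
            uint64_t t0, t1;
            int i;

            /* Average cost of the OS call... */
            t0 = rdtsc();
            for (i = 0; i < N; i++)
                gettimeofday(&tv, NULL);
            t1 = rdtsc();
            printf("gettimeofday: ~%llu cycles/call\n",
                   (unsigned long long)((t1 - t0) / N));

            /* ...versus the raw instruction. */
            t0 = rdtsc();
            for (i = 0; i < N; i++)
                (void)rdtsc();
            t1 = rdtsc();
            printf("rdtsc:        ~%llu cycles/call\n",
                   (unsigned long long)((t1 - t0) / N));
            return 0;
        }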

    -- Robert


  9. Re: x86 High Precision Event Timers support

    David Hopwood writes:

    >But since an OS or library that provides timing services cannot rely on
    >running on a processor where the RDTSC frequency is fixed, this won't
    >simplify any such OSes or libraries, until at some point it becomes
    >practical to ignore older processors.


    Unless the OS makes good. If the OS fixes these things up in the other
    cases (hard, I've tried it), then not having to do this on some system is a
    bonus.

    Casper

  10. Re: x86 High Precision Event Timers support

    David Hopwood wrote:

    > Niels Jørgen Kruse wrote:
    > > Paul Keinanen wrote:
    > >
    > >>RDTSC is nice as long as you stay away from Geode processors, which
    > >>seems to enter the SMM in more or less unpredictable ways. Also any
    > >>processor doing some dynamic clock frequency changes in various power
    > >>saving modes will cause problems.

    > >
    > > Recent Intel CPUs run the RDTSC cyclecounter at a fixed frequency,
    > > regardless of temporary reductions in core frequency. Eventually, I
    > > suppose AMD will do the right thing too.

    >
    > But since an OS or library that provides timing services cannot rely on
    > running on a processor where the RDTSC frequency is fixed, this won't
    > simplify any such OSes or libraries, until at some point it becomes
    > practical to ignore older processors.


    Currently, MacOS X can assume that. Granted, Marklar was started before
    there were fixed frequency RDTSC processors, so there may be some
    workaround still in there.

    --
    Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

  11. Re: x86 High Precision Event Timers support

    Niels Jørgen Kruse wrote:
    > David Hopwood wrote:
    >
    >> Niels Jørgen Kruse wrote:
    >>> Paul Keinanen wrote:
    >>>
    >>>> RDTSC is nice as long as you stay away from Geode processors, which
    >>>> seems to enter the SMM in more or less unpredictable ways. Also any
    >>>> processor doing some dynamic clock frequency changes in various power
    >>>> saving modes will cause problems.
    >>> Recent Intel CPUs run the RDTSC cyclecounter at a fixed frequency,
    >>> regardless of temporary reductions in core frequency. Eventually, I
    >>> suppose AMD will do the right thing too.

    >> But since an OS or library that provides timing services cannot rely on
    >> running on a processor where the RDTSC frequency is fixed, this won't
    >> simplify any such OSes or libraries, until at some point it becomes
    >> practical to ignore older processors.

    >
    > Currently, MacOS X can assume that. Granted, Marklar was started before
    > there were fixed frequency RDTSC processors, so there may be some
    > workaround still in there.


    There are two main problems here:

    a) The TSC might not run at a fixed frequency, but an OS can know when
    the changes happen, and still use it to provide a fast return value: It
    needs a userlevel library routine which just has to take the current TSC
    count, multiply by the current scale factor (producing a triple-width
    result), shift down by the current shift value, and add the current base
    count. Total time for this operation is not much higher than the RDTSC
    opcode which can easily take 20-30 cycles by itself on some cpus.

    Intuitively, you would like to either reset the TSC count or store the
    current value and subtract it out before the multiplication, but the
    subtraction can instead be included in the base value to be added in
    after the scaling multiplication.

    The OS must of course update the base value and the scale factor each
    time the TSC frequency changes, but as long as there's only a small
    number (two?) of base frequencies to support, the needed scale factors
    can be calculated up front, and you might even get away with just a
    shift if the slow frequency is a binary fraction of the high.

    b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    counts to get out of sync, and this is a much harder problem to fix
    while still delivering sub-us precision and latency.

    Windows punts by using the best available external counter, which might
    fall back all the way to the horrible 1.1 MHz keyboard chip/RAM refresh
    counter designed into the original 1981 model PC. :-(
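
    To make (a) concrete, here is a minimal sketch of that conversion in C.
    The struct and names are only illustrative, and the 128-bit intermediate
    is a GCC extension standing in for the triple-width multiply:

        #include <stdint.h>

        /* Per-CPU conversion state; the OS updates this whenever the TSC
           frequency changes. Names are illustrative only. */
        struct tsc_conv {
            uint64_t mult;    /* scale factor for the current TSC frequency */
            unsigned shift;   /* right shift applied after the widening multiply */
            uint64_t base;    /* time base; also folds in the TSC offset */
        };

        /* time = ((tsc * mult) >> shift) + base, with a wide intermediate
           so the multiply cannot overflow. */
        static inline uint64_t tsc_to_time(uint64_t tsc, const struct tsc_conv *c)
        {
            unsigned __int128 wide = (unsigned __int128)tsc * c->mult;
            return (uint64_t)(wide >> c->shift) + c->base;
        }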

    Terje
    --
    -
    "almost all programming can be viewed as an exercise in caching"

  12. Re: x86 High Precision Event Timers support

    Terje Mathisen writes:

    >The OS must of course update the base value and the scale factor each
    >time the TSC frequency changes, but as long as there's only a small
    >number (two?) of base frequencies to support, the needed scale factors
    >can be calculated up front, and you might even get away with just a
    >shift if the slow frequency is a binary fraction of the high.


    The number of frequencies can be higher, actually; a typical AMD CPU
    can only do smallish frequency steps, and that makes for quite a few
    frequencies (four or five on typical systems around here).

    >b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    >counts to get out of sync, and this is a much harder problem to fix
    >while still delivering sub-us precision and latency.


    Yeah, I didn't do multi-CPU/multi-core; the Opteron multi-core CPUs will
    all need to run at the same frequency (though I'm not sure whether setting
    the core voltage/frequency of one half of the chip affects the other
    half at the same time, or whether these actions need to be done in
    lockstep); multi-socket adds additional challenges.

    >Windows punts by using the best available external counter, which might
    >fall back all the way to the horrible 1.1 MHz keyboard chip/RAM refresh
    >counter designed into the original 1981 model PC. :-(


    Ugh.

    Casper
    --
    Expressed in this posting are my opinions. They are in no way related
    to opinions held by my employer, Sun Microsystems.
    Statements on Sun products included here are not gospel and may
    be fiction rather than truth.

  13. Re: x86 High Precision Event Timers support

    Terje Mathisen wrote:
    >
    > b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    > counts to get out of sync, and this is a much harder problem to fix
    > while still delivering sub-us precision and latency.
    >


    Which one of the problems is that?

    --
    Joe Seigh

    When you get lemons, you make lemonade.
    When you get hardware, you make software.

  14. Re: x86 High Precision Event Timers support

    Terje Mathisen wrote:

    > a) The TSC might not run at a fixed frequency, but an OS can know when
    > the changes happen, and still use it to provide a fast return value: It
    > needs a userlevel library routine which just has to take the current TSC
    > count, multiply by the current scale factor (producing a triple-width
    > result), shift down by the current shift value, and add the current base
    > count. Total time for this operation is not much higher than the RDTSC
    > opcode which can easily take 20-30 cycles by itself on some cpus.


    On a Core Duo, the OS X call "mach_absolute_time()" takes ~132 clocks.
    With 3 RDTSCs and the triple-width scaling, I suppose that about fits.

    If the implementation is the general one that doesn't rely on a fixed
    frequency, that could explain why the result is scaled to nanosecond
    resolution. (A companion call to mach_absolute_time provides a fraction
    for scaling, so if you want nanosecond resolution, you end up doing a
    superfluous scaling.) If a fixed frequency were assumed, the raw
    resolution could have been used in the result, saving a scaling
    operation.
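
    (For reference, the scaling looks roughly like this; mach_timebase_info()
    is the companion call that supplies the numer/denom fraction:)

        #include <stdint.h>
        #include <mach/mach_time.h>

        /* Convert mach_absolute_time() ticks to nanoseconds using the
           numer/denom fraction from mach_timebase_info(). Ignores overflow
           for very long uptimes. */
        uint64_t now_ns(void)
        {
            static mach_timebase_info_data_t tb;   /* cached after first call */
            if (tb.denom == 0)
                mach_timebase_info(&tb);
            return mach_absolute_time() * tb.numer / tb.denom;
        }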

    > b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    > counts to get out of sync, and this is a much harder problem to fix
    > while still delivering sub-us precision and latency.


    If Intel could have spared an extra pin, they could have added a proper
    timebase register incrementing asynchronously on an external timebase
    signal. At a modest frequency like 33 MHz, there should be no problem
    distributing a timebase signal to multiple CPUs.

    --
    Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

  15. Re: x86 High Precision Event Timers support

    nospam@ab-katrinedal.dk (Niels Jørgen Kruse) writes:

    >If Intel could have spared an extra pin, they could have added a proper
    >timebase register incrementing asynchronously on an external timebase
    >signal. At a modest frequency like 33 MHz, there should be no problem
    >distributing a timebase signal to multiple CPUs.


    We found that the 10MHz used for this purpose on some SPARC processors
    is actually not fast enough; that's perhaps several hundred clock ticks which
    makes using this for precise accounting difficult.

    Casper
    --
    Expressed in this posting are my opinions. They are in no way related
    to opinions held by my employer, Sun Microsystems.
    Statements on Sun products included here are not gospel and may
    be fiction rather than truth.

  16. Re: x86 High Precision Event Timers support

    Casper H.S. Dik wrote:

    > nospam@ab-katrinedal.dk (Niels Jørgen Kruse) writes:
    >
    >>If Intel could have spared an extra pin, they could have added a proper
    >>timebase register incrementing asynchronously on an external timebase
    >>signal. At a modest frequency like 33 MHz, there should be no problem
    >>distributing a timebase signal to multiple CPUs.

    >
    > We found that the 10MHz used for this purpose on some SPARC processors
    > is actually not fast enough; that's perhaps several hundred clock ticks
    > which makes using this for precise accounting difficult.


    The frontside bus clock should be sufficiently synchronous and identical on
    all CPUs used within one box - at least now. Xeons have a "real" frontside
    bus, and Opterons have a common HyperTransport clock base (200 MHz). This
    frequency scales with processor performance, so it should not be too far
    off. That's 10-15 cycles of resolution on current CPUs, less than the
    rdtsc instruction itself takes.
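
    (At 200 MHz one tick is 5 ns; a 2-3 GHz core executes roughly 10-15
    cycles in that time.)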

    BTW clock skew: Note that for all practical purposes, the only requirement
    for a distributed timer is that no signal distributes faster than the
    timer.

    --
    Bernd Paysan
    "If you want it done right, you have to do it yourself"
    http://www.jwdt.com/~paysan/

  17. Re: x86 High Precision Event Timers support

    In article <1hhe09f.okrznz18byxeaN%nospam@ab-katrinedal.dk>,
    nospam@ab-katrinedal.dk says...
    >If Intel could have spared an extra pin, they could have added a proper
    >timebase register incrementing asynchronously on an external timebase
    >signal. At a modest frequency like 33 MHz, there should be no problem
    >distributing a timebase signal to multiple CPUs.
    >

    And another pin to synchronize (reset) all the counters?

    I think the problem is that the TSC has two definitions: 1) the number of
    clock ticks, and 2) the absolute time that has passed. Unfortunately, the
    TSC is a system-level counter. What I would really want is four different
    counters: two for each thread, and two for the system. When the OS starts
    a new thread, the counters for that thread would be loaded. The two
    counters are one that counts clock ticks (so if the processor clock
    changes, the counter rate changes, which is good for getting (somewhat)
    consistent execution time), and one that follows real-world execution time
    (wall-clock time). The latter, IMO, doesn't need to be completely
    accurate -- say 100 MHz (10 ns).

    It also would be nice if there were compare registers (e.g. MIPS), so that
    external hardware wasn't needed for timeslicing.

    - Tim

    NOT speaking for Unisys.


  18. Re: x86 High Precision Event Timers support

    Casper H.S. Dik writes:
    >the Opteron multi-core CPUs will
    >all need to run at the same frequency (though I'm not sure if setting
    >the core voltage/frequency of one half of the core affects the other
    >half at the same time or that these actions need to be done in
    >lockstep)


    On the Dual-Opteron 270 system we have, the two cores in the same
    socket always have the same voltage and the same frequency, but the
    other two in the other socket can be at a different speed.

    We have seen some instability on that system, maybe related to
    speed-changing (the system sometimes crashed when the load (and thus
    the speed) changed, and this went away when we used a kernel that does
    not change speeds).

    Followups set to comp.arch.

    - anton
    --
    M. Anton Ertl Some things have to be seen to be believed
    anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
    http://www.complang.tuwien.ac.at/anton/home.html

  19. Re: x86 High Precision Event Timers support

    Joe Seigh wrote:
    > Terje Mathisen wrote:
    >>
    >> b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    >> counts to get out of sync, and this is a much harder problem to fix
    >> while still delivering sub-us precision and latency.
    >>

    >
    > Which one of the problems is that?


    Second, as in (b), was my intention. Sorry if I was unclear!
    >


    Terje
    --
    -
    "almost all programming can be viewed as an exercise in caching"

  20. Re: x86 High Precision Event Timers support

    Terje Mathisen wrote:
    > Joe Seigh wrote:
    >
    >> Terje Mathisen wrote:
    >>
    >>>
    >>> b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    >>> counts to get out of sync, and this is a much harder problem to fix
    >>> while still delivering sub-us precision and latency.
    >>>

    >>
    >> Which one of the problems is that?

    >
    >
    > Second, as in (b), was my intention. Sorry if I was unclear!
    >


    Well, I assume you're using something like NTP to keep them "in sync".
    You can't actually keep them in absolute sync, just within a certain
    accuracy with a given precision or certainty. You cannot use separate
    clocks for synchronization like you can with a single clock unless you
    accept that synchronizing with multiple clocks will occasionally fail
    and allow erroneous results.

    Is the "problem" you can't use multiple clocks to synchronize with or
    is it something else?


    --
    Joe Seigh

    When you get lemons, you make lemonade.
    When you get hardware, you make software.
