x86 High Precision Event Timers support - Embedded


Thread: x86 High Precision Event Timers support

  1. Re: x86 High Precision Event Timers support

    On Fri, 23 Jun 2006 09:54:35 +0200, Terje Mathisen
    wrote:

    >
    >b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    >counts to get out of sync, and this is a much harder problem to fix
    >while still delivering sub-us precision and latency.


    At least with earlier dual-processor boards, the problem was that the
    resets were not performed at exactly the same time. So even if each
    processor was clocked from the same clock source, you could get
    inconsistent timing if the RDTSC instruction was sometimes executed
    on processor 1 and sometimes on processor 2.

    If you can figure out which processor is executing the RDTSC
    instruction, this is not a problem. In Windows NT you could set the
    thread affinity to a specific processor and execute all RDTSC
    instructions in that thread to get consistent timing.
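    On Linux, the same pinning idea might look roughly like this (a sketch, not NT code: sched_setaffinity is the Linux counterpart of the NT thread-affinity call described above, and __rdtsc is the GCC/Clang intrinsic):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <x86intrin.h>

/* Pin the calling thread to CPU 0 so that every subsequent RDTSC reads the
 * same core's TSC; returns 0 on success. This is the Linux equivalent of
 * setting thread affinity on Windows NT. */
int pin_to_cpu0(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Usage: pin first, then bracket the timed region with __rdtsc();
 * both reads now come from the same core's counter. */
uint64_t time_region_cycles(void (*fn)(void))
{
    uint64_t t0 = __rdtsc();
    fn();
    return __rdtsc() - t0;
}
```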

    >Windows punts by using the best available external counter, which might
    >fall back all the way to the horrible 1.1 MHz keyboard chip/RAM refresh
    >counter designed into the original 1981 model PC. :-(


    The 1.19 MHz counter is used only by the QueryPerformanceCounter
    service on single-processor systems. On multiprocessor systems, this
    service returns the TSC count.

    Paul


  2. Re: x86 High Precision Event Timers support

    Paul Keinanen wrote:
    > On Fri, 23 Jun 2006 09:54:35 +0200, Terje Mathisen
    > wrote:
    >
    >
    >>b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    >>counts to get out of sync, and this is a much harder problem to fix
    >>while still delivering sub-us precision and latency.

    >
    >
    > At least with previous dual processor boards, the problem was that the
    > reset was not performed exactly at the same time. So even if each
    > processor was clocked from the same clock source, you could get
    > inconsistent timing, if the RDTSC instruction was sometimes executed
    > on processor 1, while in some cases it was executed on processor 2.
    >
    > If you can figure out which processor is executing the RDTSC
    > instruction, this is not a problem. In Windows NT you could set the
    > thread affinity to a specific processor and execute all RDTSC
    > instructions in that thread to get consistent timing.
    >
    >

    You use per processor TSC scaling factors and offset corrections. You
    need a count of per thread context switches which you read before and
    after reading the cpuid, TSC, scale, and offset. If the before and after
    context switch counts match then everything you've read is for the
    same processor. This is ancient mainframe technology. I don't know
    if Linux has discovered it yet though. Maybe it's still a problem.
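    A rough user-space sketch of the scheme above (all names and the OS-maintained values here are hypothetical stand-ins; a real kernel would export the per-thread context-switch count and per-CPU calibration through a user-mapped page):

```c
#include <stdint.h>

/* Hypothetical per-CPU calibration data: TSC scale (as a ratio) and offset. */
struct tsc_calib { uint64_t scale_num, scale_den; int64_t offset_ns; };

/* Stand-ins for values a kernel would maintain and map read-only into
 * user space. */
static volatile uint64_t ctx_switch_count;       /* per-thread switch count */
static struct tsc_calib calib[2] = { {1, 1, 0}, {1, 1, 100} };
static int current_cpu;
static uint64_t fake_tsc;

static uint64_t read_tsc(void) { return fake_tsc; }

/* Seigh's scheme: snapshot the context-switch count, read CPU id + TSC +
 * per-CPU scale/offset, snapshot again. If the two counts match, every
 * read happened on the same processor and the corrected value is valid;
 * otherwise the thread migrated mid-read, so retry. */
uint64_t corrected_time_ns(void)
{
    for (;;) {
        uint64_t before = ctx_switch_count;
        int cpu = current_cpu;
        uint64_t tsc = read_tsc();
        struct tsc_calib c = calib[cpu];
        uint64_t after = ctx_switch_count;
        if (before == after)
            return tsc * c.scale_num / c.scale_den + (uint64_t)c.offset_ns;
        /* migrated between reads: retry */
    }
}
```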


    --
    Joe Seigh

    When you get lemons, you make lemonade.
    When you get hardware, you make software.

  3. Re: x86 High Precision Event Timers support

    Terje Mathisen wrote:
    >
    >b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    >counts to get out of sync, and this is a much harder problem to fix
    >while still delivering sub-us precision and latency.


    I dispute that. Once the chips come out of reset, they're both being fed
    by the same clock signal. I don't believe it is possible for them to drift
    out of sync.

    The old Windows NT used to synchronize the TSCs on multiprocessor systems
    (the TSC is a writable register), but Windows XP does not seem to do that,
    so the TSCs can start out small integer millions of cycles apart. But,
    once that delta is set, the delta should stay constant.

    >Windows punts by using the best available external counter, which might
    >fall back all the way to the horrible 1.1 MHz keyboard chip/RAM refresh
    >counter designed into the original 1981 model PC. :-(


    Actually, Windows makes one of two choices. On a uniprocessor machine, it
    chooses the motherboard timer chip. That used to be 1.193 MHz, as you say,
    although XP now runs it at 3x that frequency. But on a multiprocessor
    machine, it uses the cycle counter.
    --
    - Tim Roberts, timr@probo.com
    Providenza & Boekelheide, Inc.

  4. Re: x86 High Precision Event Timers support

    On Fri, 23 Jun 2006 21:06:09 -0400, Joe Seigh
    wrote:

    >You use per processor TSC scaling factors and offset corrections. You
    >need a count of per thread context switches which you read before and
    >after reading the cpuid, TSC, scale, and offset. If the before and after
    >context switch counts match then everything you've read is for the
    >same processor. This is ancient mainframe technology. I don't know
    >if Linux has discovered it yet though. Maybe it's still a problem.


    The problem is that these context switch counters are, in most
    operating systems, kept in the kernel-mode address space only, thus
    requiring a switch from user mode to kernel mode to read the thread
    context switch count. This requires validation of parameters and other
    time-consuming work.

    Paul


  5. Re: x86 High Precision Event Timers support

    Paul Keinanen wrote:
    > On Fri, 23 Jun 2006 21:06:09 -0400, Joe Seigh
    > wrote:
    >
    >
    >>You use per processor TSC scaling factors and offset corrections. You
    >>need a count of per thread context switches which you read before and
    >>after reading the cpuid, TSC, scale, and offset. If the before and after
    >>context switch counts match then everything you've read is for the
    >>same processor. This is ancient mainframe technology. I don't know
    >>if Linux has discovered it yet though. Maybe it's still a problem.

    >
    >
    > The problem is that this context switch counters are in most operating
    > system in the kernel mode address space only, thus requiring a switch
    > from user mode to kernel mode to read the thread context switch count.
    > This requires validation of parameters etc. and other time consuming
    > things.
    >


    There's no technical reason this information can't be in user space
    as well. There may be a problem, but it's certainly not technical
    in nature.


    --
    Joe Seigh

    When you get lemons, you make lemonade.
    When you get hardware, you make software.

  6. Re: x86 High Precision Event Timers support

    Joe Seigh wrote:
    > Well, I assume you're using something like NTP to keep them "in sync".


    Stock NTP is only useful at the 10-100 us level unless you have a Pulse
    Per Second (PPS) source available to every system. Getting into the ns
    domain requires much more heroic efforts, i.e. stuff like replacing the
    motherboard crystal with a Rb or Cs atomic clock, and then phase-locking
    this setup to UTC with a timing-optimized GPS like the now discontinued
    Motorola Oncore UT+.

    Even though this gets you a system clock with maybe 10-15 ns RMS offset
    from true UTC, you still need a relatively slow syscall to get at it,
    unless the OS itself responds to such requests with a user-level library
    function that uses RDTSC to extrapolate from the last system clock update.
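    A minimal sketch of such an extrapolation, assuming a hypothetical kernel-published snapshot (this is roughly what a vDSO-style user-level clock does):

```c
#include <stdint.h>

/* Hypothetical snapshot the kernel republishes at each system clock
 * update; user code reads it without a syscall. */
struct clock_snapshot {
    uint64_t base_ns;           /* system time at last update, ns        */
    uint64_t base_tsc;          /* TSC value captured at that update     */
    uint64_t ns_per_tick_x1e6;  /* TSC period in ns, scaled by 10^6      */
};

/* Extrapolate the current time from the last kernel update using only a
 * TSC read and a little arithmetic: no mode switch required. */
uint64_t now_ns(const struct clock_snapshot *s, uint64_t tsc_now)
{
    uint64_t delta = tsc_now - s->base_tsc;
    return s->base_ns + delta * s->ns_per_tick_x1e6 / 1000000u;
}
```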

    > You can't actually keep them in absolute sync, just within a certain
    > accuracy with a given precision or certainty. You cannot use separate
    > clocks for synchronization like you can with a single clock unless you
    > accept that synchronizing with multiple clocks will occasionally fail
    > and allow erroneous results.
    >
    > Is the "problem" you can't use multiple clocks to synchronize with or
    > is it something else?


    Right. You really want the fastest/cheapest possible timing source,
    which means TSC on x86 cpus, which also means independent clocks in each
    cpu/core.

    If the OS can present the illusion of 'a single shared TSC counter', and
    do it well enough that no user-level program ever notices, then it would
    be a Good Thing (TM).

    Terje
    --
    -
    "almost all programming can be viewed as an exercise in caching"

  7. Re: x86 High Precision Event Timers support

    Tim Roberts wrote:
    > Terje Mathisen wrote:
    >> b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    >> counts to get out of sync, and this is a much harder problem to fix
    >> while still delivering sub-us precision and latency.

    >
    > I dispute that. Once the chips come out of reset, they're both being fed
    > by the same clock signal. I don't believe it is possible for them to drift
    > out of sync.


    Did you read the AMD paper that someone posted a link to a couple of
    days ago?

    It goes into specifics describing exactly how this can happen as a
    result of frequency throttling, something that can happen independently
    on each cpu/core.

    Terje
    --
    -
    "almost all programming can be viewed as an exercise in caching"

  8. Re: x86 High Precision Event Timers support

    Robert Redelmeier wrote:
    > In comp.os.linux.development.system David Hopwood wrote in part:
    >
    >>But since an OS or library that provides timing services cannot
    >>rely on running on a processor where the RDTSC frequency is fixed,
    >>this won't simplify any such OSes or libraries, until at some
    >>point it becomes practical to ignore older processors.

    >
    > This depends very much on the software quality requirements.


    Yes, a lot of software is of very poor quality ;-)

    --
    David Hopwood

  9. Re: x86 High Precision Event Timers support

    Casper H.S. Dik wrote:
    >Terje Mathisen writes:
    >>b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    >>counts to get out of sync, and this is a much harder problem to fix
    >>while still delivering sub-us precision and latency.

    >
    >yeah, I didn't do multi-cpu/multi-core; the Opteron multi-core CPUs will
    >all need to run at the same frequency (though I'm not sure if setting
    >the core voltage/frequency of one half of the core affects the other
    >half at the same time or that these actions need to be done in
    >lockstep); multi-socket adds additional challenges.


    As I understand it, this is true of the Opteron but NOT the AMD
    Athlon64 X2: the X2 allows the cores to be controlled individually
    (presumably to reduce power usage on desktops).

    At least that's the explanation I've heard for why you can see some
    really funky effects in Windows with X2's (but not Opterons!) unless
    you install a new enough "AMD Athlon 64 X2 Dual Core Processor Driver"
    (available from AMD but not Windows Update, for some reason).

    Linux also had problems with X2's and Cool'n'Quiet (CnQ) for a while,
    probably because a lot of this was tested on Opterons... IIRC people
    produced test programs which showed that this effect was real before
    the patch was accepted.

    There's some rumors that a future X2 revision is going to run with the
    same TSC for all cores in a physical package/socket (they should have
    plenty of stable clocks to run it off, perhaps the HT clock).

  10. Re: x86 High Precision Event Timers support

    Terje Mathisen wrote:
    > Joe Seigh wrote:
    >>
    >> Is the "problem" you can't use multiple clocks to synchronize with or
    >> is it something else?

    >
    >
    > Right. You really want the fastest/cheapest possible timing source,
    > which means TSC on x86 cpus, which also means independent clocks in each
    > cpu/core.
    >
    > If the OS can present the illusion of 'a single shared TSC counter', and
    > do it well enough that no user-level program ever notices, then it would
    > be a Good Thing (TM).
    >


    You shouldn't use multiple clocks as a synchronization mechanism or arbiter.
    If you do, either you will get occasional errors or you will incur additional
    overhead as the get-time code performs synchronization that the user could
    have done more efficiently. E.g. a dedicated getticket() function can be
    implemented far more efficiently than gettimeofday() pressed into service
    as a ticket generator could ever be.
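    As a sketch of the point (names here are illustrative): a dedicated ticket dispenser needs only one atomic increment, whereas a clock used as a ticket source must provide the same serialization plus a full time read on every call.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One atomic increment, no clock read, no syscall. Using gettimeofday()
 * to mint unique, ordered "tickets" forces the timer path to do this
 * serialization for you at far higher cost. */
static atomic_uint_fast64_t next_ticket;

uint64_t getticket(void)
{
    /* atomic_fetch_add returns the pre-increment value, so tickets are
     * issued 0, 1, 2, ... with no duplicates even under contention. */
    return atomic_fetch_add(&next_ticket, 1);
}
```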

    Even hardware-based solutions such as the IBM mainframe TOD clock
    couldn't guarantee synchronization when multiple hardware clocks were
    present. The architecture guaranteed it, but the hardware could not.
    For the 64-bit TOD clock, the hardware checked on the bit 32 carry-out
    whether the clocks were in sync and, if not, queued an external
    interrupt. This happened approximately once per second, so if the
    clock drift was bad enough, the clocks could get far enough out of
    sync within that interval for erroneous computation to occur.


    --
    Joe Seigh

    When you get lemons, you make lemonade.
    When you get hardware, you make software.

  11. Re: x86 High Precision Event Timers support

    Concerning this, I've always wondered why desktop and server CPUs like
    the x86s don't generally support a single timer that counts down to zero
    from a given value at core frequency (on CPUs with clock-stepping, at
    the highest frequency, of course) and then generates an interrupt.
    Based on this, the CPU could do everything from scheduling to multimedia
    timers: items being scheduled are dispatched to a queue, and the
    interrupt is only generated for the first item in the queue.
    Schedulers, for example, could be improved by such a timer: based on a
    minimum scheduling frequency, this frequency could be raised if the
    number of processes or threads rises above a certain limit. Or
    scheduling slices could have different lengths based on the priority of
    the thread or process.
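    The dispatch scheme above can be sketched roughly as follows; the countdown register and program_countdown() are hypothetical stand-ins for the proposed hardware:

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_TIMERS 64

/* Pending deadlines in core-clock ticks, kept as a sorted array for
 * brevity (a real implementation would use a min-heap). */
static uint64_t deadlines[MAX_TIMERS];
static size_t ntimers;
static uint64_t programmed;   /* value last written to the countdown timer */

/* Stand-in for writing the hypothetical hardware countdown register. */
static void program_countdown(uint64_t ticks_from_now) { programmed = ticks_from_now; }

/* Insert a deadline into the queue; reprogram the hardware timer only
 * when the new item becomes the earliest one, so a single countdown
 * register services any number of pending timers. */
void add_timer(uint64_t now, uint64_t deadline)
{
    size_t i = ntimers++;
    while (i > 0 && deadlines[i - 1] > deadline) {
        deadlines[i] = deadlines[i - 1];   /* shift later deadlines right */
        i--;
    }
    deadlines[i] = deadline;
    if (i == 0)                            /* new earliest deadline */
        program_countdown(deadline - now);
}
```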

  12. Re: x86 High Precision Event Timers support

    Torbjorn Lindgren wrote:
    > Casper H.S. Dik wrote:
    >>Terje Mathisen writes:
    >>>b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    >>>counts to get out of sync, and this is a much harder problem to fix
    >>>while still delivering sub-us precision and latency.

    >>
    >>yeah, I didn't do multi-cpu/multi-core; the Opteron multi-core CPUs will
    >>all need to run at the same frequency (though I'm not sure if setting
    >>the core voltage/frequency of one half of the core affects the other
    >>half at the same time or that these actions need to be done in
    >>lockstep); multi-socket adds additional challenges.

    >
    > As I understand it this is true of Opteron but NOT AMD Athlon64 X2, as
    > I understand it the X2 allows the cores can be controlled individually
    > (presumably this is to get down power usage on desktops).


    My understanding is that the Athlon64 X2s still run the two cores at
    the same frequency. It's the laptop parts which can run the cores at
    different frequencies.

    Phil

    --
    http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt

  13. Re: x86 High Precision Event Timers support

    Phil Armstrong writes:

    >My understanding is that the Athlon64 X2s still run the two cores at
    >the same frequency. It's the laptop parts which can run the cores at
    >different frequencies.


    My understanding is that the Athlon64 X2 and the Opteron parts are
    basically the same.

    (The latest socket 939 Opteron and the Athlon64 X2 cannot be told apart by
    software; they return the same CPUID values.)

    Casper
    --
    Expressed in this posting are my opinions. They are in no way related
    to opinions held by my employer, Sun Microsystems.
    Statements on Sun products included here are not gospel and may
    be fiction rather than truth.

  14. Re: x86 High Precision Event Timers support

    In comp.os.linux.development.system David Hopwood wrote in part:
    > Robert Redelmeier wrote:
    >> This depends very much on the software quality requirements.

    >
    > Yes, a lot of software is of very poor quality ;-)


    Agreed. Sometimes due to a misguided effort at high quality!

    -- Robert



  15. Re: x86 High Precision Event Timers support

    [ Followup-To: set to comp.arch, feel free to disagree ]

    Spoon wrote:

    > I'm also wondering: Are there x86-based systems where a card equipped
    > with several PITs (e.g. ADLINK's PCI-8554) is a necessity?
    >
    > http://www.adlinktech.com/PD/web/PD_detail.php?pid=27


    Could anyone comment?

    When is one Programmable Interval Timer not enough?

  16. Re: x86 High Precision Event Timers support

    Terje Mathisen wrote:

    > Tim Roberts wrote:
    >
    >> Terje Mathisen wrote:
    >>
    >>> b) On a multi-cpu/multi-core system, it is quite possible for the TSC
    >>> counts to get out of sync, and this is a much harder problem to fix
    >>> while still delivering sub-us precision and latency.

    >>
    >> I dispute that. Once the chips come out of reset, they're both being
    >> fed by the same clock signal. I don't believe it is possible for them
    >> to drift out of sync.

    >
    > Did you read the AMD paper that someone posted a link to a couple of
    > days ago?
    >
    > It goes into specifics describing exactly how this can happen as a
    > result of frequency throttling, something that can happen independently
    > on each cpu/core.


    I think you are referring to:
    http://groups.google.com/group/fa.li...ae85a08ebd3aa4

  17. Re: x86 High Precision Event Timers support

    Joe Seigh wrote:
    > Terje Mathisen wrote:
    >> If the OS can present the illusion of 'a single shared TSC counter',
    >> and do it well enough that no user-level program ever notices, then it
    >> would be a Good Thing (TM).

    >
    > You shouldn't use multiple clocks as a synchronization mechanism or
    > arbiter.
    > If you do, either you will get occasional errors or you will incur
    > additional overhead


    Sure, and so what?

    Programmers would still like to be able to use the fastest/most precise
    clock available.

    If the cost of providing that is a (very) small chance of sometimes
    giving less accurate results, then so be it.

    I.e. let's assume I'm using this as a way to pace a sending queue, if I
    get a small glitch I might incur an extra lost packet/retransmit, but
    that's OK.

    Using the same type of best-effort timer to directly control radiation
    dosages would be criminal, right?

    Terje

    --
    -
    "almost all programming can be viewed as an exercise in caching"

  18. Re: x86 High Precision Event Timers support

    Terje Mathisen wrote:
    > Joe Seigh wrote:
    >
    >> Terje Mathisen wrote:
    >>
    >>> If the OS can present the illusion of 'a single shared TSC counter',
    >>> and do it well enough that no user-level program ever notices, then
    >>> it would be a Good Thing (TM).

    >>
    >>
    >> You shouldn't use multiple clocks as a synchronization mechanism or
    >> arbiter.
    >> If you do, either you will get occasional errors or you will incur
    >> additional overhead

    >
    >
    > Sure, and so what?
    >
    > Programmers would still like to be able to use the fastest/most precise
    > clock available.
    >
    > If the cost of providing that is a (very) small chance of sometimes
    > giving less accurate results, then so be it.
    >
    > I.e. let's assume I'm using this as a way to pace a sending queue, if I
    > get a small glitch I might incur an extra lost packet/retransmit, but
    > that's OK.


    Yes, you are not using it as a synchronization arbiter.

    --
    Joe Seigh

    When you get lemons, you make lemonade.
    When you get hardware, you make software.

  19. Re: x86 High Precision Event Timers support

    Joe Seigh wrote:

    > There's no technical reason this information can't be in user space
    > as well. There's may be a problem but it's certainlly not technical
    > in nature.


    Such a context switch indicator/counter would be an extremely useful
    tool for many other purposes as well. Think of restartable algorithms,
    particularly restartable pseudoatomic instruction sequences, such as
    the TO-lock synchronization primitive.

    Best regards
    Piotr Wyderski
