[RFC patch 00/18] Trace Clock v2 - Kernel


Thread: [RFC patch 00/18] Trace Clock v2

  1. [RFC patch 00/18] Trace Clock v2

    Hi,

    I've cleaned up the LTTng timestamping code, renamed it to "trace clock",
    ripped apart the tsc_sync.c x86 code, and added documentation (a printk to the
    console when the tracing clock is used) about what to do when an unsynchronized
    TSC is detected. I kept the cache-line bouncing workaround for now, but I now
    synchronize the counters every jiffy with a per-CPU timer, which puts an upper
    bound on the time imprecision.

    The trace clock works with trace_clock_get()/put(), so all the machinery and
    overhead that might be required to provide correct timestamps on weird systems
    is *only* enabled when tracing is active.
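
    For illustration, here is a minimal usage sketch of that get/put discipline,
    assuming an architecture that uses the 32-to-64 synthetic TSC (as in patches
    07/18 and 11/18 below). Only get_trace_clock(), put_trace_clock() and
    trace_clock_read_synthetic_tsc() come from this series (the arch headers name
    the get/put pair get_trace_clock()/put_trace_clock()); the my_tracer_*()
    functions are hypothetical:

    #include <linux/types.h>
    #include <linux/module.h>
    #include <asm/trace-clock.h>

    static u64 session_start_ts;

    /* Hypothetical tracer start: take a trace clock reference, then read it. */
    static int my_tracer_start(void)
    {
            get_trace_clock();      /* first user enables the per-CPU timers/IPIs */
            session_start_ts = trace_clock_read_synthetic_tsc();
            return 0;
    }

    /* Hypothetical tracer stop: drop the reference; the overhead goes away. */
    static void my_tracer_stop(void)
    {
            put_trace_clock();
    }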

    I plan to stick to this simple solution for now so we can get reliable tracing
    for 95ish% of systems out there, and to keep room for improvement (nice NTP-like
    schemes) for a later version. (sadly, given this is actually v2, I cannot say
    "let's keep that for v2")

    This patchset applies on top of 2.6.28-rc3.

    Mathieu


  2. [RFC patch 16/18] MIPS create empty sync_core()

    Needed by the architecture-independent tsc-sync.c.

    Signed-off-by: Mathieu Desnoyers
    CC: Ralf Baechle
    CC: Peter Zijlstra
    ---
    arch/mips/include/asm/barrier.h | 6 ++++++
    1 file changed, 6 insertions(+)

    Index: linux.trees.git/arch/mips/include/asm/barrier.h
    ===================================================================
    --- linux.trees.git.orig/arch/mips/include/asm/barrier.h 2008-10-30 20:22:49.000000000 -0400
    +++ linux.trees.git/arch/mips/include/asm/barrier.h 2008-11-07 00:16:28.000000000 -0500
    @@ -152,4 +152,10 @@
    #define smp_llsc_rmb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
    #define smp_llsc_wmb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")

    +/*
    + * MIPS does not have any instruction to serialize instruction execution on the
    + * core.
    + */
    +#define sync_core()
    +
    #endif /* __ASM_BARRIER_H */


  3. [RFC patch 07/18] Trace clock core

    32-to-64-bit clock extension. Extracts a 64-bit TSC from a [1..32]-bit
    counter, kept up to date by a periodic timer interrupt. Lockless.

    It's actually a specialized version of cnt_32_to_63.h which does the following
    in addition:
    - Uses per-cpu data to keep track of counters.
    - Limits cache-line bouncing.
    - Supports machines with non-synchronized TSCs.
    - Does not require read barriers, which can be slow on some architectures.
    - Supports a full 64-bit counter (well, just one bit more than 63 is not really
    a big deal when we talk about timestamp counters). If 2^64 is considered long
    enough between overflows, 2^63 is normally considered long enough too.
    - The periodic update of the value is ensured by the infrastructure. There is
    no assumption that the counter is read frequently; we cannot assume that, given
    that the events for which tracing is enabled can be selected dynamically.
    - Supports counters of various widths (32 bits and below) by changing the
    HW_BITS define.

    What cnt_32_to_63.h does that this patch doesn't do:
    - It has a global counter, which removes the need to do an update periodically
    on _each_ CPU. This can be important in a dynamic tick system where CPUs need
    to sleep to save power. It is therefore well suited for systems reading a
    global clock expected to be _exactly_ synchronized across cores (where time
    can never ever go backward).

    Q:

    > do you actually use the RCU internals? or do you just reimplement an RCU
    > algorithm?
    >


    A:

    Nope, I don't use RCU internals in this code. Preempt disable seemed
    like the best way to handle this utterly short code path and I wanted
    the write side to be fast enough to be called periodically. What I do is:

    - Disable preemption on the read side:
    it makes sure the pointer I get will point to a data structure that
    will never change while I am in the preempt-disabled code. (see *)
    - I use per-cpu data to allow the read side to be as fast as possible
    (it only needs to disable preemption, does not race against other CPUs and
    won't generate cache-line bouncing). It also allows dealing with
    unsynchronized TSCs if needed.
    - Periodic write side: it is called from an IPI running on each CPU.

    (*) We expect the read side (preempt-off region) to last less time than
    the interval between IPI updates, so we can guarantee the data structure
    it uses won't be modified underneath it. Since the IPI update is
    launched every second or so (depending on the frequency of the counter we
    are trying to extend), this is more than enough.
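
    The wrap handling itself, stripped of the per-CPU and preemption machinery,
    can be modelled in a few lines of user-space C. This is only a toy sketch of
    the extension algorithm described above (HW_BITS shrunk to 8 so the wraps are
    easy to see); the real implementation is in the patch below:

    #include <stdio.h>
    #include <stdint.h>

    #define HW_BITS    8                      /* toy value; the patch uses 32 */
    #define HW_BITMASK ((1ULL << HW_BITS) - 1)

    static uint64_t synthetic;                /* extended 64-bit counter */

    static uint64_t extend(uint32_t hw)
    {
            uint32_t last = synthetic & HW_BITMASK;

            if (hw < last)          /* hardware counter wrapped since last update */
                    synthetic = ((synthetic & ~HW_BITMASK) | hw) + (1ULL << HW_BITS);
            else                    /* only the low-order bits changed */
                    synthetic = (synthetic & ~HW_BITMASK) | hw;
            return synthetic;
    }

    int main(void)
    {
            uint64_t t;

            /* Sample a free-running 8-bit counter more often than it wraps. */
            for (t = 0; t < 1000; t += 37)
                    printf("hw=%3u -> synthetic=%llu\n",
                           (unsigned)(t & HW_BITMASK),
                           (unsigned long long)extend(t & HW_BITMASK));
            return 0;
    }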

    Changelog:

    - Support [1..32] bits -> 64 bits.

    I voluntarily limit the code to use at most 32 bits of the hardware clock for
    performance considerations. If this is a problem it could be changed. Also, the
    algorithm is aimed at a 32-bit architecture. The code becomes much simpler on
    a 64-bit arch, since we can do the updates atomically.

    Signed-off-by: Mathieu Desnoyers
    CC: Nicolas Pitre
    CC: Ralf Baechle
    CC: benh@kernel.crashing.org
    CC: paulus@samba.org
    CC: David Miller
    CC: Linus Torvalds
    CC: Andrew Morton
    CC: Ingo Molnar
    CC: Peter Zijlstra
    CC: Thomas Gleixner
    CC: Steven Rostedt
    CC: linux-arch@vger.kernel.org
    ---
    init/Kconfig | 14 +
    kernel/Makefile | 3
    kernel/trace/Makefile | 1
    kernel/trace/trace-clock-32-to-64.c | 286 ++++++++++++++++++++++++++++++++++++
    4 files changed, 302 insertions(+), 2 deletions(-)

    Index: linux.trees.git/kernel/trace/Makefile
    ===================================================================
    --- linux.trees.git.orig/kernel/trace/Makefile 2008-10-30 20:22:52.000000000 -0400
    +++ linux.trees.git/kernel/trace/Makefile 2008-11-07 00:11:23.000000000 -0500
    @@ -24,5 +24,6 @@ obj-$(CONFIG_NOP_TRACER) += trace_nop.o
    obj-$(CONFIG_STACK_TRACER) += trace_stack.o
    obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
    obj-$(CONFIG_BOOT_TRACER) += trace_boot.o
    +obj-$(CONFIG_HAVE_TRACE_CLOCK_32_TO_64) += trace-clock-32-to-64.o

    libftrace-y := ftrace.o
    Index: linux.trees.git/kernel/trace/trace-clock-32-to-64.c
    ===================================================================
    --- /dev/null 1970-01-01 00:00:00.000000000 +0000
    +++ linux.trees.git/kernel/trace/trace-clock-32-to-64.c 2008-11-07 00:11:06.000000000 -0500
    @@ -0,0 +1,286 @@
    +/*
    + * kernel/trace/trace-clock-32-to-64.c
    + *
    + * (C) Copyright 2006,2007,2008 -
    + * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
    + *
    + * Extends a 32 bits clock source to a full 64 bits count, readable atomically
    + * from any execution context.
    + *
    + * notes :
    + * - trace clock 32->64 bits extended timer-based clock cannot be used for early
    + * tracing in the boot process, as it depends on timer interrupts.
    + * - The timer is only on one CPU to support hotplug.
    + * - We have the choice between schedule_delayed_work_on and an IPI to get each
    + * CPU to write the heartbeat. IPI has been chosen because it is considered
    + * faster than passing through the timer to get the work scheduled on all the
    + * CPUs.
    + */
    +
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include /* FIX for m68k local_irq_enable in on_each_cpu */
    +
    +/*
    + * Number of hardware clock bits. The higher order bits are expected to be 0.
    + * If the hardware clock source has more than 32 bits, the bits higher than the
    + * 32nd will be truncated by a cast to a 32 bits unsigned. Range : 1 - 32.
    + * (too few bits would be unrealistic though, since we depend on the timer to
    + * detect the overflows).
    + */
    +#define HW_BITS 32
    +
    +#define HW_BITMASK ((1ULL << HW_BITS) - 1)
    +#define HW_LSB(hw) ((hw) & HW_BITMASK)
    +#define SW_MSB(sw) ((sw) & ~HW_BITMASK)
    +
    +/* Expected maximum interrupt latency in ms : 15ms, *2 for security */
    +#define EXPECTED_INTERRUPT_LATENCY 30
    +
    +static DEFINE_MUTEX(synthetic_tsc_mutex);
    +static int synthetic_tsc_refcount; /* Number of readers */
    +static int synthetic_tsc_enabled; /* synth. TSC enabled on all online CPUs */
    +
    +atomic_t trace_clock;
    +EXPORT_SYMBOL(trace_clock);
    +
    +static DEFINE_PER_CPU(struct timer_list, tsc_timer);
    +static unsigned int precalc_expire;
    +
    +struct synthetic_tsc_struct {
    + union {
    + u64 val;
    + struct {
    +#ifdef __BIG_ENDIAN
    + u32 msb;
    + u32 lsb;
    +#else
    + u32 lsb;
    + u32 msb;
    +#endif
    + } sel;
    + } tsc[2];
    + unsigned int index; /* Index of the current synth. tsc. */
    +};
    +
    +static DEFINE_PER_CPU(struct synthetic_tsc_struct, synthetic_tsc);
    +
    +/* Called from IPI : either in interrupt or process context */
    +static void update_synthetic_tsc(void)
    +{
    + struct synthetic_tsc_struct *cpu_synth;
    + u32 tsc;
    +
    + preempt_disable();
    + cpu_synth = &per_cpu(synthetic_tsc, smp_processor_id());
    + tsc = trace_clock_read32(); /* Hardware clocksource read */
    +
    + if (tsc < HW_LSB(cpu_synth->tsc[cpu_synth->index].sel.lsb)) {
    + unsigned int new_index = 1 - cpu_synth->index; /* 0 <-> 1 */
    + /*
    + * Overflow
    + * Non atomic update of the non current synthetic TSC, followed
    + * by an atomic index change. There is no write concurrency,
    + * so the index read/write does not need to be atomic.
    + */
    + cpu_synth->tsc[new_index].val =
    + (SW_MSB(cpu_synth->tsc[cpu_synth->index].val)
    + | (u64)tsc) + (1ULL << HW_BITS);
    + cpu_synth->index = new_index; /* atomic change of index */
    + } else {
    + /*
    + * No overflow : We know that the only bits changed are
    + * contained in the 32 LSBs, which can be written to atomically.
    + */
    + cpu_synth->tsc[cpu_synth->index].sel.lsb =
    + SW_MSB(cpu_synth->tsc[cpu_synth->index].sel.lsb) | tsc;
    + }
    + preempt_enable();
    +}
    +
    +/* Called from buffer switch : in _any_ context (even NMI) */
    +u64 notrace trace_clock_read_synthetic_tsc(void)
    +{
    + struct synthetic_tsc_struct *cpu_synth;
    + u64 ret;
    + unsigned int index;
    + u32 tsc;
    +
    + preempt_disable_notrace();
    + cpu_synth = &per_cpu(synthetic_tsc, smp_processor_id());
    + index = cpu_synth->index; /* atomic read */
    + tsc = trace_clock_read32(); /* Hardware clocksource read */
    +
    + /* Overflow detection */
    + if (unlikely(tsc < HW_LSB(cpu_synth->tsc[index].sel.lsb)))
    + ret = (SW_MSB(cpu_synth->tsc[index].val) | (u64)tsc)
    + + (1ULL << HW_BITS);
    + else
    + ret = SW_MSB(cpu_synth->tsc[index].val) | (u64)tsc;
    + preempt_enable_notrace();
    + return ret;
    +}
    +EXPORT_SYMBOL_GPL(trace_clock_read_synthetic_tsc);
    +
    +static void synthetic_tsc_ipi(void *info)
    +{
    + update_synthetic_tsc();
    +}
    +
    +/*
    + * tsc_timer_fct : - Timer function synchronizing synthetic TSC.
    + * @data: unused
    + *
    + * Guarantees at least 1 execution before low word of TSC wraps.
    + */
    +static void tsc_timer_fct(unsigned long data)
    +{
    + update_synthetic_tsc();
    +
    + per_cpu(tsc_timer, smp_processor_id()).expires =
    + jiffies + precalc_expire;
    + add_timer_on(&per_cpu(tsc_timer, smp_processor_id()),
    + smp_processor_id());
    +}
    +
    +/*
    + * precalc_stsc_interval: - Precalculates the interval between the clock
    + * wraparounds.
    + */
    +static int __init precalc_stsc_interval(void)
    +{
    + precalc_expire =
    + (HW_BITMASK / ((trace_clock_frequency() / HZ
    + * trace_clock_freq_scale()) << 1)
    + - 1 - (EXPECTED_INTERRUPT_LATENCY * HZ / 1000)) >> 1;
    + WARN_ON(precalc_expire == 0);
    + printk(KERN_DEBUG "Synthetic TSC timer will fire each %u jiffies.\n",
    + precalc_expire);
    + return 0;
    +}
    +
    +static void prepare_synthetic_tsc(int cpu)
    +{
    + struct synthetic_tsc_struct *cpu_synth;
    + u64 local_count;
    +
    + cpu_synth = &per_cpu(synthetic_tsc, cpu);
    + local_count = trace_clock_read_synthetic_tsc();
    + cpu_synth->tsc[0].val = local_count;
    + cpu_synth->index = 0;
    + smp_wmb(); /* Writing in data of CPU about to come up */
    + init_timer(&per_cpu(tsc_timer, cpu));
    + per_cpu(tsc_timer, cpu).function = tsc_timer_fct;
    + per_cpu(tsc_timer, cpu).expires = jiffies + precalc_expire;
    +}
    +
    +static void enable_synthetic_tsc(int cpu)
    +{
    + smp_call_function_single(cpu, synthetic_tsc_ipi, NULL, 1);
    + add_timer_on(&per_cpu(tsc_timer, cpu), cpu);
    +}
    +
    +static void disable_synthetic_tsc(int cpu)
    +{
    + del_timer_sync(&per_cpu(tsc_timer, cpu));
    +}
    +
    +/*
    + * hotcpu_callback - CPU hotplug callback
    + * @nb: notifier block
    + * @action: hotplug action to take
    + * @hcpu: CPU number
    + *
    + * Sets the new CPU's current synthetic TSC to the same value as the
    + * currently running CPU.
    + *
    + * Returns the success/failure of the operation. (NOTIFY_OK, NOTIFY_BAD)
    + */
    +static int __cpuinit hotcpu_callback(struct notifier_block *nb,
    + unsigned long action,
    + void *hcpu)
    +{
    + unsigned int hotcpu = (unsigned long)hcpu;
    +
    + switch (action) {
    + case CPU_UP_PREPARE:
    + case CPU_UP_PREPARE_FROZEN:
    + if (synthetic_tsc_refcount)
    + prepare_synthetic_tsc(hotcpu);
    + break;
    + case CPU_ONLINE:
    + case CPU_ONLINE_FROZEN:
    + if (synthetic_tsc_refcount)
    + enable_synthetic_tsc(hotcpu);
    + break;
    +#ifdef CONFIG_HOTPLUG_CPU
    + case CPU_UP_CANCELED:
    + case CPU_UP_CANCELED_FROZEN:
    + case CPU_DEAD:
    + case CPU_DEAD_FROZEN:
    + if (synthetic_tsc_refcount)
    + disable_synthetic_tsc(hotcpu);
    + break;
    +#endif /* CONFIG_HOTPLUG_CPU */
    + }
    + return NOTIFY_OK;
    +}
    +
    +void get_synthetic_tsc(void)
    +{
    + int cpu;
    +
    + get_online_cpus();
    + mutex_lock(&synthetic_tsc_mutex);
    + if (synthetic_tsc_refcount++)
    + goto end;
    +
    + synthetic_tsc_enabled = 1;
    + for_each_online_cpu(cpu) {
    + prepare_synthetic_tsc(cpu);
    + enable_synthetic_tsc(cpu);
    + }
    +end:
    + mutex_unlock(&synthetic_tsc_mutex);
    + put_online_cpus();
    +}
    +EXPORT_SYMBOL_GPL(get_synthetic_tsc);
    +
    +void put_synthetic_tsc(void)
    +{
    + int cpu;
    +
    + get_online_cpus();
    + mutex_lock(&synthetic_tsc_mutex);
    + WARN_ON(synthetic_tsc_refcount <= 0);
    + if (synthetic_tsc_refcount != 1 || !synthetic_tsc_enabled)
    + goto end;
    +
    + for_each_online_cpu(cpu)
    + disable_synthetic_tsc(cpu);
    + synthetic_tsc_enabled = 0;
    +end:
    + synthetic_tsc_refcount--;
    + mutex_unlock(&synthetic_tsc_mutex);
    + put_online_cpus();
    +}
    +EXPORT_SYMBOL_GPL(put_synthetic_tsc);
    +
    +/* Called from CPU 0, before any tracing starts, to init each structure */
    +static int __init init_synthetic_tsc(void)
    +{
    + precalc_stsc_interval();
    + hotcpu_notifier(hotcpu_callback, 3);
    + return 0;
    +}
    +
    +/* Before SMP is up */
    +early_initcall(init_synthetic_tsc);
    Index: linux.trees.git/init/Kconfig
    ===================================================================
    --- linux.trees.git.orig/init/Kconfig 2008-11-07 00:07:23.000000000 -0500
    +++ linux.trees.git/init/Kconfig 2008-11-07 00:11:06.000000000 -0500
    @@ -340,6 +340,20 @@ config HAVE_UNSTABLE_SCHED_CLOCK
    config HAVE_GET_CYCLES
    def_bool n

    +#
    +# Architectures with a specialized tracing clock should select this.
    +#
    +config HAVE_TRACE_CLOCK
    + def_bool n
    +
    +#
    +# Architectures with only a 32-bits clock source should select this.
    +#
    +config HAVE_TRACE_CLOCK_32_TO_64
    + bool
    + default y if (!HAVE_TRACE_CLOCK)
    + default n if HAVE_TRACE_CLOCK
    +
    config GROUP_SCHED
    bool "Group CPU scheduler"
    depends on EXPERIMENTAL
    Index: linux.trees.git/kernel/Makefile
    ===================================================================
    --- linux.trees.git.orig/kernel/Makefile 2008-10-30 20:22:52.000000000 -0400
    +++ linux.trees.git/kernel/Makefile 2008-11-07 00:11:52.000000000 -0500
    @@ -88,8 +88,7 @@ obj-$(CONFIG_MARKERS) += marker.o
    obj-$(CONFIG_TRACEPOINTS) += tracepoint.o
    obj-$(CONFIG_LATENCYTOP) += latencytop.o
    obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
    -obj-$(CONFIG_FUNCTION_TRACER) += trace/
    -obj-$(CONFIG_TRACING) += trace/
    +obj-y += trace/
    obj-$(CONFIG_SMP) += sched_cpupri.o

    ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)


  4. [RFC patch 11/18] LTTng timestamp sh

    This patch adds the timestamping mechanism to the trace-clock.h arch header
    file. The new timestamp functions use TMU channel 1.

    This code only works if the TMU channel 1 is initialized during the kernel boot.

    Big fat warning(TM) from Mathieu Desnoyers:

    This patch seems to assume TMU channel 1 is set up at boot. Is that always true
    on all SuperH boards? Is there some Kconfig selection that should be done here?
    Make sure this patch does not break get_cycles() on SuperH before merging.

    From: Giuseppe Cavallaro
    Signed-off-by: Giuseppe Cavallaro
    Signed-off-by: Mathieu Desnoyers
    CC: Paul Mundt
    CC: linux-sh@vger.kernel.org
    ---
    arch/sh/Kconfig | 2 +
    arch/sh/include/asm/timex.h | 7 +++-
    arch/sh/include/asm/trace-clock.h | 61 ++++++++++++++++++++++++++++++++++++++
    3 files changed, 68 insertions(+), 2 deletions(-)

    Index: linux.trees.git/arch/sh/include/asm/timex.h
    ===================================================================
    --- linux.trees.git.orig/arch/sh/include/asm/timex.h 2008-09-30 11:38:51.000000000 -0400
    +++ linux.trees.git/arch/sh/include/asm/timex.h 2008-11-07 00:12:47.000000000 -0500
    @@ -6,13 +6,16 @@
    #ifndef __ASM_SH_TIMEX_H
    #define __ASM_SH_TIMEX_H

    -#define CLOCK_TICK_RATE (CONFIG_SH_PCLK_FREQ / 4) /* Underlying HZ */
    +#include
    +#include
    +
    +#define CLOCK_TICK_RATE (HZ * 100000UL)

    typedef unsigned long long cycles_t;

    static __inline__ cycles_t get_cycles (void)
    {
    - return 0;
    + return 0xffffffff - ctrl_inl(TMU1_TCNT);
    }

    #endif /* __ASM_SH_TIMEX_H */
    Index: linux.trees.git/arch/sh/Kconfig
    ===================================================================
    --- linux.trees.git.orig/arch/sh/Kconfig 2008-11-07 00:06:06.000000000 -0500
    +++ linux.trees.git/arch/sh/Kconfig 2008-11-07 00:12:47.000000000 -0500
    @@ -11,6 +11,8 @@ config SUPERH
    select HAVE_CLK
    select HAVE_IDE
    select HAVE_OPROFILE
    + select HAVE_TRACE_CLOCK
    + select HAVE_TRACE_CLOCK_32_TO_64
    select HAVE_GENERIC_DMA_COHERENT
    select HAVE_IOREMAP_PROT if MMU
    help
    Index: linux.trees.git/arch/sh/include/asm/trace-clock.h
    ===================================================================
    --- /dev/null 1970-01-01 00:00:00.000000000 +0000
    +++ linux.trees.git/arch/sh/include/asm/trace-clock.h 2008-11-07 00:12:47.000000000 -0500
    @@ -0,0 +1,61 @@
    +/*
    + * Copyright (C) 2007,2008 Giuseppe Cavallaro
    + * Mathieu Desnoyers
    + *
    + * Trace clock definitions for SuperH.
    + */
    +
    +#ifndef _ASM_SH_TRACE_CLOCK_H
    +#define _ASM_SH_TRACE_CLOCK_H
    +
    +#include
    +#include
    +
    +extern u64 trace_clock_read_synthetic_tsc(void);
    +
    +static inline u32 trace_clock_get_timestamp32(void)
    +{
    + return get_cycles();
    +}
    +
    +static inline u64 trace_clock_get_timestamp64(void)
    +{
    + return trace_clock_read_synthetic_tsc();
    +}
    +
    +static inline void trace_clock_add_timestamp(unsigned long ticks)
    +{ }
    +
    +static inline unsigned int trace_clock_frequency(void)
    +{
    + unsigned long rate;
    + struct clk *tmu1_clk;
    +
    + tmu1_clk = clk_get(NULL, "tmu1_clk");
    + rate = clk_get_rate(tmu1_clk);
    +
    + return (unsigned int)rate;
    +}
    +
    +static inline u32 trace_clock_freq_scale(void)
    +{
    + return 1;
    +}
    +
    +extern void get_synthetic_tsc(void);
    +extern void put_synthetic_tsc(void);
    +
    +static inline void get_trace_clock(void)
    +{
    + get_synthetic_tsc();
    +}
    +
    +static inline void put_trace_clock(void)
    +{
    + put_synthetic_tsc();
    +}
    +
    +static inline void set_trace_clock_is_sync(int state)
    +{
    +}
    +#endif /* _ASM_SH_TRACE_CLOCK_H */


  5. [RFC patch 10/18] Sparc64 : Trace clock

    Implement sparc64 trace clock.

    Signed-off-by: Mathieu Desnoyers
    CC: David Miller
    CC: linux-arch@vger.kernel.org
    ---
    arch/sparc/include/asm/trace-clock.h | 46 +++++++++++++++++++++++++++++++++++
    arch/sparc64/Kconfig | 1
    2 files changed, 47 insertions(+)

    Index: linux.trees.git/arch/sparc64/Kconfig
    ===================================================================
    --- linux.trees.git.orig/arch/sparc64/Kconfig 2008-11-07 00:09:35.000000000 -0500
    +++ linux.trees.git/arch/sparc64/Kconfig 2008-11-07 00:12:26.000000000 -0500
    @@ -16,6 +16,7 @@ config SPARC64
    select HAVE_GET_CYCLES
    select HAVE_LMB
    select HAVE_ARCH_KGDB
    + select HAVE_TRACE_CLOCK
    select USE_GENERIC_SMP_HELPERS if SMP
    select HAVE_ARCH_TRACEHOOK
    select ARCH_WANT_OPTIONAL_GPIOLIB
    Index: linux.trees.git/arch/sparc/include/asm/trace-clock.h
    ===================================================================
    --- /dev/null 1970-01-01 00:00:00.000000000 +0000
    +++ linux.trees.git/arch/sparc/include/asm/trace-clock.h 2008-11-07 00:12:04.000000000 -0500
    @@ -0,0 +1,46 @@
    +/*
    + * Copyright (C) 2008, Mathieu Desnoyers
    + *
    + * Trace clock definitions for Sparc64.
    + */
    +
    +#ifndef _ASM_SPARC_TRACE_CLOCK_H
    +#define _ASM_SPARC_TRACE_CLOCK_H
    +
    +#include
    +
    +static inline u32 trace_clock_read32(void)
    +{
    + return get_cycles();
    +}
    +
    +static inline u64 trace_clock_read64(void)
    +{
    + return get_cycles();
    +}
    +
    +static inline void trace_clock_add_timestamp(unsigned long ticks)
    +{ }
    +
    +static inline unsigned int trace_clock_frequency(void)
    +{
    + return get_cycles_rate();
    +}
    +
    +static inline u32 trace_clock_freq_scale(void)
    +{
    + return 1;
    +}
    +
    +static inline void get_trace_clock(void)
    +{
    +}
    +
    +static inline void put_trace_clock(void)
    +{
    +}
    +
    +static inline void set_trace_clock_is_sync(int state)
    +{
    +}
    +#endif /* _ASM_SPARC_TRACE_CLOCK_H */


  6. [RFC patch 03/18] get_cycles() : sparc64 HAVE_GET_CYCLES

    This patch selects HAVE_GET_CYCLES and makes sure get_cycles_barrier() and
    get_cycles_rate() are implemented.

    Changelog :
    - Use tb_ticks_per_usec * 1000000 in get_cycles_rate().

    Signed-off-by: Mathieu Desnoyers
    Acked-by: David S. Miller
    CC: Linus Torvalds
    CC: Andrew Morton
    CC: Ingo Molnar
    CC: Peter Zijlstra
    CC: Thomas Gleixner
    CC: Steven Rostedt
    CC: linux-arch@vger.kernel.org
    ---
    arch/sparc/include/asm/timex_64.h | 19 ++++++++++++++++++-
    arch/sparc64/Kconfig | 1 +
    arch/sparc64/kernel/time.c | 3 ++-
    3 files changed, 21 insertions(+), 2 deletions(-)

    Index: linux.trees.git/arch/sparc64/Kconfig
    ===================================================================
    --- linux.trees.git.orig/arch/sparc64/Kconfig 2008-10-30 20:22:50.000000000 -0400
    +++ linux.trees.git/arch/sparc64/Kconfig 2008-11-07 00:09:35.000000000 -0500
    @@ -13,6 +13,7 @@ config SPARC64
    default y
    select HAVE_FUNCTION_TRACER
    select HAVE_IDE
    + select HAVE_GET_CYCLES
    select HAVE_LMB
    select HAVE_ARCH_KGDB
    select USE_GENERIC_SMP_HELPERS if SMP
    Index: linux.trees.git/arch/sparc/include/asm/timex_64.h
    ===================================================================
    --- linux.trees.git.orig/arch/sparc/include/asm/timex_64.h 2008-09-30 11:38:51.000000000 -0400
    +++ linux.trees.git/arch/sparc/include/asm/timex_64.h 2008-11-07 00:09:35.000000000 -0500
    @@ -12,7 +12,24 @@

    /* Getting on the cycle counter on sparc64. */
    typedef unsigned long cycles_t;
    -#define get_cycles() tick_ops->get_tick()
    +
    +static inline cycles_t get_cycles(void)
    +{
    + return tick_ops->get_tick();
    +}
    +
    +/* get_cycles instruction is synchronized on sparc64 */
    +static inline void get_cycles_barrier(void)
    +{
    + return;
    +}
    +
    +extern unsigned long tb_ticks_per_usec;
    +
    +static inline cycles_t get_cycles_rate(void)
    +{
    + return tb_ticks_per_usec * 1000000UL;
    +}

    #define ARCH_HAS_READ_CURRENT_TIMER

    Index: linux.trees.git/arch/sparc64/kernel/time.c
    ===================================================================
    --- linux.trees.git.orig/arch/sparc64/kernel/time.c 2008-11-07 00:06:06.000000000 -0500
    +++ linux.trees.git/arch/sparc64/kernel/time.c 2008-11-07 00:09:35.000000000 -0500
    @@ -793,7 +793,8 @@ static void __init setup_clockevent_mult
    sparc64_clockevent.mult = mult;
    }

    -static unsigned long tb_ticks_per_usec __read_mostly;
    +unsigned long tb_ticks_per_usec __read_mostly;
    +EXPORT_SYMBOL_GPL(tb_ticks_per_usec);

    void __delay(unsigned long loops)
    {


  7. [RFC patch 05/18] get_cycles() : MIPS HAVE_GET_CYCLES_32

    This partly reverts commit efb9ca08b5a2374b29938cdcab417ce4feb14b54. It selects
    HAVE_GET_CYCLES_32 only on CPUs where it is safe to use.

    The "_WORKAROUND" cases for the R4000 and R4400 are currently considered
    unsafe, but other sub-architectures should probably be added to the blacklist.

    HAVE_GET_CYCLES is not defined because MIPS does not provide a 64-bit TSC
    (only 32 bits).

    Signed-off-by: Mathieu Desnoyers
    CC: Ralf Baechle
    CC: David Miller
    CC: Linus Torvalds
    CC: Andrew Morton
    CC: Ingo Molnar
    CC: Peter Zijlstra
    CC: Thomas Gleixner
    CC: Steven Rostedt
    CC: linux-arch@vger.kernel.org
    ---
    arch/mips/Kconfig | 4 ++++
    arch/mips/include/asm/timex.h | 25 +++++++++++++++++++++++++
    2 files changed, 29 insertions(+)

    Index: linux.trees.git/arch/mips/include/asm/timex.h
    ===================================================================
    --- linux.trees.git.orig/arch/mips/include/asm/timex.h 2008-10-30 20:22:50.000000000 -0400
    +++ linux.trees.git/arch/mips/include/asm/timex.h 2008-11-07 00:10:10.000000000 -0500
    @@ -29,14 +29,39 @@
    * which isn't an evil thing.
    *
    * We know that all SMP capable CPUs have cycle counters.
    + *
    + * Mathieu Desnoyers
    + * HAVE_GET_CYCLES makes sure that this case is handled properly :
    + *
    + * Ralf Baechle :
    + * This avoids us executing an mfc0 c0_count instruction on processors which
    + * don't have but also on certain R4000 and R4400 versions where reading from
    + * the count register just in the very moment when its value equals c0_compare
    + * will result in the timer interrupt getting lost.
    */

    typedef unsigned int cycles_t;

    +#ifdef HAVE_GET_CYCLES_32
    +static inline cycles_t get_cycles(void)
    +{
    + return read_c0_count();
    +}
    +
    +static inline void get_cycles_barrier(void)
    +{
    +}
    +
    +static inline cycles_t get_cycles_rate(void)
    +{
    + return CLOCK_TICK_RATE;
    +}
    +#else
    static inline cycles_t get_cycles(void)
    {
    return 0;
    }
    +#endif

    #endif /* __KERNEL__ */

    Index: linux.trees.git/arch/mips/Kconfig
    ===================================================================
    --- linux.trees.git.orig/arch/mips/Kconfig 2008-11-07 00:06:06.000000000 -0500
    +++ linux.trees.git/arch/mips/Kconfig 2008-11-07 00:10:10.000000000 -0500
    @@ -1611,6 +1611,10 @@ config CPU_R4000_WORKAROUNDS
    config CPU_R4400_WORKAROUNDS
    bool

    +config HAVE_GET_CYCLES_32
    + def_bool y
    + depends on !CPU_R4400_WORKAROUNDS
    +
    #
    # Use the generic interrupt handling code in kernel/irq/:
    #


  8. [RFC patch 12/18] LTTng - TSC synchronicity test

    Test TSC synchronization across CPUs. Architecture-independent, so it can be
    used on various architectures. Aims at testing TSC synchronization on a
    running system (not only at early boot), with minimal impact on interrupt
    latency.

    I wrote this code before the x86 tsc_sync.c existed and, given that it worked
    well for my needs, I never switched to tsc_sync.c. Although it has the same
    goal, it does it a bit differently:

    tsc_sync looks at the cycle counters on two CPUs to see whether one, compared
    to the other, goes backward when read in a loop. The LTTng code synchronizes
    both cores with a counter used as a memory barrier and then reads the two TSCs
    at a delta equal to the cache-line exchange. Instruction and data caches are
    primed. This test is repeated in loops to ensure we deal with MCEs and NMIs,
    which could skew the results.

    The problem I see with tsc_sync.c is that if one of the two CPUs is delayed by
    an interrupt handler (for way too long) while the other CPU is doing its
    check_tsc_warp() execution, and if the CPU with the lowest TSC values runs
    first, that code will fail to detect unsynchronized CPUs.

    This sync test code does not have this problem.

    A following patch replaces the x86 tsc_sync.c code with this
    architecture-independent code.

    This code also adds the kernel parameter
    force_tsc_sync=1
    which forces a resynchronization of the CPU TSCs when a CPU is hotplugged.
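
    As a usage note, test_tsc_synchronization() is exported, so a tracer module can
    re-check synchronicity on the running system before trusting a TSC-based trace
    clock. A minimal sketch (the module and its names are hypothetical; only
    test_tsc_synchronization(), declared in asm/tsc.h by patch 13/18, comes from
    this series):

    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/init.h>
    #include <asm/tsc.h>

    static int __init my_tracer_init(void)
    {
            /* Returns non-zero when all online CPUs have synchronized TSCs. */
            if (!test_tsc_synchronization())
                    printk(KERN_WARNING
                           "my_tracer: TSCs not synchronized, using fallback clock\n");
            return 0;
    }
    module_init(my_tracer_init);

    MODULE_LICENSE("GPL");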

    Signed-off-by: Mathieu Desnoyers
    CC: Ingo Molnar
    CC: Jan Kiszka
    CC: Linus Torvalds
    CC: Andrew Morton
    CC: Peter Zijlstra
    CC: Thomas Gleixner
    CC: Steven Rostedt
    ---
    Documentation/kernel-parameters.txt | 4
    init/Kconfig | 7
    kernel/time/Makefile | 1
    kernel/time/tsc-sync.c | 313 ++++++++++++++++++++++++++++++++++++
    4 files changed, 325 insertions(+)

    Index: linux.trees.git/kernel/time/tsc-sync.c
    ===================================================================
    --- /dev/null 1970-01-01 00:00:00.000000000 +0000
    +++ linux.trees.git/kernel/time/tsc-sync.c 2008-11-07 00:13:01.000000000 -0500
    @@ -0,0 +1,313 @@
    +/*
    + * kernel/time/tsc-sync.c
    + *
    + * Test TSC synchronization
    + *
    + * marks the tsc as unstable _and_ keep a simple "_tsc_is_sync" variable, which
    + * is fast to read when a simple test must determine which clock source to use
    + * for kernel tracing.
    + *
    + * - CPU init :
    + *
    + * We check whether all boot CPUs have their TSC's synchronized,
    + * print a warning if not and turn off the TSC clock-source.
    + *
    + * Only two CPUs may participate - they can enter in any order.
    + * ( The serial nature of the boot logic and the CPU hotplug lock
    + * protects against more than 2 CPUs entering this code.
    + *
    + * - When CPUs are up :
    + *
    + * TSC synchronicity of all CPUs can be checked later at run-time by calling
    + * test_tsc_synchronization().
    + *
    + * Copyright 2007, 2008
    + * Mathieu Desnoyers
    + */
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include
    +
    +#define MAX_CYCLES_DELTA 1000ULL
    +
    +/*
    + * Number of loops to take care of MCE, NMIs, SMIs.
    + */
    +#define NR_LOOPS 10
    +
    +static DEFINE_MUTEX(tscsync_mutex);
    +
    +struct sync_data {
    + int nr_waits;
    + int wait_sync;
    + cycles_t tsc_count;
    +} ____cacheline_aligned;
    +
    +/* 0 is master, 1 is slave */
    +static struct sync_data sync_data[2] = {
    + [0 ... 1] = {
    + .nr_waits = 3 * NR_LOOPS + 1,
    + .wait_sync = 3 * NR_LOOPS + 1,
    + },
    +};
    +
    +int _tsc_is_sync = 1;
    +EXPORT_SYMBOL(_tsc_is_sync);
    +
    +static int force_tsc_sync;
    +static cycles_t slave_offset;
    +static int slave_offset_ready; /* for 32-bits architectures */
    +
    +static int __init force_tsc_sync_setup(char *str)
    +{
    + force_tsc_sync = simple_strtoul(str, NULL, 0);
    + return 1;
    +}
    +__setup("force_tsc_sync=", force_tsc_sync_setup);
    +
    +/*
    + * Mark it noinline so we make sure it is not unrolled.
    + * Wait until value is reached.
    + */
    +static noinline void tsc_barrier(long this_cpu)
    +{
    + sync_core();
    + sync_data[this_cpu].wait_sync--;
    + smp_mb(); /* order master/slave sync_data read/write */
    + while (unlikely(sync_data[1 - this_cpu].wait_sync >=
    + sync_data[this_cpu].nr_waits))
    + barrier(); /*
    + * barrier is used because faster and
    + * more predictable than cpu_idle().
    + */
    + smp_mb(); /* order master/slave sync_data read/write */
    + sync_data[this_cpu].nr_waits--;
    + get_cycles_barrier();
    + sync_data[this_cpu].tsc_count = get_cycles();
    + get_cycles_barrier();
    +}
    +
    +/*
    + * Worker thread called on each CPU.
    + * First wait with interrupts enabled, then wait with interrupt disabled,
    + * for precision. We are already bound to one CPU.
    + * this_cpu 0 : master
    + * this_cpu 1 : slave
    + */
    +static void test_sync(void *arg)
    +{
    + long this_cpu = (long)arg;
    + unsigned long flags;
    +
    + local_irq_save(flags);
    + /* Make sure the instructions are in I-CACHE */
    + tsc_barrier(this_cpu);
    + tsc_barrier(this_cpu);
    + sync_data[this_cpu].wait_sync--;
    + smp_mb(); /* order master/slave sync_data read/write */
    + while (unlikely(sync_data[1 - this_cpu].wait_sync >=
    + sync_data[this_cpu].nr_waits))
    + barrier(); /*
    + * barrier is used because faster and
    + * more predictable than cpu_idle().
    + */
    + smp_mb(); /* order master/slave sync_data read/write */
    + sync_data[this_cpu].nr_waits--;
    + /*
    + * Here, only the master will wait for the slave to reach this barrier.
    + * This makes sure that the master, which holds the mutex and will reset
    + * the barriers, waits for the slave to stop using the barrier values
    + * before it continues. This is only done at the complete end of all the
    + * loops. This is why there is a + 1 in original wait_sync value.
    + */
    + if (sync_data[this_cpu].nr_waits == 1)
    + sync_data[this_cpu].wait_sync--;
    + local_irq_restore(flags);
    +}
    +
    +/*
    + * Each CPU (master and target) must decrement the wait_sync value twice (one
    + * for priming in cache), and also once after the get_cycles. After all the
    + * loops, one last synchronization is required to make sure the master waits
    + * for the slave before resetting the barriers.
    + */
    +static void reset_barriers(void)
    +{
    + int i;
    +
    + /*
    + * Wait until slave is done so that we don't overwrite
    + * wait_end_sync prematurely.
    + */
    + smp_mb(); /* order master/slave sync_data read/write */
    + while (unlikely(sync_data[1].wait_sync >= sync_data[0].nr_waits))
    + barrier(); /*
    + * barrier is used because faster and
    + * more predictable than cpu_idle().
    + */
    + smp_mb(); /* order master/slave sync_data read/write */
    +
    + for (i = 0; i < 2; i++) {
    + WARN_ON(sync_data[i].wait_sync != 0);
    + WARN_ON(sync_data[i].nr_waits != 1);
    + sync_data[i].wait_sync = 3 * NR_LOOPS + 1;
    + sync_data[i].nr_waits = 3 * NR_LOOPS + 1;
    + }
    +}
    +
    +/*
    + * Do loops (making sure no unexpected event changes the timing), keep the best
    + * one. The result of each loop is the highest tsc delta between the master CPU
    + * and the slaves. Stop CPU hotplug when this code is executed to make sure we
    + * are concurrency-safe wrt CPU hotplug also using this code. Test TSC
    + * synchronization even if we already "know" CPUs were not synchronized. This
    + * can be used as a test to check if, for some reason, the CPUs eventually got
    + * in sync after a CPU has been unplugged. This code is kept separate from the
    + * CPU hotplug code because the slave CPU executes in an IPI, which we want to
    + * keep as short as possible (this is happening while the system is running).
    + * Therefore, we do not send a single IPI for all the test loops, but rather
    + * send one IPI per loop.
    + */
    +int test_tsc_synchronization(void)
    +{
    + long cpu, master;
    + cycles_t max_diff = 0, diff, best_loop, worse_loop = 0;
    + int i;
    +
    + mutex_lock(&tscsync_mutex);
    + get_online_cpus();
    +
    + printk(KERN_INFO
    + "checking TSC synchronization across all online CPUs:");
    +
    + preempt_disable();
    + master = smp_processor_id();
    + for_each_online_cpu(cpu) {
    + if (master == cpu)
    + continue;
    + best_loop = (cycles_t)ULLONG_MAX;
    + for (i = 0; i < NR_LOOPS; i++) {
    + smp_call_function_single(cpu, test_sync,
    + (void *)1UL, 0);
    + test_sync((void *)0UL);
    + diff = abs(sync_data[1].tsc_count
    + - sync_data[0].tsc_count);
    + best_loop = min(best_loop, diff);
    + worse_loop = max(worse_loop, diff);
    + }
    + reset_barriers();
    + max_diff = max(best_loop, max_diff);
    + }
    + preempt_enable();
    + if (max_diff >= MAX_CYCLES_DELTA) {
    + printk(KERN_WARNING
    + "Measured %llu cycles TSC offset between CPUs,"
    + " turning off TSC clock.\n", (u64)max_diff);
    + mark_tsc_unstable("check_tsc_sync_source failed");
    + _tsc_is_sync = 0;
    + } else {
    + printk(" passed.\n");
    + }
    + put_online_cpus();
    + mutex_unlock(&tscsync_mutex);
    + return max_diff < MAX_CYCLES_DELTA;
    +}
    +EXPORT_SYMBOL_GPL(test_tsc_synchronization);
    +
    +/*
    + * Test synchronicity of a single core when it is hotplugged.
    + * Source CPU calls into this - waits for the freshly booted target CPU to
    + * arrive and then start the measurement:
    + */
    +void __cpuinit check_tsc_sync_source(int cpu)
    +{
    + cycles_t diff, abs_diff,
    + best_loop = (cycles_t)ULLONG_MAX, worse_loop = 0;
    + int i;
    +
    + /*
    + * No need to check if we already know that the TSC is not synchronized:
    + */
    + if (!force_tsc_sync && unsynchronized_tsc()) {
    + /*
    + * Make sure we mark _tsc_is_sync to 0 if the TSC is found
    + * to be unsynchronized for other causes than non-synchronized
    + * TSCs across CPUs.
    + */
    + _tsc_is_sync = 0;
    + set_trace_clock_is_sync(0);
    + return;
    + }
    +
    + printk(KERN_INFO "checking TSC synchronization [CPU#%d -> CPU#%d]:",
    + smp_processor_id(), cpu);
    +
    + for (i = 0; i < NR_LOOPS; i++) {
    + test_sync((void *)0UL);
    + diff = sync_data[1].tsc_count - sync_data[0].tsc_count;
    + abs_diff = abs(diff);
    + best_loop = min(best_loop, abs_diff);
    + worse_loop = max(worse_loop, abs_diff);
    + if (force_tsc_sync && best_loop == abs_diff)
    + slave_offset = diff;
    + }
    + reset_barriers();
    +
    + if (!force_tsc_sync && best_loop >= MAX_CYCLES_DELTA) {
    + printk(" failed.\n");
    + printk(KERN_WARNING
    + "Measured %llu cycles TSC offset between CPUs,"
    + " turning off TSC clock.\n", (u64)best_loop);
    + mark_tsc_unstable("check_tsc_sync_source failed");
    + _tsc_is_sync = 0;
    + set_trace_clock_is_sync(0);
    + } else {
    + printk(" %s.\n", !force_tsc_sync ? "passed" : "forced");
    + }
    + if (force_tsc_sync) {
    + /* order slave_offset and slave_offset_ready writes */
    + smp_wmb();
    + slave_offset_ready = 1;
    + }
    +}
    +
    +/*
    + * Freshly booted CPUs call into this:
    + */
    +void __cpuinit check_tsc_sync_target(void)
    +{
    + int i;
    +
    + if (!force_tsc_sync && unsynchronized_tsc())
    + return;
    +
    + for (i = 0; i < NR_LOOPS; i++)
    + test_sync((void *)1UL);
    +
    + /*
    + * Force slave synchronization if requested.
    + */
    + if (force_tsc_sync) {
    + unsigned long flags;
    + cycles_t new_tsc;
    +
    + while (!slave_offset_ready)
    + cpu_relax();
    + /* order slave_offset and slave_offset_ready reads */
    + smp_rmb();
    + local_irq_save(flags);
    + /*
    + * slave_offset is read when master has finished writing to it,
    + * and is protected by cpu hotplug serialization.
    + */
    + new_tsc = get_cycles() - slave_offset;
    + write_tsc((u32)new_tsc, (u32)((u64)new_tsc >> 32));
    + local_irq_restore(flags);
    + }
    +}
    Index: linux.trees.git/kernel/time/Makefile
    ===================================================================
    --- linux.trees.git.orig/kernel/time/Makefile 2008-11-07 00:12:55.000000000 -0500
    +++ linux.trees.git/kernel/time/Makefile 2008-11-07 00:13:01.000000000 -0500
    @@ -6,3 +6,4 @@ obj-$(CONFIG_GENERIC_CLOCKEVENTS_BROADCA
    obj-$(CONFIG_TICK_ONESHOT) += tick-oneshot.o
    obj-$(CONFIG_TICK_ONESHOT) += tick-sched.o
    obj-$(CONFIG_TIMER_STATS) += timer_stats.o
    +obj-$(CONFIG_HAVE_UNSYNCHRONIZED_TSC) += tsc-sync.o
    Index: linux.trees.git/init/Kconfig
    ===================================================================
    --- linux.trees.git.orig/init/Kconfig 2008-11-07 00:12:55.000000000 -0500
    +++ linux.trees.git/init/Kconfig 2008-11-07 00:13:01.000000000 -0500
    @@ -354,6 +354,13 @@ config HAVE_TRACE_CLOCK_32_TO_64
    default y if (!HAVE_TRACE_CLOCK)
    default n if HAVE_TRACE_CLOCK

    +#
    +# Architectures which need to dynamically detect if their TSC is unsynchronized
    +# across cpus should select this.
    +#
    +config HAVE_UNSYNCHRONIZED_TSC
    + def_bool n
    +
    config GROUP_SCHED
    bool "Group CPU scheduler"
    depends on EXPERIMENTAL
    Index: linux.trees.git/Documentation/kernel-parameters.txt
    ===================================================================
    --- linux.trees.git.orig/Documentation/kernel-parameters.txt 2008-11-07 00:12:55.000000000 -0500
    +++ linux.trees.git/Documentation/kernel-parameters.txt 2008-11-07 00:13:01.000000000 -0500
    @@ -765,6 +765,10 @@ and is between 256 and 4096 characters.
    parameter will force ia64_sal_cache_flush to call
    ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.

    + force_tsc_sync
    + Force TSC resynchronization when SMP CPUs go online.
    + See also idle=poll and disable frequency scaling.
    +
    gamecon.map[2|3]=
    [HW,JOY] Multisystem joystick and NES/SNES/PSX pad
    support via parallel port (up to 5 devices per port)


  9. [RFC patch 15/18] MIPS : export hpt frequency for trace_clock.

    Trace_clock needs to export the hpt frequency to modules (e.g. LTTng).

    Signed-off-by: Mathieu Desnoyers
    CC: Ralf Baechle
    ---
    arch/mips/include/asm/timex.h | 2 ++
    arch/mips/kernel/time.c | 1 +
    2 files changed, 3 insertions(+)

    Index: linux.trees.git/arch/mips/include/asm/timex.h
    ===================================================================
    --- linux.trees.git.orig/arch/mips/include/asm/timex.h 2008-11-07 00:16:05.000000000 -0500
    +++ linux.trees.git/arch/mips/include/asm/timex.h 2008-11-07 00:16:17.000000000 -0500
    @@ -89,6 +89,8 @@ static inline void write_tsc(u32 val1, u
    write_c0_compare(read_c0_count() + DELAY_INTERRUPT);
    }

    +extern unsigned int mips_hpt_frequency;
    +
    #endif /* __KERNEL__ */

    #endif /* _ASM_TIMEX_H */
    Index: linux.trees.git/arch/mips/kernel/time.c
    ===================================================================
    --- linux.trees.git.orig/arch/mips/kernel/time.c 2008-07-19 09:18:07.000000000 -0400
    +++ linux.trees.git/arch/mips/kernel/time.c 2008-11-07 00:16:17.000000000 -0500
    @@ -70,6 +70,7 @@ EXPORT_SYMBOL(perf_irq);
    */

    unsigned int mips_hpt_frequency;
    +EXPORT_SYMBOL(mips_hpt_frequency);

    void __init clocksource_set_clock(struct clocksource *cs, unsigned int clock)
    {


  10. [RFC patch 13/18] x86 : remove arch-specific tsc_sync.c

    Depends on the new architecture-independent kernel/time/tsc-sync.c.

    Signed-off-by: Mathieu Desnoyers
    CC: Thomas Gleixner
    CC: Ingo Molnar
    CC: H. Peter Anvin
    CC: Linus Torvalds
    CC: Andrew Morton
    CC: Peter Zijlstra
    CC: Steven Rostedt
    ---
    arch/x86/Kconfig | 2
    arch/x86/include/asm/tsc.h | 9 +-
    arch/x86/kernel/Makefile | 4
    arch/x86/kernel/tsc_sync.c | 189 ---------------------------------------------
    4 files changed, 12 insertions(+), 192 deletions(-)

    Index: linux.trees.git/arch/x86/kernel/Makefile
    ===================================================================
    --- linux.trees.git.orig/arch/x86/kernel/Makefile 2008-11-07 00:06:06.000000000 -0500
    +++ linux.trees.git/arch/x86/kernel/Makefile 2008-11-07 00:15:13.000000000 -0500
    @@ -56,9 +56,9 @@ obj-$(CONFIG_PCI) += early-quirks.o
    apm-y := apm_32.o
    obj-$(CONFIG_APM) += apm.o
    obj-$(CONFIG_X86_SMP) += smp.o
    -obj-$(CONFIG_X86_SMP) += smpboot.o tsc_sync.o ipi.o tlb_$(BITS).o
    +obj-$(CONFIG_X86_SMP) += smpboot.o ipi.o tlb_$(BITS).o
    obj-$(CONFIG_X86_32_SMP) += smpcommon.o
    -obj-$(CONFIG_X86_64_SMP) += tsc_sync.o smpcommon.o
    +obj-$(CONFIG_X86_64_SMP) += smpcommon.o
    obj-$(CONFIG_X86_TRAMPOLINE) += trampoline_$(BITS).o
    obj-$(CONFIG_X86_MPPARSE) += mpparse.o
    obj-$(CONFIG_X86_LOCAL_APIC) += apic.o nmi.o
    Index: linux.trees.git/arch/x86/kernel/tsc_sync.c
    ===================================================================
    --- linux.trees.git.orig/arch/x86/kernel/tsc_sync.c 2008-09-30 11:38:51.000000000 -0400
    +++ /dev/null 1970-01-01 00:00:00.000000000 +0000
    @@ -1,189 +0,0 @@
    -/*
    - * check TSC synchronization.
    - *
    - * Copyright (C) 2006, Red Hat, Inc., Ingo Molnar
    - *
    - * We check whether all boot CPUs have their TSC's synchronized,
    - * print a warning if not and turn off the TSC clock-source.
    - *
    - * The warp-check is point-to-point between two CPUs, the CPU
    - * initiating the bootup is the 'source CPU', the freshly booting
    - * CPU is the 'target CPU'.
    - *
    - * Only two CPUs may participate - they can enter in any order.
    - * ( The serial nature of the boot logic and the CPU hotplug lock
    - * protects against more than 2 CPUs entering this code. )
    - */
    -#include
    -#include
    -#include
    -#include
    -#include
    -#include
    -
    -/*
    - * Entry/exit counters that make sure that both CPUs
    - * run the measurement code at once:
    - */
    -static __cpuinitdata atomic_t start_count;
    -static __cpuinitdata atomic_t stop_count;
    -
    -/*
    - * We use a raw spinlock in this exceptional case, because
    - * we want to have the fastest, inlined, non-debug version
    - * of a critical section, to be able to prove TSC time-warps:
    - */
    -static __cpuinitdata raw_spinlock_t sync_lock = __RAW_SPIN_LOCK_UNLOCKED;
    -static __cpuinitdata cycles_t last_tsc;
    -static __cpuinitdata cycles_t max_warp;
    -static __cpuinitdata int nr_warps;
    -
    -/*
    - * TSC-warp measurement loop running on both CPUs:
    - */
    -static __cpuinit void check_tsc_warp(void)
    -{
    - cycles_t start, now, prev, end;
    - int i;
    -
    - start = get_cycles();
    - /*
    - * The measurement runs for 20 msecs:
    - */
    - end = start + tsc_khz * 20ULL;
    - now = start;
    -
    - for (i = 0; ; i++) {
    - /*
    - * We take the global lock, measure TSC, save the
    - * previous TSC that was measured (possibly on
    - * another CPU) and update the previous TSC timestamp.
    - */
    - __raw_spin_lock(&sync_lock);
    - prev = last_tsc;
    - now = get_cycles();
    - last_tsc = now;
    - __raw_spin_unlock(&sync_lock);
    -
    - /*
    - * Be nice every now and then (and also check whether
    - * measurement is done [we also insert a 10 million
    - * loops safety exit, so we dont lock up in case the
    - * TSC readout is totally broken]):
    - */
    - if (unlikely(!(i & 7))) {
    - if (now > end || i > 10000000)
    - break;
    - cpu_relax();
    - touch_nmi_watchdog();
    - }
    - /*
    - * Outside the critical section we can now see whether
    - * we saw a time-warp of the TSC going backwards:
    - */
    - if (unlikely(prev > now)) {
    - __raw_spin_lock(&sync_lock);
    - max_warp = max(max_warp, prev - now);
    - nr_warps++;
    - __raw_spin_unlock(&sync_lock);
    - }
    - }
    - WARN(!(now-start),
    - "Warning: zero tsc calibration delta: %Ld [max: %Ld]\n",
    - now-start, end-start);
    -}
    -
    -/*
    - * Source CPU calls into this - it waits for the freshly booted
    - * target CPU to arrive and then starts the measurement:
    - */
    -void __cpuinit check_tsc_sync_source(int cpu)
    -{
    - int cpus = 2;
    -
    - /*
    - * No need to check if we already know that the TSC is not
    - * synchronized:
    - */
    - if (unsynchronized_tsc())
    - return;
    -
    - printk(KERN_INFO "checking TSC synchronization [CPU#%d -> CPU#%d]:",
    - smp_processor_id(), cpu);
    -
    - /*
    - * Reset it - in case this is a second bootup:
    - */
    - atomic_set(&stop_count, 0);
    -
    - /*
    - * Wait for the target to arrive:
    - */
    - while (atomic_read(&start_count) != cpus-1)
    - cpu_relax();
    - /*
    - * Trigger the target to continue into the measurement too:
    - */
    - atomic_inc(&start_count);
    -
    - check_tsc_warp();
    -
    - while (atomic_read(&stop_count) != cpus-1)
    - cpu_relax();
    -
    - if (nr_warps) {
    - printk("\n");
    - printk(KERN_WARNING "Measured %Ld cycles TSC warp between CPUs,"
    - " turning off TSC clock.\n", max_warp);
    - mark_tsc_unstable("check_tsc_sync_source failed");
    - } else {
    - printk(" passed.\n");
    - }
    -
    - /*
    - * Reset it - just in case we boot another CPU later:
    - */
    - atomic_set(&start_count, 0);
    - nr_warps = 0;
    - max_warp = 0;
    - last_tsc = 0;
    -
    - /*
    - * Let the target continue with the bootup:
    - */
    - atomic_inc(&stop_count);
    -}
    -
    -/*
    - * Freshly booted CPUs call into this:
    - */
    -void __cpuinit check_tsc_sync_target(void)
    -{
    - int cpus = 2;
    -
    - if (unsynchronized_tsc())
    - return;
    -
    - /*
    - * Register this CPU's participation and wait for the
    - * source CPU to start the measurement:
    - */
    - atomic_inc(&start_count);
    - while (atomic_read(&start_count) != cpus)
    - cpu_relax();
    -
    - check_tsc_warp();
    -
    - /*
    - * Ok, we are done:
    - */
    - atomic_inc(&stop_count);
    -
    - /*
    - * Wait for the source CPU to print stuff:
    - */
    - while (atomic_read(&stop_count) != cpus)
    - cpu_relax();
    -}
    -#undef NR_LOOPS
    -
    Index: linux.trees.git/arch/x86/Kconfig
    ===================================================================
    --- linux.trees.git.orig/arch/x86/Kconfig 2008-11-07 00:09:33.000000000 -0500
    +++ linux.trees.git/arch/x86/Kconfig 2008-11-07 00:15:13.000000000 -0500
    @@ -169,6 +169,7 @@ config X86_SMP
    bool
    depends on SMP && ((X86_32 && !X86_VOYAGER) || X86_64)
    select USE_GENERIC_SMP_HELPERS
    + select HAVE_UNSYNCHRONIZED_TSC
    default y

    config X86_32_SMP
    @@ -178,6 +179,7 @@ config X86_32_SMP
    config X86_64_SMP
    def_bool y
    depends on X86_64 && SMP
    + select HAVE_UNSYNCHRONIZED_TSC

    config X86_HT
    bool
    Index: linux.trees.git/arch/x86/include/asm/tsc.h
    ===================================================================
    --- linux.trees.git.orig/arch/x86/include/asm/tsc.h 2008-11-07 00:09:33.000000000 -0500
    +++ linux.trees.git/arch/x86/include/asm/tsc.h 2008-11-07 00:15:40.000000000 -0500
    @@ -48,7 +48,7 @@ static __always_inline cycles_t vget_cyc
    extern void tsc_init(void);
    extern void mark_tsc_unstable(char *reason);
    extern int unsynchronized_tsc(void);
    -int check_tsc_unstable(void);
    +extern int check_tsc_unstable(void);

    static inline cycles_t get_cycles_rate(void)
    {
    @@ -71,4 +71,11 @@ extern void check_tsc_sync_target(void);

    extern int notsc_setup(char *);

    +extern int test_tsc_synchronization(void);
    +extern int _tsc_is_sync;
    +static inline int tsc_is_sync(void)
    +{
    + return _tsc_is_sync;
    +}
    +
    #endif /* _ASM_X86_TSC_H */


  11. [RFC patch 18/18] x86 trace clock

    x86 trace clock. Depends on the tsc-sync code to detect whether the timestamp
    counters are synchronized on the machine.

    I am leaving this poorly scalable solution in place for now because it is the
    simplest working solution I have found (the HPET alternative also scales very
    poorly, probably due to bus contention). This should be a good start and lets us
    trace a good number of the machines out there.

    A "Big Fat" (TM) warning is shown on the console when the trace clock is used on
    systems without synchronized TSCs to tell the user to

    - use force_tsc_sync=1
    - use idle=poll
    - disable Powernow or Speedstep

    in order to get accurate and fast timestamps.

    This keeps room for further improvement in a second phase.
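    For completeness, here is a minimal usage sketch (not part of the patch set; the
    my_tracer_* names and last_event_tsc are made up for illustration) showing how a
    tracer is expected to bracket its session with get_trace_clock()/put_trace_clock()
    and read timestamps through the API this patch adds:

    /*
     * Hypothetical tracer glue built only on the API added by this patch.
     */
    #include <linux/types.h>
    #include <asm/trace-clock.h>

    static u64 last_event_tsc;

    static void my_tracer_start(void)
    {
            /* Arms the per-cpu resync timers only when TSCs are not synchronized. */
            get_trace_clock();
    }

    static void my_tracer_record_event(void)
    {
            /* 64-bit timestamp; event order is preserved across CPUs. */
            last_event_tsc = trace_clock_read64();
    }

    static void my_tracer_stop(void)
    {
            /* Drops the refcount; the workaround machinery stops with the last user. */
            put_trace_clock();
    }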

    Signed-off-by: Mathieu Desnoyers
    CC: Thomas Gleixner
    CC: Ingo Molnar
    CC: H. Peter Anvin
    CC: Linus Torvalds
    CC: Andrew Morton
    CC: Peter Zijlstra
    CC: Steven Rostedt
    ---
    arch/x86/Kconfig | 1
    arch/x86/include/asm/trace-clock.h | 73 ++++++++++
    arch/x86/kernel/Makefile | 1
    arch/x86/kernel/trace-clock.c | 248 +++++++++++++++++++++++++++++++++++++
    4 files changed, 323 insertions(+)

    Index: linux.trees.git/arch/x86/Kconfig
    ===================================================================
    --- linux.trees.git.orig/arch/x86/Kconfig 2008-11-07 00:15:13.000000000 -0500
    +++ linux.trees.git/arch/x86/Kconfig 2008-11-07 00:17:17.000000000 -0500
    @@ -30,6 +30,7 @@ config X86
    select HAVE_FTRACE_MCOUNT_RECORD
    select HAVE_DYNAMIC_FTRACE
    select HAVE_FUNCTION_TRACER
    + select HAVE_TRACE_CLOCK
    select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)
    select HAVE_ARCH_KGDB if !X86_VOYAGER
    select HAVE_ARCH_TRACEHOOK
    Index: linux.trees.git/arch/x86/kernel/Makefile
    ===================================================================
    --- linux.trees.git.orig/arch/x86/kernel/Makefile 2008-11-07 00:15:13.000000000 -0500
    +++ linux.trees.git/arch/x86/kernel/Makefile 2008-11-07 00:16:57.000000000 -0500
    @@ -36,6 +36,7 @@ obj-y += bootflag.o e820.o
    obj-y += pci-dma.o quirks.o i8237.o topology.o kdebugfs.o
    obj-y += alternative.o i8253.o pci-nommu.o
    obj-y += tsc.o io_delay.o rtc.o
    +obj-y += trace-clock.o

    obj-$(CONFIG_X86_TRAMPOLINE) += trampoline.o
    obj-y += process.o
    Index: linux.trees.git/arch/x86/kernel/trace-clock.c
    ===================================================================
    --- /dev/null 1970-01-01 00:00:00.000000000 +0000
    +++ linux.trees.git/arch/x86/kernel/trace-clock.c 2008-11-07 00:16:57.000000000 -0500
    @@ -0,0 +1,248 @@
    +/*
    + * arch/x86/kernel/trace-clock.c
    + *
    + * Trace clock for x86.
    + *
    + * Mathieu Desnoyers , October 2008
    + */
    +
    +#include
    +#include
    +#include
    +#include
    +#include
    +#include
    +
    +static cycles_t trace_clock_last_tsc;
    +static DEFINE_PER_CPU(struct timer_list, update_timer);
    +static DEFINE_MUTEX(async_tsc_mutex);
    +static int async_tsc_refcount; /* Number of readers */
    +static int async_tsc_enabled; /* Async TSC enabled on all online CPUs */
    +
    +int _trace_clock_is_sync = 1;
    +EXPORT_SYMBOL_GPL(_trace_clock_is_sync);
    +
    +/*
    + * Called by check_tsc_sync_source from CPU hotplug.
    + */
    +void set_trace_clock_is_sync(int state)
    +{
    + _trace_clock_is_sync = state;
    +}
    +
    +#if BITS_PER_LONG == 64
    +static cycles_t read_last_tsc(void)
    +{
    + return trace_clock_last_tsc;
    +}
    +#else
    +/*
    + * A cmpxchg64 update can happen concurrently. Based on the assumption that
    + * two cmpxchg64 will never update it to the same value (the count always
    + * increases), reading it twice ensures that we read a coherent value with the
    + * same "sequence number".
    + */
    +static cycles_t read_last_tsc(void)
    +{
    + cycles_t val1, val2;
    +
    + val1 = trace_clock_last_tsc;
    + for (;;) {
    + val2 = val1;
    + barrier();
    + val1 = trace_clock_last_tsc;
    + if (likely(val1 == val2))
    + break;
    + }
    + return val1;
    +}
    +#endif
    +
    +/*
    + * Support for architectures with non-sync TSCs.
    + * When the local TSC is found to lag behind the highest TSC count seen, we
    + * increment the TSC count by an amount that should ideally be lower than the
    + * execution time of this routine, in cycles: that is the granularity we aim
    + * for, since we must be able to order the events.
    + */
    +notrace cycles_t trace_clock_async_tsc_read(void)
    +{
    + cycles_t new_tsc, last_tsc;
    +
    + WARN_ON(!async_tsc_refcount || !async_tsc_enabled);
    + rdtsc_barrier();
    + new_tsc = get_cycles();
    + rdtsc_barrier();
    + last_tsc = read_last_tsc();
    + do {
    + if (new_tsc < last_tsc)
    + new_tsc = last_tsc + TRACE_CLOCK_MIN_PROBE_DURATION;
    + /*
    + * If cmpxchg fails with a value higher than the new_tsc, don't
    + * retry : the value has been incremented and the events
    + * happened almost at the same time.
    + * We must retry if cmpxchg fails with a lower value :
    + * it means that we are the CPU with highest frequency and
    + * therefore MUST update the value.
    + */
    + last_tsc = cmpxchg64(&trace_clock_last_tsc, last_tsc, new_tsc);
    + } while (unlikely(last_tsc < new_tsc));
    + return new_tsc;
    +}
    +EXPORT_SYMBOL_GPL(trace_clock_async_tsc_read);
    +
    +static void update_timer_ipi(void *info)
    +{
    + (void)trace_clock_async_tsc_read();
    +}
    +
    +/*
    + * update_timer_fct : - Timer function to resync the clocks
    + * @data: unused
    + *
    + * Fires every jiffy.
    + */
    +static void update_timer_fct(unsigned long data)
    +{
    + (void)trace_clock_async_tsc_read();
    +
    + per_cpu(update_timer, smp_processor_id()).expires = jiffies + 1;
    + add_timer_on(&per_cpu(update_timer, smp_processor_id()),
    + smp_processor_id());
    +}
    +
    +static void enable_trace_clock(int cpu)
    +{
    + init_timer(&per_cpu(update_timer, cpu));
    + per_cpu(update_timer, cpu).function = update_timer_fct;
    + per_cpu(update_timer, cpu).expires = jiffies + 1;
    + smp_call_function_single(cpu, update_timer_ipi, NULL, 1);
    + add_timer_on(&per_cpu(update_timer, cpu), cpu);
    +}
    +
    +static void disable_trace_clock(int cpu)
    +{
    + del_timer_sync(&per_cpu(update_timer, cpu));
    +}
    +
    +/*
    + * hotcpu_callback - CPU hotplug callback
    + * @nb: notifier block
    + * @action: hotplug action to take
    + * @hcpu: CPU number
    + *
    + * Returns the success/failure of the operation. (NOTIFY_OK, NOTIFY_BAD)
    + */
    +static int __cpuinit hotcpu_callback(struct notifier_block *nb,
    + unsigned long action,
    + void *hcpu)
    +{
    + unsigned int hotcpu = (unsigned long)hcpu;
    + int cpu;
    +
    + mutex_lock(&async_tsc_mutex);
    + switch (action) {
    + case CPU_UP_PREPARE:
    + case CPU_UP_PREPARE_FROZEN:
    + break;
    + case CPU_ONLINE:
    + case CPU_ONLINE_FROZEN:
    + /*
    + * trace_clock_is_sync() is updated by set_trace_clock_is_sync()
    + * code, protected by cpu hotplug disable.
    + * It is ok to let the hotplugged CPU read the timebase before
    + * the CPU_ONLINE notification. It's just there to give a
    + * maximum bound to the TSC error.
    + */
    + if (async_tsc_refcount && !trace_clock_is_sync()) {
    + if (!async_tsc_enabled) {
    + async_tsc_enabled = 1;
    + for_each_online_cpu(cpu)
    + enable_trace_clock(cpu);
    + } else {
    + enable_trace_clock(hotcpu);
    + }
    + }
    + break;
    +#ifdef CONFIG_HOTPLUG_CPU
    + case CPU_UP_CANCELED:
    + case CPU_UP_CANCELED_FROZEN:
    + if (!async_tsc_refcount && num_online_cpus() == 1)
    + set_trace_clock_is_sync(1);
    + break;
    + case CPU_DEAD:
    + case CPU_DEAD_FROZEN:
    + /*
    + * We cannot stop the trace clock on other CPUs when readers are
    + * active even if we go back to a synchronized state (1 CPU)
    + * because the CPU left could be the one lagging behind.
    + */
    + if (async_tsc_refcount && async_tsc_enabled)
    + disable_trace_clock(hotcpu);
    + if (!async_tsc_refcount && num_online_cpus() == 1)
    + set_trace_clock_is_sync(1);
    + break;
    +#endif /* CONFIG_HOTPLUG_CPU */
    + }
    + mutex_unlock(&async_tsc_mutex);
    +
    + return NOTIFY_OK;
    +}
    +
    +void get_trace_clock(void)
    +{
    + int cpu;
    +
    + if (!trace_clock_is_sync()) {
    + printk(KERN_WARNING
    + "Trace clock falls back on cache-line bouncing\n"
    + "workaround due to non-synchronized TSCs.\n"
    + "This workaround preserves event order across CPUs.\n"
    + "Please consider disabling Speedstep or PowerNow and\n"
    + "using kernel parameters "
    + "\"force_tsc_sync=1 idle=poll\"\n"
    + "for accurate and fast tracing clock source.\n");
    + }
    +
    + get_online_cpus();
    + mutex_lock(&async_tsc_mutex);
    + if (async_tsc_refcount++ || trace_clock_is_sync())
    + goto end;
    +
    + async_tsc_enabled = 1;
    + for_each_online_cpu(cpu)
    + enable_trace_clock(cpu);
    +end:
    + mutex_unlock(&async_tsc_mutex);
    + put_online_cpus();
    +}
    +EXPORT_SYMBOL_GPL(get_trace_clock);
    +
    +void put_trace_clock(void)
    +{
    + int cpu;
    +
    + get_online_cpus();
    + mutex_lock(&async_tsc_mutex);
    + WARN_ON(async_tsc_refcount <= 0);
    + if (async_tsc_refcount != 1 || !async_tsc_enabled)
    + goto end;
    +
    + for_each_online_cpu(cpu)
    + disable_trace_clock(cpu);
    + async_tsc_enabled = 0;
    +end:
    + async_tsc_refcount--;
    + if (!async_tsc_refcount && num_online_cpus() == 1)
    + set_trace_clock_is_sync(1);
    + mutex_unlock(&async_tsc_mutex);
    + put_online_cpus();
    +}
    +EXPORT_SYMBOL_GPL(put_trace_clock);
    +
    +static __init int init_unsync_trace_clock(void)
    +{
    + hotcpu_notifier(hotcpu_callback, 4);
    + return 0;
    +}
    +early_initcall(init_unsync_trace_clock);
    Index: linux.trees.git/arch/x86/include/asm/trace-clock.h
    ===================================================================
    --- /dev/null 1970-01-01 00:00:00.000000000 +0000
    +++ linux.trees.git/arch/x86/include/asm/trace-clock.h 2008-11-07 00:16:57.000000000 -0500
    @@ -0,0 +1,73 @@
    +#ifndef _ASM_X86_TRACE_CLOCK_H
    +#define _ASM_X86_TRACE_CLOCK_H
    +
    +/*
    + * linux/arch/x86/include/asm/trace-clock.h
    + *
    + * Copyright (C) 2005,2006,2008
    + * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
    + *
    + * Trace clock definitions for x86.
    + */
    +
    +#include
    +#include
    +#include
    +#include
    +
    +/* Minimum duration of a probe, in cycles */
    +#define TRACE_CLOCK_MIN_PROBE_DURATION 200
    +
    +extern cycles_t trace_clock_async_tsc_read(void);
    +
    +extern int _trace_clock_is_sync;
    +static inline int trace_clock_is_sync(void)
    +{
    + return _trace_clock_is_sync;
    +}
    +
    +static inline u32 trace_clock_read32(void)
    +{
    + u32 cycles;
    +
    + if (likely(trace_clock_is_sync())) {
    + get_cycles_barrier();
    + cycles = (u32)get_cycles(); /* only need the 32 LSB */
    + get_cycles_barrier();
    + } else
    + cycles = (u32)trace_clock_async_tsc_read();
    + return cycles;
    +}
    +
    +static inline u64 trace_clock_read64(void)
    +{
    + u64 cycles;
    +
    + if (likely(trace_clock_is_sync())) {
    + get_cycles_barrier();
    + cycles = get_cycles();
    + get_cycles_barrier();
    + } else
    + cycles = trace_clock_async_tsc_read();
    + return cycles;
    +}
    +
    +static inline void trace_clock_add_timestamp(unsigned long ticks)
    +{ }
    +
    +static inline unsigned int trace_clock_frequency(void)
    +{
    + return cpu_khz;
    +}
    +
    +static inline u32 trace_clock_freq_scale(void)
    +{
    + return 1000;
    +}
    +
    +extern void get_trace_clock(void);
    +extern void put_trace_clock(void);
    +
    +extern void set_trace_clock_is_sync(int state);
    +
    +#endif /* _ASM_X86_TRACE_CLOCK_H */

    --
    Mathieu Desnoyers
    OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. [RFC patch 14/18] MIPS use tsc_sync.c

    tsc_sync.c is now available to test whether the TSC is synchronized across cores.
    Since I currently don't have access to a MIPS board myself, help with hooking this
    into the CPU-online path and with testing the implementation would be welcome.

    Signed-off-by: Mathieu Desnoyers
    CC: Ralf Baechle
    CC: Peter Zijlstra
    ---
    arch/mips/include/asm/timex.h | 26 ++++++++++++++++++++++++++
    arch/mips/kernel/smp.c | 1 +
    2 files changed, 27 insertions(+)

    Index: linux.trees.git/arch/mips/kernel/smp.c
    ===================================================================
    --- linux.trees.git.orig/arch/mips/kernel/smp.c 2008-11-07 00:06:06.000000000 -0500
    +++ linux.trees.git/arch/mips/kernel/smp.c 2008-11-07 00:16:05.000000000 -0500
    @@ -178,6 +178,7 @@ void __init smp_cpus_done(unsigned int m
    {
    mp_ops->cpus_done();
    synchronise_count_master();
    + test_tsc_synchronization();
    }

    /* called from main before smp_init() */
    Index: linux.trees.git/arch/mips/include/asm/timex.h
    ===================================================================
    --- linux.trees.git.orig/arch/mips/include/asm/timex.h 2008-11-07 00:10:10.000000000 -0500
    +++ linux.trees.git/arch/mips/include/asm/timex.h 2008-11-07 00:16:05.000000000 -0500
    @@ -56,13 +56,39 @@ static inline cycles_t get_cycles_rate(v
    {
    return CLOCK_TICK_RATE;
    }
    +
    +extern int test_tsc_synchronization(void);
    +extern int _tsc_is_sync;
    +static inline int tsc_is_sync(void)
    +{
    + return _tsc_is_sync;
    +}
    #else
    static inline cycles_t get_cycles(void)
    {
    return 0;
    }
    +static inline int test_tsc_synchronization(void)
    +{
    + return 0;
    +}
    +static inline int tsc_is_sync(void)
    +{
    + return 0;
    +}
    #endif

    +#define DELAY_INTERRUPT 100
    +/*
    + * Only updates 32 LSB.
    + */
    +static inline void write_tsc(u32 val1, u32 val2)
    +{
    + write_c0_count(val1);
    + /* Arrange for an interrupt in a short while */
    + write_c0_compare(read_c0_count() + DELAY_INTERRUPT);
    +}
    +
    #endif /* __KERNEL__ */

    #endif /* _ASM_TIMEX_H */

    --
    Mathieu Desnoyers
    OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [RFC patch 10/18] Sparc64 : Trace clock

    From: Mathieu Desnoyers
    Date: Fri, 07 Nov 2008 00:23:46 -0500

    > Implement sparc64 trace clock.
    >
    > Signed-off-by: Mathieu Desnoyers
    > CC: David Miller
    > CC: linux-arch@vger.kernel.org


    Acked-by: David S. Miller
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. [RFC patch 01/18] get_cycles() : kconfig HAVE_GET_CYCLES

    Create a new "HAVE_GET_CYCLES" architecture option to specify which
    architectures provide 64-bits TSC counters readable with get_cycles(). It's
    principally useful to only enable high-precision tracing code only on such
    architectures and don't even bother building it on architectures which lack such
    support.

    It also requires architectures to provide get_cycles_barrier() and
    get_cycles_rate().

    I mainly use it for the "priority-sifting rwlock" latency tracing code, which
    traces worst-case latency induced by the locking. It also provides the basic
    changes needed for the LTTng timestamping infrastructure.
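    To make the expectations concrete, here is an illustrative sketch of what an
    architecture selecting HAVE_GET_CYCLES would provide in its asm/timex.h. This is
    entirely hypothetical: read_hw_cycle_counter() and MY_ARCH_CYCLES_PER_SEC are
    placeholders, not real kernel symbols, and sync_core() stands in for whatever
    serializing barrier the ISA offers.

    /* Illustrative asm/timex.h fragment for a hypothetical architecture. */
    typedef unsigned long long cycles_t;

    static inline cycles_t get_cycles(void)
    {
            /* Placeholder: read of a full 64-bit hardware cycle counter. */
            return read_hw_cycle_counter();
    }

    static inline void get_cycles_barrier(void)
    {
            /* Instruction synchronization barrier, if the ISA requires one. */
            sync_core();
    }

    static inline cycles_t get_cycles_rate(void)
    {
            /*
             * Counter rate in HZ; return 0 if the counters are not synchronized
             * across CPUs or if their frequency may vary with frequency scaling.
             */
            return MY_ARCH_CYCLES_PER_SEC;
    }

    The architecture's Kconfig entry would additionally select HAVE_GET_CYCLES.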

    Signed-off-by: Mathieu Desnoyers
    CC: David Miller
    CC: Linus Torvalds
    CC: Andrew Morton
    CC: Ingo Molnar
    CC: Peter Zijlstra
    CC: Thomas Gleixner
    CC: Steven Rostedt
    CC: linux-arch@vger.kernel.org
    ---
    init/Kconfig | 10 ++++++++++
    1 file changed, 10 insertions(+)

    Index: linux.trees.git/init/Kconfig
    ===================================================================
    --- linux.trees.git.orig/init/Kconfig 2008-11-07 00:06:07.000000000 -0500
    +++ linux.trees.git/init/Kconfig 2008-11-07 00:07:23.000000000 -0500
    @@ -330,6 +330,16 @@ config CPUSETS
    config HAVE_UNSTABLE_SCHED_CLOCK
    bool

    +#
    +# Architectures with a 64-bit get_cycles() should select this.
    +# They should also define
    +# get_cycles_barrier() : instruction synchronization barrier if required
    +# get_cycles_rate() : cycle counter rate, in HZ. If 0, TSCs are not synchronized
    +# across CPUs or their frequency may vary due to frequency scaling.
    +#
    +config HAVE_GET_CYCLES
    + def_bool n
    +
    config GROUP_SCHED
    bool "Group CPU scheduler"
    depends on EXPERIMENTAL

    --
    Mathieu Desnoyers
    OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [RFC patch 07/18] Trace clock core

    On Fri, 07 Nov 2008 00:23:43 -0500 Mathieu Desnoyers wrote:

    > 32 to 64 bits clock extension. Extracts 64 bits tsc from a [1..32]
    > bits counter, kept up to date by periodical timer interrupt. Lockless.
    >
    > ...
    >
    > +#include /* FIX for m68k local_irq_enable in on_each_cpu */


    What's going on here?

    > +struct synthetic_tsc_struct {
    > + union {
    > + u64 val;
    > + struct {
    > +#ifdef __BIG_ENDIAN
    > + u32 msb;
    > + u32 lsb;
    > +#else
    > + u32 lsb;
    > + u32 msb;
    > +#endif


    One would expect an identifier called "msb" to mean "most significant
    bit" or possible "most significant byte".

    Maybe ms32 and ls32?

    > + } sel;
    > + } tsc[2];
    > + unsigned int index; /* Index of the current synth. tsc. */
    > +};
    > +
    > +static DEFINE_PER_CPU(struct synthetic_tsc_struct, synthetic_tsc);
    > +
    > +/* Called from IPI : either in interrupt or process context */


    IPI handlers should always be called with local interrupts disabled.

    > +static void update_synthetic_tsc(void)
    > +{
    > + struct synthetic_tsc_struct *cpu_synth;
    > + u32 tsc;
    > +
    > + preempt_disable();


    which would make this unnecessary.

    > + cpu_synth = &per_cpu(synthetic_tsc, smp_processor_id());
    > + tsc = trace_clock_read32(); /* Hardware clocksource read */
    > +
    > + if (tsc < HW_LSB(cpu_synth->tsc[cpu_synth->index].sel.lsb)) {
    > + unsigned int new_index = 1 - cpu_synth->index; /* 0 <-> 1 */
    > + /*
    > + * Overflow
    > + * Non atomic update of the non current synthetic TSC, followed
    > + * by an atomic index change. There is no write concurrency,
    > + * so the index read/write does not need to be atomic.
    > + */
    > + cpu_synth->tsc[new_index].val =
    > + (SW_MSB(cpu_synth->tsc[cpu_synth->index].val)
    > + | (u64)tsc) + (1ULL << HW_BITS);
    > + cpu_synth->index = new_index; /* atomic change of index */
    > + } else {
    > + /*
    > + * No overflow : We know that the only bits changed are
    > + * contained in the 32 LSBs, which can be written to atomically.
    > + */
    > + cpu_synth->tsc[cpu_synth->index].sel.lsb =
    > + SW_MSB(cpu_synth->tsc[cpu_synth->index].sel.lsb) | tsc;
    > + }
    > + preempt_enable();
    > +}


    Is there something we should be fixing in m68k?

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. [RFC patch 08/18] cnt32_to_63 should use smp_rmb()

    Assume the time source is a global clock which ensures that time will never
    *ever* go backward. Use an smp_rmb() to make sure __m_cnt_hi is read before
    the cnt_lo value.

    Remove the now-unnecessary volatile qualifier. The barrier takes care of memory
    ordering.
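
    For illustration only, the same algorithm written as a function, showing the read
    ordering this patch enforces. The names example_cnt32_to_63, example_cnt_hi and
    read_cnt_lo are made up: the real code stays a macro so that each call site gets
    its own __m_cnt_hi, and read_cnt_lo stands in for the mmio counter read passed as
    the cnt_lo argument. This is a kernel-context sketch; smp_rmb(), unlikely() and
    the u32/u64/s32 types come from the usual kernel headers.

    /* Sketch of the 32->63 bit extension with the read ordering this patch enforces. */
    static u32 example_cnt_hi;      /* one instance per call site in the real macro */

    static u64 example_cnt32_to_63(u32 (*read_cnt_lo)(void))
    {
            u32 hi, lo;

            hi = example_cnt_hi;
            smp_rmb();              /* read example_cnt_hi before the hardware low part */
            lo = read_cnt_lo();
            if (unlikely((s32)(hi ^ lo) < 0)) {
                    /* The low part wrapped since hi was last updated: advance hi. */
                    hi = (hi ^ 0x80000000) + (hi >> 31);
                    example_cnt_hi = hi;
            }
            /* Bit 63 may be set; callers clear it, as documented in cnt32_to_63.h. */
            return ((u64)hi << 32) | lo;
    }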

    Mathieu:
    > Yup, you are right. However, the case where one CPU sees the clock source
    > a little bit off-sync (late) still poses a problem. Example follows :
    >
    > CPU                 A                                B
    >                     read __m_cnt_hi (0x80000000)
    >                     read hw cnt low (0x00000001)
    >                     (wrap detected :
    >                      (s32)(0x80000000 ^ 0x1) < 0)
    >                     write __m_cnt_hi = 0x00000001
    >                     return 0x0000000100000001
    >                                                    read __m_cnt_hi (0x00000001)
    >                                                    (late) read hw cnt low (0xFFFFFFFA)
    >                                                    (wrap detected :
    >                                                     (s32)(0x00000001 ^ 0xFFFFFFFA) < 0)
    >                                                    write __m_cnt_hi = 0x80000001
    >                                                    return 0x80000001FFFFFFFA
    >                                                    (time jumps)
    > A similar situation can be generated by out-of-order hi/low bits reads.


    Nicolas:
    This, of course, should and can be prevented. No big deal.

    Signed-off-by: Mathieu Desnoyers
    CC: Nicolas Pitre
    CC: Ralf Baechle
    CC: benh@kernel.crashing.org
    CC: paulus@samba.org
    CC: David Miller
    CC: Linus Torvalds
    CC: Andrew Morton
    CC: Ingo Molnar
    CC: Peter Zijlstra
    CC: Thomas Gleixner
    CC: Steven Rostedt
    CC: linux-arch@vger.kernel.org
    ---
    include/linux/cnt32_to_63.h | 7 ++++++-
    1 file changed, 6 insertions(+), 1 deletion(-)

    Index: linux-2.6-lttng/include/linux/cnt32_to_63.h
    ===================================================================
    --- linux-2.6-lttng.orig/include/linux/cnt32_to_63.h 2008-11-04 01:39:03.000000000 -0500
    +++ linux-2.6-lttng/include/linux/cnt32_to_63.h 2008-11-04 01:48:50.000000000 -0500
    @@ -65,12 +65,17 @@ union cnt32_to_63 {
    * implicitly by making the multiplier even, therefore saving on a runtime
    * clear-bit instruction. Otherwise caller must remember to clear the top
    * bit explicitly.
    + *
    + * Assume the time source is a global clock read from memory mapped I/O which
    + * ensures that time will never *ever* go backward. Use an smp_rmb() to make
    + * sure the __m_cnt_hi value is read before the cnt_lo mmio read.
    */
    #define cnt32_to_63(cnt_lo) \
    ({ \
    - static volatile u32 __m_cnt_hi; \
    + static u32 __m_cnt_hi; \
    union cnt32_to_63 __x; \
    __x.hi = __m_cnt_hi; \
    + smp_rmb(); /* read __m_cnt_hi before mmio cnt_lo */ \
    __x.lo = (cnt_lo); \
    if (unlikely((s32)(__x.hi ^ __x.lo) < 0)) \
    __m_cnt_hi = __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31); \

    --
    Mathieu Desnoyers
    OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()

    On Fri, 07 Nov 2008 00:23:44 -0500 Mathieu Desnoyers wrote:

    > #define cnt32_to_63(cnt_lo) \
    > ({ \
    > - static volatile u32 __m_cnt_hi; \
    > + static u32 __m_cnt_hi; \
    > union cnt32_to_63 __x; \
    > __x.hi = __m_cnt_hi; \
    > + smp_rmb(); /* read __m_cnt_hi before mmio cnt_lo */ \
    > __x.lo = (cnt_lo); \
    > if (unlikely((s32)(__x.hi ^ __x.lo) < 0)) \
    > __m_cnt_hi = __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31); \


    Oh dear. We have a macro which secretly maintains
    per-instantiation-site global state? And doesn't even implement locking
    to protect that state?

    I mean, the darned thing is called from sched_clock(), which can be
    concurrently called on separate CPUs and which can be called from
    interrupt context (with an arbitrary nesting level!) while it was running
    in process context.

    Who let that thing into Linux?


    Look:

    /*
    * Caller must provide locking to protect *caller_state
    */
    u32 cnt32_to_63(u32 *caller_state, u32 cnt_lo);

    But even that looks pretty crappy.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [RFC patch 07/18] Trace clock core

    * Andrew Morton (akpm@linux-foundation.org) wrote:
    > On Fri, 07 Nov 2008 00:23:43 -0500 Mathieu Desnoyers wrote:
    >
    > > 32 to 64 bits clock extension. Extracts 64 bits tsc from a [1..32]
    > > bits counter, kept up to date by periodical timer interrupt. Lockless.
    > >
    > > ...
    > >
    > > +#include /* FIX for m68k local_irq_enable in on_each_cpu */

    >
    > What's going on here?
    >


    Hi Andrew,

    When I wrote this comment (kernel ~2.6.25), the situation was (and it
    still looks valid, although I haven't tried to fix it since then) :

    linux/smp.h :
    on_each_cpu() (!SMP)
    local_irq_enable()

    but, on m68k :

    asm-m68k/system.h defines this ugly macro :

    /* interrupt control.. */
    #if 0
    #define local_irq_enable() asm volatile ("andiw %0,%%sr": : "i" (ALLOWINT) : "memory")
    #else
    #include
    #define local_irq_enable() ({ \
    if (MACH_IS_Q40 || !hardirq_count()) \
    asm volatile ("andiw %0,%%sr": : "i" (ALLOWINT) : "memory"); \
    })
    #endif

    Which uses !hardirq_count(), which is defined by sched.h. However, I did
    try in the past to include sched.h in asm-m68k/system.h, but it ended up
    doing a recursive inclusion.

    > > +struct synthetic_tsc_struct {
    > > + union {
    > > + u64 val;
    > > + struct {
    > > +#ifdef __BIG_ENDIAN
    > > + u32 msb;
    > > + u32 lsb;
    > > +#else
    > > + u32 lsb;
    > > + u32 msb;
    > > +#endif

    >
    > One would expect an identifier called "msb" to mean "most significant
    > bit" or possible "most significant byte".
    >
    > Maybe ms32 and ls32?
    >


    Yep, seems clearer.

    > > + } sel;
    > > + } tsc[2];
    > > + unsigned int index; /* Index of the current synth. tsc. */
    > > +};
    > > +
    > > +static DEFINE_PER_CPU(struct synthetic_tsc_struct, synthetic_tsc);
    > > +
    > > +/* Called from IPI : either in interrupt or process context */

    >
    > IPI handlers should always be called with local interrupts disabled.
    >
    > > +static void update_synthetic_tsc(void)
    > > +{
    > > + struct synthetic_tsc_struct *cpu_synth;
    > > + u32 tsc;
    > > +
    > > + preempt_disable();

    >
    > which would make this unnecessary.
    >


    Ah, yes, right.

    > > + cpu_synth = &per_cpu(synthetic_tsc, smp_processor_id());
    > > + tsc = trace_clock_read32(); /* Hardware clocksource read */
    > > +
    > > + if (tsc < HW_LSB(cpu_synth->tsc[cpu_synth->index].sel.lsb)) {
    > > + unsigned int new_index = 1 - cpu_synth->index; /* 0 <-> 1 */
    > > + /*
    > > + * Overflow
    > > + * Non atomic update of the non current synthetic TSC, followed
    > > + * by an atomic index change. There is no write concurrency,
    > > + * so the index read/write does not need to be atomic.
    > > + */
    > > + cpu_synth->tsc[new_index].val =
    > > + (SW_MSB(cpu_synth->tsc[cpu_synth->index].val)
    > > + | (u64)tsc) + (1ULL << HW_BITS);
    > > + cpu_synth->index = new_index; /* atomic change of index */
    > > + } else {
    > > + /*
    > > + * No overflow : We know that the only bits changed are
    > > + * contained in the 32 LSBs, which can be written to atomically.
    > > + */
    > > + cpu_synth->tsc[cpu_synth->index].sel.lsb =
    > > + SW_MSB(cpu_synth->tsc[cpu_synth->index].sel.lsb) | tsc;
    > > + }
    > > + preempt_enable();
    > > +}

    >
    > Is there something we should be fixing in m68k?
    >


    Yes, but I fear it's going to go deep into include hell :-(

    Mathieu

    --
    Mathieu Desnoyers
    OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: [RFC patch 07/18] Trace clock core

    On Fri, 7 Nov 2008 01:16:43 -0500 Mathieu Desnoyers wrote:

    > > Is there something we should be fixing in m68k?
    > >

    >
    > Yes, but I fear it's going to go deep into include hell :-(


    Oh, OK. I thought that the comment meant that m68k's on_each_cpu()
    behaves differently at runtime from other architectures (and wrongly).

    If it's just some compile-time #include snafu then that's far less
    of a concern.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()

    On Thu, 6 Nov 2008, Andrew Morton wrote:

    > On Fri, 07 Nov 2008 00:23:44 -0500 Mathieu Desnoyers wrote:
    >
    > > #define cnt32_to_63(cnt_lo) \
    > > ({ \
    > > - static volatile u32 __m_cnt_hi; \
    > > + static u32 __m_cnt_hi; \
    > > union cnt32_to_63 __x; \
    > > __x.hi = __m_cnt_hi; \
    > > + smp_rmb(); /* read __m_cnt_hi before mmio cnt_lo */ \
    > > __x.lo = (cnt_lo); \
    > > if (unlikely((s32)(__x.hi ^ __x.lo) < 0)) \
    > > __m_cnt_hi = __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31); \

    >
    > Oh dear. We have a macro which secretly maintains
    > per-instantiation-site global state? And doesn't even implement locking
    > to protect that state?


    Please do me a favor and look for those very infrequent posts I've sent
    to lkml lately. I've explained it all at least 3 times so far, to Peter
    Zijlstra, to David Howells, to Mathieu Desnoyers, and now to you.

    > I mean, the darned thing is called from sched_clock(), which can be
    > concurrently called on separate CPUs and which can be called from
    > interrupt context (with an arbitrary nesting level!) while it was running
    > in process context.


    Yes! And this is so on *purpose*. Please take some time to read the
    comment that goes along with it, and if you're still not convinced then
    look for those explanation emails I've already posted.

    > /*
    > * Caller must provide locking to protect *caller_state
    > */


    NO! This is meant to be LOCK FREE!


    Nicolas
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/
