[RFC patch 00/18] Trace Clock v2 - Kernel
This is a discussion on [RFC patch 00/18] Trace Clock v2 - Kernel ; On Fri, 07 Nov 2008 03:12:18 -0500 (EST) Nicolas Pitre wrote:
> On Thu, 6 Nov 2008, Andrew Morton wrote:
>
> > On Fri, 07 Nov 2008 00:23:44 -0500 Mathieu Desnoyers wrote:
> >
> > > #define cnt32_to_63(cnt_lo) ...
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
On Fri, 07 Nov 2008 03:12:18 -0500 (EST) Nicolas Pitre wrote:
> On Thu, 6 Nov 2008, Andrew Morton wrote:
>
> > On Fri, 07 Nov 2008 00:23:44 -0500 Mathieu Desnoyers wrote:
> >
> > > #define cnt32_to_63(cnt_lo) \
> > > ({ \
> > > - static volatile u32 __m_cnt_hi; \
> > > + static u32 __m_cnt_hi; \
> > > union cnt32_to_63 __x; \
> > > __x.hi = __m_cnt_hi; \
> > > + smp_rmb(); /* read __m_cnt_hi before mmio cnt_lo */ \
> > > __x.lo = (cnt_lo); \
> > > if (unlikely((s32)(__x.hi ^ __x.lo) < 0)) \
> > > __m_cnt_hi = __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31); \
> >
> > Oh dear. We have a macro which secretly maintains
> > per-instantiation-site global state? And doesn't even implement locking
> > to protect that state?
>
> Please do me a favor and look for those very unfrequent posts I've sent
> to lkml lately.
No. Reading the kernel code (and, at a pinch, the changelogs) should
suffice. If it does not suffice, the kernel code is in error.
> I've explained it all at least 3 times so far, to Peter
> Zijlstra, to David Howells, to Mathieu Desnoyers, and now to you.
If four heads have exploded (thus far) over one piece of code, perhaps
the blame doesn't lie with those heads.
> > I mean, the darned thing is called from sched_clock(), which can be
> > concurrently called on separate CPUs and which can be called from
> > interrupt context (with an arbitrary nesting level!) while it was running
> > in process context.
>
> Yes! And this is so on *purpose*. Please take some time to read the
> comment that goes along with it,
OK.
> and if you're still not convinced then
> look for those explanation emails I've already posted.
No.
> > /*
> > * Caller must provide locking to protect *caller_state
> > */
>
> NO! This is meant to be LOCK FREE!
We have a macro which must only have a single usage in any particular
kernel build (and nothing to detect a violation of that).
We have a macro which secretly hides internal state, on a per-expansion-site
basis, no less.
It apparently tries to avoid races via ordering tricks, as long
as it is called with sufficient frequency. But nothing guarantees
that it _is_ called sufficiently frequently?
There is absolutely no reason why the first two of these quite bad things
needed to be done. In fact there is no reason why it needed to be
implemented as a macro at all.
As I said in the text which you deleted and ignored, this would be
better if it was implemented as a C function which requires that the
caller explicitly pass in a reference to the state storage.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
Mathieu Desnoyers wrote:
> Assume the time source is a global clock which insures that time will never
> *ever* go backward. Use a smp_rmb() to make sure the cnt_lo value is read before
> __m_cnt_hi.
If you have an smp_rmb(), then don't you need an smp_wmb()/smp_mb() to match
it to make it work? And is your assumption valid that smp_rmb() will affect
memory vs the I/O access to read the clock? There's no requirement that
cnt_lo will have been read from an MMIO location at all.
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
Nicolas Pitre wrote:
> > I mean, the darned thing is called from sched_clock(), which can be
> > concurrently called on separate CPUs and which can be called from
> > interrupt context (with an arbitrary nesting level!) while it was running
> > in process context.
>
> Yes! And this is so on *purpose*. Please take some time to read the
> comment that goes along with it, and if you're still not convinced then
> look for those explanation emails I've already posted.
I agree with Nicolas on this. It's abominably clever, but I think he's right.
The one place I remain unconvinced is over the issue of preemption of a process
that is in the middle of cnt32_to_63(), where if the preempted process is
asleep for long enough, I think it can wind time backwards when it resumes, but
that's not a problem for the one place I want to use it (sched_clock()) because
that is (almost) always called with preemption disabled in one way or another.
The one place it isn't is a debugging case that I'm not too worried about.
> > /*
> > * Caller must provide locking to protect *caller_state
> > */
>
> NO! This is meant to be LOCK FREE!
Absolutely.
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
Andrew Morton wrote:
> I mean, the darned thing is called from sched_clock(), which can be
> concurrently called on separate CPUs and which can be called from
> interrupt context (with an arbitrary nesting level!) while it was running
> in process context.
>
> Who let that thing into Linux?
Having crawled all over it, and argued with Nicolas and Panasonic about it, I
think it's safe in sched_clock(), provided sched_clock() never gets preempted -
which appears to be the case.
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
Andrew Morton wrote:
> We have a macro which must only have a single usage in any particular
> kernel build (and nothing to detect a violation of that).
That's not true. It's a macro containing a _static_ local variable, therefore
the macro may be used multiple times, and each time it's used the compiler
will allocate a new piece of storage.
> It apparently tries to avoid races via ordering tricks, as long
> as it is called with sufficient frequency. But nothing guarantees
> that it _is_ called sufficiently frequently?
The comment attached to it clearly states this restriction. Therefore the
caller must guarantee it. That is something Mathieu's code and my code must
deal with, not Nicolas's.
> There is absolutely no reason why the first two of these quite bad things
> needed to be done. In fact there is no reason why it needed to be
> implemented as a macro at all.
There's a very good reason to implement it as either a macro or an inline
function: it's faster. Moving the whole thing out of line would impose an
additional function call overhead - with a 64-bit return value on 32-bit
platforms. For my case - sched_clock() - I'm willing to burn a bit of extra
space to get the extra speed.
> As I said in the text which you deleted and ignored, this would be
> better if it was implemented as a C function which requires that the
> caller explicitly pass in a reference to the state storage.
I'd be quite happy if it was:
static inline u64 cnt32_to_63(u32 cnt_lo, u32 *__m_cnt_hi)
{
union cnt32_to_63 __x;
__x.hi = *__m_cnt_hi;
__x.lo = cnt_lo;
if (unlikely((s32)(__x.hi ^ __x.lo) < 0))
*__m_cnt_hi =
__x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31);
return __x.val;
}
I imagine this would compile pretty much the same as the macro. I think it
would make it more obvious about the independence of the storage.
Alternatively, perhaps Nicolas just needs to mention this in the comment more
clearly.
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 17/18] MIPS : Trace clock
On Fri, 2008-11-07 at 00:23 -0500, Mathieu Desnoyers wrote:
> Note for Peter Zijlstra :
> You should probably have a look at lockdep.c raw_spinlock_t lockdep_lock usage.
> I suspect it may be used with preemption enabled in graph_lock(). (not sure
> though, but it's worth double-checking.
Are you worried about the graph_lock() instance in
lookup_chain_cache() ?
That is the locking for validate_chain,
__lock_acquire()
validate_chain()
look_up_chain_cache()
graph_lock()
check_prevs_add()
check_prev_add()
graph_unlock()
graph_lock()
graph_unlock()
which is all done without modifying IRQ state.
However, __lock_acquire() is only called with IRQs disabled:
lock_acquire()
raw_local_irq_save()
__lock_acquire()
lock_release()
raw_local_irq_save()
__lock_release()
lock_release_nested()
__lock_acquire()
lock_release_non_nested()
__lock_acquire()
lock_set_subclass()
raw_local_irq_save()
__lock_set_subclass()
__lock_acquire()
So I think we're good.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 04/18] get_cycles() : powerpc64 HAVE_GET_CYCLES
On Fri, Nov 07, 2008 at 12:23:40AM -0500, Mathieu Desnoyers wrote:
>This patch selects HAVE_GET_CYCLES and makes sure get_cycles_barrier() and
>get_cycles_rate() are implemented.
>
>Signed-off-by: Mathieu Desnoyers
>CC: benh@kernel.crashing.org
>CC: paulus@samba.org
>CC: David Miller
>CC: Linus Torvalds
>CC: Andrew Morton
>CC: Ingo Molnar
>CC: Peter Zijlstra
>CC: Thomas Gleixner
>CC: Steven Rostedt
>CC: linux-arch@vger.kernel.org
>---
> arch/powerpc/Kconfig | 1 +
> arch/powerpc/include/asm/timex.h | 11 +++++++++++
> 2 files changed, 12 insertions(+)
>
>Index: linux.trees.git/arch/powerpc/Kconfig
>================================================== =================
>--- linux.trees.git.orig/arch/powerpc/Kconfig 2008-11-07 00:09:44.000000000 -0500
>+++ linux.trees.git/arch/powerpc/Kconfig 2008-11-07 00:09:46.000000000 -0500
>@@ -121,6 +121,7 @@ config PPC
> select HAVE_DMA_ATTRS if PPC64
> select USE_GENERIC_SMP_HELPERS if SMP
> select HAVE_OPROFILE
>+ select HAVE_GET_CYCLES if PPC64
So maybe it's just me because it's Friday and I'm on vacation, but I don't
see anything overly specific to ppc64 here. In fact, you use get_cycles_rate
for all of powerpc in a later patch in the series.
Is there something special about HAVE_GET_CYCLES that I'm missing that would
make it only apply to ppc64 and not ppc32?
josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
On Fri, 7 Nov 2008, David Howells wrote:
> Andrew Morton wrote:
>
> > As I said in the text which you deleted and ignored, this would be
> > better if it was implemented as a C function which requires that the
> > caller explicitly pass in a reference to the state storage.
The whole purpose of that thing is to be utterly fast and lightweight.
Having an out of line C call would trash the major advantage of this
code.
> I'd be quite happy if it was:
>
> static inline u64 cnt32_to_63(u32 cnt_lo, u32 *__m_cnt_hi)
> {
> union cnt32_to_63 __x;
> __x.hi = *__m_cnt_hi;
> __x.lo = cnt_lo;
> if (unlikely((s32)(__x.hi ^ __x.lo) < 0))
> *__m_cnt_hi =
> __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31);
> return __x.val;
> }
>
> I imagine this would compile pretty much the same as the macro.
Depends. As everybody has noticed now, the read ordering is important,
and if gcc decides to not inline this for whatever reason then the
ordering is lost. This is why this was a macro to start with.
> I think it
> would make it more obvious about the independence of the storage.
I don't think having the associated storage be outside the macro make
any sense either. There is simply no valid reason for having it shared
between multiple invokations of the macro, as well as making its
interface more complex for no gain.
> Alternatively, perhaps Nicolas just needs to mention this in the comment more
> clearly.
I wrote that code so to me it is cristal clear already. Any suggestions
as to how this could be improved?
Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
On Fri, 07 Nov 2008 10:01:01 -0500 (EST) Nicolas Pitre wrote:
> On Fri, 7 Nov 2008, David Howells wrote:
>
> > Andrew Morton wrote:
> >
> > > As I said in the text which you deleted and ignored, this would be
> > > better if it was implemented as a C function which requires that the
> > > caller explicitly pass in a reference to the state storage.
>
> The whole purpose of that thing is to be utterly fast and lightweight.
Well I'm glad it wasn't designed to demonstrate tastefulness.
btw, do you know how damned irritating and frustrating it is for a code
reviewer to have his comments deliberately ignored and deleted in
replies?
> Having an out of line C call would trash the major advantage of this
> code.
Not really.
> > I'd be quite happy if it was:
> >
> > static inline u64 cnt32_to_63(u32 cnt_lo, u32 *__m_cnt_hi)
> > {
> > union cnt32_to_63 __x;
> > __x.hi = *__m_cnt_hi;
> > __x.lo = cnt_lo;
> > if (unlikely((s32)(__x.hi ^ __x.lo) < 0))
> > *__m_cnt_hi =
> > __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31);
> > return __x.val;
> > }
> >
> > I imagine this would compile pretty much the same as the macro.
>
> Depends. As everybody has noticed now, the read ordering is important,
> and if gcc decides to not inline this
If gcc did that then it would need to generate static instances of
inlined functions within individual compilation units. It would be a
disaster for the kernel. For a start, functions which are "inlined" in kernel
modules wouldn't be able to access their static storage and modprobing
them would fail.
> for whatever reason then the
> ordering is lost.
Uninlining won't affect any ordering I can see.
> This is why this was a macro to start with.
>
> > I think it
> > would make it more obvious about the independence of the storage.
>
> I don't think having the associated storage be outside the macro make
> any sense either. There is simply no valid reason for having it shared
> between multiple invokations of the macro, as well as making its
> interface more complex for no gain.
oh god.
> > Alternatively, perhaps Nicolas just needs to mention this in the comment more
> > clearly.
>
> I wrote that code so to me it is cristal clear already. Any suggestions
> as to how this could be improved?
>
Does mn10300's get_cycles() really count backwards? The first two
callsites I looked at (crypto/tcrypt.c and fs/ext4/mballoc.c) assume
that it is an upcounter.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
Nicolas Pitre wrote:
> The whole purpose of that thing is to be utterly fast and lightweight.
> Having an out of line C call would trash the major advantage of this
> code.
No argument there.
> > I imagine this would compile pretty much the same as the macro.
Having said that, I realise it's wrong. The macro potentially takes a h/w
read operation (cnt_lo) and does it at a place of its choosing - which the
compiler may not be permitted to move if cnt_lo resolves to a bit of volatile
inline asm with memory constraints. Converting it to an inline function
forces cnt_lo to be resolved first.
> Depends. As everybody has noticed now, the read ordering is important,
> and if gcc decides to not inline this for whatever reason then the
> ordering is lost. This is why this was a macro to start with.
Good point. I wonder if you should've put a compiler barrier in there to at
least make the point.
> I don't think having the associated storage be outside the macro make any
> sense either.
It can have a comment attached to it to say what it represents. On the other
hand, it'd probably need further comments attaching to it to fend off people
who start thinking they can make use of this variable in other ways...
> > Alternatively, perhaps Nicolas just needs to mention this in the comment
> > more clearly.
>
> I wrote that code so to me it is cristal clear already. Any suggestions
> as to how this could be improved?
It ought to be, but clearly it isn't. Sometimes the obvious is all too easy to
overlook. I'll think about it, but perhaps something like:
* This macro uses a static internal variable to retain the upper counter.
* This has two consequences: firstly, it may be used in multiple ways by
* different callers for different things without interference; and secondly,
* each caller will get its own, independent counter, and so an out of line
* wrapper must be used if multiple callers want to use the same pair of
* counters.
It's a bit heavy-handed, but...
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 07/18] Trace clock core
* Andrew Morton (akpm@linux-foundation.org) wrote:
> On Fri, 7 Nov 2008 01:16:43 -0500 Mathieu Desnoyers wrote:
>
> > > Is there something we should be fixing in m68k?
> > >
> >
> > Yes, but I fear it's going to go deep into include hell :-(
>
> Oh, OK. I thought that the comment meant that m68k's on_each_cpu()
> behaves differently at runtime from other architectures (and wrongly).
>
> If it's just some compile-time #include snafu then that's far less
> of a concern.
>
Should I simply remove this comment then ?
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
Andrew Morton wrote:
> If gcc did that then it would need to generate static instances of
> inlined functions within individual compilation units. It would be a
> disaster for the kernel. For a start, functions which are "inlined" in kernel
> modules wouldn't be able to access their static storage and modprobing
> them would fail.
Do you expect a static inline function that lives in a header file and that
has a static variable in it to share that static variable over all instances
of that function in a program? Or do you expect the static variable to be
limited at the file level? Or just at the invocation level?
> Does mn10300's get_cycles() really count backwards?
Yes, because the value is generated by a pair of cascaded 16-bit hardware
down-counters.
> The first two callsites I looked at (crypto/tcrypt.c and fs/ext4/mballoc.c)
> assume that it is an upcounter.
Hmmm... I didn't occur to me that get_cycles() was available for use outside
of arch code. Possibly it wasn't so used when I first came up with the code.
I should probably make it count the other way.
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 07/18] Trace clock core
On Fri, 7 Nov 2008 11:12:38 -0500 Mathieu Desnoyers wrote:
> * Andrew Morton (akpm@linux-foundation.org) wrote:
> > On Fri, 7 Nov 2008 01:16:43 -0500 Mathieu Desnoyers wrote:
> >
> > > > Is there something we should be fixing in m68k?
> > > >
> > >
> > > Yes, but I fear it's going to go deep into include hell :-(
> >
> > Oh, OK. I thought that the comment meant that m68k's on_each_cpu()
> > behaves differently at runtime from other architectures (and wrongly).
> >
> > If it's just some compile-time #include snafu then that's far less
> > of a concern.
> >
>
> Should I simply remove this comment then ?
>
umm, it could perhaps be clarified - mention that it's needed for an
include order problem.
It's a bit odd. Surely by the time we've included these:
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
someone has already included sched.h, and the definition of
_LINUX_SCHED_H will cause the later inclusion to not change anything?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
On Fri, 07 Nov 2008 16:21:55 +0000 David Howells wrote:
> Andrew Morton wrote:
>
> > If gcc did that then it would need to generate static instances of
> > inlined functions within individual compilation units. It would be a
> > disaster for the kernel. For a start, functions which are "inlined" in kernel
> > modules wouldn't be able to access their static storage and modprobing
> > them would fail.
>
> Do you expect a static inline function that lives in a header file and that
> has a static variable in it to share that static variable over all instances
> of that function in a program? Or do you expect the static variable to be
> limited at the file level? Or just at the invocation level?
I'd expect it to behave in the same way as it would if the function was
implemented out-of-line.
But it occurs to me that the modrobe-doesnt-work thing would happen if
the function _is_ inlined anyway, so we won't be doing that.
Whatever. Killing this many puppies because gcc may do something so
bizarrely wrong isn't justifiable.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
* David Howells (dhowells@redhat.com) wrote:
> Andrew Morton wrote:
>
> > We have a macro which must only have a single usage in any particular
> > kernel build (and nothing to detect a violation of that).
>
> That's not true. It's a macro containing a _static_ local variable, therefore
> the macro may be used multiple times, and each time it's used the compiler
> will allocate a new piece of storage.
>
> > It apparently tries to avoid races via ordering tricks, as long
> > as it is called with sufficient frequency. But nothing guarantees
> > that it _is_ called sufficiently frequently?
>
> The comment attached to it clearly states this restriction. Therefore the
> caller must guarantee it. That is something Mathieu's code and my code must
> deal with, not Nicolas's.
>
> > There is absolutely no reason why the first two of these quite bad things
> > needed to be done. In fact there is no reason why it needed to be
> > implemented as a macro at all.
>
> There's a very good reason to implement it as either a macro or an inline
> function: it's faster. Moving the whole thing out of line would impose an
> additional function call overhead - with a 64-bit return value on 32-bit
> platforms. For my case - sched_clock() - I'm willing to burn a bit of extra
> space to get the extra speed.
>
> > As I said in the text which you deleted and ignored, this would be
> > better if it was implemented as a C function which requires that the
> > caller explicitly pass in a reference to the state storage.
>
> I'd be quite happy if it was:
>
> static inline u64 cnt32_to_63(u32 cnt_lo, u32 *__m_cnt_hi)
> {
> union cnt32_to_63 __x;
> __x.hi = *__m_cnt_hi;
> __x.lo = cnt_lo;
> if (unlikely((s32)(__x.hi ^ __x.lo) < 0))
> *__m_cnt_hi =
> __x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31);
> return __x.val;
> }
>
Almost there. At least, with this kind of implementation, we would not
have to resort to various tricks to make sure a single code path is
called at a certain frequency. We would simply have to make sure the
__m_cnt_hi value is updated at a certain frequency. Thinking about
"data" rather than "code" makes much more sense.
The only missing thing here is the correct ordering. The problem is, as
I presented in more depth in my previous discussion with Nicolas, that
the __m_cnt_hi value has to be read before cnt_lo. First off, using this
macro with get_cycles() is simply buggy, because the macro expects
_perfect_ order of timestamps, no skew whatsoever, or otherwise time
could jump. This macro is therefore good only for mmio reads. One should
use per-cpu variables to keep the state of get_cycles() reads (as I did
in my other patch).
The following approach should work :
static inline u64 cnt32_to_63(u32 io_addr, u32 *__m_cnt_hi)
{
union cnt32_to_63 __x;
__x.hi = *__m_cnt_hi; /* memory read for high bits internal state */
smp_rmb(); /*
* read high bits before low bits insures time
* does not go backward.
*/
__x.lo = readl(cnt_lo); /* mmio read */
if (unlikely((s32)(__x.hi ^ __x.lo) < 0))
*__m_cnt_hi =
__x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31);
return __x.val;
}
But any get_cycles() user of cnt32_to_63() should be shot down. The
bright side is : there is no way get_cycles() can be used with this
new code. 
e.g. of incorrect users for arm (unless they are UP only, but that seems
like a weird design argument) :
mach-sa1100/include/mach/SA-1100.h:#define OSCR __REG(0x90000010)
/* OS timer Counter Reg. */
mach-sa1100/generic.c: unsigned long long v = cnt32_to_63(OSCR);
mach-pxa/include/mach/pxa-regs.h:#define OSCR __REG(0x40A00010) /* OS
Timer Counter Register */
mach-pxa/time.c: unsigned long long v = cnt32_to_63(OSCR);
Correct user :
mach-versatile/core.c: unsigned long long v =
cnt32_to_63(readl(VERSATILE_REFCOUNTER));
The new call would look like :
/* Hi 32-bits of versatile refcounter state, kept for cnt32_to_64. */
static u32 versatile_refcounter_hi;
unsigned long long v = cnt32_to_64(VERSATILE_REFCOUNTER, refcounter_hi);
Mathieu
> I imagine this would compile pretty much the same as the macro. I think it
> would make it more obvious about the independence of the storage.
>
> Alternatively, perhaps Nicolas just needs to mention this in the comment more
> clearly.
>
> David
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
On Fri, 7 Nov 2008, Andrew Morton wrote:
> On Fri, 07 Nov 2008 10:01:01 -0500 (EST) Nicolas Pitre wrote:
>
> > On Fri, 7 Nov 2008, David Howells wrote:
> >
> > > Andrew Morton wrote:
> > >
> > > > As I said in the text which you deleted and ignored, this would be
> > > > better if it was implemented as a C function which requires that the
> > > > caller explicitly pass in a reference to the state storage.
> >
> > The whole purpose of that thing is to be utterly fast and lightweight.
>
> Well I'm glad it wasn't designed to demonstrate tastefulness.
Fast tricks aren't always meant to be beautiful. That,s why we have
abstraction layers.
> btw, do you know how damned irritating and frustrating it is for a code
> reviewer to have his comments deliberately ignored and deleted in
> replies?
Do you know how irritating and frustrating it is when reviewers don't
care reading the damn comments along with the code? Maybe it wasn't the
case, but you gave the impression of jumping to conclusion without even
bothering about the associated explanation in the code until I directed
you at it.
> > Having an out of line C call would trash the major advantage of this
> > code.
>
> Not really.
Your opinion.
> > > I imagine this would compile pretty much the same as the macro.
> >
> > Depends. As everybody has noticed now, the read ordering is important,
> > and if gcc decides to not inline this
>
> If gcc did that then it would need to generate static instances of
> inlined functions within individual compilation units. It would be a
> disaster for the kernel. For a start, functions which are "inlined" in kernel
> modules wouldn't be able to access their static storage and modprobing
> them would fail.
That doesn't mean that access ordering is preserved within the function,
unless the interface is pointer based, but that won't work if the
counter is accessed through some special instruction rather than a
memory location.
> > for whatever reason then the
> > ordering is lost.
>
> Uninlining won't affect any ordering I can see.
See above.
> > This is why this was a macro to start with.
> >
> > > I think it
> > > would make it more obvious about the independence of the storage.
> >
> > I don't think having the associated storage be outside the macro make
> > any sense either. There is simply no valid reason for having it shared
> > between multiple invokations of the macro, as well as making its
> > interface more complex for no gain.
>
> oh god.
Thank you. ;-)
> > > Alternatively, perhaps Nicolas just needs to mention this in the comment more
> > > clearly.
> >
> > I wrote that code so to me it is cristal clear already. Any suggestions
> > as to how this could be improved?
> >
>
> Does mn10300's get_cycles() really count backwards? The first two
> callsites I looked at (crypto/tcrypt.c and fs/ext4/mballoc.c) assume
> that it is an upcounter.
I know nothing about mn10300.
Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
Nicolas Pitre wrote:
> > Well I'm glad it wasn't designed to demonstrate tastefulness.
>
> Fast tricks aren't always meant to be beautiful. That,s why we have
> abstraction layers.
And comments. The comment attached to cnt32_to_63() is well written and
informative.
> I know nothing about mn10300.
That's aimed my way.
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
* David Howells (dhowells@redhat.com) wrote:
> Nicolas Pitre wrote:
>
> > > I mean, the darned thing is called from sched_clock(), which can be
> > > concurrently called on separate CPUs and which can be called from
> > > interrupt context (with an arbitrary nesting level!) while it was running
> > > in process context.
> >
> > Yes! And this is so on *purpose*. Please take some time to read the
> > comment that goes along with it, and if you're still not convinced then
> > look for those explanation emails I've already posted.
>
> I agree with Nicolas on this. It's abominably clever, but I think he's right.
>
> The one place I remain unconvinced is over the issue of preemption of a process
> that is in the middle of cnt32_to_63(), where if the preempted process is
> asleep for long enough, I think it can wind time backwards when it resumes, but
> that's not a problem for the one place I want to use it (sched_clock()) because
> that is (almost) always called with preemption disabled in one way or another.
>
> The one place it isn't is a debugging case that I'm not too worried about.
>
I am also concerned about the non-preemption off case.
Then I think the function should document that it must be called with
preempt disabled.
Mathieu
> > > /*
> > > * Caller must provide locking to protect *caller_state
> > > */
> >
> > NO! This is meant to be LOCK FREE!
>
> Absolutely.
>
> David
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
Mathieu Desnoyers wrote:
> First off, using this macro with get_cycles() is simply buggy, because the
> macro expects _perfect_ order of timestamps, no skew whatsoever, or
> otherwise time could jump.
Erm... Why can't I pass it get_cycles()? Are you saying that sched_clock()
in MN10300 is wrong for it's use of get_cycles() with cnt32_to_63()?
> __x.lo = readl(cnt_lo); /* mmio read */
readl() might insert an extra barrier instruction. Not only that, io_addr
must be unsigned long.
David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [RFC patch 08/18] cnt32_to_63 should use smp_rmb()
* David Howells (dhowells@redhat.com) wrote:
> Mathieu Desnoyers wrote:
>
> > First off, using this macro with get_cycles() is simply buggy, because the
> > macro expects _perfect_ order of timestamps, no skew whatsoever, or
> > otherwise time could jump.
>
> Erm... Why can't I pass it get_cycles()? Are you saying that sched_clock()
> in MN10300 is wrong for it's use of get_cycles() with cnt32_to_63()?
>
Yes. Do you think the synchronization of the cycles counters is
_perfect_ across CPUs so that there is no possible way whatsoever that
two cycle counter values appear to go backward between CPUs ? (also
taking in account delays in __m_cnt_hi write-back...)
As I showed in my previous example, if you are unlucky enough to hit the
spot where the cycle counters go backward at the time warp edge, time
will jump of 2^32, so about 4.29s at 1GHz.
> > __x.lo = readl(cnt_lo); /* mmio read */
>
> readl() might insert an extra barrier instruction. Not only that, io_addr
> must be unsigned long.
If we expect the only correct use-case to be with readl(), I don't see
the problem with added synchronization.
>
Ah, right, then the parameters should be updated accordingly.
static inline u64 cnt32_to_63(unsigned long io_addr, u32 *__m_cnt_hi)
{
union cnt32_to_63 __x;
__x.hi = *__m_cnt_hi; /* memory read for high bits internal state */
rmb(); /*
* read high bits before low bits insures time
* does not go backward. Sync across
* CPUs and for interrupts.
*/
__x.lo = readl(io_addr); /* mmio read */
if (unlikely((s32)(__x.hi ^ __x.lo) < 0))
*__m_cnt_hi =
__x.hi = (__x.hi ^ 0x80000000) + (__x.hi >> 31);
return __x.val;
}
Mathieu
> David
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/