On Mon, 3 Dec 2007, Karsten Behrmann wrote:

> Ah, the old "double-checked locking" thing. Take what I say here
> with a grain of salt: I'm a plain programmer, not very versed in
> different architectures, so I just pick up what I hear.
>
> essentially, the problem with the scheme is that plain variable
> writes may not be ordered as expected, making them unsuitable to
> communicate between threads and processors, depending on compiler
> and architecture.
>
> Let's take the example
>
> pthread_mutex_lock();
> if (!is_set) {
> a = 3;
> b = 4;
> c = a * b;
> is_set = true;
> }
> pthread_mutex_unlock();
>
> Now, an optimizing compiler might put a and b into registers,
> compute a * b while storing a 1 to is_set, and then store a, b, and
> c from registers into memory. As I said, I'm not actually sure
> where compilers are allowed to do this.
>
> Also, on some architectures, the caching structure may cause writes
> to is_set to show up earlier on another CPU than writes to other
> parts of memory, if you're unlucky.


As alfred said, the locking implementation is supposed to hide these
problems so that most code doesn't need to worry about them. I don't
know what pthread does, but in for kernel mutexes on i386 with CC=gcc,
the first problem is handled by putting a "memory" clobber in the asm
for atomic_cmpset to force the compiler to load and store all memory
variables (not just the ones in args for the asm), while the second
problem is handled implicitly by using a locked instruction for cmpset
and/or other magic i386 cache behaviour. The atomic cmpset is, more
precisely, atomic_cmpset_acq for locking and atomic_cmpset_rel for
unlocking. It is acquire/release semantics that handles the second
problem. On i386's acquire/release semantics happen almost automatically
so at the lowest level there is only atomic_cmpset_int and everything
else is #defined as that. Some sloppy (buggy?) callers of the atomic_
cmpset family use only atomic_cmpset_int, and one of the examples in
atomic.9 doesn't mention acq or rel. It isn't clear what a non-
acquiring non-releasing cmpset is good for.

> In fact, in your patch, you sometimes actually set the variable
> getting checked before setting the data it is supposed to protect ;-)
>
>> - _MUTEX_LOCK(&gmt_mutex);
>> if (!gmt_is_set) {
>> - gmt_is_set = TRUE;
>> + _MUTEX_LOCK(&gmt_mutex);
>> + if (!gmt_is_set) {
>> + gmt_is_set = TRUE;

>
> XXX - at this point, gmt_is_set is TRUE but gmtptr is not initialized yet.


This is a bug. It shows how tricky locking becomes again if you don't
use the normal mechanism of always acquiring the lock first. With the
normal mechanism, the ordering inside locked regions doesn't matter since
variables are only accessed inside of locked regions, but the optimized
version has accesses outside of locked regions.

Setting gmt_is_set after initializing gmtptr isn't much better, since
the unlock hasn't had a chance to force everything to memory so
gmt_is_set might reach memory before gmtptr. I think the order needs
to be something like:

gmtptr = ...
unlock
lock
gmt_is_set = TRUE;
unlock,

or slightly shorter using explicit barrier operations instead of implicit
ones in unlock.

Even if we could guarantee that gmtptr and gmt_is_set are not written
prior to the unlock, there would still be a problem with the optimized
version unless we could guarantee the order of the writes triggered
by the unlock. The unoptimizated version is not affected by this
complication either.

Bruce
_______________________________________________
freebsd-arch@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"