Check out "A Formal Specification of Intel Itanium Processor Family Memory
Ordering" (http://www.intel.com/design/itanium/...s/25142901.pdf). It
describes in excruciating detail how reordering of memory operations can be
observed by other processors. Example A.1 (in Appendix A) is a simple
example of how writes can be reordered, and A.2 is an example of what can be
done to enforce a desired ordering through use of acquire and release
semantics. B.5 is an interesting example of how competing writes to the
same address on parallel processors can lead to one processor seeing its
own write occur twice.
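
To make the A.1/A.2 idea concrete, here is a rough sketch of my own (it is
not code from the Intel document). It uses C++11 std::atomic rather than
Itanium instructions, and the names payload, ready, producer, consumer and
run_once are made up for illustration. The data is written first and the
flag is then stored with release semantics; the reader loads the flag with
acquire semantics before touching the data.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // flag used to publish the data

void producer() {
    payload = 42;                                  // (1) write the data
    // (2) release store: a thread that observes ready == true through an
    // acquire load is guaranteed to also observe write (1).
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire load: the read of payload below cannot be moved ahead of it.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

int run_once() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    reader.join();
    writer.join();
    assert(seen == 42);   // holds because of the release/acquire pairing
    return seen;
}
```

If both operations on ready were weakened to std::memory_order_relaxed, the
language would no longer guarantee that the reader sees payload == 42, which
is the kind of reordering A.1 illustrates.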

Arch D. Robison's article "Understanding memory consistency is essential"
(http://www.windevnet.com/documents/s=7545/ddj0304d/) explains some of these
concepts a lot more clearly and generally than the specification above.

While trying to understand these concepts I found it useful to hold a model
of a hypothetical processor in my head: one which allows extensive
reordering of writes as seen by an external observer (i.e. another
processor). The reordering could happen for any number of reasons, alone or
in combination: compiler optimisations, on-the-fly processor instruction
reordering, cache lines being flushed in an order different from that of
the store instructions, and write-combining buffers delaying writebacks.
The more aggressive the model I kept in mind, the easier it was to
understand why, without acquire and release semantics, it is (dare I say)
impossible for a safe optimisation a la DCLP to exist.
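
For comparison, this is roughly what double-checked locking looks like once
acquire and release semantics are available. It is only a sketch under my
own assumptions: it uses C++11 std::atomic and std::mutex rather than
anything in OpenSSL, and Config, g_instance, g_init_lock, get_instance and
check_singleton are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical payload type, for illustration only.
struct Config {
    int value = 7;
};

std::atomic<Config*> g_instance{nullptr};
std::mutex g_init_lock;

Config* get_instance() {
    // First check, without the lock. The acquire load ensures that if we
    // see a non-null pointer we also see the fully constructed object.
    Config* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(g_init_lock);
        // Second check, under the lock. Relaxed is enough here because the
        // mutex already orders us against any earlier initialiser.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Config();   // construct the object completely...
            // ...then publish the pointer with release semantics, so the
            // construction cannot be observed after the pointer.
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}

void check_singleton() {
    Config* a = get_instance();
    Config* b = get_instance();
    assert(a == b);          // same object on every call
    assert(a->value == 7);   // and it is fully initialised
}
```

With a plain pointer in place of the atomic, the aggressive model above is
free to make the pointer visible to another processor before the contents
of the object, and the unlocked first check then reads garbage.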

Why this doesn't break lots of code: the issue here is how memory accesses
are ordered as seen across multiple processors, so things can only go
haywire when two processors access the same variables, and this is where
locks come in. Acquiring and releasing a lock enforces the required
ordering, so code that locks correctly is unaffected. What I mean is that
reordering doesn't really introduce any additional requirements for
thread-safe code; it just causes code that doesn't use locks appropriately
to break in different ways. Code that does not use locks can be
thread-unsafe on its own, and it can be even more thread-unsafe when
reordering is involved.
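
A small sketch of the point about locks, again with C++11 facilities and
made-up names (g_lock, g_counter, add_many, run_two_threads): every access
to the shared variable happens with the lock held, so the code does not
need to reason about reordering at all.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical names, for illustration only.
std::mutex g_lock;
long g_counter = 0;   // shared variable, only touched with g_lock held

void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        // Taking the lock acts as an acquire and dropping it as a release,
        // so each increment sees the result of the previous one.
        std::lock_guard<std::mutex> guard(g_lock);
        ++g_counter;
    }
}

long run_two_threads(int n) {
    g_counter = 0;
    std::thread a(add_many, n);
    std::thread b(add_many, n);
    a.join();
    b.join();
    return g_counter;   // safe to read: both threads have been joined
}
```

Remove the lock_guard and the result is wrong even on a processor that
never reorders anything; reordering only adds further ways for it to fail.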

Regards,

Steven

-----Original Message-----
From: owner-openssl-dev@openssl.org [mailto:owner-openssl-dev@openssl.org]
On Behalf Of David C. Partridge
Sent: Thursday, 7 April 2005 12:56 AM
To: openssl-dev@openssl.org
Subject: RE: OpenSSL use of DCLP may not be thread-safe on multiple
processors

ARGHHHHH!!!!!

Are you absolutely sure that this is the case - that's scary - I thought
that the whole issue of SMP cache coherency and write order was solved years
ago.

I mean that if the order of memory write visibility between processors can't
be guaranteed, then a whole lot MORE than just DCLP crashes and burns ... How
in that case can anyone write safe MP code?

D.


______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-dev@openssl.org
Automated List Manager majordomo@openssl.org
