> 1) In openssl-0.9.8/crypto/des/cfb_enc.c line 170 there is "memcpy
> (ovec,ovec+num,8);" and since ovec and ovec+num will overlap sometimes,
> this function relies on undocumented/undefined behavior of memcpy?

The original reason for choosing memcpy was a) it's commonly inlined
by compilers [most notably gcc], while memmove is not, and b) I fail to
imagine how it can fail with overlapping regions if num is guaranteed to
be positive, even if the routine is super-optimized, inlined, whatever.
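
To make point b) concrete, here is a small sketch. The helper names are
hypothetical, and it assumes ovec is a 16-byte buffer with 0 < num <= 8,
which is what the quoted line implies (it is not the OpenSSL source). A
plain forward byte loop is safe when the destination lies below the
source, which is the intuition above; memmove is the only standard
routine that is actually guaranteed to handle the overlap.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: move the 8 bytes at ovec+num down to ovec[0..7].
 * Assumes a 16-byte buffer and 1 <= num <= 8, as the quoted line implies. */
static void shift_ovec(unsigned char ovec[16], int num)
{
    assert(num > 0 && num <= 8);
    /* Source and destination overlap whenever num < 8. memmove is defined
     * for overlapping regions; memcpy is not, whatever copy order a given
     * implementation happens to use. */
    memmove(ovec, ovec + num, 8);
}

/* Hypothetical helper showing the intuition behind b): a forward byte loop
 * reads ovec[i + num] before anything at that index has been written,
 * because every earlier write went to an index below i. */
static void shift_ovec_forward(unsigned char ovec[16], int num)
{
    int i;

    assert(num > 0 && num <= 8);
    for (i = 0; i < 8; i++)
        ovec[i] = ovec[i + num];
}
```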
> If I use the Intel C++ Compiler 9.0 for EM64T with /O2 or higher, it
> replaces the above memcpy with the optimized function __intel_fast_memcpy,
> which breaks DES in OpenSSL.

For reference, note that the Linux build avoids __intel_fast_memcpy with
-Dmemcpy=__builtin_memcpy, because libirc.a caused grief when linked into
a shared library. __intel_fast_memcpy feels like overkill in the OpenSSL
context, and inlined code [movs or an unrolled loop] should do a better
job. Can you try compiling with -Dmemcpy=__builtin_memcpy?
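
If changing the build flags is awkward, the same substitution can be
approximated per file. This is a sketch under assumptions, not OpenSSL's
build setup: it only applies to GCC-compatible compilers (which the Linux
ICC build relies on), it redefines memcpy after <string.h> rather than
before it so the library prototype is left alone, and copy_block8 is a
hypothetical helper.

```c
#include <string.h>

/* Per-file approximation of -Dmemcpy=__builtin_memcpy for GCC-compatible
 * compilers. Placed after <string.h>, so only calls below are affected. */
#if defined(__GNUC__)
#undef memcpy
#define memcpy __builtin_memcpy
#endif

/* Hypothetical helper: with a small constant length the compiler is free
 * to expand the copy inline instead of calling an out-of-line routine
 * such as __intel_fast_memcpy. */
static void copy_block8(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, 8);
}
```

Note that __builtin_memcpy still has memcpy semantics, so this addresses
the libirc.a/__intel_fast_memcpy side, not the overlap question.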

> It seems like memcpy should be replaced with memmove here?

Does that mean you've tried replacing it with memmove and can confirm
that DES works when compiled with ICC /O2 or higher? It actually smells
more like a compiler bug than a memcpy vs. memmove issue...

> 2) On Win64 platforms, a socket is now a 64-bit pointer but SSL_set_fd and
> BIO_set_fd accept only 32-bit integers. Can't this cause problems if the
> pointer points higher than the lowest 4 gig address space?

There is an explicit comment about this in e_os.h. The socket value is
an offset into a table [a per-process, kernel-side table] of limited
size, well under 2GB, and therefore it's safe to use an int to hold the
value.
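
If one wants to be defensive about that assumption anyway, a checked
narrowing could look like the sketch below. win64_socket_t and
socket_to_int are hypothetical names; uint64_t stands in for the
pointer-sized Win64 SOCKET (UINT_PTR) so the sketch compiles anywhere,
and the all-ones value mirrors Winsock's INVALID_SOCKET, (SOCKET)(~0).

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for the pointer-sized Win64 SOCKET type. */
typedef uint64_t win64_socket_t;

/* Mirrors Winsock's INVALID_SOCKET, which is defined as (SOCKET)(~0). */
#define SKETCH_INVALID_SOCKET ((win64_socket_t)~(win64_socket_t)0)

/* Narrow a socket handle to the int taken by SSL_set_fd()/BIO_set_fd().
 * Returns 0 and stores the result in *fd on success, or -1 if the value
 * would be truncated (which, per e_os.h, a real socket never is). */
static int socket_to_int(win64_socket_t s, int *fd)
{
    if (s == SKETCH_INVALID_SOCKET) {
        *fd = -1;               /* keep the conventional -1 error value */
        return 0;
    }
    if (s > (win64_socket_t)INT_MAX)
        return -1;              /* does not fit in an int: refuse */
    *fd = (int)s;
    return 0;
}
```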

> 3) Is AES really a lot faster on Win64/x64 compared to the i586 asm
> version or am I doing something wrong?

1. Who says AES is assembler-accelerated on Win32? It's not, not yet:-)
2. What's wrong with 64-bit code being faster than the 32-bit one? 64-bit
code has access to a wider register bank, 8 extra general-purpose
registers, and in the AES case there is no need to spill registers to
the stack on every loop iteration. Fewer instructions, no wasted bus
bandwidth -> better performance.
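
To put numbers on the spill argument, here is one inner round in the
T-table style of the Rijndael reference code. It is a sketch, not the
OpenSSL source: aes_ttable_round and its parameter layout are made up,
and the four lookup tables are passed in rather than generated.

```c
#include <stdint.h>

/* One inner AES round in the T-table formulation. te0..te3 are the four
 * 256-entry tables (assumed already filled), rk is this round's key. */
static void aes_ttable_round(const uint32_t te0[256], const uint32_t te1[256],
                             const uint32_t te2[256], const uint32_t te3[256],
                             const uint32_t rk[4], uint32_t s[4])
{
    /* While t3 is computed, t0..t2, s0..s3, rk, a table base and the t3
     * accumulator are all live: roughly ten values. i386 has 8 GPRs minus
     * ESP (and often EBP), so some get spilled; x86-64 has 16. */
    uint32_t s0 = s[0], s1 = s[1], s2 = s[2], s3 = s[3];
    uint32_t t0, t1, t2, t3;

    t0 = te0[s0 >> 24] ^ te1[(s1 >> 16) & 0xff] ^ te2[(s2 >> 8) & 0xff] ^ te3[s3 & 0xff] ^ rk[0];
    t1 = te0[s1 >> 24] ^ te1[(s2 >> 16) & 0xff] ^ te2[(s3 >> 8) & 0xff] ^ te3[s0 & 0xff] ^ rk[1];
    t2 = te0[s2 >> 24] ^ te1[(s3 >> 16) & 0xff] ^ te2[(s0 >> 8) & 0xff] ^ te3[s1 & 0xff] ^ rk[2];
    t3 = te0[s3 >> 24] ^ te1[(s0 >> 16) & 0xff] ^ te2[(s1 >> 8) & 0xff] ^ te3[s2 & 0xff] ^ rk[3];

    s[0] = t0;
    s[1] = t1;
    s[2] = t2;
    s[3] = t3;
}
```

The exact count depends on the compiler's scheduling, but it stays above
the 6 or 7 registers i386 leaves free, and well below what x86-64 offers.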

______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-dev@openssl.org
Automated List Manager majordomo@openssl.org