>> Do note "[when] num [as in memcpy(ovec,ovec+num,8)] is guaranteed to
>> be positive." The question was whether you can imagine a memcpy
>> implementation that would fail to handle overlapping regions when the
>> source address is *larger* than the destination. The question was
>> *not* whether you can imagine a memcpy implementation that would fail
>> to handle arbitrary overlapping regions.

> Yes.
> void * memcpy(void * dst, const void * src, size_t len) {
>     char * d = ((char *) dst) + len;
>     const char * s = ((const char *) src) + len;
>     while (len-- > 0) {
>         *--d = *--s;
>     }
>     return dst;
> }
> This is a fully conformant implementation of memcpy. Not sure why you'd
> implement it this way, but it's legal.
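
To see concretely why the loop above breaks the memcpy(ovec,ovec+num,8)
case, here is a minimal sketch. The names backward_copy and
backward_copy_matches_memmove, the 16-byte buffer and its fill pattern
are assumptions made for illustration; they are not taken from OpenSSL
or ICC. The loop copies from the highest address down, so when the
source sits above the destination and the regions overlap, it reads
bytes it has already overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: the same high-to-low loop as the memcpy quoted above. */
static void *backward_copy(void *dst, const void *src, size_t len)
{
    char *d = (char *)dst + len;
    const char *s = (const char *)src + len;

    while (len-- > 0)
        *--d = *--s;
    return dst;
}

/*
 * Shift 8 bytes down by num inside a 16-byte buffer, once with the
 * backward loop and once with memmove.
 * Returns 1 if both buffers end up identical, 0 otherwise.
 */
static int backward_copy_matches_memmove(size_t num)
{
    unsigned char a[16], b[16];
    size_t i;

    assert(num <= 8);               /* keep src + 8 inside the buffer */
    for (i = 0; i < sizeof a; i++)
        a[i] = b[i] = (unsigned char)i;

    backward_copy(a, a + num, 8);   /* source above destination */
    memmove(b, b + num, 8);         /* defined for any overlap */

    return memcmp(a, b, sizeof a) == 0;
}
```

With num = 3 the two results differ; with num = 8 the regions no longer
overlap and they agree.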

The question is not how I would implement it, but why ICC would. What
would be the performance reason to implement something like this? But
whatever, memmove it is...

>> See a). Inlining is believed/expected to be faster than a call to a
>> function.

> This is not always true. If the inlining causes the code size to bloat
> and no longer fit into cache, for example. Also, shared copies of the
> function can share branch prediction information.

Well, if one uses a designated intrinsic function, the compiler has a
chance to evaluate the trade-off and "decide" when it's appropriate to
inline and when to call a function, while with memmove you're bound to
make the call...

> It is true in this case, I mention. At least on the x86.

"This case?" Two 32-bit loads + two 32-bit stores [both gcc and icc 8
manage to inline it like this] vs. a call to a function to copy 8 bytes?
But as said, whatever, memmove for cfb_enc it is... A.
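
P.S. For reference, a rough sketch of the "two 32-bit loads + two 32-bit
stores" shape written in portable C. The names copy8_load_then_store and
copy8_matches_memmove are hypothetical, and whether a given compiler
actually reduces this to two loads and two stores is an assumption that
should be checked in the generated assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy 8 bytes as two 32-bit words.
 * Both words are read into locals before anything is written,
 * so the result does not depend on whether dst and src overlap.
 */
static void copy8_load_then_store(unsigned char *dst, const unsigned char *src)
{
    uint32_t lo, hi;

    memcpy(&lo, src, 4);        /* load 1 */
    memcpy(&hi, src + 4, 4);    /* load 2 */
    memcpy(dst, &lo, 4);        /* store 1 */
    memcpy(dst + 4, &hi, 4);    /* store 2 */
}

/* Returns 1 if the helper and memmove(dst, dst + num, 8) agree, 0 otherwise. */
static int copy8_matches_memmove(size_t num)
{
    unsigned char a[16], b[16];
    size_t i;

    assert(num <= 8);           /* keep src + 8 inside the buffer */
    for (i = 0; i < sizeof a; i++)
        a[i] = b[i] = (unsigned char)i;

    copy8_load_then_store(a, a + num);
    memmove(b, b + num, 8);

    return memcmp(a, b, sizeof a) == 0;
}
```

Because both loads complete before either store, this variant matches
memmove for every num from 0 to 8, overlapping or not.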

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           majordomo@openssl.org