> -----Original Message-----
> From: owner-openssl-dev@openssl.org
> [mailto:owner-openssl-dev@openssl.org]
> On Behalf Of Andy Polyakov
> Sent: Wednesday, April 06, 2005 5:34 PM
> To: openssl-dev@openssl.org
> Subject: Re: RC4 optimize for em64t
>
> >>>Or how about moving movzb (%rdi,%r10),%r8d upwards as movzb
> >>>(%rdi,%r10),%r14b and make inter-register move between r8 and r14
> >>>conditional?
> >>>
> >>
> >> I will try it.
> >
> > I have tried it, no performance gain.
>
> Does it mean that it's same or does it mean that it's slower? Was it
> cmov or was it jump over mov instruction? BTW, what is the
> latency/throughput for Intel cmov anyway? I can't find information
> anywhere...


Using cmov here slows things down a lot.
Making the mov r13b, (%rdi, %rdi) conditional instead gives the same speed...

>
> Another question. Why are the rotations 32-bit? Did you try 64-bit rotations
> and find them slow? If so, by how much?


Changing to 64-bit ror slows the throughput to around 480Mb/s.
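
The thread does not spell out what the rotations are for. As a rough sketch, assuming they are used to collect keystream bytes into one register (write a byte into the low 8 bits, rotate right by 8, repeat) so that a single wide xor can be done against the input, the 32-bit and 64-bit variants would look like this in C. The function names are made up for illustration and are not taken from the assembler code:

```c
#include <stdint.h>

/* Rotate right; n must be in 1..31 (resp. 1..63). */
uint32_t ror32(uint32_t v, unsigned n) { return (v >> n) | (v << (32 - n)); }
uint64_t ror64(uint64_t v, unsigned n) { return (v >> n) | (v << (64 - n)); }

/* Hypothetical helper: pack 4 bytes with 32-bit rotations.
 * Each step overwrites the low byte, then rotates it out of the way. */
uint32_t pack4(const uint8_t b[4])
{
    uint32_t w = 0;
    int i;
    for (i = 0; i < 4; i++) {
        w = (w & ~(uint32_t)0xff) | b[i];
        w = ror32(w, 8);
    }
    return w; /* b[0] ends up in the lowest byte */
}

/* Hypothetical helper: the same idea with 64-bit rotations, 8 bytes per word. */
uint64_t pack8(const uint8_t b[8])
{
    uint64_t w = 0;
    int i;
    for (i = 0; i < 8; i++) {
        w = (w & ~(uint64_t)0xff) | b[i];
        w = ror64(w, 8);
    }
    return w; /* b[0] ends up in the lowest byte */
}
```

Both variants produce a little-endian word. The 64-bit one halves the number of wide xors per byte, so whether it wins depends on how expensive the 64-bit rotate is on the given core.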
>
> You may wonder why all these questions. I want to understand the code to
> make it regular enough to express the assembler unrolled loop in perl loop
> terms. It makes it easier for us to maintain and I'm even ready to
> sacrifice a few percent of performance for more regular looking code.
>
> >>>BTW, 272MBps at 3.6GHz? I get 262MBps out of [as just mentioned
> >>>virtually identical] 32-bit code at 2.4GHz P4... A.
> >>
> >> In fact, your implementation on EM64T isn't that slow if
> >> we change the inc and dec to add and sub.
> >>
> >> With that change the throughput rises from 272Mb/s to 396Mb/s.

>
> For *now* I'm committing only this change to CVS and will have closer
> look at unrolled loop later on [some time next week]. BTW, there is
> another idea I'd like to try, so I'm likely to send you some code for
> benchmarking on EM64T hardware. A.



I am glad to do the test for you.
I have tested changing inc and dec to add and sub in the 32-bit code and
see a 2% performance gain on a P4.
It is a bit strange that you see a slowdown. In theory, changing inc to add
should only benefit the P4.
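
For anyone following along without the assembler in front of them, here is a plain C sketch of the RC4 byte loop, just to show where the index step that inc/add implements sits. The type and function names are hypothetical and are not OpenSSL's RC4 API. The usual explanation for the P4 difference is that inc/dec leave the carry flag untouched, which creates a dependency on the previous flags value there, while add/sub write all the flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state layout for illustration only. */
typedef struct {
    uint8_t x, y;
    uint8_t S[256];
} rc4_state;

/* Standard RC4 key schedule. */
void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    size_t i;
    uint8_t j = 0;
    for (i = 0; i < 256; i++)
        st->S[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        uint8_t t = st->S[i];
        j = (uint8_t)(j + t + key[i % keylen]);
        st->S[i] = st->S[j];
        st->S[j] = t;
    }
    st->x = 0;
    st->y = 0;
}

/* Encrypts or decrypts len bytes (RC4 is symmetric). */
void rc4_crypt(rc4_state *st, const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t x = st->x, y = st->y;
    size_t n;
    for (n = 0; n < len; n++) {
        uint8_t tx, ty;
        x = (uint8_t)(x + 1);      /* index step: inc vs. add $1 in assembler */
        tx = st->S[x];
        y = (uint8_t)(y + tx);
        ty = st->S[y];
        st->S[x] = ty;             /* swap S[x] and S[y] */
        st->S[y] = tx;
        out[n] = in[n] ^ st->S[(uint8_t)(tx + ty)];
    }
    st->x = x;
    st->y = y;
}
```

Every output byte depends on one index step and one table swap, so a small per-instruction penalty in that step shows up directly in the throughput numbers.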

Zou Nan hai
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-dev@openssl.org
Automated List Manager majordomo@openssl.org