Re: PPC bn_div_words routine rewrite
Forgive my lack of knowledge of your existing code. But is it really
designed with optimization in mind? What was the driving force for
the C function?
If it is optimized, what is the time required?
I jumped way too early to the "fast" conclusion, I must admit, because I
never really had speed in mind. As I explained, my goal is to make it
easy to understand. If it has any performance advantage, it is purely
a side effect. (You never answered my comment about performance in my
last email, so I can only guess what the design intent was for you.)
I mean, if you choose to optimize my code for speed, it's perfectly
doable, and I have full confidence anyone else who has read this email
thread can do it. But again, I have no idea how much time you spent
on your routine, so I guess I should refrain from dissing it. My
mistake once again.
What else will you be teaching me today? =)
On 7/8/05, Andy Polyakov <email@example.com> wrote:
> > Please do not use previously mentioned routine, it missed 1 corner
> > case where 32=num_bits_word(d)
> > Revised routine that passes (cd test; make bntest).
> Does it mean that previous version didn't actually pass the test? I mean
> if it did on your CPU, but not mine, probably we could learn something
> else about ways PPC can be implemented...
> > All I had to do is add one more instruction to the routine.
> > Please test on your ppc32 machines.
> > Once we are all happy,
> Is this your agenda? Make everybody happy:-):-):-) Good luck:-):-):-)
> > it's a matter of adding the core dump at the beginning.
> > Thus you have a fast,
> 32*(div latency + mul latency) is fast? If I call BN_bn2dec in loop it
> spins 4 times slower than with current implementation. Well, at least on
> computer I have access to...
> > easy to understand, predictable bn_div_words, as
> > opposed to that monster in 0.9.8.
> Hostility again? Are you saying that nobody understands current
> implementation and that it produces unpredictable results? I disagree:-)
> > Other architectures will benefit if this C function is used in bn_asm.c
> How? And which architectures exactly? Virtually all 32-bit
> architectures, including PPC32, opt for
> (BN_ULONG)(((((BN_ULLONG)h)<<BN_BITS2)|l)/(BN_ULLONG)d).
> A.
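For readers following the thread, the expression quoted above can be tried in isolation. The sketch below is only an illustration, not the code from bn_asm.c: the typedefs and BN_BITS2 are defined locally on the assumption of a 32-bit word with a 64-bit double word, and the function name div_words_wide is made up for this example.

```c
#include <stdint.h>

/* Local stand-ins (assumptions for this sketch): a 32-bit word and a
 * 64-bit double word, as on a typical 32-bit build. */
typedef uint32_t BN_ULONG;
typedef uint64_t BN_ULLONG;
#define BN_BITS2 32

/* Hypothetical helper: divide the double word (h:l) by d with one wide
 * division. The caller must ensure d != 0. The quotient only fits in a
 * single word when h < d; otherwise the cast truncates it. */
static BN_ULONG div_words_wide(BN_ULONG h, BN_ULONG l, BN_ULONG d)
{
    /* Glue the two halves into one 64-bit value, then divide once. */
    BN_ULLONG n = (((BN_ULLONG)h) << BN_BITS2) | l;
    return (BN_ULONG)(n / (BN_ULLONG)d);
}
```

For instance, div_words_wide(1, 0, 2) divides 2^32 by 2. On a 32-bit target without a native 64-by-32 divide, the compiler usually lowers the wide division to a runtime helper, so the actual cost depends on the platform.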
> OpenSSL Project http://www.openssl.org
> Development Mailing List firstname.lastname@example.org
> Automated List Manager email@example.com