Here is the C function corresponding to the assembly routine below.
It is provided to ease review of the assembly.
Other architectures would also benefit if this C function were used in bn_asm.c.

Regards,
David

/* assumes a 32-bit unsigned long, as on ppc32 */
unsigned long div_words (unsigned long h,
                         unsigned long l,
                         unsigned long d)
{
    unsigned long i_h; /* intermediate dividend */
    unsigned long i_q; /* quotient of i/d */
    unsigned long i_r; /* remainder of i/d */

    unsigned long i_cntr;
    unsigned long i_carry;
    unsigned long i_overflow;

    unsigned long ret_q; /* return quotient */

    /* cannot divide by zero */
    if (d == 0) return 0xffffffff;

    /* do simple 32-bit divide */
    if (h == 0) return l/d;

    i_q = h/d;
    i_r = h - (i_q*d);
    ret_q = i_q;

    i_cntr = 32;

    while (i_cntr--)
    {
        /* shift the next bit of l out into i_carry */
        i_carry = (l & 0x80000000) ? 1:0;
        l = l << 1;

        /* bit shifted out of i_r, i.e. bit 32 of (i_r << 1) | i_carry */
        i_overflow = (i_r & 0x80000000) ? 1:0;
        i_h = (i_r << 1) | i_carry;
        i_q = i_h/d;
        i_q = i_q + i_overflow;
        i_r = i_h - (i_q*d);

        ret_q = (ret_q << 1) | i_q;
    }

    return ret_q;
}
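
For anyone who wants to check the routine off-target, here is a minimal
sketch of the same bit-by-bit division written with fixed-width types, next
to a reference that uses 64-bit arithmetic. The names div_words32 and
div_words_ref are my own and not part of OpenSSL, and the use of uint32_t
and uint64_t is an assumption made so that the sketch behaves the same on
hosts where unsigned long is wider than 32 bits.

```c
#include <stdint.h>

/* Hypothetical fixed-width variant of div_words (name is illustrative).
 * Divides the 64-bit value h:l by d and returns the low 32 bits of the
 * quotient; returns 0xffffffff for d == 0, like the routine above. */
uint32_t div_words32(uint32_t h, uint32_t l, uint32_t d)
{
    uint32_t q, r, t, bit, carry, overflow;
    int i;

    if (d == 0) return 0xffffffffu;   /* cannot divide by zero */
    if (h == 0) return l / d;         /* simple 32-bit divide */

    q = h / d;
    r = h % d;                        /* invariant: r < d */

    for (i = 0; i < 32; i++) {
        carry = l >> 31;              /* next dividend bit */
        l <<= 1;

        /* 2*r + carry is a 33-bit value: overflow holds bit 32,
         * t holds the low 32 bits. */
        overflow = r >> 31;
        t = (r << 1) | carry;

        /* Because r < d, the true quotient bit is 0 or 1. When bit 32
         * is set, t/d is 0 and the overflow supplies the missing 1. */
        bit = t / d + overflow;
        r = t - bit * d;              /* wraps modulo 2^32 by design */

        q = (q << 1) | bit;
    }
    return q;
}

/* Reference: the same result computed with 64-bit arithmetic.
 * The caller must ensure d != 0. */
uint32_t div_words_ref(uint32_t h, uint32_t l, uint32_t d)
{
    uint64_t n = ((uint64_t)h << 32) | l;
    return (uint32_t)(n / d);
}
```

Comparing the two functions over inputs where d has its top bit set
exercises the overflow path, which is the corner case mentioned in the
quoted message below.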

On 7/7/05, David Ho wrote:
> Please do not use the previously mentioned routine; it missed one corner
> case, where 32 = num_bits_word(d).
>
> Revised routine that passes (cd test; make bntest).
> All I had to do is add one more instruction to the routine.
>
> Please test on your ppc32 machines.
>
> Once we are all happy, it's a matter of adding the core dump at the beginning.
> Thus you have a fast, easy to understand, predictable bn_div_words, as
> opposed to that monster in 0.9.8.
>
> #
> # Handcrafted version of bn_div_words
> #
> # r3 = h
> # r4 = l
> # r5 = d
>
> cmplwi 0,r5,0 # compare r5 and 0
> bc BO_IF_NOT,CR0_EQ,.Lppcasm_div1 # proceed if d != 0
> li r3,-1 # d = 0, return -1
> bclr BO_ALWAYS,CR0_LT
> .Lppcasm_div1:
> cmplwi 0,r3,0 # compare r3 and 0
> bc BO_IF_NOT,CR0_EQ,.Lppcasm_div2 # proceed if h != 0
> divwu r3,r4,r5 # ret_q = l/d
> bclr BO_ALWAYS,CR0_LT # return result in r3
> .Lppcasm_div2:
> divwu r9,r3,r5 # i_q = h/d
> mullw r10,r9,r5 # i_r = h - (i_q*d)
> subf r10,r10,r3
> mr r3,r9 # ret_q = i_q
> .Lppcasm_set_ctr:
> li r12,32 # ctr = bitsizeof(d)
> mtctr r12
> .Lppcasm_div_loop:
> addc r4,r4,r4 # l = l << 1 -> i_carry
> adde r11,r10,r10 # i_h = (i_r << 1) | i_carry
> divwu r9,r11,r5 # i_q = i_h/d
> addze r9,r9 # i_q += i_overflow; very important! - DKWH
> mullw r10,r9,r5 # i_r = i_h - (i_q*d)
> subf r10,r10,r11
> add r3,r3,r3 # ret_q = ret_q << 1 | i_q
> add r3,r3,r9
> bc BO_dCTR_NZERO,CR0_EQ,.Lppcasm_div_loop
> .Lppc_div_end:
> bclr BO_ALWAYS,CR0_LT # return result in r3
> .long 0x00000000
>
>
> Regards,
> David
>
>
> On 7/5/05, Peter Waltenberg wrote:
> >
> > Thanks for finding and fixing this. Particularly for finding and fixing it
> > before 0.9.8 hit the streets.
> >
> > Peter
> >
> > Peter Waltenberg
> > Architect
> > IBM Crypto for C Team
> > IBM/Tivoli Gold Coast Office
> >
> >
> >
> >
> > Andy Polyakov
> > Sent by: owner-openssl-dev@openssl.org
> >
> > 06/07/2005 07:49 AM
> >
> > Please respond to
> > openssl-dev
> >
> >
> > To openssl-dev@openssl.org
> >
> > cc linuxppc-embedded@ozlabs.org
> >
> > Subject Re: PPC bn_div_words routine rewrite
> >
> >
> >
> >
> >
> > > Okay, having actually done what Andy suggested, i.e. the one-line fix
> > > in the assembly code, bn_div_words returns the correct results.

> >
> > Note that the final version, the one committed to all relevant OpenSSL
> > branches a couple of days ago and the one that actually made it into the
> > just released 0.9.8, is a bit different from the originally suggested
> > one-line fix; see for example
> > http://cvs.openssl.org/chngview?cn=14199.
> >
> > > At this point, my conclusion is, up to openssl-0.9.8-beta6, the ppc32
> > > bn_div_words routine generated from crypto/bn/ppc.pl is still busted.
> >
> > Yes. Though it should be noted that 0.9.8 was inadvertently avoiding the
> > bug condition. Recall that the original problem report was for 0.9.7.
> >
> > > Why do you signal an overflow condition when it appears functions that
> > > call bn_div_words do not check for overflow conditions?

> >
> > That's a question for IBM. At the time they submitted the code, I
> > explicitly asked what would be an appropriate way to generate a *fatal*
> > condition at that point, i.e. one which would result in a core dump, and
> > it came out as a division by 0 instruction. At that time I had no access
> > to any PPC machine and had to just go with it. Now it actually came as a
> > surprise that division by 0 does not raise an exception, but silently
> > returns an implementation-specific value... A.
> > __________________________________________________ ____________________
> > OpenSSL Project http://www.openssl.org
> > Development Mailing List openssl-dev@openssl.org
> > Automated List Manager majordomo@openssl.org
> >
> >

>
