Integer multiply performance of UltraSPARC chips  SUN
Integer multiply performance of UltraSPARC chips
There are a few very negative comments about the performance of
UltraSPARC processors on the pages of the GNU multi precision library
(GMP), which is used for multiplying large integers.
On this page, which has some benchmarks:
http://www.swox.com/gmp/gmpbench.html
it says "UltraSPARC 3's terrible scores are a result of its uniquely
poor integer multiply support (unsuitable architectural support +
simplistic integer multiply implementation)."
And on this page, which discusses the performance of 32- vs 64-bit
processors for computing with very large integers (>> 64 bits):
http://www.swox.com/gmp/32vs64.html
it says:
"Now, UltraSPARC is a particularly poor example for showing the
superiority of 64-bit processors for this problem domain, since this
processor has a uniquely poor instruction set for bignum operations."
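To make the 32- vs 64-bit argument concrete: schoolbook multiplication of two n-limb numbers needs n*n limb-by-limb products, and each one must be a full-width (limb x limb -> double-limb) multiply. Going from 32-bit to 64-bit limbs halves n and so cuts the number of products by about four, but only if the processor can produce the double-width product cheaply. Below is a minimal sketch in C with 32-bit limbs, where the 64-bit product comes from plain uint64_t arithmetic. The function name mul_schoolbook is hypothetical and is not GMP's mpn API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: schoolbook multiply of two little-endian arrays
   of 32-bit limbs. r must have room for un + vn limbs and must not
   overlap u or v. Every inner step needs one 32x32->64 multiply. */
void mul_schoolbook(uint32_t *r, const uint32_t *u, size_t un,
                    const uint32_t *v, size_t vn)
{
    for (size_t i = 0; i < un + vn; i++)
        r[i] = 0;

    for (size_t j = 0; j < vn; j++) {
        uint64_t carry = 0;
        for (size_t i = 0; i < un; i++) {
            /* (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so this never overflows */
            uint64_t t = (uint64_t)u[i] * v[j] + r[i + j] + carry;
            r[i + j] = (uint32_t)t;   /* low half stays in place */
            carry = t >> 32;          /* high half moves to the next limb */
        }
        r[j + un] = (uint32_t)carry;  /* this limb has not been written yet */
    }
}
```

With 64-bit limbs the same loop would need a 64x64->128 multiply in the inner step, which is exactly the operation the GMP pages complain about.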
Certainly the GMP benchmark results for UltraSPARC processors are very
poor, but I wonder how much of this is due to a misunderstanding on the
part of the GMP developers. I would have expected 64-bit instructions to
improve performance considerably for this task, but on my Ultra 80 the
gains are only a few percent.
Looking at the source distribution
http://ftp.sunet.se/pub/gnu/gmp/gmp-4.1.4.tar.gz
there is a README (in the directory gmp-4.1.4/mpn/sparc64) with, again,
some very negative comments about the chips.
I'd be interested to hear from anyone who knows more about these chips.
If the assembly routines are badly written, perhaps they could let
the GMP developers know.
That library is used as part of some expensive commercial software,
Mathematica being one example.

Dave K
http://www.southminsterbranchline.org.uk/
Please note my email address changes periodically to avoid spam.
It is always of the form: monthyear@domain. Hitting reply will work
for a couple of months only. Later set it manually. The month is
always written in 3 letters (e.g. Jan, not January etc)

Re: Integer multiply performance of UltraSPARC chips
Dave writes:
>"Now, UltraSPARC is a particularly poor example for showing the
>superiority of 64bit processors for this problem domain, since this
>processor has a uniquely poor instruction set for bignum operations."
I think they're complaining about the lack of a 64x64->128 bit multiply.
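What that missing instruction costs can be sketched as follows. Bignum code wants every 64-bit limb product as a full 128-bit result. If the only 64-bit multiply available returns just the low 64 bits of the product, software has to split each operand into 32-bit halves, do four multiplies, and then combine the pieces with shifts and adds. The sketch below shows this. The function name umul_64x64_128 is hypothetical, and the code illustrates the general technique rather than GMP's actual code.

```c
#include <stdint.h>

/* Hypothetical helper: full 64x64->128 product built from four
   32x32->64 multiplies, for when only a low-half multiply exists. */
void umul_64x64_128(uint64_t u, uint64_t v, uint64_t *hi, uint64_t *lo)
{
    uint64_t u0 = u & 0xFFFFFFFFu, u1 = u >> 32;
    uint64_t v0 = v & 0xFFFFFFFFu, v1 = v >> 32;

    uint64_t p00 = u0 * v0;   /* weight 2^0  */
    uint64_t p01 = u0 * v1;   /* weight 2^32 */
    uint64_t p10 = u1 * v0;   /* weight 2^32 */
    uint64_t p11 = u1 * v1;   /* weight 2^64 */

    /* Middle column: three terms, each below 2^32, so no overflow. */
    uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFu) + (p10 & 0xFFFFFFFFu);

    *lo = (mid << 32) | (p00 & 0xFFFFFFFFu);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

Four multiplies plus several adds per limb product is why a 64-bit limb can end up barely faster than two 32-bit limbs.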
Casper

Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Re: Integer multiply performance of UltraSPARC chips
Casper H.S. Dik wrote:
> Dave writes:
>
>
>>"Now, UltraSPARC is a particularly poor example for showing the
>>superiority of 64bit processors for this problem domain, since this
>>processor has a uniquely poor instruction set for bignum operations."
>
>
> I think they're complaining about the lack of a 64x64->128 bit multiply.
>
> Casper
If so, it is far from the GMP developers' only gripe; at one point they
even use floating-point instructions rather than integer ones!
They also complain about "Integer conditional move instructions". I
should not have put the word multiply in the subject line really, but I
will not change it now.
Here are some comments from one of the README files, with a note that
the UltraSPARC 3 is slower (I assume compared to UltraSPARC 2).
From gmp-4.1.4/mpn/sparc64/README:
The 64-bit integer multiply instruction mulx takes from 5 cycles to 35
cycles, depending on the position of the most significant bit of the
first source operand. When used for 32x32->64 multiplication, it needs
20 cycles. Furthermore, it stalls the processor while executing. We
stay away from that instruction, and instead use floating-point operations.
Integer conditional move instructions cannot dual-issue with other
integer instructions. No conditional move can issue 1-5 cycles after a
load. (Or something such bizarre.) Useless.
The UltraSPARC-3 pipeline seems similar, but is somewhat more rigid.
Branches execute slower, and there may be other new stalls. Integer
multiply doesn't halt the CPU and also has a much lower latency. But
it's still not pipelined, and thus useless for our needs.
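Why floating point can stand in for integer multiply: a double has a 53-bit significand, so the product of two integers is exact as long as it fits in 53 bits. The README gives no details of the split GMP uses. The sketch below assumes one (one operand in 32-bit halves, the other in 16-bit quarters) so that every partial product stays below 2^48. It then adds the exact products back together with integer arithmetic. The function names umul_via_double and add_shifted are hypothetical. This illustrates the idea only and is not GMP's sparc64 code.

```c
#include <stdint.h>

/* Hypothetical helper: add x << shift into the 128-bit value hi:lo.
   Assumes x < 2^48 and shift <= 80, so nothing is shifted out. */
static void add_shifted(uint64_t *hi, uint64_t *lo, uint64_t x, unsigned shift)
{
    uint64_t add_lo, add_hi;
    if (shift == 0)      { add_lo = x;          add_hi = 0; }
    else if (shift < 64) { add_lo = x << shift; add_hi = x >> (64 - shift); }
    else                 { add_lo = 0;          add_hi = x << (shift - 64); }
    uint64_t old = *lo;
    *lo += add_lo;
    *hi += add_hi + (*lo < old);   /* propagate carry out of the low word */
}

/* Hypothetical: 64x64->128 multiply where the multiplications
   themselves are done by the floating-point unit. */
void umul_via_double(uint64_t u, uint64_t v, uint64_t *hi, uint64_t *lo)
{
    *hi = 0;
    *lo = 0;
    for (unsigned i = 0; i < 2; i++) {
        double ud = (double)((u >> (32 * i)) & 0xFFFFFFFFu);    /* exact: < 2^32 */
        for (unsigned j = 0; j < 4; j++) {
            double vd = (double)((v >> (16 * j)) & 0xFFFFu);    /* exact: < 2^16 */
            double p = ud * vd;    /* < 2^48, so exact in a 53-bit significand */
            add_shifted(hi, lo, (uint64_t)p, 32 * i + 16 * j);
        }
    }
}
```

Trading one integer multiply for eight floating-point multiplies can only pay off if the FP multiplier accepts a new operation every cycle or so. The README says the integer multiplier does not, since it is slow, stalls the processor, and is not pipelined.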

Dave K