Eric Young (eay@cryptsoft.com)
Fri, 15 May 1998 16:02:00 +1000 (EST)
This CPU extension does look nice.
I'll ignore the applicability to symmetric ciphers and message digests, which
all tend to be a jumble of data flow dependencies :-).
For Bignum multiplication, it looks very nice.
It has 32 128bit registers.
One nice instruction is one that will take the 32 bytes from 2 16 byte
registers and place 16 of them into arbitrary bytes in a third register.
It appears very, very simple to re-order bytes all over the place.
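
To make the byte shuffle concrete, here is a rough scalar model in C. It is
only a sketch of my reading of the instruction: the names vec128,
permute_bytes and reverse_bytes are made up for illustration, and I am
assuming the choice of bytes comes from a selector register holding one index
(0..31) per result byte. Out-of-range indices are simply rejected here, since
I am not sure what the hardware does with them.

```c
#include <assert.h>
#include <stdint.h>

/* One 128-bit register viewed as 16 bytes (hypothetical model type). */
typedef struct { uint8_t b[16]; } vec128;

/* Scalar model of the two-source byte permute.
 * Indices 0..15 select a byte of a, 16..31 select a byte of b.
 * The selector register and its encoding are assumptions. */
static vec128 permute_bytes(vec128 a, vec128 b, vec128 sel)
{
    vec128 r;
    for (int i = 0; i < 16; i++) {
        unsigned idx = sel.b[i];
        assert(idx < 32);  /* this model refuses out-of-range indices */
        r.b[i] = (idx < 16) ? a.b[idx] : b.b[idx - 16];
    }
    return r;
}

/* Example use: reverse the byte order of one register. */
static vec128 reverse_bytes(vec128 a)
{
    vec128 sel;
    for (int i = 0; i < 16; i++)
        sel.b[i] = (uint8_t)(15 - i);
    return permute_bytes(a, a, sel);
}
```

With a selector of 0, 2, 4, ..., 30 the same routine would interleave the even
bytes of both inputs, which is the sort of re-ordering a bignum inner loop
needs when lining up partial products.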
For 128*128 -> 256 bit multiplications, we have
 ((32bit)(8 16bit*16bit->4 32bit)+
         (8 16bit*16bit->4 32bit))+4*32bit => 4 * 32bit
and a probably more useful
 ((32bit)(16 8bit*8bit -> 16bit)+
         (16 8bit*8bit -> 16bit)+
         (16 8bit*8bit -> 16bit)+
         (16 8bit*8bit -> 16bit))+4*32bit => 4 * 32bit
instructions.  The problem with the first is that there can be overflow.
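
A scalar sketch of the two multiply-sum forms shows where the overflow comes
from. This is my reading of the notation above, not a reference model: the
names u32x4, msum16, msum16_exact_lane and msum8 are invented for
illustration, and the pairing of adjacent elements into each 32bit lane is an
assumption. Results are taken modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit lanes of one 128-bit register (hypothetical model type). */
typedef struct { uint32_t w[4]; } u32x4;

/* 16-bit form: each lane gets two 16x16 products plus the accumulator.
 * Two products alone can need 33 bits, so the lane can wrap. */
static u32x4 msum16(const uint16_t a[8], const uint16_t b[8], u32x4 acc)
{
    u32x4 r;
    for (int i = 0; i < 4; i++) {
        uint32_t p0 = (uint32_t)a[2 * i] * (uint32_t)b[2 * i];
        uint32_t p1 = (uint32_t)a[2 * i + 1] * (uint32_t)b[2 * i + 1];
        r.w[i] = p0 + p1 + acc.w[i];  /* wraps silently modulo 2^32 */
    }
    return r;
}

/* The same lane computed exactly in 64 bits, for comparison. */
static uint64_t msum16_exact_lane(const uint16_t a[8], const uint16_t b[8],
                                  u32x4 acc, int lane)
{
    uint64_t p0 = (uint64_t)a[2 * lane] * b[2 * lane];
    uint64_t p1 = (uint64_t)a[2 * lane + 1] * b[2 * lane + 1];
    return p0 + p1 + acc.w[lane];
}

/* 8-bit form: each lane gets four 8x8 products plus the accumulator. */
static u32x4 msum8(const uint8_t a[16], const uint8_t b[16], u32x4 acc)
{
    u32x4 r;
    for (int i = 0; i < 4; i++) {
        uint32_t sum = 0;
        for (int j = 0; j < 4; j++)
            sum += (uint32_t)a[4 * i + j] * (uint32_t)b[4 * i + j];
        assert(sum <= 4u * 255u * 255u);  /* products never exceed 260100 */
        r.w[i] = sum + acc.w[i];          /* only a large accumulator wraps */
    }
    return r;
}
```

With every 16bit input set to 0xFFFF and a zero accumulator, the exact lane
value is 0x1FFFC0002 but the 32bit lane holds 0xFFFC0002. The four 8bit
products top out at 260100, so the 8bit form leaves plenty of headroom for
accumulating many rounds before the lane has to be reduced.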
For carry propagation over 32bit words, there are vector comparison functions
which operate on 4 32bit numbers at a time.
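
One way the comparisons could be used for carries, sketched in scalar C. This
is just an illustration of the idea, not a worked-out instruction sequence:
u32x4, vadd, vcmplt and add128 are hypothetical names, and I am assuming w[0]
is the least significant word. After a lane-wise add, a lane has produced a
carry exactly when the (unsigned) sum is less than one of the inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four 32-bit lanes; w[0] is the least significant word (assumption). */
typedef struct { uint32_t w[4]; } u32x4;

/* Lane-wise add, each lane modulo 2^32. */
static u32x4 vadd(u32x4 a, u32x4 b)
{
    u32x4 r;
    for (int i = 0; i < 4; i++)
        r.w[i] = a.w[i] + b.w[i];
    return r;
}

/* Lane-wise unsigned compare: all ones where a < b, else zero. */
static u32x4 vcmplt(u32x4 a, u32x4 b)
{
    u32x4 r;
    for (int i = 0; i < 4; i++)
        r.w[i] = (a.w[i] < b.w[i]) ? 0xFFFFFFFFu : 0u;
    return r;
}

/* 128-bit add built from the two operations above.
 * Stores the sum in *sum and returns the carry out of the top word. */
static uint32_t add128(u32x4 a, u32x4 b, u32x4 *sum)
{
    assert(sum != NULL);
    u32x4 s = vadd(a, b);
    u32x4 c = vcmplt(s, a);        /* mask of lanes that wrapped */
    uint32_t carry_out = 0;
    for (;;) {
        u32x4 k;
        carry_out |= c.w[3] & 1u;  /* carry leaving the top lane */
        k.w[0] = 0;
        for (int i = 1; i < 4; i++)
            k.w[i] = c.w[i - 1] & 1u;  /* move each carry up one lane */
        if ((k.w[1] | k.w[2] | k.w[3]) == 0)
            break;                 /* nothing left to propagate */
        u32x4 t = vadd(s, k);
        c = vcmplt(t, s);          /* adding a carry may wrap again */
        s = t;
    }
    *sum = s;
    return carry_out;
}
```

The worst case (all ones plus one) ripples through all four lanes, so the loop
runs a few times; in the common case it exits after the first compare. On the
real hardware the lane shift would presumably be one more byte re-ordering
instruction.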
Depending on how well it pipelines things.....
eric