Multi-Precision Arithmetic in Software
Lecture 11

Floating Point Operations

Fig. 17.3 The ANSI/IEEE standard floating-point number representation formats.
Table 17.1 Some features of the ANSI/IEEE standard floating-point number representation formats.
Fig. 17.4 Denormals in the IEEE single-precision format.
Fig. 18.5 Block diagram of a floating-point multiplier.
Fig. 18.6 Block diagram of a floating-point divider.
Fig. 18.1 Block diagram of a floating-point adder/subtractor.
Fig. 18.2 One bit-slice of a single-stage pre-shifter.
Fig. 18.3 Four-stage combinational shifter for preshifting an operand by 0 to 15 bits.
Fig. 18.4 Leading zeros/ones counting versus prediction.

Arithmetic Operations in Software

Little-Endian vs. Big-Endian Representation

Number (base 16): A0 B1 C2 D3 E4 F5 67 89, with MSB = A0 and LSB = 89.

Address    Big-Endian    Little-Endian
0          MSB = A0      LSB = 89
           B1            67
           C2            F5
           D3            E4
           E4            D3
           F5            C2
           67            B1
MAX        LSB = 89      MSB = A0

Little-Endian vs. Big-Endian Camps

Big-Endian (MSB at address 0, LSB at address MAX):
• Motorola 68xx, 680x0
• IBM
• Hewlett-Packard
• Sun SuperSPARC
• Internet TCP/IP

Little-Endian (LSB at address 0, MSB at address MAX):
• Intel
• DEC VAX
• RS 232

Bi-Endian:
• Motorola Power PC
• Silicon Graphics MIPS

Origin of the terms Little-Endian vs. Big-Endian

Jonathan Swift, Gulliver's Travels
• A law requires all citizens of Lilliput to break their soft-boiled eggs at the little ends only
• A civil war breaks out between the Little-Endians and the Big-Endians, and the Big-Endians take refuge on a nearby island, the kingdom of Blefuscu
• A satire on the holy wars between the Protestant Church of England and the Catholic Church of France

Little-Endian vs. Big-Endian: Advantages and Disadvantages

Big-Endian:
• easier to determine the sign of a number
• easier to compare two numbers
• easier to divide two numbers
• easier to print

Little-Endian:
• easier to write multiple-precision routines, especially addition and multiplication
• easier to load and store multibyte numbers

Pointers (1)

Memory contents, from address 0 to address MAX: 89 67 F5 E4 D3 C2 B1 A0

int * iptr;
With iptr pointing at address 0:
Big-Endian: (* iptr) = 8967;
Little-Endian: (* iptr) = 6789;
iptr + 1 points at the next int, the bytes F5 E4.

Pointers (2)

Memory contents, from address 0 to address MAX: 89 67 F5 E4 D3 C2 B1 A0

long int * lptr;
With lptr pointing at address 0:
Big-Endian: (* lptr) = 8967F5E4;
Little-Endian: (* lptr) = E4F56789;
lptr + 1 points at the next long int, the bytes D3 C2 B1 A0.

SOFTWARE MULTIPLICATION

$A = (A_{n-1}, A_{n-2}, \ldots, A_1, A_0)$: N bytes = n words
$B = (B_{n-1}, B_{n-2}, \ldots, B_1, B_0)$: N bytes = n words
$C = A \times B = (C_{2n-1}, C_{2n-2}, \ldots, C_1, C_0)$: 2N bytes = 2n words
1 word = l bytes = λ bits

Paper-and-Pencil Algorithm of Multiplication

Each partial product $A_i B_j$ is 2 words long. Each column sum $D_i$ is 3 words long.

$D_0 = A_0 B_0$
$D_1 = A_0 B_1 + A_1 B_0$
$D_2 = A_0 B_2 + A_1 B_1 + A_2 B_0$
. . .
$D_{2n-4} = A_{n-3} B_{n-1} + A_{n-2} B_{n-2} + A_{n-1} B_{n-3}$
$D_{2n-3} = A_{n-2} B_{n-1} + A_{n-1} B_{n-2}$
$D_{2n-2} = A_{n-1} B_{n-1}$

$C = \sum_{i=0}^{2n-2} D_i \, 2^{i\lambda}$

Assertion: $\log_2 n \le \lambda$

Paper-and-Pencil Algorithm of Squaring

$D_0 = A_0^2$
$D_1 = 2 A_0 A_1$
$D_2 = 2 A_0 A_2 + A_1^2$
. . .
$D_{2n-4} = 2 A_{n-3} A_{n-1} + A_{n-2}^2$
$D_{2n-3} = 2 A_{n-2} A_{n-1}$
$D_{2n-2} = A_{n-1}^2$

$C = A^2 = \sum_{i=0}^{2n-2} D_i \, 2^{i\lambda}$

Assertion: $\log_2 n \le \lambda$

Paper-and-Pencil Algorithm of Multiplication
Run Time Assuming Purely Sequential Execution of Instructions

$t_{MUL}^{paper-and-pencil}(N) = \frac{N^2}{l^2} \, t_M \left(1 + \frac{t_A}{t_M}\left(1 + \frac{4l}{N}\right)\right)$

l: word length in bytes
N: operand length in bytes
$t_M$: time of a single word multiplication
$t_A$: time of a single word addition

$t_{MUL}^{paper-and-pencil} = \Theta(N^2)$

Paper-and-Pencil Algorithm of Squaring
Run Time Assuming Purely Sequential Execution of Instructions

$\frac{t_{SQR}^{paper-and-pencil}}{t_{MUL}^{paper-and-pencil}} = \frac{1}{2}\left(1 + \frac{5 + \tau}{4 + n(1 + \tau)}\right)$, where $\tau = \frac{t_M}{t_A}$

$t_M$: time of a single word multiplication
$t_A$: time of a single word addition

For large n, $\frac{5 + \tau}{4 + n(1 + \tau)} \ll 1$, so $t_{SQR}^{paper-and-pencil} \approx \frac{1}{2} \, t_{MUL}^{paper-and-pencil}$.

Karatsuba Algorithm of Multiplication
Basic Recursive Step (1)

A and B are n words = N bytes long. Each operand is split into two halves of n/2 words = ν bits:

$A = (A_1, A_0)_{2^\nu}$
$B = (B_1, B_0)_{2^\nu}$
$C = A \times B = (C_3, C_2, C_1, C_0)_{2^\nu}$

$D_0 = A_0 B_0$
$D_1 = A_1 B_1$
$D_2 = (A_1 - A_0)(B_0 - B_1)$

The digits $C_0, C_1, C_2, C_3$ of C have weights $2^0, 2^\nu, 2^{2\nu}, 2^{3\nu}$, so C is 4ν bits long.

Karatsuba Algorithm of Multiplication
Basic Recursive Step (2)

$C = A B = (A_1 2^\nu + A_0)(B_1 2^\nu + B_0)$
$= A_1 B_1 \, 2^{2\nu} + [(A_1 - A_0)(B_0 - B_1) + A_0 B_0 + A_1 B_1] \, 2^\nu + A_0 B_0$
$= D_1 \, 2^{2\nu} + [D_2 + D_0 + D_1] \, 2^\nu + D_0$

Karatsuba Algorithm of Multiplication
Tree of Recursive Calls

For $n = 2^\alpha$, the root call on operands of $2^\alpha$ words makes three calls on operands of $2^{\alpha-1}$ words, each of these makes three calls on operands of $2^{\alpha-2}$ words, and so on, down to calls on 2-word operands, each of which makes three calls on 1-word operands.

Karatsuba Algorithm of Multiplication
Run Time Assuming Purely Sequential Execution of Instructions

$t_{MUL}^{Karatsuba}(N) = \left(\frac{N}{l}\right)^{\log_2 3} t_M \left(1 + \frac{1}{2}\frac{t_C}{t_M} + \frac{t_A}{t_M}\left(10 - 8\left(\frac{l}{N}\right)^{\log_2 3 - 1}\right)\right)$

l: word length in bytes
N: operand length in bytes
$t_M$: time of a single word multiplication
$t_A$: time of a single word addition
$t_C$: time of stack operations in every recursive call of the function

$t_{MUL}^{Karatsuba} = \Theta(N^{\log_2 3})$

Schönhage-Strassen Algorithm of Multiplication

F: Discrete Fourier Transform in the finite field GF(p), with $\omega_{2n}$ a primitive 2n-th root of unity in GF(p).

$C = A B = F^{-1}(F(A) \cdot F(B))$

$A = (0, \ldots, 0, A_{n-1}, \ldots, A_0)$, $F(A) = (\alpha_{2n-1}, \ldots, \alpha_0)$
$B = (0, \ldots, 0, B_{n-1}, \ldots, B_0)$, $F(B) = (\beta_{2n-1}, \ldots, \beta_0)$

$F: \alpha_i = \sum_{k=0}^{n-1} A_k \, \omega_{2n}^{ik}$
$F: \beta_i = \sum_{k=0}^{n-1} B_k \, \omega_{2n}^{ik}$

$(\alpha_{2n-1} \beta_{2n-1}, \ldots, \alpha_0 \beta_0) = F(A) \cdot F(B) = F(C) = (\gamma_{2n-1}, \ldots, \gamma_0)$

$C = (C_{2n-1}, \ldots, C_0)$
$F^{-1}: C_i = \frac{1}{2n} \sum_{k=0}^{2n-1} \gamma_k \, \omega_{2n}^{-ik}$

Schönhage-Strassen Algorithm of Multiplication
Run Time Assuming Purely Sequential Execution of Instructions

$t_{MUL}^{Schönhage-Strassen} = \Theta(N \log_2 N)$

Optimization for Squaring

$C = A^2 = F^{-1}(F(A) \cdot F(A))$

Squaring needs two transforms instead of three:

$\frac{t_{SQR}^{Schönhage-Strassen}}{t_{MUL}^{Schönhage-Strassen}} = \frac{2}{3}$

Comparison of Software Multiplication Algorithms

Name                            Complexity              Limitations         Optimizations for Squaring
Paper-and-pencil (classical)    $\Theta(n^2)$           none
Karatsuba (Karatsuba-Ofman)     $\Theta(n^{\log_2 3})$  $n = 2^k$
Schönhage-Strassen              $\Theta(n \ln n)$       n of the special
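The paper-and-pencil column sums and the Karatsuba basic recursive step can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the lecture's own code: the word size is fixed at 16 bits (λ = 16, l = 2), operands are stored least significant word first, and the names word_t, mp_mul_schoolbook and mp_mul_karatsuba2 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 1 word = 16 bits; operands are arrays of words,
   least significant word first. */
typedef uint16_t word_t;

/* Paper-and-pencil multiplication by columns:
   c[0..2n-1] = a[0..n-1] * b[0..n-1]. */
void mp_mul_schoolbook(const word_t *a, const word_t *b, size_t n, word_t *c)
{
    /* n <= 2^16 corresponds to the assertion log2(n) <= lambda. */
    assert(n > 0 && n <= 65536);
    uint64_t acc = 0;                 /* carry from earlier columns + D_k */
    for (size_t k = 0; k + 1 < 2 * n; k++) {
        size_t lo = (k < n) ? 0 : k - n + 1;
        size_t hi = (k < n) ? k : n - 1;
        for (size_t i = lo; i <= hi; i++)
            acc += (uint32_t)a[i] * b[k - i];   /* D_k = sum of A_i * B_(k-i) */
        c[k] = (word_t)acc;           /* C_k is the low word */
        acc >>= 16;                   /* the rest carries into column k+1 */
    }
    c[2 * n - 1] = (word_t)acc;
}

/* One Karatsuba step for n = 2 words (nu = 16 bits):
   three word multiplications instead of four. */
void mp_mul_karatsuba2(const word_t a[2], const word_t b[2], word_t c[4])
{
    uint32_t d0 = (uint32_t)a[0] * b[0];                            /* D0 = A0*B0 */
    uint32_t d1 = (uint32_t)a[1] * b[1];                            /* D1 = A1*B1 */
    int64_t  d2 = ((int64_t)a[1] - a[0]) * ((int64_t)b[0] - b[1]);  /* D2, may be negative */
    /* D2 + D0 + D1 equals A1*B0 + A0*B1, which is never negative. */
    uint64_t mid = (uint64_t)(d2 + (int64_t)d0 + (int64_t)d1);
    uint64_t prod = ((uint64_t)d1 << 32) + (mid << 16) + d0;
    for (int i = 0; i < 4; i++)
        c[i] = (word_t)(prod >> (16 * i));
}
```

Both functions should produce the same four result words for any pair of 2-word operands. The signed 64-bit intermediate in mp_mul_karatsuba2 is one simple way to handle a negative D2; a full multi-word Karatsuba would instead track the sign of each difference separately.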
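The two readings of the same memory in the Pointers slides can be reproduced with a short C sketch. It assumes the byte sequence 89 67 F5 E4 D3 C2 B1 A0 starting at address 0, and it reads the bytes explicitly instead of dereferencing int or long int pointers, because the widths of those types and the byte order of the host vary between platforms. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Interpret len bytes at p with the most significant byte at the lowest address. */
uint64_t load_big_endian(const unsigned char *p, size_t len)
{
    assert(len <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < len; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Interpret len bytes at p with the least significant byte at the lowest address. */
uint64_t load_little_endian(const unsigned char *p, size_t len)
{
    assert(len <= 8);
    uint64_t v = 0;
    for (size_t i = len; i-- > 0; )
        v = (v << 8) | p[i];
    return v;
}

/* Returns 1 if the host stores the least significant byte first, 0 otherwise. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the byte at the lowest address */
    return first == 1;
}
```

With the slide's memory contents, a 2-byte read at address 0 gives 8967 (hexadecimal) under the big-endian reading and 6789 under the little-endian one. A 4-byte read gives 8967F5E4 and E4F56789 respectively, matching the values shown for iptr and lptr.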