**Unformatted text preview:**

Accurate Floating Point ProductStef GraillatLIP6/PEQUAN - Université Pierre et Marie Curie (Paris 6)REC’08, Third International Worskhop on Reliable Engineering ComputingSavannah, Georgia, USA, February 20-22, 2008S. Graillat (Univ. Paris 6) Accurate Floating Point Product 1 / 25MotivationsDeterminant of a triangle matrixT =t11t12··· t1nt22t2n......tnn.det(T ) =nYi=1tii.Evaluation of a polynomial when represented by the root product formp(x) = anQni=1(x − xi)S. Graillat (Univ. Paris 6) Accurate Floating Point Product 2 / 25What are Error-Free Transformations (EFT) ?Assume floating point arithmetic adhering IEEE 754 with rounding tonearest with rounding unit u (no underflow nor overflow)Error free transformations are properties and algorithms to compute thegenerated elementary rounding errors,a, b entries ∈ F, a ◦ b = fl(a ◦b) + e, with e ∈ FKey tools for accurate computationfixed length expansions libraries : double-double (Briggs, Bailey, Hida,Li), quad-double (Bailey, Hida, Li)arbitrary length expansions libraries : Priest, Shewchukcompensated algorithms (Kahan, Priest, Ogita-Rump-Oishi,Graillat-Langlois-Louvet)S. Graillat (Univ. Paris 6) Accurate Floating Point Product 3 / 25EFT for the summationx = fl(a ± b) ⇒ a ± b = x + y with y ∈ F,Algorithms of Dekker (1971) and Knuth (1974)Algorithm 1 (EFT of the sum of 2 floating point numbers with|a| ≥ |b|)function [x, y] = FastTwoSum(a, b)x = fl(a + b)y = fl((a − x) + b)Algorithm 2 (EFT of the sum of 2 floating point numbers)function [x, y] = TwoSum(a, b )x = fl(a + b)z = fl(x − a)y = fl((a − (x − z)) + (b − z))S. Graillat (Univ. Paris 6) Accurate Floating Point Product 4 / 25EFT for the product (1/3)x = fl(a · b) ⇒ a · b = x + y with y ∈ F,Algorithm TwoProduct by Veltkamp and Dekker (1971)a = x + y and x and y non overlapping with |y| ≤ |x|.Algorithm 3 (Error-free split of a floating p oint number into twoparts)function [x, y] = Split(a, b )factor = fl(2s+ 1) % u = 2−p, s = dp/2ec = fl(factor · a)x = fl(c −(c −a))y = fl(a − x)S. Graillat (Univ. Paris 6) Accurate Floating Point Product 5 / 25EFT for the product (2/3)Algorithm 4 (EFT of the product of 2 floating point numbers)function [x, y] = TwoProduct(a, b)x = fl(a · b)[a1, a2] = Split(a)[b1, b2] = Split(b)y = fl(a2· b2− (((x − a1· b1) − a2· b1) − a1· b2))S. Graillat (Univ. Paris 6) Accurate Floating Point Product 6 / 25EFT for the product (3/3)Given a, b, c ∈ F,FMA(a, b, c) is the nearest floating point number a ·b + c ∈ FAlgorithm 5 (EFT of the product of 2 floating point numbers)function [x, y] = TwoProductFMA(a, b)x = fl(a · b)y = FMA(a, b, −x)The FMA is available for example on PowerPC, Itanium, Cell processors.S. Graillat (Univ. Paris 6) Accurate Floating Point Product 7 / 25SummaryTheorem 1Let a, b ∈ F and let x, y ∈ F such that [x, y] = TwoSum(a, b). T hen,a + b = x + y, x = fl(a + b), |y| ≤ u|x|, |y| ≤ u|a + b|.The algorithm TwoSum requires 6 flops.Let a, b ∈ F and let x, y ∈ F such that [x, y] = TwoProduct(a, b) . Then,a · b = x + y, x = fl(a · b), |y| ≤ u|x|, |y| ≤ u|a ·b|,The algorithm TwoProduct requires 17 flops.S. Graillat (Univ. Paris 6) Accurate Floating Point Product 8 / 25Classic method for computing productThe classic method for evaluating a pro duct of n numbersa = (a1, a2, . . . , an)p =nYi=1aiis the following algorithm.Algorithm 6 (Product evaluation)function res = Prod(a )p1= a1for i = 2 : npi= fl(pi−1· ai) % rounding error πiendres = pnThis algorithm requires n − 1 flopsS. Graillat (Univ. Paris 6) Accurate Floating Point Product 9 / 25Error analysisγn:=nu1 − nufor n ∈ N.A forward error bound is|a1a2···an− res| ≤ γn−1|a1a2···an|A validated error bound is|a1a2···an− res| ≤ flγn−1|res|1 − 2uS. Graillat (Univ. Paris 6) Accurate Floating Point Product 10 / 25Compensated method for computing productAlgorithm 7 (Product evaluation with a compensated scheme)function res = CompProd(a)p1= a1e1= 0for i = 2 : n[pi, πi] = TwoProduct(pi−1, ai)ei= fl(ei−1ai+ πi)endres = fl(pn+ en)This algorithm requires 19n − 18 flopsS. Graillat (Univ. Paris 6) Accurate Floating Point Product 11 / 25Compensated method for computing productAlgorithm 8 (Product evaluation with a compensated scheme withTwoProductFMA and FMA)function res = CompProdFMA(a)p1= a1e1= 0for i = 2 : n[pi, πi] = TwoProductFMA(pi−1, ai)ei= FMA(ei−1, ai, πi)endres = fl(pn+ en)This algorithm requires 3n − 2 flopsS. Graillat (Univ. Paris 6) Accurate Floating Point Product 12 / 25Error analysisTheorem 2Suppose Algorithm CompProd is applied to fl oating point number ai∈ F,1 ≤ i ≤ n, and set p =Qni=1ai. Then,|res − p| ≤ u|p| + γnγ2n|p|Condition number of the product evaluation :cond(a) = limε→0sup|(a1+ ∆a1) ···(an+ ∆an) − a1···an|ε|a1a2···an|: |∆ai| ≤ ε|ai|A standard computation yieldscond(a) = nS. Graillat (Univ. Paris 6) Accurate Floating Point Product 13 / 25Numerical experimentsLaptop with a Pentium M processor at 1.73GHz with gcc version 4.0.2.Tab.: Measured computing times with Prod normalised to 1.0n Prod CompProd100 1.0 3.5500 1.0 4.41000 1.0 5.010000 1.0 4.9100000 1.0 5.5The theoretical ratio for CompProd is 19.Compensated algorithms are generally faster than the theoreticalperformances→ due to a better instruction-level parallelism.S. Graillat (Univ. Paris 6) Accurate Floating Point Product 14 / 25Validated error boundLemma 1Suppose Algorithm CompProd is applied to fl oating point numbers ai∈ F,1 ≤ i ≤ n and set p =Qni=1ai. Then, the absolute forward error affectingthe product is bounded according to|res − p| ≤ flu|res| +γnγ2n|a1a2···an|1 − (n + 3)u/ (1 −2u).S. Graillat (Univ. Paris 6) Accurate Floating Point Product 15 / 25Faithful rounding (1/3)Floating point predecessor and successor of a real number r satisfyingmin{f : f ∈ R} < r < max{f : f ∈ F} :pred(r) := max{f ∈ F : f < r } and succ(r) := min{f ∈ F : r < f }.Definition 1A floating point number f ∈ F is called a faithful rounding of a realnumber r ∈ R ifpred(f ) < r < succ(f ).We denote this by f ∈ (r). For r ∈ F, this implies that f = r .Faithful rounding means that the computed result is equal to the exactresult if the latter is a floating point number and otherwise is one of thetwo adjacent floating point numbers of the exact result.S. Graillat (Univ. Paris 6) Accurate Floating Point