Floating Point Arithmetic and Dense Linear Algebra Lecture 12 CS 594 Jack Dongarra Question Suppose we want to compute using four decimal arithmetic S 1 000 1 000x104 1 000x104 What s the answer Ariane 5 rocket June 1996 exploded when a 64 bit fl pt number relating to the horizontal velocity of the rocket was converted to a 16 bit signed integer The number was larger than 32 767 and thus the conversion failed 2 500M rocket and cargo 1 Two sources of numerical error 1 Round off error 2 Truncation error 3 Round off Error Caused by representing a number approximately 4 2 Problems created by round off error 28 Americans were killed on February 25 1991 by an Iraqi Scud missile in Dhahran Saudi Arabia The patriot defense system failed to track and intercept the Scud Why 5 Problem with Patriot missile Clock cycle of 1 10 seconds was represented in 24 bit fixed point register created an error of 9 5 x 10 8 seconds 1 decimal 00111101110011001100110011001101 binary The battery was on for 100 consecutive hours thus causing an inaccuracy of 6 3 Problem cont The shift calculated in the ranging system of the missile was 687 meters The target was considered to be out of range at a distance greater than 137 meters 7 Truncation error Error caused by truncating or approximating a mathematical procedure 8 4 Example of Truncation Error Taking only a few terms of a Maclaurin series to approximate If only 3 terms are used 9 Defining Floating Point Arithmetic Representable numbers Scientific notation d d d x rexp sign bit radix r usually 2 or 10 sometimes 16 significand d d d how many base r digits d exponent exp range others Operations arithmetic x how to round result to fit in format comparison conversion between different formats short to long FP numbers FP to integer exception handling what to do for 0 0 2 largest number etc binary decimal conversion for I O when radix not 10 10 5 IEEE Floating Point Arithmetic Standard 754 1985 Normalized Numbers Normalized Nonzero Representable Numbers 1 d d x 2exp Macheps Machine epsilon 2 significand bits relative error in each operation smallest number such that fl 1 1 OV overflow threshold largest number UN underflow threshold smallest number Format bits significand bits macheps exponent bits exponent range Single 32 23 1 2 24 10 7 8 2 126 2127 10 38 Double 64 52 1 2 53 10 16 11 2 1022 21023 10 308 Double 80 64 2 64 10 19 15 2 16382 216383 10 4932 Extended 80 bits on all Intel machines Zero significand and exponent all zero 11 Why bother with 0 later IEEE Floating Point Arithmetic Standard 754 Denorms Denormalized Numbers 0 d d x 2min exp sign bit nonzero significand minimum exponent Fills in gap between UN and 0 Underflow Exception occurs when exact nonzero result is less than underflow threshold UN Ex UN 3 return a denorm or zero 12 6 IEEE Floating Point Arithmetic Standard 754 Infinity Infinity Sign bit zero significand maximum exponent Exception Overflow occurs when exact finite result too large to represent accurately Ex 2 OV return infinity Divide by zero Exception return infinity 1 0 sign of zero important Also return infinity for 3 infinity 2 infinity infinity infinity Result is exact not an exception 13 IEEE Floating Point Arithmetic Standard 754 NAN Not A Number NAN Invalid Sign bit nonzero significand maximum exponent Exception occurs when exact result not a well defined real number 0 0 sqrt 1 infinity infinity infinity infinity 0 infinity NAN 3 NAN 3 Return a NAN in all these cases Two kinds of NANs Quiet propagates without raising an exception Signaling generate an exception when touched good for detecting uninitialized data 14 7 Error Analysis Basic error formula fl a op b a op b 1 d where op one of d macheps assuming no overflow underflow or divide by zero Example adding 4 numbers fl x1 x2 x3 x4 x1 x2 1 d1 x3 1 d2 x4 1 d3 x1 1 d1 1 d2 1 d3 x2 1 d1 1 d2 1 d3 x3 1 d2 1 d3 x4 1 d3 x1 1 e1 x2 1 e2 x3 1 e3 x4 1 e4 where each ei 3 macheps get exact sum of slightly changed summands xi 1 ei Backward Error Analysis algorithm called numerically stable if it gives the exact result for slightly changed inputs 15 Numerical Stability is an algorithm design goal Backward error Approximate solution is exact solution to modified problem How large a modification to original problem is required to give result actually obtained How much data error in initial input would be required to explain all the error in computed results Approximate solution is good if it is exact solution to nearby problem x f f x f Backward error Forward error f x x f 16 8 Sensitivity and Conditioning Problem is insensitive or well conditioned if relative change in input causes commensurate relative change in solution Problem is sensitive or ill conditioned if relative change in solution can be much larger than that in input data Cond Relative change in solution Relative change in input data f x f x f x x x x Problem is sensitive or ill conditioned if cond 1 When function f is evaluated for approximate input x x h instead of true input value of x Absolute error f x h f x h f x Relative error f x h f x f x h f x f x 17 Sensitivity 2 Examples cos 2 and 2 d System of Equations Consider problem of computing cosine function for arguments near 2 Let x 2 and let h be small perturbation to x Then Abs f x h f x h f x Rel f x h f x f x h f x f x a x1 bx2 f c x1 dx2 g absolute error cos x h cos x h sin x h relative error h tan x So small change in x near 2 causes large relative change in cos x regardless of method used cos 1 57079 0 63267949 x 10 5 cos 1 57078 1 64267949 x 10 5 Relative change in output is a quarter million times greater than relative change in input 18 9 Sensitivity 2 Examples cos 2 and 2 d System of Equations Consider problem of computing cosine function for arguments near 2 Let x 2 and let h be small perturbation to x Then a x1 bx2 f c x1 dx2 g absolute error cos x h cos x h sin x h relative error h tan x So small change in x near 2 causes large relative change in cos x regardless of method used cos 1 57079 0 63267949 x 10 5 cos 1 57078 1 64267949 x 10 5 Relative change in output is a quarter million times greater than relative change in input 19 Exception Handling What happens when the exact value is not a real number or too small or too large to represent accurately 5 …
View Full Document
Unlocking...