DOC PREVIEW
UTK CS 594 - Floating Point Arithmetic and Dense Linear Algebra

This preview shows page 1-2-14-15-29-30 out of 30 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1 Floating Point Arithmetic and Dense Linear Algebra Lecture 12 CS 594 Jack Dongarra 2 Question:  Suppose we want to compute using four decimal arithmetic: S = 1.000 + 1.000x104 – 1.000x104 What’s the answer?  Ariane 5 rocket June 1996 exploded when a 64 bit fl pt number relating to the horizontal velocity of the rocket was converted to a 16 bit signed integer. The number was larger than 32,767, and thus the conversion failed. $500M rocket and cargo2 3 Two sources of numerical error 1) Round off error 2) Truncation error 4 Round off Error  Caused by representing a number approximately3 5 Problems created by round off error  28 Americans were killed on February 25, 1991 by an Iraqi Scud missile in Dhahran, Saudi Arabia.  The patriot defense system failed to track and intercept the Scud. Why? 6 Problem with Patriot missile  Clock cycle of 1/10 seconds was represented in 24-bit fixed point register created an error of 9.5 x 10-8 seconds.  The battery was on for 100 consecutive hours, thus causing an inaccuracy of .1 (decimal) = 00111101110011001100110011001101 (binary)4 7 Problem (cont.)  The shift calculated in the ranging system of the missile was 687 meters.  The target was considered to be out of range at a distance greater than 137 meters. 8 Truncation error  Error caused by truncating or approximating a mathematical procedure.5 9 Example of Truncation Error Taking only a few terms of a Maclaurin series to approximate If only 3 terms are used, 10 Defining Floating Point Arithmetic  Representable numbers Scientific notation: +/- d.d…d x rexp sign bit +/- radix r (usually 2 or 10, sometimes 16) significand d.d…d (how many base-r digits d?) exponent exp (range?) others?  Operations: arithmetic: +,-,x,/,... » how to round result to fit in format comparison (<, =, >) conversion between different formats » short to long FP numbers, FP to integer exception handling » what to do for 0/0, 2*largest_number, etc. binary/decimal conversion » for I/O, when radix not 106 11 IEEE Floating Point Arithmetic Standard 754 (1985) - Normalized Numbers  Normalized Nonzero Representable Numbers: +- 1.d…d x 2exp  Macheps = Machine epsilon = 2-#significand bits = relative error in each operation smallest number  such that fl( 1 +  ) > 1  OV = overflow threshold = largest number  UN = underflow threshold = smallest number  +- Zero: +-, significand and exponent all zero Why bother with -0 later Format # bits #significand bits macheps #exponent bits exponent range ---------- -------- ----------------------- ------------ -------------------- ---------------------- Single 32 23+1 2-24 (~10-7) 8 2-126 - 2127 (~10+-38) Double 64 52+1 2-53 (~10-16) 11 2-1022 - 21023 (~10+-308) Double >=80 >=64 <=2-64(~10-19) >=15 2-16382 - 216383 (~10+-4932) Extended (80 bits on all Intel machines) 12 IEEE Floating Point Arithmetic Standard 754 - “Denorms”  Denormalized Numbers: +-0.d…d x 2min_exp sign bit, nonzero significand, minimum exponent Fills in gap between UN and 0  Underflow Exception occurs when exact nonzero result is less than underflow threshold UN Ex: UN/3 return a denorm, or zero7 13 IEEE Floating Point Arithmetic Standard 754 - +- Infinity +- Infinity: Sign bit, zero significand, maximum exponent Overflow Exception occurs when exact finite result too large to represent accurately Ex: 2*OV return +- infinity Divide by zero Exception return +- infinity = 1/+-0 sign of zero important! Also return +- infinity for 3+infinity, 2*infinity, infinity*infinity Result is exact, not an exception! 14 IEEE Floating Point Arithmetic Standard 754 - NAN (Not A Number) NAN: Sign bit, nonzero significand, maximum exponent Invalid Exception occurs when exact result not a well-defined real number 0/0 sqrt(-1) infinity-infinity, infinity/infinity, 0*infinity NAN + 3 NAN > 3? Return a NAN in all these cases Two kinds of NANs Quiet - propagates without raising an exception Signaling - generate an exception when touched » good for detecting uninitialized data8 15 Error Analysis Basic error formula fl(a op b) = (a op b)*(1 + d) where » op one of +,-,*,/ » |d|  macheps » assuming no overflow, underflow, or divide by zero Example: adding 4 numbers fl(x1+x2+x3+x4) = {[(x1+x2)*(1+d1) + x3]*(1+d2) + x4}*(1+d3) = x1*(1+d1)*(1+d2)*(1+d3) + x2*(1+d1)*(1+d2)*(1+d3) + x3*(1+d2)*(1+d3) + x4*(1+d3) = x1*(1+e1) + x2*(1+e2) + x3*(1+e3) + x4*(1+e4) where each |ei| <~ 3*macheps get exact sum of slightly changed summands xi*(1+ei) Backward Error Analysis - algorithm called numerically stable if it gives the exact result for slightly changed inputs Numerical Stability is an algorithm design goal 16 Backward error  Approximate solution is exact solution to modified problem.  How large a modification to original problem is required to give result actually obtained?  How much data error in initial input would be required to explain all the error in computed results?  Approximate solution is good if it is exact solution to “nearby” problem. f f(x) x Forward error Backward error f’(x) f’ f x’9 17 Sensitivity and Conditioning  Problem is insensitive or well conditioned if relative change in input causes commensurate relative change in solution.  Problem is sensitive or ill-conditioned, if relative change in solution can be much larger than that in input data. Cond = |Relative change in solution| / |Relative change in input data| = |[f(x’) – f(x)]/f(x)| / |(x’ – x)/x|  Problem is sensitive, or ill-conditioned, if cond >> 1.  When function f is evaluated for approximate input x’ = x+h instead of true input value of x.  Absolute error = f(x + h) – f(x) h f’(x) 


View Full Document

UTK CS 594 - Floating Point Arithmetic and Dense Linear Algebra

Documents in this Course
Load more
Download Floating Point Arithmetic and Dense Linear Algebra
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Floating Point Arithmetic and Dense Linear Algebra and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Floating Point Arithmetic and Dense Linear Algebra 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?