DOC PREVIEW
Berkeley COMPSCI C267 - Floating Point Arithmetic

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1CS267 L13 Floating Point.1Demmel Sp 1999CS 267 Applications of Parallel ComputersLecture 13: Floating Point ArithmeticJames Demmelhttp://www.cs.berkeley.edu/~demmel/cs267_Spr99CS267 L13 Floating Point.2Demmel Sp 1999Outline° A little history° IEEE floating point formats° Error analysis° Exception handling• Using exception handling to go faster° How to get extra precision cheaply° Cray arithmetic - a pathological example° Dangers of Parallel and Heterogeneous Computing2CS267 L13 Floating Point.3Demmel Sp 1999A little history° Von Neumann and Goldstine - 1947• “Can’t expect to solve most big [n>15] linear systems without carrying many decimaldigits [d>8], otherwise the computed answer would be completely inaccurate.” -WRONG!° Turing - 1949• “Carrying d digits is equivalent to changing the input data in the d-th place and thensolving Ax=b. So if A is only known to d digits, the answer is as accurate as the datadeserves.”• Backward Error Analysis° Rediscovered in 1961 by Wilkinson and publicized° Starting in the 1960s- many papers doing backward error analysis ofvarious algorithms° Many years where each machine did FP arithmetic slightly differently• Both rounding and exception handling differed• Hard to write portable and reliable software• Motivated search for industry-wide standard, beginning late 1970s• First implementation: Intel 8087° ACM Turing Award 1989 to W. Kahan for design of the IEEE FloatingPoint Standards 754 (binary) and 854 (decimal)• Nearly universally implemented in general purpose machinesCS267 L13 Floating Point.4Demmel Sp 1999Defining Floating Point Arithmetic° Representable numbers• Scientific notation: +/- d.d…d x rexp• sign bit +/-• radix r (usually 2 or 10, sometimes 16)• significand d.d…d (how many base-r digits d?)• exponent exp (range?)• others?° Operations:• arithmetic: +,-,x,/,...- how to round result to fit in format• comparison (<, =, >)• conversion between different formats- short to long FP numbers, FP to integer• exception handling- what to do for 0/0, 2*largest_number, etc.• binary/decimal conversion- for I/O, when radix not 10° Language/library support for these operations3CS267 L13 Floating Point.5Demmel Sp 1999IEEE Floating Point Arithmetic Standard 754 - Normalized Numbers° Normalized Nonzero Representable Numbers: +- 1.d…d x 2exp• Macheps = Machine epsilon = 2-#significand bits = relative error in each operation• OV = overflow threshold = largest number• UN = underflow threshold = smallest number° +- Zero: +-, significand and exponent all zero• Why bother with -0 laterFormat # bits #significand bits macheps #exponent bits exponent range---------- -------- ----------------------- ------------ -------------------- ----------------------Single 32 23+1 2-24 (~10-7) 8 2-126 - 2127 (~10+-38)Double 64 52+1 2-53 (~10-16) 11 2-1022 - 21023 (~10+-308)Double >=80 >=64 <=2-64(~10-19) >=15 2-16382 - 216383 (~10+-4932) Extended (80 bits on all Intel machines)CS267 L13 Floating Point.6Demmel Sp 1999Rules for performing arithmetic° As simple as possible:• Take the exact value, and round it to the nearest floating pointnumber (correct rounding)• Break ties by rounding to nearest floating point number whosebottom bit is zero (rounding to nearest even)• Other rounding options too (up, down, towards 0)° Don’t need exact value to do this!• Early implementors worried it might be too expensive, but it isn’t° Applies to• +,-,*,/• sqrt• conversion between formats• rem(a,b) = remainder of a after dividing by b- a = q*b + rem, q = floor(a/b)- cos(x) = cos(rem(x,2*pi)) for |x| >= 2*pi- cos(x) is exactly periodic, with period rounded(2*pi)4CS267 L13 Floating Point.7Demmel Sp 1999Error Analysis° Basic error formula• fl(a op b) = (a op b)*(1 + d) where- op one of +,-,*,/- |d| <= macheps- assuming no overflow, underflow, or divide by zero° Example: adding 4 numbers• fl(x1+x2+x3+x4) = {[(x1+x2)*(1+d1) + x3]*(1+d2) + x4}*(1+d3) = x1*(1+d1)*(1+d2)*(1+d3) + x2*(1+d1)*(1+d2)*(1+d3) + x3*(1+d2)*(1+d3) + x4*(1+d3) = x1*(1+e1) + x2*(1+e2) + x3*(1+e3) + x4*(1+e4) where each |ei| <~ 3*macheps• get exact sum of slightly changed summands xi*(1+ei)• Backward Error Analysis - algorithm called numerically stable if itgives the exact result for slightly changed inputs• Numerical Stability is an algorithm design goalCS267 L13 Floating Point.8Demmel Sp 1999Example: polynomial evaluation using Horner’s rule° Horner’s rule to evaluate p = Σ ck * xk• p = cn, for k=n-1 downto 0, p = x*p + ck° Numerically Stable° Apply to (x-2)9 = x9 - 18*x8 + … - 512k=0n5CS267 L13 Floating Point.9Demmel Sp 1999Example: polynomial evaluation (continued)° (x-2)9 = x9 - 18*x8 + … - 512° We can compute error bounds using• fl(a op b)=(a op b)*(1+d)CS267 L13 Floating Point.10Demmel Sp 1999What happens when the “exact value” isnot a real number, or is too small or toolarge to represent accurately?You get an “exception”6CS267 L13 Floating Point.11Demmel Sp 1999Exception Handling° What happens when the “exact value” is not a realnumber, or too small or too large to representaccurately?° 5 Exceptions:• Overflow - exact result > OV, too large to represent• Underflow - exact result nonzero and < UN, too small to represent• Divide-by-zero - nonzero/0• Invalid - 0/0, sqrt(-1), …• Inexact - you made a rounding error (very common!)° Possible responses• Stop with error message (unfriendly, not default)• Keep computing (default, but how?)CS267 L13 Floating Point.12Demmel Sp 1999IEEE Floating Point Arithmetic Standard 754 - “Denorms”° Denormalized Numbers: +-0.d…d x 2min_exp• sign bit, nonzero significand, minimum exponent• Fills in gap between UN and 0° Underflow Exception• occurs when exact nonzero result is less than underflow threshold UN• Ex: UN/3• return a denorm, or zero° Why bother?• Necessary so that following code never divides by zero if (a != b) then x = a/(a-b)7CS267 L13 Floating Point.13Demmel Sp 1999IEEE Floating Point Arithmetic Standard 754 - +- Infinity° +- Infinity: Sign


View Full Document

Berkeley COMPSCI C267 - Floating Point Arithmetic

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Floating Point Arithmetic
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Floating Point Arithmetic and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Floating Point Arithmetic 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?