Berkeley COMPSCI C267 - Floating Point Arithmetic - D629728

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI C267> Floating Point Arithmetic

DOC PREVIEW

Berkeley COMPSCI C267 - Floating Point Arithmetic

School name University of California, Berkeley

Course Compsci C267- Applications of Parallel Computers

Pages 12

This preview shows page 1-2-3-4 out of 12 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 12 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

1CS267 L13 Floating Point.1Demmel Sp 1999CS 267 Applications of Parallel ComputersLecture 13: Floating Point ArithmeticJames Demmelhttp://www.cs.berkeley.edu/~demmel/cs267_Spr99CS267 L13 Floating Point.2Demmel Sp 1999Outline° A little history° IEEE floating point formats° Error analysis° Exception handling• Using exception handling to go faster° How to get extra precision cheaply° Cray arithmetic - a pathological example° Dangers of Parallel and Heterogeneous Computing2CS267 L13 Floating Point.3Demmel Sp 1999A little history° Von Neumann and Goldstine - 1947• “Can’t expect to solve most big [n>15] linear systems without carrying many decimaldigits [d>8], otherwise the computed answer would be completely inaccurate.” -WRONG!° Turing - 1949• “Carrying d digits is equivalent to changing the input data in the d-th place and thensolving Ax=b. So if A is only known to d digits, the answer is as accurate as the datadeserves.”• Backward Error Analysis° Rediscovered in 1961 by Wilkinson and publicized° Starting in the 1960s- many papers doing backward error analysis ofvarious algorithms° Many years where each machine did FP arithmetic slightly differently• Both rounding and exception handling differed• Hard to write portable and reliable software• Motivated search for industry-wide standard, beginning late 1970s• First implementation: Intel 8087° ACM Turing Award 1989 to W. Kahan for design of the IEEE FloatingPoint Standards 754 (binary) and 854 (decimal)• Nearly universally implemented in general purpose machinesCS267 L13 Floating Point.4Demmel Sp 1999Defining Floating Point Arithmetic° Representable numbers• Scientific notation: +/- d.d…d x rexp• sign bit +/-• radix r (usually 2 or 10, sometimes 16)• significand d.d…d (how many base-r digits d?)• exponent exp (range?)• others?° Operations:• arithmetic: +,-,x,/,...- how to round result to fit in format• comparison (<, =, >)• conversion between different formats- short to long FP numbers, FP to integer• exception handling- what to do for 0/0, 2*largest_number, etc.• binary/decimal conversion- for I/O, when radix not 10° Language/library support for these operations3CS267 L13 Floating Point.5Demmel Sp 1999IEEE Floating Point Arithmetic Standard 754 - Normalized Numbers° Normalized Nonzero Representable Numbers: +- 1.d…d x 2exp• Macheps = Machine epsilon = 2-#significand bits = relative error in each operation• OV = overflow threshold = largest number• UN = underflow threshold = smallest number° +- Zero: +-, significand and exponent all zero• Why bother with -0 laterFormat # bits #significand bits macheps #exponent bits exponent range---------- -------- ----------------------- ------------ -------------------- ----------------------Single 32 23+1 2-24 (~10-7) 8 2-126 - 2127 (~10+-38)Double 64 52+1 2-53 (~10-16) 11 2-1022 - 21023 (~10+-308)Double >=80 >=64 <=2-64(~10-19) >=15 2-16382 - 216383 (~10+-4932) Extended (80 bits on all Intel machines)CS267 L13 Floating Point.6Demmel Sp 1999Rules for performing arithmetic° As simple as possible:• Take the exact value, and round it to the nearest floating pointnumber (correct rounding)• Break ties by rounding to nearest floating point number whosebottom bit is zero (rounding to nearest even)• Other rounding options too (up, down, towards 0)° Don’t need exact value to do this!• Early implementors worried it might be too expensive, but it isn’t° Applies to• +,-,*,/• sqrt• conversion between formats• rem(a,b) = remainder of a after dividing by b- a = q*b + rem, q = floor(a/b)- cos(x) = cos(rem(x,2*pi)) for |x| >= 2*pi- cos(x) is exactly periodic, with period rounded(2*pi)4CS267 L13 Floating Point.7Demmel Sp 1999Error Analysis° Basic error formula• fl(a op b) = (a op b)*(1 + d) where- op one of +,-,*,/- |d| <= macheps- assuming no overflow, underflow, or divide by zero° Example: adding 4 numbers• fl(x1+x2+x3+x4) = {[(x1+x2)*(1+d1) + x3]*(1+d2) + x4}*(1+d3) = x1*(1+d1)*(1+d2)*(1+d3) + x2*(1+d1)*(1+d2)*(1+d3) + x3*(1+d2)*(1+d3) + x4*(1+d3) = x1*(1+e1) + x2*(1+e2) + x3*(1+e3) + x4*(1+e4) where each |ei| <~ 3*macheps• get exact sum of slightly changed summands xi*(1+ei)• Backward Error Analysis - algorithm called numerically stable if itgives the exact result for slightly changed inputs• Numerical Stability is an algorithm design goalCS267 L13 Floating Point.8Demmel Sp 1999Example: polynomial evaluation using Horner’s rule° Horner’s rule to evaluate p = Σ ck * xk• p = cn, for k=n-1 downto 0, p = x*p + ck° Numerically Stable° Apply to (x-2)9 = x9 - 18*x8 + … - 512k=0n5CS267 L13 Floating Point.9Demmel Sp 1999Example: polynomial evaluation (continued)° (x-2)9 = x9 - 18*x8 + … - 512° We can compute error bounds using• fl(a op b)=(a op b)*(1+d)CS267 L13 Floating Point.10Demmel Sp 1999What happens when the “exact value” isnot a real number, or is too small or toolarge to represent accurately?You get an “exception”6CS267 L13 Floating Point.11Demmel Sp 1999Exception Handling° What happens when the “exact value” is not a realnumber, or too small or too large to representaccurately?° 5 Exceptions:• Overflow - exact result > OV, too large to represent• Underflow - exact result nonzero and < UN, too small to represent• Divide-by-zero - nonzero/0• Invalid - 0/0, sqrt(-1), …• Inexact - you made a rounding error (very common!)° Possible responses• Stop with error message (unfriendly, not default)• Keep computing (default, but how?)CS267 L13 Floating Point.12Demmel Sp 1999IEEE Floating Point Arithmetic Standard 754 - “Denorms”° Denormalized Numbers: +-0.d…d x 2min_exp• sign bit, nonzero significand, minimum exponent• Fills in gap between UN and 0° Underflow Exception• occurs when exact nonzero result is less than underflow threshold UN• Ex: UN/3• return a denorm, or zero° Why bother?• Necessary so that following code never divides by zero if (a != b) then x = a/(a-b)7CS267 L13 Floating Point.13Demmel Sp 1999IEEE Floating Point Arithmetic Standard 754 - +- Infinity° +- Infinity: Sign

View Full Document

Berkeley COMPSCI C267 - Floating Point Arithmetic

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-4 out of 12 pages.

Berkeley COMPSCI C267 - Floating Point Arithmetic

Sign up for free to view:

Please select your school