CS61B Lecture #12Today:• Floating point arithmetic (just a taste).Readings for Today:Programming Into Java,§5.3,§5.4.Readings for next Topic:Programming Into Java,§4.1Last modified: Fri Sep 28 11:11:08 2001 CS61B: Lecture #12 1Floating point• Problem: Need to represent non-integral quantities.– Regardless of magnitudes, however, precise rep-resentation can require arbitrarily large amountsof space.– Unclear what to do about irrational numbers.– Must have arithmetic that is fast.• Solution: Don’t solve the problem of representing allreal numbers.– Instead, use fact that most scientific computa-tion only needs results that are “good enough,”– That is, that carry at least as many significantdigits as the data justify.– Settle for a subset of the rationals representablein a fixed number of bits (typical: 32, 64, 80,sometimes 128).• Many variations on this theme used in the past.• Majority of machines now supportIEEE binary floating-point arithmetic,. . .• . . . which (largely) standardizes on numbers repre-sented and approximations used.Last modified: Fri Sep 28 11:11:08 2001 CS61B: Lecture #12 2Basic Idea: Representation• Java’s two floating-point types each represent justthe numbers±m × 2e0 ≤ m < 2n, for some fixed n and e in some fixedrange of integers,• and also ±∞ and NaN (Not a Number).• n is the number of significant bits (≈ 3.3× numberof significant base-10 digits).• For double, n = 53 (> 15 significant digits), and−1074 ≤ e < 972 (roughly, positive values in therange 5.0 × 10−324to 1.8 × 10308).• For float, n = 24 (> 6 significant digits), and −150 ≤e < 105, (roughly, positive values in the range1.4 × 10−45to 3.4 × 1038).• ±∞ used for values whose magnitude is larger thanthis range, including 1.0/0.• NaN used for undefined results (e.g., ∞ − ∞, 0/0, orany operation involving NaN.• -0 compares == to 0, but 1/(-0) gives −∞. Why?Sorry; you’ll have to study complex functions.Last modified: Fri Sep 28 11:11:08 2001 CS61B: Lecture #12 3Basic Idea: Arithmetic• Basic rule is simple: to compute x ⊕ y, (where ⊕ is+, -, *, /) compute the true mathematical result, andthenroundto nearest representable floating-pointnumber.• Same for literals: is no exact binary equivalent of0.1, so round to nearest.•Unbiased rounding:Sometimes, result can beex-actlyhalfway between two representable values: chooseone with last bit 0.• (Unfortunately, whole truth is a little more complex:Java implementations allowed to usemoreprecisionto compute intermediate results.)Last modified: Fri Sep 28 11:11:08 2001 CS61B: Lecture #12 4The Floating Number Line• Floating-point arithmetic gives same number of sig-nificant bits at all values, except near 0.• Implies that they “stretch out” the further from 0you go.• So, if we had only 3 significant bits, and the smallestexponent of 2 were 2−6, we’d see this:· · · · · ·0 1/8 1/4 1/2 1Note: corrected from book.• In particular, we’d get all the 3-bit (n-bit) integers:· · · · · ·1 2 3 4 5 6 7 8 10• After which, we start skipping.• Random terminology: The first 23−1 numbers after0 in the first diagram (2n− 1 in general) are calleddenormalized.• Unique in having same spacing as the next 23. Insome other versions of floating point, they are miss-ing.Last modified: Fri Sep 28 11:11:08 2001 CS61B: Lecture #12 5Input of Floating-Point• To convert from string, S, to the floating-point num-ber it denotes, use Double.parseDouble(S) orFloat.parseFloat(S)• To read, can use StreamTokenizer (see project), or(around here) the UCB I/O package, which is like C’sscanf:stdin.scan ("%le"); // Means "read a double"x = stdin.doubleItem (0);Last modified: Fri Sep 28 11:11:08 2001 CS61B: Lecture #12 6Output of Floating Point• When printing (or converting to String, as in "X=" + X),Java by default prints “enough” significant digitsand decides on whether to use scientific notationbased on size.• You can control this all explicitly:• In the standard Java library, can use DecimalFormattype.• With the UCB I/O library can use format (which im-itates the C printf functions),• E.g., If x is 3.1415926, thenDecimalFormat form = new DecimalFormat (pattern1);...System.out.print (form.format(x));stdout.format (pattern2).put (x);producesPattern1Pattern2Print"0.000" "%5.3f" 3.142"00.000" "%06.3f" 03.142"#0.000" "%6.3f" 3.142"#.000" 3.142"0.0E00" "%6.1e" 3.1E00".00E00" .31E01Last modified: Fri Sep 28 11:11:08 2001 CS61B: Lecture #12 7A Few Points to Remember• IEEE gives reasonably well-behaved approximations,but theyareapproximations.• In particular, many decimal fractions not exactlyrepresentable in binary (0.1, e.g.)• Likewise, repeated addition not the same as multi-plication. N adds accumulate N rounding errors; onemultiply accumulates only 1 rounding error. Example:for (double x = 0.0; x < 1.0; x += 0.1) { ... }Loop runs 11 times, not 10.• When you add large quantities to small, you lose sig-nificant digits from the right of the small quantity.Thus, computations like X+Y-Z, where X and Z arenearly equal and much larger than Y won’t be terri-bly accurate.Last modified: Fri Sep 28 11:11:08 2001 CS61B: Lecture #12
View Full Document