inst.eecs.berkeley.edu/~cs61c/su05
CS61C: Machine Structures
Lecture #10: Floating Point
2005-07-06
Andy Carle
CS 61C L10 Floating Point (1), A. Carle, Summer 2005, UCB

Quote of the day
"95% of the folks out there are completely clueless about floating-point."
James Gosling, Sun Fellow, Java Inventor, 1998-02-28

Review of Numbers
Computers are made to deal with numbers
What can we represent in N bits?
    Unsigned integers: $0$ to $2^{N} - 1$
    Signed integers (Two's Complement): $-2^{N-1}$ to $2^{N-1} - 1$

Other Numbers
What about other numbers?
    Very large numbers? (seconds/century): $3{,}155{,}760{,}000_{10}$ ($3.15576_{10} \times 10^{9}$)
    Very small numbers? (atomic diameter): $0.00000001_{10}$ ($1.0_{10} \times 10^{-8}$)
    Rationals (repeating pattern): $2/3$ ($0.666666666\ldots$)
    Irrationals: $2^{1/2}$ ($1.414213562373\ldots$)
    Transcendentals: $e$ ($2.718\ldots$), $\pi$ ($3.141\ldots$)
All represented in scientific notation

Scientific Notation (in Decimal)
    $6.02_{10} \times 10^{23}$
    mantissa: 6.02 (with its decimal point), exponent: 23, radix (base): 10
Normalized form: no leading 0s (exactly one digit to left of decimal point)
Alternatives to representing 1/1,000,000,000
    Normalized: $1.0 \times 10^{-9}$
    Not normalized: $0.1 \times 10^{-8}$, $10.0 \times 10^{-10}$

Scientific Notation (in Binary)
    $1.0_{\text{two}} \times 2^{-1}$
    mantissa: 1.0 (with its binary point), exponent: -1, radix (base): 2
Normalized mantissa always has exactly one "1" before the point.
Computer arithmetic that supports it is called floating point, because it represents numbers where the binary point is not fixed, as it is for integers
    Declare such a variable in C as float

Floating Point Representation (1/2)
Normal format: $1.xxxxxxxxxx_{\text{two}} \times 2^{yyyy_{\text{two}}}$
Multiple of Word Size (32 bits)
    bit 31        S             1 bit
    bits 30..23   Exponent      8 bits
    bits 22..0    Significand   23 bits
S represents Sign
Exponent represents y's
Significand represents x's
Represent numbers as small as $2.0 \times 10^{-38}$ to as large as $2.0 \times 10^{38}$

Floating Point Representation (2/2)
What if result too large? ($> 2.0 \times 10^{38}$)
    Overflow!
    Overflow => Exponent larger than represented in 8-bit Exponent field
What if result too small? ($> 0$, $< 2.0 \times 10^{-38}$)
    Underflow!
    Underflow => Negative exponent larger than represented in 8-bit Exponent field
How to reduce chances of overflow or underflow?

Double Precision Fl. Pt. Representation
Next Multiple of Word Size (64 bits)
    First word:
        bit 31        S             1 bit
        bits 30..20   Exponent      11 bits
        bits 19..0    Significand   20 bits
    Second word:
        bits 31..0    Significand (cont'd)   32 bits
Double Precision (vs. Single Precision)
    C variable declared as double
    Represent numbers almost as small as $2.0 \times 10^{-308}$ to almost as large as $2.0 \times 10^{308}$
    But primary advantage is greater accuracy due to larger significand

QUAD Precision Fl. Pt. Representation
Next Multiple of Word Size (128 bits)
Unbelievable range of numbers
Unbelievable precision (accuracy)
This is currently being worked on
The version in progress has 15 bits for the exponent and 112 bits for the significand

IEEE 754 Floating Point Standard (1/4)
Single Precision, DP similar
Sign bit: 1 means negative, 0 means positive
Significand:
    To pack more bits, leading 1 implicit for normalized numbers
    1 + 23 bits single, 1 + 52 bits double
Note: 0 has no leading 1, so reserve exponent value 0 just for number 0

IEEE 754 Floating Point Standard (2/4)
Kahan wanted FP numbers to be used even if no FP hardware; e.g., sort records with FP numbers using integer compares
Could break FP number into 3 parts: compare signs, then compare exponents, then compare significands
Wanted it to be faster, single compare if possible, especially if positive numbers
Then want order:
    Highest order bit is sign (negative < positive)
    Exponent next, so big exponent => bigger #
    Significand last: exponents same => bigger #

IEEE 754 Floating Point Standard (3/4)
Negative Exponent?
    2's comp? $1.0 \times 2^{-1}$ v. $1.0 \times 2^{+1}$ (1/2 v. 2)
        1/2   0 1111 1111 000 0000 0000 0000 0000 0000
        2     0 0000 0001 000 0000 0000 0000 0000 0000
    This notation using integer compare of 1/2 v. 2 makes 1/2 > 2!
Instead, pick notation: 0000 0001 is most negative, and 1111 1111 is most positive
    $1.0 \times 2^{-1}$ v. $1.0 \times 2^{+1}$ (1/2 v. 2)
        1/2   0 0111 1110 000 0000 0000 0000 0000 0000
        2     0 1000 0000 000 0000 0000 0000 0000 0000

IEEE 754 Floating Point Standard (4/4)
Called Biased Notation, where bias is number subtracted to get real number
    IEEE 754 uses bias of 127 for single prec.
    Subtract 127 from Exponent field to get actual value for exponent
Summary (single precision):
    bit 31        S             1 bit
    bits 30..23   Exponent      8 bits
    bits 22..0    Significand   23 bits
    $(-1)^{S} \times (1 + \text{Significand}) \times 2^{(\text{Exponent} - 127)}$
    Double precision identical, except with exponent bias of 1023

    0 0111 1101 0000 0000 0000 0000 0000 000
Is this floating point number: > 0? = 0? < 0?

Understanding the Significand (1/2)
Method 1 (Fractions):
    In decimal: $0.340_{10}$ => $340_{10}/1000_{10}$ => $34_{10}/100_{10}$
    In binary: $0.110_{2}$ => $110_{2}/1000_{2} = 6_{10}/8_{10}$ => $11_{2}/100_{2} = 3_{10}/4_{10}$
    Advantage: less purely numerical, more thought oriented; this method usually helps people understand the meaning of the significand better

Understanding the Significand (2/2)
Method 2 (Place Values):
    Convert from scientific notation
    In decimal: $1.6732 = (1 \times 10^{0}) + (6 \times 10^{-1}) + (7 \times 10^{-2}) + (3 \times 10^{-3}) + (2 \times 10^{-4})$
    In binary: $1.1001 = (1 \times 2^{0}) + (1 \times 2^{-1}) + (0 \times 2^{-2}) + (0 \times 2^{-3}) + (1 \times 2^{-4})$
    Interpretation of value in each position extends beyond the decimal/binary point
    Advantage: good for quickly calculating significand value; use this method for translating FP numbers

Example: Converting Binary FP to Decimal
    0 0110 1000 101 0101 0100 0011 0100 0010
Sign: 0 => positive
Exponent:
    $0110\ 1000_{\text{two}} = 104_{\text{ten}}$
    Bias adjustment: $104 - 127 = -23$
Significand:
    $1 + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} + 0 \times 2^{-4} + 1 \times 2^{-5} + \ldots$
    $= 1 + 2^{-1} + 2^{-3} + 2^{-5} + 2^{-7} + 2^{-9} + 2^{-14} + 2^{-15} + 2^{-17} + 2^{-22}$
    $= 1.0_{\text{ten}} + 0.666115_{\text{ten}}$
Represents: $1.666115_{\text{ten}} \times 2^{-23} \approx 1.986 \times 10^{-7}$

Peer Instruction #1
What is the decimal equivalent of this floating point number?
    1 1000 0001 111 0000 0000 0000 0000 0000

…
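The single-precision summary formula and the "Converting Binary FP to Decimal" example can be sketched in C roughly as follows. This is a minimal sketch, not code from the lecture: the names `fp_fields`, `fp_split`, `fp_value`, `fp_reinterpret`, and `pow2` are made up for illustration, and the hexadecimal constants mentioned afterwards are simply the slides' bit patterns rewritten in hex. `fp_reinterpret` assumes that the C implementation's `float` is a 32-bit IEEE 754 value. Only normalized numbers are handled; the reserved exponent fields (0 and 255 in IEEE 754) are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of a single-precision word, split as in the lecture's layout:
   bit 31 = S, bits 30..23 = Exponent, bits 22..0 = Significand.
   (Illustrative type; not from the lecture.) */
typedef struct {
    uint32_t sign;        /* 0 = positive, 1 = negative */
    uint32_t exponent;    /* biased 8-bit Exponent field */
    uint32_t significand; /* 23 stored fraction bits (leading 1 is implicit) */
} fp_fields;

fp_fields fp_split(uint32_t word) {
    fp_fields f;
    f.sign = word >> 31;
    f.exponent = (word >> 23) & 0xFFu;
    f.significand = word & 0x7FFFFFu;
    return f;
}

/* 2^e by repeated multiplication or division; exact in double for the
   exponent range used here (-126..127). */
static double pow2(int e) {
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* (-1)^S x (1 + Significand) x 2^(Exponent - 127).
   Assumes a normalized number: Exponent fields 0 and 255 are not handled. */
double fp_value(uint32_t word) {
    fp_fields f = fp_split(word);
    double fraction = (double)f.significand / 8388608.0; /* divide by 2^23 */
    double magnitude = (1.0 + fraction) * pow2((int)f.exponent - 127);
    return f.sign ? -magnitude : magnitude;
}

/* Cross-check: reinterpret the same 32 bits as a C float.
   Assumption: float is a 32-bit IEEE 754 single-precision value. */
float fp_reinterpret(uint32_t word) {
    float x;
    assert(sizeof x == sizeof word);
    memcpy(&x, &word, sizeof x);
    return x;
}
```

With these definitions, the example word 0 0110 1000 101 0101 0100 0011 0100 0010 is `0x34554342u`: `fp_split(0x34554342u).exponent` is 104, and `fp_value(0x34554342u)` is about 1.986e-7, in line with the hand calculation. A caller could print it with `printf("%g\n", fp_value(0x34554342u));` after including `<stdio.h>`.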