15-213
Floating Point Arithmetic
Sept 5, 2007

Topics
  - IEEE Floating Point Standard
  - Rounding
  - Floating Point Operations
  - Mathematical properties

class03.ppt    15-213, F'07
15-213: Intro to Computer Systems, Fall 2007

Floating Point Puzzles
For each of the following C expressions, either:
  - Argue that it is true for all argument values
  - Explain why it is not true

    int x = …;
    float f = …;
    double d = …;

Assume neither d nor f is NaN.

  - x == (int)(float) x
  - x == (int)(double) x
  - f == (float)(double) f
  - d == (float) d
  - f == -(-f)
  - 2/3 == 2/3.0
  - d < 0.0  =>  ((d*2) < 0.0)
  - d > f  =>  -f > -d
  - d * d >= 0.0
  - (d+f)-d == f

IEEE Floating Point
IEEE Standard 754
  - Established in 1985 as a uniform standard for floating point arithmetic
      - Before that, many idiosyncratic formats
  - Supported by all major CPUs
Driven by Numerical Concerns
  - Nice standards for rounding, overflow, underflow
  - Hard to make go fast
      - Numerical analysts predominated over hardware types in defining the standard

Fractional Binary Numbers
Bit layout: $b_i \, b_{i-1} \cdots b_2 \, b_1 \, b_0 \, . \, b_{-1} \, b_{-2} \, b_{-3} \cdots b_{-j}$
Bit weights: $2^i, 2^{i-1}, \ldots, 4, 2, 1, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \ldots, 2^{-j}$
Representation
  - Bits to the right of the "binary point" represent fractional powers of 2
  - Represents the rational number $\sum_{k=-j}^{i} b_k \cdot 2^k$

Frac. Binary Number Examples
    Value      Representation
    5-3/4      101.11_2
    2-7/8      10.111_2
    63/64      0.111111_2
Observations
  - Divide by 2 by shifting right
  - Multiply by 2 by shifting left
  - Numbers of the form 0.111111…_2 are just below 1.0
      - 1/2 + 1/4 + 1/8 + … + 1/2^i + … → 1.0
      - Use notation 1.0 - ε

Representable Numbers
Limitation
  - Can only exactly represent numbers of the form x/2^k
  - Other numbers have repeating bit representations
    Value      Representation
    1/3        0.0101010101[01]…_2
    1/5        0.001100110011[0011]…_2
    1/10       0.0001100110011[0011]…_2

Floating Point Representation
Numerical Form: $(-1)^s \cdot M \cdot 2^E$
  - Sign bit s determines whether the number is negative or positive
  - Significand M is normally a fractional value in the range [1.0, 2.0)
  - Exponent E weights the value by a power of two
Encoding: s | exp | frac
MSB is the sign bit; the exp field
encodes E; the frac field encodes M.

Floating Point Precisions
Encoding: s | exp | frac
  - MSB is the sign bit
  - exp field encodes E
  - frac field encodes M
Sizes
  - Single precision: 8 exp bits, 23 frac bits
      - 32 bits total
  - Double precision: 11 exp bits, 52 frac bits
      - 64 bits total
  - Extended precision: 15 exp bits, 63 frac bits
      - Only found in Intel-compatible machines
      - Stored in 80 bits (1 bit wasted)

"Normalized" Numeric Values
Condition
  - exp ≠ 000…0 and exp ≠ 111…1
Exponent coded as a biased value: E = Exp - Bias
  - Exp: unsigned value denoted by exp
  - Bias: bias value
      - Single precision: 127 (Exp: 1…254, E: -126…127)
      - Double precision: 1023 (Exp: 1…2046, E: -1022…1023)
      - In general: $\text{Bias} = 2^{e-1} - 1$, where e is the number of exponent bits
Significand coded with an implied leading 1: M = 1.xxx…x_2
  - xxx…x: bits of frac
  - Minimum when frac = 000…0 (M = 1.0)
  - Maximum when frac = 111…1 (M = 2.0 - ε)
  - Get the extra leading bit for "free"

Normalized Encoding Example
Value
  - float F = 15213.0;
  - 15213_10 = 11101101101101_2 = 1.1101101101101_2 × 2^13
Significand
  - M    = 1.1101101101101_2
  - frac =   11011011011010000000000_2
Exponent
  - E    = 13
  - Bias = 127
  - Exp  = E + Bias = 140 = 10001100_2
Floating Point Representation
    Hex:     4    6    6    D    B    4    0    0
    Binary:  0100 0110 0110 1101 1011 0100 0000 0000
    140:      100 0110 0
    15213:   1110 1101 1011 01   (the leading 1 is implied, not stored)

Denormalized Values
Condition
  - exp = 000…0
Value
  - Exponent value E = -Bias + 1
  - Significand value M = 0.xxx…x_2
      - xxx…x: bits of frac
Cases
  - exp = 000…0, frac = 000…0
      - Represents the value 0
      - Note that there are distinct values +0 and -0
  - exp = 000…0, frac ≠ 000…0
      - Numbers very close to 0.0
      - Lose precision as they get smaller
      - "Gradual underflow"

Special Values
Condition
  - exp = 111…1
Cases
  - exp = 111…1, frac = 000…0
      - Represents the value ∞ (infinity)
      - Result of an operation that overflows
      - Both positive and negative
      - E.g., 1.0/0.0 = -1.0/-0.0 = +∞, 1.0/-0.0 = -∞
  - exp = 111…1, frac ≠ 000…0
      - Not-a-Number (NaN)
      - Represents the case when no numeric value can be determined
      - E.g., sqrt(-1), ∞ - ∞, ∞ × 0
Summary of Floating Point Real Number Encodings
    NaN | -∞ | -Normalized | -Denorm | -0 +0 | +Denorm | +Normalized | +∞ | NaN

Tiny Floating Point Example
8-bit Floating Point Representation
  - The sign bit is in the most significant bit
  - The next four bits are the exponent, with a bias of 7
  - The last three bits are the frac
Same General Form as IEEE Format
  - Normalized, denormalized
  - Representation of 0, NaN, infinity
Bit layout
    bit 7      bits 6-3    bits 2-0
    s          exp         frac

Values Related to the Exponent
    Exp   exp    E     2^E
     0    0000   -6    1/64    (denorms)
     1    0001   -6    1/64
     2    0010   -5    1/32
     3    0011   -4    1/16
     4    0100   -3    1/8
     5    0101   -2    1/4
     6    0110   -1    1/2
     7    0111    0    1
     8    1000   +1    2
     9    1001   +2    4
    10    1010   +3    8
    11    1011   +4    16
    12    1100   +5    32
    13    1101   +6    64
    14    1110   +7    128
    15    1111   n/a           (inf, NaN)

Dynamic Range
    s  exp   frac   E     Value
    Denormalized numbers
    0  0000  000    -6    0
    0  0000  001    -6    1/8 * 1/64 = 1/512     closest to zero
    0  0000  010    -6    2/8 * 1/64 = 2/512
    …
    0  0000  110    -6    6/8 * 1/64 = 6/512
    0  0000  111    -6    7/8 * 1/64 = 7/512     largest denorm
    Normalized numbers
    0  0001  000    -6    8/8 * 1/64 = 8/512     smallest norm
    0  0001  001    -6    9/8 * 1/64 = 9/512
    …
    0  0110  110    -1    14/8 * 1/2 = 14/16
    0  0110  111    -1    15/8 * 1/2 = 15/16     closest to 1 below
    0  0111  000     0    8/8 * 1 = 1
    0  0111  001     0    9/8 * 1 = 9/8          closest to 1 above
    0  0111  010     0    10/8 * 1 = 10/8
    …
    0  1110  110     7    14/8 * 128 = 224
    0  1110  111     7    15/8 * 128 = 240       largest norm
    0  1111  000    n/a   inf

Distribution of Values
6-bit IEEE-like format
  - e = 3 exponent bits
  - f = 2 fraction bits
  - Bias is 3
Notice how the distribution gets …