Floating Point Arithmetic
Sept. 24, 1998

Topics
•IEEE Floating Point Standard
•Rounding
•Floating Point Operations
•Mathematical properties
•Alpha floating point

15-213 “The course that gives CMU its Zip!”
CS 213 F’98, class10.ppt

Floating Point Puzzles
•For each of the following C expressions, either:
–Argue that it is true for all argument values, or
–Explain why it is not true

    int x = …;
    float f = …;
    double d = …;

Assume neither d nor f is NaN.
•x == (int)(float) x
•x == (int)(double) x
•f == (float)(double) f
•d == (float) d
•f == -(-f);
•2/3 == 2/3.0
•d < 0.0 ⇒ ((d*2) < 0.0)
•d > f ⇒ -f < -d
•d * d >= 0.0
•(d+f)-d == f

IEEE Floating Point
IEEE Standard 754
•Established in 1985 as a uniform standard for floating point arithmetic
–Before that, many idiosyncratic formats
•Supported by all major CPUs
Driven by Numerical Concerns
•Nice standards for rounding, overflow, underflow
•Hard to make go fast
–Numerical analysts predominated over hardware types in defining the standard

Fractional Binary Numbers
Representation
•Bits to the right of the “binary point” represent fractional powers of 2
•The bits $b_i b_{i-1} \cdots b_2 b_1 b_0 \,.\, b_{-1} b_{-2} b_{-3} \cdots b_{-j}$ carry weights $2^i, 2^{i-1}, \ldots, 4, 2, 1 \,.\, 1/2, 1/4, 1/8, \ldots, 2^{-j}$
•Represents the rational number
$$\sum_{k=-j}^{i} b_k \times 2^k$$

Fractional Binary Number Examples
Value     Representation
5 3/4     101.11₂
2 7/8     10.111₂
63/64     0.111111₂
Observation
•Divide by 2 by shifting right
•Numbers of the form 0.111111…₂ are just below 1.0
–Use notation 1.0 – ε
Limitation
•Can only exactly represent numbers of the form $x/2^k$
•Other numbers have repeating bit representations
Value     Representation
1/3       0.0101010101[01]…₂
1/5       0.001100110011[0011]…₂
1/10      0.0001100110011[0011]…₂

Floating Point Representation
Numerical Form
•$(-1)^s \times m \times 2^E$
–Sign bit s determines whether the number is negative or positive
–Mantissa m is normally a fractional value in the range [1.0, 2.0)
–Exponent E weights the value by a power of two
Encoding
•MSB is the sign bit
•exp field encodes E
•significand field encodes m

    | s | exp | significand |

Sizes
•Single precision: 8 exp bits, 23 significand bits
–32 bits total
•Double precision: 11 exp bits, 52 significand bits
–64 bits total

“Normalized” Numeric Values
Condition
•exp ≠ 000…0 and exp ≠ 111…1
Exponent coded as biased value
•$E = \mathrm{Exp} - \mathrm{Bias}$
–Exp: unsigned value denoted by exp
–Bias: bias value
»Single precision: 127
»Double precision: 1023
Mantissa coded with implied leading 1
•m = 1.xxx…x₂
–xxx…x: bits of significand
–Minimum when 000…0 (m = 1.0)
–Maximum when 111…1 (m = 2.0 – ε)
–Get extra leading bit for “free”

Normalized Encoding Example
Value
•float F = 15213.0;
•$15213_{10} = 11101101101101_2 = 1.1101101101101_2 \times 2^{13}$
Significand
•m = 1.1101101101101₂
•sig = 11011011011010000000000₂
Exponent
•E = 13
•Bias = 127
•Exp = 140 = 10001100₂
Floating Point Representation (Class 02):

    Hex:      4    6    6    D    B    4    0    0
    Binary:   0100 0110 0110 1101 1011 0100 0000 0000
    140:       100 0110 0
    15213:              1110 1101 1011 01

Denormalized Values
Condition
•exp = 000…0
Value
•Exponent value E = –Bias + 1
•Mantissa value m = 0.xxx…x₂
–xxx…x: bits of significand
Cases
•exp = 000…0, significand = 000…0
–Represents value 0
–Note that there are distinct values +0 and –0
•exp = 000…0, significand ≠ 000…0
–Numbers very close to 0.0
–Lose precision as they get smaller
–“Gradual underflow”

Interesting Numbers

    Description                 Exp     Significand   Numeric Value
    Zero                        00…00   00…00         0.0
    Smallest Pos. Denorm.       00…00   00…01         $2^{-\{23,52\}} \times 2^{-\{126,1022\}}$
      •Single ≈ $1.4 \times 10^{-45}$
      •Double ≈ $4.9 \times 10^{-324}$
    Largest Denormalized        00…00   11…11         $(1.0 - \varepsilon) \times 2^{-\{126,1022\}}$
      •Single ≈ $1.18 \times 10^{-38}$
      •Double ≈ $2.2 \times 10^{-308}$
    Smallest Pos. Normalized    00…01   00…00         $1.0 \times 2^{-\{126,1022\}}$
      •Just larger than largest denormalized
    One                         01…11   00…00         1.0
    Largest Normalized          11…10   11…11         $(2.0 - \varepsilon) \times 2^{\{127,1023\}}$
      •Single ≈ $3.4 \times 10^{38}$
      •Double ≈ $1.8 \times 10^{308}$

Memory Referencing Bug Example (From Class 01)

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        long int a[2];
        double d = 3.14;
        a[2] = 1073741824; /* Out of bounds reference (deliberate bug) */
        printf("d = %.15g\n", d);
        exit(0);
    }

Program output:

              Alpha                    MIPS               Sun
    -g        5.30498947741318e-315    3.1399998664856    3.14
    -O        3.14                     3.14               3.14

Referencing Bug on Alpha
Optimized Code
•Double d stored in register
•Unaffected by errant write
Alpha -g
•1073741824 = 0x40000000 = $2^{30}$
•Overwrites all 8 bytes with value 0x0000000040000000
•Denormalized value $2^{30} \times (\text{smallest denorm } 2^{-1074}) = 2^{-1044}$
•$2^{-1044} \approx 5.305 \times 10^{-315}$
[Figure: Alpha Stack Frame (-g), showing a[0], a[1], and d]

Referencing Bug on MIPS
MIPS -g
•Overwrites lower 4 bytes with value 0x40000000
•Original value 3.14 represented as 0x40091eb851eb851f
•Modified value represented as 0x40091eb840000000
•Exp = 1024 ⇒ E = 1024 – 1023 = 1
•Mantissa difference: .0000011eb851f₁₆
•Integer value: 11eb851f₁₆ = 300,647,711₁₀
•Difference = $2^{1} \times 2^{-52} \times 300{,}647{,}711 \approx 1.34 \times 10^{-7}$
•Compare to 3.140000000 – 3.139999866 = 0.000000134
[Figure: MIPS Stack Frame (-g), showing a[0], a[1], and d]

Special Values
Condition
•exp = 111…1
Cases
•exp = 111…1, significand = 000…0
–Represents value ∞ (infinity)
–Operation