Floating PointIEEE Floating PointFractional Binary NumbersFrac. Binary Number ExamplesRepresentable NumbersFloating Point RepresentationFloating Point Precisions“Normalized” Numeric ValuesNormalized Encoding ExFloating Point OperationsFloating Point in CAriane 5SummaryFloating PointTopicsTopicsOverview of Floating Pointfloats.pptCS 105“Tour of the Black Holes of Computing!”– 2 –CS 105IEEE Floating PointIEEE Floating PointIEEE Standard 754IEEE Standard 754Established in 1985 as uniform standard for floating point arithmeticBefore that, many idiosyncratic formatsSupported by all major CPUsDriven by Numerical ConcernsDriven by Numerical ConcernsNice standards for rounding, overflow, underflowHard to make go fastNumerical analysts predominated over hardware types in defining standard– 3 –CS 105Fractional Binary NumbersFractional Binary NumbersRepresentationRepresentationBits to right of “binary point” represent fractional powers of 2Represents rational number:bibi–1b2b1b0b–1b–2b–3b–j• • •• • • .1242i–12i• • •• • •1/21/41/82–jbk2kk ji– 4 –CS 105Frac. Binary Number ExamplesFrac. Binary Number ExamplesValueValueRepresentationRepresentation5-3/4 101.1122-7/8 10.111263/64 0.1111112ObservationsObservationsDivide by 2 by shifting rightMultiply by 2 by shifting leftNumbers of form 0.111111…2 just below 1.01/2 + 1/4 + 1/8 + … + 1/2i + … 1.0Use notation 1.0 – – 5 –CS 105Representable NumbersRepresentable NumbersLimitationLimitationCan only exactly represent numbers of the form x/2kOther numbers have repeating bit representationsValueValueRepresentationRepresentation1/3 0.0101010101[01]…21/5 0.001100110011[0011]…21/10 0.0001100110011[0011]…2– 6 –CS 105Numerical FormNumerical Form–1s M 2ESign bit s determines whether number is negative or positiveSignificand M normally a fractional value in range [1.0,2.0).Exponent E weights value by power of twoEncodingEncodingMSB is sign bitexp field encodes Efrac field encodes MFloating Point RepresentationFloating Point Representations exp frac– 7 –CS 105EncodingEncodingMSB is sign bitexp field encodes Efrac field encodes MSizesSizesSingle precision: 8 exp bits, 23 frac bits32 bits totalDouble precision: 11 exp bits, 52 frac bits64 bits totalExtended precision: 15 exp bits, 63 frac bitsOnly found in Intel-compatible machinesStored in 80 bits»1 bit wastedFloating Point PrecisionsFloating Point Precisionss exp frac– 8 –CS 105“Normalized” Numeric Values“Normalized” Numeric ValuesConditionConditionexp 000…0 and exp 111…1Exponent coded as Exponent coded as biasedbiased value valueE = Exp – BiasExp : unsigned value denoted by exp Bias : Bias value»Single precision: 127 (Exp: 1…254, E: -126…127)»Double precision: 1023 (Exp: 1…2046, E: -1022…1023)»in general: Bias = 2e-1 - 1, where e is number of exponent bitsSignificand coded with implied leading 1Significand coded with implied leading 1M = 1.xxx…x2xxx…x: bits of fracMinimum when 000…0 (M = 1.0)Maximum when 111…1 (M = 2.0 – )Get extra leading bit for “free”– 9 –CS 105Normalized Encoding Ex Normalized Encoding Ex ValueValueFloat F = 15213.0;1521310 = 111011011011012 = 1.11011011011012 X 213SignificandSignificandM = 1.11011011011012frac= 110110110110100000000002ExponentExponentE = 13Bias = 127Exp = 140 = 100011002Floating Point Representation (Class 02):Hex: 4 6 6 D B 4 0 0 Binary: 0100 0110 0110 1101 1011 0100 0000 0000140: 100 0110 015213: 1110 1101 1011 01– 10 –CS 105Floating Point OperationsFloating Point OperationsConceptual ViewConceptual ViewFirst compute exact resultMake it fit into desired precisionPossibly overflow if exponent too largePossibly round to fit into fracRounding Modes (illustrate with $ rounding)Rounding Modes (illustrate with $ rounding)$1.40$1.40$1.60$1.60$1.50$1.50$2.50$2.50–$1.50–$1.50Zero $1 $1 $1 $2 –$1Round down (- ) $1 $1 $1 $2 –$2Round up (+ ) $2 $2 $2 $3 –$1Nearest Even (default) $1 $2 $2 $2 –$2Note:1. Round down: rounded result is close to but no greater than true result.2. Round up: rounded result is close to but no less than true result.– 11 –CS 105Floating Point in CFloating Point in CC Guarantees Two LevelsC Guarantees Two Levelsfloat single precisiondouble double precisionConversionsConversionsCasting between int, float, and double changes numeric values Double or float to intTruncates fractional partLike rounding toward zeroNot defined when out of range»Generally saturates to TMin or TMax int to doubleExact conversion, as long as int has ≤ 53 bit word size int to floatWill round according to rounding mode– 12 –CS 105Ariane 5Ariane 5Exploded 37 seconds after liftofCargo worth $500 millionWhyWhyComputed horizontal velocity as floating point numberConverted to 16-bit integerWorked OK for Ariane 4Overflowed for Ariane 5Used same software– 13 –CS 105SummarySummaryIEEE Floating Point Has Clear Mathematical PropertiesIEEE Floating Point Has Clear Mathematical PropertiesRepresents numbers of form M X 2ECan reason about operations independent of implementationAs if computed with perfect precision and then roundedNot the same as real arithmeticViolates associativity/distributivityMakes life difficult for compilers & serious numerical applications
View Full Document