Floating Point
Systems I

Topics
  - IEEE Floating Point Standard
  - Rounding
  - Floating Point Operations
  - Mathematical properties

IEEE Floating Point

IEEE Standard 754
  - Established in 1985 as a uniform standard for floating point arithmetic
  - Before that, many idiosyncratic formats
  - Supported by all major CPUs

Driven by Numerical Concerns
  - Nice standards for rounding, overflow, underflow
  - Hard to make go fast
  - Numerical analysts predominated over hardware types in defining the standard

Fractional Binary Numbers

Representation
  - Bits to the right of the "binary point" represent fractional powers of 2
  - Bit string: $b_i\, b_{i-1} \cdots b_2\, b_1\, b_0 \,.\, b_{-1}\, b_{-2}\, b_{-3} \cdots b_{-j}$
  - Bit weights, left to right: $2^i, 2^{i-1}, \ldots, 4, 2, 1, 1/2, 1/4, 1/8, \ldots, 2^{-j}$
  - Represents the rational number $\sum_{k=-j}^{i} b_k \cdot 2^k$

Fractional Binary Number Examples

  Value     Representation (binary)
  5 3/4     101.11
  2 7/8     10.111
  63/64     0.111111

Observations
  - Divide by 2 by shifting right
  - Multiply by 2 by shifting left
  - Numbers of the form 0.111111… (binary) are just below 1.0:
    $1/2 + 1/4 + 1/8 + \cdots + 1/2^i + \cdots \to 1.0$
  - Use the notation $1.0 - \varepsilon$

Representable Numbers

Limitation
  - Can only exactly represent numbers of the form $x/2^k$
  - Other numbers have repeating bit representations

  Value     Representation (binary)
  1/3       0.0101010101[01]…
  1/5       0.001100110011[0011]…
  1/10      0.0001100110011[0011]…

Floating Point Representation

Numerical Form
  - $(-1)^s \cdot M \cdot 2^E$
  - Sign bit s determines whether the number is negative or positive
  - Significand M is normally a fractional value in the range [1.0, 2.0)
  - Exponent E weights the value by a power of two

Encoding
  - MSB is the sign bit
  - exp field encodes E
  - frac field encodes M
  - Field layout: s | exp | frac

Floating Point Precisions

Sizes
  - Single precision: 8 exp bits, 23 frac bits (32 bits total)
  - Double precision: 11 exp bits, 52 frac bits (64 bits total)
  - Extended precision: 15 exp bits, 63 frac bits
    - Only found in Intel-compatible machines
    - Stored in 80 bits (1 bit wasted)

"Normalized" Numeric Values

Condition
  - exp ≠ 000…0 and exp ≠ 111…1

Exponent coded as a biased value
  - E = Exp - Bias
  - Exp: unsigned value denoted by exp
  - Bias: bias value
    - Single precision: 127 (Exp: 1…254, E: -126…127)
    - Double precision: 1023 (Exp: 1…2046, E: -1022…1023)
    - In general: Bias = $2^{e-1} - 1$, where e is the number of exponent bits

Significand coded with an implied leading 1
  - M = 1.xxx…x (binary), where xxx…x are the bits of frac
  - Minimum when frac = 000…0 (M = 1.0)
  - Maximum when frac = 111…1 (M = $2.0 - \varepsilon$)
  - Get the extra leading bit for "free"

Normalized Encoding Example

Value
  - float F = 15213.0;
  - $15213_{10} = 11101101101101_2 = 1.1101101101101_2 \times 2^{13}$

Significand
  - M = $1.1101101101101_2$
  - frac = $11011011011010000000000_2$

Exponent
  - E = 13
  - Bias = 127
  - Exp = 140 = $10001100_2$

Floating Point Representation (Class 02):
  Hex:     4    6    6    D    B    4    0    0
  Binary:  0100 0110 0110 1101 1011 0100 0000 0000
  exp field (140):                          100 0110 0
  15213 (with its leading 1, which is not stored):  1110 1101 1011 01

Denormalized Values

Condition
  - exp = 000…0

Value
  - Exponent value E = -Bias + 1
  - Significand value M = 0.xxx…x (binary), where xxx…x are the bits of frac

Cases
  - exp = 000…0, frac = 000…0
    - Represents the value 0
    - Note that there are distinct values +0 and -0
  - exp = 000…0, frac ≠ 000…0
    - Numbers very close to 0.0
    - Lose precision as they get smaller
    - "Gradual underflow"

Special Values

Condition
  - exp = 111…1

Cases
  - exp = 111…1, frac = 000…0
    - Represents the value ∞ (infinity)
    - Result of an operation that overflows
    - Both positive and negative
    - E.g., 1.0/0.0 = -1.0/-0.0 = +∞, 1.0/-0.0 = -∞
  - exp = 111…1, frac ≠ 000…0
    - Not-a-Number (NaN)
    - Represents the case when no numeric value can be determined
    - E.g., sqrt(-1), ∞ - ∞

Summary of Floating Point Real Number Encodings

  [Figure: number line, left to right: NaN, -∞, -Normalized, -Denorm, -0, +0, +Denorm, +Normalized, +∞, NaN]

Tiny Floating Point Example

8-bit Floating Point Representation
  - The sign bit is the most significant bit
  - The next four bits are the exponent, with a bias of 7
  - The last three bits are the frac
  - Field layout: s (bit 7) | exp (bits 6-3) | frac (bits 2-0)

Same General Form as IEEE Format
  - Normalized, denormalized
  - Representation of 0, NaN, infinity

Values Related to the Exponent

  Exp   exp    E     2^E
  0     0000   -6    1/64   (denorms)
  1     0001   -6    1/64
  2     0010   -5    1/32
  3     0011   -4    1/16
  4     0100   -3    1/8
  5     0101   -2    1/4
  6     0110   -1    1/2
  7     0111    0    1
  8     1000   +1    2
  9     1001   +2    4
  10    1010   +3    8
  11    1011   +4    16
  12    1100   +5    32
  13    1101   +6    64
  14    1110   +7    128
  15    1111   n/a          (inf, NaN)

Dynamic Range

  s exp  frac    E     Value                  Note
  Denormalized numbers:
  0 0000 000     -6    0
  0 0000 001     -6    1/8 * 1/64 = 1/512     closest to zero
  0 0000 010     -6    2/8 * 1/64 = 2/512
  …
  0 0000 110     -6    6/8 * 1/64 = 6/512
  0 0000 111     -6    7/8 * 1/64 = 7/512     largest denorm
  Normalized numbers:
  0 0001 000     -6    8/8 * 1/64 = 8/512     smallest norm
  0 0001 001     -6    9/8 * 1/64 = 9/512
  …
  0 0110 110     -1    14/8 * 1/2 = 14/16
  0 0110 111     -1    15/8 * 1/2 = 15/16     closest to 1 below
  0 0111 000      0    8/8 * 1 = 1
  0 0111 001      0    9/8 * 1 = 9/8          closest to 1 above
  0 0111 010      0    10/8 * 1 = 10/8
  …
  0 1110 110      7    14/8 * 128 = 224
  0 1110 111      7    15/8 * 128 = 240       largest norm
  0 1111 000     n/a   inf

Distribution of Values

6-bit IEEE-like format
  - e = 3 exponent bits
  - f = 2 fraction bits
  - Bias is 3

Notice how the distribution gets denser toward zero.

  [Figure: number line from -15 to 15 marking the denormalized, normalized, and infinity values]

Distribution of Values (close-up view)

6-bit IEEE-like format
  - e = 3 exponent bits
  - f = 2 fraction bits
  - Bias is 3

  [Figure: number line from -1 to 1 marking the denormalized, normalized, and infinity values]

Interesting Numbers

  Description               exp     frac    Numeric value
  Zero                      00…00   00…00   0.0
  Smallest pos. denorm.     00…00   00…01   $2^{-\{23,52\}} \times 2^{-\{126,1022\}}$
                                            Single ≈ $1.4 \times 10^{-45}$, Double ≈ $4.9 \times 10^{-324}$
  Largest denormalized      00…00   11…11   $(1.0 - \varepsilon) \times 2^{-\{126,1022\}}$
                                            Single ≈ $1.18 \times 10^{-38}$, Double ≈ $2.2 \times 10^{-308}$
  Smallest pos. normalized  00…01   00…00   $1.0 \times 2^{-\{126,1022\}}$
                                            Just larger than largest denormalized
  One                       01…11   00…00   1.0
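
The three encoding cases in these slides (normalized, denormalized, special) can be checked with a short script. What follows is a minimal sketch, not part of the original slides: the function name `decode` and its parameters `exp_bits` and `frac_bits` are illustrative assumptions. It returns exact fractions so the results can be compared directly with the 8-bit table entries and with the 15213.0 example.

```python
import struct
from fractions import Fraction


def decode(bits, exp_bits, frac_bits):
    """Decode an IEEE-like bit pattern into (sign bit, kind, value).

    Illustrative helper (hypothetical name), following the slide rules:
    Bias = 2^(e-1) - 1, normalized M = 1.frac, denormalized M = 0.frac.
    """
    bias = (1 << (exp_bits - 1)) - 1
    sign = (bits >> (exp_bits + frac_bits)) & 1
    exp = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    frac = bits & ((1 << frac_bits) - 1)
    s = -1 if sign else 1

    if exp == (1 << exp_bits) - 1:           # exp = 111...1: special values
        if frac == 0:
            return (sign, "infinity", s * float("inf"))
        return (sign, "nan", float("nan"))

    if exp == 0:                             # exp = 000...0: denormalized
        e_val = 1 - bias                     # E = -Bias + 1
        m = Fraction(frac, 1 << frac_bits)   # M = 0.xxx...x
        kind = "zero" if frac == 0 else "denormalized"
    else:                                    # normalized
        e_val = exp - bias                   # E = Exp - Bias
        m = 1 + Fraction(frac, 1 << frac_bits)  # M = 1.xxx...x
        kind = "normalized"

    # The sign bit is returned separately because an exact Fraction
    # cannot distinguish +0 from -0.
    return (sign, kind, s * m * Fraction(2) ** e_val)


# Single-precision bit pattern of 15213.0, obtained via the struct module.
bits = struct.unpack(">I", struct.pack(">f", 15213.0))[0]
print(hex(bits))                         # 0x466db400
print(decode(bits, 8, 23))               # 8 exp bits, 23 frac bits

# Entries from the 8-bit format (4 exp bits, 3 frac bits, bias 7).
print(decode(0b0_0000_001, 4, 3))        # closest to zero
print(decode(0b0_1110_111, 4, 3))        # largest normalized
print(decode(0b0_1111_000, 4, 3))        # infinity
```

Passing the field widths as arguments lets the same function cover the single-precision layout and the 8-bit teaching format. Double precision would use `exp_bits=11, frac_bits=52`.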



UT CS 429H - Floating Point
