UCSC CMPE 012 - Floating Point Numbers



CMPE12 Cyrus Bazeghi

Floating Point Numbers

Floating Point Numbers
• Registers for real numbers usually contain 32 or 64 bits, allowing 2^32 or 2^64 numbers to be represented.
• Which reals to represent? There are infinitely many between two adjacent integers (or between any two reals!).
• Which bit patterns are selected for reals?
• Answer: use scientific notation.

Floating Point Numbers
Consider: A x 10^B, where A is one digit

  A        B      A x 10^B
  0        any    0
  1 .. 9   0      1 .. 9
  1 .. 9   1      10 .. 90
  1 .. 9   2      100 .. 900
  1 .. 9   -1     0.1 .. 0.9
  1 .. 9   -2     0.01 .. 0.09

How to do scientific notation in binary?
Standard: IEEE 754 Floating-Point

IEEE 754 Single Precision Floating Point Format
Representation:

  bit 31   bits 30-23   bits 22-0
  S        E            F

• S is one bit representing the sign of the number
• E is an 8-bit biased integer representing the exponent
• F is an unsigned integer

The true value represented is: (-1)^S x f x 2^e
• S = sign bit
• e = E - bias
• f = F/2^n + 1
• for single precision numbers n = 23, bias = 127

IEEE 754 Single Precision Floating Point Format
S, E, F are all fields within a representation. Each is just a bunch of bits.

S is the sign bit
• (-1)^S: (-1)^0 = +1 and (-1)^1 = -1
• Just a sign bit for signed magnitude

E is the exponent field
• The E field is a biased-127 representation.
• True exponent is (E - bias)
• The base (radix) is always 2 (implied).
• Some early machines used radix 4 or 16 (IBM)

IEEE 754 Single Precision Floating Point Format
F (or M) is the fractional or mantissa field.
• It is in a strange form.
• There are 23 bits for F.
• A normalized FP number always has a leading 1.
• No need to store the one, just assume it.
• This MSB is called the HIDDEN BIT.

How to convert 64.2 into IEEE SP
1. Get a binary representation for 64.2
• Binary of left of radix point is: 1000000
• Binary of right of radix point, by repeated doubling (the integer part of each product is the next bit):
    .2 x 2 = 0.4    0
    .4 x 2 = 0.8    0
    .8 x 2 = 1.6    1
    .6 x 2 = 1.2    1
• Binary for .2: 0.001100110011... (the group 0011 repeats)
• 64.2 is: 1000000.001100110011...
2. Normalize binary form
• Produces: 1.000000001100110011... x 2^6
3. Turn true exponent into bias-127
• 6 + 127 = 133 = 10000101
4. Put it together:
• 23-bit F is: 00000000110011001100110
• S E F is: 0 10000101 00000000110011001100110
• In hex: 0x4280 6666

Floating Point
• Since floating point numbers are always stored in normal form, how do we represent 0?
• 0x0000 0000 and 0x8000 0000 represent 0.
• What numbers cannot be represented because of this?

IEEE Floating Point Format
Other special values:
• +5/0 = +∞
• +∞ = 0 11111111 00000… (0x7f80 0000)
• -7/0 = -∞
• -∞ = 1 11111111 00000… (0xff80 0000)
• 0/0 or +∞ + -∞ = NaN (Not a Number)
• NaN = ? 11111111 ?????… (S is either 0 or 1, E = 0xff, and F is anything but all zeroes)
• Also de-normalized numbers (beyond scope)

IEEE Floating Point
What is the decimal value for this SP FP number: 0x4228 0000?

IEEE Floating Point
What is 47.625 (base 10) in SP FP format?

Floating Point Format
What do floating-point numbers represent?
• Rational numbers with non-repeating expansions in the given base, within the specified exponent range.
• They do not represent repeating rational numbers, irrational numbers, or any number too small or too large.

IEEE Double Precision FP
• IEEE Double Precision is similar to SP
  – 52-bit M
    • 53 bits of precision with hidden bit
  – 11-bit E, excess 1023; for normalized numbers the stored values 1 to 2046 encode true exponents -1022 to +1023 (0 and 2047 are reserved)
  – One sign bit
• Always use DP unless memory/file size is important
  – SP ~ 10^-38 … 10^38
  – DP ~ 10^-308 … 10^308
• Be very careful of these ranges in numeric computation

Floating Point Arithmetic
Floating Point operations include
• Addition
• Subtraction
• Multiplication
• Division
They are complicated because…

Floating Point Addition
Decimal review: how do we do this?

    9.997 x 10^2
  + 4.631 x 10^-1

1. Align decimal points
    9.997    x 10^2
  + 0.004631 x 10^2
2. Add
    10.001631 x 10^2
3. Normalize the result
• Often already normalized
• Otherwise move one digit
    1.0001631 x 10^3
4. Possibly round result
    1.000 x 10^3

Floating Point Addition
Example: 0.25 + 100 in SP FP
First step: get into SP FP if not already
  .25 = 0 01111101 00000000000000000000000
  100 = 0 10000101 10010000000000000000000
Or with hidden bit:
  .25 = 0 01111101 1 00000000000000000000000
  100 = 0 10000101 1 10010000000000000000000

Floating Point Addition
Second step: Align radix points
  – Shifting F left by 1 bit means decreasing e by 1
  – Shifting F right by 1 bit means increasing e by 1
  – Shift F right, so that it is the least significant bits that fall off
  – Which of the two numbers should we shift?

Floating Point Addition
Second step: Align radix points, cont.
Shift the .25 to increase its exponent so it matches that of 100.
  0.25's e: 01111101 - 01111111 (127) = -2
  100's e:  10000101 - 01111111 (127) = 6
Shift .25 by 8 then.
Easier method: the bias cancels with subtraction, so subtract the E fields directly:
    10000101   (100's E)
  - 01111101   (0.25's E)
    00001000

Floating Point Addition
Carefully shifting the 0.25's fraction

  S E        HB F
• 0 01111101 1  00000000000000000000000 (original value)
• 0 01111110 0  10000000000000000000000 (shifted by 1)
• 0 01111111 0  01000000000000000000000 (shifted by 2)
• 0 10000000 0  00100000000000000000000 (shifted by 3)
• 0 10000001 0  00010000000000000000000 (shifted by 4)
• 0 10000010 0  00001000000000000000000 (shifted by 5)
• 0 10000011 0  00000100000000000000000 (shifted by 6)
• 0 10000100 0  00000010000000000000000 (shifted by 7)
• 0 10000101 0  00000001000000000000000 (shifted by 8)

Floating Point Addition
Third step: Add fractions with hidden bit

    0 10000101 1 10010000000000000000000 (100)
  + 0 10000101 0 00000001000000000000000 (.25)
    0 10000101 1 10010001000000000000000

Fourth step: Normalize the result
• Get a '1' back in hidden bit
• Already normalized most of the time
• Remove hidden bit and finished

Floating Point Addition
Normalization example

    S E   HB F
    0 011 1  1100
  + 0 011 1  1011
    0 011 11 0111

Need to shift so that only a 1 is in the HB spot:

    0 100 1  1011   (the trailing 1 is discarded)

Floating Point Subtraction
• Mantissas are sign-magnitude
• Watch out when the numbers are close

    1.23455 x 10^2
  - 1.23456 x 10^2

• A many-digit normalization is possible
This is why FP addition is in many ways more difficult than FP multiplication.

Floating Point Subtraction
1. Align radix points
2. Perform sign-magnitude operand swap if needed
• Compare magnitudes (with hidden bit)
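The conversion steps and the value formula (-1)^S x f x 2^e can be checked with a short Python sketch. This is only an illustration, not part of the slides: the helper names (fraction_bits, float_to_bits, split_fields, decode_normalized) are made up here, and the standard struct module is assumed to do the actual rounding to single precision.

```python
import struct
from fractions import Fraction

BIAS = 127   # single-precision exponent bias (from the slides)
N = 23       # number of stored fraction bits (from the slides)

def fraction_bits(frac, count):
    """First `count` binary digits of a fraction in [0, 1), by repeated doubling."""
    bits = []
    for _ in range(count):
        frac *= 2
        bit = int(frac >= 1)      # the integer part of the product is the next bit
        bits.append(str(bit))
        frac -= bit
    return "".join(bits)

def float_to_bits(x):
    """32-bit IEEE 754 single-precision pattern of x, as an unsigned int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def split_fields(bits):
    """Split a 32-bit pattern into the S, E and F fields."""
    s = bits >> 31
    e_field = (bits >> N) & 0xFF
    f_field = bits & ((1 << N) - 1)
    return s, e_field, f_field

def decode_normalized(bits):
    """Apply (-1)^S x f x 2^e; only valid when E is neither 0 nor 255."""
    s, e_field, f_field = split_fields(bits)
    f = f_field / 2**N + 1        # add the hidden bit back
    e = e_field - BIAS            # remove the bias
    return (-1)**s * f * 2.0**e

if __name__ == "__main__":
    print(fraction_bits(Fraction(1, 5), 8))   # binary digits of .2
    print(hex(float_to_bits(64.2)))           # pattern for 64.2
    print(hex(float_to_bits(47.625)))         # pattern for 47.625
    print(decode_normalized(0x42280000))      # value of 0x4228 0000
```

Fraction is used for the doubling loop so that .2 is treated as exactly 1/5; starting from the Python float 0.2 would already carry a binary rounding error.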

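The four addition steps (restore the hidden bit, align, add, normalize) can be sketched on the raw bit fields. This is a simplified illustration under stated assumptions, not how real hardware is specified: it handles only positive normalized inputs, truncates instead of rounding, ignores exponent overflow, and the names add_positive, to_bits and from_bits are hypothetical.

```python
import struct

N = 23  # stored fraction bits in single precision

def to_bits(x):
    """32-bit single-precision pattern of x, as an unsigned int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(b):
    """Value of a 32-bit single-precision pattern."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

def add_positive(a_bits, b_bits):
    """Add two positive normalized SP patterns (truncating, no overflow check)."""
    ea, eb = (a_bits >> N) & 0xFF, (b_bits >> N) & 0xFF
    # Restore the hidden bit to get 24-bit significands.
    ma = (a_bits & ((1 << N) - 1)) | (1 << N)
    mb = (b_bits & ((1 << N) - 1)) | (1 << N)
    # Keep the operand with the larger exponent in (ea, ma).
    if ea < eb:
        ea, eb, ma, mb = eb, ea, mb, ma
    # Align: the bias cancels, so the shift count is the difference of the E fields.
    mb >>= ea - eb                # least significant bits fall off
    total = ma + mb
    e = ea
    # Normalize: a carry past the hidden-bit position needs one right shift.
    if total >> (N + 1):
        total >>= 1
        e += 1
    # Remove the hidden bit and reassemble S (0), E and F.
    return (e << N) | (total & ((1 << N) - 1))

if __name__ == "__main__":
    result = add_positive(to_bits(0.25), to_bits(100.0))
    print(format(result, "032b"), from_bits(result))
```

Shifting the operand with the smaller exponent to the right is what makes the lost bits the least significant ones, which matches the choice made for 0.25 in the slides.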

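The special bit patterns (zero, infinity, NaN, de-normalized) can be told apart from the E and F fields alone. The sketch below assumes single precision; classify and from_bits are illustrative names, not a standard API. Note that plain Python raises ZeroDivisionError for 5/0 rather than returning infinity, so the infinities are built from their bit patterns instead.

```python
import math
import struct

def from_bits(b):
    """Value of a 32-bit single-precision pattern."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

def classify(bits):
    """Name the kind of value a 32-bit SP pattern encodes, using only E and F."""
    e_field = (bits >> 23) & 0xFF
    f_field = bits & 0x7FFFFF
    if e_field == 0xFF:
        # All-ones exponent: infinity if F is zero, otherwise NaN.
        return "infinity" if f_field == 0 else "NaN"
    if e_field == 0:
        # All-zeros exponent: zero if F is zero, otherwise de-normalized.
        return "zero" if f_field == 0 else "denormalized"
    return "normalized"

if __name__ == "__main__":
    for pattern in (0x00000000, 0x80000000, 0x7F800000, 0xFF800000, 0x7FC00000):
        print(hex(pattern), classify(pattern))
    plus_inf = from_bits(0x7F800000)
    minus_inf = from_bits(0xFF800000)
    print(math.isnan(plus_inf + minus_inf))   # +infinity + -infinity gives NaN
```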