UCSC CMPE 012 - Floating Point Format Lecture Notes - D1745414

School name University of California, Santa Cruz

Course Cmpe 012- Computer Systems and Assembly

Pages 18

CMPE12c, Cyrus Bazeghi

Floating Point Format

What do floating-point numbers represent?
• Rational numbers with non-repeating expansions in the given base, within the specified exponent range.
• They do not represent rational numbers with repeating expansions, irrational numbers, or any number too small or too large.

IEEE Double Precision FP
• IEEE Double Precision (DP) is similar to Single Precision (SP)
  – 52-bit M
    • 53 bits of precision with hidden bit
  – 11-bit E, excess 1023, representing exponents from -1022 to 1023
  – One sign bit
• Always use DP unless memory/file size is important
  – SP ~ 10^-38 … 10^38
  – DP ~ 10^-308 … 10^308
• Be very careful of these ranges in numeric computation

Floating Point Arithmetic
Floating point operations include
• Addition
• Subtraction
• Multiplication
• Division
They are complicated because…

Floating Point Addition
Decimal review: how do we do this?
1. Align decimal points
2. Add
3. Normalize the result
   • Often already normalized
   • Otherwise move one digit
4. Round result

      9.997     x 10^2
    + 4.631     x 10^-1

      9.997     x 10^2      (decimal points aligned)
    + 0.004631  x 10^2
    ------------------
     10.001631  x 10^2

      1.0001631 x 10^3      (normalized)
      1.000     x 10^3      (rounded)

Floating Point Addition
Example: 0.25 + 100 in SP FP
First step: get into SP FP if not already

      S E        F
    .25 = 0 01111101 00000000000000000000000
    100 = 0 10000101 10010000000000000000000

Or with hidden bit

      S E        HB F
    .25 = 0 01111101 1 00000000000000000000000
    100 = 0 10000101 1 10010000000000000000000

Floating Point Addition
Second step: Align radix points
  – Shifting F left by 1 bit, decreasing e by 1
  – Shifting F right by 1 bit, increasing e by 1
  – Shift F right so least significant bits fall off
  – Which of the two numbers should we shift?

Floating Point Addition
Second step: Align radix points, cont.
Shift the .25 to increase its exponent so it matches that of 100.

    0.25's e: 01111101 - 01111111 (127) = -2
    100's e:  10000101 - 01111111 (127) = 6

Shift .25 by 8 then.
Easier method: the bias cancels with subtraction, so

      10000101    (100's E)
    - 01111101    (0.25's E)
    ----------
      00001000    (8)

Floating Point Addition
Carefully shifting the 0.25's fraction

      S E        HB F
    • 0 01111101 1 00000000000000000000000 (original value)
    • 0 01111110 0 10000000000000000000000 (shifted by 1)
    • 0 01111111 0 01000000000000000000000 (shifted by 2)
    • 0 10000000 0 00100000000000000000000 (shifted by 3)
    • 0 10000001 0 00010000000000000000000 (shifted by 4)
    • 0 10000010 0 00001000000000000000000 (shifted by 5)
    • 0 10000011 0 00000100000000000000000 (shifted by 6)
    • 0 10000100 0 00000010000000000000000 (shifted by 7)
    • 0 10000101 0 00000001000000000000000 (shifted by 8)

Floating Point Addition
Third step: Add fractions with hidden bit

      0 10000101 1 10010000000000000000000 (100)
    + 0 10000101 0 00000001000000000000000 (.25)
    --------------------------------------
      0 10000101 1 10010001000000000000000

Fourth step: Normalize the result
• Get a '1' back in hidden bit
• Already normalized most of the time
• Remove hidden bit and finished

Floating Point Addition
Normalization example

      S E   HB F
      0 011 1  1100
    + 0 011 1  1011
    ---------------
      0 011 11 0111

Need to shift so that only a 1 is in the HB spot

      0 100 1  1011   (the trailing 1 is shifted out and discarded)

Floating Point Subtraction
• Mantissas are sign-magnitude
• Watch out when the numbers are close

      1.23455 x 10^2
    - 1.23456 x 10^2

• A many-digit normalization is possible
This is why FP addition is in many ways more difficult than FP multiplication.

Floating Point Subtraction
Steps to do subtraction
1. Align radix points
2. Perform sign-magnitude operand swap if needed
   • Compare magnitudes (with hidden bit)
   • Change sign bit if order of operands is changed
3. Subtract
4. Normalize
5. Round

Floating Point Subtraction
Simple example:

      S E   HB F
      0 011 1  1011   smaller
    - 0 011 1  1101   bigger

Switch order and make result negative

      0 011 1  1101   bigger
    - 0 011 1  1011   smaller
    ---------------
      1 011 0  0010   (sign switched)
      1 000 1  0000   (normalized)

Floating Point Multiplication
Decimal example: 3.0 x 10^1 x 5.0 x 10^2
How do we do this?
1. Multiply mantissas: 3.0 x 5.0 = 15.00
2. Add exponents: 1 + 2 = 3
3. Combine: 15.00 x 10^3
4. Normalize if needed: 1.50 x 10^4

Floating Point Multiplication
Multiplication in binary (4-bit F)

      0 10000100 0100
    x 1 00111100 1100

Step 1: Multiply mantissas (put hidden bit back first!!)

          1.0100
        x 1.1100
        --------
           00000
          00000
         10100
        10100
     + 10100
     -----------
      1000110000   =   10.00110000

Floating Point Multiplication
Second step: Add exponents, subtract extra bias.

      10000100
    + 00111100
    ----------
      11000000

      11000000
    - 01111111 (127)
    ----------
      01000001

Third step: Renormalize, correcting exponent

    1 01000001 10.00110000
becomes
    1 01000010 1.000110000

Fourth step: Drop the hidden bit

    1 01000010 000110000

Floating Point Multiplication
Multiply these SP FP numbers together

      0x49FC0000
    x 0x4BE00000

Floating Point Division
• True division
  – Unsigned, full-precision division on mantissas
  – This is much more costly (e.g. 4x) than mult.
  – Subtract exponents
• Faster division
  – Newton's method to find reciprocal
  – Multiply dividend by reciprocal of divisor
  – May not yield exact result without some work
  – Similar speed as …
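
The addition steps in these notes (restore the hidden bit, align radix points, add, normalize, drop the hidden bit) can be sketched in Python. This is a minimal sketch, not a full IEEE 754 adder: the helper names `unpack_sp`, `pack_sp`, and `add_positive_sp` are hypothetical, both operands are assumed to be positive normalized SP values, and bits shifted out are simply truncated rather than rounded.

```python
import struct

def unpack_sp(x):
    """Return (sign, E, F) for x stored as an IEEE single-precision value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def pack_sp(sign, e, f):
    """Rebuild a float from a sign bit, 8-bit biased exponent, 23-bit fraction."""
    bits = (sign << 31) | (e << 23) | f
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def add_positive_sp(a, b):
    """Add two positive normalized SP values step by step (assumption:
    no zeros, denormals, infinities, NaNs, or exponent overflow)."""
    _, ea, fa = unpack_sp(a)
    _, eb, fb = unpack_sp(b)

    # First step: put the hidden bit back in front of each 23-bit fraction.
    ma = (1 << 23) | fa
    mb = (1 << 23) | fb

    # Second step: align radix points by shifting the operand with the
    # smaller exponent to the right; low-order bits fall off (truncation).
    # The bias cancels, so the shift count is just the difference of the E fields.
    if ea < eb:
        ma >>= eb - ea
        e = eb
    else:
        mb >>= ea - eb
        e = ea

    # Third step: add the significands, hidden bits included.
    total = ma + mb

    # Fourth step: normalize. A carry out of the hidden-bit position means
    # shifting right by one and increasing the exponent by one.
    if total >> 24:
        total >>= 1
        e += 1

    # Drop the hidden bit and repack.
    return pack_sp(0, e, total & 0x7FFFFF)

if __name__ == "__main__":
    s, e, f = unpack_sp(add_positive_sp(0.25, 100.0))
    print(s, format(e, "08b"), format(f, "023b"))
```

For the 0.25 + 100 example, the shift count is 8 and the resulting fraction field is 10010001000000000000000, which matches the third step in the notes; the packed result is 100.25. A real adder would also handle signs, rounding, and special values, which this sketch leaves out.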
