Berkeley COMPSCI 61C - Lecture 10 Floating Point, Part II and Miscellaneous

CS61C - Machine Structures
Lecture 10 - Floating Point, Part II and Miscellaneous
September 29, 2000
David Patterson
http://www-inst.eecs.berkeley.edu/~cs61c/
CS61C L10 Fl. Pt. © UC Regents

Contents:
  Review
  Overview
  Special Numbers
  Representation for Not a Number
  Special Numbers (cont'd)
  Representation for Denorms (1/2, 2/2)
  Rounding
  IEEE Rounding Modes
  Round to Even
  Casting floats to ints and vice versa
  int -> float -> int
  float -> int -> float
  Floating Point Fallacy
  Administrivia
  J-Format Instructions (1/5 to 5/5)
  Decoding Machine Language
  Decoding Example (1/6 to 6/6)
  Bitwise Operations (1/2, 2/2)
  Logical Operators (1/4 to 4/4)
  Uses for Logical Operators (1/3 to 3/3)
  Shift Instructions (1/4 to 4/4)
  Uses for Shift Instructions (1/5 to 5/5)
  Things to Remember (1/3 to 3/3)

Review
° Floating point numbers approximate values that we want to use.
° The IEEE 754 Floating Point Standard is the most widely accepted attempt to standardize the interpretation of such numbers ($1T)
° New MIPS registers ($f0-$f31) and instructions:
  • Single precision (32 bits, 2 × 10^-38 … 2 × 10^38): add.s, sub.s, mul.s, div.s
  • Double precision (64 bits, 2 × 10^-308 … 2 × 10^308): add.d, sub.d, mul.d, div.d
° Type is not associated with data; bits have no meaning unless given a context
Overview
° Special floating point numbers: NaN, denorms
° IEEE rounding modes
° Floating point fallacies, hacks
° Catch-up topics:
  • Representation of jump, jump and link
  • Reverse time travel: MIPS machine language -> MIPS assembly language -> C code
  • Logical, shift instructions (time permitting)

Special Numbers
° What have we defined so far? (Single precision)
  Exponent   Significand   Object
  0          0             0
  0          nonzero       ???
  1-254      anything      +/- fl. pt. #
  255        0             +/- infinity
  255        nonzero       ???
° Professor Kahan had clever ideas; “Waste not, want not”

Representation for Not a Number
° What do I get if I calculate sqrt(-4.0) or 0/0?
  • If infinity is not an error, these shouldn't be either.
  • Called Not a Number (NaN)
  • Exponent = 255, Significand nonzero
° Why is this useful?
  • Hope NaNs help with debugging?
  • They contaminate: op(NaN, X) = NaN
  • OK if you calculate it but don't use it
  • Ask math majors

Special Numbers (cont'd)
° What have we defined so far? (Single precision)
  Exponent   Significand   Object
  0          0             0
  0          nonzero       ???
  1-254      anything      +/- fl. pt. #
  255        0             +/- infinity
  255        nonzero       NaN

Representation for Denorms (1/2)
° Problem: there's a gap among representable FP numbers around 0
  • Smallest representable positive number (Exponent = 1, so the true exponent is 1 - 127 = -126):
    - a = 1.0…0 (base 2) × 2^-126 = 2^-126
  • Second smallest representable positive number (the 23-bit significand ends in 1):
    - b = 1.0…01 (base 2) × 2^-126 = 2^-126 + 2^-149
  • a - 0 = 2^-126
  • b - a = 2^-149
  (Figure: number line around 0 marking -, 0, a, b and +, with a "Gap!" between 0 and a on each side that is much wider than the spacing between a and b.)

Representation for Denorms (2/2)
° Solution:
  • We still haven't used Exponent = 0, Significand nonzero
  • Denormalized number: no leading 1, and the exponent is fixed at -126
  • Smallest representable positive number:
    - a = 0.0…01 (base 2) × 2^-126 = 2^-149
  • Second smallest representable positive number:
    - b = 0.0…10 (base 2) × 2^-126 = 2^-148
  (Figure: number line around 0, + and - sides.)
Rounding
° When we perform math on real numbers, we have to worry about rounding
° The actual math carries two extra bits of precision, and then rounds to get the proper value
° Rounding also occurs when converting a double to a single precision value, or converting a floating point number to an integer

IEEE Rounding Modes
° Round towards +infinity
  • ALWAYS round “up”: 2.001 -> 3
  • -2.001 -> -2
° Round towards -infinity
  • ALWAYS round “down”: 1.999 -> 1
  • -1.999 -> -2
° Truncate
  • Just drop the last bits (round towards 0)
° Round to (nearest) even
  • Normal rounding, almost

Round to Even
° Round like you learned in grade school
° Except if the value is right on the borderline, in which case we round to the nearest EVEN number
  • 2.5 -> 2
  • 3.5 -> 4
° Ensures fairness in calculation
  • This way, half the time we round up on a tie, and the other half of the time we round down
  • Ask statistics majors
° This is the default rounding mode

Casting floats to ints and vice versa
° (int) exp
  • Converts the value to an integer; in C this truncates toward zero (the fractional part is discarded), regardless of the current FP rounding mode
  • i = (int) (3.14159 * f);
° (float) exp
  • Converts an integer to the nearest floating point value
  • f = f + (float) i;

int -> float -> int
° Will not always work
° Large integers don't have exact floating point representations (a float has only 24 significant bits, so not every int above 2^24 can be represented)
° Similarly, we may round to the wrong value
    if (i == (int)((float) i)) {
        printf("true");
    }

float -> int -> float
° Will not always work
° Small floating point values (e.g., 0.5) don't have good integer representations
° Also rounding errors
    if (f == (float)((int) f)) {
        printf("true");
    }

Floating Point Fallacy
° FP add, subtract associative: FALSE!
  • x = -1.5 × 10^38, y = 1.5 × 10^38, and z = 1.0
  • x + (y + z) = -1.5 × 10^38 + (1.5 × 10^38 + 1.0) = -1.5 × 10^38 + (1.5 × 10^38) = 0.0
  • (x + y) + z = (-1.5 × 10^38 + 1.5 × 10^38) + 1.0 = (0.0) + 1.0 = 1.0
° Therefore, floating point add and subtract are not associative!
  • Why? The FP result only approximates the real result!
  • In this example, 1.5 × 10^38 is so much larger than 1.0 that 1.5 × 10^38 + 1.0 in floating point representation is still 1.5 × 10^38

Administrivia
° Need to catch up with homework
° Reading assignment: Reading 4.8

J-Format Instructions (1/5)
° For branches, we assumed that we won't want to branch too far, so we can specify a change in the PC.
° For general jumps (j and jal), we may jump to anywhere in memory.
° Ideally, we could specify a 32-bit memory address to jump to.
° Unfortunately, we can't fit both a 6-bit opcode and a 32-bit address into a single 32-bit word, so we compromise.

