Berkeley COMPSCI 61C - Lecture 10 Floating Point, Part II and Miscellaneous - D2866204

CS61C - Machine Structures
Lecture 10 - Floating Point, Part II and Miscellaneous
September 29, 2000
David Patterson
http://www-inst.eecs.berkeley.edu/~cs61c/
CS61C L10 Fl. Pt. © UC Regents

Outline
° Review
° Overview
° Special Numbers
° Representation for Not a Number
° Special Numbers (cont'd)
° Representation for Denorms (2 slides)
° Rounding
° IEEE Rounding Modes
° Round to Even
° Casting floats to ints and vice versa
° int -> float -> int
° float -> int -> float
° Floating Point Fallacy
° Administrivia
° J-Format Instructions (5 slides)
° Decoding Machine Language
° Decoding Example (6 slides)
° Bitwise Operations (2 slides)
° Logical Operators (4 slides)
° Uses for Logical Operators (3 slides)
° Shift Instructions (4 slides)
° Uses for Shift Instructions (5 slides)
° Things to Remember (3 slides)

Review
° Floating Point numbers approximate values that we want to use.
° IEEE 754 Floating Point Standard is the most widely accepted attempt to standardize interpretation of such numbers ($1T)
° New MIPS registers ($f0-$f31), instructions:
  • Single Precision (32 bits, 2 x 10^-38 ... 2 x 10^38): add.s, sub.s, mul.s, div.s
  • Double Precision (64 bits, 2 x 10^-308 ... 2 x 10^308): add.d, sub.d, mul.d, div.d
° Type is not associated with data; bits have no meaning unless given in context
Overview
° Special Floating Point Numbers: NaN, Denorms
° IEEE Rounding modes
° Floating Point fallacies, hacks
° Catchup topics:
  • Representation of jump, jump and link
  • Reverse time travel: MIPS machine language -> MIPS assembly language -> C code
  • Logical, shift instructions (time permitting)

Special Numbers
° What have we defined so far? (Single Precision)

  Exponent   Significand   Object
  0          0             0
  0          nonzero       ???
  1-254      anything      +/- fl. pt. #
  255        0             +/- infinity
  255        nonzero       ???

° Professor Kahan had clever ideas; "Waste not, want not"

Representation for Not a Number
° What do I get if I calculate sqrt(-4.0) or 0/0?
  • If infinity is not an error, these shouldn't be either.
  • Called Not a Number (NaN)
  • Exponent = 255, Significand nonzero
° Why is this useful?
  • Hope NaNs help with debugging?
  • They contaminate: op(NaN,X) = NaN
  • OK if calculate but don't use it
  • Ask math majors

Special Numbers (cont'd)
° What have we defined so far? (Single Precision)

  Exponent   Significand   Object
  0          0             0
  0          nonzero       ???
  1-254      anything      +/- fl. pt. #
  255        0             +/- infinity
  255        nonzero       NaN

Representation for Denorms (1/2)
° Problem: There's a gap among representable FP numbers around 0
  • Smallest representable pos num:
    - a = 1.0...0 (base 2) * 2^-126 = 2^-126
  • Second smallest representable pos num:
    - b = 1.000...1 (base 2) * 2^-126 = 2^-126 + 2^-149
  • a - 0 = 2^-126
  • b - a = 2^-149
[Figure: number line with 0, a and b marked, showing a gap between 0 and a on both the + and - sides]

Representation for Denorms (2/2)
° Solution:
  • We still haven't used Exponent = 0, Significand nonzero
  • Denormalized number: no leading 1 (implied exponent is -126)
  • Smallest representable pos num:
    - a = 2^-149
  • Second smallest representable pos num:
    - b = 2^-148
[Figure: number line around 0 with denorms filling the gap on the + and - sides]
Rounding
° When we perform math on real numbers, we have to worry about rounding
° The actual math carries two extra bits of precision, and the result is then rounded to get the proper value
° Rounding also occurs when converting a double to a single precision value, or converting a floating point number to an integer

IEEE Rounding Modes
° Round towards +infinity
  • ALWAYS round "up": 2.001 -> 3
  • -2.001 -> -2
° Round towards -infinity
  • ALWAYS round "down": 1.999 -> 1
  • -1.999 -> -2
° Truncate
  • Just drop the last bits (round towards 0)
° Round to (nearest) even
  • Normal rounding, almost

Round to Even
° Round like you learned in grade school
° Except if the value is right on the borderline, in which case we round to the nearest EVEN number
  • 2.5 -> 2
  • 3.5 -> 4
° Ensures fairness in calculation
  • This way, half the time we round up on a tie, the other half of the time we round down
  • Ask statistics majors
° This is the default rounding mode

Casting floats to ints and vice versa
° (int) exp
  • Coerces and converts it to an integer
  • In C the conversion truncates (rounds towards 0), whatever the current rounding mode
  • i = (int) (3.14159 * f);
° (float) exp
  • Converts integer to nearest floating point
  • f = f + (float) i;

int -> float -> int
° Will not always work
° Large values of integers don't have exact floating point representations
° Similarly, we may round to the wrong value

  if (i == (int)((float) i)) {
      printf("true");
  }

float -> int -> float
° Will not always work
° Small values of floating point don't have good integer representations
° Also rounding errors

  if (f == (float)((int) f)) {
      printf("true");
  }

Floating Point Fallacy
° FP add, subtract associative: FALSE!
  • x = -1.5 x 10^38, y = 1.5 x 10^38, and z = 1.0
  • x + (y + z) = -1.5 x 10^38 + (1.5 x 10^38 + 1.0) = -1.5 x 10^38 + (1.5 x 10^38) = 0.0
  • (x + y) + z = (-1.5 x 10^38 + 1.5 x 10^38) + 1.0 = (0.0) + 1.0 = 1.0
° Therefore, Floating Point add, subtract are not associative!
  • Why? FP result approximates real result!
  • This example: 1.5 x 10^38 is so much larger than 1.0 that 1.5 x 10^38 + 1.0 in floating point representation is still 1.5 x 10^38

Administrivia
° Need to catch up with Homework
° Reading assignment: Reading 4.8

J-Format Instructions (1/5)
° For branches, we assumed that we won't want to branch too far, so we can specify change in PC.
° For general jumps (j and jal), we may jump to anywhere in memory.
° Ideally, we could specify a 32-bit memory address to jump to.
° Unfortunately, we can't fit both a 6-bit opcode and a 32-bit address into a single 32-bit word, so we compromise.