CS61C - Machine Structures
Lecture 10 - Floating Point, Part II and Miscellaneous
September 29, 2000
David Patterson
http://www-inst.eecs.berkeley.edu/~cs61c/
CS61C L10 Fl. Pt. © UC Regents

Slides in this lecture:
Review
Overview
Special Numbers
Representation for Not a Number
Special Numbers (cont’d)
Representation for Denorms (1/2 - 2/2)
Rounding
IEEE Rounding Modes
Round to Even
Casting floats to ints and vice versa
int -> float -> int
float -> int -> float
Floating Point Fallacy
Administrivia
J-Format Instructions (1/5 - 5/5)
Decoding Machine Language
Decoding Example (1/6 - 6/6)
Bitwise Operations (1/2 - 2/2)
Logical Operators (1/4 - 4/4)
Uses for Logical Operators (1/3 - 3/3)
Shift Instructions (1/4 - 4/4)
Uses for Shift Instructions (1/5 - 5/5)
Things to Remember (1/3 - 3/3)

Review
°Floating Point numbers approximate values that we want to use.
°IEEE 754 Floating Point Standard is the most widely accepted attempt to standardize interpretation of such numbers ($1T)
°New MIPS registers ($f0-$f31), instructions:
•Single Precision (32 bits, 2 x 10^-38 … 2 x 10^38): add.s, sub.s, mul.s, div.s
•Double Precision (64 bits, 2 x 10^-308 … 2 x 10^308): add.d, sub.d, mul.d, div.d
°Type is not associated with data; bits have no meaning unless given in context
Overview
°Special Floating Point Numbers: NaN, Denorms
°IEEE Rounding modes
°Floating Point fallacies, hacks
°Catchup topics:
•Representation of jump, jump and link
•Reverse time travel: MIPS machine language -> MIPS assembly language -> C code
•Logical, shift instructions (time permitting)

Special Numbers
°What have we defined so far? (Single Precision)
Exponent   Significand   Object
0          0             0
0          nonzero       ???
1-254      anything      +/- fl. pt. #
255        0             +/- infinity
255        nonzero       ???
°Professor Kahan had clever ideas; “Waste not, want not”

Representation for Not a Number
°What do I get if I calculate sqrt(-4.0) or 0/0?
•If infinity is not an error, these shouldn’t be either.
•Called Not a Number (NaN)
•Exponent = 255, Significand nonzero
°Why is this useful?
•Hope NaNs help with debugging?
•They contaminate: op(NaN,X) = NaN
•OK if you calculate a NaN but don’t use it
•Ask math majors

Special Numbers (cont’d)
°What have we defined so far? (Single Precision)
Exponent   Significand   Object
0          0             0
0          nonzero       ???
1-254      anything      +/- fl. pt. #
255        0             +/- infinity
255        nonzero       NaN

Representation for Denorms (1/2)
°Problem: There’s a gap among representable FP numbers around 0
•Smallest representable pos num:
-a = 1.0…0 (base 2) x 2^-126 = 2^-126
•Second smallest representable pos num:
-b = 1.00…01 (base 2) x 2^-126 = 2^-126 + 2^-149
•a - 0 = 2^-126
•b - a = 2^-149
[Figure: number line from - to + showing 0, a, and b, with a "Gap!" on each side of 0]

Representation for Denorms (2/2)
°Solution:
•We still haven’t used Exponent = 0, Significand nonzero
•Denormalized number: no leading 1 (the implicit exponent is -126)
•Smallest representable pos num:
-a = 2^-149
•Second smallest representable pos num:
-b = 2^-148
[Figure: number line from - to + around 0]
Rounding
°When we perform math on real numbers, we have to worry about rounding
°The hardware carries two extra bits of precision through the calculation, then rounds to get the proper value
°Rounding also occurs when converting a double to a single precision value, or converting a floating point number to an integer

IEEE Rounding Modes
°Round towards +infinity
•ALWAYS round “up”: 2.001 -> 3, -2.001 -> -2
°Round towards -infinity
•ALWAYS round “down”: 1.999 -> 1, -1.999 -> -2
°Truncate
•Just drop the last bits (round towards 0)
°Round to (nearest) even
•Normal rounding, almost

Round to Even
°Round like you learned in grade school
°Except if the value is right on the borderline, in which case we round to the nearest EVEN number
•2.5 -> 2
•3.5 -> 4
°Ensures fairness in calculation
•This way, half the time we round up on a tie, the other half of the time we round down
•Ask statistics majors
°This is the default rounding mode

Casting floats to ints and vice versa
°(int) exp
•Coerces exp and converts it to an integer
•In C the fractional part is discarded (truncation towards 0), regardless of the rounding mode
•i = (int) (3.14159 * f);
°(float) exp
•Converts integer to nearest floating point
•f = f + (float) i;

int -> float -> int
°Will not always work
°Large values of integers don’t have exact floating point representations
°Similarly, we may round to the wrong value
    if (i == (int)((float) i)) {
        printf("true");
    }

float -> int -> float
°Will not always work
°Small values of floating point don’t have good integer representations
°Also rounding errors
    if (f == (float)((int) f)) {
        printf("true");
    }

Floating Point Fallacy
°FP Add, subtract associative: FALSE!
•x = -1.5 x 10^38, y = 1.5 x 10^38, and z = 1.0
•x + (y + z) = -1.5 x 10^38 + (1.5 x 10^38 + 1.0) = -1.5 x 10^38 + (1.5 x 10^38) = 0.0
•(x + y) + z = (-1.5 x 10^38 + 1.5 x 10^38) + 1.0 = (0.0) + 1.0 = 1.0
°Therefore, Floating Point add, subtract are not associative!
•Why? FP result approximates real result!
•This example: 1.5 x 10^38 is so much larger than 1.0 that 1.5 x 10^38 + 1.0 in floating point representation is still 1.5 x 10^38

Administrivia
°Need to catch up with Homework
°Reading assignment: 4.8

J-Format Instructions (1/5)
°For branches, we assumed that we won’t want to branch too far, so we can specify a change in PC.
°For general jumps (j and jal), we may jump to anywhere in memory.
°Ideally, we could specify a 32-bit memory address to jump to.
°Unfortunately, we can’t fit both a 6-bit opcode and a 32-bit address into a single 32-bit word, so we compromise.