CS 61C L16 : Floating Point II (1)   Kusalo, Spring 2005 © UCB

Lecturer NSOE Steven Kusalo
inst.eecs.berkeley.edu/~cs61c
CS61C : Machine Structures
Lecture 16 – Floating Point II
2004-10-06

20 years from now...
1) We'll all have robot servants or...
2) The world will be a smoking ruin

Example: Representing 1/3 in MIPS
• 1/3 = 0.33333…₁₀
      = 0.25 + 0.0625 + 0.015625 + 0.00390625 + …
      = 1/4 + 1/16 + 1/64 + 1/256 + …
      = 2^-2 + 2^-4 + 2^-6 + 2^-8 + …
      = 0.0101010101…₂ * 2^0
      = 1.0101010101…₂ * 2^-2
• Sign: 0
• Exponent = -2 + 127 = 125 = 01111101
• Significand = 0101010101…
  0 0111 1101 0101 0101 0101 0101 0101 010

Representation for ± ∞
• In FP, divide by 0 should produce ± ∞, not overflow.
• Why?
  • OK to do further computations with ∞. E.g., X/0 > Y may be a valid comparison
  • Ask math majors
• IEEE 754 represents ± ∞
  • Most positive exponent reserved for ∞
  • Significands all zeroes

Special Numbers
• What have we defined so far? (Single Precision)

  Exponent   Significand   Object
  0          0             0
  0          nonzero       ???
  1-254      anything      +/- fl. pt. #
  255        0             +/- ∞
  255        nonzero       ???

• Professor Kahan had clever ideas; “Waste not, want not”
  • Exp=0,255 & Sig!=0 …

Representation for Not a Number
• What is sqrt(-4.0) or 0/0?
  • If ∞ is not an error, these shouldn’t be either.
  • Called Not a Number (NaN)
  • Exponent = 255, Significand nonzero
• Why is this useful?
  • Hope NaNs help with debugging?
  • They contaminate: op(NaN, X) = NaN

Representation for Denorms (1/2)
• Problem: There’s a gap among representable FP numbers around 0
  • Smallest representable pos num:
    a = 1.0…₂ * 2^-126 = 2^-126
  • Second smallest representable pos num:
    b = 1.000……1₂ * 2^-126 = 2^-126 + 2^-149
    a - 0 = 2^-126
    b - a = 2^-149
[Figure: number line from - to + marking 0, a, and b. Gaps! Normalization and implicit 1 is to blame!]

Representation for Denorms (2/2)
• Solution:
  • We still haven’t used Exponent = 0, Significand nonzero
  • Denormalized number: no leading 1, implicit exponent = -126.
  • Smallest representable pos num:
    a = 2^-149
  • Second smallest representable pos num:
    b = 2^-148
[Figure: number line from - to + around 0]

Overview
• Reserve exponents, significands:

  Exponent   Significand   Object
  0          0             0
  0          nonzero       Denorm
  1-254      anything      +/- fl. pt. #
  255        0             +/- ∞
  255        nonzero       NaN

Rounding
• Math on real numbers ⇒ we worry about rounding to fit the result in the significand field.
• FP hardware carries 2 extra bits of precision, and rounds for proper value
• Rounding occurs when converting…
  • double to single precision
  • floating point # to an integer

IEEE Four Rounding Modes
• Round towards + ∞
  • ALWAYS round “up”: 2.1 ⇒ 3, -2.1 ⇒ -2
• Round towards - ∞
  • ALWAYS round “down”: 1.9 ⇒ 1, -1.9 ⇒ -2
• Truncate
  • Just drop the last bits (round towards 0)
• Round to (nearest) even (default)
  • Normal rounding, almost: 2.5 ⇒ 2, 3.5 ⇒ 4
  • Like you learned in grade school
  • Ensures fairness on calculation
  • Half the time we round up, other half down

Integer Multiplication (1/3)
• Paper and pencil example (unsigned):

  Multiplicand      1000     8
  Multiplier       x1001     9
                    1000
                   0000
                  0000
                +1000
                01001000

• m bits x n bits = m + n bit product

Integer Multiplication (2/3)
• In MIPS, we multiply registers, so:
  • 32-bit value x 32-bit value = 64-bit value
• Syntax of Multiplication (signed):
  • mult register1, register2
  • Multiplies 32-bit values in those registers & puts 64-bit product in special result regs:
    - puts product upper half in hi, lower half in lo
  • hi and lo are 2 registers separate from the 32 general purpose registers
  • Use mfhi register & mflo register to move from hi, lo to another register

Integer Multiplication (3/3)
• Example:
  • in C: a = b * c;
  • in MIPS:
    - let b be $s2; let c be $s3; and let a be $s0 and $s1 (since it may be up to 64 bits)

    mult $s2,$s3    # b*c
    mfhi $s0        # upper half of product into $s0
    mflo $s1        # lower half of product into $s1

• Note: Often, we only care about the lower half of the product.

Integer Division (1/2)
• Paper and pencil example (unsigned):

                    1001      Quotient
  Divisor 1000 | 1001010      Dividend
                -1000
                    10
                    101
                    1010
                   -1000
                      10      Remainder (or Modulo result)

• Dividend = Quotient x Divisor + Remainder

Integer Division (2/2)
• Syntax of Division (signed):
  • div register1, register2
  • Divides 32-bit register 1 by 32-bit register 2:
  • puts remainder of division in hi, quotient in lo
• Implements C division (/) and modulo (%)
• Example in C:
    a = c / d;
    b = c % d;
• in MIPS: a↔$s0; b↔$s1; c↔$s2; d↔$s3

    div $s2,$s3     # lo=c/d, hi=c%d
    mflo $s0        # get quotient
    mfhi $s1        # get remainder

Unsigned Instructions & Overflow
• MIPS also has versions of mult, div for unsigned operands:
    multu
    divu
  • Whether the operands are treated as signed or unsigned determines the product and quotient.
• MIPS does not check overflow on ANY signed/unsigned multiply, divide instr
  • Up to the software to check hi

FP Addition & Subtraction
• Much more difficult than with integers (can’t just add significands)
• How do we do it?
  • De-normalize to match larger exponent
  • Add significands to get resulting one
  • Normalize (& check for under/overflow)
  • Round if needed (may need to renormalize)
• If signs ≠, do a subtract. (Subtract similar)
  • If signs ≠ for add (or = for sub), what’s ans sign?
• Question: How do we integrate this into the integer arithmetic unit? [Answer: We don’t!]

MIPS Floating Point Architecture (
• Separate floating point instructions:
  • Single Precision:
    add.s, sub.s, mul.s, div.s
  • Double Precision:
    add.d, sub.d, mul.d, div.d
• These are far more complicated than their integer counterparts
  • Can take
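The reserved exponent and significand combinations from the Overview slide can be checked directly in C. The following is a minimal sketch, not part of the lecture: the names `fp_kind`, `float_bits`, `bits_to_float`, `exponent_field`, `significand_field`, and `classify` are hypothetical helpers, and the code assumes that `float` is a 32-bit IEEE 754 single-precision value on the target machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the rows of the Overview table. */
typedef enum {
    KIND_ZERO,    /* exponent 0,     significand 0       */
    KIND_DENORM,  /* exponent 0,     significand nonzero */
    KIND_NORMAL,  /* exponent 1-254, significand anything */
    KIND_INF,     /* exponent 255,   significand 0       */
    KIND_NAN      /* exponent 255,   significand nonzero */
} fp_kind;

/* Copy the raw bits of a float into an integer.
   Assumption: float is 32 bits wide (checked at run time). */
static uint32_t float_bits(float f) {
    uint32_t u;
    assert(sizeof f == sizeof u);
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse of float_bits: reinterpret a 32-bit pattern as a float. */
static float bits_to_float(uint32_t u) {
    float f;
    assert(sizeof f == sizeof u);
    memcpy(&f, &u, sizeof f);
    return f;
}

/* 8-bit biased exponent: bits 30..23. */
static uint32_t exponent_field(float f) {
    return (float_bits(f) >> 23) & 0xFFu;
}

/* 23-bit significand (without the implicit leading 1): bits 22..0. */
static uint32_t significand_field(float f) {
    return float_bits(f) & 0x7FFFFFu;
}

/* Apply the Overview table row by row. */
static fp_kind classify(float f) {
    uint32_t e = exponent_field(f);
    uint32_t s = significand_field(f);
    if (e == 0)   return s == 0 ? KIND_ZERO : KIND_DENORM;
    if (e == 255) return s == 0 ? KIND_INF  : KIND_NAN;
    return KIND_NORMAL;
}
```

For instance, `exponent_field(1.0f / 3.0f)` yields 125, matching the 1/3 example, and `classify(bits_to_float(0x00000001u))` reports a denorm, since that pattern is the smallest positive denormalized number, 2^-149. The sign bit is ignored by `classify`, so the ± variants of each row fall into the same category.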