inst.eecs.berkeley.edu/~cs61c
CS61C: Machine Structures
Lecture 15: Floating Point I
2004-02-23
TA Danny Krause
inst.eecs.berkeley.edu/~cs61c-td

This day in history:
- 1455: Publication of the Gutenberg Bible
- 1998: Netscape founds Mozilla.org

CS 61C L15 Floating Point I (1), Krause, Spring 2005, UCB

Quote of the day

"95% of the folks out there are completely clueless about floating-point."
James Gosling, Sun Fellow, Java Inventor, 1998-02-28

Review of Numbers

- Computers are made to deal with numbers.
- What can we represent in N bits?
  - Unsigned integers: 0 to 2^N - 1
  - Signed integers (Two's Complement): -2^(N-1) to 2^(N-1) - 1

Other Numbers

- What about other numbers?
  - Very large numbers (seconds/century): 3,155,760,000_10 (3.15576_10 x 10^9)
  - Very small numbers (atomic diameter): 0.00000001_10 (1.0_10 x 10^-8)
  - Rationals (repeating pattern): 2/3 (0.666666666...)
  - Irrationals: 2^(1/2) (1.414213562373...)
  - Transcendentals: e (2.718...), π (3.141...)
- All represented in scientific notation.

Scientific Notation (in Decimal)

  6.02_10 x 10^23

  mantissa: 6.02, with the decimal point after the 6
  radix (base): 10
  exponent: 23

- Normalized form: no leading 0s (exactly one digit to the left of the decimal point)
- Alternatives to representing 1/1,000,000,000:
  - Normalized: 1.0 x 10^-9
  - Not normalized: 0.1 x 10^-8, 10.0 x 10^-10

Scientific Notation (in Binary)

  1.0two x 2^-1

  mantissa: 1.0two, with the "binary point" after the 1
  radix (base): 2
  exponent: -1

- Computer arithmetic that supports it is called floating point, because it represents numbers where the binary point is not fixed, as it is for integers.
- Declare such a variable in C as float.

Floating Point Representation (1/2)

- Normal format: +1.xxxxxxxxxxtwo x 2^yyyytwo
- Multiple of Word Size (32 bits):

  | Bit 31    | Bits 30-23        | Bits 22-0             |
  | S (1 bit) | Exponent (8 bits) | Significand (23 bits) |

- S represents Sign; Exponent represents y's; Significand represents x's.
- Represent numbers as small as 2.0 x 10^-38 to as large as 2.0 x 10^38.

Floating Point Representation (2/2)

- What if the result is too large? (> 2.0 x 10^38)
  - Overflow!
  - Overflow: exponent larger than can be represented in the 8-bit Exponent field
- What if the result is too small? (> 0, < 2.0 x 10^-38)
  - Underflow!
  - Underflow: negative exponent larger than can be represented in the 8-bit Exponent field
- How to reduce the chances of overflow or underflow?

Double Precision Fl. Pt. Representation

- Next Multiple of Word Size (64 bits):

  | Bit 31    | Bits 30-20         | Bits 19-0             |
  | S (1 bit) | Exponent (11 bits) | Significand (20 bits) |

  | Bits 31-0                      |
  | Significand (cont'd) (32 bits) |

- Double Precision (vs. Single Precision):
  - C variable declared as double
  - Represent numbers almost as small as 2.0 x 10^-308 to almost as large as 2.0 x 10^308
  - But the primary advantage is greater accuracy due to the larger significand

QUAD Precision Fl. Pt. Representation

- Next Multiple of Word Size (128 bits)
- Unbelievable range of numbers
- Unbelievable precision (accuracy)
- This is currently being worked on.
- The current version has 15 bits for the exponent and 112 bits for the significand.
- Oct-Precision? That's just silly! It's been implemented before...

IEEE 754 Floating Point Standard (1/4)

- Single Precision, DP similar
- Sign bit: 1 means negative, 0 means positive
- Significand:
  - To pack more bits, leading 1 is implicit for normalized numbers
  - 1 + 23 bits single, 1 + 52 bits double
  - always true: Significand < 1 (for normalized numbers)
- Note: 0 has no leading 1, so reserve exponent value 0 just for number 0.

IEEE 754 Floating Point Standard (2/4)

- Kahan wanted FP numbers to be usable even with no FP hardware; e.g., sort records with FP numbers using integer compares.
- Could break an FP number into 3 parts: compare signs, then compare exponents, then compare significands.
- Wanted it to be faster: a single compare if possible, especially for positive numbers.
- Then want order:
  - Highest order bit is sign (negative < positive)
  - Exponent next, so bigger exponent => bigger number
  - Significand last: exponents same => bigger number

IEEE 754 Floating Point Standard (3/4)

- Negative Exponent?
  - 2's comp? 1.0 x 2^-1 v. 1.0 x 2^+1 (1/2 v. 2)

    1/2: 0 1111 1111 000 0000 0000 0000 0000 0000
    2:   0 0000 0001 000 0000 0000 0000 0000 0000

  - This notation, using an integer compare of 1/2 v. 2, makes 1/2 > 2!
- Instead, pick notation where 0000 0001 is most negative and 1111 1111 is most positive.
  - 1.0 x 2^-1 v. 1.0 x 2^+1 (1/2 v. 2)

    1/2: 0 0111 1110 000 0000 0000 0000 0000 0000
    2:   0 1000 0000 000 0000 0000 0000 0000 0000

IEEE 754 Floating Point Standard (4/4)

- Called Biased Notation, where the bias is the number subtracted to get the real number.
  - IEEE 754 uses a bias of 127 for single precision.
  - Subtract 127 from the Exponent field to get the actual value of the exponent.
  - 1023 is the bias for double precision.
- Summary (single precision):

  | Bit 31    | Bits 30-23        | Bits 22-0             |
  | S (1 bit) | Exponent (8 bits) | Significand (23 bits) |

  (-1)^S x (1 + Significand) x 2^(Exponent - 127)

  - Double precision is identical, except with an exponent bias of 1023.

"Father" of the Floating point standard

- IEEE Standard 754 for Binary Floating-Point Arithmetic
- 1989 ACM Turing Award Winner!
- Prof. Kahan
- www.cs.berkeley.edu/~wkahan/ieee754status/754story.html

Administrivia

- Midterm in 2 weeks!
  - Midterm 1 LeConte, Mon 2004-03-07, 7-10pm
  - Conflicts/DSP? Email Head TA Andy, cc Dan
- How should we study for the midterm?
  - Form study groups; don't prepare in isolation!
  - Attend the review session (2004-03-06, 2pm in 10 Evans)
  - Look over HW, Labs, Projects
  - Write up your 1-page study sheet, handwritten
  - Go over old exams; HKN office has put them online (link from 61C home page)

Upcoming Calendar

  | Week #          | Mon                           | Wed             | Thurs Lab             | Fri                            |
  | #6 This week    | Holiday                       | Floating Pt I   | Floating Pt           | Floating Pt II                 |
  | #7 Next week    | MIPS inst Format III          | Running Program | Running Program       | Running Program                |
  | #8 Midterm week | Digital Systems; Midterm 7pm  | State Elements  | Finite State Machines | Comb Logic; Midterm grades out |

Understanding the Significand (1/2)

- Method 1 (Fractions):
  - In decimal: 0.340_10 => 340_10/1000_10 => 34_10/100_10
  - In binary: 0.110_2 => 110_2/1000_2 = 6_10/8_10 => 11_2/100_2 = 3_10/4_10
  - Advantage: less purely numerical, more thought oriented; this method usually helps people understand the meaning …
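
The fraction view of the significand and the single-precision summary formula, (-1)^S x (1 + Significand) x 2^(Exponent - 127), can be tried out with a short Python sketch. This is an illustration only, not part of the lecture: the helper names `bits_of`, `decode_single`, and `value_of` are hypothetical, and the formula is applied only to normalized numbers (it does not handle the reserved exponent value 0 or other special cases).

```python
import struct
from fractions import Fraction

BIAS = 127  # single-precision exponent bias, as stated in the notes


def bits_of(x):
    """Return the 32-bit pattern of x stored as a single-precision float.

    Hypothetical helper for illustration; uses big-endian packing so that
    bit 31 is the sign bit regardless of the host machine.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


def decode_single(x):
    """Split x into the (S, Exponent, Significand) fields of the 32-bit layout."""
    bits = bits_of(x)
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30-23, still biased
    significand = bits & 0x7FFFFF    # bits 22-0, without the implicit leading 1
    return sign, exponent, significand


def value_of(sign, exponent, significand):
    """Evaluate (-1)^S x (1 + Significand) x 2^(Exponent - 127) exactly.

    Assumes a normalized number. The significand field is read as the
    fraction field / 2^23, in the spirit of Method 1 (Fractions).
    """
    fraction = Fraction(significand, 1 << 23)
    return (-1) ** sign * (1 + fraction) * Fraction(2) ** (exponent - BIAS)


if __name__ == "__main__":
    for x in (0.5, 2.0, 0.75, -1.0):
        s, e, f = decode_single(x)
        print(x, (s, e, f), value_of(s, e, f))
    # Biased exponents let positive floats be ordered with an integer compare.
    print(bits_of(0.5) < bits_of(2.0))
```

Under these assumptions, `decode_single(0.5)` gives `(0, 126, 0)` and `decode_single(2.0)` gives `(0, 128, 0)`, which correspond to the exponent fields 0111 1110 and 1000 0000 in the 1/2 v. 2 example, and comparing the two bit patterns as unsigned integers orders them the same way as the numbers themselves.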

