U of I CS 498 - Floating Point Arithmetic - D2479223


School name University of Illinois

Course Cs 498- Special Topics



Floating Point Arithmetic
María Jesús Garzarán
CS 498: Program Optimization
Fall 2007
University of Illinois at Urbana-Champaign

Floating-point numbers
• Standard way to represent and work with non-integer numbers in a digital computer.
• Gives the illusion of working with real numbers in a machine that only works with a finite set of numbers.
  – Most of the time this is OK.
  – However, there can be surprises.

Problems
• Rounding errors in the representation of the parameters
  – Most real numbers are not exactly representable in floating-point format. In particular, most simple-looking base-10 numbers such as 0.1, 0.2, 0.3, … are not exactly representable. This means that rounding errors are present from the moment these numbers are entered into the computer.
• Imprecision in calculation
  – The result of an operation, even when the operands are exactly representable as floating-point numbers, can be an unrepresentable number.
  – Two sources of error in X op Y:
    • Representation of X and Y as floating-point numbers
    • Representation of the result X op Y as a floating-point number

As a result …
• Floating-point operations lack several mathematical properties.
  – Floating-point addition and multiplication are commutative.
  – They are not associative or distributive:
    • Floating-point addition is not associative.
    • Floating-point multiplication is not associative.
    • The distributive law between multiplication and addition does not necessarily hold.
  – "If a + b = a, then b is 0" may not be true with floating-point operations.

Floating point addition is not associative
Example 1: (0.1 + 0.2) + 0.3 ≠ 0.1 + (0.2 + 0.3)

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        double one = 0.1, two = 0.2, three = 0.3, six = 0.6;
        double result1, result2;

        result1 = ((one + two) + three);  /* grouped from the left */
        result2 = (one + (two + three));  /* grouped from the right */
        printf("result1=%1.20f, result2=%1.20f\n", result1, result2);
        return 0;
    }

Output:
    result1=0.60000000000000008882, result2=0.59999999999999997780

The constant six is stored as 0.59999999999999997780, so:
  – ((one + two) + three) == six is FALSE
  – (one + (two + three)) == six is TRUE
Thus, instead of testing for equality, test whether fabs(six - ((one + two) + three)) <= epsilon.
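The epsilon test suggested in Example 1 can be sketched as a small helper. This is only an illustration, not code from the slides: the name nearly_equal and its two tolerance parameters (rel_tol, abs_tol) are assumptions, and suitable tolerance values depend on the application.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper (not from the slides): returns true when a and b
   differ by at most abs_tol, or by at most rel_tol times the larger of
   their magnitudes. */
bool nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);

    double diff = fabs(a - b);
    double mag_a = fabs(a);
    double mag_b = fabs(b);
    double scale = (mag_a > mag_b) ? mag_a : mag_b;
    double limit = rel_tol * scale;

    if (abs_tol > limit) {
        limit = abs_tol;   /* the absolute tolerance matters near zero */
    }
    return diff <= limit;
}
```

With rel_tol = 1e-12 and abs_tol = 0.0, the two groupings from Example 1 compare as equal, because their difference is far smaller than the relative tolerance. A purely relative test breaks down when both values are close to zero, which is why the sketch also accepts an absolute tolerance.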
Floating point addition is not associative
• The mantissae are added or subtracted (after shifting the mantissa and increasing the exponent of the smaller number, if necessary, to make the exponents agree).
• The final normalized result is obtained by rounding (after shifting the mantissa and adjusting the exponent, if necessary).
• The examples below use a decimal format with a three-digit mantissa:
  – 3.12 × 10^1 + 4.26 × 10^1 = 7.38 × 10^1
  – 2.77 × 10^2 + 7.55 × 10^2 = 10.32 × 10^2, i.e., 1.03 × 10^3
  – 6.18 × 10^1 + 1.84 × 10^-1 = 6.18 × 10^1 + 0.0184 × 10^1 = 6.1984 × 10^1, i.e., 6.20 × 10^1
  – 3.65 × 10^-1 - 2.78 × 10^-1 = 0.87 × 10^-1, i.e., 8.70 × 10^-2

Floating point addition is not associative
Example 2: a = 6.31 × 10^1, b = 4.24 × 10^0, c = 2.47 × 10^-1
  – (a + b) + c = (6.31 × 10^1 + 0.424 × 10^1) + 2.47 × 10^-1, i.e., 6.73 × 10^1 + 0.0247 × 10^1, i.e., 6.75 × 10^1
  – a + (b + c) = 6.31 × 10^1 + (4.24 × 10^0 + 0.247 × 10^0), i.e., 6.31 × 10^1 + 4.49 × 10^0, i.e., 6.31 × 10^1 + 0.449 × 10^1, i.e., 6.76 × 10^1
Examples involving adding many numbers of varying size indicate that adding in order of increasing magnitude is preferable to adding in the reverse order.

No effect when adding/subtracting small numbers
Example 3: 5.18 × 10^2 + 4.37 × 10^-1 = 5.18 × 10^2 + 0.00437 × 10^2 = 5.18437 × 10^2, i.e., 5.18 × 10^2

Another problem
Example 4: The result of a × (1/a) is not 1:
  – a = 3.00 × 10^0
  – 1/a is 3.33 × 10^-1
  – a × (1/a) is 9.99 × 10^-1,
whence the multiplicative inverse may not …
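The three-digit decimal examples above can be reproduced with a rough simulation. The sketch below is an assumption-laden illustration rather than anything from the slides: the helpers round3, add3, mul3 and div3 are hypothetical names, and the rounding is imitated by formatting a double with two digits after the decimal point in scientific notation and parsing it back, which is adequate for these examples but is not a faithful decimal floating-point implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: round x to three significant decimal digits,
   imitating the three-digit mantissa of the slide examples. */
double round3(double x)
{
    char buf[32];
    int n = snprintf(buf, sizeof buf, "%.2e", x);   /* e.g. "6.73e+01" */

    assert(n > 0 && (size_t)n < sizeof buf);
    return strtod(buf, NULL);
}

/* Each operation is followed by a rounding step, as in the slides. */
double add3(double x, double y) { return round3(x + y); }
double mul3(double x, double y) { return round3(x * y); }
double div3(double x, double y) { return round3(x / y); }
```

With a = 63.1, b = 4.24 and c = 0.247, add3(add3(a, b), c) yields 67.5 while add3(a, add3(b, c)) yields 67.6, matching Example 2. Likewise, add3(518.0, 0.437) returns 518.0 as in Example 3, and mul3(3.0, div3(1.0, 3.0)) returns 0.999 as in Example 4.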
