UW AMATH 352 - Lecture 5: Floating Point Arithmetic

Lecture 5: Floating Point Arithmetic
AMath 352
Wed., Apr. 7

Binary Representation and Base 2 Arithmetic

Most computers today use binary or base 2 arithmetic. This is natural since on/off gates can represent a 1 (on) or a 0 (off), and these are the only two digits in base 2. In base 10, a natural number is represented by a sequence of digits from 0 to 9, with the right-most digit representing 1’s (or $10^0$’s), the next representing 10’s (or $10^1$’s), the next representing 100’s (or $10^2$’s), etc. In base 2, the digits are 0 and 1, and the right-most digit represents 1’s (or $2^0$’s), the next represents 2’s (or $2^1$’s), the next 4’s (or $2^2$’s), etc.

Binary Representation of Natural Numbers

Consider the decimal number 27.
To find its binary representation, first find the highest power of 2 that is less than or equal to 27; this is $2^4$, so a 1 goes in the fifth position from the right of the number: 1 _ _ _ _. Then subtract $2^4$ from 27 to find that the remainder is 11. Since $2^3$ is less than 11, a 1 goes in the next position to the right: 1 1 _ _ _. Subtracting $2^3$ from 11 leaves 3, which is less than $2^2$, so a 0 goes in the next position: 1 1 0 _ _. Since $2^1$ is less than 3, a 1 goes in the next position, and since $3 - 2^1 = 1$, another 1 goes in the right-most position to give $27 = 11011_2$.

Binary Arithmetic with Natural Numbers

Binary arithmetic is carried out in a similar way to decimal arithmetic, except that when adding binary numbers one must remember that 1 + 1 is $10_2$. To add the two numbers $10 = 1010_2$ and $27 = 11011_2$, we align their binary digits and do the addition as below:

        1 0 1 0
    + 1 1 0 1 1
    -----------
    1 0 0 1 0 1

You can check that $100101_2$ is equal to 37. Subtraction is similar, with borrowing from the next column being necessary when subtracting 1 from 0. Multiplication and division follow similar patterns.

Rational Numbers in Base 2

Just as we represent rational numbers using decimal expansions, we can also represent them using binary expansions. The digits to the right of the decimal point in base 10 represent $10^{-1}$’s (tenths), $10^{-2}$’s (hundredths), etc., while those to the right of the binary point in base 2 represent $2^{-1}$’s (halves), $2^{-2}$’s (fourths), etc. For example, the fraction 11/2 is 5.5 in base 10, while it is $101.1_2$ in base 2: one $2^2$, one $2^0$, and one $2^{-1}$.

Not all rational numbers can be represented with finite decimal expansions. The number 1/3, for example, is $0.\overline{3}$, with the bar over the 3 meaning that this digit is repeated infinitely many times.
The same is true for binary expansions, although the numbers that require an infinite binary expansion may be different from the ones that require an infinite decimal expansion.

1/10 in Base 2

For example, the number 1/10 = 0.1 in base 10 has the repeating binary expansion $0.000\overline{1100}_2$. To see this, one can do binary long division in a similar way to base 10 long division:

               .0 0 0 1 1 0 0
              -----------------
    1 0 1 0 / 1.0 0 0 0 0 0 0 0
                1 0 1 0
              ---------
                  1 1 0 0
                  1 0 1 0
                  -------
                      1 0 0 0 0

Fixed Point Representation

A computer word consists of a certain number of bits, which can be either on (to represent 1) or off (to represent 0). Some early computers used fixed point representation, where one bit is used to denote the sign of a number, a certain number of the remaining bits are used to store the part of the binary number to the left of the binary point, and the remaining bits are used to store the part to the right of the binary point. The difficulty with this system is that it can store numbers only in a very limited range.
If, say, 16 bits are used to store the part of the number to the left of the binary point, then the left-most bit represents $2^{15}$, and numbers greater than or equal to $2^{16}$ cannot be stored. Similarly, if, say, 15 bits are used to store the part of the number to the right of the binary point, then the right-most bit represents $2^{-15}$ and no positive number smaller than $2^{-15}$ can be stored.

Floating Point Representation

A more flexible system is floating point representation, which is based on scientific notation. Here a number is written in the form $\pm m \times 2^E$, where $1 \le m < 2$. Thus, the number $10 = 1010_2$ would be written as $1.010_2 \times 2^3$, while $\frac{1}{10} = 0.000\overline{1100}_2$ would be written as $1.100\overline{1100}_2 \times 2^{-4}$.

The computer word consists of three fields: one for the sign, one for the exponent $E$, and one for the significand $m$. A single precision word consists of 32 bits: 1
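
As a rough check of the normalized form $\pm m \times 2^E$ with $1 \le m < 2$ described in this section, the following Python sketch recovers the sign, significand, and exponent for the two examples, 10 and 1/10. The helper names `normalize` and `significand_digits` are made up for illustration, and the sketch relies on Python's `math.frexp` rather than on any particular word layout. Python stores 0.1 as the nearest double precision number, so only the leading binary digits of the repeating expansion are meaningful.

```python
import math


def normalize(x):
    """Return (sign, m, E) with x == sign * m * 2**E and 1 <= m < 2.

    Hypothetical helper for illustration; zero, infinities, and NaN
    have no normalized form and are rejected.
    """
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("only nonzero finite numbers have a normalized form")
    sign = -1 if x < 0 else 1
    # math.frexp gives abs(x) == frac * 2**exp with 0.5 <= frac < 1.
    frac, exp = math.frexp(abs(x))
    # Rescale so the significand lies in [1, 2) instead of [0.5, 1).
    return sign, frac * 2, exp - 1


def significand_digits(m, n):
    """Return m (1 <= m < 2) as a binary string with n digits after the point.

    Digits are truncated, not rounded.
    """
    digits = []
    frac = m - 1.0  # drop the leading 1
    for _ in range(n):
        frac *= 2          # shift the next binary digit left of the point
        bit = int(frac)    # that digit is 0 or 1
        digits.append(str(bit))
        frac -= bit
    return "1." + "".join(digits)


for value in (10.0, 0.1):
    sign, m, E = normalize(value)
    sign_char = "-" if sign < 0 else "+"
    print(f"{value} = {sign_char}{significand_digits(m, 7)}_2 x 2^{E}")
```

Run as a script, this prints the significand 1.0100000 with exponent 3 for 10 and the significand 1.1001100 with exponent -4 for 0.1, in agreement with the worked examples above.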

