Data RepresentationsData RepresentationsComputers store data as different variable types, e.g. integer, floating point, complex, etc.Different machines have different wordlengths, e.g. 4-byte ints on a 32-bit machine (Pentium), 8-byte ints on a 64-bit machine (Alpha).This makes (binary) data non-portable.IntegersIntegersAll data types represented by 0's and 1's.An integer value:N = # of bits in wordsi= value of bit i in binary string se.g. 0 0 0 0 0 1 1 0 = 22 + 21 = 6 for 8-bit word.Use "two's complement" method for sign.=∑=× −Integers, Cont'dIntegers, Cont'dLargest value that can be represented is 2N - 1.For 32-bit word this is 4,294,967,295.Arithmetic with integers is exact, except:When division results in remainder.Result exceeds largest representable integere.g. 2 × 109 + 3 × 109 = overflow errorNote multiplication by 2's can be achieved by left-shift, which is very fast (in C: "<<" operator).Two's ComplementTwo's ComplementExploits finite size of data representations (cyclic groups) and properties of binary arithmetic.To get negative of binary number, invert all bits and add 1 to the result.e.g. 1 = 0 0 0 0 0 0 0 1 in 8-bitinvert bits: 1 1 1 1 1 1 1 0add 1: 0 0 0 0 0 0 0 1result: 1 1 1 1 1 1 1 1 = -1In 8 bits, signed char ranges from -128 to +127.Negative Powers of 2Negative Powers of 2Binary notation can be extended to cover negative powers of 2, e.g. "110.101" is:1 × 22 + 1 × 21 + 1 × 2-1 + 1 × 2-3 = 6.625Can represent real numbers by specifying some location in the word as the "binary point" → fixed-point representation.In practice, use some bits for an exponent → floating-point representation.FloatsFloatsFor most machines these days, real numbers are represented by floating-point format:s = sign B = base (usually 2, sometimes 16)M = mantissa e = exponentE = bias, usually 127.In past, manufacturers used different number of bits for each of M and e → non-portable code.=× ×−Floats, Cont'dFloats, Cont'dCurrently, most manufacturers adopt IEEE standard:s = 1st bitNext 8 bits are eLast 23 bits are M, expressed as a binary fraction, either 1.F, or, if e=0, 0.F, where F is in base 2.Largest single-precision float fmax =2127≈ 1038.Smallest (and least precise!) fmin = 2-149 ≈ 10-45.Round-off ErrorRound-off ErrorNot all values along real axis can be represented.There are 1038 integers between fmin & fmax, but only 232 ≈ 109 bit patterns.Values < |10-45| result in "underflow" error.If value cannot be represented, next nearest value is produced. Difference between desired and actual value is called "round-off error" (RE).Round-off Error, Cont'dRound-off Error, Cont'dSmallest value em for which 1 + em > 1 is called "machine accuracy", typically ∼10-7 for 32 bits.Double precision greatly reduces em (~ 10-16).RE accumulates in a calculation:Random walk: total error N1/2 em after N operations.But algorithms rarely random → linear error N em.Round-off Error, Cont'dRound-off Error, Cont'dSubtraction of two very nearly equal numbers can give rise to large RE.e.g. Solution of quadratic equation......can go badly wrong whenever ac << b2 (Cf. PS#2).RE cannot be avoidedit is a consequence of using a finite number of bits to represent real values.=−±− Truncation ErrorTruncation ErrorIn practice, most numerical algorithms approxi-mate desired solution with a finite number of artithmetic operations.e.g. evaluating integral by quadrature summing series using finite number of termsDifference between true solution and numerical approximation to solution is called "truncation error" (TE).Truncation Error, Cont'dTruncation Error, Cont'dTE exists even on "perfect" machine with no RE.TE is under programmer's control; much effort goes into reducing it.Usually RE and TE do not interact.Sometimes TE can amplify RE until it swamps calculation. Solution is then called unstable.e.g. Integer powers of Golden Mean (Cf.
View Full Document