Unformatted text preview:

EE 486 Winter 02 03 Bound on add EE 486 lecture 5 bounds on Arithmetic Winograd s bounds M J Flynn Computer Architecture Arithmetic Group 1 Stanford University Based on r d gate model The r d gate can compute any r input function in a d valued truth system in a single gate delay Model bounds number of fan in limited gate delays ignores fan out wires or any other constraint Computer Architecture Arithmetic Group 2 Stanford University Spira s lemma The r d gate r input lines d valued input line f r d gate If a d valued output f is a function of all n d valued input arguments then t the number of r d gate delays is determined t is greater than or equal to ceiling logr n in units of gate delays unit gate delay Computer Architecture Arithmetic Group 3 Stanford University Spira s lemma n input lines 5 Computer Architecture Arithmetic Group 4 Stanford University Winograd s bound on add f logic implemented in r d gates t gate delays Computer Architecture Arithmetic Group Stanford University The add bound is the application of Spira s lemma to the optimal rns representation So the time for addition is at least t which is greater than or equal to logr 2 logd N where imply ceiling functions For binary numbers the reduces to logr 2n for the addition of 2 n bit numbers Computer Architecture Arithmetic Group 6 Stanford University M J Flynn 1 EE 486 Winter 02 03 Bound can be for exactly N or M in the rns or for a representation that has equal or greater capacity The latter usually gives a better bound We call this M To find this we use the optimal rns algorithm and continue until the prime or prime power product is equal to or exceeds M For the bound the N is the largest modulus while logd N is simply the number of input lines needed to represent N For a prime base eg 2 212 212 and for a composite base select the largest base factor eg 10 or bi quinary 1012 512 Winograd s add bound Winograd s add bound Computer Architecture Arithmetic Group 7 Stanford University Winograd s add bound Despite its limitations the bound can be closely approached or even bettered using some sort of DOT function by avoiding the requirements of the r d gate Indeed the bound doesn t apply at all for a redundant number representation rnr we ll see more of this later Computer Architecture Arithmetic Group 9 Stanford University The log number sys a practical realization of the multiply bound X can be represented by a sign log X In lns Lx Sx interger Lx fract Lx this requires n 1 k f bits Sx is 0 if X is and 1 if X is negative X 1 Sx x 2 Lx of course the base need not be 2 To represent numbers smaller than 1 use a bias as in fpns Computer Architecture Arithmetic Group 11 Stanford University Computer Architecture Arithmetic Group 8 Stanford University Bound on Multiply Represent numbers as composite of prime factors or powers of prime factors This is the best representation for numbers that are to be multiplied or divided Just add subtract the corresponding prime factor exponent Computer Architecture Arithmetic Group 10 Stanford University More on the log number system To represent numbers less than 1 0 use a bias so Lx Sx iLx fLx bias where the bias would be 2k 1 or 2k 1 1 same as fpns Now multiply and divide become easy LXY LX LY and LX Y LX LY of course add subtract now are much more complicated X Y X 1 Y X which requires a table lookup for the 1 Y X Computer Architecture Arithmetic Group 12 Stanford University M J Flynn 2 EE 486 More on the log number system Net the lns is interesting only in special applications signal processing etc and usually is restricted to numbers with limited precision Computer Architecture Arithmetic Group 13 Stanford University Bound on Multiply Note that N N for all N In particular Binary numbers 2n is 2n 2 Prime base pn max pn 2 pn 1 Composite base base is p1 x p2 max pin Computer Architecture Arithmetic Group 15 Stanford University Winter 02 03 Back to Winograd s bound on Multiply Similar reasoning to the add bound but uses a log or exponential representation for arguments Result is surprising since it shows that the multiply delay bound is always as good as or better than the add delay logr 2 logd N where imply ceiling functions Computer Architecture Arithmetic Group 14 Stanford University Bounds Bounds on add multiply use different representations non compatible Bound on add can be used as bound on multiply on the optimal rns All in all it s hard to beat binary Computer Architecture Arithmetic Group 16 Stanford University We can develop a fan in limited gate model for tables This is NOT a bound but serves as an indicator for comparisons Assume a 2 D storage array with a unit delay for storage itself This array is addressed by n address bits n 2 address the X decoder and n 2 address the Y decoder The bounds don t apply In rnr the carry is usually limited to one digit So its always fixed something like logr 2 but the actual radix and the redundant radix may differ so it s a bit more complicated What about table look up What about redundant numbers Computer Architecture Arithmetic Group 17 Stanford University Computer Architecture Arithmetic Group 18 Stanford University M J Flynn 3 EE 486 Winter 02 03 We can overlap the X and Y decode then the Y line selects a single gate which is then ORed as before Now the delay is 2 logr n 2 1 1 logr2n 2 The first term is the X and Y decode next the store gate then the Y line select then the OR logic The X lines select a row all 2n 2 elements in the row are accessed The Y decoder selects the correct output line All 2n 2 Y lines are ORed to an output So X decode logr n 2 Y decode logr n 2 1 ORing the 2n 2 Y output lines logr 2n 2 We could sum these terms but a better model is available Table look up Table look up Computer Architecture Arithmetic Group 19 Stanford University Computer Architecture Arithmetic Group 20 Stanford University Again it s hard to beat binary but there are special application where either the residue or the log number systems work well As we ll see tables and the redundant number representation will play a role in getting the best in arithmetic design For our table model it s pretty clear that specific logic implementations will give better smaller gate delays There are exceptions when the operand sizes are small We ll use tables in divide square root and the higher level functions these will all tend to be small starter tables Note that a better table model …


View Full Document

Stanford EE 486 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?