UCD EEC 180A - High-Speed VLSI Arithmetic Units: Adders and Multipliers

Unformatted text preview:

High-Speed VLSI Arithmetic Units: Adders and Multipliers* by Prof. Vojin G. Oklobdzija *Book Chapter taken from: "Design of High Performance Microprocessors Circuits", A. Chandrakasan, W. Bowhill, F Fox, Editors, IEEE Press, 2000.Oklobdzija: HIGH-SPEED VLSI ARITHMETIC UNITS: ADDERS AND MULTIPLIERS Introduction Digital computer arithmetic is an aspect of logic design with the objective of developing appropriate algorithms in order to achieve an efficient utilization of the available hardware [1-4]. Given that the hardware can only perform a relatively simple and primitive set of Boolean operations, arithmetic operations are based on a hierarchy of operations that are built upon the simple ones. Since ultimately, speed, power and chip area are the most often used measures of the efficiency of an algorithm, there is a strong link between the algorithms and technology used for its implementation. High-speed Addition: Algorithms and VLSI Implementation: First we will examine a realization of a one-bit adder which represents a basic building block for all the more elaborate addition schemes. Full Adder: Operation of a Full Adder is defined by the Boolean equations for the sum and carry signals: iiiiiiiiiiiiiiiicbacbacbacbacbas ⊕⊕=+++= iiiiiiiiiiiiicbacbacbacbac +++=+1 Where: ai, bi, and ci are the inputs to the i-th full adder stage, and si and ci+1 are the sum and carry outputs from the i-th stage, respectively. From the above equation we realize that the realization of the Sum function requires two XOR logic gates. The Carry function is further rewritten defining the Carry-Propagate pi and Carry-Generate gi terms: iiibap⊕= , iiibag•= At a given stage i, a carry is generated if gi is true (i.e., both ai and bi are ONEs), and if pi is true, a stage propagates an input carry to its output (i.e., either ai or bi is a ONE). The logical implementation of the full adder is shown in Fig. 1.a. ci+1cisiaibi01sbiaici+1sici(a.)(b.)pigipi Fig. 1.a.b. Full-Adder implementation (a) regular (b) using multiplexer in the critical path 11:18 PM September 13, 1999 2Oklobdzija: HIGH-SPEED VLSI ARITHMETIC UNITS: ADDERS AND MULTIPLIERS For this implementation, the delay from either a or bi to si is two XOR delays and the delay from ci to ci+1 is 2 gate delays. Some technologies, such as CMOS, implement the functions more efficiently by using pass-transistor circuits. For example, the critical path of the carry-in to carry-out uses a fast pass-transistor multiplexer [8] in an alternative implementation of the Full Adder shown in Fig.1.b. The ability of pass-transistor logic to provide an efficient multiplexer implementation has been exploited in CPL and DPL logic families [10,11]. Even an XOR gate is more efficiently implemented using multiplexer topology. A Full-Adder cell which is entirely multiplexer based as published by Hitachi [11] is shown in Fig.2. Such a Full-Adder realization contains only two transistors in the Input-to-Sum path and only one transistor in the Cin-to-Cout path (not counting the buffer). The short critical path is a factor that contributes to a remarkable speed of this implementation. AiVCCSiSiXOR/XNOR MULTIPLEXERBUFFERCiCiMULTIPLEXERVCCCi+1Ci+1BUFFERVCCVCCOR/NORAND/NANDAiAiAiBiBiBiBiAiAiBiBi Fig.2. Pass-Transistor realization of a Full-Adder in DPL [11] 11:18 PM September 13, 1999 3Oklobdzija: HIGH-SPEED VLSI ARITHMETIC UNITS: ADDERS AND MULTIPLIERS Ripple Carry Adder: A ripple carry adder for N-bit numbers is implemented by concatenating N full adders as shown on Figure 3. At the i-th bit position, the i-th bits of operands A and B and a carry signal from the preceding adder stage are used to generate the i-th bit of the sum, s, and a carry, c , to the next adder stage. This is called a Ripple Carry Adder (RCA), since the carry signal “ripple” from the least significant bit position to the most significant [3,4]. If the ripple carry adder is implemented by concatenating N full adders, the delay of such an adder is 2N gate delays from C-to-C . i i+1in out The path from the input to the output signal that is likely to take the longest time is designated as a "critical path". In the case of a RCA, this is the path from the least significant input a0 or b0 to the last sum bit sn. Assuming a multiplexer based XOR gate implementation, this critical path will consist of N+1 pass transistor delays. However, such a long chain of transistors will significantly degrade the signal, thus some amplification points are necessary. In practice, we can use a multiplexer cell to build this critical path using standard cell library as shown in Fig.3 [8]. ai+1bi+1aibiai+2bi+2coutci+1cisisi+1si+2cin Fig. 3. Carry-Chain of an RCA implemented using multiplexer from the standard cell library [8] Carry Skip Adder: Since the Cin-to-Cout represents the longest path in the ripple-carry-adder an obvious attempt is to accelerate carry propagation through the adder. This is accomplished by using Carry-Propagate pi signals within a group of bits. If all the pi signals within the group are pi = 1, the condition exist for the carry to bypass the entire group: 121......−+++••••=kiiiippppP The Carry Skip Adder (CSKA) divides the words to be added into groups of equal size of k-bits. The basic structure of an N-bit Carry Skip Adder is shown on Fig. 4. Within the group, carry propagates in a ripple-carry fashion. In addition, an AND gate is used to form the group 11:18 PM September 13, 1999 4Oklobdzija: HIGH-SPEED VLSI ARITHMETIC UNITS: ADDERS AND MULTIPLIERS propagate signal P. If P = 1 the condition exists for carry to bypass (skip) over the group as shown in Fig.4. CinGr-1Gr-2...SN-kSN-1aN-1bN-1bN-k-1aN-k-1S(r-1)k-1S(r-2)kG1Go...SkS2k-1a2k-1b2k-1bkakSk-1S0......a(r-1)kb(r-1)ka(r-1)kb(r-1)k...ak-1bk-1a0b0............... .........Pr-1Pr-2P1P0Cout++++ANDOROROR ORANDANDANDcritical path, delay ∆=2(k-1)+(N/k-2) Fig.4. Basic Structure of a CSKA: N-bits, k-bits/group, r=N/k groups The maximal delay ∆ of a Carry Skip Adder is encountered when carry signal is generated in the least-significant bit position, rippling through k-1 bit positions, skipping over N/k-2 groups in the middle, rippling to the k-1 bits of most significant group and being assimilated in the Nth bit position to produce the sum SN: SKIPrcarcaSKIPrcaCSAkNkkkNk ∆−+∆−=∆−+∆−+∆−=∆ )2()1(2)1()2()1( Thus, CSKA is faster than RCA at the expense of a few


View Full Document

UCD EEC 180A - High-Speed VLSI Arithmetic Units: Adders and Multipliers

Download High-Speed VLSI Arithmetic Units: Adders and Multipliers
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view High-Speed VLSI Arithmetic Units: Adders and Multipliers and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view High-Speed VLSI Arithmetic Units: Adders and Multipliers 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?