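1. EE241 - Spring 2005
Advanced Digital Integrated Circuits
Lecture 23: Multipliers

2. Two-Level Carry-Select Adder
[Figure: two-level carry-select adder]

3. Conditional Sum Adders
[Figure: conditional sum adder]

4. TG Conditional Sum
[Figure: transmission-gate (TG) conditional cell; conditional sum adder built from 2-way MUXes]
Rothermel, JSSC '89

5. Carry-Lookahead Adders
• Adder trees
• Radix of a tree
• Minimum-depth trees
• Sparse trees
• Logic manipulations
• Conventional vs. Ling
• Stack height limiting

6. Propagate and Generate Signals
Define three new variables that depend ONLY on a_i and b_i:
• Generate: $g_i = a_i b_i$
• Propagate: $p_i = a_i + b_i$ (could be XOR as well)
• Delete: $d_i = \bar{a}_i \bar{b}_i$
Expressions for s and c_out can also be derived in terms of g_i and p_i:
$c_{out}(g_i, p_i) = g_i + p_i c_{in}$
$s_i(g_i, p_i) = (p_i \bar{g}_i) \oplus c_{in}$

7. Carry-Lookahead Adder
[Figure: N-bit carry-lookahead adder; inputs A0,B0 … A_{N-1},B_{N-1}; carries C_{i,0} … C_{i,N-1}; propagates P0 … P_{N-1}]
Weinberger, Smith, 1958.

8. Lookahead Adder
$c_i = g_i + p_i c_{i-1}$
Lookahead equations:
Position i: $c_i = g_i + p_i c_{i-1}$
Position i + 1: $c_{i+1} = g_{i+1} + p_{i+1} c_i = g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_{i-1}$
A carry exists at position i + 1 if it is:
- generated in stage i + 1,
- generated in stage i and propagated through stage i + 1, or
- an incoming carry c_{i-1} propagated through both stages i and i + 1.

9. Lookahead Adder
• The carry recurrence can be unrolled further.
• Unrolling to level k yields a two-level AND-OR structure.
• AND fan-in = k + 1, OR fan-in = k + 1.
• This puts k + 1 transistors in the MOS stack.
• That limits k to 2–4.
• Later referred to as the radix of an adder.

10. Lookahead Adder
[Figure: mirror implementation of a 4-bit lookahead carry; VDD, P3 … P0, G3 … G0, carry-in C_{i,0}, carry-out C_{o,3}]

11. Block Lookahead
Fourth-bit carry:
$c_{i+3} = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i + p_{i+3} p_{i+2} p_{i+1} p_i c_{i-1}$
Block generate and block propagate:
$G_{i+3:i} = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$
$P_{i+3:i} = p_{i+3} p_{i+2} p_{i+1} p_i$
$c_{i+3} = G_{i+3:i} + P_{i+3:i} c_{i-1}$

12. Block Lookahead
Groups of groups, or "super-groups", can be formed the same way:
$G^*_{j+3:j} = G_{j+3} + P_{j+3} G_{j+2} + P_{j+3} P_{j+2} G_{j+1} + P_{j+3} P_{j+2} P_{j+1} G_j$
$P^*_{j+3:j} = P_{j+3} P_{j+2} P_{j+1} P_j$
Delay: $t_d = c_1 \lceil \log N \rceil$

13. Block Lookahead
[Figure: block lookahead adder] (from Oklobdzija)

14. Carry-Lookahead Trees
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0} = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$
The tree can be built up hierarchically in the same way.

15. Tree Adders
$P = p_m p_l$, $G = g_m + p_m g_l$
(m: more significant, l: less significant)
• Start from the input P and G signals and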
  continue up the tree (2-bit groups, then 4-bit groups, …)
$(g, p) = (g_m, p_m) \bullet (g_l, p_l) = (g_m + p_m g_l,\; p_m p_l)$
Kogge, Stone, Trans. on Comp., '73
Radix 2

16. Tree Adders: Radix 2
[Figure: 16-bit radix-2 Kogge-Stone tree; inputs (A0, B0) … (A15, B15), outputs S0 … S15]

17. Tree Adders: Radix 4
[Figure: 16-bit radix-4 Kogge-Stone tree; inputs (a0, b0) … (a15, b15), outputs S0 … S15]

18. Sparse Trees
[Figure: 16-bit radix-2 sparse tree with sparseness of 2 (Han-Carlson); inputs (a0, b0) … (a15, b15), outputs S0 … S15]

19. Full vs. Sparse Trees
• Sparse trees have fewer transistors and wires
  - Less power
  - Less input loading
• Recovering the missing carries
  - Ripple (extra gate delay)
  - Precompute (extra fanout)
• Complex precompute can get into the critical path
[Figure: total transistor width [unit width/bit] vs. adder delay [FO4] for radix-4 Kogge-Stone, radix-4 2-sparse, and radix-4 4-sparse adders; annotated −23.3%]

20. Tree Adders: Other Trees
[Figure: 16-bit Ladner-Fischer tree; inputs (A0, B0) … (A15, B15), outputs S0 … S15]

21. Ling Adder
Variation of CLA (Ling, IBM J. Res. Dev., 5/81)
Conventional:
$p_i = a_i \oplus b_i$, $g_i = a_i b_i$
$G_i = g_i + p_i G_{i-1}$, $S_i = p_i \oplus G_{i-1}$
Ling's equations:
$t_i = a_i + b_i$, $g_i = a_i b_i$
$H_i = g_i + t_{i-1} H_{i-1}$, $S_i = (t_i \oplus H_i) + g_i t_{i-1} H_{i-1}$

22. Ling Adder
Conventional CLA: $G_i = g_i + p_i G_{i-1}$
Also: $G_i = g_i + t_i G_{i-1}$
Ling's equation: $H_i = g_i + t_{i-1} G_{i-1}$
• Shifts the index of the pseudo-carry
• Propagates information on two bits
Doran, Trans. on Comp., 9/88

23. Ling Adder
Conventional radix-4: $G_3 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0$
Ling radix-4: $H_3 = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$
• Reduces the stack height (or width)
• Reduces input loading

24. Ling vs. CLA
[Figure: energy [pJ] vs. delay [FO4] for radix-2 and radix-4 Ling and CLA adders (R2 Ling, R2 CLA, R4 Ling, R4 CLA)]
R. Zlatanovici, ESSCIRC '03

25. Static vs. Dynamic
[Figure: energy [pJ] vs. delay [FO4] for compound domino R2, domino R2, domino R4, and static R2 adders]

26. Stack Height Limiting
Transform the conventional G, P.
Park, VLSI Circ. '00

27. HP Adder
Naffziger, ISSCC '96
[Figure: HP adder; four-bit group propagate $p_0 p_1 p_2 p_3$]

28. HP Adder – Differential Domino
• Carry ripple
• Sum select

29. Hybrid Adders
Dobberpuhl, JSSC 11/92 (DEC Alpha 21064)

30. DEC Adder
A combination of:
• 8-bit tapered, pre-discharged Manchester carry chains, with C_in = 0 and C_in = 1
• 32-bit LSB carry-lookahead
• 32-bit MSB conditional sum adder
• Carry-select on the most significant bits
• Latch-based timing

31. Binary Multiplication
$Z = X \times Y = \sum_{k=0}^{M+N-1} Z_k 2^k = \left(\sum_{i=0}^{M-1} X_i 2^i\right)\left(\sum_{j=0}^{N-1} Y_j 2^j\right) = \sum_{i=0}^{M-1}\left(\sum_{j=0}^{N-1} X_i Y_j 2^{i+j}\right)$
with $X = \sum_{i=0}^{M-1} X_i 2^i$ and $Y = \sum_{j=0}^{N-1} Y_j 2^j$

32. Binary Multiplication
          1 0 1 0 1 0     multiplicand
  ×           1 0 1 1     multiplier
    -----------------
          1 0 1 0 1 0     partial products
        1 0 1 0 1 0       (AND operation)
      0 0 0 0 0 0
  + 1 0 1 0 1 0
    -----------------
    1 1 1 0 0 1 1 1 0     result
An M-bit by N-bit product has N + M bits in the final sum.

33. Shift-and-Add Multiplier
• Use a standard adder and shift in the multiplicand
• Shift the result as well, and add
• Takes N cycles
• Parallel multipliers use more hardware (adders) instead

34. Array Multiplier
[Figure: array multiplier built from rows of HA and FA cells; inputs X0 … X3 and Y1 … Y3, outputs Z0 … Z7]

35. M×N Array Multiplier: Critical Path
[Figure: M×N array multiplier with Critical Path 1, Critical Path 2, and their shared segment (Critical Path 1 & 2)]

36. Carry-Save Multiplier
[Figure: carry-save multiplier array of HA and FA cells feeding a vector-merging adder]

37. Multiplier Floorplan
[Figure: floorplan of SC cells with inputs X0 … X3 and Y0 … Y3 and outputs Z0 … Z7; cell types: HA multiplier cell, FA multiplier cell, vector-merging cell]
X and Y signals are broadcast through the complete array.

38. Multipliers
• Partial product generation
• Partial product accumulation
• Final summation

39. Generating Partial Products
• All partial products: AND
• Booth's recoding: reduction of the partial-product count
[Figure: partial products PP0 … PP7 generated from X0 … X7]

40. Booth Recoding
• Instead of generating all the partial products ($0 \cdot x = 0$, $1 \cdot x = x$, for each multiplier bit in $\{0, 1\}$),
• Reduce the number of partial products by
  grouping multiplier bits in pairs:
  00 → 0
  01 → 1×
  10 → 2× (shift)
  11 → 3× (or 4× − 1×)
Booth '51

41. Booth Recoding
• Instead of using the set {0, 1·Y, 2·Y, 3·Y}, use {0, 1·Y, 2·Y, 4·Y, −Y}
• Shifting and complementing: 3·Y = 4·Y − Y
• Can be simplified by looking at three bits at a time: modified Booth recoding

42. Modified Booth
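Modified (radix-4) Booth recoding examines overlapping three-bit groups of the multiplier, (y_{2k+1}, y_{2k}, y_{2k-1}). It maps each group to a digit $d_k = -2y_{2k+1} + y_{2k} + y_{2k-1} \in \{-2, -1, 0, 1, 2\}$. Every partial product is then 0, ±X, or ±2X, which needs only a shift and/or a complement, and an N-bit multiplier needs only N/2 partial products. The Python sketch below makes two assumptions: the multiplier is two's complement, and its width is even. The names `booth_radix4_digits` and `booth_multiply` are illustrative, not taken from the lecture.

```python
def booth_radix4_digits(y: int, n_bits: int) -> list[int]:
    """Recode an n_bits-wide two's-complement multiplier y into radix-4
    (modified) Booth digits, least significant first.

    Digit k looks at bits (y[2k+1], y[2k], y[2k-1]) with y[-1] = 0 and equals
    -2*y[2k+1] + y[2k] + y[2k-1], so every digit is in {-2, -1, 0, 1, 2}.
    """
    if n_bits <= 0 or n_bits % 2:
        raise ValueError("n_bits must be a positive even number")
    u = y & ((1 << n_bits) - 1)          # two's-complement bit pattern of y
    digits = []
    prev = 0                             # the implicit bit y[-1]
    for k in range(0, n_bits, 2):
        lo = (u >> k) & 1                # y[k]
        hi = (u >> (k + 1)) & 1          # y[k+1]
        digits.append(-2 * hi + lo + prev)
        prev = hi                        # this bit overlaps into the next group
    return digits


def booth_multiply(x: int, y: int, n_bits: int = 8) -> int:
    """Multiply x by the n_bits-wide two's-complement multiplier y using one
    partial product per Booth digit: 0, +/-x, or +/-2x."""
    total = 0
    for k, d in enumerate(booth_radix4_digits(y, n_bits)):
        magnitude = abs(d)
        if magnitude == 0:
            pp = 0
        elif magnitude == 1:
            pp = x
        else:
            pp = x << 1                  # 2x is just a shift
        if d < 0:
            pp = -pp                     # negative digits complement the row
        total += pp << (2 * k)           # digit k has weight 4**k
    return total


print(booth_radix4_digits(7, 4))         # [-1, 2], i.e. 7 = -1 + 2*4
print(booth_multiply(-37, 93))           # -3441
```

With an 8-bit multiplier, the loop above accumulates 4 partial products rather than the 8 rows of a plain AND array.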
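The multiplier slides split the work into three steps. First, partial products are generated with AND gates. Next, they are accumulated in a carry-save array, where each row of full adders passes its carries to the next row instead of rippling them. Finally, a vector-merging adder produces the result. The following is a word-level Python model of that flow, assuming unsigned operands. It is not a gate-accurate netlist, and `full_adder`, `ripple_add`, and `carry_save_multiply` are hypothetical helper names.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Full adder applied bitwise to whole words: returns (sum bits, carry bits)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def ripple_add(a: int, b: int, width: int) -> int:
    """Vector-merging adder modeled as a ripple-carry chain of full adders."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)


def carry_save_multiply(x: int, y: int, m: int, n: int) -> int:
    """Unsigned m-bit x times n-bit y with carry-save accumulation."""
    if not (0 <= x < (1 << m) and 0 <= y < (1 << n)):
        raise ValueError("operands must be unsigned and fit in m and n bits")
    # Partial product generation: row j is (x AND y_j), weighted by 2**j.
    rows = [(x if (y >> j) & 1 else 0) << j for j in range(n)]
    sum_vec, carry_vec = 0, 0
    for row in rows:
        # One row of full adders: add the new partial product to the saved
        # sum and carry vectors without propagating carries along the row.
        s, c = full_adder(sum_vec, carry_vec, row)
        sum_vec, carry_vec = s, c << 1   # each carry moves one column left
    # Final summation: the only carry-propagating addition.
    return ripple_add(sum_vec, carry_vec, m + n)


# The worked example from the Binary Multiplication slide: 101010 x 1011.
print(bin(carry_save_multiply(0b101010, 0b1011, 6, 4)))   # 0b111001110
```

Only the last step propagates carries across the word. That is why the vector-merging adder, rather than the array rows, sets much of the carry-save multiplier's delay.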
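Returning to the adder slides, the carry recurrence $c_i = g_i + p_i c_{i-1}$ and the 4-bit block generate/propagate signals can be checked with a small bit-level model. This sketch uses the OR form of propagate, $p_i = a_i + b_i$, from the Propagate and Generate slide, so it forms the half-sum as $p_i \bar{g}_i$. It also assumes the word length is a multiple of 4 for the block carries. All function names are illustrative.

```python
def generate_propagate(a: int, b: int, n: int) -> tuple[list[int], list[int]]:
    """Bit-level g_i = a_i b_i and p_i = a_i + b_i (OR form of propagate)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]
    return g, p


def carry_recurrence(g: list[int], p: list[int], c_in: int = 0) -> list[int]:
    """c[i] is the carry out of bit i: c_i = g_i + p_i c_{i-1}, with c_{-1} = c_in."""
    carries, c_prev = [], c_in
    for gi, pi in zip(g, p):
        c_prev = gi | (pi & c_prev)
        carries.append(c_prev)
    return carries


def block_gp(g: list[int], p: list[int], i: int) -> tuple[int, int]:
    """4-bit block generate G_{i+3:i} and block propagate P_{i+3:i}."""
    G = (g[i + 3] | (p[i + 3] & g[i + 2]) | (p[i + 3] & p[i + 2] & g[i + 1])
         | (p[i + 3] & p[i + 2] & p[i + 1] & g[i]))
    P = p[i + 3] & p[i + 2] & p[i + 1] & p[i]
    return G, P


def block_carries(g: list[int], p: list[int], c_in: int = 0) -> list[int]:
    """Carry out of each 4-bit block: c_{i+3} = G_{i+3:i} + P_{i+3:i} c_{i-1}."""
    out, c_prev = [], c_in
    for i in range(0, len(g), 4):
        G, P = block_gp(g, p, i)
        c_prev = G | (P & c_prev)
        out.append(c_prev)
    return out


def cla_add(a: int, b: int, n: int = 16, c_in: int = 0) -> int:
    """n-bit addition built from g, p and the carry recurrence."""
    g, p = generate_propagate(a, b, n)
    c = carry_recurrence(g, p, c_in)
    s = 0
    for i in range(n):
        c_prev = c_in if i == 0 else c[i - 1]
        half_sum = p[i] & (1 - g[i])      # p_i AND NOT g_i equals a_i XOR b_i
        s |= (half_sum ^ c_prev) << i
    return s | (c[n - 1] << n)


g, p = generate_propagate(0xBEEF, 0x1234, 16)
print(block_carries(g, p) == carry_recurrence(g, p)[3::4])   # True
print(hex(cla_add(0xBEEF, 0x1234)))                           # 0xd123
```

The block carries hop four bits at a time. Each one matches every fourth carry of the bit-level recurrence, which is what lets super-groups be stacked into a tree.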
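The tree-adder slides apply the associative dot operator $(g_m, p_m) \bullet (g_l, p_l) = (g_m + p_m g_l, p_m p_l)$ at doubling distances. Below is a minimal Python sketch of a radix-2 Kogge-Stone prefix computation. It assumes no carry-in and uses the XOR form of propagate, so each sum bit is $p_i \oplus G_{i-1:0}$. The function names are illustrative.

```python
def dot(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """(g_m, p_m) . (g_l, p_l) = (g_m + p_m g_l, p_m p_l); m is more significant."""
    g_m, p_m = hi
    g_l, p_l = lo
    return g_m | (p_m & g_l), p_m & p_l


def kogge_stone_add(a: int, b: int, n: int = 16) -> tuple[int, int]:
    """Return (a + b, number of prefix levels) for n-bit a, b with no carry-in."""
    bits = [(((a >> i) & 1) & ((b >> i) & 1),    # g_i = a_i b_i
             ((a >> i) & 1) ^ ((b >> i) & 1))    # p_i = a_i XOR b_i
            for i in range(n)]
    prefix, dist, levels = list(bits), 1, 0
    while dist < n:
        # Each position combines with the node `dist` places less significant;
        # positions below `dist` already span down to bit 0 and pass through.
        prefix = [dot(prefix[i], prefix[i - dist]) if i >= dist else prefix[i]
                  for i in range(n)]
        dist *= 2
        levels += 1
    # prefix[i][0] is now G_{i:0}, the carry out of bit i.
    s = 0
    for i in range(n):
        carry_in = prefix[i - 1][0] if i > 0 else 0
        s |= (bits[i][1] ^ carry_in) << i
    return s | (prefix[n - 1][0] << n), levels


total, levels = kogge_stone_add(0xBEEF, 0x1234)
print(hex(total), levels)                        # 0xd123 4
```

For a 16-bit word, the loop runs $\lceil \log_2 16 \rceil = 4$ levels, matching the depth of the 16-bit radix-2 Kogge-Stone tree. Every position gets a node at every level, which is where the tree's wiring and transistor cost comes from. The sparse trees reduce that cost.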
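Ling's pseudo-carry can be checked in the same style. The sketch computes $H_i = g_i + t_{i-1} H_{i-1}$ and forms each sum bit with $S_i = (t_i \oplus H_i) + g_i t_{i-1} H_{i-1}$. It then recovers the true carry as $G_i = t_i H_i$. It assumes no carry-in ($H_{-1} = 0$), and `ling_signals`, `ling_add`, and `radix4_check` are illustrative names.

```python
def ling_signals(a: int, b: int, n: int):
    """Bit-level t_i = a_i + b_i, g_i = a_i b_i and Ling pseudo-carries
    H_i = g_i + t_{i-1} H_{i-1}, assuming no carry-in (H_{-1} = 0)."""
    t = [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    H = []
    for i in range(n):
        shifted = t[i - 1] & H[i - 1] if i > 0 else 0   # t_{i-1} H_{i-1}
        H.append(g[i] | shifted)
    return t, g, H


def ling_add(a: int, b: int, n: int = 16) -> int:
    """n-bit addition using Ling's sum equation."""
    t, g, H = ling_signals(a, b, n)
    s = 0
    for i in range(n):
        prev = t[i - 1] & H[i - 1] if i > 0 else 0      # equals the true carry G_{i-1}
        # S_i = (t_i XOR H_i) + g_i t_{i-1} H_{i-1}
        s |= ((t[i] ^ H[i]) | (g[i] & prev)) << i
    carry_out = t[n - 1] & H[n - 1]                     # G_{n-1} = t_{n-1} H_{n-1}
    return s | (carry_out << n)


def radix4_check(a: int, b: int) -> bool:
    """Compare the radix-4 forms of H_3 and G_3 with the bit-level recurrences."""
    t, g, H = ling_signals(a, b, 4)
    h3 = g[3] | g[2] | (t[2] & g[1]) | (t[2] & t[1] & g[0])   # longest term: 3 inputs
    g3 = (g[3] | (t[3] & g[2]) | (t[3] & t[2] & g[1])
          | (t[3] & t[2] & t[1] & g[0]))                        # longest term: 4 inputs
    return h3 == H[3] and g3 == (t[3] & H[3])


print(hex(ling_add(0xBEEF, 0x1234)))                                    # 0xd123
print(all(radix4_check(a, b) for a in range(16) for b in range(16)))    # True
```

The radix-4 check shows the tradeoff from the Ling slides. $H_3$ needs at most a three-input AND term, against four for $G_3$. The missing factor $t_3$ is applied later, when the true carry or the sum is formed.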