EE141 – Fall 2005
Lecture 19
Adders (Cont.)
Multipliers

Administrative Stuff
- Homework 7 due today
- Midterm 2 material
  • Wires
  • Logic gates
  • Logical effort
  • Adders
- Review session on Tue Nov 8, North Gate Hall, Room 105, 6:30-8:30pm

Class Material
- Last lecture
  • Adders
- Today's lecture
  • Adders (Cont.)
  • Multipliers and other arithmetic
  • Intro to power

Adders (Cont.)

Carry Look-Ahead
$\mathit{Sum}_i = A_i \oplus B_i \oplus \mathit{Carry}_{i-1}$
$\mathit{Carry}_i = A_i \cdot B_i + (A_i + B_i) \cdot \mathit{Carry}_{i-1}$
Here $A_i \oplus B_i$ is the partial sum, $G_i = A_i \cdot B_i$ is the generate term, and $P_i = A_i + B_i$ is the propagate term, so
$\mathit{Carry}_i = G_i + P_i \cdot \mathit{Carry}_{i-1}$

Look-Ahead: Basic Idea
$C_{o,k} = f(A_k, B_k, C_{o,k-1}) = G_k + P_k C_{o,k-1}$
[Figure: N-bit look-ahead adder. Inputs $(A_0, B_0)$ ... $(A_{N-1}, B_{N-1})$ produce $P_0$ ... $P_{N-1}$; carries $C_{i,0}$ ... $C_{i,N-1}$ feed the sum outputs $S_0$ ... $S_{N-1}$.]
The idea is to eliminate the carry rippling effect.

Look-Ahead: Topology
Expanding the look-ahead equations:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} C_{o,k-2})$
All the way:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\dots + P_1 (G_0 + P_0 C_{i,0})))$
[Figure: transistor-level gate computing $C_{o,3}$ from $C_{i,0}$, $P_0$ ... $P_3$ and $G_0$ ... $G_3$.]
Implementation issues:
- long stack (N+1)
- or multiple stages
⇒ still linear delay!

Logarithmic Look-Ahead Adder
[Figure: function F of inputs $A_0$ ... $A_7$ built as a linear chain ($t_p \sim N$) and as a tree ($t_p \sim \log_2 N$).]
Idea: large stacks ⇒ limit carry look-ahead to 2-4 bits ⇒ organize carry P and G into recursive trees

Carry Look-Ahead Trees
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$
$\phantom{C_{o,2}} = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$
Can continue building the tree hierarchically...

High-Performance Adders: Kogge-Stone Tree Adder
$GG = G_i + P_i G_{i-1}$
$GP = P_i P_{i-1}$
[Figure: odd and even input bits feed a PG generation stage (stage 1), carry-merge stages CM1 to CM5 (stages 2 to 6), and an XOR stage (stage 7) producing Sum_even and Sum_odd.]
Courtesy: R. Krishnamurthy (Intel)
- Generate all 32 carries
  • Full-blown binary tree ⇒ energy-inefficient
- # carry-merge stages = $\log_2(32)$ ⇒ 5 stages

Kogge-Stone Adder
[Figure: 32-bit Kogge-Stone tree over bit positions 0 to 31, with PG gates, carry-merge gates, and XOR gates. Annotation: energy inefficient.]
Courtesy: R. Krishnamurthy (Intel)
- Critical path = PG + 5 + XOR = 7 gate stages
- Generate, Propagate FO of 2, 3
- Maximum interconnect spans 16b

Tree Adders
[Figure: 16-bit radix-2 Kogge-Stone tree. Inputs $(A_0, B_0)$ ... $(A_{15}, B_{15})$, outputs $S_0$ ... $S_{15}$.]

Example: Domino Adder
[Figure: domino gates clocked by Clk for the bit-level signals.]
Propagate: $P_i = a_i + b_i$
Generate: $G_i = a_i b_i$

Example: Domino Adder
[Figure: domino gates clocked by $\mathit{Clk}_k$ implementing the "dot" operator (carry-merge).]
Propagate: $P_{i:i-2k+1} = P_{i:i-k+1} \, P_{i-k:i-2k+1}$
Generate: $G_{i:i-2k+1} = G_{i:i-k+1} + P_{i:i-k+1} \, G_{i-k:i-2k+1}$

Example: Domino Sum
[Figure: domino sum gate with keeper. Signals: Clk, Clkd, $G_{i:0}$, $S_i^0$, $S_i^1$, Sum.]

Tree Adders
[Figure: 16-bit radix-4 Kogge-Stone tree. Inputs $(a_0, b_0)$ ... $(a_{15}, b_{15})$, outputs $S_0$ ... $S_{15}$.]

Sparse-Tree Adder Architecture
Courtesy: R. Krishnamurthy (Intel)
- Generate every 4th carry in parallel
- Side-path: 4-bit conditional sum generator
- 73% fewer carry-merge gates ⇒ energy-efficient

Adder Core Critical Path
[Figure: adder inputs pass through PG, GG1, GG3, GG7, GG15, GG27 (single-rail dynamic sparse-tree path) to produce C27; the static sum generator (CM0, CM1, latch, XOR) produces Sum31_0 and Sum31_1, from which Sum31 is selected. Clocks: clk, clk2, clk3.]
Courtesy: R. Krishnamurthy (Intel)
- Critical path: 7 gates → same as KS
- Sparse-tree: single-rail dynamic
- Exploit non-criticality of sum generator
- Convert to static logic → semi-dynamic design

Sparse-Tree Architecture
Courtesy: R. Krishnamurthy (Intel)
- Performance impact: 20% speedup
  • 33-50% reduced G/P fanouts
  • 80% reduced wiring complexity
  • 30% reduction in maximum interconnect
- Power impact: 56% reduction
  • 73% fewer carry-merge gates
  • 50% reduction in average transistor size

Energy-Delay Space
[Figure: Worst-case Energy (pJ) versus Delay (ps) for a dynamic Kogge-Stone adder and a semi-dynamic sparse-tree adder. Annotations: 20%, 56%, 4GHz design. Conditions: 130nm CMOS, 1.2V, 110°C.]
Courtesy: R. Krishnamurthy (Intel)
- 20% speedup over Kogge-Stone
- 56% worst-case energy reduction

Multipliers

The Binary Multiplication
$Z = X \times Y = \sum_{k=0}^{M+N-1} Z_k 2^k = \left( \sum_{i=0}^{M-1} X_i 2^i \right) \left( \sum_{j=0}^{N-1} Y_j 2^j \right) = \sum_{i=0}^{M-1} \left( \sum_{j=0}^{N-1} X_i Y_j 2^{i+j} \right)$
with
$X = \sum_{i=0}^{M-1} X_i 2^i, \quad Y = \sum_{j=0}^{N-1} Y_j 2^j$

The Binary Multiplication
          1 0 1 0 1 0     Multiplicand
    x         1 0 1 1     Multiplier
          1 0 1 0 1 0
        1 0 1 0 1 0       Partial products
      0 0 0 0 0 0
    + 1 0 1 0 1 0
    1 1 1 0 0 1 1 1 0     Result

The Array Multiplier
[Figure: 4-by-4 array multiplier. Inputs $X_3$ ... $X_0$ and $Y_0$ ... $Y_3$; three rows of adder cells (HA FA FA HA, then FA FA FA HA, then FA FA FA HA); outputs $Z_0$ ... $Z_7$.]

The M-by-N Array Multiplier: Critical Path
[Figure: array of HA and FA cells with Critical Path 1 and Critical Path 2 marked.]
$t_{mult} \approx [(M-1) + (N-2)] \cdot t_{carry} + (N-1) \cdot t_{sum} + (N-1) \cdot t_{and}$

Transmission-Gate Full Adder
[Figure: transmission-gate full adder with Setup, Sum Generation, and Carry Generation sections. Signals: A, B, P, $C_i$, $C_o$, S.]
Balanced $t_{sum}$ and $t_{carry}$

Carry-Save Multiplier
[Figure: carry-save array of HA and FA cells followed by a Vector Merging Adder.]
$t_{mult} = (N-1) \cdot t_{carry} + (N-1) \cdot t_{and} + t_{merge}$

Multiplier Floorplan
[Figure: floorplan with inputs $X_0$ ... $X_3$ and $Y_0$ ... $Y_3$, outputs $Z_0$ ... $Z_7$. Legend: HA Multiplier Cell, FA Multiplier Cell, Vector Merging Cell; cells labeled S and C.]
X and Y signals are broadcast through the complete array.

Wallace-Tree Multiplier
[Figure: dot diagram over bit positions 6 to 0 showing (a) partial products, (b) first stage, (c) second stage, (d) final adder, with FA and HA groupings.]

Wallace-Tree Multiplier
[Figure: 4-by-4 Wallace tree. The partial products $x_i y_j$ ($i, j = 0 \dots 3$) pass through a first stage and a second stage of FA and HA cells, then a final adder producing $z_7$ ... $z_0$.]

Wallace-Tree Multiplier
[Figure: two arrangements of full adders reducing inputs $y_0$ ... $y_5$, with carries $C_{i-1}$ in and $C_i$ out, to outputs C and S.]

Multipliers – Summary
- Optimization goals different than in binary adder
- Once again: Identify critical path
- Other possible techniques
  • Logarithmic versus linear (Wallace Tree Mult)
  • Data encoding (Booth)
  • Pipelining
- First glimpse at system level optimization

The Binary Shifter
[Figure: bit-slice i of a binary shifter. Data inputs $A_i$, $A_{i-1}$; outputs $B_i$, $B_{i-1}$; control signals Right, Left, nop.]

The Barrel Shifter
[Figure: 4-bit barrel shifter. Data inputs $A_3$ ... $A_0$, outputs $B_3$ ... $B_0$, control signals $Sh_0$ ... $Sh_3$. Legend: control wire, data wire.]
Area
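The radix-2 Kogge-Stone slides above describe a PG stage, log2(N) carry-merge stages, and a final XOR stage. The following is a minimal bit-level software sketch of that structure, not the circuit from the slides. The function name `kogge_stone_add` and its parameters are hypothetical, and the carry-in is folded into bit 0 as a modeling convenience.

```python
def kogge_stone_add(a: int, b: int, width: int = 16, carry_in: int = 0):
    """Bit-level model of a radix-2 Kogge-Stone adder (illustrative sketch).

    Returns (sum modulo 2**width, carry_out, number of carry-merge stages).
    """
    bits_a = [(a >> i) & 1 for i in range(width)]
    bits_b = [(b >> i) & 1 for i in range(width)]

    # PG stage: G_i = a_i * b_i, P_i = a_i + b_i (OR), as on the domino slide.
    g = [x & y for x, y in zip(bits_a, bits_b)]
    p = [x | y for x, y in zip(bits_a, bits_b)]
    # Partial sum a_i XOR b_i, kept aside for the final XOR stage.
    partial = [x ^ y for x, y in zip(bits_a, bits_b)]

    # Assumption of this sketch: fold the carry-in into bit 0 so that the
    # group generate G_{i:0} equals the carry out of bit i.
    g[0] |= p[0] & carry_in

    stages = 0
    span = 1
    while span < width:
        new_g, new_p = g[:], p[:]
        for i in range(span, width):
            # Carry-merge ("dot") operator on two adjacent groups.
            new_g[i] = g[i] | (p[i] & g[i - span])
            new_p[i] = p[i] & p[i - span]
        g, p = new_g, new_p
        span *= 2
        stages += 1

    # The carry into bit i is the carry out of bit i-1.
    carries = [carry_in] + g[:-1]
    total = 0
    for i in range(width):
        total |= (partial[i] ^ carries[i]) << i
    return total, g[-1], stages


print(kogge_stone_add(42, 11, width=16))                 # (53, 0, 4)
print(kogge_stone_add(0xFFFFFFFF, 0, width=32, carry_in=1))  # (0, 1, 5)
```

For a 32-bit width the loop runs five carry-merge stages, which agrees with the log2(32) = 5 figure quoted on the slide.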
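The double sum on the binary multiplication slide says that the product is the sum of shifted partial products, one per multiplier bit. A small sketch of that idea, using the slide's own example (multiplicand 101010, multiplier 1011), might look as follows. The helper names `partial_products` and `multiply` are hypothetical.

```python
def partial_products(x: int, y: int, n_bits: int) -> list[int]:
    """Return the shifted partial products X * Y_j * 2**j for j = 0 .. n_bits-1."""
    # Each multiplier bit Y_j either selects the multiplicand or zero (an AND
    # of X with Y_j), and the result is shifted left by j positions.
    return [(x if (y >> j) & 1 else 0) << j for j in range(n_bits)]


def multiply(x: int, y: int, n_bits: int) -> int:
    """Sum the partial products; y is assumed to fit in n_bits bits."""
    return sum(partial_products(x, y, n_bits))


rows = partial_products(0b101010, 0b1011, 4)
print([bin(r) for r in rows])
print(bin(multiply(0b101010, 0b1011, 4)))  # 0b111001110
```

The array, carry-save, and Wallace-tree multipliers on the following slides differ only in how these rows are added together, not in how they are generated.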
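The Wallace-tree slides reduce the partial products in stages of full adders before a single final adder. The sketch below models this at word level, treating each row of full adders as a 3:2 carry-save compressor. This is a simplification made for illustration: the slides work column by column and also use half adders, and the names `carry_save_add` and `wallace_reduce` are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: a row of independent full adders, no carry ripple."""
    sum_word = a ^ b ^ c                               # per-bit sum outputs
    carry_word = ((a & b) | (a & c) | (b & c)) << 1    # per-bit carries, weight x2
    return sum_word, carry_word


def wallace_reduce(operands: list[int]) -> tuple[int, int, int]:
    """Reduce operands to two words whose sum equals the sum of all operands.

    Returns (word0, word1, number of reduction stages).
    """
    rows = list(operands)
    stages = 0
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3
        # Compress every complete group of three rows into two rows.
        for k in range(0, full, 3):
            next_rows.extend(carry_save_add(rows[k], rows[k + 1], rows[k + 2]))
        # Rows that do not fill a group pass through to the next stage.
        next_rows.extend(rows[full:])
        rows = next_rows
        stages += 1
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1], stages


# Partial products of 101010 x 1011 (42 x 11), written out explicitly.
s, c, stages = wallace_reduce([42, 84, 0, 336])
print(s + c, stages)  # 462 2
```

The final `s + c` stands in for the final adder on the slides. With four partial products the model needs two reduction stages, which matches the first-stage and second-stage labels in the 4-by-4 figure.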