ELEC 7770: Advanced VLSI Design, Spring 2007
Reducing Power through Multicore Parallelism

Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
[email protected]
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07

Spring 07, Feb 20. ELEC 7770: Advanced VLSI Design (Agrawal)

Slides:
- Power Dissipation in CMOS Logic (0.25µ)
- Low-Power Datapath Architecture
- A Reference Datapath
- A Parallel Architecture
- Level Converter: L to H
- Level Converter: H to L
- Control Signals, N = 4
- Power
- Voltage vs. Speed
- Increasing Multiprocessing
- Extreme Cases: Vt = 0
- Example: Multiplier Core
- A Multicore Design
- How Many Cores?
- Design Tradeoffs
- Power Reduction in Processors
- Parallel Architecture
- Pipeline Architecture
- Approximate Trend
- Multicore Processors
- Slide 22
- Cell - Cell Broadband Engine Architecture
- Cell's Nine-Processor Chip


Power Dissipation in CMOS Logic (0.25µ)

$P_{total}(0 \rightarrow 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$

Approximate shares of the three terms, in order: 75%, 5%, 20%.

[Figure: CMOS gate with supply $V_{DD}$ and load capacitance $C_L$]


Low-Power Datapath Architecture

- Lower supply voltage
  - This slows down circuit speed
  - Use parallel computing to gain the speed back
- Works well when threshold voltage is also lowered.
- About 60% reduction in power obtainable.

Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.


A Reference Datapath

[Figure: Input, Register, Combinational logic, Register, Output; both registers clocked by CK; total switched capacitance $C_{ref}$]

- Supply voltage = $V_{ref}$
- Total capacitance switched per cycle = $C_{ref}$
- Clock frequency = $f$
- Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$


A Parallel Architecture

[Figure: Input feeds N registers, each clocked at f/N and followed by one copy of the combinational logic (Copy 1, Copy 2, ..., Copy N); an N to 1 multiplexer selects the result into an output register clocked at f; a multiphase clock generator driven by CK supplies the f/N clocks and the mux control]

- Each copy processes every Nth input, operates at reduced voltage
- Supply voltage: $V_N \le V_1 = V_{ref}$
- N = degree of parallelism


Level Converter: L to H

[Figure: level converter with input Vin_L, output Vout_H, supplies VDDH and VDDL]

Transistors with thicker oxide and longer channels.

N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.


Level Converter: H to L

[Figure: level converter with input Vin_H, output Vout_L, supply VDDL]

Transistors with thicker oxide and longer channels.

N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.


Control Signals, N = 4

[Figure: timing diagram showing CK and the four clock phases, Phase 1 through Phase 4]


Power

$P_N = P_{proc} + P_{overhead}$

$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$

$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$

$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$

$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$


Voltage vs. Speed

Delay of a gate:

$T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$

where
- $I$ is saturation current
- $k$ is a technology parameter
- $W/L$ is width to length ratio of transistor
- $V_t$ is threshold voltage

[Plot: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS, with $V_t$ marked on the voltage axis; operating points $V_{ref}$ = 5V (N = 1), $V_2$ = 2.9V (N = 2), $V_3$ (N = 3)]

Voltage reduction slows down as we get closer to $V_t$.


Increasing Multiprocessing

[Plot: $P_N/P_1$ (0.0 to 1.0) versus N (1 to 12) for 1.2μ CMOS, $V_{ref}$ = 5V; curves for $V_t$ = 0V (extreme case), $V_t$ = 0.4V, $V_t$ = 0.8V]


Extreme Cases: $V_t = 0$

Delay, $T \propto 1/V_{ref}$

For N processing elements, delay = $NT$ → $V_N = V_{ref}/N$

$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2} \rightarrow \frac{1}{N}$

For negligible overhead, $\delta \rightarrow 0$:

$\frac{P_N}{P_1} \approx \frac{1}{N^2}$

For $V_t > 0$, power reduction is less and there will be an optimum value of N.


Example: Multiplier Core

Specification:
- 200MHz Clock
- 15W dissipation @ 5V
- Low voltage operation, $V_{DD} \ge 1.5$ volts
- Relative clock rate = $\frac{(V_{DD} - 0.5)^2}{20.25}$

Problem:
- Integrate multiplier core on a SOC
- Power budget for multiplier ~ 5W


A Multicore Design

Multiplier Core
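The multiplier-core problem can be explored numerically by combining the relative clock rate formula from the specification with the power ratio P_N/P_1 = [1 + δ(N - 1)] V_N²/V_ref² from the Power slide. The Python sketch below is only an illustration under stated assumptions: each of N cores is assumed to run at 1/N of the 200MHz clock, the overhead fraction δ = 0.1 is an assumed value that the specification does not give, and all function and constant names are hypothetical.

```python
import math

V_REF = 5.0      # reference supply voltage from the specification (volts)
P_REF = 15.0     # single-core dissipation at V_REF and 200 MHz (watts)
V_MIN = 1.5      # lowest supply voltage allowed by the specification (volts)
V_OFFSET = 0.5   # offset in the (V_DD - 0.5)^2 / 20.25 clock-rate formula


def relative_clock_rate(vdd):
    """Relative clock rate from the slide; (5 - 0.5)^2 = 20.25, so it is 1 at 5 V."""
    return (vdd - V_OFFSET) ** 2 / (V_REF - V_OFFSET) ** 2


def core_voltage(n):
    """Lowest supply voltage at which a core still sustains 1/n of the full rate.

    Solves (V - 0.5)^2 / 20.25 = 1/n for V, then applies the V_DD >= 1.5 V floor.
    """
    v = V_OFFSET + (V_REF - V_OFFSET) / math.sqrt(n)
    return max(v, V_MIN)


def total_power(n, delta=0.1):
    """P_N = P_1 * [1 + delta*(n - 1)] * (V_N / V_ref)^2.

    delta is an ASSUMED overhead fraction, not a value from the slides.
    """
    return P_REF * (1.0 + delta * (n - 1)) * (core_voltage(n) / V_REF) ** 2


def min_cores(budget_w, delta=0.1, n_max=64):
    """Smallest core count whose total power fits the budget, or None."""
    for n in range(1, n_max + 1):
        if total_power(n, delta) <= budget_w:
            return n
    return None


if __name__ == "__main__":
    for n in range(1, 9):
        print(f"N={n}  V_N={core_voltage(n):.2f} V  P_N={total_power(n):.2f} W")
    print("cores needed for a 5 W budget:", min_cores(5.0))
```

Under the assumed δ = 0.1, min_cores(5.0) returns 6; with no overhead (δ = 0) it returns 4. The answer is therefore sensitive to the overhead estimate, which is why δ is exposed as a parameter rather than fixed.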