T.H.A.D.D. GROUP
TOM DUAN, HELEN YU, ANDY LEE, DANNY HUANG, DAWEY HUANG

DSP Enabled Processor Design

Agenda
- Datapath Design
- Memory Subsystem
- Power Optimization
- Performance

5-Stage Pipeline
[Diagram. Labels: IF/ID PIPELINE REG, ID/EX PIPELINE REG, EX/MEM PIPELINE REG, MEM/WB PIPELINE REG, INSTRUCTION CACHE, DATA CACHE, REGISTER FILE, BRANCH LOGIC, JUMP LOGIC, ALU, MAC STAGE 1, MAC STAGE 2]

Multiply and Accumulate
- 2-stage pipeline multiplier
- No stalling when LW followed by MAC
[Diagram. Labels: REGISTER FILE, MULTIPLIER 1, MULTIPLIER 2, ID/EX PIPELINE REG, EX/MEM PIPELINE REG, MEM/WB PIPELINE REG]

Critical Path (WB stage)
[Diagram. Labels: MEM/WB PIPELINE REG, MAC STAGE 2, MUX, TO REGISTER FILE, FROM DATA MEMORY; bus widths of 16 and 32 bits]

Memory Subsystem
- 2x clock rate of processor
- 3 controllers
  - sdram
  - instruction block
  - data block
- asynchronous component interface (arbitrator)

Clock Divider
[Diagram. Labels: Counter, ICLK, CLK, CLK2X]

Memory Subsystem Diagram
[Diagram. Blocks: INSTRUCTION CACHE BLOCK, DATA CACHE BLOCK, ARBITRATOR, SDRAM BLOCK, SDRAM (GIVEN). Sub-blocks: CONTROLLER, CACHE, CONTROL, MAIN CACHE, VICTIM CACHE, BUFFER. Signals: address, data, miss, ready]

Cache Organization
                      Instruction Cache      Data Cache
Organization:         Direct mapped          2-way Set Associative
Block size:           4 words                4 words
Cache size:           5 blocks / 20 words    7 blocks / 28 words
Replacement Policy:   None                   Random (toggle)
Write Policy:         None                   Write-through w/ buffer
Victim Cache: 5 blocks / 20 words

Instruction Cache/Controller
[Diagram. Blocks: Controller FSM, Cache Blocks (5 BLOCKS, EACH 4 WORDS). FSM states: IDLE, CHECK, MISS. Signals: ADDRESS, CLK, READ, DISABLE, SDRAM READY, SDRAM DATA, MISS, SDRAM ADDRESS, DOUT, WRITE, WORD, HIT, DATA]

Data Cache
[Diagram]

Data Cache <-> Victim Cache
[Diagram]

Power Reduction Methods
- Limiting VHDL sensitivity list
- Balance input arrival
- Enable/Disable components
- Eliminate unnecessary control signals & data buses
- Minimize execution time to lower supply voltage

Power Consumption of Components
Supply voltage = 2.5 Volts

component     energy [uJ]
ifidreg         10.28
exmemreg        12.69
shftadd         13.78
mul2            15.12
comparator      17.27
alu             17.82
idexreg         25.69
pc reg          30.83
write_buf       35.60
mul1            45.68
icache          50.46
m32x4           53.54
fwunit          62.01
dcache          74.43
controller      84.14
mux32x2         95.26
hazard         158.62
regfile        174.37
sdram           45.18
Total power: 1107.98

[Breakdown charts]
- victim cache 12.59, data cache 11.37, data control 50.46
- instr cache 32.88, instr control 15.55
- sdram 44.56, sdram control 0.62

Component Optimization Results (1)
Supply voltage = 2.5 Volts
[Bar chart, energy consumed [uJ], two bars per component; the series legend is not recoverable]

component          series 1    series 2
SDRAM               466.20       45.18
INSTRUCTION MEM     262.48       48.39
DATA MEM            699.87       74.42
FORWARDING          149.35       62.01
HAZARD UNIT         196.04      158.56
REGISTER FILE       505.96      174.37

Component Optimization Results (2)
Supply voltage = 2.5 Volts
[Bar chart, energy consumed [uJ], two bars per component; the series legend is not recoverable]

component          series 1    series 2
REGISTER            116.71       76.28
ALU/ADDER            47.14       37.72
MUX                 174.47      157.99
SHIFTER               5.45        0.11

Supply Voltage Reduction Results
[Bar chart, energy consumed [uJ] per component at 2.5V and 1.5V; values as in the table]

component       2.5V      1.5V     % diff
mux            160.5      58.0      64%
registers       86.56     20.43     76%
multiplier      60.80     21.89     64%
instr block     48.42     17.24     64%
data block     110.03     38.82     65%
sdram           45.18     44.81      1%
fw unit         62.0      22.3      64%
controller      84.1      29.7      65%
hazard         158.6      57.1      64%
regfile        174.4     110.4      37%

Supply Voltage Comparison
                        2.5V       1.5V     % Diff.
Total power: [uJ]     1,059.0     397.7      62%
Cycle time: [ns]         32.0      52.0      63%
Execution time: [us]    411.0     651.7      59%

Design Challenges
- what we learned: power optimization concepts
- what surprised us: component interface timing
- what challenged us: reducing cache misses

Conclusion
- A Very Rewarding Project
- Excellent Performance
- Can Sleep
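
Worked check on the supply voltage results. The slides do not state an energy model, so the sketch below assumes the usual first-order rule that dynamic switching energy scales with the square of the supply voltage (E ~ C * V^2). Under that assumption, dropping from 2.5V to 1.5V should save about 64%, which can be compared with the measured per-component figures. The function names are illustrative only; the numbers are copied from the Supply Voltage Reduction Results and Supply Voltage Comparison slides.

```python
# Measured energies [uJ] at 2.5 V and 1.5 V, copied from the slides.
MEASURED = {
    "mux": (160.5, 58.0),
    "registers": (86.56, 20.43),
    "multiplier": (60.80, 21.89),
    "instr block": (48.42, 17.24),
    "data block": (110.03, 38.82),
    "sdram": (45.18, 44.81),
    "fw unit": (62.0, 22.3),
    "controller": (84.1, 29.7),
    "hazard": (158.6, 57.1),
    "regfile": (174.4, 110.4),
}

# Execution time [us] at 2.5 V and 1.5 V, from the comparison slide.
EXEC_TIME_US = (411.0, 651.7)


def quadratic_scaling_saving(v_high, v_low):
    """Fractional saving predicted by the ASSUMED model E ~ C * V^2."""
    return 1.0 - (v_low / v_high) ** 2


def measured_saving(e_high, e_low):
    """Fractional saving computed from two measured energies."""
    return 1.0 - e_low / e_high


def slowdown(t_high_v, t_low_v):
    """Fractional increase in execution time at the lower voltage."""
    return t_low_v / t_high_v - 1.0


if __name__ == "__main__":
    predicted = quadratic_scaling_saving(2.5, 1.5)
    print(f"predicted saving (V^2 model): {predicted:.0%}")
    for name, (e_high, e_low) in MEASURED.items():
        saving = measured_saving(e_high, e_low)
        print(f"{name:12s} measured saving: {saving:.0%}")
    print(f"execution time increase: {slowdown(*EXEC_TIME_US):.0%}")
```

Most components land close to the predicted figure. The sdram (1%) and regfile (37%) rows do not follow the V^2 model, so something other than plain voltage scaling is at work there; the slides do not say what.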