CS 152, Spring 2011
Section 8
Christopher Celio
University of California, Berkeley
Monday, March 21, 2011

Agenda
• Grades
• Upcoming Quiz 3
  – What it covers
  – OOO processors
  – VLIW
  – Branch Prediction

Intel Core 2 Duo (Penryn) vs. NVidia GTX 280
• Intel Core 2 Duo (Penryn)
  – dual-core
  – 2007+
  – 45nm
  – 410 million transistors
  – ~2GHz
  – 3 or 6MB of cache
  – 10-35 Watts
  – 107mm²
    • each core is 22mm²
    • L2 SRAM is 6mm²/MB
• NVidia GTX 280
  – 10 cores(?) (240 "stream" processors)
  – 2008
  – 65nm
  – 1.4 billion transistors
  – 576mm²
  – 602 MHz (core clock)
  – 236 Watts!!!
Image source: http://3dimensionaljigsaw.wordpress.com/2008/06/18/physics-based-games-the-new-genre/

Grades
– Department guidelines:
  • Average GPA 2.7-3.1
– Class Average: 75%
– Class Standard Deviation: 11.5%
– Homework: 15%
– Labs: 35%
– Quizzes: 50%

Quiz 3
• superscalar pipelines (in-order & out-of-order)
• out-of-order processors
  – what are the different stages? What is done in each stage (e.g., what resources are allocated in decode?)
  – register renaming
    • explicit versus implicit register renaming designs
    • when to allocate registers, when to free registers
  – ROBs, instruction windows
    • data-in-ROB versus data-not-in-ROB versus split ROB/instruction window designs
  – branches and exceptions... how are they handled?
  – Load/Store Queues
    • when can loads and stores be sent to memory?
• VLIW
• software
  – instruction re-ordering
  – loop unrolling
  – software pipelining
  – how code will get scheduled on different pipelines
• branch prediction
  – BHTs, BTBs, 2-bit counters, local history, global history, tournament branch predictors
  – when can you make predictions? When do you learn a prediction was wrong?

Out of Order Processors
(lots of drawing on the board here)

Out-of-Order Control Complexity: MIPS R10000
(from CS152 lecture slides, March 14, 2011)
[Figure: MIPS R10000 control logic. SGI/MIPS Technologies Inc., 1995]

Out of Order Processors
[Figure] Yeager. The MIPS R10000 Superscalar Microprocessor. IEEE Micro. 1996.

OOO Styles

"Data-in-ROB" Design (HP PA8000, Intel Pentium Pro, Core2 Duo & Nehalem)
(from CS152 lecture slides, March 9, 2011)
• On dispatch into the ROB, ready sources can be in the regfile or in an ROB dest field (copied into src1/src2 if ready before dispatch)
• On completion, write to the dest field and broadcast to src fields
• On issue, read from the ROB src fields
[Figure: the register file holds only committed state; the reorder buffer has entries t1, t2, ..., tn with fields Ins#, use, exec, op, p1, src1, p2, src2, pd, dest, data; the Load Unit, FUs, and Store Unit return < t, result > to the ROB; entries leave the ROB at Commit]

Unified Physical Register File (MIPS R10K, Alpha 21264, Intel Pentium 4 & Sandy Bridge)
• Rename all architectural registers into a single physical register file during decode; no register values are read
• Functional units read from and write to a single unified register file, which holds both committed and temporary registers, during execute
• Commit only updates the mapping of architectural register to physical register; no data is moved
[Figure: Decode Stage Register Mapping; Unified Physical Register File (read operands at issue, write results at completion); Functional Units; Committed Register Mapping]

DEC Alpha 21264
• 1996/1997
• single-core
  – 4-way
  – out-of-order
  – highly speculative
  – 7-stage
  – up to 80 instructions in flight
  – tournament branch predictor
• 15.2M transistors
  – 6M for logic
  – rest is caching, history tables
• 350 nm
• 600 MHz
• 64KB I$, 64KB D$ (on-chip)
  – 1 to 16MB L2$ (off-chip)
• 314mm² die (fairly large)

21264 Register Renaming
• Registers are renamed, then instructions are inserted into the issue queue
• The map table is backed up on every in-flight instruction
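The unified-physical-register-file scheme (explicit renaming, as in the R10K and 21264) can be sketched in a few lines of Python. This is a minimal model under stated assumptions: a FIFO free list, in-order commit, and a full map-table snapshot per instruction, loosely echoing the 21264's per-instruction map-table backup. The names `Renamer`, `rename`, `commit`, and `squash` are hypothetical and do not come from any real design. It shows the two quiz questions "when to allocate" (at decode) and "when to free" (when the next writer of the same architectural register commits).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RenamedInsn:
    src_phys: List[int]          # physical registers the sources were mapped to
    dest_arch: Optional[int]     # architectural destination (None if nothing is written)
    dest_phys: Optional[int]     # physical register allocated at decode
    old_phys: Optional[int]      # previous mapping of dest_arch, freed at commit


class Renamer:
    """Explicit renaming into a unified physical register file (illustrative sketch)."""

    def __init__(self, num_arch: int, num_phys: int):
        self.map_table = list(range(num_arch))            # arch reg i -> phys reg i at reset
        self.free_list = list(range(num_arch, num_phys))  # FIFO of unused physical registers
        self.rob: List[Tuple[RenamedInsn, List[int]]] = []  # program order: (insn, map snapshot)

    def rename(self, srcs: List[int], dest: Optional[int]) -> RenamedInsn:
        if dest is not None and not self.free_list:
            raise RuntimeError("no free physical register: decode must stall")
        snapshot = list(self.map_table)                 # per-instruction map-table backup
        src_phys = [self.map_table[s] for s in srcs]    # read sources before remapping dest
        dest_phys = old_phys = None
        if dest is not None:
            dest_phys = self.free_list.pop(0)
            old_phys = self.map_table[dest]
            self.map_table[dest] = dest_phys            # later readers see the new name
        insn = RenamedInsn(src_phys, dest, dest_phys, old_phys)
        self.rob.append((insn, snapshot))
        return insn

    def commit(self) -> RenamedInsn:
        insn, _ = self.rob.pop(0)                       # oldest instruction retires in order
        if insn.old_phys is not None:
            # Every reader of old_phys is older than insn and has already committed.
            self.free_list.append(insn.old_phys)
        return insn

    def squash(self, index: int) -> None:
        """Discard ROB entry `index` and everything younger, e.g. after a mispredict."""
        if index >= len(self.rob):
            return
        self.map_table = list(self.rob[index][1])       # restore mapping from before that entry
        for insn, _ in self.rob[index:]:
            if insn.dest_phys is not None:
                self.free_list.append(insn.dest_phys)   # return speculative allocations
        del self.rob[index:]


# Usage: the WAW and WAR hazards on r1 disappear after renaming.
rn = Renamer(num_arch=8, num_phys=12)
a = rn.rename([2, 3], 1)   # r1 <- r2 op r3
b = rn.rename([1], 4)      # r4 <- f(r1): true (RAW) dependence on a
c = rn.rename([4, 5], 1)   # r1 <- r4 op r5: WAW with a, WAR with b
print(a.dest_phys, b.src_phys, c.dest_phys)  # 8 [8] 10
rn.commit()                # a retires; physical register 1 (old r1) is freed
rn.squash(1)               # treat c (now ROB index 1) as wrong-path and discard it
print(rn.map_table[1], rn.free_list)         # 8 [11, 1, 10]
```

Copying the whole map table per instruction is the simplest way to model "backed up on every in-flight instruction"; real hardware is far more economical about how those checkpoints are stored.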
21264 Register Renaming
• What hazards does renaming obviate?
  – WAR, WAW
• In what situations is renaming useful?
  – Code with ILP and name dependencies: loops
• If you had to choose between branch prediction and renaming, which would you pick?
  – There is not much ILP within a basic block, so renaming isn't very useful without branch prediction

21264 Superscalar Execution
• The 21264 couldn't fit full bypassing into one clock cycle
• Instead, it fully bypasses within each of two clusters; an inter-cluster bypass takes another cycle

21264 Instruction Reordering
• As mentioned earlier, the 21264 uses explicit renaming, as opposed to a data-in-ROB design
• What does the ROB hold?

Memory Ordering in the 21264
• To execute the critical instruction path quickly, we want to execute loads ASAP
• Initially, loads speculatively bypass stores
• On a misspeculation, set a "wait" bit for that load's PC, so it behaves conservatively from then on
• Clear wait bits periodically

Speculation in the 21264
• What does the 21264 speculate on?
  – Next I$ line/way
  – Branches, indirect jumps
  – Exceptions
  – Load/store ordering
  – Load hit/miss
    • Shortens hit time by a cycle
  – Anything else?

Question: Stores
• When are stores sent to memory?
  – At commit time
• Why are stores saved in a store buffer before commit time?
  – A store cannot update memory until it is no longer speculative, and while it waits in the buffer its data can be forwarded to dependent loads

VLIW: Very Long Instruction Word
(from CS152 lecture slides, March 14, 2011)
• Multiple operations packed into one instruction
• Each operation slot is for a fixed function
• Constant operation latencies are specified
• Architecture requires guarantee of:
  – Parallelism within an instruction => no cross-operation RAW check
  – No data use before data ready => no data interlocks
[Figure: Two Integer Units, Single Cycle Latency; Two Load/Store Units, Three Cycle]
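Because a VLIW machine has no RAW check inside an instruction and no interlocks, the compiler must guarantee both properties itself. The Python sketch below checks a schedule for a hypothetical machine modeled on the slide's figure labels: two integer slots with single-cycle latency and two load/store slots. The load/store label is cut off after "Three Cycle", so treating it as a three-cycle latency is an assumption, as are the names `Op` and `check_schedule`. An empty bundle models an instruction whose slots are all NOPs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical VLIW machine modeled on the slide's figure labels.
LATENCY = {"int": 1, "mem": 3}   # "mem" = 3 cycles is an assumption (label truncated)
SLOTS = {"int": 2, "mem": 2}     # fixed-function slots per instruction


@dataclass(frozen=True)
class Op:
    unit: str                     # "int" or "mem"
    dest: Optional[str] = None    # register written, if any
    srcs: Tuple[str, ...] = ()    # registers read


def check_schedule(bundles: List[List[Op]]) -> List[str]:
    """Report violations of the VLIW contract; the hardware itself checks none of these."""
    ready_at: Dict[str, int] = {}   # register -> first cycle its new value may be read
    errors: List[str] = []
    for cycle, bundle in enumerate(bundles):      # one bundle = one long instruction
        for unit, cap in SLOTS.items():
            used = sum(op.unit == unit for op in bundle)
            if used > cap:
                errors.append(f"cycle {cycle}: {used} {unit} ops but only {cap} slots")
        for i, op in enumerate(bundle):
            for src in op.srcs:
                if any(j != i and other.dest == src for j, other in enumerate(bundle)):
                    # Parallelism within an instruction: no cross-operation RAW allowed.
                    errors.append(f"cycle {cycle}: RAW on {src} inside one instruction")
                elif ready_at.get(src, 0) > cycle:
                    # No interlocks: using a value before its latency elapses is an error.
                    errors.append(f"cycle {cycle}: {src} not ready until cycle {ready_at[src]}")
        for op in bundle:
            if op.dest is not None:
                ready_at[op.dest] = cycle + LATENCY[op.unit]
    return errors


load = Op("mem", dest="r1", srcs=("r2",))      # r1 <- mem[r2]
add = Op("int", dest="r3", srcs=("r1", "r4"))  # r3 <- r1 + r4
too_early = [[load], [], [add]]                 # [] is an all-NOP instruction; add issues at cycle 2
ok = [[load], [], [], [add]]                    # add issues at cycle 3
print(check_schedule(too_early))  # ['cycle 2: r1 not ready until cycle 3']
print(check_schedule(ok))         # []
```

The empty bundles are the NOP padding a VLIW compiler must emit when it finds nothing independent to schedule under the load's latency; loop unrolling and software pipelining are the usual ways to fill those slots instead.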