Berkeley COMPSCI 152 - Section Notes

CS 152, Spring 2011
Section 8
Christopher Celio
University of California, Berkeley
Monday, March 21, 2011

Agenda
• Grades
• Upcoming Quiz 3
  – What it covers
  – OOO processors
  – VLIW
  – Branch Prediction

Intel Core 2 Duo (Penryn) vs. NVidia GTX 280
• Intel Core 2 Duo (Penryn)
  – dual-core
  – 2007+
  – 45nm
  – 410 million transistors
  – ~2GHz
  – 3 or 6MB of cache
  – 10-35 Watts
  – 107mm²
    • each core is 22mm²
    • L2 SRAM is 6mm²/MB
• NVidia GTX 280
  – 10 core(?) (240 “stream” processors)
  – 2008
  – 65nm
  – 1.4 billion transistors
  – 576mm²
  – 602 MHz (core clock)
  – 236 Watts!!!
http://3dimensionaljigsaw.wordpress.com/2008/06/18/physics-based-games-the-new-genre/

Grades
– Department guidelines:
  • Average GPA 2.7-3.1
– Class Average: 75%
– Class Standard Deviation: 11.5%
– Homework: 15%
– Labs: 35%
– Quizzes: 50%

Quiz 3
• superscalar pipelines (in-order & out-of-order)
• out-of-order processors
  – what are the different stages? What is done in each stage (e.g., what resources are allocated in decode?)
  – register renaming
    • explicit versus implicit register renaming designs
    • when to allocate registers, when to free registers
  – ROBs, instruction windows
    • data-in-ROB versus data-not-in-ROB versus split ROB/instruction window designs
  – branches and exceptions... how are they handled?
  – Load/Store Queues
    • when can stores, loads be fired to memory?
• VLIW
• software
  – instruction re-ordering
  – loop unrolling
  – software pipelining
  – how code will get scheduled on different pipelines
• branch prediction
  – BHTs, BTBs, 2-bit counters, local history, global history, tournament branch predictors
  – when can you make predictions? When do you learn prediction was wrong?

Out of Order Processors
<lots of drawing on the board here>

Out-of-Order Control Complexity: MIPS R10000
(CS152, Spring 2011, March 14, 2011)
[Figure label: Control Logic] [ SGI/MIPS Technologies Inc., 1995 ]

Out of Order Processors
[Figure source: Yeager. The MIPS R10000 Superscalar Microprocessor. IEEE Micro. 1996]

OOO Styles

“Data-in-ROB” Design
(CS152, Spring 2011, March 9, 2011)
(HP PA8000, Intel Pentium Pro, Core2 Duo & Nehalem)
• On dispatch into ROB, ready sources can be in regfile or in ROB dest (copied into src1/src2 if ready before dispatch)
• On completion, write to dest field and broadcast to src fields.
• On issue, read from ROB src fields
[Figure: register file (holds only committed state); reorder buffer with entries t1, t2, ..., tn and fields Ins#, use, exec, op, p1, src1, p2, src2, pd, dest, data; load unit, FUs, and store unit returning < t, result >; commit]

Unified Physical Register File
(CS152, Spring 2011, March 9, 2011)
(MIPS R10K, Alpha 21264, Intel Pentium 4 & Sandy Bridge)
• Rename all architectural registers into a single physical register file during decode, no register values read
• Functional units read and write from single unified register file holding committed and temporary registers in execute
• Commit only updates mapping of architectural register to physical register, no data movement
[Figure: unified physical register file; read operands at issue; functional units; write results at completion; committed register mapping; decode stage register mapping]

DEC Alpha 21264
• 1996/1997
• single-core
  – 4-way
  – out-of-order
  – highly speculative
  – 7-stage
  – up to 80 instructions in flight
  – tournament branch predictor
• 15.2M transistors
  – 6M for logic
  – rest is caching, history tables
• 350 nm
• 600 MHz
• 64KB I$, 64KB D$ (on-chip)
  – 1 to 16MB L2$ (off-chip)
• 314mm² die (fairly large)

21264 Register Renaming
• Registers are renamed, then instructions are inserted into the issue queue
• Map table backed up on every in-flight insn

21264 Register Renaming
• What hazards does renaming obviate?
  – WAR, WAW
• In what situations is renaming useful?
  – Code with ILP and name dependencies: loops
• If you had to choose between branch prediction and renaming, which would you pick?
  – Not much ILP within a basic block, so renaming isn’t too useful without branch prediction

21264 Superscalar Execution
• 21264 couldn’t fit full bypassing into one clock cycle
• Instead, they fully bypass within each of two clusters; inter-cluster bypass takes another cycle

21264 Instruction Reordering
• As mentioned earlier, 21264 uses explicit renaming, as opposed to data-in-ROB design
• What does ROB hold?

Memory Ordering in the 21264
• To execute the critical instruction path quickly, want to execute loads ASAP
• Initially, loads speculatively bypass stores
• On a misspeculation, set a “wait” bit for that load’s PC, so it will behave conservatively from then on
• Clear wait bits periodically

Speculation in the 21264
• What does the 21264 speculate on?
  – Next I$ line/way
  – Branches, indirect jumps
  – Exceptions
  – Load/Store ordering
  – Load hit/miss
    • Shortens hit time by a cycle
  – Anything else?

Question: Stores
• When are stores sent to memory?
  – at commit time
• Why are stores saved in a store buffer before commit time?
  – so they can be forwarded to dependent loads

VLIW: Very Long Instruction Word
(CS152, Spring 2011, March 14, 2011)
• Multiple operations packed into one instruction
• Each operation slot is for a fixed function
• Constant operation latencies are specified
• Architecture requires guarantee of:
  – Parallelism within an instruction => no cross-operation RAW check
  – No data use before data ready => no data interlocks
Two Integer Units, Single Cycle Latency
Two Load/Store Units, Three Cycle

