DOC PREVIEW
Berkeley COMPSCI 252 - Review of Instruction Sets, Pipelines and Caches

This preview shows page 1-2-3-4-5-35-36-37-38-39-71-72-73-74-75 out of 75 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 75 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS252 Graduate Computer Architecture Lecture 2 Review of Instruction Sets, Pipelines, and CachesReview, #1Amdahl’s LawToday: Quick review of everything you should have learnedComputer PerformanceCycles Per Instruction (Throughput)Example: Calculating CPI bottom upExample: Branch Stall ImpactSPEC: System Performance Evaluation CooperativeSPEC First RoundIntegrated Circuits CostsA "Typical" RISCExample: MIPS (­ DLX)Datapath vs ControlApproaching an ISA5 Steps of DLX Datapath Figure 3.1, Page 1305 Steps of DLX Datapath Figure 3.4, Page 134Inst. Set Processor ControllerSlide 19Visualizing Pipelining Figure 3.3, Page 133CS 252 AdministriviaPipelining is not quite that easy!One Memory Port/Structural Hazards Figure 3.6, Page 142One Memory Port/Structural Hazards Figure 3.7, Page 143Speed Up Equation for PipeliningExample: Dual-port vs. Single-portData Hazard on R1 Figure 3.9, page 147Three Generic Data HazardsSlide 29Slide 30Forwarding to Avoid Data Hazard Figure 3.10, Page 149HW Change for Forwarding Figure 3.20, Page 161Data Hazard Even with Forwarding Figure 3.12, Page 153Data Hazard Even with Forwarding Figure 3.13, Page 154Software Scheduling to Avoid Load HazardsControl Hazard on Branches Three Stage StallBranch Stall ImpactPipelined DLX Datapath Figure 3.22, page 163Four Branch Hazard AlternativesSlide 40Delayed BranchEvaluating Branch AlternativesNow, Review of Memory HierarchyRecap: Who Cares About the Memory Hierarchy?Levels of the Memory HierarchyThe Principle of LocalityMemory Hierarchy: TerminologyCache MeasuresSimplest Cache: Direct Mapped1 KB Direct Mapped Cache, 32B blocksTwo-way Set Associative CacheDisadvantage of Set Associative Cache4 Questions for Memory HierarchyQ1: Where can a block be placed in the upper level?Q2: How is a block found if it is in the upper level?Q3: Which block should be replaced on a miss?Q4: What happens on a write?Write Buffer for Write ThroughImpact of Memory Hierarchy on AlgorithmsQuicksort vs. Radix as vary number keys: InstructionsQuicksort vs. Radix as vary number keys: Instrs & TimeQuicksort vs. Radix as vary number keys: Cache missesA Modern Memory HierarchyWhat is virtual memory?Three Advantages of Virtual MemoryIssues in Virtual Memory System DesignLarge Address SpacesTranslation Look-Aside BuffersOverlapped Cache & TLB AccessProblems With Overlapped TLB AccessSummary #1/5: Control and PipeliningSummary #2/5: CachesSummary #3/5: The Cache Design SpaceSummary #4/5: TLB, Virtual MemorySummary #5/5: Memory Hierachy1/20/05CS252-S05 Lec21Prof. David CullerElectrical Engineering and Computer SciencesUniversity of California, Berkeleyhttp://www.eecs.berkeley.edu/~culler/courses/cs252-s05CS252Graduate Computer ArchitectureLecture 2 Review of Instruction Sets, Pipelines, and Caches1/20/05CS252-S05 Lec22Review, #1•Technology is changing rapidly:Capacity SpeedLogic 2x in 3 years 2x in 3 yearsDRAM 4x in 3 years 2x in 10 yearsDisk 4x in 3 years 2x in 10 years Processor ( n.a.) 2x in 1.5 years•What was true five years ago is not necessarily true now.•Execution time is the REAL measure of computer performance!–Not clock rate, not CPI•“X is n times faster than Y” means:e(Y)Performance(X)Performanc ExTime(X)ExTime(y)1/20/05CS252-S05 Lec23Amdahl’s Law enhancedenhancedenhancednewoldoverallSpeedupFraction Fraction 1 ExTimeExTime Speedup1Best you could ever hope to do: enhancedmaximumFraction - 11 Speedup  enhancedenhancedenhancedoldnewSpeedupFractionFraction ExTime ExTime 11/20/05CS252-S05 Lec24Today: Quick review of everything you should have learned1/20/05CS252-S05 Lec25Computer Performance CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction CycleCPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle Inst Count CPI Clock RateProgram XCompiler X (X)Inst. Set. X XOrganization X XTechnology Xinst countCPICycle time1/20/05CS252-S05 Lec26Cycles Per Instruction (Throughput)“Instruction Frequency”CPI = (CPU Time * Clock Rate) / Instruction Count = Cycles / Instruction Count “Average Cycles per Instruction”jnjjI CPI TimeCycle time CPU 1Count nInstructioI F where F CPI CPIjjnjjj11/20/05CS252-S05 Lec27Example: Calculating CPI bottom upTypical Mix of instruction typesin programBase Machine (Reg / Reg)Op Freq Cycles CPI(i) (% Time)ALU 50% 1 .5 (33%)Load 20% 2 .4 (27%)Store 10% 2 .2 (13%)Branch 20% 2 .4 (27%) 1.5Design guideline: Make the common case fastMIPS 1% rule: only consider adding an instruction of it is shown to add 1% performance improvement on reasonable benchmarks.Run benchmark and collect workload characterization (simulate, machine counters, or sampling)1/20/05CS252-S05 Lec28Example: Branch Stall Impact•Assume CPI = 1.0 ignoring branches (ideal)•Assume solution was stalling for 3 cycles•If 30% branch, Stall 3 cycles on 30% Op Freq Cycles CPI(i) (% Time)Other 70% 1 .7 (37%)Branch 30% 4 1.2 (63%) new CPI = 1.9•New machine is 1/1.9 = 0.52 times faster (i.e. slow!)1/20/05CS252-S05 Lec29SPEC: System Performance Evaluation Cooperative•First Round 1989–10 programs yielding a single number (“SPECmarks”)•Second Round 1992–SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)»Compiler Flags unlimited. March 93 of DEC 4000 Model 610:spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)=memcpy(b,a,c)”wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas•Third Round 1995–new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point) –“benchmarks useful for 3 years”–Single flag setting for all programs: SPECint_base95, SPECfp_base95 •Fourth Round 2000: 26 apps–analysis and simulation programs–Compression: bzip2, gzip, –Integrated circuit layout, ray tracing, lots of others1/20/05CS252-S05 Lec210SPEC First Round•One program: 99% of time in single line of code•New front-end compiler could improve dramaticallyBenchmarkSPEC Perf0100200300400500600700800gccepressospicedoducnasa7lieqntottmatrix300fpppptomcatv1/20/05CS252-S05 Lec211 Integrated Circuits CostsDie Cost goes roughly with die area4 Test_Die Die_Area 2Wafer_diam Die_Area2m/2)(Wafer_dia wafer per Dies


View Full Document

Berkeley COMPSCI 252 - Review of Instruction Sets, Pipelines and Caches

Documents in this Course
Quiz

Quiz

9 pages

Caches I

Caches I

46 pages

Lecture 6

Lecture 6

36 pages

Lecture 9

Lecture 9

52 pages

Figures

Figures

26 pages

Midterm

Midterm

15 pages

Midterm

Midterm

14 pages

Midterm I

Midterm I

15 pages

ECHO

ECHO

25 pages

Quiz  1

Quiz 1

12 pages

Load more
Download Review of Instruction Sets, Pipelines and Caches
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Review of Instruction Sets, Pipelines and Caches and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Review of Instruction Sets, Pipelines and Caches 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?