Unformatted text preview:

ECE/CS 552: Introduction To Computer Architecture 1ECE/CS 752: Midterm 1 ReviewProf. Mikko H. LipastiUniversity of Wisconsin‐MadisonLecture notes based on notes by John P. ShenUpdated by Mikko LipastiComputer Architecture • Instruction Set Architecture (IBM 360)– … the attributes of a [computing] system as seen by the programmer. I.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementationAmdahl Blaaw & Brooksphysical implementation. ‐‐Amdahl, Blaaw, & Brooks, 1964• Machine Organization (microarchitecture)– ALUS, Buses, Caches, Memories, etc.• Machine Implementation (realization)– Gates, cells, transistors, wiresIron LawProcessor Performance = ---------------TimeProgramInstructions CyclesIiTime=XXArchitecture --> Implementation --> RealizationCompiler Designer Processor Designer Chip DesignerProgramInstructionCycle(code size)XX(CPI) (cycle time)Ideal PipeliningComb. Logicn Gate DelayGateDelayLGateDelayLL BW = ~(1/n)n--2n--2BW = ~(2/n)• Bandwidth increases linearly with pipeline depth• Latency increases by latch delaysGateDelayLGateDelayLGateDelayLn--3n--3n--3BW = ~(3/n)Pipelining Idealisms• Uniform subcomputations– Can pipeline into stages with equal delay– Balance pipeline stages• Identical computations– Can fill pipeline with identical work– Unify instruction types (example later)• Independent computations– No relationships between work units– Minimize pipeline stalls• Are these practical?– No, but can get close enough to get significant speedupExample (quicksort/MIPS)# for (; (j < high) && (array[j] < array[low]) ; ++j );# $10 = j# $9 = high# $6 = array# $8 = lowbge done, $10, $9mul $15, $10, 4addu$24, $6, $15addu$24, $6, $15lw $25, 0($24)mul $13, $8, 4addu $14, $6, $13lw $15, 0($14)bge done, $25, $15cont:addu $10, $10, 1. . .done:addu $11, $11, -1ECE/CS 552: Introduction To Computer Architecture 2Resolution of Pipeline Hazards• Pipeline hazards– Potential violations of program dependences– Must ensur e program dependences are not violated• Hazard resolution– Static: compiler/programmer guarantees correctness– Dynamic: hardware performs checks at runtime• Pipeline interlock– Hardware mechanism for dynamic hazard resolution– Must detect and enforce dependences at runtimePipeline Hazards• Necessary conditions:– WAR: write stage earlier than read stage• Is this possible in IF‐RD‐EX‐MEM‐WB ?– WAW: write stage earlier than write stage• Is this possible in IF‐RD‐EX‐MEM‐WB ?– RAW: read stage earlier than write stage• Is this possible in IF‐RD‐EX‐MEM‐WB?• If conditions not met, no need to resolve• Check for both register and memoryForwarding Paths (ALU instructions)IFIDi+1: i+2: i+3:RDR1R1 R1FORWARDINGbALUPATHSai: R1i: R1i: R1(i i+1)Forwardingvia Path ai+1:i+1:i+2:(i i+2)Forwardingvia Path b(i i+3)i writes R1before i+3 reads R1ALUMEMWBR1R1cReview• Pipelining Overview• Control– Data hazards• Stalls• Forwarding or bypassing– Control flow hazards• Branch prediction• Exceptions• Real PipelinesPower & Variability Summary• Basic Concepts– CMOS scaling trends– Power vs. Energy– Dynamic power vs. leakage power•Circuit Techniques for low powerCircuit Techniques for low power• Architectural Techniques low power– 3:1 rule of thumb• Variability challenge• Multicore as a potential solution [Lee & Kim]– What happens when Vnominal= VminPipelining to Superscalar• Forecast– Limits of pipelining– The case for superscalarInstructionlevel parallel machines–Instruction‐level parallel machines– Superscalar pipeline organization– Superscalar pipeline designECE/CS 552: Introduction To Computer Architecture 3Amdahl’s LawNo. ofProcessorsN1h1 -h1-ff• h = fraction of time in serial code• f = fraction that is vectorizable• v = speedup for f• Overall speedup:Time11 fvffSpeedup11Revisit Amdahl’s Law• Sequential bottleneck• Even if v is infinite–Performance limited by nonvectorizablefvffv1111lim–Performance limited by nonvectorizable portion (1‐f)No. ofProcessorsNTime1h1 -h1 - ffPipelined Performance ModelPipelineDepthN1• Tyranny of Amdahl’s Law [Bob Colwell]– When g is even slightly below 100%, a big performance hit will result– Stalled cycles are the key adversary and must be minimized as much as possible1-gg1Superscalar Proposal• Moderate tyranny of Amdahl’s Law– Ease sequential bottleneck– More generally applicableRobust (less sensitive to f)–Robust (less sensitive to f)– Revised Amdahl’s Law:vfsfSpeedup11Limits on Instruction Level Parallelism (ILP)Weiss and Smith [1984] 1.58Sohi and Vajapeyam [1987] 1.81Tjaden and Flynn [1970] 1.86 (Flynn’s bottleneck)Tjaden and Flynn [1973] 1.96Uht [1986] 2.00Smith et al. [1989]2.00Smith et al. [1989]2.00Jouppi and Wall [1988] 2.40Johnson [1991] 2.50Acosta et al. [1986] 2.79Wedig [1982] 3.00Butler et al. [1991] 5.8Melvin and Patt [1991] 6Wall [1991] 7 (Jouppi disagreed)Kuck et al. [1972] 8Riseman and Foster [1972] 51 (no control dependences)Nicolau and Fisher [1984] 90 (Fisher’s optimism)Classifying ILP Machines[Jouppi, DECWRL 1991]• Baseline scalar RISC– Issue parallelism = IP = 1– Operation latency = OP = 1–Peak IPC = 1123456IF DE EX WB1234567890TIME IN CYCLES (OF BASELINE MACHINE)SUCCESSIVEINSTRUCTIONSECE/CS 552: Introduction To Computer Architecture 4Superscalar ChallengesI-cacheFETCHDECODEBranchPredictorInstructionBufferInstructionFlowCOMMITD-cacheStoreQueueReorderBufferIntegerFloating-pointMediaMemoryRegisterData MemoryDataEXECUTE(ROB)FlowFlowLimitations of Scalar Pipelines• Scalar upper bound on throughput– IPC <= 1 or CPI >= 1– Solution: wide (superscalar) pipeline• Inefficient unified pipelineLong latency for each instruction–Long latency for each instruction– Solution: diversified, specialized pipelines• Rigid pipeline stall policy– One stalled instruction stalls all newer instructions– Solution: Out‐of‐order execution, distributed execution pipelinesRIOS‐I Fetch HardwareB1A1B5A5B9A9B13A130123TlogicB2A2B6A6B10A10B14A140123TlogicB0A0B4A4B8A8B12A12IFAR0123TlogicB3A3B7A7B11A11B15A150123Odddirectory setsA & BTLBhitmuxInstruction buffer network255mux255mux255mux255Evendirectory setsA &


View Full Document
Download Midterm review
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Midterm review and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Midterm review 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?