Berkeley COMPSCI 152 - Lecture 17 – Advanced Processors I

This preview shows page 1-2-22-23 out of 23 pages.

UC Regents Fall 2005 © UCB
CS 152 L17: Advanced Processors I
2005-10-27
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
CS 152 Computer Architecture and Engineering
Lecture 17 – Advanced Processors I
www-inst.eecs.berkeley.edu/~cs152/
TAs: David Marquardt and Udam Saini

Last Time: Error Correcting Codes

Bit layout, with the position number of each bit:
D₃  D₂  D₁  P₂  D₀  P₁  P₀
7   6   5   4   3   2   1

We write:        0   1   1   0   0   1   1
Later, we read:  0   1   0   0   0   1   1

Cosmic ray hit D₁. But how do we know that?

On readout we compute:
P₀ xor D₃ xor D₁ xor D₀ = 1 xor 0 xor 0 xor 0 = 1
P₁ xor D₃ xor D₂ xor D₀ = 1 xor 0 xor 1 xor 0 = 0
P₂ xor D₃ xor D₂ xor D₁ = 0 xor 0 xor 1 xor 0 = 1

P₂P₁P₀ = b101 = 5

What does “5” mean? The position of the flipped bit! To repair, just flip it back ...
Note: we number the least significant bit with 1, not 0! 0 is reserved for “no errors”.

Today: Beyond the 5-stage pipeline

Taxonomy: Introduction to advanced processor techniques.
Superpipelining: Increasing the number of pipeline stages.
Superscalar: Issuing several instructions in a single cycle.

5 Stage Pipeline: A point of departure

\[ \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} \]

At best, the 5-stage pipeline executes one instruction per clock, with a clock period determined by the slowest stage.
The best case assumes:
Filling all delay slots (branch, load)
Perfect caching
Application does not need multi-cycle instructions (multiply, divide, etc.)

Superpipelining: Add more stages (Today!)

Goal: Reduce critical path by adding more pipeline stages.
Difficulties: Added penalties for load delays and branch misses.
Ultimate Limiter: As logic delay goes to 0, flip-flop clk-to-Q and setup times dominate the clock period.
Example: 8-stage ARM XScale: extra IF, ID, data cache stages.
Also, power!

Superscalar: Multiple issues per cycle (Today!)

Goal: Improve CPI by issuing several instructions per cycle.
Difficulties: Load and branch delays affect more instructions.
Ultimate Limiter: Programs may be a poor match to issue rules.
Example: CPU with floating point ALUs: issue 1 FP + 1 integer instruction per cycle.

Out of Order: Going around stalls (Tuesday)

Goal: Issue instructions out of program order.
Example: MULTD waiting on F4 to load ... so let ADDD go first.
Difficulties: Bookkeeping is highly complex. A poor fit for lockstep instruction scheduling.
Ultimate Limiter: The amount of instruction level parallelism present in an application.

Dynamic Scheduling: End lockstep (Tuesday)

Goal: Enable out-of-order by breaking pipeline in two: fetch and execution.
Limiters: Design complexity, instruction level parallelism.
Example: IBM Power 5.

Throughput and multiple threads (Next Thursday)

Goal: Use multiple CPUs (real and virtual) to improve (1) throughput of machines that run many programs (2) execution time of multi-threaded programs.
Difficulties: Gaining full advantage requires rewriting applications, OS, libraries.
Ultimate Limiter: Amdahl’s law, memory system performance.
Example: Sun Niagara (8 SPARCs on one chip).

Friday: DRAM controller checkoff

[Figure labels: Test Vectors, IM Bus, DM Bus, DRAM Controller, DRAM]
Run your test vector suite on the Calinx board, display results on LEDs.
Test ongoing transactions on both buses, with randomized start times.
Load, store, and verify different data word patterns.
Write different addresses, values!

Superpipelining

Add pipeline stages, reduce clock period

Example: ARM XScale, 8 stages.
Q. Could adding pipeline stages hurt the CPI for an application?
A. Yes, due to these problems:

CPI Problem                            Possible Solution
Taken branches cause longer stalls     Branch prediction, loop unrolling
Cache misses take more clock cycles    Larger caches, add prefetch opcodes to ISA

Recall: Control hazards ...

Sample Program (ISA w/o branch delay slot):
I1: BEQ R4,R3,25
I2: SUB R1,R9,R8
I3: AND R6,R5,R4

Pipeline stages: IF (Fetch), ID (Decode), EX (ALU), MEM, WB

Time:  t1   t2   t3   t4   t5   t6   t7   t8
I1:    IF   ID   EX   MEM  WB
I2:         IF   ID
I3:              IF

EX stage computes if branch is taken.
If branch is taken, these instructions (I2, I3) MUST NOT complete!
We avoid stalling by (1) adding a branch delay slot, and (2) adding a comparator to the ID stage.
If we add more early stages, we must stall.

Solution: Branch prediction ...

[Figure: the same pipeline and timing diagram, with a Branch Predictor next to the I-Cache]
Branch Predictor predictions: A control instr? Taken or Not Taken? The PC a branch “targets”.
We update the PC based on the outputs of the branch predictor. If it is perfect, pipe stays full!
Dynamic Predictors: a cache of branch history.
EX stage computes if branch is taken.
If we predicted incorrectly, these instructions MUST NOT complete!

Branch predictors cache branch history

Address of BNEZ instruction: 0b0110[...]01001000    BNEZ R1 Loop
Branch Target Buffer (BTB): holds a 28-bit address tag (0b0110[...]0100) and a target address. The tag is compared (=) against 28 bits of the branch address; a match signals Hit. “Taken” Address: PC + 4 + Loop.
Branch History Table (BHT): 2 bits per entry, giving “Taken” or “Not Taken”.
Update BHT/BTB for next time, once true behavior known.
80-90% accurate.
Must check prediction, kill instruction if needed.

Simple (“2-bit”) Branch History Table Entry

[Figure: two D flip-flops (D Q, D Q)]
Prediction for next branch (1 = take, 0 = not take)
We do not change the
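
The preview cuts off before the 2-bit entry is fully described, so the sketch below is not the slide's circuit. It assumes a different but widely used 2-bit scheme, a saturating counter, to show how two bits of history keep a single surprise from flipping the prediction. The class name, table size, and PC indexing are hypothetical choices made for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters (an assumed scheme, not the slide's).

    Counter values 0..3; values 2 and 3 predict "taken".
    """

    def __init__(self, entries=16):
        self.entries = entries
        # Start every counter at 1 (weakly not taken); an arbitrary choice.
        self.table = [1] * entries

    def _index(self, pc):
        # Hypothetical indexing: drop the 2 byte-offset bits, wrap to table size.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter toward the true outcome once it is known."""
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_correct(predictor, pc, outcomes):
    """Predict, then update, for each outcome; return number of correct predictions."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct


# A loop branch taken 9 times and then falling through, executed 3 times over.
loop_outcomes = ([True] * 9 + [False]) * 3
hits = count_correct(TwoBitPredictor(), 0x48, loop_outcomes)
print(hits, "of", len(loop_outcomes), "predictions correct")
```

With this pattern the loop exit is mispredicted once per pass, but the counter only drops from 3 to 2, so the first iteration of the next pass is still predicted taken.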

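The readout check from the "Last Time: Error Correcting Codes" slide can be sketched in a few lines. The parity equations and the bit order D₃ D₂ D₁ P₂ D₀ P₁ P₀ (positions 7 down to 1) come from the slide; the function names and the list representation are assumptions made for illustration.

```python
def encode(d3, d2, d1, d0):
    """Return the 7-bit word [D3, D2, D1, P2, D0, P1, P0] (positions 7..1)."""
    p0 = d3 ^ d1 ^ d0
    p1 = d3 ^ d2 ^ d0
    p2 = d3 ^ d2 ^ d1
    return [d3, d2, d1, p2, d0, p1, p0]


def syndrome(word):
    """Recompute the three parity checks; the result is P2P1P0 as an integer.

    0 means "no errors"; otherwise it is the position (1..7) of the flipped bit.
    """
    d3, d2, d1, p2, d0, p1, p0 = word
    s0 = p0 ^ d3 ^ d1 ^ d0
    s1 = p1 ^ d3 ^ d2 ^ d0
    s2 = p2 ^ d3 ^ d2 ^ d1
    return (s2 << 2) | (s1 << 1) | s0


def repair(word):
    """Flip the bit named by the syndrome; return (repaired word, position)."""
    position = syndrome(word)
    fixed = list(word)
    if position:
        # Position 7 is list index 0, position 1 is list index 6.
        fixed[7 - position] ^= 1
    return fixed, position


written = encode(0, 1, 1, 0)         # the slide's data bits D3..D0
read = [0, 1, 0, 0, 0, 1, 1]         # same word after D1 is flipped
fixed, position = repair(read)
print(position, fixed == written)
```

This covers single-bit errors only, which is the case the slide discusses.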

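The superpipelining question (can more stages hurt CPI?) can be made concrete with a small model built on the Seconds/Program equation. Every number below is a hypothetical value chosen for illustration, not a measurement of the XScale or any other processor, and the stall model (a fixed penalty per taken branch and per cache miss) is a simplifying assumption.

```python
import math


def stall_cpi(taken_branch_frac, branch_penalty_cycles,
              misses_per_instr, miss_latency_ns, clock_ns):
    """CPI = 1 (ideal pipeline) + branch stall cycles + cache miss stall cycles."""
    # A fixed memory latency costs more cycles when the clock period shrinks.
    miss_penalty_cycles = math.ceil(miss_latency_ns / clock_ns)
    return (1.0
            + taken_branch_frac * branch_penalty_cycles
            + misses_per_instr * miss_penalty_cycles)


def ns_per_instruction(cpi, clock_ns):
    """Cycles/Instruction times Seconds/Cycle, in nanoseconds."""
    return cpi * clock_ns


# Hypothetical shallow pipeline: 10 ns clock, 1-cycle taken-branch penalty.
shallow_cpi = stall_cpi(0.2, 1, 0.02, 100, 10)
# Hypothetical deeper pipeline: 5 ns clock, 3-cycle taken-branch penalty.
deep_cpi = stall_cpi(0.2, 3, 0.02, 100, 5)

print("shallow:", shallow_cpi, "CPI,", ns_per_instruction(shallow_cpi, 10), "ns/instr")
print("deep:   ", deep_cpi, "CPI,", ns_per_instruction(deep_cpi, 5), "ns/instr")
```

Under these assumed numbers the deeper pipeline has the worse CPI but still finishes each instruction sooner, because the shorter clock period outweighs the added stall cycles. Other workloads could tip the balance the other way.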