Berkeley COMPSCI 61C - CPU Design


inst.eecs.berkeley.edu/~cs61c
UCB CS61C: Machine Structures
Lecture 29: CPU Design: Pipelining to Improve Performance II
2010-04-07
Lecturer SOE Dan Garcia
CS61C L29 CPU Design: Pipelining to Improve Performance II, Garcia, Spring 2010, UCB

Cal researcher Marty Banks has put together a system to help with the eyestrain many viewers experience with 3D content on a small screen: the vergence-accommodation conflict.
www.technologyreview.com/computing/24976

Review

Pipelining is a BIG idea.
Optimal Pipeline:
- Each stage is executing part of an instruction each clock cycle.
- One instruction finishes during each clock cycle.
- On average, execute far more quickly.
What makes this work?
- Similarities between instructions allow us to use same stages for all instructions (generally).
- Each stage takes about the same amount of time as all others: little wasted time.

Problems for Pipelining CPUs

Limits to pipelining: hazards prevent next instruction from executing during its designated clock cycle.
- Structural hazards: HW cannot support some combination of instructions (single person to fold and put clothes away).
- Control hazards: pipelining of branches causes later instruction fetches to wait for the result of the branch.
- Data hazards: instruction depends on result of prior instruction still in the pipeline (missing sock).
These might result in pipeline stalls or "bubbles" in the pipeline.

Structural Hazard #1: Single Memory (1/2)

[Pipeline diagram: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) against time (clock cycles)]
Read same memory twice in same clock cycle.

Structural Hazard #1: Single Memory (2/2)

Solution:
- Infeasible and inefficient to create second memory (we'll learn about this more Friday/next week),
- so simulate this by having two Level 1 Caches (a temporary smaller [of usually most recently used] copy of memory).
- Have both an L1 Instruction Cache and an L1 Data Cache.
- Need more complex hardware to control when both caches miss.

Structural Hazard #2: Registers (1/2)

[Pipeline diagram: instruction order (sw, Instr 1, Instr 2, Instr 3, Instr 4) against time (clock cycles)]
Can we read and write to registers simultaneously?

Structural Hazard #2: Registers (2/2)

Two different solutions have been used:
1) RegFile access is VERY fast: takes less than half the time of ALU stage.
   - Write to Registers during first half of each clock cycle.
   - Read from Registers during second half of each clock cycle.
2) Build RegFile with independent read and write ports.
Result: can perform Read and Write during same clock cycle.

Control Hazard: Branching (1/9)

[Pipeline diagram: instruction order (beq, Instr 1, Instr 2, Instr 3, Instr 4) against time (clock cycles)]
Where do we do the compare for the branch?

Control Hazard: Branching (2/9)

We had put branch decision-making hardware in ALU stage,
- therefore two more instructions after the branch will always be fetched, whether or not the branch is taken.
Desired functionality of a branch:
- If we do not take the branch, don't waste any time and continue executing normally.
- If we take the branch, don't execute any instructions after the branch, just go to the desired label.

Control Hazard: Branching (3/9)

Initial Solution: stall until decision is made.
- Insert "no-op" instructions (those that accomplish nothing, just take time), or hold up the fetch of the next instruction (for 2 cycles).
- Drawback: branches take 3 clock cycles each (assuming comparator is put in ALU stage).

Control Hazard: Branching (4/9)

Optimization #1: insert special branch comparator in Stage 2.
- As soon as instruction is decoded (Opcode identifies it as a branch), immediately make a decision and set the new value of the PC.
- Benefit: since branch is complete in Stage 2, only one unnecessary instruction is fetched, so only one no-op is needed.
- Side Note: this means that branches are idle in Stages 3, 4 and 5.

Control Hazard: Branching (5/9)

[Pipeline diagram: instruction order (beq, Instr 1, Instr 2, Instr 3, Instr 4) against time (clock cycles)]
Branch comparator moved to Decode stage.

Control Hazard: Branching (6/9)

User inserting no-op instruction.
[Pipeline diagram: instruction order (add, beq, nop, lw) against time (clock cycles); the nop occupies every stage as a bubble]
Impact: 2 clock cycles per branch instruction: slow.

Control Hazard: Branching (7/9)

Controller inserting a single bubble.
[Pipeline diagram: instruction order (add, beq, lw) against time (clock cycles); one bubble ahead of the lw]
Impact: 2 clock cycles per branch instruction: slow.

(Aside: story about engineer, physicist, mathematician asked to build a fence around a flock of sheep using minimal fence.)

Control Hazard: Branching (8/9)

Optimization #2: redefine branches.
- Old definition: if we take the branch, none of the instructions after the branch get executed by accident.
- New definition: whether or not we take the branch, the single instruction immediately following the branch gets executed (called the branch-delay slot).
The term "Delayed Branch" means we always execute inst after branch.
This optimization is used with MIPS.

Example: Nondelayed vs. Delayed Branch

Nondelayed Branch: or $8, $9, $10 …
Delayed Branch: add $1, $2, $3 …

Control Hazard: Branching (9/9)

Notes on Branch-Delay Slot:
- Worst-Case Scenario: can always put a no-op in the branch-delay slot.
- Better Case: can …
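
The branch costs quoted in the slides above (3 clock cycles per branch with the comparator in the ALU stage, 2 with it in the Decode stage) can be tallied with a small sketch. This is an illustration, not code from the lecture: the names STAGES, BUBBLES_PER_BRANCH, total_cycles and cycles_per_branch are hypothetical, and the model assumes a 5-stage pipeline with no data hazards and no cache misses. The "delayed_branch_filled" entry further assumes the branch-delay slot always holds a useful instruction; a slot filled with a no-op would cost the same as the one-bubble case.

```python
"""Rough cycle counts for a 5-stage pipeline under three branch policies.

Illustrative sketch only: assumes no data hazards and no cache misses.
"""

STAGES = 5  # fetch, decode/register read, ALU, memory access, register write

# Fetch slots wasted per branch under each policy (names are hypothetical).
BUBBLES_PER_BRANCH = {
    "stall_compare_in_alu": 2,     # branch decided in stage 3: two bubbles
    "stall_compare_in_decode": 1,  # branch decided in stage 2: one bubble
    "delayed_branch_filled": 0,    # delay slot assumed to hold useful work
}


def cycles_per_branch(policy):
    """Issue slots consumed by one branch: the branch itself plus its bubbles."""
    return 1 + BUBBLES_PER_BRANCH[policy]


def total_cycles(n_instructions, n_branches, policy):
    """Cycles to run n_instructions, of which n_branches are branches."""
    if n_branches > n_instructions:
        raise ValueError("cannot have more branches than instructions")
    fill = STAGES - 1  # cycles before the first instruction completes
    bubbles = n_branches * BUBBLES_PER_BRANCH[policy]
    return fill + n_instructions + bubbles


if __name__ == "__main__":
    for policy in BUBBLES_PER_BRANCH:
        print(policy, cycles_per_branch(policy), total_cycles(100, 20, policy))
```

For 100 instructions of which 20 are branches, this model gives 144 cycles when stalling with the comparator in the ALU stage, 124 with the comparator in the Decode stage, and 104 when every delay slot is filled with useful work.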

