Berkeley COMPSCI 61C - CPU Design


inst.eecs.berkeley.edu/~cs61c
UCB CS61C : Machine Structures
Lecture 29 – CPU Design : Pipelining to Improve Performance II
2010-04-07
Lecturer SOE Dan Garcia

Cal researcher Marty Banks has put together a system to help with the eyestrain many viewers experience with 3D content on a small screen: the vergence/accommodation conflict.
www.technologyreview.com/computing/24976

CS61C L29 CPU Design : Pipelining to Improve Performance II, Garcia, Spring 2010 © UCB

Review
- Pipelining is a BIG idea
- Optimal Pipeline
  - Each stage is executing part of an instruction each clock cycle.
  - One instruction finishes during each clock cycle.
  - On average, execute far more quickly.
- What makes this work?
  - Similarities between instructions allow us to use same stages for all instructions (generally).
  - Each stage takes about the same amount of time as all others: little wasted time.

Problems for Pipelining CPUs
- Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support some combination of instructions (single person to fold and put clothes away)
  - Control hazards: Pipelining of branches causes later instruction fetches to wait for the result of the branch
  - Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
- These might result in pipeline stalls or “bubbles” in the pipeline.

Structural Hazard #1: Single Memory (1/2)
[Figure: pipeline diagram, instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) against time in clock cycles; each instruction passes through I$, Reg, ALU, D$, Reg. Annotation: Read same memory twice in same clock cycle.]

Structural Hazard #1: Single Memory (2/2)
- Solution:
  - infeasible and inefficient to create second memory (we’ll learn about this more Friday/next week)
  - …so simulate this by having two Level 1 Caches (a temporary smaller [of usually most recently used] copy of memory)
  - have both an L1 Instruction Cache and an L1 Data Cache
  - need more complex hardware to control when both caches miss

Structural Hazard #2: Registers (1/2)
Can we read and write to registers simultaneously?
[Figure: pipeline diagram, instruction order (sw, Instr 1, Instr 2, Instr 3, Instr 4) against time in clock cycles; each instruction passes through I$, Reg, ALU, D$, Reg.]

Structural Hazard #2: Registers (2/2)
- Two different solutions have been used:
  1) RegFile access is VERY fast: takes less than half the time of ALU stage
     - Write to Registers during first half of each clock cycle
     - Read from Registers during second half of each clock cycle
  2) Build RegFile with independent read and write ports
- Result: can perform Read and Write during same clock cycle

Control Hazard: Branching (1/9)
Where do we do the compare for the branch?
[Figure: pipeline diagram, instruction order (beq, Instr 1, Instr 2, Instr 3, Instr 4) against time in clock cycles; each instruction passes through I$, Reg, ALU, D$, Reg.]
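
Looking back at Structural Hazard #2, the first solution (write in the first half of the clock cycle, read in the second half) can be illustrated with a toy software model. This is only a sketch, not a hardware description: the class name `RegFile`, the method `clock_cycle`, and the 32-register size are assumptions made for illustration and do not come from the lecture.

```python
class RegFile:
    """Toy register file: one write, then any number of reads, per clock cycle."""

    def __init__(self, n_regs=32):
        # n_regs=32 is an illustrative assumption.
        self.regs = [0] * n_regs

    def clock_cycle(self, write=None, reads=()):
        """Model one clock cycle.

        write: optional (register_index, value) pair from the write-back stage.
        reads: register indices requested by the decode stage.
        """
        # First half of the cycle: perform the write.
        if write is not None:
            dest, value = write
            self.regs[dest] = value
        # Second half of the cycle: perform the reads, which therefore
        # observe the value written earlier in this same cycle.
        return tuple(self.regs[r] for r in reads)


rf = RegFile()
# An older instruction writes register 8 while a younger one reads 8 and 9.
values = rf.clock_cycle(write=(8, 42), reads=(8, 9))
print(values)  # (42, 0)
```

Because the write is ordered before the reads within the cycle, the two instructions can share the register file in the same clock cycle without a stall.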

Control Hazard: Branching (2/9)
- We had put branch decision-making hardware in ALU stage
  - therefore two more instructions after the branch will always be fetched, whether or not the branch is taken
- Desired functionality of a branch
  - if we do not take the branch, don’t waste any time and continue executing normally
  - if we take the branch, don’t execute any instructions after the branch, just go to the desired label

Control Hazard: Branching (3/9)
- Initial Solution: Stall until decision is made
  - insert “no-op” instructions (those that accomplish nothing, just take time) or hold up the fetch of the next instruction (for 2 cycles).
  - Drawback: branches take 3 clock cycles each (assuming comparator is put in ALU stage)

Control Hazard: Branching (4/9)
- Optimization #1:
  - insert special branch comparator in Stage 2
  - as soon as instruction is decoded (Opcode identifies it as a branch), immediately make a decision and set the new value of the PC
  - Benefit: since branch is complete in Stage 2, only one unnecessary instruction is fetched, so only one no-op is needed
  - Side Note: This means that branches are idle in Stages 3, 4 and 5.

Control Hazard: Branching (5/9)
Branch comparator moved to Decode stage.
[Figure: pipeline diagram, instruction order (beq, Instr 1, Instr 2, Instr 3, Instr 4) against time in clock cycles; each instruction passes through I$, Reg, ALU, D$, Reg.]

Control Hazard: Branching (6/9)
- User inserting no-op instruction
[Figure: pipeline diagram, instruction order (add, beq, nop, lw) against time in clock cycles; the label “bubble” marks the wasted stages.]
- Impact: 2 clock cycles per branch instruction ⇒ slow

Control Hazard: Branching (7/9)
- Controller inserting a single bubble
[Figure: pipeline diagram, instruction order (add, beq, lw) against time in clock cycles, with one bubble after the beq.]
- Impact: 2 clock cycles per branch instruction ⇒ slow

…story about engineer, physicist, mathematician asked to build a fence around a flock of sheep using minimal fence…
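
The branch costs quoted in the slides (3 clock cycles per branch with the comparator in the ALU stage, 2 with it in Stage 2) can be turned into a rough cycle-count estimate. The following is a minimal sketch, assuming an otherwise ideal 5-stage pipeline in which every branch stalls and no other hazards occur. The function names and the example mix of 100 instructions with 20 branches are illustrative assumptions, not figures from the lecture.

```python
# Bubbles inserted after each branch, taken from the slides:
# comparator in the ALU stage -> 2 bubbles, comparator in Decode -> 1 bubble.
BUBBLES_PER_BRANCH = {"ALU": 2, "Decode": 1}


def cycles_per_branch(compare_stage):
    """Issue slots consumed by one branch: the branch itself plus its bubbles."""
    return 1 + BUBBLES_PER_BRANCH[compare_stage]


def total_cycles(n_instructions, n_branches, compare_stage, n_stages=5):
    """Cycles to run a program on an otherwise ideal pipeline.

    Assumes one instruction completes per cycle once the pipeline is full,
    and that every branch stalls for its full bubble count.
    """
    fill = n_stages - 1  # cycles before the first instruction completes
    stalls = n_branches * BUBBLES_PER_BRANCH[compare_stage]
    return fill + n_instructions + stalls


# Hypothetical workload: 100 instructions, 20 of them branches.
print(cycles_per_branch("ALU"), total_cycles(100, 20, "ALU"))        # 3 144
print(cycles_per_branch("Decode"), total_cycles(100, 20, "Decode"))  # 2 124
```

Under these assumptions, moving the comparator to the Decode stage removes one stall cycle per branch, which is the benefit described for Optimization #1.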

