CS61C: Machine Structures
Lecture 21: Introduction to Pipelined Execution
November 15, 2000
David Patterson
http://www-inst.eecs.berkeley.edu/~cs61c/

Review (1/3)
- Datapath is the hardware that performs operations necessary to execute programs.
- Control instructs datapath on what to do next.
- Datapath needs:
  - access to storage (general purpose registers and memory)
  - computational ability (ALU)
  - helper hardware (local registers and PC)

Review (2/3)
- Five stages of datapath (executing an instruction):
  1. Instruction Fetch (Increment PC)
  2. Instruction Decode (Read Registers)
  3. ALU (Computation)
  4. Memory Access
  5. Write to Registers
- ALL instructions must go through ALL five stages.
- Datapath designed in hardware.

Review Datapath
[Figure: datapath with PC, instruction memory, registers (rs, rt, rd), imm, ALU and data memory, labeled with the stages 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back]

Outline
- Pipelining Analogy
- Pipelining Instruction Execution
- Hazards
- Advanced Pipelining Concepts by Analogy

Gotta Do Laundry
- Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
- Washer takes 30 minutes
- Dryer takes 30 minutes
- "Folder" takes 30 minutes
- "Stasher" takes 30 minutes to put clothes into drawers

Sequential Laundry
[Figure: timeline from 6 PM to 2 AM; tasks A, B, C, D in task order, each made of four 30-minute steps, run one after another]
- Sequential laundry takes 8 hours for 4 loads

Pipelined Laundry
[Figure: timeline from 6 PM to 2 AM; tasks A, B, C, D in task order, each starting 30 minutes after the previous one]
- Pipelined laundry takes 3.5 hours for 4 loads

General Definitions
- Latency: time to completely execute a certain task
  - for example, time to read a sector from disk is disk access time or disk latency
- Throughput: amount of work that can be done over a period of time

Pipelining Lessons (1/2)
[Figure: pipelined laundry timeline, 6 PM to 9 PM, tasks A, B, C, D in 30-minute steps]
- Pipelining doesn't help latency of single task, it helps throughput of entire workload
- Multiple tasks operating
  simultaneously using different resources
- Potential speedup = Number pipe stages
- Time to "fill" pipeline and time to "drain" it reduce speedup: 2.3X v. 4X in this example

Pipelining Lessons (2/2)
[Figure: pipelined laundry timeline, 6 PM to 9 PM, tasks A, B, C, D in 30-minute steps]
- Suppose new Washer takes 20 minutes, new Stasher takes 20 minutes. How much faster is pipeline?
- Pipeline rate limited by slowest pipeline stage
- Unbalanced lengths of pipe stages also reduce speedup

Steps in Executing MIPS
1) IFetch: Fetch Instruction, Increment PC
2) Decode Instruction, Read Registers
3) Execute:
   Mem-ref: Calculate Address
   Arith-log: Perform Operation
4) Memory:
   Load: Read Data from Memory
   Store: Write Data to Memory
5) Write Back: Write Data to Register

Pipelined Execution Representation
[Figure: six instructions, each shown as IFtch Dcd Exec Mem WB, each starting one step after the previous one; horizontal axis is Time]
- Every instruction must take same number of steps, also called pipeline "stages", so some will go idle sometimes

Review: Datapath for MIPS
[Figure: datapath with PC, instruction memory, registers (rs, rt, rd), imm, ALU and data memory, divided into Stage 1 (Instruction Fetch), Stage 2 (Decode/Register Read), Stage 3 (Execute), Stage 4 (Memory), Stage 5 (Write Back)]
- Use datapath figure to represent pipeline: IFtch Dcd Exec Mem WB drawn as I, Reg, ALU, D, Reg

Graphical Pipeline Representation
(In Reg, right half highlight read, left half write)
[Figure: instructions in order Load, Add, Store, Sub, Or, each drawn as I, Reg, ALU, D, Reg and offset by one clock cycle; horizontal axis is Time (clock cycles)]

Example
- Suppose 2 ns for memory access, 2 ns for ALU operation, and 1 ns for register file read or write
- Nonpipelined Execution:
  - lw: IF + Read Reg + ALU + Memory + Write Reg = 2 + 1 + 2 + 2 + 1 = 8 ns
  - add: IF + Read Reg + ALU + Write Reg = 2 + 1 + 2 + 1 = 6 ns
- Pipelined Execution:
  - Max(IF, Read Reg, ALU, Memory, Write Reg) = 2 ns

Pipeline Hazard: Matching socks in later load
[Figure: laundry timeline from 6 PM to 2 AM; tasks A to F in 30-minute steps, with a bubble in the pipeline]
- A depends on D;
  stall since folder tied up

Administrivia
- Rest of 61C: slower pace
  - 1 project, 1 lab, no more homeworks
- F 11/17: Performance; Cache Sim Project
- W 11/24: X86, PC buzzwords and 61C
- W 11/29: Review: Pipelines; RAID Lab
- F 12/1: Review: Caches/TLB/VM; Section 7.5
- M 12/4: Deadline to correct your grade record
- W 12/6: Review: Interrupts (A.7); Feedback lab
- F 12/8: 61C Summary / Your Cal heritage / HKN Course Evaluation
- Sun 12/10: Final Review, 2PM (155 Dwinelle)
- Tues 12/12: Final (5PM, 1 Pimentel)

Problems for Computers
- Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
  - Control hazards: pipelining of branches and other instructions; stall the pipeline until the hazard clears ("bubbles" in the pipeline)
  - Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)

Structural Hazard 1: Single Memory (1/2)
[Figure: instructions in order Load, Instr 1, Instr 2, Instr 3, Instr 4, each drawn as I, Reg, ALU, D, Reg and offset by one clock cycle; horizontal axis is Time (clock cycles)]
- Read same memory twice in same clock cycle

Structural Hazard 1: Single Memory (2/2)
- Solution:
  - infeasible and inefficient to create second memory
  - so simulate this by having two Level 1 Caches
  - have both an L1 Instruction Cache and an L1 Data Cache
  - need more complex hardware to control when both caches miss

Structural Hazard 2: Registers (1/2)
[Figure: instructions in order Load, Instr 1, Instr 2, Instr 3, Instr 4, each drawn as I, Reg, ALU, D, Reg and offset by one clock cycle; horizontal axis is Time (clock cycles)]
- Can't read and write to registers simultaneously

Structural Hazard 2: Registers (2/2)
- Fact: Register access is VERY fast: takes less than half the time of ALU stage
- Solution: introduce convention
  - always Write to Registers during first half of each clock cycle
  - always Read from Registers during second half of each
    clock cycle
- Result: can perform Read and Write during same clock cycle …
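
Worked check of the numbers quoted earlier in the lecture (8 hours v. 3.5 hours for laundry, 2.3X v. 4X speedup, 8 ns lw v. 2 ns pipelined cycle). This is only a sketch, assuming an idealized pipeline with no hazards or stalls, in which every stage is clocked at the time of the slowest stage. The function names `sequential_time` and `pipelined_time` are illustrative and do not come from the slides.

```python
def sequential_time(stage_times, n_tasks):
    """Total time when each task finishes every stage before the next starts."""
    return sum(stage_times) * n_tasks


def pipelined_time(stage_times, n_tasks):
    """Total time for an ideal pipeline clocked at its slowest stage.

    Assumption: no hazards. The first task needs len(stage_times) cycles
    to fill the pipeline; each later task finishes one cycle after the
    previous one.
    """
    cycle = max(stage_times)
    return cycle * (len(stage_times) + n_tasks - 1)


# Laundry analogy: wash, dry, fold, stash at 30 minutes each, 4 loads.
laundry = [30, 30, 30, 30]
seq = sequential_time(laundry, 4)    # 480 minutes = 8 hours
pipe = pipelined_time(laundry, 4)    # 210 minutes = 3.5 hours
speedup = seq / pipe                 # about 2.3, below the 4-stage ideal of 4

# MIPS example: IF, Read Reg, ALU, Memory, Write Reg times in ns.
lw_stages = [2, 1, 2, 2, 1]
lw_latency = sum(lw_stages)          # 8 ns for lw when not pipelined
clock = max(lw_stages)               # 2 ns pipelined clock cycle

print(seq / 60, pipe / 60, round(speedup, 1), lw_latency, clock)
```

As the number of tasks grows, the fill and drain cycles matter less and the speedup of this model approaches the number of stages, provided the stages are balanced.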