Administrivia
Homework problems for Unit 1 due Thursday

CMSC 411 Computer Systems Architecture
Lecture 4: Basic Pipelining
Alan Sussman (als@cs.umd.edu)
CMSC 411 - 4 (from Patterson)

5 Steps of MIPS Datapath
[Figure A.17, page A-29: the five steps Instruction Fetch, Instr. Decode/Reg. Fetch, Execute/Addr. Calc, Memory Access, and Write Back, with Next PC, Next SEQ PC, register file, ALU, sign extend, instruction and data memories, and MUXes]

5 Steps of MIPS Datapath (pipelined)
[Figure A.18, page A-31: the same datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]
- Instruction Fetch: IR ← mem[PC]; PC ← PC + 4
- Instr. Decode/Reg. Fetch: A ← Reg[IR_rs]; B ← Reg[IR_rt]
- Execute/Addr. Calc: rslt ← A op_IRop B
- Memory Access: WB ← rslt
- Write Back: Reg[IR_rd] ← WB

Inst. Set Processor Controller
Ifetch: IR ← mem[PC]; PC ← PC + 4
opFetch-DCD: A ← Reg[IR_rs]; B ← Reg[IR_rt]
Then, by instruction class:
- jmp: PC ← IR_jaddr
- br: if bop(A,b) then PC ← PC + IR_im
- RR: r ← A op_IRop B; WB ← r; Reg[IR_rd] ← WB
- RI: r ← A op_IRop IR_im; WB ← r; Reg[IR_rd] ← WB
- LD: r ← A + IR_im; WB ← Mem[r]; Reg[IR_rd] ← WB

Visualizing Pipelining
[Figure A.2, page A-8: instructions in program order against time in clock cycles 1 through 7; each instruction passes through Ifetch, Reg, ALU, DMem, Reg, and a new instruction starts every cycle]

Pipelining is not quite that easy!
Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle.
- Structural hazards: the hardware cannot support this combination of instructions (a single person to fold and put clothes away)
- Data hazards: an instruction depends on the result of a prior instruction still in the pipeline (missing sock)
- Control hazards: caused by the delay between fetching instructions and deciding on changes in control flow (branches and jumps)

One Memory Port/Structural Hazards
[Figure A.4, page A-14: Load, Instr 1, Instr 2, Instr 3, Instr 4 over clock cycles 1 through 7; with one memory port, the Load's DMem access and Instr 3's Ifetch both need memory in cycle 4]
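The timing in Figures A.2 and A.4 can be reproduced with a short Python sketch. It assumes an ideal pipeline with no stalls (instruction i enters IF in cycle i + 1) and models memory as a single port shared by instruction fetch and data access. The function names and the `uses_data_memory` flags are illustrative, not taken from the slides.

```python
# Stage names for the classic 5-stage MIPS pipeline; the slides label
# the same stages Ifetch, Reg, ALU, DMem, Reg.
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def pipeline_schedule(num_instrs):
    """Map each instruction index to {cycle: stage} for an ideal pipeline
    with no stalls: instruction i enters IF in cycle i + 1."""
    return {
        i: {i + 1 + s: stage for s, stage in enumerate(STAGES)}
        for i in range(num_instrs)
    }


def memory_port_conflicts(schedule, uses_data_memory):
    """Return the cycles in which a single shared memory port would be
    needed twice: some instruction is in IF (instruction fetch) while a
    different instruction that accesses data memory is in MEM."""
    last = max(c for stages in schedule.values() for c in stages)
    conflicts = []
    for cycle in range(1, last + 1):
        fetching = any(st.get(cycle) == "IF" for st in schedule.values())
        accessing = any(
            st.get(cycle) == "MEM" and uses_data_memory[i]
            for i, st in schedule.items()
        )
        if fetching and accessing:
            conflicts.append(cycle)
    return conflicts


def print_diagram(schedule, names):
    """Print one row per instruction and one column per clock cycle."""
    last = max(c for stages in schedule.values() for c in stages)
    print(" " * 8 + "".join(f"C{c:<4}" for c in range(1, last + 1)))
    for i, name in enumerate(names):
        cells = "".join(f"{schedule[i].get(c, ''):<5}" for c in range(1, last + 1))
        print(f"{name:<8}{cells}")


if __name__ == "__main__":
    names = ["Load", "Instr 1", "Instr 2", "Instr 3", "Instr 4"]
    # Hypothetical flags: only the load touches data memory in this sequence.
    uses_data_memory = [True, False, False, False, False]
    sched = pipeline_schedule(len(names))
    print_diagram(sched, names)
    print("single-port conflicts in cycles:", memory_port_conflicts(sched, uses_data_memory))
```

With this sequence the sketch reports cycle 4, where the load's data access and Instr 3's fetch collide. A dual-ported (Harvard) memory removes the conflict, and so does delaying Instr 3 by one cycle.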
One Memory Port/Structural Hazards
[Similar to Figure A.5, page A-15: Load, Instr 1, Instr 2, a stall (bubble), then Instr 3, over clock cycles 1 through 7]
How do you "bubble" the pipe?

Speed Up Equation for Pipelining
$$\mathrm{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$
$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$
For a simple RISC pipeline, Ideal CPI = 1:
$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

Example: Dual-port vs. Single-port
- Machine A: dual-ported memory ("Harvard Architecture")
- Machine B: single-ported memory, but its pipelined implementation has a 1.05 times faster clock rate
- Ideal CPI = 1 for both
- Loads are 40% of instructions executed
Each load costs Machine B one stall cycle on its single memory port; Machine A's pipelined clock is taken as equal to the unpipelined one.
$$\text{SpeedUp}_A = \frac{\text{Pipeline depth}}{1 + 0} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{pipe}}} = \text{Pipeline depth}$$
$$\text{SpeedUp}_B = \frac{\text{Pipeline depth}}{1 + 0.4 \times 1} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{unpipe}} / 1.05} = \frac{\text{Pipeline depth}}{1.4} \times 1.05 = 0.75 \times \text{Pipeline depth}$$
$$\frac{\text{SpeedUp}_A}{\text{SpeedUp}_B} = \frac{\text{Pipeline depth}}{0.75 \times \text{Pipeline depth}} = 1.33$$
Machine A is 1.33 times faster.
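As a sketch, the speedup equation and the Machine A/B comparison can be evaluated directly in Python. Here `cycle_time_ratio` stands for Cycle Time_unpipelined / Cycle Time_pipelined. Following the example, Machine A's ratio is taken as 1 and Machine B's as 1.05, and each load is assumed to cost Machine B exactly one stall cycle. The function names are hypothetical.

```python
def pipeline_speedup(depth, stall_cpi, ideal_cpi=1.0, cycle_time_ratio=1.0):
    """Speedup of a pipelined machine over its unpipelined version:
    (ideal CPI * depth) / (ideal CPI + stall CPI) * cycle_time_ratio,
    where cycle_time_ratio = Cycle Time_unpipelined / Cycle Time_pipelined."""
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi) * cycle_time_ratio


def dual_vs_single_port(depth, load_fraction=0.4, stalls_per_load=1, clock_gain=1.05):
    """Speedups of Machine A (dual-ported memory, no structural stalls) and
    Machine B (single-ported memory, one stall per load, faster clock)."""
    speedup_a = pipeline_speedup(depth, stall_cpi=0.0)
    speedup_b = pipeline_speedup(
        depth,
        stall_cpi=load_fraction * stalls_per_load,
        cycle_time_ratio=clock_gain,
    )
    return speedup_a, speedup_b


if __name__ == "__main__":
    for depth in (5, 8):
        a, b = dual_vs_single_port(depth)
        print(f"depth={depth}: A={a:.2f}  B={b:.2f}  A/B={a / b:.2f}")
```

Pipeline depth appears in both numerators, so it cancels in the ratio, and A/B comes out to about 1.33 for any depth.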
Data Hazard on R1
[Figure A.6, page A-16: stages IF, ID/RF, EX, MEM, WB over time (clock cycles)]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11

Three Generic Data Hazards
Read After Write (RAW): Instr_J tries to read an operand before Instr_I writes it.
  I: add r1,r2,r3
  J: sub r4,r1,r3
Caused by a "dependence" (in compiler nomenclature). This hazard results from an actual need for communication.

Write After Read (WAR): Instr_J writes an operand before Instr_I reads it.
  I: sub r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
Called an "anti-dependence" by compiler writers. This results from reuse of the name "r1".
Can't happen in the MIPS 5-stage pipeline because:
- all instructions take 5 stages, and
- reads are always in stage 2, and
- writes are always in stage 5.

Write After Write (WAW): Instr_J writes an operand before Instr_I writes it.
  I: sub r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7
Called an "output dependence" by compiler writers. This also results from the reuse of the name "r1".
Can't happen in the MIPS 5-stage pipeline because:
- all instructions take 5 stages, and
- writes are always in stage 5.
We will see WAR and WAW hazards in more complicated pipes.

Forwarding to Avoid Data Hazard
[Figure A.7, page A-18: the same sequence add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11, showing r1 forwarded to the dependent instructions]

HW Change for Forwarding
[Figure A.23, page A-37: NextPC, MUXes, ID/EX, EX/MEM, MEM/WR pipeline registers, ALU …]
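A minimal Python sketch of the three hazard classes, classifying a pair of instructions purely by register names. The `needs_forwarding` check rests on an assumption not stated on these slides: the classic 5-stage timing, with the producer writing the register file in WB (its cycle 5), a consumer issued `distance` instructions later reading in ID (cycle 2 + distance), and the register file written in the first half of a cycle and read in the second half. `Instr`, `parse`, and the other names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """A register-register instruction: one destination, some sources."""
    text: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def parse(text):
    """Split 'op rd,rs,rt' into an Instr (hypothetical helper)."""
    op, operands = text.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return Instr(text, regs[0], tuple(regs[1:]))


def data_hazards(i, j):
    """Classify the hazards between instruction I and a later instruction J."""
    kinds = []
    if i.dest is not None and i.dest in j.srcs:
        kinds.append("RAW")  # true dependence: J reads what I writes
    if j.dest is not None and j.dest in i.srcs:
        kinds.append("WAR")  # anti-dependence: J overwrites a name I reads
    if i.dest is not None and i.dest == j.dest:
        kinds.append("WAW")  # output dependence: both write the same name
    return kinds


def needs_forwarding(distance):
    """True if, without forwarding, a RAW consumer `distance` instructions
    after the producer would read the register before it is written.
    Assumes a split-cycle register file, so a read in the write cycle is OK."""
    read_cycle = 2 + distance  # consumer's ID stage
    write_cycle = 5            # producer's WB stage
    return read_cycle < write_cycle


if __name__ == "__main__":
    print(data_hazards(parse("add r1,r2,r3"), parse("sub r4,r1,r3")))  # RAW example
    print(data_hazards(parse("sub r4,r1,r3"), parse("add r1,r2,r3")))  # WAR example
    print(data_hazards(parse("sub r1,r4,r3"), parse("add r1,r2,r3")))  # WAW example

    seq = ["add r1,r2,r3", "sub r4,r1,r3", "and r6,r1,r7",
           "or r8,r1,r9", "xor r10,r1,r11"]
    producer = parse(seq[0])
    for d, text in enumerate(seq[1:], start=1):
        if "RAW" in data_hazards(producer, parse(text)):
            print(f"{text}: distance {d}, forwarding needed: {needs_forwarding(d)}")
```

Under these assumptions only `sub` and `and` need r1 forwarded, while `or` and `xor` read it after the write. The name-based check also flags WAR and WAW pairs that, as the slides note, cannot actually occur as hazards in the 5-stage pipeline; only the RAW case is modeled in time here.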