CPE 631 Review: Pipelining
Electrical and Computer Engineering
University of Alabama in Huntsville
Aleksandar Milenkovic
milenka@ece.uah.edu
http://www.ece.uah.edu/~milenka
LaCASA

Outline
- Pipelined Execution
- 5 Steps in MIPS Datapath
- Pipeline Hazards
  - Structural
  - Data
  - Control

Laundry Example (by David Patterson)
- Four loads of clothes: A, B, C, D
- Task: each one to wash, dry, and fold
- Resources
  - Washer takes 30 minutes
  - Dryer takes 40 minutes
  - "Folder" takes 20 minutes

Sequential Laundry
[Figure: time axis from 6 PM to midnight; loads A, B, C, D in task order, each taking 30, 40, and 20 minutes back to back.]
- Sequential laundry takes 6 hours for 4 loads
- If they learned pipelining, how long would laundry take?

Pipelined Laundry
[Figure: time axis from 6 PM; loads A, B, C, D in task order, overlapped, with stage times 30, 40, 40, 40, 40, 20 minutes.]
- Pipelined laundry takes 3.5 hours for 4 loads

Pipelining Lessons
- Pipelining doesn't help latency of single task, it helps throughput of entire workload
- Pipeline rate is limited by slowest pipeline stage
- Multiple tasks operating simultaneously
- Potential speedup = Number pipe stages
- Unbalanced lengths of pipe stages reduces speedup
- Time to "fill" pipeline and time to "drain" reduce speedup

Computer Pipelines
- Execute billions of instructions, so throughput is what matters
- What is desirable in instruction sets for pipelining?
  - Variable length instructions vs. all instructions same length?
  - Memory operands part of any operation vs. memory operands only in loads or stores?
  - Register operand many places in instruction format vs. registers located in same place?

A Typical RISC
- Registers
  - 32 64-bit general purpose integer registers (R0-R31)
  - 32 64-bit floating point registers (F0-F31)
- Data types
  - 8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double words for integer data
  - 32-bit single or 64-bit double precision numbers
- Addressing Modes for MIPS Data Transfers
  - Load-store architecture
  - Immediate, Displacement
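The laundry timings quoted earlier in this section (6 hours sequential, 3.5 hours pipelined for four loads) can be reproduced with a short calculation. The following is a minimal sketch in Python and is not part of the original slides: the function names `sequential_time` and `pipelined_time` are illustrative, and the model assumes that each resource (washer, dryer, folder) handles one load at a time and that a finished load may wait until the next resource is free.

```python
def sequential_time(stage_minutes, loads):
    """Total minutes when each load finishes all stages before the next starts."""
    return loads * sum(stage_minutes)


def pipelined_time(stage_minutes, loads):
    """Total minutes when loads overlap, one load per stage at a time."""
    # stage_free[s] = time at which stage s finishes its most recent load
    stage_free = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free[s])  # wait for the load and the stage
            ready = start + minutes
            stage_free[s] = ready
    return stage_free[-1]


stages = [30, 40, 20]  # washer, dryer, folder (minutes, from the slides)
seq = sequential_time(stages, 4)
pipe = pipelined_time(stages, 4)
print(seq / 60, pipe / 60, seq / pipe)
```

With the slide's numbers this gives 360 and 210 minutes, a speedup of about 1.7 rather than 3: the 40-minute dryer sets the pipeline rate, and the fill and drain times add to the total, which is what the Pipelining Lessons slide states.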
- Memory is byte addressable with a 64-bit address
- Mode bit to select Big Endian or Little Endian

MIPS64 Instruction Formats
Register-Register
  31-26: Op | 25-21: Rs | 20-16: Rt | 15-11: Rd | 10-6: shamt | 5-0: funct
Register-Immediate
  31-26: Op | 25-21: Rs | 20-16: Rt | 15-0: immediate
Jump / Call
  31-26: Op | 25-0: address
Floating point (FR)
  31-26: Op | 25-21: Fmt | 20-16: Ft | 15-11: Fs | 10-6: Fd | 5-0: funct
Floating point (FI)
  31-26: Op | 25-21: Fmt | 20-16: Ft | 15-0: immediate

MIPS64 Instructions
MIPS Operations (see Appendix B, Figure B.26)
- Data Transfers: LB, LBU, SB, LH, LHU, SH, LW, LWU, SW, LD, SD, L.S, L.D, S.S, S.D, MFC0, MTC0, MOV.S, MOV.D, MFC1, MTC1
- Arithmetic/Logical: DADD, DADDI, DADDU, DADDIU, DSUB, DSUBU, DMUL, DMULU, DDIV, DDIVU, MADD, AND, ANDI, OR, ORI, XOR, XORI, LUI, DSLL, DSRL, DSRA, DSLLV, DSRLV, DSRAV, SLT, SLTI, SLTU, SLTIU
- Control: BEQZ, BNEZ, BEQ, BNE, BC1T, BC1F, MOVN, MOVZ, J, JR, JAL, JALR, TRAP, ERET
- Floating Point: ADD.D, ADD.S, ADD.PS, SUB.D, SUB.S, SUB.PS, MUL.D, MUL.S, MUL.PS, MADD.D, MADD.S, MADD.PS, DIV.D, DIV.S, DIV.PS, CVT._._, C._.D, C._.S

5 Steps of Simple RISC Datapath
[Figure: datapath in five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled components: Next PC, Adder (+4), Next SEQ PC, Inst Memory (Address), Reg File (RS1, RS2, RD), Sign Extend (Imm), MUXes, ALU (Zero?), Data Memory, LMD, WB Data.]

5 Steps of Simple RISC Datapath (cont'd)
[Figure: the same datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between the steps.]
- Data stationary control
  - local decode for each instruction phase / pipeline stage

Visualizing Pipeline
[Figure: time (clock cycles) CC 1 to CC 7 across, instructions in order down; each instruction passes through IM, Reg, ALU, DM, Reg.]

Instruction Flow through Pipeline
[Figure: pipeline contents per clock cycle (CC 1 to CC 4) as Add R1,R2,R3 and Lw R4,0(R2) enter; stages not yet filled hold Nop.]

Simple RISC Pipeline Definition (IF, ID)
Stage IF
  IF/ID.IR <- Mem[PC];
  if EX/MEM.cond {IF/ID.NPC, PC <- EX/MEM.ALUOUT}
  else {IF/ID.NPC, PC <- PC + 4};
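The stage IF register transfers can be read as a small function of the current PC and the branch outcome held in EX/MEM. The following is a minimal sketch in Python, not taken from the slides: the function name `if_stage`, the dictionary used to model the IF/ID register, and the instruction memory contents are all illustrative assumptions.

```python
def if_stage(pc, instr_mem, ex_mem_cond, ex_mem_aluout):
    """Model one clock cycle of stage IF.

    Returns (if_id, next_pc), where if_id stands in for the IF/ID register.
    """
    ir = instr_mem[pc]           # IF/ID.IR <- Mem[PC]
    if ex_mem_cond:              # a taken branch is sitting in EX/MEM
        npc = ex_mem_aluout      # IF/ID.NPC, PC <- EX/MEM.ALUOUT
    else:
        npc = pc + 4             # IF/ID.NPC, PC <- PC + 4
    return {"IR": ir, "NPC": npc}, npc


# Hypothetical instruction memory: byte address -> instruction text.
instr_mem = {0: "Lw R4,0(R2)", 4: "Add R1,R2,R3", 8: "Sub R6,R5,R7"}

# Sequential fetch: no taken branch in EX/MEM.
if_id, pc = if_stage(0, instr_mem, ex_mem_cond=False, ex_mem_aluout=0)
print(if_id, pc)

# Fetch while a taken branch with target address 8 is in EX/MEM.
if_id, pc = if_stage(pc, instr_mem, ex_mem_cond=True, ex_mem_aluout=8)
print(if_id, pc)
```

Note that the instruction at the current PC is still fetched in the cycle in which the branch redirects the PC; only the next fetch comes from the branch target.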
[Figure labels, Instruction Flow through Pipeline (cont'd): Sub R6,R5,R7 and Xor R9,R8,R1 follow Add R1,R2,R3 and Lw R4,0(R2) in instruction order over time (clock cycles); each instruction passes through IM, Reg, ALU, DM, Reg, with Nop in stages not yet filled.]

Simple RISC Pipeline Definition (cont'd)
Stage ID
  ID/EX.A <- Regs[IF/ID.IR6..10]; ID/EX.B <- Regs[IF/ID.IR11..15];
  ID/EX.Imm <- (IF/ID.IR16)^16 ## IF/ID.IR16..31;
  ID/EX.NPC <- IF/ID.NPC; ID/EX.IR <- IF/ID.IR;
Stage EX
  ALU
    EX/MEM.IR <- ID/EX.IR;
    EX/MEM.ALUOUT <- ID/EX.A func ID/EX.B; or
    EX/MEM.ALUOUT <- ID/EX.A func ID/EX.Imm;
    EX/MEM.cond <- 0;
  load/store
    EX/MEM.IR <- ID/EX.IR; EX/MEM.B <- ID/EX.B;
    EX/MEM.ALUOUT <- ID/EX.A + ID/EX.Imm;
    EX/MEM.cond <- 0;
  branch
    EX/MEM.ALUOUT <- ID/EX.NPC + (ID/EX.Imm << 2);
    EX/MEM.cond <- (ID/EX.A func 0);

Simple RISC Pipeline Def. (MEM, WB)
Stage MEM
  ALU
    MEM/WB.IR <- EX/MEM.IR;
    MEM/WB.ALUOUT <- EX/MEM.ALUOUT;
  load/store
    MEM/WB.IR <- EX/MEM.IR;
    MEM/WB.LMD <- Mem[EX/MEM.ALUOUT]; or
    Mem[EX/MEM.ALUOUT] <- EX/MEM.B;
Stage WB
  ALU
    Regs[MEM/WB.IR16..20] <- MEM/WB.ALUOUT; or
    Regs[MEM/WB.IR11..15] <- MEM/WB.ALUOUT;
  load
    Regs[MEM/WB.IR11..15] <- MEM/WB.LMD;

Pipeline Hazards
- Structural hazards: HW cannot support this combination of instructions
- Data hazards: Instruction depends on result of prior instruction still in the pipeline
- Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

One Memory Port / Structural Hazards
[Figure: time (clock cycles) across, instruction order down; each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]

One Memory Port / Structural Hazards (cont'd)
[Figure: the same sequence with a Stall before Instr 3; the Bubble moves down the pipeline in place of the delayed instruction.]
…
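The one-memory-port structural hazard can be illustrated by tracking in which clock cycle each instruction is fetched. The following is a minimal sketch in Python, not taken from the slides. Because the figure is truncated in this copy, the instruction mix (a load followed by four instructions that do not access data memory) is an assumption, as are the function name `fetch_cycles`, the five-stage layout with the memory access three cycles after fetch, the rule that a data access wins the port over an instruction fetch, and the absence of any other hazards.

```python
def fetch_cycles(is_mem_op):
    """Return the clock cycle in which each instruction is fetched.

    is_mem_op[i] is True if instruction i is a load or store. A single
    memory port is shared by instruction fetch and data access, so a fetch
    must stall in any cycle where an earlier load/store is in its MEM stage.
    """
    port_busy = set()  # cycles in which a data access occupies the port
    cycles = []
    cycle = 1
    for mem_op in is_mem_op:
        while cycle in port_busy:
            cycle += 1  # stall: a bubble enters the pipeline instead
        cycles.append(cycle)
        if mem_op:
            port_busy.add(cycle + 3)  # IF, ID, EX, then MEM three cycles later
        cycle += 1
    return cycles


# Assumed sequence: Load, Instr 1, Instr 2, Instr 3, Instr 4
print(fetch_cycles([True, False, False, False, False]))
```

Under these assumptions the load uses the memory port in cycle 4, so the fourth instruction after it in program order (Instr 3) is fetched in cycle 5 instead of cycle 4, which corresponds to the Stall and Bubble labels in the figure.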