ISU CPRE 381 - Pipelining - D309597


School name Iowa State University

Course Cpre 381- Cptr Org &Asmb Prog

Pages 44

This preview shows page 1-2-3-21-22-23-42-43-44 out of 44 pages.


1. Pipelining
• Reconsider the data path we just did
• Each instruction takes from 3 to 5 clock cycles
• However, parts of the hardware are idle much of the time
• We can reorganize the operation
• Make each hardware block independent
  – 1. Instruction Fetch Unit
  – 2. Register Read Unit
  – 3. ALU Unit
  – 4. Data Memory Read/Write Unit
  – 5. Register Write Unit
• Units in 3 and 5 cannot be independent, but operations can be
• Let each unit just do its required job for each instruction
• If, for some instruction, a unit need not do anything, it can simply perform a noop

2. Gain of Pipelining
• Improve performance by increasing instruction throughput
• Ideal speedup is the number of stages in the pipeline
• Do we achieve this? No, why not?
[Figure: two timing diagrams for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), each instruction passing through Instruction fetch, Reg, ALU, Data access, Reg. Non-pipelined: successive instructions start 8 ns apart. Pipelined: successive instructions start 2 ns apart. Horizontal axis: Time; vertical axis: Program execution order (in instructions).]

3. Pipelining
• What makes it easy?
  – all instructions are the same length
  – just a few instruction formats
  – memory operands appear only in loads and stores
• What makes it hard?
  – structural hazards: suppose we had only one memory
  – control hazards: need to worry about branch instructions
  – data hazards: an instruction depends on a previous instruction
• We'll study these issues using a simple pipeline
• Other complications:
  – exception handling
  – trying to improve performance with out-of-order execution, etc.

4. Basic Idea
• What do we need to add to actually split the datapath into stages?
[Figure: datapath (PC, instruction memory, registers, sign extend, shift left 2, adders, ALU, data memory, muxes) divided into five stages. IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back.]

5. Pipelined Data Path
Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?
[Figure: the same datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB between the stages.]

6. Corrected Data Path
[Figure: datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB.]

7. Execution Time
• Time of n instructions depends on
  – Number of instructions n
  – # of stages k
  – # of control hazards and penalty for each
  – # of data hazards and penalty for each
• Time (in clock cycles) = n + k - 1 + load hazard penalty + branch penalty
• Load hazard penalty is 1 or 0 cycles, depending on data use with forwarding
• Branch penalty is 3, 2, 1, or zero cycles, depending on scheme

8. Design and Performance Issues With Pipelining
• Pipelined processors are not EASY to design
• Technology affects implementation
• Instruction set design affects the performance, e.g., beq, bne
• More stages do not necessarily lead to higher performance

9. Pipeline Operation
• In a pipeline, one operation begins in every cycle
• Also, one operation completes in each cycle
• Each instruction takes 5 clock cycles (k cycles in general)
• When a stage is not used, no control needs to be applied
• In one clock cycle, several instructions are active
• Different stages are executing different instructions
• How to generate control signals for them is an issue

10. Graphically Representing Pipelines
• Can help with answering questions like:
  – how many cycles does it take to execute this code?
  – what is the ALU doing during cycle 4?
  – use this representation to help understand datapaths
[Figure: pipeline diagram for lw $10, 20($1) followed by sub $11, $2, $3, each passing through IM, Reg, ALU, DM, Reg. Horizontal axis: Time (in clock cycles), CC 1 to CC 6; vertical axis: Program execution order (in instructions).]

11. Instruction Format
Bits:      31-26     25-21   20-16   15-11   10-6           5-0
JUMP       JUMP      JUMP ADDRESS (25-0)
BEQ/BNE    BEQ/BNE   REG 1   REG 2   BRANCH ADDRESS OFFSET (15-0)
SW         SW        REG 1   REG 2   STORE ADDRESS OFFSET (15-0)
LW         LW        REG 1   REG 2   LOAD ADDRESS OFFSET (15-0)
R-TYPE     R-TYPE    REG 1   REG 2   DST     SHIFT AMOUNT   ADD/AND/OR/SLT

12. Operation for Each Instruction
Step   LW                     SW                     R-Type                       BR-Type                  JMP-Type
1.     READ INST              READ INST              READ INST                    READ INST                READ INST
2.     READ REG 1, REG 2      READ REG 1, REG 2      READ REG 1, REG 2            READ REG 1, REG 2        -
3.     ADD REG 1 + OFFSET     ADD REG 1 + OFFSET     OPERATE on REG 1 / REG 2     SUB REG 2 from REG 1     -
4.     READ MEM               WRITE MEM              -                            -                        -
5.     WRITE REG 2            -                      WRITE DST                    -                        -

13. Pipeline Data Path Operation
[Figure: pipelined datapath. Labels: PC, 4, ADD, INST MEMORY (IA, INST 31-00), Control (INST 31-26, 20-00), REG FILE (RA1 from bits 25-21, RA2 from bits 20-16, RD1, RD2, WA, WD), Sign Ext (bits 15-00), Shift Left 2, ADD, ALU, MUXes (including one between bits 20-16 and 15-11), MEM (WD, ADDR).]

14. Fetch Unit
[Figure: PC, 4, ADD, INST MEMORY (IA, INST 31-00), MUXes. Signals: NPC, INST, Jump Address, Jump Register Address, Branch Address.]

15. Register Fetch Unit
[Figure: Control (INST 31-26, 20-00), REG FILE (RA1 from bits 25-21, RA2 from bits 20-16, RD1, RD2, WA, WD). Signals: NPC, INST.]

16. ALU Operation and Branch Logic
[Figure: Sign Ext (bits 15-00), Shift Left 2, ADD, ALU, MUXes (including one between bits 20-16 and 15-11). Signals: RD1, RD2, INST 20-00, Branch address, RegWrite Address, Write Data, ALU OUTPUT.]

17. Memory and Write back Stage
[Figure: MEM (WD, ADDR), MUX. Signals: WRITE DATA, ADDR, Data Read, Data ALU.]

18. Pipeline Data Path Operation
[Figure: pipelined datapath with the same labels as slide 13: PC, 4, ADD, INST MEMORY, Control, REG FILE, Sign Ext, Shift Left ...]
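The Execution Time formula (Time = n + k - 1 + load hazard penalty + branch penalty) and the Graphically Representing Pipelines question "what is the ALU doing during cycle 4?" can be tried out with a small Python sketch. The function names, the stage tuple, and the choice to pass penalties as total stall-cycle counts are assumptions made for illustration; they do not come from the slides. The stage lookup also assumes an ideal pipeline with no stalls.

```python
# Stage names as they appear in the pipeline diagrams (IM, Reg, ALU, DM, Reg).
STAGES = ("IM", "Reg", "ALU", "DM", "Reg")


def pipeline_cycles(n, k=5, load_stalls=0, branch_stalls=0):
    """Clock cycles for n instructions on a k-stage pipeline.

    Implements Time = n + k - 1 + load hazard penalty + branch penalty,
    where both penalties are given as total stall cycles (an assumption).
    """
    if n <= 0:
        return 0
    return n + k - 1 + load_stalls + branch_stalls


def stage_in_cycle(instr_index, cycle, stages=STAGES):
    """Stage occupied by an instruction in a given clock cycle.

    instr_index is 0-based program order; cycle is 1-based (CC 1, CC 2, ...).
    Returns None when the instruction is not in the pipeline that cycle.
    Assumes one instruction enters per cycle and nothing stalls.
    """
    position = cycle - 1 - instr_index
    if 0 <= position < len(stages):
        return stages[position]
    return None


if __name__ == "__main__":
    # Three lw instructions with a 2 ns clock, versus 8 ns each non-pipelined.
    cycles = pipeline_cycles(3)
    print("pipelined:", cycles * 2, "ns; non-pipelined:", 3 * 8, "ns")

    # lw $10, 20($1) is instruction 0 and sub $11, $2, $3 is instruction 1.
    for index, name in enumerate(["lw", "sub"]):
        print(name, "in CC 4:", stage_in_cycle(index, 4))
```

With these assumptions, three instructions need 3 + 5 - 1 = 7 cycles, so the speedup over the non-pipelined case is well below the ideal factor of 5 because the pipeline takes k - 1 cycles to fill. In CC 4 the ALU is working on the sub instruction while lw is in the DM stage.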
