Unformatted text preview:

9/21/2007 1G22.2243-001High Performance Computer ArchitectureLecture 3PipeliningSeptember 19, 20079/21/2007 2Outline• Announcements– Homework Assignment 1 due today – Mini Quiz next week: Sep. 26thTo make sure we are on the same page with the basics; Covers thefollowing:• Chapter 1• Appendix A• Appendix B • Pipelining[ Hennessy/Patterson CA:AQA (4thEdition): Appendix A]9/21/2007 3Recap• Computers execute billions of instructions, so instruction throughput is what matters• Main idea behind pipelining: Divide instruction execution acrossseveral stages– each stage accesses only a subset of the CPU’s resources• Example: Classic 5-stage RISC pipelineIF ID EX MEM WB• Simultaneously have different instructions in different stages– Ideally, can issue a new instruction every cycle9/21/2007 4Recap (Cont’d)Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7Cycle 5Time (clock cycles)Instr.orderIDIFi ALUIDi+1IFi+2IFMEMALUIDi+3IfetchWBMEMALUBubbleWBMEMBubbleRegIfetchi+4WBBubbleALUReg9/21/2007 5Pipelined Implementation of a RISC ISAInstructionMemoryMUXMUXDataMemoryMUXSignExtendAdderNext SEQ PCNext PCRDRS1RS2ImmMemoryAccessWriteBackInstructionFetchInstr. DecodeReg. FetchExecuteAddr. CalcReg FileALUZero?MUX4IRIF/IDID/EXEX/MEMMEM/WBPCABIRImmNPCNPCIRIRBIRcondAOLDMAO9/21/2007 6Pipeline stage: Instruction Fetch (IF)InstructionMemoryAdderNext PCInstructionFetchMUX4IRIF/IDPCIF/ID.IR Å Mem[PC];IF/ID.NPC,PC Å(if ((EX/MEM.opcode == branch) & EX/MEM.cond) {EX/MEM.ALUOutput} else {PC+4});Branch target address (EX/MEM.ALUOutput register)Branch comparison result (EX/MEM.cond register)NPCIR9/21/2007 7Pipeline stage: Instruction Decode (ID)SignExtendNext SEQ PCRDRS1RS2ImmInstr. DecodeReg. FetchReg FileIF/IDID/EXID/EX.AÅ Regs[IF/ID.IR[rs]]; ID/EX.B ÅRegs[IF/ID.IR[rt];ID/EX.NPCÅIF/ID.NPC; ID/EX.IRÅIF/ID.IR; ID/EX.ImmÅ sign-extend (IF/ID.IR[immediate field])ABNPCIRIRImmNPC9/21/2007 8Pipeline stage: Execute (EX)MUXMUXExecuteAddr. CalcALUZero?ID/EXEX/MEMALU instructionEX/MEM.IR Å ID/EX.IR;EX/MEM.ALUOutput ÅID/EX.A func ID/EX.B;orEX/MEM.ALUOutput ÅID/EX.A op ID/EX.Imm;Load/store instructionEX/MEM.IRÅID/EX.IR;EX/MEM.ALUOutput ÅID/EX.A + ID/EX.Imm;EX/MEM.B Å ID/EX.B;Branch instructionEX/MEM.ALUOutput ÅID/EX.NPC +(ID/EX.Imm <<2);EX/MEM.cond Å(ID/EX.A ==0);IRIRBBAALUOutputcondNPCImm9/21/2007 9Pipeline stage: Memory access (MEM)DataMemoryMemoryAccessEX/MEMMEM/WBALU instruction MEM/WB.IRÅEX/MEM.IR;MEM.WB.ALUOutputÅEX/MEM.ALUOutput;Branch target addressBranch comparison resultcompIRIRALUOutputALUOutputLMDBLoad/store instructionMEM/WB.IRÅEX/MEM.IR;MEM/WB.LMD ÅMem[EX/MEM.ALUOutput];orMem[EX/MEM.ALUOutput]ÅEX/MEM.B;9/21/2007 10Pipeline stage: Write Back (WB)MUXWriteBackMEM/WBALU instructionRegs[MEM/WB.IR[rd]]ÅMEM.WB.ALUOutput; orRegs[MEM/WB.IR[rt]]ÅMEM.WB.ALUOutput; To Register File(reg. identifier & value)ALUOutputLMDIRLoad instruction onlyRegs[MEM/WB.IR[rt]]ÅMEM/WB.LMDPipeline Hazards• Should we expect a CPI of 1 in practice?• Unfortunately, the answer to the question is NO.• Limit to pipelining: Hazards– Prevent next instruction from executing during its designated clock cycle• Three classes of hazardsStructural: Hardware cannot support this combination of instructions - two instructions need the same resource.Data: Instruction depends on result of prior instruction still in the pipelineControl: Pipelining of branches & other instructions that change the PC• Common solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles” in the pipeline– To do this, hardware or software must detect that a hazard has occurred9/21/2007 12Pipeline Hazards (A): Structural Hazards• Occur when two or more instructions need the same resource• Common methods for eliminating structural hazards are:– Duplicate resources – Pipeline the resource– Reorder the instructions• It may be too expensive to eliminate a structural hazard, in which case the pipeline should stall– no new instructions are issued until the hazard has been resolved• What are some examples of structural hazards?9/21/2007 13One Memory Port Structural HazardCycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7Cycle 5Time (clock cycles)Instr.orderRegIfetchi (LD)ALURegi+1Ifetchi+2IfetchDMemALURegi+3Ifetchi+3IfetchRegDMemALUBubbleBubblestallRegDMemBubbleRegIfetchi+4RegBubbleALUReg9/21/2007 14Pipeline Speedup Example: One or Two Memory Ports • Two machines– Machine A: Dual ported memory – Machine B: Single ported memory, but its pipelined implementation has a clock rate that is 1.05 times faster– Ideal CPI = 1 for both– Loads are 40% of instructions executed (cause stalls in machine B)•Which is faster?SpeedupA= (Pipeline Depth/(1 + 0)) x 1= Pipeline DepthSpeedupB= (Pipeline Depth/(1 + 0.4 x 1)) x 1.05= 0.75 x Pipeline DepthMachine A is 1.33 times faster pipelinedunpipelinedTimeCycleTimeCycleCPIstallPipelineCPIIdealdepthPipelineCPIIdealSpeedup×+×=9/21/2007 15Pipeline Hazards (B): Data HazardsThree generic types of data hazards•Read After Write (RAW)–InstrJtries to read operand beforeInstrI(I < J) writes it– Called a dependence•Write After Read (WAR)–InstrJwrites operand before InstrIreads it– Called an anti-dependence• Name dependence (renaming)• No value being transmitted•Write After Write (WAW)–InstrJwrites operand before InstrIwrites it– Called an output dependence• Name dependence (renaming)• No value being transmittedI: add r1,r2,r3J: sub r4,r1,r3I: sub r4,r1,r3 J: add r1,r2,r3I: sub r1,r4,r3 J: add r1,r2,r39/21/2007 16Data Hazards and Pipeline Stalls• Do all kinds of data hazards translate into pipeline stalls?•NO, whether or not a data hazard results in a stall depends on thepipeline structure• For the simple five-stage RISC pipeline– Only RAW hazards result in a pipeline stall• Instruction reading a register needs to wait until it is written– WAR and WAW hazards cannot occur because• All instructions take 5 stages• Reads happen in the 2ndstage (ID)• Writes happen in the 5thstage (WB)• No way for a write from a subsequent instruction to interfere with the read (or write) of a prior instruction• For more complicated pipelines (later in the course)– Both WAR and WAW hazards are possible if instructions execute out of order or access (read) data later in the pipeline9/21/2007 17add r1r1,r2,r3sub r4,r1,r3and r6,r1,r7or r8, r1,r9xor r10,r1,r11RAW Hazards in the 5-stage


View Full Document

NYU CSCI-GA 2243 - Pipelining

Download Pipelining
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Pipelining and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Pipelining 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?