Unformatted text preview:

2/1/2006 85G22.2243-001High Performance Computer ArchitectureLecture 3PipeliningFebruary 1, 20062/1/2006 86Outline• Announcements– Should be done with assignment 0 today.– Homework Assignment 1 out today, due back in one week: February 8th– Lab Assignment 1 out today, due back in two weeks: February 15th• Pipelining[ Hennessy/Patterson CA:AQA (3rdEdition): Appendix A]2/1/2006 87Recap• Computers execute billions of instructions, so instruction throughput is what matters• Main idea behind pipelining: Divide instruction execution acrossseveral stages– each stage accesses only a subset of the CPU’s resources• Example: Classic 5-stage RISC pipelineIF ID EX MEM WB• Simultaneously have different instructions in different stages– Ideally, can issue a new instruction every cycle2/1/2006 88Recap (Cont’d)Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7Cycle 5Time (clock cycles)Instr.orderIDIFi ALUIDi+1IFi+2IFMEMALUIDi+3IfetchWBMEMALUBubbleWBMEMBubbleRegIfetchi+4WBBubbleALUReg2/1/2006 89Pipelined Implementation of a RISC ISAInstructionMemoryMUXMUXDataMemoryMUXSignExtendAdderNext SEQ PCNext PCRDRS1RS2ImmMemoryAccessWriteBackInstructionFetchInstr. DecodeReg. FetchExecuteAddr. CalcReg FileALUZero?MUX4IRIF/IDID/EXEX/MEMMEM/WBPCABIRImmNPCNPCIRIRBIRcondAOLDMAO2/1/2006 90Pipeline stage: Instruction Fetch (IF)InstructionMemoryAdderNext PCInstructionFetchMUX4IRIF/IDPCIF/ID.IR Å Mem[PC];IF/ID.NPC,PC Å(if ((EX/MEM.opcode == branch) & EX/MEM.cond) {EX/MEM.ALUOutput} else {PC+4});Branch target address (EX/MEM.ALUOutput register)Branch comparison result (EX/MEM.cond register)NPCIR2/1/2006 91Pipeline stage: Instruction Decode (ID)SignExtendNext SEQ PCRDRS1RS2ImmInstr. DecodeReg. FetchReg FileIF/IDID/EXID/EX.AÅ Regs[IF/ID.IR[rs]]; ID/EX.B ÅRegs[IF/ID.IR[rt];ID/EX.NPCÅIF/ID.NPC; ID/EX.IRÅIF/ID.IR; ID/EX.ImmÅ sign-extend (IF/ID.IR[immediate field])ABNPCIRIRImmNPC2/1/2006 92Pipeline stage: Execute (EX)MUXMUXExecuteAddr. CalcALUZero?ID/EXEX/MEMALU instructionEX/MEM.IR Å ID/EX.IR;EX/MEM.ALUOutput ÅID/EX.A func ID/EX.B;orEX/MEM.ALUOutput ÅID/EX.A op ID/EX.Imm;Load/store instructionEX/MEM.IRÅID/EX.IR;EX/MEM.ALUOutput ÅID/EX.A + ID/EX.Imm;EX/MEM.B Å ID/EX.B;Branch instructionEX/MEM.ALUOutput ÅID/EX.NPC +(ID/EX.Imm <<2);EX/MEM.cond Å(ID/EX.A ==0);IRIRBBAALUOutputcondNPCImm2/1/2006 93Pipeline stage: Memory access (MEM)DataMemoryMemoryAccessEX/MEMMEM/WBALU instruction MEM/WB.IRÅEX/MEM.IR;MEM.WB.ALUOutputÅEX/MEM.ALUOutput;Branch target addressBranch comparison resultcompIRIRALUOutputALUOutputLMDBLoad/store instructionMEM/WB.IRÅEX/MEM.IR;MEM/WB.LMD ÅMem[EX/MEM.ALUOutput];orMem[EX/MEM.ALUOutput]ÅEX/MEM.B;2/1/2006 94Pipeline stage: Write Back (WB)MUXWriteBackMEM/WBALU instructionRegs[MEM/WB.IR[rd]]ÅMEM.WB.ALUOutput; orRegs[MEM/WB.IR[rt]]ÅMEM.WB.ALUOutput; To Register File(reg. identifier & value)ALUOutputLMDIRLoad instruction onlyRegs[MEM/WB.IR[rt]]ÅMEM/WB.LMDPipeline Hazards• Should we expect a CPI of 1 in practice?• Unfortunately, the answer to the question is NO.• Limit to pipelining: Hazards– Prevent next instruction from executing during its designated clock cycle• Three classes of hazardsStructural: Hardware cannot support this combination of instructions - two instructions need the same resource.Data: Instruction depends on result of prior instruction still in the pipelineControl: Pipelining of branches & other instructions that change the PC• Common solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles” in the pipeline– To do this, hardware or software must detect that a hazard has occurred2/1/2006 96Pipeline Hazards (A): Structural Hazards• Occur when two or more instructions need the same resource• Common methods for eliminating structural hazards are:– Duplicate resources – Pipeline the resource– Reorder the instructions• It may be too expensive to eliminate a structural hazard, in which case the pipeline should stall– no new instructions are issued until the hazard has been resolved• What are some examples of structural hazards?2/1/2006 97One Memory Port Structural HazardCycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7Cycle 5Time (clock cycles)Instr.orderRegIfetchi (LD)ALURegi+1Ifetchi+2IfetchDMemALURegi+3Ifetchi+3IfetchRegDMemALUBubbleBubblestallRegDMemBubbleRegIfetchi+4RegBubbleALUReg2/1/2006 98Pipeline Speedup Example: One or Two Memory Ports • Two machines– Machine A: Dual ported memory – Machine B: Single ported memory, but its pipelined implementation has a clock rate that is 1.05 times faster– Ideal CPI = 1 for both– Loads are 40% of instructions executed (cause stalls in machine B)•Which is faster?SpeedupA= (Pipeline Depth/(1 + 0)) x 1= Pipeline DepthSpeedupB= (Pipeline Depth/(1 + 0.4 x 1)) x 1.05= 0.75 x Pipeline DepthMachine A is 1.33 times faster pipelinedunpipelinedTimeCycleTimeCycleCPIstallPipelineCPIIdealdepthPipelineCPIIdealSpeedup×+×=2/1/2006 99Pipeline Hazards (B): Data HazardsThree generic types of data hazards•Read After Write (RAW)–InstrJtries to read operand beforeInstrI(I < J) writes it– Called a dependence•Write After Read (WAR)–InstrJwrites operand before InstrIreads it– Called an anti-dependence• Name dependence (renaming)• No value being transmitted•Write After Write (WAW)–InstrJwrites operand before InstrIwrites it– Called an output dependence• Name dependence (renaming)• No value being transmittedI: add r1,r2,r3J: sub r4,r1,r3I: sub r4,r1,r3 J: add r1,r2,r3I: sub r1,r4,r3 J: add r1,r2,r32/1/2006 100Data Hazards and Pipeline Stalls• Do all kinds of data hazards translate into pipeline stalls?•NO, whether or not a data hazard results in a stall depends on thepipeline structure• For the simple five-stage RISC pipeline– Only RAW hazards result in a pipeline stall• Instruction reading a register needs to wait until it is written– WAR and WAW hazards cannot occur because• All instructions take 5 stages• Reads happen in the 2ndstage (ID)• Writes happen in the 5thstage (WB)• No way for a write from a subsequent instruction to interfere with the read (or write) of a prior instruction• For more complicated pipelines (later in the course)– Both WAR and WAW hazards are possible if instructions execute out of order or access (read) data later in the pipeline2/1/2006 101add r1r1,r2,r3sub r4,r1,r3and r6,r1,r7or r8, r1,r9xor r10,r1,r11RAW Hazards in the 5-stage


View Full Document

NYU CSCI-GA 2243 - Pipelining

Download Pipelining
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Pipelining and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Pipelining 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?