U of I CS 232 - Stalls and flushes

Outline
- Stalls and flushes
- Data hazard review
- Forwarding to the rescue!
- What about loads?
- Stalling
- Stalling and forwarding
- Stalling delays the entire pipeline
- What about EXE, MEM, WB
- Stall = Nop conversion
- Detecting stalls
- Detecting Stalls, cont.
- Adding hazard detection to the CPU
- Generalizing Forwarding/Stalling
- Branches in the original pipelined datapath
- Branches
- Stalling is one solution
- Branch prediction
- Branch misprediction
- Performance gains and losses
- Implementing branches
- Implementing flushes
- Branching without forwarding and load stalls
- Summary

January 14, 2019. ©2003 Craig Zilles (derived from slides by Howard Huang)

Stalls and flushes
Last time, we discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
- Many hazards can be resolved by forwarding data from the pipeline registers, instead of waiting for the writeback stage.
- The pipeline continues running at full speed, with one instruction beginning on every clock cycle.
Today we’ll see some real limitations of pipelining.
- Forwarding may not work for data hazards from load instructions.
- Branches affect the instruction fetch for the next clock cycle.
In both of these cases we may need to slow down, or stall, the pipeline.

Data hazard review
A data hazard arises if one instruction needs data that isn’t ready yet.
- Below, the AND and OR both need to read register $2.
- But $2 isn’t updated by SUB until the fifth clock cycle.
Dependency arrows that point backwards indicate hazards.
[Pipeline diagram, clock cycles 1-7:]
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2

Forwarding to the rescue!
The desired value ($1 - $3) has actually already been computed; it just hasn’t been written to the registers yet.
Forwarding allows other instructions to read ALU results directly from the pipeline registers, without going through the register file.
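The hazards in the SUB/AND/OR example, and the pipeline register each dependent instruction could be forwarded from, can be sketched in a few lines of Python. This is only an illustration and not part of the original slides: the `Instr` tuple and both function names are hypothetical. It assumes a five-stage pipeline (IF, ID, EX, MEM, WB) with one instruction fetched per cycle, and it assumes the register file can be written and then read within the same clock cycle.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical minimal instruction record used only for this sketch."""
    text: str
    dest: str
    srcs: Tuple[str, ...]


def find_hazards(program: List[Instr]) -> List[Tuple[str, str, str]]:
    """Return (producer, consumer, register) triples that are data hazards
    when there is no forwarding.

    Instruction i (0-based) is fetched in cycle i+1, reads registers (ID)
    in cycle i+2 and writes back (WB) in cycle i+5.
    """
    hazards = []
    for i, producer in enumerate(program):
        wb_cycle = i + 5
        for j in range(i + 1, len(program)):
            consumer = program[j]
            id_cycle = j + 2
            # Assumption: a read in the same cycle as the write sees the new value.
            if producer.dest in consumer.srcs and id_cycle < wb_cycle:
                hazards.append((producer.text, consumer.text, producer.dest))
            if consumer.dest == producer.dest:
                break  # later readers depend on the newer value instead
    return hazards


def forward_source(distance: int) -> Optional[str]:
    """Pipeline register holding an ALU result for a consumer that is
    `distance` instructions behind the producer (None: register file is fine)."""
    return {1: "EX/MEM", 2: "MEM/WB"}.get(distance)


program = [
    Instr("sub $2, $1, $3", "$2", ("$1", "$3")),
    Instr("and $12, $2, $5", "$12", ("$2", "$5")),
    Instr("or $13, $6, $2", "$13", ("$6", "$2")),
]

for producer, consumer, reg in find_hazards(program):
    print(f"{consumer!r} needs {reg} before {producer!r} writes it back")
```

Here AND is one instruction behind SUB, so it would take the value from EX/MEM, while OR is two behind and would take it from MEM/WB.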
[Pipeline diagram, clock cycles 1-7:]
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2

What about loads?
Imagine if the first instruction in the example was LW instead of SUB.
- How does this change the data hazard?
[Pipeline diagram, clock cycles 1-6:]
    lw  $2, 20($3)
    and $12, $2, $5

Stalling
The easiest solution is to stall the pipeline.
We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes called a bubble.
Notice that we’re still using forwarding in cycle 5, to get data from the MEM/WB pipeline register to the ALU.
[Pipeline diagram, clock cycles 1-7:]
    lw  $2, 20($3)
    and $12, $2, $5

Stalling and forwarding
Without forwarding, we’d have to stall for two cycles to wait for the LW instruction’s writeback stage.
In general, you can always stall to avoid hazards, but dependencies are very common in real code, and stalling often can reduce performance by a significant amount.
[Pipeline diagram, clock cycles 1-8:]
    lw  $2, 20($3)
    and $12, $2, $5

Stalling delays the entire pipeline
If we delay the second instruction, we’ll have to delay the third one too. Why? There are two reasons.
- This is necessary to make forwarding work between AND and OR.
- It also prevents problems such as two instructions trying to write to the same register in the same cycle.
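The stall counts quoted above (one cycle with forwarding, two without, for a load followed immediately by a dependent instruction) can be reproduced with a small timing calculation. This is a sketch under stated assumptions, not the slides' own method: the function names are hypothetical, stages are numbered IF=1 through WB=5, a forwarded value is usable in the cycle after it is produced, and the register file can be written and read in the same cycle.

```python
def stall_cycles(distance: int, producer_is_load: bool, forwarding: bool) -> int:
    """Bubbles needed before a consumer that is `distance` instructions
    behind its producer (distance 1 means the very next instruction)."""
    if forwarding:
        # Value is available at the end of EX (stage 3) for ALU results
        # and at the end of MEM (stage 4) for loads; the consumer needs it
        # at the start of its own EX stage.
        ready_stage = 4 if producer_is_load else 3
        return max(0, ready_stage - 2 - distance)
    # Without forwarding the consumer's ID stage must not come before the
    # producer's WB stage (stage 5).
    return max(0, 3 - distance)


def total_cycles(num_instructions: int, stalls: int) -> int:
    """Cycles to run a straight-line sequence on a five-stage pipeline:
    every stall delays all later instructions by one cycle."""
    return num_instructions + 4 + stalls


# lw $2, 20($3) followed directly by and $12, $2, $5
print(stall_cycles(1, producer_is_load=True, forwarding=True))
print(stall_cycles(1, producer_is_load=True, forwarding=False))
# lw / and / or with a single load stall
print(total_cycles(3, stalls=1))
```

Because every later instruction is pushed back by the same bubble, the three-instruction LW/AND/OR sequence with one stall occupies eight clock cycles instead of seven.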
[Pipeline diagram, clock cycles 1-8:]
    lw  $2, 20($3)
    and $12, $2, $5
    or  $13, $12, $2

What about EXE, MEM, WB
But what about the ALU during cycle 4, the data memory in cycle 5, and the register file write in cycle 6?
Those units aren’t used in those cycles because of the stall, so we can set the EX, MEM and WB control signals to all 0s.
[Pipeline diagram, clock cycles 1-8:]
    lw  $2, 20($3)
    and $12, $2, $5
    or  $13, $12, $2

Stall = Nop conversion
The effect of a load stall is to insert an empty or nop (“no operation”) instruction into the pipeline.
[Pipeline diagram, clock cycles 1-8:]
    lw  $2, 20($3)
    and -> nop
    and $12, $2, $5
    or  $13, $12, $2

Detecting stalls
Detecting stalls is much like detecting data hazards.
Recall the format of hazard detection equations:
    if (EX/MEM.RegWrite = 1
        and EX/MEM.RegisterRd = ID/EX.RegisterRs)
    then Bypass Rs from EX/MEM stage latch
[Pipeline diagram with the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers labeled:]
    sub $2, $1, $3
    and $12, $2, $5

Detecting Stalls, cont.
When should stalls be detected?
[Pipeline diagram with the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers labeled:]
    lw  $2, 20($3)
    and $12, $2, $5
What is the stall condition?
    if ( )
    then stall

Adding hazard detection to the CPU
[Datapath diagram: pipelined datapath with instruction memory, register file, ALU, data memory, control, the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers, a forwarding unit and a hazard unit]
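One way to express these checks in code is shown below. The first function mirrors the bypass equation quoted under "Detecting stalls". The second is a common textbook formulation of the load-use stall condition for a five-stage MIPS pipeline; it is offered as an assumption, not as the answer the slide leaves blank. The signal names `MemRead`, `PCWrite` and `IF/IDWrite`, and all function names, are hypothetical and do not appear in the preview text.

```python
def forward_rs_from_ex_mem(ex_mem_reg_write: bool, ex_mem_rd: int, id_ex_rs: int) -> bool:
    """Bypass Rs from the EX/MEM latch (the equation quoted in the slides)."""
    return ex_mem_reg_write and ex_mem_rd == id_ex_rs


def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int, if_id_rs: int, if_id_rt: int) -> bool:
    """Assumed stall condition: the instruction in EX is a load and the
    instruction in ID reads the register that the load will write."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


def hazard_unit(id_ex_mem_read: bool, id_ex_rt: int, if_id_rs: int, if_id_rt: int) -> dict:
    """On a stall, hold the PC and IF/ID register and zero the EX, MEM and
    WB control signals so the stalled slot becomes a nop."""
    stall = load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt)
    return {"PCWrite": not stall, "IF/IDWrite": not stall, "ZeroControl": stall}


# lw $2, 20($3) is in EX (rt = 2); and $12, $2, $5 is in ID (rs = 2, rt = 5).
signals = hazard_unit(id_ex_mem_read=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
print(signals)
```

The check runs while the dependent instruction is still in ID, so the instruction can be held there for one cycle. This simple form compares both source fields of the instruction in ID, so it can stall unnecessarily when that instruction does not actually read Rt.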
Generalizing Forwarding/Stalling
What if data memory access were so slow that we wanted to pipeline it over 2 cycles?
How many bypass inputs would the muxes in EXE have?
Which instructions in the following require stalling and/or bypassing?
    lw  r13, 0(r11)
    add r7, r8, r9
    add r15, r7, r13
[Pipeline diagram, clock cycles 1-6]

Branches in the original pipelined datapath
[Datapath diagram, truncated: instruction memory (read address, Instruction [31-0]) and data memory (address, write data)]

