Pipelining
CS217

Instruction Processing Steps
• Instruction fetch: Fetch and decode instruction, retrieve operands from registers
• Execute: Execute arithmetic instruction, compute branch target address, compute load/store memory address
• Memory access: Access memory for load or store, fetch instruction at target of branch
• Store results: Write instruction results to registers

Pipelining
Stages: Fetch (F), Execute (E), Memory (M), Store results (W). Each instruction starts one stage behind the one before it:

    addr  instruction        PC  nPC
     12   add %i1,%i1,%o1    12  16    F  E  M  W
     16   add %i1,%o1,%o1    16  20       F  E  M  W
     20   sub %o1,3,%o1      20  24          F  E  M  W
     24   add %o1,%i2,%o1    24  28             F  E  M  W

Pipelined Load Instructions
• Problem: load followed by use

    ld  [%o0],%o1      F  E  M  W
    add %o1,%o2,%o2       F  E  M  W

  The add needs %o1 in its execute stage, which is the same cycle in which the load is still accessing memory. One load delay slot separates the two:

    ld  [%o0],%o1      F  E  M  W
    (load delay slot)     .  .  .  .
    add %o1,%o2,%o2          F  E  M  W

• Load delay slots are inserted automatically

Pipelined Branch Instructions
• Problem: instruction after branch

        cmp %o0,%o1            F  E  M  W
        ble L1                    F  E  M  W
        mov %o0,%o1                  F  E  M  W    <- branch delay slot
    L1: add %o0,%o0,%o0                 F  E  M  W

  The target of ble is computed only in its execute stage; by then the following instruction (mov) has already been fetched, so it executes whether or not the branch is taken. That position is the branch delay slot.

Updating the Program Counter
• Fetch instruction at address stored in nPC
    Most instructions: nPC = PC + 4
    Branch instructions: nPC is computed in execute stage
• Execute instruction at address stored in PC
    After execute: PC = nPC

     12       cmp a,b
     16       ble L1
     20       nop
     24       mov a,c
     28       ba L2
     32       nop
     36   L1: mov b,c
     40   L2: ...

    PC   nPC
    12   16
    16   20
    20   36
    36   40
    40   44

  The PC/nPC values are for the case where ble L1 is taken: while the nop in its delay slot (address 20) executes, nPC already holds the target 36.

Delay Slots
• One option: use nop in all delay slots

    for (i=0; i<n; i++)
        ...

    #define i %l0
    #define n %l1
        clr i
    L1: cmp i,n
        bge L2; nop
        ...
        inc i
        ba L1; nop
    L2:

Delay Slots
• Optimizing compilers try to avoid delay slots: with the test moved to the bottom of the loop, each iteration executes one branch (bl) and one delay slot instead of two (bge and ba)

    #define i %l0
    #define n %l1
        clr i
        ba L2; nop
    L1: ...
        inc i
    L2: cmp i,n
        bl L1; nop

Delay Slots
• Optimizing compilers try to fill delay slots

    if (a>b) c=a; else c=b;

    Unfilled:                 Filled:
        cmp a,b                   cmp a,b
        ble L1; nop               ble L1
        mov a,c                   mov b,c
        ba L2; nop                mov a,c
    L1: mov b,c               L1: …
    L2: ...

  In the filled version, mov b,c sits in the delay slot of ble and always executes; if the branch is not taken (a > b), mov a,c then overwrites c.
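The load-use problem on the Pipelined Load Instructions slide can be turned into a cycle count. This is a minimal C sketch, not the SPARC hardware logic: it assumes the 4-stage F E M W pipeline from these slides, one instruction entering per cycle, and exactly one automatically inserted load delay slot whenever a load is immediately followed by an instruction that reads the loaded register. The type `Instr` and the functions `reads`, `load_bubbles`, and `pipeline_cycles` are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical instruction record using SPARC register names such as
   "%o1". Loads list their address register as a source. */
typedef struct {
    int is_load;         /* 1 for ld, 0 otherwise */
    const char *dst;     /* register written */
    const char *src[2];  /* registers read ("" if unused) */
} Instr;

/* Returns 1 if instruction b reads register reg. */
static int reads(const Instr *b, const char *reg) {
    for (int k = 0; k < 2; k++)
        if (strcmp(b->src[k], reg) == 0)
            return 1;
    return 0;
}

/* Counts the load delay slots inserted, assuming one slot whenever a
   load is immediately followed by a reader of its destination. */
int load_bubbles(const Instr *code, int n) {
    int bubbles = 0;
    for (int k = 0; k + 1 < n; k++)
        if (code[k].is_load && reads(&code[k + 1], code[k].dst))
            bubbles++;
    return bubbles;
}

/* Cycles until the last W stage: the first instruction takes 4 cycles,
   each later one finishes 1 cycle after its predecessor, and each load
   delay slot adds 1 more cycle. */
int pipeline_cycles(const Instr *code, int n) {
    assert(n > 0);
    return n + 3 + load_bubbles(code, n);
}
```

For `ld [%o0],%o1` followed directly by `add %o1,%o2,%o2`, `load_bubbles` returns 1 and `pipeline_cycles` returns 6. Placing an independent instruction between them also takes 6 cycles, but the slot now does useful work instead of idling.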
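The PC/nPC rule from the Updating the Program Counter slide can be replayed in code. This C sketch models only the addresses and branch behavior of the slide's if/else example, not the comparison itself: the caller decides whether `ble L1` is taken. `Kind`, `Insn`, `fetch`, and `trace_pc` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the slide's if/else example: only each
   instruction's address and branch behavior matter for PC and nPC. */
typedef enum { OTHER, BA, BLE } Kind;

typedef struct {
    int addr;    /* byte address: 12, 16, ... */
    Kind kind;   /* OTHER, unconditional BA, or conditional BLE */
    int target;  /* branch target address; unused for OTHER */
} Insn;

static const Insn prog[] = {
    {12, OTHER, 0},  /*     cmp a,b */
    {16, BLE, 36},   /*     ble L1  */
    {20, OTHER, 0},  /*     nop     */
    {24, OTHER, 0},  /*     mov a,c */
    {28, BA, 40},    /*     ba L2   */
    {32, OTHER, 0},  /*     nop     */
    {36, OTHER, 0},  /* L1: mov b,c */
};                   /* L2 (address 40) lies past the end of the table */

/* Returns the instruction at addr, or NULL once execution reaches L2. */
static const Insn *fetch(int addr) {
    for (size_t k = 0; k < sizeof prog / sizeof prog[0]; k++)
        if (prog[k].addr == addr)
            return &prog[k];
    return NULL;
}

/* Records the (PC, nPC) pair before each instruction until PC reaches
   L2; ble_taken chooses whether "ble L1" branches. Returns the count. */
int trace_pc(int ble_taken, int pcs[], int npcs[], int max) {
    int pc = 12, npc = 16, n = 0;
    for (;;) {
        assert(n < max);
        pcs[n] = pc;
        npcs[n] = npc;
        n++;
        const Insn *in = fetch(pc);
        if (in == NULL)
            return n;
        int taken = in->kind == BA || (in->kind == BLE && ble_taken);
        /* The target is known only after execute, so the instruction
           already at nPC (the delay slot) still runs next. */
        int next_npc = taken ? in->target : npc + 4;
        pc = npc;        /* after execute: PC = nPC            */
        npc = next_npc;  /* nPC = PC + 4, or the branch target */
    }
}
```

With `ble_taken` set to 1, `trace_pc` records the five pairs shown on the slide: (12,16), (16,20), (20,36), (36,40), (40,44). With it set to 0, it records seven pairs that pass through `mov a,c` and `ba L2`, ending with (32,40) and (40,44).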
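Why the filled version on the "Optimizing compilers try to fill delay slots" slide still computes c correctly can be sketched by translating each column into C by hand. This assumes `ble L1` branches exactly when a <= b; the functions `max_unfilled`, `max_filled`, and `check_fill` are hypothetical names, and the comments map each line back to the assembly.

```c
#include <assert.h>

/* Left column: every delay slot holds a nop. */
int max_unfilled(int a, int b) {
    int c;
    if (a <= b)      /* cmp a,b; ble L1 (taken); nop */
        c = b;       /* L1: mov b,c                  */
    else             /* ble not taken; nop           */
        c = a;       /* mov a,c; ba L2; nop          */
    return c;        /* L2: ...                      */
}

/* Right column: mov b,c fills the delay slot of ble, so it runs on
   both paths; the fall-through path then overwrites c with a. */
int max_filled(int a, int b) {
    int taken = (a <= b);  /* cmp a,b; ble L1 */
    int c = b;             /* delay slot: mov b,c (always executed) */
    if (!taken)
        c = a;             /* fall-through: mov a,c */
    return c;              /* L1: ... */
}

/* Checks both versions on every pair in [lo, hi]; returns pairs checked. */
int check_fill(int lo, int hi) {
    int pairs = 0;
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++) {
            assert(max_filled(a, b) == max_unfilled(a, b));
            pairs++;
        }
    return pairs;
}
```

The filled version saves both nops and the `ba L2`: the delay-slot instruction is useful on the taken path and harmlessly overwritten on the other.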
Pipelined Branch Instructions
• Problem: instruction after branch (cmp %o0,%o1; ble L1; mov %o0,%o1; L1: add %o0,%o0,%o0)
• Programmer should try to insert independent instructions in branch delay slots

Annul Bit
• Controls the execution of the delay-slot instruction

        bg,a L1
        mov a,c

  The ,a causes the mov instruction to be executed if the branch is taken, and not executed if the branch is not taken.
• Exception: ba,a L does not execute the delay-slot instruction

Annul Bit (cont)
• Optimized

    for (i=0; i<n; i++)
        1; 2; …; n

        clr i                 clr i
        ba L2; nop            ba,a L2
    L1: 1                 L1: 2
        2                     ...
        ...                   n
        n                     inc i
        inc i             L2: cmp i,n
    L2: cmp i,n               bl,a L1
        bl L1; nop            1

  In the optimized version, ba,a skips its delay slot, and statement 1 moves into the delay slot of bl,a: it runs only when the branch back to L1 is taken, so each iteration still executes 1, 2, …, n in order.

While-Loop Example

    while (...) {
        stmt1
        :
        stmtn
    }

    test: cmp ...
          bx done
          nop
          stmt1
          :
          stmtn
          ba test
          nop
    done: ...

  Overhead per iteration: 3 instructions for the test (cmp, bx, nop) plus 2 for the jump back (ba, nop).

While-Loop (cont)
• Move test to end of loop

    test: cmp ...
          bx done
          nop
    loop: stmt1
          :
          stmtn
          cmp ...
          bnx loop
          nop
    done: ...

• Eliminate first test

          ba test
          nop
    loop: stmt1
          :
          stmtn
    test: cmp ...
          bnx loop
          nop
          ...

While-Loop (cont)
• Eliminate the nop in the loop

          ba test
          nop
    loop: stmt2
          :
          stmtn
    test: cmp ...
          bnx,a loop
          stmt1
          ...

  stmt1 now sits in the annulled delay slot of bnx,a, so it executes only when the branch back to loop is taken.
• Now 2 overhead instructions per loop

If-Then-Else Example

    if (...) {
        t-stmt1
        :
        t-stmtn
    } else {
        e-stmt1
        :
        e-stmtm
    }

How to optimize?

          cmp ...
          bnx else
          nop
          t-stmt1
          :
          t-stmtn
          ba next
          nop
    else: e-stmt1
          e-stmt2
          :
          e-stmtm
    next: ...

If-Then-Else Example (cont)
• With the annul bit, e-stmt1 moves into the delay slot of bnx,a, where it executes only when the branch to else is taken

          cmp ...
          bnx,a else
          e-stmt1
          t-stmt1
          :
          t-stmtn
          ba next
          nop
    else: e-stmt2
          :
          e-stmtm
    next: ...

If-Then-Else Example (cont)

          cmp ...
          bnx,a else
          e-stmt1
          t-stmt1
          :
          ba
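To check that the annul-bit version of the for loop on the Annul Bit (cont) slide runs the same statements as the version with nops, this C sketch interprets a hypothetical toy instruction set with SPARC-style delayed branches. It assumes the annul semantics stated on the Annul Bit slide (a conditional branch with ,a executes its delay slot only if taken; ba,a never executes it) and uses instruction indices instead of byte addresses. The loop body has three statements, numbered 1 to 3; `Op`, `Insn`, `Run`, `run`, `with_nops`, and `with_annul` are illustrative names, not part of any real tool.

```c
#include <assert.h>

/* Hypothetical toy instruction set, just rich enough for the for-loop
   slides. Instructions are numbered 0, 1, 2, ...; pc and npc are indices. */
typedef enum { CLR, INC, CMP, STMT, NOP, BA, BL, HALT } Op;

typedef struct {
    Op op;
    int arg;    /* STMT: statement number; BA/BL: target index */
    int annul;  /* 1 for the ",a" forms */
} Insn;

typedef struct {
    int log[64];   /* statement numbers in execution order */
    int len;
    int executed;  /* instructions run, not counting annulled slots or HALT */
} Run;

/* Runs prog with loop bound n (i is the loop counter, lt the flag set by
   "cmp i,n"), using delayed branches: the instruction at npc runs next. */
void run(const Insn *prog, int n, Run *r) {
    int i = 0, lt = 0, pc = 0, npc = 1;
    r->len = 0;
    r->executed = 0;
    for (;;) {
        Insn in = prog[pc];
        if (in.op == HALT)
            return;
        int next_pc = npc, next_npc = npc + 1;
        r->executed++;
        switch (in.op) {
        case CLR:  i = 0; break;
        case INC:  i++; break;
        case CMP:  lt = (i < n); break;
        case STMT: assert(r->len < 64); r->log[r->len++] = in.arg; break;
        case NOP:  break;
        case BA:   /* ba runs its delay slot; ba,a skips it */
            if (in.annul) { next_pc = in.arg; next_npc = in.arg + 1; }
            else next_npc = in.arg;
            break;
        case BL:   /* taken: slot runs; not taken with ,a: slot skipped */
            if (lt) next_npc = in.arg;
            else if (in.annul) { next_pc = npc + 1; next_npc = npc + 2; }
            break;
        case HALT: break;
        }
        pc = next_pc;
        npc = next_npc;
    }
}

/* clr i; ba L2; nop; L1: 1; 2; 3; inc i; L2: cmp i,n; bl L1; nop */
const Insn with_nops[] = {
    {CLR, 0, 0}, {BA, 7, 0}, {NOP, 0, 0},
    {STMT, 1, 0}, {STMT, 2, 0}, {STMT, 3, 0}, {INC, 0, 0},
    {CMP, 0, 0}, {BL, 3, 0}, {NOP, 0, 0}, {HALT, 0, 0},
};

/* clr i; ba,a L2; L1: 2; 3; inc i; L2: cmp i,n; bl,a L1; 1 */
const Insn with_annul[] = {
    {CLR, 0, 0}, {BA, 5, 1},
    {STMT, 2, 0}, {STMT, 3, 0}, {INC, 0, 0},
    {CMP, 0, 0}, {BL, 2, 1}, {STMT, 1, 0}, {HALT, 0, 0},
};
```

For n = 2 both programs log the statements 1, 2, 3, 1, 2, 3. `with_nops` executes 20 instructions and `with_annul` executes 16; the difference is exactly the four nops. For n = 0, `with_annul` logs nothing, because the first test fails and the annulled delay slot holding statement 1 is skipped.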
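The annul-bit version of the If-Then-Else Example, which moves e-stmt1 into the delay slot of bnx,a, can be checked by logging which statements each layout executes. This C sketch assumes bnx branches to else exactly when the if condition is false, and records then-statement k as 100 + k and else-statement k as 200 + k. `Trace`, `emit`, `plain`, and `annulled` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical statement log: then-statement k is recorded as 100 + k,
   else-statement k as 200 + k. */
typedef struct {
    int log[32];
    int len;
} Trace;

static void emit(Trace *t, int stmt) {
    assert(t->len < 32);
    t->log[t->len++] = stmt;
}

/* cmp; bnx else; nop; t-stmt1..n; ba next; nop; else: e-stmt1..m; next: */
void plain(int cond, int n, int m, Trace *t) {
    t->len = 0;
    if (cond) {                       /* bnx not taken, nop runs  */
        for (int k = 1; k <= n; k++)  /* t-stmt1 .. t-stmtn       */
            emit(t, 100 + k);
    } else {                          /* bnx taken, nop runs      */
        for (int k = 1; k <= m; k++)  /* else: e-stmt1 .. e-stmtm */
            emit(t, 200 + k);
    }
}

/* cmp; bnx,a else; e-stmt1; t-stmt1..n; ba next; nop;
   else: e-stmt2..m; next: */
void annulled(int cond, int n, int m, Trace *t) {
    assert(m >= 1);                   /* e-stmt1 must exist to fill the slot */
    t->len = 0;
    if (!cond) {                      /* bnx,a taken: delay slot runs */
        emit(t, 201);                 /* e-stmt1 in the delay slot    */
        for (int k = 2; k <= m; k++)  /* else: e-stmt2 .. e-stmtm     */
            emit(t, 200 + k);
    } else {                          /* not taken: e-stmt1 annulled  */
        for (int k = 1; k <= n; k++)  /* t-stmt1 .. t-stmtn           */
            emit(t, 100 + k);
    }
}
```

On both paths the two layouts produce the same statement sequence; the annulled layout saves the nop after bnx because e-stmt1 does real work there when the else path is taken.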