CMSC 411
Computer Systems Architecture
Lecture 11
Instruction Level Parallelism (cont.)
Alan Sussman

• Wanli will give lecture on Thursday
• Exam #1
  – answers posted
  – Mean: 67 Median: 66 Standard Dev.: 12.5
• Homework #3 posted, from H&P Chapter 2
  – due March 24
• Read Chapter 3 of H&P
  – but not too deeply
  – there's way too much detail in the experiments/comparisons

CMSC 411 - 11 (from Patterson)

ADDING SPECULATION TO TOMASULO'S ALGORITHM

Reorder Buffer operation
• Holds instructions in FIFO order, exactly as issued
• When instructions complete, results placed into ROB
  – Supplies operands to other instructions between execution complete & commit ⇒ more registers like RS
  – Tag results with ROB buffer number instead of reservation station
• Instructions commit ⇒ values at head of ROB placed in registers
• As a result, easy to undo speculated instructions on mispredicted branches or on exceptions
[Figure: FP Op Queue, Reorder Buffer, FP Regs, Res Stations, FP Adders, and the commit path]

Recall: 4 Steps of Speculative Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination (this stage sometimes called "dispatch")
2. Execution: operate on operands (EX)
   When both operands ready then execute; if not ready, watch CDB for result; when both in reservation station, execute; checks RAW (sometimes called "issue")
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available.
4. Commit: update register with reorder result
   When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (sometimes called "graduation")

Tomasulo With Reorder buffer:
[Figure sequence of nine snapshots. Each shows the FP Op Queue, the Reorder Buffer (entries ROB1 to ROB7 with Oldest and Newest markers and Dest, value, instruction and Done? fields), the registers, the paths to and from memory, and the reservation stations feeding the FP adders and FP multipliers. The recoverable contents of each snapshot are listed below.]

Snapshot 1
  ROB1: Dest F0, LD F0,10(R2), Done? N
  From memory: 1 10+R2

Snapshot 2 (adds)
  ROB2: Dest F10, ADDD F10,F4,F0, Done? N
  Reservation station: 2 ADDD R(F4),ROB1

Snapshot 3 (adds)
  ROB3: Dest F2, DIVD F2,F10,F6, Done? N
  Reservation station: 3 DIVD ROB2,R(F6)

Snapshot 4 (adds)
  ROB4: Dest --, BNE F2,<…>, Done? N
  ROB5: Dest F4, LD F4,0(R3), Done? N
  ROB6: Dest F0, ADDD F0,F4,F6, Done? N
  Reservation station: 6 ADDD ROB5,R(F6)
  From memory: 5 0+R3

Snapshot 5 (adds)
  ROB7: Dest --, value ROB5, ST 0(R3),F4, Done? N

Snapshot 6 (changes)
  ROB5: Dest F4, value M[10], LD F4,0(R3), Done? Y
  ROB7: Dest --, value M[10], ST 0(R3),F4, Done? Y
  Reservation station: 6 ADDD M[10],R(F6)
  From memory: entry 5 no longer listed

Snapshot 7 (changes)
  ROB6: Dest F0, value <val2>, ADDD F0,F4,F6, Done? Ex
  Reservation station 6 no longer listed

Snapshot 8
  ROB1 to ROB4 unchanged (all Done? N)
  ROB5: Dest F4, value M[10], LD F4,0(R3), Done? Y
  ROB6 and ROB7 entries no longer listed

Snapshot 9
  Lists the same entries as snapshot 8 and asks: What about memory hazards???

Avoiding Memory Hazards
• WAW and WAR hazards through memory are eliminated with speculation because actual updating of memory occurs in ...
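
The issue, write-result and commit steps above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design from the slides: the class and method names (`ROBEntry`, `ReorderBuffer`, `issue`, `write_result`, `commit`) are hypothetical, reservation stations and the Common Data Bus are left out, and ROB tags simply count upward instead of wrapping around a circular buffer. It shows the two properties the slides rely on: registers and memory are updated only when an instruction reaches the head of the ROB, and a mispredicted branch at the head flushes everything issued after it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    tag: int                      # ROB number used to tag the result
    op: str                       # e.g. "LD", "ADDD", "ST", "BNE"
    dest: Optional[str]           # register name, store address, or None for a branch
    value: Optional[float] = None
    done: bool = False            # the "Done?" column in the figures
    mispredicted: bool = False    # only meaningful for branches


class ReorderBuffer:
    """Toy FIFO reorder buffer (assumed size 7, matching ROB1..ROB7)."""

    def __init__(self, size: int = 7):
        self.size = size
        self.entries = deque()    # left end = oldest (head), right end = newest
        self.next_tag = 1
        self.regs = {}            # architectural register file
        self.memory = {}          # architectural memory

    def issue(self, op: str, dest: Optional[str] = None) -> Optional[int]:
        """Step 1: allocate a ROB slot in program order; None means stall."""
        if len(self.entries) == self.size:
            return None
        entry = ROBEntry(self.next_tag, op, dest)
        self.next_tag += 1
        self.entries.append(entry)
        return entry.tag

    def write_result(self, tag: int, value: Optional[float] = None,
                     mispredicted: bool = False) -> None:
        """Step 3: record a finished result in the ROB, not in the registers."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.value = value
                entry.done = True
                entry.mispredicted = mispredicted
                return
        raise KeyError(tag)

    def commit(self) -> Optional[ROBEntry]:
        """Step 4: retire the head entry if its result is present."""
        if not self.entries or not self.entries[0].done:
            return None
        head = self.entries.popleft()
        if head.op == "BNE":
            if head.mispredicted:
                self.entries.clear()              # flush speculated instructions
        elif head.op == "ST":
            self.memory[head.dest] = head.value   # memory written only at commit
        elif head.dest is not None:
            self.regs[head.dest] = head.value
        return head


# Usage: a store finishes first but never reaches memory, because the
# branch ahead of it turns out to be mispredicted.
rob = ReorderBuffer()
ld = rob.issue("LD", "F0")
add = rob.issue("ADDD", "F10")
br = rob.issue("BNE")
st = rob.issue("ST", "0(R3)")

rob.write_result(st, 3.0)
stalled = rob.commit()            # None: the LD at the head is not done yet
rob.write_result(ld, 1.0)
rob.write_result(add, 2.0)
rob.write_result(br, mispredicted=True)
while rob.commit() is not None:
    pass
```

Because the store is still in the ROB when the branch commits, the flush discards it and `rob.memory` stays empty, while the two older instructions update `F0` and `F10` in program order.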