UMD CMSC 411 - Lecture 11 Instruction Level Parallelism

CMSC 411 Computer Systems Architecture
Lecture 11
Instruction Level Parallelism (cont.)
Alan Sussman
[email protected]

• Wanli will give lecture on Thursday
• Exam #1 – answers posted
  – Mean: 67 Median: 66 Standard Dev.: 12.5
• Homework #3 posted, from H&P Chapter 2
  – due March 24
• Read Chapter 3 of H&P
  – but not too deeply – there's way too much detail in the experiments/comparisons

CMSC 411 - 11 (from Patterson)

ADDING SPECULATION TO TOMASULO'S ALGORITHM

Reorder Buffer operation
• Holds instructions in FIFO order, exactly as issued
• When instructions complete, results placed into ROB
  – Supplies operands to other instructions between execution complete & commit ⇒ more registers like RS
  – Tag results with ROB buffer number instead of reservation station
• Instructions commit ⇒ values at head of ROB placed in registers
• As a result, easy to undo speculated instructions on mispredicted branches or on exceptions

[Figure: FP Op Queue, Reorder Buffer, FP Regs, Res Stations, FP Adders, Commit path]

Recall: 4 Steps of Speculative Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination (this stage sometimes called "dispatch")
2. Execution: operate on operands (EX)
   When both operands ready then execute; if not ready, watch CDB for result; when both in reservation station, execute; checks RAW (sometimes called "issue")
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available.
4. Commit: update register with reorder result
   When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (sometimes called "graduation")

Tomasulo With Reorder buffer:
[Figure: FP Op Queue, Reorder Buffer (ROB7 newest ... ROB1 oldest), Registers, To Memory, from Memory, Reservation Stations, FP adders, FP multipliers]

  Entry  Dest  Value  Instruction      Done?
  ROB1   F0           LD F0,10(R2)     N

  Reservation stations (Dest, contents)
  from Memory:     1  10+R2

Tomasulo With Reorder buffer:

  Entry  Dest  Value  Instruction      Done?
  ROB2   F10          ADDD F10,F4,F0   N
  ROB1   F0           LD F0,10(R2)     N

  Reservation stations (Dest, contents)
  FP adders:       2  ADDD R(F4),ROB1
  from Memory:     1  10+R2

Tomasulo With Reorder buffer:

  Entry  Dest  Value  Instruction      Done?
  ROB3   F2           DIVD F2,F10,F6   N
  ROB2   F10          ADDD F10,F4,F0   N
  ROB1   F0           LD F0,10(R2)     N

  Reservation stations (Dest, contents)
  FP adders:       2  ADDD R(F4),ROB1
  FP multipliers:  3  DIVD ROB2,R(F6)
  from Memory:     1  10+R2

Tomasulo With Reorder buffer:

  Entry  Dest  Value  Instruction      Done?
  ROB6   F0           ADDD F0,F4,F6    N
  ROB5   F4           LD F4,0(R3)      N
  ROB4   --           BNE F2,<…>       N
  ROB3   F2           DIVD F2,F10,F6   N
  ROB2   F10          ADDD F10,F4,F0   N
  ROB1   F0           LD F0,10(R2)     N

  Reservation stations (Dest, contents)
  FP adders:       2  ADDD R(F4),ROB1
                   6  ADDD ROB5,R(F6)
  FP multipliers:  3  DIVD ROB2,R(F6)
  from Memory:     1  10+R2
                   5  0+R3

Tomasulo With Reorder buffer:

  Entry  Dest  Value  Instruction      Done?
  ROB7   --    ROB5   ST 0(R3),F4      N
  ROB6   F0           ADDD F0,F4,F6    N
  ROB5   F4           LD F4,0(R3)      N
  ROB4   --           BNE F2,<…>       N
  ROB3   F2           DIVD F2,F10,F6   N
  ROB2   F10          ADDD F10,F4,F0   N
  ROB1   F0           LD F0,10(R2)     N

  Reservation stations (Dest, contents)
  FP adders:       2  ADDD R(F4),ROB1
                   6  ADDD ROB5,R(F6)
  FP multipliers:  3  DIVD ROB2,R(F6)
  from Memory:     1  10+R2
                   5  0+R3

Tomasulo With Reorder buffer:

  Entry  Dest  Value  Instruction      Done?
  ROB7   --    M[10]  ST 0(R3),F4      Y
  ROB6   F0           ADDD F0,F4,F6    N
  ROB5   F4    M[10]  LD F4,0(R3)      Y
  ROB4   --           BNE F2,<…>       N
  ROB3   F2           DIVD F2,F10,F6   N
  ROB2   F10          ADDD F10,F4,F0   N
  ROB1   F0           LD F0,10(R2)     N

  Reservation stations (Dest, contents)
  FP adders:       2  ADDD R(F4),ROB1
                   6  ADDD M[10],R(F6)
  FP multipliers:  3  DIVD ROB2,R(F6)
  from Memory:     1  10+R2

Tomasulo With Reorder buffer:

  Entry  Dest  Value   Instruction      Done?
  ROB7   --    M[10]   ST 0(R3),F4      Y
  ROB6   F0    <val2>  ADDD F0,F4,F6    Ex
  ROB5   F4    M[10]   LD F4,0(R3)      Y
  ROB4   --            BNE F2,<…>       N
  ROB3   F2            DIVD F2,F10,F6   N
  ROB2   F10           ADDD F10,F4,F0   N
  ROB1   F0            LD F0,10(R2)     N

  Reservation stations (Dest, contents)
  FP adders:       2  ADDD R(F4),ROB1
  FP multipliers:  3  DIVD ROB2,R(F6)
  from Memory:     1  10+R2

Tomasulo With Reorder buffer:

  Entry  Dest  Value  Instruction      Done?
  ROB5   F4    M[10]  LD F4,0(R3)      Y
  ROB4   --           BNE F2,<…>       N
  ROB3   F2           DIVD F2,F10,F6   N
  ROB2   F10          ADDD F10,F4,F0   N
  ROB1   F0           LD F0,10(R2)     N

  Reservation stations (Dest, contents)
  FP adders:       2  ADDD R(F4),ROB1
  FP multipliers:  3  DIVD ROB2,R(F6)
  from Memory:     1  10+R2

  What about memory hazards???

Avoiding Memory Hazards
• WAW and WAR hazards through memory are eliminated with speculation because actual updating of memory occurs in ...
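
The reorder buffer behavior described in the slides (FIFO issue, out-of-order write result, in-order commit, flush on a mispredicted branch) can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the lecture's implementation: the names RobEntry and ReorderBuffer, the three op categories ("reg", "store", "branch"), and the ever-increasing tags (real ROB slot numbers wrap around) are all hypothetical, and reservation stations and the CDB are not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    tag: int                       # ROB number, used to tag the result
    op: str                        # "reg", "store", or "branch" (hypothetical categories)
    dest: Optional[str] = None     # register name, or memory address for a store
    value: Optional[float] = None  # result, held here until commit
    done: bool = False             # the "Done?" column in the slides
    mispredicted: bool = False     # only meaningful for branches


class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()     # head (oldest) on the left, newest on the right
        self.next_tag = 1          # simplification: tags never wrap around

    def issue(self, op, dest=None):
        """Step 1: allocate a slot in FIFO order; return its tag, or None if full."""
        if len(self.entries) == self.size:
            return None
        entry = RobEntry(self.next_tag, op, dest)
        self.next_tag += 1
        self.entries.append(entry)
        return entry.tag

    def write_result(self, tag, value=None, mispredicted=False):
        """Step 3: record a result in the ROB; architectural state is untouched."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.value = value
                entry.done = True
                entry.mispredicted = mispredicted
                return
        raise KeyError(tag)

    def commit(self, regs, memory):
        """Step 4: retire the head entry if it is done; return it, else None."""
        if not self.entries or not self.entries[0].done:
            return None
        head = self.entries.popleft()
        if head.op == "branch":
            if head.mispredicted:
                self.entries.clear()          # flush all younger, speculative entries
        elif head.op == "store":
            memory[head.dest] = head.value    # memory is updated only at commit
        else:
            regs[head.dest] = head.value      # registers are updated only at commit
        return head


# Usage: an instruction after a branch completes early but never commits.
rob = ReorderBuffer(size=7)
regs, memory = {}, {}
ld = rob.issue("reg", "F0")
add = rob.issue("reg", "F10")
br = rob.issue("branch")
spec = rob.issue("reg", "F4")                 # issued speculatively after the branch

rob.write_result(spec, 10.0)                  # finishes first, out of order
stalled = rob.commit(regs, memory)            # head (ld) is not done, so nothing retires
rob.write_result(ld, 1.0)
rob.write_result(add, 2.0)
rob.write_result(br, mispredicted=True)
while rob.commit(regs, memory) is not None:
    pass
print(regs)
```

In this run the speculative write to F4 never reaches the register file, because its entry is discarded when the mispredicted branch reaches the head of the buffer. Keeping results in the buffer until commit is what makes that undo a simple flush.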

