UMD CMSC 411 - Lecture 10 Instruction Level Parallelism

This preview shows page 1-2-3 out of 10 pages.


CMSC 411
Computer Systems Architecture
Lecture 10
Instruction Level Parallelism (cont.)
Alan Sussman
[email protected]
CMSC 411 - 10 (from Patterson)

• Finish reading Chapter 2 of H&P
• First exam scheduled for this Thursday
  - on Units 1-3
  - Wanli will be giving it

Outline
• ILP
• Compiler techniques to increase ILP
• Loop Unrolling
• Static Branch Prediction
• Dynamic Branch Prediction
• Overcoming Data Hazards with Dynamic Scheduling
• Tomasulo Algorithm
• Conclusion

Tomasulo Organization
[Figure, from H&P Figure 2.9: FP Op Queue and FP Registers feed the Reservation Stations (Add1-Add3 for the FP adders, Mult1-Mult2 for the FP multipliers); Load Buffers (Load1-Load6) receive data From Mem; Store Buffers send data To Mem; results are broadcast on the Common Data Bus (CDB).]

Reservation Station Components
Op: Operation to perform in the unit (e.g., + or -)
Vj, Vk: Value of source operands
  - Store buffers have V field, result to be stored
Qj, Qk: Reservation stations producing source registers (value to be written)
  - Note: Qj,Qk=0 => ready
  - Store buffers only have Qi for RS producing result
Busy: Indicates reservation station or FU is busy
Also: Register result status table: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register.

Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If reservation station free (no structural hazard), control issues instruction & sends operands (renames registers).
2. Execute: operate on operands (EX)
   When both operands ready then execute; if not ready, watch Common Data Bus for result
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting units; mark reservation station available
• Normal data bus: data + destination ("go to" bus)
• Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of Functional Unit source address
  - Write if matches expected Functional Unit (produces result)
  - Does the broadcast

Tomasulo Example

Instruction status (instruction stream):
Instruction  j    k    Issue  Exec Comp  Write Result
LD     F6    34+  R2
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Load buffers (3 load buffers):
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations (3 FP Adder R.S., 2 FP Mult R.S.; Time is the FU countdown; Vj, Vk are sources S1, S2; Qj, Qk are RS tags):
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status (Clock is the clock cycle counter):
Clock: 0
     F0     F2     F4     F6     F8     F10    F12    ...  F30
FU

Tomasulo Example Cycle 1

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD     F6    34+  R2   1
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock: 1
     F0     F2     F4     F6     F8     F10    F12    ...  F30
FU                        Load1

Tomasulo Example Cycle 2

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD     F6    34+  R2   1
LD     F2    45+  R3   2
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock: 2
     F0     F2     F4     F6     F8     F10    F12    ...  F30
FU          Load2         Load1

Note: Can have multiple loads outstanding

Tomasulo Example Cycle 3

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD     F6    34+  R2   1      3
LD     F2    45+  R3   2
MULTD  F0    F2   F4   3
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   MULTD         R(F4)  Load2
      Mult2  No

Register result status:
Clock: 3
     F0     F2     F4     F6     F8     F10    F12    ...  F30
FU   Mult1  Load2         Load1

• Note: register names are removed ("renamed") in Reservation Stations; MULT issued
• Load1 completing; what is waiting for Load1?

Tomasulo Example Cycle 4

M(A1): result of first load

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD     F6    34+  R2   1      3          4
LD     F2    45+  R3   2      4
MULTD  F0    F2   F4   3
SUBD   F8    F6   F2   4
DIVD   F10   F0   F6
ADDD   F6    F8   F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   Yes   SUBD   M(A1)                Load2
      Add2   No
      Add3   No
      Mult1  Yes   MULTD         R(F4)  Load2
      Mult2  No

Register result status:
Clock: 4
     F0     F2     F4     F6     F8     F10    F12    ...  F30
FU   Mult1  Load2         M(A1)  Add1

• Load2 completing; what is waiting for Load2?

Tomasulo Example Cycle 5

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD     F6    34+  R2   1      3          4
LD     F2    45+  R3   2      4          5
MULTD  F0    F2   F4   3
SUBD   F8    F6   F2   4
DIVD   F10   F0   F6   5
ADDD   F6    F8   F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
2     Add1   Yes   SUBD   M(A1)  M(A2)
      Add2   No
      Add3   No
10    Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
Clock: 5
     F0     F2     F4     F6     F8     F10    F12    ...  F30
FU   Mult1  M(A2)         M(A1)  Add1   Mult2

• Timer starts down for Add1, Mult1

Tomasulo Example Cycle 6

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD     F6    34+  R2   1      3          4
LD     F2    45+  R3   2      4          5
MULTD  F0    F2   F4   3
SUBD   F8    F6   F2   4
DIVD   F10   F0   F6   5
ADDD   F6    F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
1     Add1   Yes   SUBD   M(A1)  M(A2)
      Add2   Yes   ADDD          M(A2)  Add1
      Add3   No
9     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
Clock: 6
     F0     F2     F4     F6     F8     F10    F12    ...  F30
FU   Mult1  M(A2)         Add2   Add1   Mult2

• Issue ADDD here despite name dependency on F6?

Tomasulo Example Cycle 7

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result
LD     F6    34+  R2   1      3          4
LD     F2    45+  R3   2      4          5
MULTD  F0    F2   F4   3
SUBD   F8    F6   F2   4      7
DIVD   F10   F0   F6   5
ADDD   F6    F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   Yes   SUBD   M(A1)  M(A2)
      Add2   Yes   ADDD          M(A2)  Add1
      Add3   No
8     Mult1  Yes   MULTD  M(A2)
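
The reservation station fields and the issue / write-result steps from the slides can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the lecture's or H&P's implementation: the names `Station`, `TomasuloSketch`, and `run_slide_example` are hypothetical, values are symbolic strings such as M(A1) and R(F4), load buffers are modeled as stations with no source operands, and execution latency (the Time countdown) is not modeled at all.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Station:
    """One reservation station or load buffer entry (Busy, Op, Vj, Vk, Qj, Qk)."""
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None  # symbolic value of source 1, once available
    vk: Optional[str] = None  # symbolic value of source 2, once available
    qj: Optional[str] = None  # station that will produce source 1; None = ready
    qk: Optional[str] = None  # station that will produce source 2; None = ready


class TomasuloSketch:
    """Hypothetical helper: tracks renaming state only, with no timing."""

    def __init__(self, names):
        self.rs: Dict[str, Station] = {n: Station() for n in names}
        self.reg_status: Dict[str, str] = {}  # register -> station that will write it
        self.regs: Dict[str, str] = {}        # register -> symbolic value already written

    def _read(self, reg: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
        """Return (value, tag); tag is set when a pending instruction will write reg."""
        if reg is None:
            return None, None
        if reg in self.reg_status:
            return None, self.reg_status[reg]
        return self.regs.get(reg, f"R({reg})"), None

    def issue(self, name, op, dest, src1=None, src2=None) -> bool:
        """Issue: fails on a structural hazard, otherwise copies values or tags."""
        if self.rs[name].busy:
            return False
        vj, qj = self._read(src1)
        vk, qk = self._read(src2)
        self.rs[name] = Station(True, op, vj, vk, qj, qk)
        self.reg_status[dest] = name  # rename: dest now refers to this station
        return True

    def ready(self, name) -> bool:
        """Execute may start once both Q fields are clear."""
        st = self.rs[name]
        return st.busy and st.qj is None and st.qk is None

    def write_result(self, name, value) -> None:
        """Write result: broadcast (source tag, value) on the CDB, free the station."""
        for st in self.rs.values():
            if st.qj == name:
                st.vj, st.qj = value, None
            if st.qk == name:
                st.vk, st.qk = value, None
        for reg in [r for r, tag in self.reg_status.items() if tag == name]:
            self.regs[reg] = value
            del self.reg_status[reg]
        self.rs[name] = Station()


def run_slide_example() -> TomasuloSketch:
    """Replay the issue and load write-back order of the example up to cycle 6."""
    sim = TomasuloSketch(["Load1", "Load2", "Load3",
                          "Add1", "Add2", "Add3", "Mult1", "Mult2"])
    sim.issue("Load1", "LD", "F6")
    sim.issue("Load2", "LD", "F2")
    sim.issue("Mult1", "MULTD", "F0", "F2", "F4")
    sim.write_result("Load1", "M(A1)")
    sim.issue("Add1", "SUBD", "F8", "F6", "F2")
    sim.write_result("Load2", "M(A2)")
    sim.issue("Mult2", "DIVD", "F10", "F0", "F6")
    sim.issue("Add2", "ADDD", "F6", "F8", "F2")
    return sim
```

Calling `run_slide_example()` leaves Add1 and Mult1 with both operands available, Mult2 waiting on the Mult1 tag, Add2 waiting on the Add1 tag, and F6 renamed to Add2, which corresponds to the reservation station and register result status tables shown for cycle 6. Because the register status records a tag rather than a register name, the later ADDD can claim F6 without disturbing the M(A1) value that DIVD already captured.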

