CMSC 411
Computer Systems Architecture
Lecture 10
Instruction Level Parallelism (cont.)
Alan Sussman

• Finish reading Chapter 2 of H&P
• First exam scheduled for this Thursday
  – on Units 1-3
  – Wanli will be giving it

CMSC 411 - 10 (from Patterson)

Outline
• ILP
• Compiler techniques to increase ILP
• Loop Unrolling
• Static Branch Prediction
• Dynamic Branch Prediction
• Overcoming Data Hazards with Dynamic Scheduling
• Tomasulo Algorithm
• Conclusion

Tomasulo Organization
[Figure, from H&P Figure 2.9. Labels: From Mem, FP Registers, FP Op Queue, Load Buffers (Load1 to Load6), Store Buffers, To Mem, Reservation Stations (Add1, Add2, Add3, Mult1, Mult2), FP adders, FP multipliers, Common Data Bus (CDB)]

Reservation Station Components
Op: Operation to perform in the unit (e.g., + or –)
Vj, Vk: Value of source operands
  – Store buffers have V field, result to be stored
Qj, Qk: Reservation stations producing source registers (value to be written)
  – Note: Qj,Qk=0 => ready
  – Store buffers only have Qi for RS producing result
Busy: Indicates reservation station or FU is busy
Also: Register result status table: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register.

Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).
2. Execute: operate on operands (EX)
   When both operands ready then execute; if not ready, watch Common Data Bus for result
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting units; mark reservation station available
• Normal data bus: data + destination (“go to” bus)
• Common data bus: data + source (“come from” bus)
  – 64 bits of data + 4 bits of Functional Unit source address
  – Write if matches expected Functional Unit (produces result)
  – Does the broadcast

Tomasulo Example

Instruction status:
Instruction  j     k     Issue  Exec Comp  Write Result
LD F6        34+   R2
LD F2        45+   R3
MULTD F0     F2    F4
SUBD F8      F6    F2
DIVD F10     F0    F6
ADDD F6      F8    F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
0      FU

• Instruction status lists the instruction stream
• 3 Load/Buffers
• 3 FP Adder R.S., 2 FP Mult R.S.
• Time: FU countdown
• Clock: clock cycle counter

Tomasulo Example Cycle 1

Instruction status:
Instruction  j     k     Issue  Exec Comp  Write Result
LD F6        34+   R2    1
LD F2        45+   R3
MULTD F0     F2    F4
SUBD F8      F6    F2
DIVD F10     F0    F6
ADDD F6      F8    F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  No
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
1      FU                       Load1

Tomasulo Example Cycle 2

Instruction status:
Instruction  j     k     Issue  Exec Comp  Write Result
LD F6        34+   R2    1
LD F2        45+   R3    2
MULTD F0     F2    F4
SUBD F8      F6    F2
DIVD F10     F0    F6
ADDD F6      F8    F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
2      FU         Load2         Load1

Note: Can have multiple loads outstanding

Tomasulo Example Cycle 3

Instruction status:
Instruction  j     k     Issue  Exec Comp  Write Result
LD F6        34+   R2    1      3
LD F2        45+   R3    2
MULTD F0     F2    F4    3
SUBD F8      F6    F2
DIVD F10     F0    F6
ADDD F6      F8    F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   MULTD         R(F4)  Load2
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
3      FU  Mult1  Load2         Load1

• Note: register names are removed (“renamed”) in Reservation Stations; MULT issued
• Load1 completing; what is waiting for Load1?

Tomasulo Example Cycle 4

Instruction status:
Instruction  j     k     Issue  Exec Comp  Write Result
LD F6        34+   R2    1      3          4
LD F2        45+   R3    2      4
MULTD F0     F2    F4    3
SUBD F8      F6    F2    4
DIVD F10     F0    F6
ADDD F6      F8    F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   Yes   SUBD   M(A1)                Load2
      Add2   No
      Add3   No
      Mult1  Yes   MULTD         R(F4)  Load2
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
4      FU  Mult1  Load2         M(A1)  Add1

• M(A1): result of first load…
• Load2 completing; what is waiting for Load2?

Tomasulo Example Cycle 5

Instruction status:
Instruction  j     k     Issue  Exec Comp  Write Result
LD F6        34+   R2    1      3          4
LD F2        45+   R3    2      4          5
MULTD F0     F2    F4    3
SUBD F8      F6    F2    4
DIVD F10     F0    F6    5
ADDD F6      F8    F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
2     Add1   Yes   SUBD   M(A1)  M(A2)
      Add2   No
      Add3   No
10    Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
5      FU  Mult1  M(A2)         M(A1)  Add1   Mult2

• Timer starts down for Add1, Mult1

Tomasulo Example Cycle 6

Instruction status:
Instruction  j     k     Issue  Exec Comp  Write Result
LD F6        34+   R2    1      3          4
LD F2        45+   R3    2      4          5
MULTD F0     F2    F4    3
SUBD F8      F6    F2    4
DIVD F10     F0    F6    5
ADDD F6      F8    F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
1     Add1   Yes   SUBD   M(A1)  M(A2)
      Add2   Yes   ADDD          M(A2)  Add1
      Add3   No
9     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12  ...  F30
6      FU  Mult1  M(A2)         Add2   Add1   Mult2

• Issue ADDD here despite name dependency on F6?

Tomasulo Example Cycle 7

Instruction status:
Instruction  j     k     Issue  Exec Comp  Write Result
LD F6        34+   R2    1      3          4
LD F2        45+   R3    2      4          5
MULTD F0     F2    F4    3
SUBD F8      F6    F2    4      7
DIVD F10     F0    F6    5
ADDD F6      F8    F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   Yes   SUBD   M(A1)  M(A2)
      Add2   Yes   ADDD          M(A2)  Add1
      Add3   No
8     Mult1  Yes   MULTD  M(A2)
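
The issue and write-result bookkeeping traced in cycles 1 to 6 can be sketched in a few lines of Python. This is only an illustrative model, not the lecture's or H&P's implementation: the names `Station`, `Tomasulo`, `issue_load`, `issue`, and `broadcast` are hypothetical, load buffers are represented only by their names, operand values are symbolic strings such as `R(F4)` and `M(A1)`, and execution latency (the Time column) is ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Station:
    """One reservation station, with the fields listed on the slide."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None  # value of first source operand, once known
    vk: Optional[str] = None  # value of second source operand, once known
    qj: Optional[str] = None  # unit that will produce vj (None means ready)
    qk: Optional[str] = None  # unit that will produce vk (None means ready)


class Tomasulo:
    """Hypothetical model of the Issue and Write-result steps only."""

    def __init__(self, station_names: List[str]) -> None:
        self.stations: Dict[str, Station] = {n: Station(n) for n in station_names}
        self.reg_status: Dict[str, str] = {}  # register -> unit that will write it
        self.reg_value: Dict[str, str] = {}   # register -> symbolic value written

    def _operand(self, reg: str) -> Tuple[Optional[str], Optional[str]]:
        """Return (value, producer) for a source register; one of them is None."""
        if reg in self.reg_status:
            return None, self.reg_status[reg]
        return self.reg_value.get(reg, f"R({reg})"), None

    def issue_load(self, buffer: str, dest: str) -> None:
        """Record that a load buffer will write dest."""
        self.reg_status[dest] = buffer

    def issue(self, kind: str, op: str, dest: str, src1: str, src2: str) -> Optional[str]:
        """Issue to the first free station whose name starts with kind."""
        for rs in self.stations.values():
            if rs.name.startswith(kind) and not rs.busy:
                rs.busy, rs.op = True, op
                rs.vj, rs.qj = self._operand(src1)
                rs.vk, rs.qk = self._operand(src2)
                self.reg_status[dest] = rs.name  # rename: dest now means this station
                return rs.name
        return None  # no free station: structural hazard, instruction stalls

    def broadcast(self, source: str, value: str) -> None:
        """Common Data Bus write: data plus the name of the producing unit."""
        for rs in self.stations.values():
            if rs.qj == source:
                rs.vj, rs.qj = value, None
            if rs.qk == source:
                rs.vk, rs.qk = value, None
        # Only registers still waiting on this unit take the value.
        for reg in [r for r, unit in self.reg_status.items() if unit == source]:
            self.reg_value[reg] = value
            del self.reg_status[reg]
        if source in self.stations:
            self.stations[source] = Station(source)  # mark station available


# Replay of the issue and load write-back events from cycles 1 to 6.
t = Tomasulo(["Add1", "Add2", "Add3", "Mult1", "Mult2"])
t.issue_load("Load1", "F6")                   # cycle 1
t.issue_load("Load2", "F2")                   # cycle 2
t.issue("Mult", "MULTD", "F0", "F2", "F4")    # cycle 3
t.broadcast("Load1", "M(A1)")                 # cycle 4: Load1 writes result
t.issue("Add", "SUBD", "F8", "F6", "F2")      # cycle 4
t.broadcast("Load2", "M(A2)")                 # cycle 5: Load2 writes result
t.issue("Mult", "DIVD", "F10", "F0", "F6")    # cycle 5
t.issue("Add", "ADDD", "F6", "F8", "F2")      # cycle 6
print(t.stations["Add2"], t.reg_status)
```

In this sketch ADDD can issue in cycle 6 even though DIVD still reads F6, because DIVD already copied M(A1) into its own Vk field when it issued; the later write to F6 cannot disturb it.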