
CPE 631 Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation

Aleksandar Milenković (milenka@ece.uah.edu)
Electrical and Computer Engineering, University of Alabama in Huntsville
UAH CPE631, 14 01 19

Techniques to exploit parallelism

Technique                                  | Section in the textbook | Reduces
-------------------------------------------+-------------------------+-----------------------------------
Forwarding and bypassing                   | A.2                     | Data hazard (DH) stalls
Delayed branches                           | A.2                     | Control hazard (CH) stalls
Basic dynamic scheduling                   | A.8                     | DH stalls (RAW)
Dynamic scheduling with register renaming  | 3.2                     | WAR and WAW stalls
Dynamic branch prediction                  | 3.4                     | CH stalls
Issuing multiple instructions per cycle    | 3.6                     | Ideal CPI
Speculation                                | 3.7                     | Data and control stalls
Dynamic memory disambiguation              | 3.2, 3.7                | RAW stalls with memory
Loop unrolling                             | 4.1                     | CH stalls
Basic compiler pipeline scheduling         | A.2, 4.1                | DH stalls
Compiler dependence analysis               | 4.4                     | Ideal CPI, DH stalls
Software pipelining and trace scheduling   | 4.3                     | Ideal CPI and DH stalls
Compiler speculation                       | 4.4                     | Ideal CPI, data and control stalls

Tomasulo-based FPU for MIPS

[Figure: FP Op Queue (from the instruction unit); FP registers; load buffers Load1 to Load6 (from memory); store buffers Store1 to Store3 (to memory); reservation stations Add1 to Add3 feeding the FP adders and Mult1, Mult2 feeding the FP multipliers; results are distributed on the Common Data Bus (CDB).]

Reservation Station Components

- Op: operation to perform in the unit (e.g., + or -)
- Vj, Vk: values of the source operands
  - Store buffers have a V field: the result to be stored
- Qj, Qk: reservation stations producing the source registers (the value to be written)
  - Note: Qj, Qk = 0 means the source operand is already available in Vj or Vk
  - Store buffers only have Qi, the RS producing the result
- Busy: indicates that the reservation station or FU is busy

Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.

Three Stages of Tomasulo Algorithm

1. Issue: get an instruction from the FP Op Queue.
   If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renames registers).
2. Execute: operate on the operands (EX).
   When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.
3. Write result: finish execution (WB).
   Write the result on the Common Data Bus to all awaiting units; mark the reservation station available.

- Normal data bus: data + destination ("go to" bus)
- Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of functional unit source address
  - A unit writes the value if the source matches the functional unit it expects (the one producing the result)
  - The CDB does the broadcast
- Example speed: 2 clocks for FP add/subtract, 10 for multiply, 40 clocks for divide

Tomasulo Example

Instruction stream (the Issue, Exec Comp and Write Result columns fill in as the example proceeds):

Instruction    j     k
LD     F6      34+   R2
LD     F2      45+   R3
MULTD  F0      F2    F4
SUBD   F8      F6    F2
DIVD   F10     F0    F6
ADDD   F6      F8    F2

Resources: 3 load buffers (Load1 to Load3; fields Busy, Address), 3 FP adder reservation stations (Add1 to Add3) and 2 FP multiplier reservation stations (Mult1, Mult2), each with fields Time (FU count-down), Name, Busy, Op, Vj, Vk, Qj, Qk. Register result status covers F0, F2, F4, ..., F30, and a clock cycle counter tracks the current cycle.

Cycle 0: nothing issued; all load buffers and reservation stations are free and every register result status entry is blank.

Cycle 1: LD F6 issues (cycle 1).
Load buffers: Load1 busy, address 34+R2; Load2, Load3 free.
Reservation stations: all free.
Register result status: F6 = Load1.

Cycle 2: LD F2 issues (cycle 2).
Load buffers: Load1 busy (34+R2), Load2 busy (45+R3), Load3 free.
Reservation stations: all free.
Register result status: F2 = Load2, F6 = Load1.
Note: can have multiple loads outstanding.
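The Write Result stage can be sketched as a broadcast loop over everything listening to the CDB. The Python below is a minimal software illustration, not a hardware description: Qj/Qk hold the producing tag (None stands in for the slide's 0), and the register result status maps a register to that tag. ReservationStation and broadcast are hypothetical names, and the numeric values are made-up stand-ins for M(A1), R(F4) and M(A2). The usage recreates the moment Load2 writes its result in the example (cycle 5).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReservationStation:
    name: str                    # tag broadcast on the CDB, e.g. "Add1"
    busy: bool = False
    op: str = ""
    vj: Optional[float] = None   # source operand values, valid once the Q field is cleared
    vk: Optional[float] = None
    qj: Optional[str] = None     # tag of the station producing the operand;
    qk: Optional[str] = None     # None stands in for the slide's "0" (operand available)

    def ready(self) -> bool:
        """Execute may begin only when both operands are available."""
        return self.busy and self.qj is None and self.qk is None


def broadcast(tag: str, value: float, stations: List[ReservationStation],
              reg_file: Dict[str, float], reg_status: Dict[str, str]) -> None:
    """Write Result: send (source tag, data) on the CDB to every waiting unit."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
        if rs.name == tag:
            rs.busy = False          # mark the producing station available
    for reg in [r for r, producer in reg_status.items() if producer == tag]:
        reg_file[reg] = value        # register still expects this producer
        del reg_status[reg]          # blank entry: no pending writer


# State just before Load2 writes its result (hypothetical operand values).
m_a1, r_f4, m_a2 = 1.5, 4.0, 2.0
add1 = ReservationStation("Add1", busy=True, op="SUBD", vj=m_a1, qk="Load2")
mult1 = ReservationStation("Mult1", busy=True, op="MULTD", vk=r_f4, qj="Load2")
stations = [add1, mult1]
reg_file = {"F4": r_f4, "F6": m_a1}
reg_status = {"F0": "Mult1", "F2": "Load2", "F8": "Add1"}

broadcast("Load2", m_a2, stations, reg_file, reg_status)
print(add1.ready(), mult1.ready(), reg_file["F2"], reg_status)
# prints: True True 2.0 {'F0': 'Mult1', 'F8': 'Add1'}
```

A single broadcast makes both SUBD (Add1) and MULTD (Mult1) ready at once, which matches the example starting the Add1 and Mult1 timers together.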
In the cycles below, M(A1) and M(A2) denote the values loaded from memory by Load1 and Load2, and R(F4) is the contents of register F4 read at issue.

Cycle 3: MULTD issues (cycle 3); LD F6 completes execution (cycle 3).
Load buffers: Load1 busy (34+R2), Load2 busy (45+R3), Load3 free.
Reservation stations: Mult1 busy, MULTD, Vk = R(F4), Qj = Load2.
Register result status: F0 = Mult1, F2 = Load2, F6 = Load1.
Note: register names are removed ("renamed") in the reservation stations; MULT issued. Load1 is completing; what is waiting for Load1?

Cycle 4: SUBD issues (cycle 4); LD F6 writes its result (cycle 4); LD F2 completes execution (cycle 4).
Load buffers: Load2 busy (45+R3); Load1 and Load3 free.
Reservation stations: Add1 busy, SUBD, Vj = M(A1), Qk = Load2; Mult1 busy, MULTD, Vk = R(F4), Qj = Load2.
Register result status: F0 = Mult1, F2 = Load2, F6 = M(A1), F8 = Add1.
Note: Load2 is completing; what is waiting for Load2?

Cycle 5: DIVD issues (cycle 5); LD F2 writes its result (cycle 5).
Load buffers: all free.
Reservation stations: Add1 busy, SUBD, Vj = M(A1), Vk = M(A2), Time = 2; Mult1 busy, MULTD, Vj = M(A2), Vk = R(F4), Time = 10; Mult2 busy, DIVD, Vk = M(A1), Qj = Mult1.
Register result status: F0 = Mult1, F2 = M(A2), F6 = M(A1), F8 = Add1, F10 = Mult2.
Note: the timers start counting down for Add1 and Mult1.

Cycle 6: ADDD issues (cycle 6); the Exec Comp and Write Result entries are unchanged from cycle 5.
Load buffers: Load1, Load2, Load3 …
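The cycle-by-cycle walkthrough can be cross-checked with a small timing model. The Python below is a simplified sketch, not the lecture's method: it issues in order at most one instruction per cycle, lets a station be reused only after its result is written, treats the integer base registers (R2, R3) as always ready, and assumes the CDB is never contended. The 2-cycle load latency is inferred from the example timeline rather than stated on the slides, and Instr, simulate, LATENCY and STATIONS are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Latencies in clock cycles. FP add/subtract, multiply and divide follow the
# "Example speed" bullet; the 2-cycle load latency is an assumption inferred from
# the example (LD F6 issues in cycle 1 and completes execution in cycle 3).
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# Buffers / reservation stations usable by each opcode, as in the example:
# 3 load buffers, 3 FP adder stations, 2 FP multiplier stations.
STATIONS = {
    "LD": ["Load1", "Load2", "Load3"],
    "ADDD": ["Add1", "Add2", "Add3"],
    "SUBD": ["Add1", "Add2", "Add3"],
    "MULTD": ["Mult1", "Mult2"],
    "DIVD": ["Mult1", "Mult2"],
}


@dataclass
class Instr:
    op: str
    dest: str
    srcs: Tuple[str, ...]   # FP source registers; integer base registers assumed ready
    station: str = ""
    issue: int = 0
    exec_comp: int = 0
    write: int = 0


def simulate(program: List[Instr]) -> List[Instr]:
    """Fill in the Issue / Exec Comp / Write Result columns (simplified model)."""
    reg_status: Dict[str, Tuple[str, int]] = {}  # reg -> (producing tag, its write cycle)
    busy_until: Dict[str, int] = {}               # station -> cycle its result is written
    cycle = 0
    for ins in program:
        # Issue: in program order, at most one per cycle; stall while no station
        # of the required type is free (a station is reusable after Write Result).
        cycle += 1
        while not any(busy_until.get(s, 0) < cycle for s in STATIONS[ins.op]):
            cycle += 1
        ins.station = next(s for s in STATIONS[ins.op] if busy_until.get(s, 0) < cycle)
        ins.issue = cycle
        # Operands: a source whose producer has already written is read as a value
        # (Vj/Vk); otherwise the station waits for that producer's tag (Qj/Qk).
        ready = cycle
        for reg in ins.srcs:
            if reg in reg_status:
                _tag, written = reg_status[reg]
                ready = max(ready, written)
        # Execute starts the cycle after issue and after the last operand arrives.
        ins.exec_comp = ready + LATENCY[ins.op]
        ins.write = ins.exec_comp + 1   # assumes no CDB contention
        busy_until[ins.station] = ins.write
        # Renaming: readers issued from now on wait on this station's tag; readers
        # already issued keep the value or tag they captured (no WAR/WAW stalls).
        reg_status[ins.dest] = (ins.station, ins.write)
    return program


program = [
    Instr("LD", "F6", ()),               # LD    F6, 34(R2)
    Instr("LD", "F2", ()),               # LD    F2, 45(R3)
    Instr("MULTD", "F0", ("F2", "F4")),
    Instr("SUBD", "F8", ("F6", "F2")),
    Instr("DIVD", "F10", ("F0", "F6")),
    Instr("ADDD", "F6", ("F8", "F2")),
]
for ins in simulate(program):
    print(f"{ins.op:6}{ins.dest:4}{ins.station:6}{ins.issue:4}{ins.exec_comp:4}{ins.write:4}")
```

For cycles 1 to 5 the model reproduces the entries shown above (LD F6 issues in 1, completes in 3 and writes in 4; MULTD, SUBD and DIVD land in Mult1, Add1 and Mult2). Columns beyond that point are only this simplified model's prediction.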