CS152 Computer Architecture and Engineering Lecture 16 Dynamic Scheduling Speculation and ILP March 31 1999 John Kubiatowicz http cs berkeley edu kubitron lecture slides http www inst eecs berkeley edu cs152 3 31 99 UCB Spring 1999 CS152 Kubiatowicz Review Compiler techniques for parallelism Loop unrolling Multiple iterations of loop in software Amortizes loop overhead over several iterations Gives more opportunity for scheduling around stalls Software Pipelining Take one instruction from each of several iterations of the loop Software overlapping of loop iterations Today will show hardware overlapping of loop iterations Very Long Instruction Word machines VLIW Multiple operations coded in single long instruction Requires sophisticated compiler to decide which operations can be done in parallel Trace scheduling find common path and schedule code as if branches didn t exist add fixup code All of these require additional registers 3 31 99 UCB Spring 1999 CS152 Kubiatowicz Review Dynamic hardware for out of order execution HW exploitation of ILP Works when can t know dependence at compile time Code for one machine runs well on another Scoreboard ala CDC 6600 in 1963 Centralized control structure No register renaming no forwarding Pipeline stalls for WAR and WAW hazards Are these fundamental limitations No Reservation stations ala IBM 360 91 in 1966 Distributed control structures Implicit renaming of registers dispatched pointers WAR and WAW hazards eliminated by register renaming Results broadcast to all reservation stations for RAW Reservation stations are like additional registers 3 31 99 UCB Spring 1999 CS152 Kubiatowicz The Big Picture Where are We Now The Five Classic Components of a Computer Processor Input Control Memory Datapath Output Today s Topics Recap last lecture Hardware loop unrolling with Tomasulo algorithm Administrivia Speculation branch prediction Reorder buffers 3 31 99 UCB Spring 1999 CS152 Kubiatowicz Three Stages of Tomasulo Algorithm 1 Issue get instruction from FP Op Queue If reservation station free no structural hazard control issues instr sends operands renames registers 2 Execution operate on operands EX When both operands ready then execute if not ready watch Common Data Bus for result 3 Write result finish execution WB Write on Common Data Bus to all awaiting units mark reservation station available Normal data bus data destination go to bus Common data bus data source come from bus 64 bits of data 4 bits of Functional Unit source address Write if matches expected Functional Unit produces result Does the broadcast 3 31 99 UCB Spring 1999 CS152 Kubiatowicz Tomasulo Organization FP Registers From Mem FP Op Queue Load Buffers Load1 Load2 Load3 Load4 Load5 Load6 Store Buffers Add1 Add2 Add3 Mult1 Mult2 Reservation Stations FP FP FPadders adders FPmultipliers multipliers To Mem Common Data Bus CDB 3 31 99 UCB Spring 1999 CS152 Kubiatowicz Tomasulo Loop Example Loop MULTD SD F4 SUBI BNEZ LD F4 0 R1 R1 F0 0 F0 F2 R1 R1 8 Loop R1 Assume Multiply takes 4 clocks Assume first load takes 8 clocks cache miss second load takes 1 clock hit To be clear will show clocks for SUBI BNEZ Reality integer instructions ahead 3 31 99 UCB Spring 1999 CS152 Kubiatowicz Loop Example Instruction status ITER Instruction 1 1 1 2 2 2 LD MULTD SD LD MULTD SD F0 F4 F4 F0 F4 F4 j k 0 F0 0 0 F0 0 R1 F2 R1 R1 F2 R1 Reservation Stations Time Name Busy Add1 No Add2 No Add3 No Mult1 No Mult2 No Op Vj Exec Write Issue CompResult S1 Vk S2 Qj RS Qk Busy Addr Load1 Load2 Load3 Store1 Store2 Store3 No No No No No No Code LD MULTD SD SUBI BNEZ F0 F4 F4 R1 R1 Fu 0 F0 0 R1 Loop R1 F2 R1 8 F30 Register result status Clock 0 3 31 99 F0 R1 80 F2 F4 F6 F8 F10 F12 Fu UCB Spring 1999 CS152 Kubiatowicz Loop Example Cycle 1 Instruction status ITER Instruction 1 1 1 2 2 2 LD MULTD SD LD MULTD SD F0 F4 F4 F0 F4 F4 j k 0 F0 0 0 F0 0 R1 F2 R1 R1 F2 R1 Reservation Stations Time Name Busy Add1 No Add2 No Add3 No Mult1 No Mult2 No Op Vj Exec Write Issue CompResult 1 S1 Vk S2 Qj RS Qk Busy Addr Fu Load1 Load2 Load3 Store1 Store2 Store3 Yes No No No No No 80 Code LD MULTD SD SUBI BNEZ F0 F4 F4 R1 R1 0 F0 0 R1 Loop R1 F2 R1 8 F30 Register result status Clock 1 3 31 99 R1 80 F0 F2 F4 F6 F8 F10 F12 Fu Load1 UCB Spring 1999 CS152 Kubiatowicz Loop Example Cycle 2 Instruction status ITER Instruction 1 1 1 2 2 2 LD MULTD SD LD MULTD SD F0 F4 F4 F0 F4 F4 j k 0 F0 0 0 F0 0 R1 F2 R1 R1 F2 R1 Reservation Stations Time Name Busy Op Add1 No Add2 No Add3 No Mult1 Yes Multd Mult2 No Vj Exec Write Issue CompResult 1 2 S1 Vk S2 Qj RS Qk R F2 Load1 Busy Addr Fu Load1 Load2 Load3 Store1 Store2 Store3 Yes No No No No No 80 Code LD MULTD SD SUBI BNEZ F0 F4 F4 R1 R1 0 F0 0 R1 Loop R1 F2 R1 8 F30 Register result status Clock 2 3 31 99 R1 80 F0 Fu Load1 F2 F4 F6 F8 F10 F12 Mult1 UCB Spring 1999 CS152 Kubiatowicz Loop Example Cycle 3 Instruction status ITER Instruction 1 1 1 2 2 2 LD MULTD SD LD MULTD SD F0 F4 F4 F0 F4 F4 j k 0 F0 0 0 F0 0 R1 F2 R1 R1 F2 R1 Reservation Stations Time Name Busy Op Add1 No Add2 No Add3 No Mult1 Yes Multd Mult2 No Vj Exec Write Issue CompResult 1 2 3 S1 Vk S2 Qj RS Qk R F2 Load1 Busy Addr Fu Load1 Load2 Load3 Store1 Store2 Store3 Yes No No Yes No No 80 80 Mult1 Code LD MULTD SD SUBI BNEZ F0 F4 F4 R1 R1 0 F0 0 R1 Loop R1 F2 R1 8 F30 Register result status Clock 3 R1 80 F0 Fu Load1 F2 F4 F6 F8 F10 F12 Mult1 Implicit renaming sets UCB up Spring DataFlow graph 1999 3 31 99 CS152 Kubiatowicz What does this mean physically From Mem FP Op Queue Load Buffers FP Registers F0 F0 Load Load11 F4 F4 Mult1 Mult1 Load1 addr addr 80 80 Load2 Load3 Load4 Load5 Load6 Add1 Add2 Add3 Store Buffer s Addr 80 Mult1 Addr 80 Mult1 Mult1 mul R F2 Mult2 Load1 Reservation Stations FP FP FPadders adders FPmultipliers multipliers To Mem Common Data Bus CDB 3 31 99 UCB Spring 1999 CS152 Kubiatowicz Loop Example Cycle 4 Instruction status ITER Instruction 1 1 1 2 2 2 LD MULTD SD LD MULTD SD F0 F4 F4 F0 F4 F4 j k 0 F0 0 0 F0 0 R1 F2 R1 R1 F2 R1 Reservation Stations Time Name Busy Op Add1 No Add2 No Add3 No Mult1 Yes Multd Mult2 No Vj Exec Write Issue CompResult 1 2 3 S1 Vk S2 Qj RS Qk R F2 Load1 Busy Addr Fu Load1 Load2 …
View Full Document
Unlocking...