Lecture 6: ILP Techniques
Laxmi N. Bhuyan
CS 162, Spring 2003
DAP Spr 98 UCB

HW Schemes: Instruction Parallelism
- Why in HW at run time?
  - Works when the real dependence cannot be known at compile time
  - Compiler is simpler
  - Code for one machine runs well on another
- Key idea: allow instructions behind a stall to proceed
    DIVD F0, F2, F4
    ADDD F10, F0, F8
    SUBD F12, F8, F14
- Enables out-of-order execution and out-of-order completion
- ID stage checks for hazards; if there are no hazards, the instruction is issued for execution
- Scoreboard dates to CDC 6600 in 1963

How ILP Works
- Issuing multiple instructions per cycle requires fetching multiple instructions from memory per cycle. The number issued per cycle is called the superscalar degree or issue width.
- To find independent instructions, we must have a big pool of instructions to choose from, called the instruction buffer (IB). As the IB length increases, the complexity of the decoder (control) increases, which increases the datapath cycle time.
- Instructions are prefetched sequentially by an IFU that operates independently from the datapath control. Fetch instruction PC + L, where L is the IB size, or as directed by the branch predictor.
- See Fig. 6.1 (Pentium diagram).

Pentium Datapath
- The Pentium consists of two pipes, the U pipe and the V pipe, operating in parallel.
- The U pipe contains an 8-stage FP pipeline (see Pentium figure).
- Two stages of decode: decode and control (first stage), register read (second stage).
- See the I-cache and D-cache in Fig. 6.1.
- What is a TLB? How does virtual memory work?

HW Schemes: Instruction Parallelism
- Two types: Scoreboard and Tomasulo
- Scoreboard (ex.: Pentium)
- Out-of-order execution divides the ID stage:
  1. Issue: decode instructions, check for structural hazards
  2. Read operands: wait until no data hazards, then read operands
- Scoreboards allow an instruction to execute whenever it has no structural hazard and is not waiting for prior instructions. Instructions are therefore issued in order but can bypass waiting instructions in the read operands stage.
- In-order issue, out-of-order execution, out-of-order completion
- Named after the CDC 6600 scoreboard, which developed this capability

Scoreboard Implications
- Scoreboard replaces ID, EX, WB with 4 stages
- Out-of-order completion creates WAR and WAW hazards
- Solution for WAR: wait at the WB stage until the other instruction completes
- For WAW: must detect the hazard at the ID stage and stall until the other instruction completes
- Need to have multiple instructions in the execution phase: multiple execution units or pipelined execution units
- Scoreboard keeps track of dependencies and the state of operations

Four Stages of Scoreboard Control
1. Issue: decode instructions and check for structural hazards (ID1)
   If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, the instruction issue stalls, and no further instructions will issue until these hazards are cleared.
2. Read operands: wait until no data hazards, then read operands (ID2)
   A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. If the source operands are available for an instruction, the scoreboard tells the functional unit to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.
3. Execution: operate on operands (EX)
   The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
4. Write result: finish execution (WB)
   Once the scoreboard is aware that the functional unit has completed execution, it checks for WAR hazards. If there are none, it writes the result. If there is a WAR hazard, it stalls the instruction.
   Example:
     DIVD F0, F2, F4
     ADDD F10, F0, F8
     SUBD F8, F8, F14
   The CDC 6600 scoreboard would stall SUBD until ADDD reads its operands.

Design of the Scoreboard
1. Instruction status: which of the 4 steps the instruction is in
2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
   Busy: indicates whether the unit is busy or not
   Op: operation to perform in the unit (e.g., + or -)
   Fi: destination register
   Fj, Fk: source register numbers
   Qj, Qk: functional units producing source registers Fj, Fk
   Rj, Rk: flags indicating when Fj, Fk are ready
3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.

Detailed Scoreboard Pipeline Control
(In the conditions below, D is the destination register, S1 and S2 are the source registers, and f ranges over all functional units.)
Issue
  Wait until: not Busy(FU) and not Result(D)
  Bookkeeping: Busy(FU) <- yes; Op(FU) <- op; Fi(FU) <- D; Fj(FU) <- S1; Fk(FU) <- S2; Qj <- Result(S1); Qk <- Result(S2); Rj <- not Qj; Rk <- not Qk; Result(D) <- FU
Read operands
  Wait until: Rj and Rk
  Bookkeeping: Rj <- No; Rk <- No
Execution complete
  Wait until: functional unit done
Write result
  Wait until: for all f, (Fj(f) != Fi(FU) or Rj(f) = No) and (Fk(f) != Fi(FU) or Rk(f) = No)
  Bookkeeping: for all f, if Qj(f) = FU then Rj(f) <- Yes; for all f, if Qk(f) = FU then Rk(f) <- Yes; Result(Fi(FU)) <- 0; Busy(FU) <- No

CDC 6600 Scoreboard
- Speedup of 1.7 from compiler, 2.5 by hand, BUT slow memory (no cache) limits the benefit
- Limitations of the 6600 scoreboard:
  - No forwarding hardware
  - Limited to instructions in a basic block (small window)
  - Small number of functional units (structural hazards), especially integer/load-store units
  - Does not issue on structural hazards
  - Waits for WAR hazards
  - Prevents WAW hazards

Summary
- Instruction-level parallelism (ILP) in SW or HW
- Loop-level parallelism is easiest to see
- SW parallelism: dependencies are defined for the program; they become hazards if HW cannot resolve them
- SW dependencies and compiler sophistication determine whether the compiler can unroll loops
  - Memory dependencies are hardest to determine
- HW exploiting ILP
  - Works when dependence cannot be known at compile time
  - Code for one machine runs well on another
- Key idea of the scoreboard: allow instructions behind a stall to proceed (decode is split into issue instruction and read operands)
  - Enables out-of-order execution and out-of-order completion
  - ID stage checked both for structural

Tomasulo Algorithm
- Implemented in the IBM 360/91 in 1966
- Control and buffers are distributed with the function units (FU), vs. centralized in the scoreboard; the FU buffers, called reservation stations, hold pending operands
- Registers in instructions are replaced by values or pointers to reservation stations (RS); this is called register renaming
  - Avoids WAR and WAW hazards
  - More reservation stations than registers so can do
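
The four scoreboard stages and the DIVD / ADDD / SUBD example from the scoreboard slides can be sketched as a small cycle-by-cycle simulation. This is only an illustrative sketch, not a model of the CDC 6600: the names Unit, KIND, and simulate are hypothetical, the latencies (10 cycles for divide, 2 for add/subtract) and the presence of two adder units are assumptions, and each instruction is assumed to advance at most one stage per cycle.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    kind: str          # operation class this unit executes
    latency: int       # assumed execution latency in cycles
    busy: bool = False
    fi: str = ""       # destination register (Fi)
    fj: str = ""       # source registers (Fj, Fk)
    fk: str = ""
    qj: str = ""       # units producing Fj / Fk ("" when none)
    qk: str = ""
    rj: bool = False   # True: operand ready and not yet read
    rk: bool = False
    stage: str = ""    # last stage reached: "issue", "read", "exec"
    since: int = 0     # cycle of the last state change
    instr: int = -1    # index of the instruction held by the unit


# Hypothetical mapping from opcode to the class of unit that executes it.
KIND = {"DIVD": "div", "ADDD": "add", "SUBD": "add"}


def simulate(program, units, max_cycles=200):
    """Return one {stage: cycle} dict per instruction of `program`."""
    result = {}                          # register result status: reg -> unit
    times = [{} for _ in program]
    pc = 0
    for cycle in range(1, max_cycles + 1):
        # Stage 1, issue (in order): needs a free unit and no WAW hazard.
        if pc < len(program):
            op, d, s1, s2 = program[pc]
            u = next((x for x in units
                      if x.kind == KIND[op] and not x.busy), None)
            if u is not None and d not in result:
                u.busy, u.fi, u.fj, u.fk = True, d, s1, s2
                u.qj, u.qk = result.get(s1, ""), result.get(s2, "")
                u.rj, u.rk = not u.qj, not u.qk
                u.stage, u.since, u.instr = "issue", cycle, pc
                result[d] = u.name
                times[pc]["issue"] = cycle
                pc += 1
        # Stage 4, write result: handled before stage 2 so that operands
        # released here are read in a later cycle. Stalls on a WAR hazard.
        for u in units:
            if not (u.busy and u.stage == "exec"):
                continue
            war = any(o.busy and o is not u and
                      ((o.fj == u.fi and o.rj) or (o.fk == u.fi and o.rk))
                      for o in units)
            if war:
                continue
            for o in units:              # wake up instructions waiting on u
                if o.busy and o.qj == u.name:
                    o.qj, o.rj, o.since = "", True, cycle
                if o.busy and o.qk == u.name:
                    o.qk, o.rk, o.since = "", True, cycle
            del result[u.fi]
            u.busy, u.stage = False, ""
            times[u.instr]["write"] = cycle
        # Stage 2, read operands: resolves RAW hazards, may go out of order.
        for u in units:
            if (u.busy and u.stage == "issue" and u.since < cycle
                    and u.rj and u.rk):
                u.rj = u.rk = False
                u.stage, u.since = "read", cycle
                times[u.instr]["read"] = cycle
        # Stage 3, execution: completes `latency` cycles after the read.
        for u in units:
            if u.busy and u.stage == "read" and cycle - u.since >= u.latency:
                u.stage = "exec"
                times[u.instr]["exec"] = cycle
        if pc == len(program) and not any(u.busy for u in units):
            break
    return times


units = [Unit("div", "div", 10), Unit("add1", "add", 2), Unit("add2", "add", 2)]
program = [("DIVD", "F0", "F2", "F4"),
           ("ADDD", "F10", "F0", "F8"),
           ("SUBD", "F8", "F8", "F14")]
times = simulate(program, units)
for (op, *regs), t in zip(program, times):
    print(op, ",".join(regs), t)
```

Under these assumptions, SUBD reads its operands in cycle 4, well before ADDD (cycle 14, after DIVD writes F0), so execution is out of order. SUBD finishes executing in cycle 6 but its write to F8 is held until cycle 15, the cycle after ADDD has read F8, which is the WAR stall described in the write result stage.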


UCR CS 162 - LECTURE 6 ILP Techniques
