Unformatted text preview:

2/22/2006 173G22.2243-001High Performance Computer ArchitectureLecture 6Tomasulo’s AlgorithmHardware Support for SpeculationSuperscalar ProcessorsFebruary 22, 20062/22/2006 174Outline• Announcements– HW Assignment 2 due back today– Lab Assignment 2 due back next week: March 1st• Last lecture: Scoreboarding• Reducing the impact of data hazards: Tomasulo’s algorithm• Further reducing pipeline stalls:– Hardware speculation• Multiple-issue processors (achieving IPC > 1)– Superscalar processors[ Hennessy/Patterson CA:AQA (3rd Edition): Chapter 3]2/22/2006 175Recap: Scoreboarding• Scoreboard: Keeps track of instruction and machine status, permitting the instruction to execute as soon as its operands are available– Instructions are issued inissued in--orderorder can execute outexecute out--ofof--orderorder• Pipeline Stages:– Issue (ID1) decode instructions• check for structural and WAW hazards; stall until structural and WAW hazards are resolved; no further issues– Read operands (ID2) collect source operands• wait until no RAW hazards then read operands– Execution (EX) operate on operands• may be multiple cycles - notify scoreboard when done– Write result (WB) finish execution • stall if WAR hazard• Limitations– No forwarding logic– Limited to instructions in basic blocks (not beyond branches)– Stalls for WAW hazards– Wait for WAR hazards before WB• Are there better solutions?2/22/2006 176Tomasulo’s Algorithm for Dynamic Scheduling• Developed for the IBM 360/91 in 1967 - about 3 years after CDC 6600– Goal: High performance without special compilers• Differences between IBM 360 & CDC 6600– IBM 360 has only 2 register specifiers/instr vs. 3 in CDC 6600– IBM 360 has register-memory instructions– IBM 360 has 4 FP registers vs. 8 in CDC 6600– IBM 360 has pipelined functional units (3 adds, 2 multiplies)• Tomasulo’s algorithm is designed to handle name dependencies (WAW and WAR hazards) efficiently SUB F1, F2, F0 DIVF F2, F3, F2 ADDF F3, F0, F0 MULF F3, F1, F1WAR hazardWAW hazardBoth hazards can be avoided by register renamingF1, F2, F0TT, F3, F2SS, F0, F0F3, F1, F12/22/2006 177Tomasulo’s Algorithm: Hardware Organization• Key idea: Reservation Stations– Buffer operands for instructions waiting to issue• As soon as operand is available– Effectively provide an arbitrary number of (internal) registers• Benefits1. HW renaming of registers to avoid WAR, WAW hazards2. Distributed hazard detection and resolution vs. centralized scoreboard and register file• Using the Common Data Bus• Bypassing of results to the FUs• Led to concepts used in the Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, …2/22/2006 178Three Stages of Tomasulo’s Algorithm• Issue: Get instruction from the head of the instruction queue– If reservation station free, allocate and issue instruction• If operands are available, send them to the reservation station• If not, keep track of FUs that will produce the operands– If not free, stall issue• Execution: Operate on operands (EX)– If both operands ready, execute– If not ready, watch CDB for result– Instructions along predicted paths not allowed to execute unless branch direction is resolved (simple implementation: prevent any instruction from leaving Issue if there are pending branch in the pipeline) • Write result: Finish execution (WB)– Write on CDB, mark reservation station availableStructural hazardsWAR, WAW hazardsRAW hazardsBypassing2/22/2006 179Reservation Station ComponentsOp Operation to perform in the unit (e.g., + or –)Qj, Qk Reservation stations producing source operands VjVj,,VkVkValue of Source operandsValue of Source operandsRj, Rk Flags indicating when Vj, Vk are readyBusy Indicates reservation station and FU is busyRegister result status — Indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register.2/22/2006 180Out-Of-Order Execution• Instructions can become ready for execution out-of-order• Dynamic scheduling is beneficial only if these instructions are allowed to “jump” over others• FP units can pick any of the ready instructions to execute– All side effects in “registers”: can be tracked• LD/ST instructions require additional support– Hardware support for dynamic memory disambiguationAllow memory operations to different addresses to proceed out-of-order– Can be achieved by noting that LD/ST instructions first need to compute address, then access memory• Ensuring effective address calculation happens in program order; AND• Not issuing a load (to memory unit) if detect a store to the same addressNot issuing a store (to memory unit) if detect a load/store to same address2/22/2006 181Example of Tomasulo’s AlgorithmADD.D F6, F8, F2DIV.D F10, F0, F6SUB.D F8, F6, F2MUL.D F0, F2, F4L.D F2, 45(R3)L.D F6, 34(R2)WriteExecuteIssueInstructionRkRjQkQjVkOpNoMult2NoMult1NoAdd3NoAdd2NoAdd1VjBusyNameTimeF16F12F10F8F6F4F2F0InstructionsReservationstationsRegistersNoLoad3NoLoad2NoLoad1AddressBusyNameLoad/Store Units221024022/22/2006 182Example of Tomasulo’s Algorithm: Cycle 1ADD.D F6, F8, F2DIV.D F10, F0, F6SUB.D F8, F6, F2MUL.D F0, F2, F4L.D F2, 45(R3)1L.D F6, 34(R2)WriteExecuteIssueInstructionRkRjQkQjVkOpNoMult2NoMult1NoAdd3NoAdd2NoAdd1VjBusyNameTimeLoad1F16F12F10F8F6F4F2F0InstructionsRegistersNoLoad3NoLoad234+R2YesLoad1AddressBusyNameLoad/Store Units22102402Reservationstations2/22/2006 183Example of Tomasulo’s Algorithm: Cycle 2ADD.D F6, F8, F2DIV.D F10, F0, F6SUB.D F8, F6, F2MUL.D F0, F2, F42L.D F2, 45(R3)1L.D F6, 34(R2)WriteExecuteIssueInstructionRkRjQkQjVkOpNoMult2NoMult1NoAdd3NoAdd2NoAdd1VjBusyNameTimeLoad1Load2F16F12F10F8F6F4F2F0InstructionsRegistersNoLoad345+R3YesLoad234+R2YesLoad1AddressBusyNameLoad/Store Units22102402Reservationstations2/22/2006 184Example of Tomasulo’s Algorithm: Cycle 3ADD.D F6, F8, F2DIV.D F10, F0, F6SUB.D F8, F6, F23MUL.D F0, F2, F42L.D F2, 45(R3)31L.D F6, 34(R2)WriteExecuteIssueInstructionYesRkNoRjQkLoad2QjR(F4)VkMULOpNoMult2YesMult1NoAdd3NoAdd2NoAdd1VjBusyNameTimeLoad1Load2Mult1F16F12F10F8F6F4F2F0InstructionsRegistersNoLoad345+R3YesLoad234+R2YesLoad1AddressBusyNameLoad/Store Units22102402Reservationstations2/22/2006 185Example of Tomasulo’s Algorithm: Cycle 4ADD.D F6, F8, F2DIV.D F10, F0, F64SUB.D F8, F6, F23MUL.D F0, F2, F442L.D F2, 45(R3)431L.D F6,


View Full Document

NYU CSCI-GA 2243 - Tomasulo’s Algorithm

Download Tomasulo’s Algorithm
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Tomasulo’s Algorithm and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Tomasulo’s Algorithm 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?