Tomasulo Architecture: Out of Order Execution
Lab 7 Report

Names (alphabetical order): Jim Chen, Ray Juang, Ted Lee, Tim Lee
Logins: cs152-etjim, cs152-juangr, cs152-timclee, cs152-leet
Group name: Alt 0165

1. Abstract

The goal of this project was to design a processor based on an architecture of our choosing. We chose to create an out-of-order execution processor based on the Tomasulo algorithm. This architecture is very modular and provides a good base for optimizations, and it also allows execution to continue when some instructions, such as multiply/divide, take a long time to finish. This is very advantageous on systems with units that take many cycles to complete, or on systems that have poor memory subsystems. In our case, we decided to make a write-through, direct-mapped 8KB cache composed of 8-word cache lines, with a 4-entry write buffer. We were successful and ended up with a design that operates at 12.3 MHz with a CPI of 1.1, or roughly 1.

2. Division of Labor

The division of labor for this project was mainly a division in time. We set timelines for portions of the project to be completed and took shifts completing each module. Every member took part in the design of every module. However, each member took responsibility for delivering a tested and working module design during the coding stage. The responsibilities were divided as follows:

Component             Person Responsible
Cache controller      cs152-leet
CDB arbiter           cs152-etjim
Memory controller     cs152-juangr
Memory arbiter        cs152-leet
Load/Store arbiter    cs152-timclee
Functional units      cs152-juangr
Reservation station   cs152-juangr
Register file         cs152-etjim
Fetch/Decoder         cs152-timclee

Table 1: Main components and person responsible for ensuring correctness

Our development was divided into the following stages:

Stage Name                               Description
Top Level Planning                       Paper implementation of top level goals and what the overall design would look like
Detailed Level Planning                  Paper implementation of what detailed components of the design would look like
Coded Implementation                     Replication of paper designs into Verilog coded modules
Top Level Structural Verification        Analysis of all modules thrown together to ensure no signals or components were missing
Component Verification and Modification  Individual testing of components to verify that they were functioning
Incremental Testing Phase                Incremental tests to verify correct behaviour in interaction between various modules
Overall Testing Phase                    Top level testing of overall design. Debug phase from MIPS code in simulation
Timing Analysis/Synthesis                Substitution of time synthesis programmable modules to verify correctness of synthesized board
Optimizations                            Optimizations to reduce critical path, reduce CPI, and improve overall performance

Table 2: Development stages and description of the stage

In summary, this can be broken down into a Top Level Planning stage, a Detailed Level Component Design stage, a Testing phase, and an Optimization stage.

3. Detailed Strategy

3.1 Top Level Planning

In this stage, we planned out on paper what our top level datapath would look like. This gave us insight into how individual components would function with the rest of the datapath.

3.1.1 Datapath

Our Datapath is composed of 3 main parts: the Fetch/Issue Unit, the Functional Units, and the Register File. A typical instruction has the following lifetime. It is fetched, decoded, and issued by the Fetch/Issue unit. Issuing consists of broadcasting a tag, opcode, and operand values to all functional units. Issuing also puts the issued tag in the destination register's tag field in the register file. The register file constantly monitors the Common Data Bus, and the specific destination register waits for this tag. When it sees the tag on the CDB, it takes the result. The tag is a 9-bit field specifying the functional unit and reservation station (6 functional units, 3 reservation stations). The functional units then take the instruction, wait to resolve dependencies by looking on the Common Data
Bus (CDB), and then put the instruction result on the CDB. Putting a result on the CDB is done by sending a request to the CDB Arbiter, which then broadcasts a single result to the rest of the functional units and the register file. The register file sees the tag on the CDB, takes the result from the CDB, stores it into its data register, and clears its tag.

3.1.2 Control

Our Control is handled entirely by the Fetch/Issue unit and by the architecture of the Datapath. Once an instruction is issued, the Datapath puts the result of the instruction in the destination register after resolving dependencies and hazards by itself. Cases that cannot be handled by the Datapath, such as lw and sw address ambiguity, result in stalls at issue.

Our initial top level design consisted of three functional units, as specified in the project specifications. However, we decided later on to split the Integer functional unit into R-type, I-type, and Shift functional units.

Figure 1: Initial sketch of top level design

3.2 Detailed Level Planning and Component Design

This stage consisted of detailed planning of each of the components in the top level design and coding them in Verilog. The following sections describe in detail the functionality and design of each component.

3.2.1 The Fetch/Issue Unit

The fetcher is the same as that of an ordinary pipelined processor. It contains a register for the PC, which is updated from a mux that selects PC + 4, the branch PC, a jump PC, or a jr target. The logic for the selector of this mux is handled in a module called "branch magic", which waits, like a reservation station, for branch and jr results to be broadcast on the CDB. At present we do not have branch prediction, so we keep issuing branches until the dependency is resolved, and this unit therefore only needs the register file outputs as inputs. This unit fetches one instruction at a time from the instruction cache and stores it into an instruction register. The instruction in this register is the current instruction that will be
decoded and issued by the decode/issue unit later on. Below is a depiction of our planned fetch unit.

Figure 2: Fetch/decode block designs

In the decode/issue unit, the instruction is decoded for type to see which functional unit to send it to, and then the status of the reservation stations is checked. If there is an open reservation station in the appropriate functional unit, the instruction is issued to it by specifying the correct tag on the Issue Bus. The tag is one-hot encoded, with the top 3 bits for the
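
The tag handling described in sections 3.1.1 and 3.2.1 can be illustrated with a small behavioral model. This is only a sketch in Python, not the Verilog used in the project. The counts (6 functional units, 3 reservation stations, 9-bit one-hot tag) come from the text above. Everything else is an assumption made for illustration: that the top 3 bits select the reservation station and the low 6 bits select the functional unit, that an all-zero tag means "not waiting", and the hypothetical names encode_tag, decode_tag, issue_tag, and RegisterFile.

```python
# Behavioral sketch only; names and bit layout are illustrative assumptions.
NUM_FUS = 6   # functional units (from the report)
NUM_RS = 3    # reservation stations per functional unit (from the report)
NO_TAG = 0    # assumption: an all-zero tag means the register is not waiting


def encode_tag(fu, rs):
    """Build a 9-bit tag with two one-hot fields.

    Assumed layout: station in bits 8..6, functional unit in bits 5..0.
    """
    if not (0 <= fu < NUM_FUS and 0 <= rs < NUM_RS):
        raise ValueError("functional unit or reservation station out of range")
    return ((1 << rs) << NUM_FUS) | (1 << fu)


def decode_tag(tag):
    """Return (fu, rs) for a tag that is one-hot in both fields."""
    fu_bits = tag & ((1 << NUM_FUS) - 1)
    rs_bits = tag >> NUM_FUS
    one_hot = bin(fu_bits).count("1") == 1 and bin(rs_bits).count("1") == 1
    if not one_hot or rs_bits >= (1 << NUM_RS):
        raise ValueError("tag is not one-hot in both fields")
    return fu_bits.bit_length() - 1, rs_bits.bit_length() - 1


def issue_tag(fu, busy):
    """Return a tag for the first free station of unit `fu`, or None to stall."""
    for rs, is_busy in enumerate(busy[fu]):
        if not is_busy:
            return encode_tag(fu, rs)
    return None


class RegisterFile:
    """Registers with a value and a tag field, snooping a common data bus."""

    def __init__(self, num_regs=32):
        self.value = [0] * num_regs
        self.tag = [NO_TAG] * num_regs

    def issue(self, dest, tag):
        """At issue, the destination register starts waiting on `tag`."""
        self.tag[dest] = tag

    def snoop_cdb(self, tag, result):
        """Registers waiting on the broadcast tag take the result and clear their tag."""
        for r in range(len(self.value)):
            if self.tag[r] == tag:
                self.value[r] = result
                self.tag[r] = NO_TAG
```

For example, issuing to functional unit 2 when its station 0 is busy yields the tag for station 1; a later broadcast carrying that tag writes the destination register and clears its tag field, while broadcasts with any other tag leave the register untouched.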