Berkeley COMPSCI 152 - Tomasulo Architecture - Out of Order Execution

4. Results
Number of TBUFs: 7690 out of 19520 (39%)

5. Conclusion

Tomasulo Architecture: Out of Order Execution
Lab 7 Report

Names (alphabetical order)
Jim Chen (cs152-etjim)
Ray Juang (cs152-juangr)
Ted Lee (cs152-timclee)
Tim Lee (cs152-leet)
Group name: Alt-0165 (¥)

1. Abstract

The goal of this project was to design a processor based on an architecture of our choosing. We chose to build an out-of-order execution processor based on the Tomasulo algorithm. This architecture is very modular and provides a good base for optimizations, and it also allows execution to continue when some instructions, such as multiply/divide, take a long time to finish. This is advantageous on systems with units that take many cycles to complete or on systems with poor memory subsystems. In our case, we decided to build a write-through, direct-mapped 8KB cache composed of 8-word cache lines with a 4-entry write buffer. We were successful and ended up with a design that operates at 12.3 MHz with a CPI of 1.1 (or roughly 1).

2. Division of Labor

The division of labor for this project was mainly a division in time. We set timelines for portions of the project to be completed and took shifts completing each module. Every member took part in the design of every module. However, each member took responsibility for delivering a tested and working module design during the coding stage.
The responsibilities were divided as follows:

Component            Person Responsible
Cache controller     cs152-leet
CDB arbiter          cs152-etjim
Memory controller    cs152-juangr
Memory arbiter       cs152-leet
Load/Store arbiter   cs152-timclee
Functional units     cs152-juangr
Reservation station  cs152-juangr
Register file        cs152-etjim
Fetch/Decoder        cs152-timclee

Table 1: Main components and person responsible for ensuring correctness

Our development was divided into the following stages:

Stage Name                               Description
Top-Level Planning                       Paper implementation of top-level goals and what the overall design would look like.
Detailed-Level Planning                  Paper implementation of what detailed components of the design would look like.
Coded Implementation                     Replication of paper designs into Verilog coded modules.
Top-Level Structural Verification        Analysis of all modules thrown together to ensure no signals or components were missing.
Component Verification and Modification  Individual testing of components to verify that they were functioning.
Incremental Testing Phase                Incremental tests to verify correct behaviour in interaction between various modules.
Overall Testing Phase                    Top-level testing of overall design. Debug phase from MIPS code in simulation.
Timing Analysis/Synthesis                Substitution of time synthesis programmable modules to verify correctness of synthesized board.
Optimizations                            Optimizations to reduce critical path, reduce CPI, and improve overall performance.

Table 2: Development stages and description of the stage

In summary, this can be broken down into a Top-Level Planning stage, a Detailed-Level Component Design stage, a Testing phase, and an Optimization stage.

3. Detailed Strategy

3.1 Top-Level Planning

In this stage, we planned out on paper what our top-level datapath would look like. This would provide us with insight as to how individual components would function with the rest of the datapath.

3.1.1 Datapath

Our Datapath is composed of 3 main parts: the Fetch/Issue Unit, the Functional Units, and the Register File.
A typical instruction has the following lifetime: it is fetched, decoded, and issued by the Fetch/Issue unit. Issuing consists of broadcasting a tag, op-code, and operand values to all functional units. Issuing also puts the issued tag in the destination register’s tag field in the register file. The register file constantly monitors the Common Data Bus (CDB), and the specific destination register waits for this tag. When it sees the tag on the CDB, it takes the result. The tag is a 9-bit field specifying the functional unit and reservation station (6 functional units, 3 reservation stations). The functional units then take the instruction, wait to resolve dependencies by watching the CDB, and then put the instruction result on the CDB. Putting a result on the CDB is done by sending a request to the CDB Arbiter, which then broadcasts a single result to the rest of the functional units and the register file. The register file sees the tag on the CDB, takes the result from the CDB, stores it into its data register, and clears its tag.

3.1.2 Control

Our Control is handled entirely by the Fetch/Issue unit and by the architecture of the Datapath. Once an instruction is issued, the Datapath puts the result of the instruction in the destination register after resolving dependencies and hazards by itself. Cases that cannot be handled by the Datapath, such as lw and sw address ambiguity, result in stalls at issue.

Our initial top-level design consisted of three functional units, as specified in the project specifications. However, we later decided to split the Integer functional unit into R-type, I-type, and Shift functional units.

Figure 1: Initial sketch of top-level design

3.2 Detailed Level Planning and Component Design

This stage consisted of detailed planning of each of the components in the top-level design and coding them in Verilog.
The following sections describe in detail the functionality and design of each component.

3.2.1 The Fetch/Issue Unit

The fetcher is the same as that of an ordinary pipelined processor. It contains a register for the PC, which is updated from a mux that selects pc+4, the branch pc, a jump pc, or a jr target. The logic for this mux's selector is handled in a module called “branch magic”, which waits (like a reservation station) for branch and jr results to be broadcast on the CDB. At present we do not have branch prediction, so we keep issuing branches until the dependency is resolved, and this unit only has to take register file outputs as inputs. This unit fetches one instruction at a time from the instruction cache and stores it into an instruction register. The instruction in this register is the current instruction that will be decoded and issued by the decode/issue unit later on.

Below is a depiction of our planned fetch unit:

Figure 2: Fetch/decode block designs

In the decode/issue unit, the instruction is decoded for type, to see what functional unit to send it to, and then the status of the reservation stations is checked. If there is an open reservation station
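
As an illustration of the issue and CDB mechanism from section 3.1.1, the following is a minimal behavioral sketch in Python, not the Verilog used in the project. The names make_tag, Register, RegisterFile, issue, and snoop_cdb are hypothetical, and the one-hot tag layout (6 functional-unit bits above 3 reservation-station bits) is an assumption; the report only states that the tag is a 9-bit field covering 6 functional units and 3 reservation stations.

```python
from dataclasses import dataclass
from typing import Optional

NUM_FUS = 6  # functional units (from the report)
NUM_RS = 3   # reservation stations per functional unit (from the report)


def make_tag(fu: int, rs: int) -> int:
    """Build a 9-bit tag.

    ASSUMPTION: one-hot functional-unit bits sit above one-hot
    reservation-station bits. The report does not give the encoding.
    """
    if not (0 <= fu < NUM_FUS and 0 <= rs < NUM_RS):
        raise ValueError("functional unit or reservation station out of range")
    return (1 << (NUM_RS + fu)) | (1 << rs)


@dataclass
class Register:
    value: int = 0
    tag: Optional[int] = None  # None means the stored value is current


class RegisterFile:
    """Hypothetical model of the tagged register file."""

    def __init__(self, size: int = 32) -> None:
        self.regs = [Register() for _ in range(size)]

    def issue(self, dest: int, tag: int) -> None:
        # Issuing writes the instruction's tag into the destination register.
        self.regs[dest].tag = tag

    def snoop_cdb(self, tag: int, result: int) -> None:
        # Each register waiting on the broadcast tag takes the result
        # and clears its tag.
        for reg in self.regs:
            if reg.tag == tag:
                reg.value = result
                reg.tag = None


# Usage: two instructions issue in order but complete out of order.
rf = RegisterFile()
mul_tag = make_tag(fu=4, rs=0)
add_tag = make_tag(fu=0, rs=1)
rf.issue(dest=8, tag=mul_tag)
rf.issue(dest=9, tag=add_tag)
rf.snoop_cdb(add_tag, 7)    # the later instruction broadcasts first
rf.snoop_cdb(mul_tag, 42)   # the long-latency result arrives afterwards
```

Because a register stores only the most recently issued tag, a broadcast carrying an older tag for the same destination is ignored by that register in this model.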

