LAB 7: Moctezuma

By Woojin Yu, Woongeun Jung, Erik Olson, Wilson Cheung, Chuong Pham

Introduction and Summary

a. Feature Summary

The Moctezuma processor uses a Tomasulo-like architecture in an attempt to achieve maximum instruction throughput. Moctezuma supports all standard MIPS integer operations, load and store instructions, and branch instructions: essentially all the MIPS instructions required from Lab 5. There are six integer operation reservation stations, six load operation reservation stations, six store operation reservation stations, and twelve reorder buffer slots. There is a 32x32-bit entry result register file connected to all reservation stations, with another 32x32-bit entry register file connected to the Reorder Buffer for in-order commit data. Various control blocks facilitate the passage of signals between registers and buses in the datapath. The main control blocks include Issue Control, Reorder Buffer Control, IOP RS Control, Load RS Control, Store RS Control, and CDB Control.

b. Moctezuma Design

The following diagram shows the high-level view of the Moctezuma architecture. Instructions get partially decoded by the Issue Controller, and once the instruction type has been determined (lw, sw, integer operation, or control), the Issue Controller sends the instruction to the proper Reservation Station for calculation and to the Reorder Buffer for in-order commit. Each Reservation Station is an autonomous unit that requests data from the Result Register File, performs all necessary calculations, and broadcasts the resulting data on the Common Data Bus. The Reorder Buffer stores the instructions and the calculated data in the same order as the instruction issue and performs the writeback to the Register File using an in-order commit scheme.

c. Performance Summary

Feature Descriptions

a. Features

Tomasulo architecture

Moctezuma implements a Tomasulo-like architecture complete with register renaming and out-of-order execution. The Moctezuma pipeline consists of four
stages, namely Fetch, Issue, Execute, and Commit.

The Fetch stage accesses the instruction cache for the instructions to execute. In the case of a cache hit, it takes two clock cycles to return the next instruction, and in the case of a cache miss, it can take more than ten cycles to fetch the instruction.

The Issue stage retrieves the fetched instruction and determines which PC to go to next. In addition, there is a Branching Unit that supports all data flow control functions, such as JAL, JR, J, BEQ, BNE, BGEZ, and BLTZ, with an unintelligent form of branch prediction which directly branches to a location without any operand checks. When JAL and J instructions are encountered, the new PC address is calculated and the next instruction after the jump can be fetched with one (1) delay slot. In the case of JR, it has to wait until the Result RegFile finishes calculating the value of the register requested by the JR before it allows any further instruction fetch.

The Execute stage can execute multiple instructions simultaneously, given that all the dependencies are resolved in the reservation stations. After the value is calculated, it is broadcast on the CDB for other reservation stations to pick up.

The Commit stage allows in-order memory access and in-order Register File commit. Only operations that do write back and memory access operations go through the Commit stage. The Commit stage does not slow down the datapath unless there is a load operation at the head of the Reorder Buffer.

Resolving Data Dependencies

All Reservation Station slots and Reorder Buffer slots have self-contained logic blocks that monitor CDB broadcasts and set enable signals to latch data values from the CDB. When a CDB Source (the top 5 bits of the 37-bit Common Data Bus) matches the CDB Name (the lower 5 bits that are stored in a slot of the Reservation Station or Reorder Buffer), the slot updates its value with the data provided by the CDB. Since all instruction operands are renamed in the Issue stage, this scheme requires no
special forwarding hardware. As soon as an instruction has finished, the Reservation Station broadcasts the resulting value on the CDB, allowing all dependent instructions to latch the value and continue execution.

In the event that an operand is waiting for a value from a load instruction, it must wait until the load instruction reaches the head of the Reorder Buffer, where it will perform the memory access. We made this decision early in the design process, believing it to be the best way to ensure data continuity in the DRAM, since all memory instructions would be forced to be in order. However, we have since discovered that this has the potential of significantly delaying the processor: memory accesses are the slowest operation of all implemented instructions, forcing the Reorder Buffer to hold the instruction at the head for a long period of time and thus increasing the likelihood that the Buffer will become completely full. A future version of this processor would avoid this problem by implementing a memory access buffer that could perform all the dependent memory operations in order while allowing all other operations to continue.

Load/Store Reservation Stations

Each load and store Reservation Station only calculates the address of the memory access instruction. Issue control renames the target of a Load RS to a Reorder Buffer (ROB) block and sets a pointer in the newly activated ROB block to listen to the CDB for the address broadcast from the Load RS. A Load RS contains a customized 32-bit adder that adds the offset to the address specified by the value stored in the given RS register. Since the adder is custom-built and uses minimal hardware, we anticipate that its impact on the hardware size is small. In addition, an independent Load and Store RS makes testing easier, since all RS are independent of each other and can be tested individually. Again, a future version of the processor may minimize hardware by reducing the functional units to one per Reservation
Station. After a Load RS calculates the address and finishes broadcasting, it can release the station, setting an empty bit and allowing other instructions to come in. The actual memory access is done via the Reorder Buffer.

Integer Ops Reservation Stations

Like the Load and Store Reservation Stations, all six of the Integer Ops RS are completely autonomous, each block containing its own ALU. The separation of the ALU allows us to quickly build and test an IntOp slice independently. Furthermore, we foresaw the
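
As an illustration of the CDB matching described under Resolving Data Dependencies, the following Python sketch models a 37-bit bus word whose top 5 bits carry the source tag and whose low 32 bits carry the data, together with a slot that latches the data when the tag matches the name it stores. It is a behavioral sketch only, not the report's hardware: the names pack_cdb, unpack_cdb, and OperandSlot are hypothetical, and the assumption that the remaining 32 bits of the bus hold the data value is inferred from the 37-bit width rather than stated in the report.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TAG_BITS = 5                      # width of the CDB Source / CDB Name tag
DATA_BITS = 32                    # assumed width of the data field
TAG_MASK = (1 << TAG_BITS) - 1
DATA_MASK = (1 << DATA_BITS) - 1


def pack_cdb(source: int, data: int) -> int:
    """Build a 37-bit bus word: tag in the top 5 bits, data in the low 32."""
    return ((source & TAG_MASK) << DATA_BITS) | (data & DATA_MASK)


def unpack_cdb(word: int) -> Tuple[int, int]:
    """Split a 37-bit bus word into (source tag, data)."""
    return (word >> DATA_BITS) & TAG_MASK, word & DATA_MASK


@dataclass
class OperandSlot:
    """Hypothetical model of one operand field in an RS or ROB slot."""
    waiting_on: Optional[int] = None  # renamed tag this operand waits for
    value: Optional[int] = None       # latched data, once available

    def snoop(self, word: int) -> bool:
        """Latch the data if the broadcast tag matches the stored name."""
        source, data = unpack_cdb(word)
        if self.waiting_on is not None and source == self.waiting_on:
            self.value = data
            self.waiting_on = None    # operand is now resolved
            return True
        return False


slot = OperandSlot(waiting_on=9)
slot.snoop(pack_cdb(3, 0x1234))       # different tag: ignored
slot.snoop(pack_cdb(9, 0xDEADBEEF))   # matching tag: value latched
```

Because every slot runs the same comparison on every broadcast, a single result can resolve any number of waiting operands in one step, which is why the scheme needs no separate forwarding paths.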


Berkeley COMPSCI 152 - LAB 7 Moctezuma
