The Moron
CS 152 Final Project
Professor Kubiatowicz
Superscalar Branch Prediction

John Gibson (cs152 jgibson)
John Truong (cs152 jstruong)
Albert Wang (cs152 albrtaco)
Timothy Wong (cs152 timwong)

CS 152 Section 101

I. Abstract

The goal of this project is to construct a working superscalar processor with branch prediction. The memory module from Lab 6 needed to be reworked to function properly. In this phase we decided to emphasize robustness and functionality (i.e., a working processor) rather than speed, so the memory was scaled back to a direct-mapped, write-through cache. Although the superscalar architecture itself was straightforward, the primary complication lay in increasing the number of ports in the register file and cache to support two pipelines. We introduced striping in the cache to handle this situation with relatively few stalls.

II. Division of Labor

Datapath Enhancement
This part involved updating the datapath to include two pipelines, adding additional forwarding logic, and updating the memory/register modules to support dual input/output when necessary.
Initial revision: Albert, John G

Cache Striping / Dual Issue
This part involved writing a direct-mapped, write-through cache with bursting, striping instructions within the cache, and adapting the cache for reading two instructions at once.
Initial revision: John G
Testing: Tim, John G

Branch Predictor
This part involved writing and
Initial revision: John T
Testing: John T, Tim

Distributor
This part consisted of a VHDL component that distributes the instructions between the two pipelines based on dependencies and other constraints.
Initial revision: Tim
Testing: Albert

Forwarding / Hazards
This involved updating the forwarding and hazard units to support the two pipelines.
Initial revision: Tim
Testing: Tim, Albert

Integration
Integration primarily involved updating the top-level modules to support the new modules introduced by superscalar.
Integration: Everybody

Overall Testing
Testing was done on each element that we implemented, followed by thorough testing of the datapath after integration of each component, as well as ensuring that it worked on the board correctly.
Testing: Everybody

III. Detailed Strategy

Sections:
0. Superscalar
1. Stall Arbiter
2. Dual Issue
3. Memory Subsystem
4. Instruction Distribution
5. Forwarding
6. Hazards
7. Branch Prediction

Section 0: Superscalar (superscalar.sch)

Because our 5-stage pipelined processor was already working reasonably well, extending it to a superscalar architecture was relatively straightforward. The two pipelines are referred to as the EVEN and ODD pipelines. The control signals, however, distinguish between the two pipelines as Pipeline 1 (EVEN) and Pipeline 2 (ODD). This is slightly confusing, but we were able to keep the names straight among ourselves and decided that going back to change all the names would be tedious and could cause annoying bugs if we were not careful.

Each pipeline maintains its own copies of the instructions, PCs, and control signals it processes. The goal of this is to isolate each pipeline as much as possible in order to simplify the debugging process and minimize complexity.

With this project we had the opportunity to use many of the lessons we learned from Lab 6's non-functioning cache. Most notably, we kept the "Keep It Simple, Stupid" motto in mind throughout the design process. Because we wanted to reduce the complexity of our design, we decided to limit the functionality of the pipelines. For instance, all branch and jump instructions must be processed in the EVEN pipeline, whereas all memory instructions must be processed in the ODD pipeline. We also kept an invariant that the earlier instruction must always be in the EVEN pipeline. The rationale behind this decision was that we wanted to keep the pipelines synchronized so that forwarding, hazard, and prediction mechanisms would be easier to design and test. Although this invariant inevitably increases our CPI, our goal was to have a working processor first and then include additional features. Keeping this in mind, we tried to design our processor so that it would be easy to integrate optimizations later.

Restricting the pipelines reduced the number of corner cases we had to worry about. The branch pipeline was intentionally set as the EVEN pipeline (the earlier one) so that branch delay slots could be handled more cleanly. Since branch and jump instructions will always be sent to the EVEN pipeline with their delay slots in the ODD pipeline, our distributor does not have to keep state to remember that a delay slot instruction has to be fetched. Restricting the pipelines also reduced the complexity of forwarding between the pipelines, because data does not need to be forwarded to the memory stage of the EVEN pipeline, nor does data have to be forwarded to the decode stage of the ODD pipeline.

Section 1: Stall Arbiter (stallarbiter.v)

When multiple components request a stall or a bubble, the stall arbiter decides which stall has precedence. Until the final project, stalls had been handled in an ad hoc manner with several simple logic gates and latch signals. The ad hoc system was easy to use for Lab 5 (only the hazard unit could stall, so no arbitration was necessary), but we began to see stalling issues in Lab 6, when we created two additional components, the data and instruction caches, that needed to stall the processor. With a few more gates, however, we were able to retain our old stalling system.

Unfortunately, this system became inadequate during the development of Lab 7, when we created three new stalling signals that needed to be handled. The first is bubble, which is asserted when the instruction in the decode stage of the ODD pipeline is dependent upon the instruction in the decode stage of the EVEN pipeline. The second and third signals are jumpflush and branchflush, which are asserted when a jump is detected or a bad guess is made by the branch predictor. This proved to be far too many signals to handle with simple logic gates, so a new module was created to give preference to the various signals:

1. Data Cache Stall: Freezes the entire pipeline.
2. Hazard Stall: Freezes the fetch and decode stages; inserts bubbles into the execute stage.
3. Instruction Cache Stall: Freezes the fetch and decode stages; inserts bubbles into the execute stage.
4. Bubble: Inserts a bubble into the execute stage of the ODD pipeline. It also fills the decode stage of the EVEN pipeline with the decode instruction from the ODD pipeline. Finally
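
The four stall requests listed above can be modeled in software to make the arbitration easier to follow. The sketch below is a minimal Python illustration, not the project's hardware module. It assumes that the numbered order is the precedence order and that only the highest-priority request takes effect in a cycle, and it covers only the four requests listed. The names `Action`, `arbitrate`, and all field names are hypothetical and do not come from the report.

```python
from typing import NamedTuple


class Action(NamedTuple):
    """Control outputs for one cycle (hypothetical names, not from the report)."""
    source: str                # which request won arbitration
    freeze_all: bool           # hold every pipeline stage
    freeze_fetch_decode: bool  # hold only the fetch and decode stages
    bubble_execute: bool       # insert bubbles into the execute stage
    bubble_odd_execute: bool   # insert a bubble into ODD execute only
    odd_decode_to_even: bool   # refill EVEN decode from ODD decode


def arbitrate(dcache_stall: bool, hazard_stall: bool,
              icache_stall: bool, bubble: bool) -> Action:
    """Pick one action, assuming the listed order is the precedence order."""
    if dcache_stall:
        # 1. Data cache stall: freeze the entire pipeline.
        return Action("dcache_stall", True, False, False, False, False)
    if hazard_stall:
        # 2. Hazard stall: freeze fetch/decode, bubble into execute.
        return Action("hazard_stall", False, True, True, False, False)
    if icache_stall:
        # 3. Instruction cache stall: same effect as a hazard stall.
        return Action("icache_stall", False, True, True, False, False)
    if bubble:
        # 4. Bubble: bubble into ODD execute, move ODD decode into EVEN decode.
        return Action("bubble", False, False, False, True, True)
    # No request asserted: the pipeline advances normally.
    return Action("none", False, False, False, False, False)
```

For example, `arbitrate(dcache_stall=False, hazard_stall=True, icache_stall=True, bubble=True)` selects the hazard stall, because it is the highest-priority request asserted in that cycle under the assumed ordering.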


Berkeley COMPSCI 152 - CS 152 Final Project