Spring 2004 CS 152
Final Project

8-Stage Deep-Pipelined MIPS Processor

Members:
Otto Chiu (cs152-ae)
Charles Choi (cs152-bm)
Teddy Lee (cs152-ac)
Man-Kit Leung (cs152-al)
Bruce Wang (cs152-am)

Table of Contents
0. Abstract
1. Division of Labor
2. Detailed Strategy
   2.1 Datapath
   2.2 Stage Summary
   2.3 Stage Details
   2.4 Forwarding Paths
3. Testing
   3.1 General Testing
   3.2 Victim Cache Test
   3.3 Branch Predictor
   3.4 Signed/Unsigned Multiply/Divide
4. Results
   4.1 Write-Back Cache & Victim Cache
   4.2 Branch Predictor
   4.3 Deep Pipelining
5. Conclusion
6. Appendix I (Notebooks)
7. Appendix II (Schematics)
8. Appendix III (Verilog Files)
9. Appendix IV (Testing)

0. Abstract

The goal of our final project was to improve the performance of the 5-stage pipelined processor from the previous labs. To that end, we converted our processor into an 8-stage deep-pipelined one (22 pt.). Because an increase in the number of branch delay slots is an intrinsic drawback of adding pipeline stages, we added a branch predictor to cut down the number of stalled cycles in most cases (8 pt.). The same problem appears with the jr instruction, so we installed a jump-target predictor (8 pt.) to abate the stalls associated with that instruction. In addition, we implemented a write-back cache (7 pt.) and added a victim cache (4 pt.) to minimize first-level cache miss penalties. Finally, we added to our multiplier/divider the ability to handle signed numbers (4 pt.). In total, our project implemented 53 pt., exceeding the 37.5 pt. required for our group.

We implemented our design successfully and tested it thoroughly, and we observed a significant improvement in performance. The final clock speed of our processor is 27 MHz.

1. Division of Labor

The project specification allowed us to split up the work by major component: branch predictor, write-back cache, victim cache, jump-target predictor, and signed multiplier/divider. The entire group was involved in changing and verifying the existing design. Here is a more detailed division of labor:

Otto: Branch predictor, signed mult/div
Charles: Branch predictor, memory controller
Teddy: Datapath, mult/div, testbenches
Man-Kit: Write-back cache, victim cache
Bruce: Write-back cache, jump-target predictor

2. Detailed Strategy

2.1 Datapath

In order to meet the 28 ns cycle-time requirement, we began by splitting up each memory stage, because the timing-analysis tool showed that those stages, together with their forwarding paths, took significantly more time than the other stages. The idea is to progressively split up the stages with the longest critical paths. Since a lot of work is involved in splitting a stage, we also cut critical paths by shifting components across stage boundaries whenever possible. Doing this can introduce extra cycles of delay, but the cost is partially offset by the higher clock speed. We split our pipeline into the following stages:

IF  PR  ID  EX  MR  MW  WB  FW

Figure 1: Pipeline Stages
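As an illustration of the stage splitting described above, below is a minimal Verilog sketch of one pipeline boundary register with stall and flush support; splitting a stage amounts to inserting one more register like this and rebalancing the logic on either side. The module and signal names are invented for illustration and are not the project's actual Verilog (see Appendix III for that).

// Hypothetical pipeline boundary register (names invented for
// illustration; the project's actual Verilog is in Appendix III).
module pipe_reg #(parameter WIDTH = 32) (
    input                  clk,
    input                  stall,  // hold contents, e.g. on a cache miss
    input                  flush,  // squash to a bubble, e.g. on a mispredict
    input      [WIDTH-1:0] d,      // values produced by the previous stage
    output reg [WIDTH-1:0] q       // values consumed by the next stage
);
    always @(posedge clk) begin
        if (flush)
            q <= {WIDTH{1'b0}};    // insert a bubble (NOP control signals)
        else if (!stall)
            q <= d;                // normal advance; stall holds q
    end
endmodule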
2.2 Stage Summary

IF: instruction fetch
PR: predict and register
    - Branch predict
    - Jump-target predict
    - Register file read/write
ID: decode
    - Mult/Div
EX: execute
    - Resolve branches
MR: memory read
MW: memory write
WB: write back
FW: forward
    - Forward WB value to ID stage

We initially split each memory stage into a tag-lookup stage and a data-lookup stage. However, we soon moved the data lookup in parallel with the tag lookup and registered the data in the MW stage, because we found that the critical path ran through the data lookup plus the forwarding logic. Moving the data lookup one stage earlier decouples it from the forwarding logic and also shortens the routing time.

2.3 Stage Details

IF: The instruction is fetched in this stage. We look up the tag file and the instruction cache simultaneously, then check the tag to decide whether to output the instruction or stall. Our original design did only the tag lookup in this stage, but we realized that we could look up the instruction at the same time.

PR: The register file is located here. It was moved out of the ID stage because of the amount of forwarding logic in that stage. We also do our branch and jump-target prediction in this stage (one plausible predictor is sketched at the end of this section).

ID: We decode the instruction in this stage. Most of the forwarded values are forwarded to this stage. Our multiplier and divider are also located here.

EX: All arithmetic operations except multiplication and division happen in this stage. We also resolve the branch and jump predictions made in the PR stage.

MR: As in the IF stage, we can do a tag lookup and a cache read in the same cycle. But since we need to know whether there has been a cache hit before we can write to the cache, we cannot perform stores in this stage.

MW: We perform stores to the cache in this stage. When a store is followed by a load to the same address, the store and the load happen on the same cycle. Since the result of writing and reading the same line of a RAM at the same time is undefined, we could not let the cache handle this case by itself. But when a store is followed by a load to the same address, we know the loaded value must be the value just stored. Thus, we simply latch the value of the store and output that value on the load (a sketch of this bypass appears at the end of this section).

WB: A typical write-back stage that writes results back to the register file. We also have forwarding paths from here to ID and EX.

FW: This stage only forwards values to the ID stage. We added it because we wanted to forward to the ID stage instead of the PR stage. The details are explained below.

2.4 Forwarding Paths

The forwarding paths that we had in lab 5 stayed in our design. Because we added three stages to our pipeline, we had to introduce more forwarding paths. Splitting the data-cache stage into the MR and MW stages meant that we had to add a forwarding path from the MR stage to the ID stage, as well as a path from the MR stage to the EX stage. We added a forwarding path from the FW stage to ID to handle the
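To make these forwarding paths concrete, below is a minimal Verilog sketch of the kind of operand-select mux they imply at the ID stage. The stage names follow Figure 1, but the signal names and the youngest-value-wins priority order are our assumptions for illustration, not the project's actual code.

// Hypothetical operand-forwarding mux into the ID stage. A *_hit input
// means "that stage holds an in-flight result for this source register";
// the youngest value takes priority over older ones and over the
// register-file read. Names and priority order are assumptions.
module fwd_mux (
    input  [31:0] rf_val,   // value read from the register file in PR
    input  [31:0] mr_val,   input mr_hit,  // from the MR stage
    input  [31:0] mw_val,   input mw_hit,  // from the MW stage
    input  [31:0] wb_val,   input wb_hit,  // from the WB stage
    input  [31:0] fw_val,   input fw_hit,  // from the FW stage
    output [31:0] operand   // value handed to the ID stage
);
    assign operand = mr_hit ? mr_val :
                     mw_hit ? mw_val :
                     wb_hit ? wb_val :
                     fw_hit ? fw_val : rf_val;
endmodule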
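The MW-stage store-to-load case described in Section 2.3 can be sketched as follows. This is a minimal sketch assuming the data RAM's read port is registered (one-cycle latency) and that addresses compare on the full word; all names are invented for illustration.

// Hypothetical sketch of the MW-stage store-to-load bypass from
// Section 2.3: writing and reading the same RAM line in one cycle is
// undefined, so on a same-address store/load pair we latch the store
// data and return it for the load instead of the RAM output.
module store_load_bypass (
    input         clk,
    input         st_en,       // MW-stage store this cycle
    input  [31:0] st_addr,
    input  [31:0] st_data,
    input         ld_en,       // MR-stage load this cycle
    input  [31:0] ld_addr,
    input  [31:0] ram_rdata,   // registered data-RAM read output
    output [31:0] ld_data      // value the load actually returns
);
    reg        bypass;         // registered to line up with ram_rdata
    reg [31:0] saved;
    always @(posedge clk) begin
        bypass <= st_en && ld_en && (st_addr == ld_addr);
        saved  <= st_data;
    end
    assign ld_data = bypass ? saved : ram_rdata;
endmodule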
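Finally, the preview does not say which prediction scheme the PR stage uses. As one plausible illustration only, here is a sketch of a direct-mapped table of 2-bit saturating counters, updated when EX resolves each branch; the module name, signal names, and table size are invented.

// Hypothetical 2-bit saturating-counter branch history table for the
// PR stage. The report does not specify the actual scheme; this is one
// common choice, shown with invented names and an assumed 64 entries.
module bht #(parameter IDX = 6) (
    input         clk,
    input  [31:0] pc,             // PR-stage PC being predicted
    output        predict_taken,
    input         upd_en,         // pulses when EX resolves a branch
    input  [31:0] upd_pc,
    input         upd_taken       // actual outcome from EX
);
    reg [1:0] counters [0:(1<<IDX)-1];        // powers up unknown in sim
    wire [IDX-1:0] rd_idx = pc[IDX+1:2];      // index by word-aligned PC
    wire [IDX-1:0] wr_idx = upd_pc[IDX+1:2];
    wire [1:0] rd_ctr = counters[rd_idx];
    assign predict_taken = rd_ctr[1];         // counter MSB = predict taken
    always @(posedge clk) begin
        if (upd_en) begin
            // saturate at 00 (strong not-taken) and 11 (strong taken)
            if (upd_taken && counters[wr_idx] != 2'b11)
                counters[wr_idx] <= counters[wr_idx] + 2'b01;
            else if (!upd_taken && counters[wr_idx] != 2'b00)
                counters[wr_idx] <= counters[wr_idx] - 2'b01;
        end
    end
endmodule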

