Spring 2004 CS 152 Final Project
8-Stage Deep-Pipelined MIPS Processor

Members:
Otto Chiu (cs152-ae)
Charles Choi (cs152-bm)
Teddy Lee (cs152-ac)
Man-Kit Leung (cs152-al)
Bruce Wang (cs152-am)

Table of Contents
0. Abstract
1. Division of Labor
2. Detailed Strategy
   2.1 Datapath
   2.2 Stage Summary
   2.3 Stage Details
   2.4 Forwarding Paths
3. Testing
   3.1 General Testing
   3.2 Victim Cache Test
   3.3 Branch Predictor
   3.4 Signed/Unsigned Multiply/Divide
4. Results
   4.1 Write-Back Cache & Victim Cache
   4.2 Branch Predictor
   4.3 Deep Pipelining
5. Conclusion
6. Appendix I (Notebooks)
7. Appendix II (Schematics)
8. Appendix III (Verilog Files)
9. Appendix IV (Testing)

0. Abstract

The goal of our final project was to improve the performance of the 5-stage pipelined processor from the previous labs. Toward this goal, we converted our processor into an 8-stage deep-pipelined one (22 pt.). Since an increase in the number of branch delay slots is an intrinsic drawback of adding pipeline stages, we added a branch predictor to cut down the number of stalled cycles in most cases (8 pt.). The same problem appears with the jr instruction, so we installed a jump-target predictor (8 pt.) to abate the stalls associated with that instruction. In addition, we implemented a write-back cache (7 pt.) and added a victim cache (4 pt.) to minimize first-level cache miss penalties. Finally, we added to our multiplier/divider the ability to handle signed numbers (4 pt.). In total, our project implemented 53 pt. against the 37.5 pt. required for our group.

We implemented our design successfully and verified it thoroughly, and we measured a significant improvement in performance. The final clock speed of our processor is 27 MHz.

1. Division of Labor

The project specification allowed us to split up the work by major component: branch predictor, write-back cache, victim cache, jump-target predictor, and signed multiplier/divider. The entire group was involved in changing and verifying the existing design. A more detailed division of labor:

Otto: branch predictor, signed mult/div
Charles: branch predictor, memory controller
Teddy: datapath, mult/div, testbenches
Man-Kit: write-back cache, victim cache
Bruce: write-back cache, jump-target predictor

2. Detailed Strategy

2.1 Datapath

In order to achieve the 28 ns cycle-time requirement, we began by splitting up each memory stage, because the timing-analysis tool showed that those stages, together with their forwarding paths, took significantly more time than the other stages. The idea is to progressively split up the stages with the longest critical paths. Since a lot of work is involved in splitting a stage, we also cut critical paths by shifting components across stages whenever possible. Doing this potentially introduces extra cycle delays, but the higher clock speed partially remedies them. We split our pipeline into the following stages:

IF  PR  ID  EX  MR  MW  WB  FW

Figure 1: Pipeline Stages

2.2 Stage Summary

IF: instruction fetch
PR: predict and register
    - branch predict
    - jump-target predict
    - register file read/write
ID: decode
    - mult/div
EX: execute
    - resolve branches
MR: memory read
MW: memory write
WB: write back
FW: forward
    - forward WB value to ID stage

We initially split each memory stage into a tag-lookup stage and a data-lookup stage. However, we soon moved the data lookup in parallel with the tag lookup and registered the data in the MW stage, because we found that the critical path ran through the data lookup plus the forwarding logic. Moving the data lookup one stage earlier splits it off from the forwarding logic; in addition, the routing time is shortened. The arrangement is sketched below.
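The previewed pages do not include the cache RTL itself, so the following is a minimal Verilog sketch of the arrangement just described, under stated assumptions: the module and signal names (mr_lookup, hit_mw, data_mw) and the direct-mapped geometry are illustrative, not the project's actual design. The tag and data RAMs are read in parallel, and both the hit bit and the data word are registered into MW, which is what separates the cache read from the forwarding logic.

```verilog
// Sketch only: parallel tag/data lookup in MR, registered into MW.
module mr_lookup (
    input             clk,
    input      [31:0] addr,     // load address entering the MR stage
    output reg        hit_mw,   // hit flag registered into MW
    output reg [31:0] data_mw   // data word registered into MW
);
    // Assumed geometry: 128-line direct-mapped cache of 32-bit words,
    // addr = { tag[31:9], index[8:2], byte[1:0] }.
    reg [22:0] tags  [0:127];
    reg        valid [0:127];
    reg [31:0] data  [0:127];

    wire [6:0] index = addr[8:2];

    // Tag and data lookups proceed in parallel; neither waits on the other.
    wire        hit  = valid[index] && (tags[index] == addr[31:9]);
    wire [31:0] word = data[index];

    // Registering both into MW splits the cache read from the forwarding
    // logic, which is the critical path described above.
    always @(posedge clk) begin
        hit_mw  <= hit;
        data_mw <= word;
    end
endmodule
```

The IF stage applies the same trick to the instruction cache: the tag file and the instruction RAM are read in the same cycle, and the tag comparison decides between outputting the instruction and stalling.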
2.3 Stage Details

IF: The instruction is fetched in this stage. We look up the tag file and the instruction cache simultaneously, then check the tag to determine whether we should output the instruction or stall. Our original design did only the tag lookup in this stage, but we realized that we could look up the instruction at the same time.

PR: The register file is located here. It was moved out of the ID stage because of the amount of forwarding logic in the ID stage. We also do our branch and jump-target prediction in this stage (a sketch of one common prediction scheme appears after this section).

ID: We decode the instruction in this stage. Most of the forwarded values are forwarded to this stage. Our multiplier and divider are also located in this stage.

EX: Arithmetic operations other than multiplication and division are performed in this stage. We also resolve the branch and jump predictions made in the PR stage.

MR: Like the IF stage, we can do a tag lookup and a cache read in the same cycle. But since we need to know whether there has been a cache hit before we can write to the cache, we cannot perform stores in this stage.

MW: We perform stores to the cache in this stage. When a store is followed by a load to the same address, the store and the load happen on the same cycle. Since the result of writing and reading the same line of a RAM at the same time is undefined, we could not let the cache handle this case by itself. But when a store is followed by a load to the same address, we know that the value loaded must be the value just stored. Thus, we simply latch the value of the store and output that value on the load, as sketched after this section.

WB: A typical write-back stage that writes results back to the register file. We also have forwarding paths from here to ID and EX.

FW: This stage only forwards values to the ID stage. We added this stage because we wanted to forward to the ID stage instead of the PR stage. The details are explained below.
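The MW-stage bypass lends itself to a small latch-and-mux structure. The following is a hypothetical Verilog sketch with invented signal names (mw_store, data_mw, and so on) and staging assumptions consistent with the earlier lookup sketch; it is not the project's actual code.

```verilog
// Sketch only: same-address store-to-load bypass around the data cache.
module store_load_bypass (
    input         clk,
    input         mw_store,   // MW is writing the cache this cycle
    input  [31:0] mw_addr,    // store address (MW)
    input  [31:0] mw_data,    // store data (MW)
    input         mr_load,    // MR is reading the cache this cycle
    input  [31:0] mr_addr,    // load address (MR)
    input  [31:0] data_mw,    // load data registered into MW (undefined on a collision)
    output [31:0] load_data   // value actually delivered for the load
);
    // Same-cycle collision: a write and a read of the same RAM line at once.
    wire collide = mw_store && mr_load && (mw_addr == mr_addr);

    // Latch the decision and the store value so they arrive in MW together
    // with the load's (possibly corrupt) registered data.
    reg        bypass_q;
    reg [31:0] store_q;
    always @(posedge clk) begin
        bypass_q <= collide;
        store_q  <= mw_data;
    end

    // On a collision, the loaded value must be the value just stored.
    assign load_data = bypass_q ? store_q : data_mw;
endmodule
```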
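The previewed pages do not say which prediction scheme the PR stage uses, so the sketch below assumes a common one: a table of 2-bit saturating counters indexed by low PC bits, read in PR and updated when EX resolves the branch. The module name, table size, and indexing are all illustrative.

```verilog
// Sketch only: 2-bit saturating-counter branch predictor.
// (Counter initialization/reset omitted for brevity.)
module branch_predictor (
    input         clk,
    // PR stage: look up a prediction for the fetched branch.
    input  [31:0] pr_pc,
    output        predict_taken,
    // EX stage: update the table once the branch is resolved.
    input         ex_update,     // a branch resolved this cycle
    input  [31:0] ex_pc,
    input         ex_taken       // actual outcome
);
    // 64-entry table of 2-bit counters, indexed by PC[7:2].
    reg [1:0] counters [0:63];

    wire [5:0] pr_idx = pr_pc[7:2];
    wire [5:0] ex_idx = ex_pc[7:2];

    // Predict taken in the two "taken" states (counter value 2 or 3).
    assign predict_taken = (counters[pr_idx] >= 2'b10);

    // Saturating update on resolution: count up on taken, down on not-taken.
    always @(posedge clk) begin
        if (ex_update) begin
            if (ex_taken && counters[ex_idx] != 2'b11)
                counters[ex_idx] <= counters[ex_idx] + 2'b01;
            else if (!ex_taken && counters[ex_idx] != 2'b00)
                counters[ex_idx] <= counters[ex_idx] - 2'b01;
        end
    end
endmodule
```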
2.4 Forwarding Paths

The forwarding paths that we had in Lab 5 stayed in our design. Because we added three stages to our pipeline, we had to introduce more forwarding paths; the general shape of the operand muxing they feed is sketched below. Splitting the data cache stage into the MR and MW stages meant that we had to add a forwarding path from the MR stage to
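A forwarding network of this shape conventionally ends in a per-operand priority mux in which the youngest matching in-flight result wins, falling back to the register-file read. The sketch below illustrates how the ID-stage operand selection described in Section 2.3 could look; the stage set, the priority order, and names such as id_forward_mux are assumptions, not the report's design.

```verilog
// Sketch only: ID-stage operand selection with youngest-first priority.
module id_forward_mux (
    input  [4:0]  rs,            // source register read in ID
    input  [31:0] regfile_val,   // value read from the register file (PR)
    // One request per forwarding stage: valid bit, destination, value.
    input         ex_valid,  input [4:0] ex_rd,  input [31:0] ex_val,
    input         mr_valid,  input [4:0] mr_rd,  input [31:0] mr_val,
    input         mw_valid,  input [4:0] mw_rd,  input [31:0] mw_val,
    input         wb_valid,  input [4:0] wb_rd,  input [31:0] wb_val,
    input         fw_valid,  input [4:0] fw_rd,  input [31:0] fw_val,
    output reg [31:0] rs_val
);
    // The youngest instruction's result wins, since it is the most recent
    // writer of the register. $0 is never forwarded.
    always @(*) begin
        if      (ex_valid && ex_rd == rs && rs != 5'd0) rs_val = ex_val;
        else if (mr_valid && mr_rd == rs && rs != 5'd0) rs_val = mr_val;
        else if (mw_valid && mw_rd == rs && rs != 5'd0) rs_val = mw_val;
        else if (wb_valid && wb_rd == rs && rs != 5'd0) rs_val = wb_val;
        else if (fw_valid && fw_rd == rs && rs != 5'd0) rs_val = fw_val;
        else                                            rs_val = regfile_val;
    end
endmodule
```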