Berkeley COMPSCI 152 - 8-Stage Deep-Pipelined MIPS Processor

Spring 2004 CS 152 Final Project
8-Stage Deep-Pipelined MIPS Processor

Members:
Otto Chiu (cs152-ae)
Charles Choi (cs152-bm)
Teddy Lee (cs152-ac)
Man-Kit Leung (cs152-al)
Bruce Wang (cs152-am)

Table Of Contents
0. Abstract
1. Division of Labor
2. Detailed Strategy
   2.1 Datapath
   2.2 Stage Summary
   2.3 Stage Details
   2.4 Forwarding Paths
3. Testing
   3.1 General Testing
   3.2 Victim Cache Test
   3.3 Branch Predictor
   3.4 Signed/Unsigned Multiply/Divide
4. Results
   4.1 Write-Back Cache & Victim Cache
   4.2 Branch Predictor
   4.3 Deep Pipelining
5. Conclusion
6. Appendix I (Notebooks)
7. Appendix II (Schematics)
8. Appendix III (Verilog Files)
9. Appendix IV (Test Files)

0. Abstract

The goal of our final project was to improve the performance of the 5-stage pipelined processor from the previous labs. Toward this goal, we converted our processor into an 8-stage deep-pipelined one (22 pt.). Since an increase in the number of branch delay slots is an intrinsic drawback of adding more pipeline stages, we added a branch predictor to cut down the number of stalled cycles in most cases (8 pt.). The same problem appears with the jr instruction, so we installed a jump-target predictor (8 pt.) to abate the stalls associated with that instruction. In addition, we implemented a write-back cache (7 pt.) and added a victim cache (4 pt.) to minimize first-level cache miss penalties. Finally, we added to our multiplier/divider the ability to handle signed numbers (4 pt.). Our project implemented a total of 53 pt. out of the required 37.5 pt. for our group. We implemented our design successfully and thoroughly.
We noted significant improvement in performance. The final clock speed for our processor is 27 MHz.

1. Division of Labor

The project specifications allowed us to split up the work by the major components: branch predictor, write-back cache, victim cache, jump-target predictor, and signed multiplier/divider. The entire group was involved in changing and verifying the existing design. Here is a more detailed division of labor:

Otto: Branch predictor, signed mult/div
Charles: Branch predictor, memory controller
Teddy: Datapath, mult/div, testbenches
Man-Kit: Write-back cache, victim cache
Bruce: Write-back cache, jump-target predictor

2. Detailed Strategy

2.1 Datapath

In order to achieve the 28 ns cycle-time requirement, we began by splitting up each memory stage, because the timing analysis tool showed that those stages, together with their forwarding paths, took significantly more time than the other stages. The idea is to progressively split up the stages with long critical paths. Since a lot of work is involved in splitting a stage, we also cut critical paths by shifting components across stages whenever possible. Doing so potentially introduces extra cycle delays, but this is partially remedied by the higher clock speed. We split our pipeline into the following stages:

IF  PR  ID  EX  MR  MW  WB  FW
Figure 1: Pipeline Stages

2.2 Stage Summary

IF: instruction fetch
PR: predict and register
    - Branch predict
    - Jump-target predict
    - Register file read/write
ID: decode
    - Mult/Div
EX: execute
    - Resolve branches
MR: memory read
MW: memory write
WB: write back
FW: forward
    - Forward WB value to ID stage

We initially split our memory stages into a tag-lookup stage and a data-lookup stage. However, we soon moved the data lookup in parallel with the tag lookup and registered the data in the MW stage, because we found that the critical path ran through the data lookup plus the forwarding logic.
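The parallel tag/data lookup with registered data can be sketched as a small behavioral model. This is our own illustrative Python sketch, not the team's Verilog; the class name, the direct-mapped organization, and the 16-line size are assumptions made for the example:

```python
# Behavioral sketch of the MR/MW split: the tag array and the data array
# are read in parallel during MR, and the registered data is qualified
# by the hit signal in MW. Sizes and names are illustrative only.
NUM_LINES = 16  # assumed direct-mapped cache with 16 lines

class CacheModel:
    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.data = [0] * NUM_LINES
        # Pipeline register between the MR and MW stages.
        self.mr_mw = {"hit": False, "data": 0}

    def mr_stage(self, addr):
        """MR: read the tag and data arrays in the same cycle."""
        index = addr % NUM_LINES
        tag = addr // NUM_LINES
        hit = self.tags[index] == tag
        # The data is read speculatively, in parallel with the tag check,
        # instead of waiting for the hit signal.
        self.mr_mw = {"hit": hit, "data": self.data[index]}

    def mw_stage(self):
        """MW: the registered data is valid only on a hit."""
        if self.mr_mw["hit"]:
            return self.mr_mw["data"]
        return None  # miss: the stall/refill path is not modeled here

cache = CacheModel()
cache.tags[3] = 0
cache.data[3] = 42
cache.mr_stage(3)        # cycle N: parallel tag + data lookup
print(cache.mw_stage())  # cycle N+1: 42 on a hit
```

Registering the speculatively-read data in MW is what lets the data-array read and the forwarding logic sit in different cycles, which is the critical-path cut described above.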
By moving the data lookup one stage earlier, we separated the data lookup from the forwarding logic. In addition, the routing time is shortened.

2.3 Stage Details

IF: The instruction is fetched in this stage. We simultaneously look up the tag file and the instruction cache, then check the tag to determine whether to output the instruction or stall. Our original design did only the tag lookup in this stage, but we realized that we could look up the instruction at the same time.

PR: The register file is located here. It was moved out of the ID stage because of the amount of forwarding logic there. We also do our branch and jump-target prediction in this stage.

ID: We decode the instruction in this stage. Most of the forwarded values are forwarded to this stage. Our multiplier and divider are also located here.

EX: Arithmetic operations, except for multiplication and division, happen in this stage. We also resolve the branch and jump predictions made in the PR stage.

MR: As in the IF stage, we can do a tag lookup and a cache read in the same cycle. But since we need to know whether there has been a cache hit before we can write to the cache, we cannot perform stores in this stage.

MW: We perform stores to the cache in this stage. When a store is followed by a load to the same address, the store and the load happen in the same cycle. Since the result of writing and reading the same line of a RAM at the same time is undefined, we could not let the cache handle this case by itself. But when a store is followed by a load to the same address, we know that the value loaded must be the value just stored. Thus, we simply latch the value of the store and output it on the load.

WB: A typical write-back stage, writing results back to the register file. We also have forwarding paths from here to ID and EX.

FW: This stage only forwards values to the ID stage.
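The same-cycle store/load case handled in MW can be illustrated with a short behavioral model. This is our own Python sketch under assumed names; the actual design is Verilog, and a Python dict does not exhibit the undefined RAM behavior being worked around, so the bypass mux is what the sketch demonstrates:

```python
# Sketch of the MW-stage bypass: when a store and a load to the same
# address land in the same cycle, the RAM's simultaneous read result is
# undefined in hardware, so the latched store value is muxed onto the
# load output instead of the RAM read.
class DataCacheMW:
    def __init__(self):
        self.ram = {}  # stands in for the data RAM

    def cycle(self, store_addr=None, store_val=None, load_addr=None):
        # Detect a same-cycle store/load to the same address.
        bypass = store_addr is not None and store_addr == load_addr
        if store_addr is not None:
            self.ram[store_addr] = store_val  # write port
        if load_addr is None:
            return None
        # On a conflict, output the latched store value rather than the
        # (in hardware, undefined) simultaneous RAM read.
        return store_val if bypass else self.ram.get(load_addr)

d = DataCacheMW()
d.cycle(store_addr=0x10, store_val=7)          # plain store
print(d.cycle(load_addr=0x10))                 # later load: 7
print(d.cycle(store_addr=0x10, store_val=9,
              load_addr=0x10))                 # same-cycle bypass: 9
```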
We decided to add this stage because we wanted to forward to the ID stage instead of the PR stage. The details are explained below.

2.4 Forwarding Paths

The forwarding paths that we had in lab 5 stayed in our design. Because we added three stages to our pipeline, we had to introduce more forwarding paths. Splitting the data cache stage into MR and MW stages meant that we had to add a forwarding path from the MR stage to the ID stage, as well as a path from the MR stage to the EX stage. We added a forwarding path from the FW stage to ID to handle the case where there are
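The forwarding paths above amount to a priority mux per source operand: the youngest in-flight producer of a register wins over older ones, and the register file is the fallback. The following Python sketch is our own simplified model (the `(dest_reg, value)` stage records and function name are hypothetical, not from the report):

```python
# Sketch of a forwarding mux for one ID-stage source operand. Each
# stage argument is a (dest_reg, value) pair for the instruction in
# that stage, or None if that stage produces no register result.
def forward_operand(src_reg, regfile, ex, mr, mw, wb, fw):
    if src_reg == 0:  # $zero is hard-wired and never forwarded
        return 0
    # Priority order: youngest producer first (EX), oldest last (FW).
    for stage in (ex, mr, mw, wb, fw):
        if stage is not None and stage[0] == src_reg:
            return stage[1]
    # No in-flight producer: read the architectural register file.
    return regfile[src_reg]

regfile = [0] * 32
regfile[8] = 100                      # $t0 = 100 in the register file
# An older value of $t0 sits in WB while a newer one is in EX; the
# mux must pick the EX value.
print(forward_operand(8, regfile, ex=(8, 300), mr=None,
                      mw=None, wb=(8, 200), fw=None))  # 300
```

The FW-to-ID path in the real design plays the role of the last entry in this priority chain, covering results that have already left WB but were not yet visible to a register-file read in PR.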

