Berkeley COMPSCI 152 - 8-Stage Deep-Pipelined MIPS Processor

Spring 2004 CS 152 Final Project
8-Stage Deep-Pipelined MIPS Processor

Members:
Otto Chiu (cs152-ae)
Charles Choi (cs152-bm)
Teddy Lee (cs152-ac)
Man-Kit Leung (cs152-al)
Bruce Wang (cs152-am)

Table Of Contents
0. Abstract
1. Division of Labor
2. Detailed Strategy
   2.1 Datapath
   2.2 Stage Summary
   2.3 Stage Details
   2.4 Forwarding Paths
3. Testing
   3.1 General Testing
   3.2 Victim Cache Test
   3.3 Branch Predictor
   3.4 Signed/Unsigned Multiply/Divide
4. Results
   4.1 Write-Back Cache & Victim Cache
   4.2 Branch Predictor
   4.3 Deep Pipelining
5. Conclusion
6. Appendix I (Notebooks)
7. Appendix II (Schematics)
8. Appendix III (Verilog Files)
9. Appendix IV (Test Files)

0. Abstract

The goal of our final project was to improve the performance of the 5-stage pipelined processor from the previous labs. Toward this goal, we converted our processor into an 8-stage deep-pipelined one (22 pt.). Since an increase in the number of branch delay slots is an intrinsic drawback of adding more pipeline stages, we added a branch predictor to cut down the number of stalled cycles in most cases (8 pt.). The same problem appears with the jr instruction, so we installed a jump-target predictor (8 pt.) to abate the stalls associated with that instruction. In addition, we implemented a write-back cache (7 pt.) and added a victim cache (4 pt.) to minimize first-level cache miss penalties. Finally, we added to our multiplier/divider the ability to handle signed numbers (4 pt.). Our project implemented a total of 53 pt. out of the required 37.5 pt. for our group. We implemented our design successfully and thoroughly.
We noted significant improvement in performance. The final clock speed for our processor is 27 MHz.

1. Division of Labor

The project specifications allowed us to split up the work by the major components: branch predictor, write-back cache, victim cache, jump-target predictor, and signed multiplier/divider. The entire group was involved in changing and verifying the existing design. Here is a more detailed division of labor:

Otto: Branch predictor, signed mult/div
Charles: Branch predictor, memory controller
Teddy: Datapath, mult/div, testbenches
Man-Kit: Write-back cache, victim cache
Bruce: Write-back cache, jump-target predictor

2. Detailed Strategy

2.1 Datapath

In order to achieve the 28 ns cycle-time requirement, we began by splitting up each memory stage, because the timing analysis tool showed that those stages, together with their forwarding paths, took significantly more time than the other stages. The idea is to progressively split up the stages with long critical paths. Since a lot of work is involved in splitting a stage, we also cut critical paths by shifting components across stages whenever possible. Doing so potentially introduces extra cycle delays, but this is partially remedied by the higher clock speed. We split our pipeline into the following stages:

IF  PR  ID  EX  MR  MW  WB  FW
Figure 1: Pipeline Stages

2.2 Stage Summary

IF: instruction fetch
PR: predict and register
    - Branch predict
    - Jump-target predict
    - Register file read/write
ID: decode
    - Mult/Div
EX: execute
    - Resolve branches
MR: memory read
MW: memory write
WB: write back
FW: forward
    - Forward WB value to ID stage

We initially split our memory stages into a tag-lookup stage and a data-lookup stage. However, we soon moved the data lookup in parallel with the tag lookup and registered the data in the MW stage, because we found that the critical path ran through the data lookup plus the forwarding logic.
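The parallel tag/data lookup with registered data can be sketched as a small behavioral model. This is our own illustrative Python sketch, not the team's Verilog; the class name, the direct-mapped organization, and the 16-line size are assumptions made for the example:

```python
# Behavioral sketch of the MR/MW split: the tag array and the data array
# are read in parallel during MR, and the registered data is qualified
# by the hit signal in MW. Sizes and names are illustrative only.
NUM_LINES = 16  # assumed direct-mapped cache with 16 lines

class CacheModel:
    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.data = [0] * NUM_LINES
        # Pipeline register between the MR and MW stages.
        self.mr_mw = {"hit": False, "data": 0}

    def mr_stage(self, addr):
        """MR: read the tag and data arrays in the same cycle."""
        index = addr % NUM_LINES
        tag = addr // NUM_LINES
        hit = self.tags[index] == tag
        # The data is read speculatively, in parallel with the tag check,
        # instead of waiting for the hit signal.
        self.mr_mw = {"hit": hit, "data": self.data[index]}

    def mw_stage(self):
        """MW: the registered data is valid only on a hit."""
        if self.mr_mw["hit"]:
            return self.mr_mw["data"]
        return None  # miss: the stall/refill path is not modeled here

cache = CacheModel()
cache.tags[3] = 0
cache.data[3] = 42
cache.mr_stage(3)        # cycle N: parallel tag + data lookup
print(cache.mw_stage())  # cycle N+1: 42 on a hit
```

Registering the speculatively-read data in MW is what lets the data-array read and the forwarding logic sit in different cycles, which is the critical-path cut described above.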
By moving the data lookup one stage earlier, we separated the data lookup from the forwarding logic. In addition, the routing time is shortened.

2.3 Stage Details

IF: The instruction is fetched in this stage. We simultaneously look up the tag file and the instruction cache, then check the tag to determine whether to output the instruction or stall. Our original design did only the tag lookup in this stage, but we realized that we could look up the instruction at the same time.

PR: The register file is located here. It was moved out of the ID stage because of the amount of forwarding logic there. We also do our branch and jump-target prediction in this stage.

ID: We decode the instruction in this stage. Most of the forwarded values are forwarded to this stage. Our multiplier and divider are also located here.

EX: Arithmetic operations, except for multiplication and division, happen in this stage. We also resolve the branch and jump predictions made in the PR stage.

MR: As in the IF stage, we can do a tag lookup and a cache read in the same cycle. But since we need to know whether there has been a cache hit before we can write to the cache, we cannot perform stores in this stage.

MW: We perform stores to the cache in this stage. When a store is followed by a load to the same address, the store and the load happen in the same cycle. Since the result of writing and reading the same line of a RAM at the same time is undefined, we could not let the cache handle this case by itself. But when a store is followed by a load to the same address, we know that the value loaded must be the value just stored. Thus, we simply latch the value of the store and output it on the load.

WB: A typical write-back stage, writing results back to the register file. We also have forwarding paths from here to ID and EX.

FW: This stage only forwards values to the ID stage.
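The same-cycle store/load case handled in MW can be illustrated with a short behavioral model. This is our own Python sketch under assumed names; the actual design is Verilog, and a Python dict does not exhibit the undefined RAM behavior being worked around, so the bypass mux is what the sketch demonstrates:

```python
# Sketch of the MW-stage bypass: when a store and a load to the same
# address land in the same cycle, the RAM's simultaneous read result is
# undefined in hardware, so the latched store value is muxed onto the
# load output instead of the RAM read.
class DataCacheMW:
    def __init__(self):
        self.ram = {}  # stands in for the data RAM

    def cycle(self, store_addr=None, store_val=None, load_addr=None):
        # Detect a same-cycle store/load to the same address.
        bypass = store_addr is not None and store_addr == load_addr
        if store_addr is not None:
            self.ram[store_addr] = store_val  # write port
        if load_addr is None:
            return None
        # On a conflict, output the latched store value rather than the
        # (in hardware, undefined) simultaneous RAM read.
        return store_val if bypass else self.ram.get(load_addr)

d = DataCacheMW()
d.cycle(store_addr=0x10, store_val=7)          # plain store
print(d.cycle(load_addr=0x10))                 # later load: 7
print(d.cycle(store_addr=0x10, store_val=9,
              load_addr=0x10))                 # same-cycle bypass: 9
```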
We decided to add this stage because we wanted to forward to the ID stage instead of the PR stage. The details are explained below.

2.4 Forwarding Paths

The forwarding paths that we had in lab 5 stayed in our design. Because we added three stages to our pipeline, we had to introduce more forwarding paths. Splitting the data cache stage into MR and MW stages meant that we had to add a forwarding path from the MR stage to the ID stage, as well as a path from the MR stage to the EX stage. We added a forwarding path from the FW stage to ID to handle the case where there are
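The forwarding paths above amount to a priority mux per source operand: the youngest in-flight producer of a register wins over older ones, and the register file is the fallback. The following Python sketch is our own simplified model (the `(dest_reg, value)` stage records and function name are hypothetical, not from the report):

```python
# Sketch of a forwarding mux for one ID-stage source operand. Each
# stage argument is a (dest_reg, value) pair for the instruction in
# that stage, or None if that stage produces no register result.
def forward_operand(src_reg, regfile, ex, mr, mw, wb, fw):
    if src_reg == 0:  # $zero is hard-wired and never forwarded
        return 0
    # Priority order: youngest producer first (EX), oldest last (FW).
    for stage in (ex, mr, mw, wb, fw):
        if stage is not None and stage[0] == src_reg:
            return stage[1]
    # No in-flight producer: read the architectural register file.
    return regfile[src_reg]

regfile = [0] * 32
regfile[8] = 100                      # $t0 = 100 in the register file
# An older value of $t0 sits in WB while a newer one is in EX; the
# mux must pick the EX value.
print(forward_operand(8, regfile, ex=(8, 300), mr=None,
                      mw=None, wb=(8, 200), fw=None))  # 300
```

The FW-to-ID path in the real design plays the role of the last entry in this priority chain, covering results that have already left WB but were not yet visible to a register-file read in PR.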

