Berkeley COMPSCI 152 - CS 152 FINAL PROJECT


CS 152 FINAL PROJECT
The "Bigmouth" Processor

The Team: James Chien, Thomas Lee, Danh Nguyen, Joe Suh
The Prof: Kubi
The TA: Victor Wen
The Date: December 8, 1999

Contents:
- Diagram of Processor
- Performance Summary
- Processor Features
  - Control
  - Datapath
- Memory System
  - Fully Associative Cache with Content Addressable Memory
  - 64 Bit Bus
  - Extra Stuff
- Performance Summary
  - Top 3 Critical Paths
  - Performance Analysis
  - Why Performance Improved
- Testing Philosophy
- Appendix Includes

About the Processor

The "Bigmouth" Processor is a 4-stage pipelined processor which implements a subset of the MIPS instruction set. In addition to the basic functionality, the datapath features a 32-bit multiplier/divider unit. The processor also has an optimized memory system featuring a 64-bit memory bus with fully associative cache lookup, content addressable memory, a 4-line victim cache, and a FIFO write buffer. In addition, the processor has a TLB which supports a virtual memory system with pages of 64 words. The "Bigmouth" Processor team is also in the process of designing a 7-stage pipeline. Through the tracer test, we have determined that most of the instructions work in simple cases, but due to our limited time schedule the deep-pipelined processor remains in testing.

Diagram of Processor

[Figure: pipeline datapath diagram showing the IF/ID, ID/EX, and EX/MEM registers, instruction memory, register file, ALUs, muxes, and data memory.]

Performance Summary

Our processor runs at a minimum cycle time of 50 ns (20 MHz).
The CPI on the Quicksort program was 3691 cycles / 1596 instructions = 2.31.
From Lab 6 to Lab 7, the number of cycles dropped by 17% on the Lab6_mystery program.

Lab 6 vs.
Lab 7 processor on a number of different programs:

[Bar chart: cycle counts (0 to 20000) on joetest.s, lab 5 mystery, lab 6 mystery, lab 7 mystery, harder, and storetest, comparing the Lab 6 processor against the Lab 7 processor (with victim cache and write buffer).]

Processor Features

Control

The Main Control unit is a VHDL component which accepts an instruction during the Instruction Decode stage of our pipeline and outputs the proper signals depending on the instruction. There is also a hazard detection unit which detects data hazards in the pipeline, as well as the necessary forwarding logic to handle these cases.

Datapath

Currently, our working processor has a 5-stage pipeline which is similar to the design in Patterson and Hennessy, but lacks an IX/MEM register to write to, so memory writes take place in the same cycle that the next instruction is being fetched. Our team was also working on a 7-stage "Not So Deep" pipeline and was entering the testing phase, but time did not permit finishing it. That pipeline has been tested and works for some of the less hazardous cases, but is not yet fully functional.

Memory System

Fully Associative Cache with Content Addressable Memory

The cache is fully associative, meaning that a new entry can replace any old entry depending on the replacement policy (LRU in our case). Although this improved the hit rate of the cache, it may also have increased the cycle time, because we had to add several comparators to compare the incoming address with all the cache tags.

64 Bit Bus

We exploited spatial locality by incorporating a 64-bit bus. On a cache miss, we load in two words of data and put them both in the same block (of two words). This created a major (unforeseen) problem: a write updates only one memory location, but it updates the tag for BOTH words in the block. Therefore the other word in the block is no longer correct under the new tag.
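The stale-tag hazard on a shared two-word block, and the per-word valid bits that fix it, can be sketched in a toy model. All class and method names below are illustrative assumptions, not taken from the project's actual VHDL:

```python
# Simplified model of one two-word cache block filled over a 64-bit bus.
# The block shares a single tag; without per-word valid bits, a one-word
# write that changes the tag would silently "validate" stale data in the
# sibling word. Names here are illustrative, not the real VHDL design.

class TwoWordBlock:
    def __init__(self):
        self.tag = None
        self.data = [0, 0]           # even word, odd word
        self.valid = [False, False]  # per-word valid bits (the fix)

    def fill(self, tag, words):
        """Cache miss: the 64-bit bus brings in both words at once."""
        self.tag = tag
        self.data = list(words)
        self.valid = [True, True]

    def write_word(self, tag, offset, value):
        """Write a single word into the block."""
        if tag != self.tag:
            # New tag: only the written word becomes valid; the sibling
            # word still holds data belonging to the old tag.
            self.tag = tag
            self.valid = [False, False]
        self.data[offset] = value
        self.valid[offset] = True

    def read_word(self, tag, offset):
        """Returns (hit, data)."""
        if tag == self.tag and self.valid[offset]:
            return True, self.data[offset]
        return False, None

block = TwoWordBlock()
block.fill(tag=0x1, words=[10, 11])
block.write_word(tag=0x2, offset=0, value=99)  # write under a new tag
hit, _ = block.read_word(tag=0x2, offset=1)    # sibling word
print(hit)  # False: without the valid bits this would wrongly hit
```

Checking the valid bit alongside the tag comparator is what prevents the sibling word from being returned under a tag it never belonged to.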
We had to correct this problem by introducing two additional sets of registers to keep track of which "odd" and "even" words in the cache are valid.

Victim Cache

We have two victim caches (one for instructions and one for data). Each has 4 lines and holds a total of 8 words. When the main cache is full, it bumps out a set of its data to the victim cache. On a subsequent memory access, if there is a main cache miss, the victim cache is also checked to see if one of its tags matches the address. If it does, the data from the victim cache is output.

Write Buffer

The write buffer was used to enhance our write-through policy. Before we had a write buffer, we had to stall the pipeline on every sw instruction to write through to both the cache and the memory. Now, with the write buffer, we can write the sw data into a FIFO buffer and leave it there until the memory is ready to write the data in. Thus the pipeline need not be stalled, and the store's 3 cycles can be performed when the memory is ready. Our write buffer was 4 words in size, and it has to stall when the 4-word buffer fills up. The write buffer was most efficient for programs that do many stores in a loop, especially if there are some other instructions (like R-format instructions) in between the sw's so that the write buffer can drain and doesn't fill up (e.g., merge sort and quicksort).

Translation-Lookaside Buffer (TLB)

The TLB holds up to 8 entries. It takes in the top 20 bits of the virtual address during the instruction fetch and uses 5 of these bits to check if the page exists in main memory. If the page has not been accessed recently and does not have an entry sitting in the TLB, the datapath stalls to fetch the correct entry from the page table. This results in a penalty of about 4 cycles to fetch the TLB entry from main memory.
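The lookup and miss-stall behavior just described can be sketched in a toy model. The field widths follow the text (5-bit virtual page number, 6-bit page offset, 8 entries, roughly 4-cycle miss penalty); the class structure and FIFO eviction are assumptions, not the project's VHDL:

```python
# Toy model of an 8-entry TLB with the layout described in the text:
# the low 6 bits of the (word) address are the page offset, and 5 bits
# above them select one of up to 32 pages. Direct linear mapping means
# the physical page number equals the virtual page number.

TLB_MISS_PENALTY = 4  # approximate cycles to fetch a page-table entry

class TLB:
    def __init__(self, capacity=8):
        self.entries = {}        # vpn -> ppn (insertion-ordered dict)
        self.capacity = capacity
        self.stall_cycles = 0

    def translate(self, vaddr):
        vpn = (vaddr >> 6) & 0x1F  # 5-bit virtual page number
        offset = vaddr & 0x3F      # 6-bit page offset
        if vpn not in self.entries:
            # Miss: stall the datapath and walk the page table.
            self.stall_cycles += TLB_MISS_PENALTY
            if len(self.entries) >= self.capacity:
                # Simple FIFO eviction (the real policy may differ).
                self.entries.pop(next(iter(self.entries)))
            self.entries[vpn] = vpn  # direct linear mapping: ppn == vpn
        return (self.entries[vpn] << 6) | offset

tlb = TLB()
tlb.translate(0x0ABC)       # first access to this page: miss
print(tlb.stall_cycles)     # 4 (one miss penalty)
```

A repeated access to the same page hits in the dictionary and adds no stall cycles, which is the whole point of caching the translation.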
However, it is certainly worth the penalty compared to not having a TLB and having to access memory between every instruction to test if the page was in memory. In fact, on the lab_6 mystery program, the TLB had an over 99% hit rate after 3000 cycles had completed. To simplify things, we decided to implement a page table with a maximum of 32 pages, which meant using 5 bits for our virtual page number. The bottom 6 bits of the virtual address make up the page offset, so we support pages of 64 words. The translation is also a direct linear mapping, so the virtual page number corresponds linearly to the physical page number (the operating system would usually take care of this). We included the TLB for instruction accesses, but it could be used for data accesses as well; in that case, it holds a dirty bit and a reference bit, and uses a pseudo-random page replacement algorithm which can send kicked-out pages to be written back to memory.

VERY IMPORTANT NOTE: TO MAKE THE DATAPATH (with TLB) WORK, THE PROGRAM MUST HAVE FOR ITS FIRST 32 INSTRUCTIONS THE NUMBERS (0, 1, 2, ... 31). (SO THE ACTUAL
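To tie the Memory System pieces together, here is a simplified simulation of the interaction between a fully associative main cache with LRU replacement and a small victim cache that catches recently evicted lines. The sizes, names, and single-word lines are illustrative assumptions, not the project's actual VHDL:

```python
# Toy simulation of a fully associative main cache (LRU replacement)
# backed by a 4-line victim cache, as described in the Memory System
# section. OrderedDict keeps lines in recency order for us.

from collections import OrderedDict

class CacheHierarchy:
    def __init__(self, main_lines=8, victim_lines=4):
        self.main = OrderedDict()    # tag -> data, LRU line first
        self.victim = OrderedDict()  # tags bumped out of the main cache
        self.main_lines = main_lines
        self.victim_lines = victim_lines
        self.hits = self.misses = 0

    def access(self, tag):
        if tag in self.main:             # main cache hit
            self.main.move_to_end(tag)   # refresh LRU order
            self.hits += 1
        elif tag in self.victim:         # victim cache hit
            self.victim.pop(tag)         # promote back to the main cache
            self.hits += 1
            self._insert(tag)
        else:                            # full miss: fetch from memory
            self.misses += 1
            self._insert(tag)

    def _insert(self, tag):
        if len(self.main) >= self.main_lines:
            old_tag, _ = self.main.popitem(last=False)  # evict LRU line
            self.victim[old_tag] = None                 # bump to victim cache
            if len(self.victim) > self.victim_lines:
                self.victim.popitem(last=False)          # falls to memory
        self.main[tag] = None

cache = CacheHierarchy()
for tag in [1, 2, 3, 1, 2, 3]:
    cache.access(tag)
print(cache.hits, cache.misses)  # 3 3
```

The comparators mentioned in the text correspond to the `tag in self.main` membership check: in hardware, every tag is compared against the incoming address in parallel, which is what makes the fully associative lookup expensive in cycle time.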

