CS 152 FINAL PROJECT
The Bigmouth Processor

The Team: James Chien, Thomas Lee, Danh Nguyen, Joe Suh
The Prof: Kubi
The TA: Victor Wen
The Date: December 8, 1999

About the Processor

The Bigmouth Processor is a 4-stage pipelined processor which implements a subset of the MIPS instruction set. In addition to the basic functionality, the datapath features a 32-bit multiplier/divider unit. The processor also has an optimized memory system which features a 64-bit memory bus with fully associative cache lookup (content addressable memory), a 4-line victim cache, and a FIFO write buffer. In addition, the processor has a TLB which supports a virtual memory system with 64-word pages.

The Bigmouth Processor team is also in the process of designing a 7-stage pipeline. Through the tracer test we have determined that most of the instructions work in simple cases, but due to our limited time schedule the deep pipelined processor remains in testing.

Diagram of Processor

[Figure: datapath diagram showing the IF/ID, ID/EX, and EX/MEM pipeline registers, Instruction Memory, Register File, ALU, Data Memory, and muxes.]

Performance Summary

Our processor runs at a minimum cycle time of 50 ns (20 MHz). The CPI on the Quicksort program was 3691 cycles / 1596 instructions = 2.31. From Lab 6 to Lab 7, the number of cycles dropped by 17% on the Lab 6 mystery program.

[Figure: bar chart of cycle counts (0 to 20000) for the Lab 6 processor and the Lab 7 processor with victim cache and write buffer, across several test programs including the Lab 5, Lab 6, and Lab 7 mystery programs. Caption: Lab 6 vs Lab 7 processor on a number of different programs.]

Processor Features

Control

The Main Control unit is a VHDL component which accepts an instruction during the Instruction Decode stage of our pipeline and outputs the proper signals for that instruction. There is also a hazard detection unit, which detects data hazards in the pipeline, as well as the forwarding logic needed to handle these cases.

Datapath

Currently our working processor has a 5-stage pipeline which is similar to the design in Patterson and Hennessy but lacks an IX MEM register to write to, so memory writes take place in the same cycle in which the next instruction is being fetched. Our team was also working on a 7-stage (Not So Deep) pipeline and was entering the testing phase, but time did not permit us to finish. That pipeline has been tested and works for some of the less hazardous cases but is not yet fully functioning.

Memory System

Fully Associative Cache with Content Addressable Memory

The cache is fully associative, meaning that a new entry can replace any old entry, depending on the replacement policy (LRU in our case). Although this improved the hit rate of the cache, it might also have increased the cycle time, because we had to add several comparators to compare the incoming address with all the cache tags.

64-Bit Bus

We exploited spatial locality by incorporating a 64-bit bus. On a cache miss we load 2 words of data and put them both in the same 2-word block. This created a major unforeseen problem: a write only writes to one memory location, but it also updates the tag for BOTH words in the block. Therefore the other word in the block is no longer correct under the new tag. We corrected this problem by introducing two additional sets of registers to keep track of which odd and even words in the cache are valid.

Victim Cache

We have two victim caches, one for instructions and one for data. Each has 4 lines and holds a total of 8 words. When the main cache is full, it bumps out a set of its data to the victim cache. On a subsequent memory access, if there is a main cache miss, the victim cache is also checked to see whether one of its tags matches the address. If it does, the data from the victim cache is output.

Write Buffer

The write buffer was used to enhance our write-through policy. Before we had a write buffer, we had to stall the pipeline on every sw instruction to write through to both the cache and the memory. Now, with the write buffer, we can write the sw data into a FIFO buffer and leave it there until the memory is ready to write the data in. Thus the pipeline need not be stalled, and the store's 3 cycles can be performed when the memory is ready. Our write buffer was 4 words in size, and the pipeline has to stall when all 4 words are filled. The write buffer was most efficient for programs that do many stores in a loop, especially when other instructions (such as R-format instructions) fall between the sw's so that the write buffer can drain and doesn't fill up (e.g., merge sort and quick sort).

Translation Lookaside Buffer (TLB)

The TLB holds up to 8 entries, takes in the top 20 bits of the virtual address during the instruction fetch, and uses 5 of these bits to check whether the page exists in main memory. If the page has not been accessed recently and does not have an entry sitting in the TLB, the datapath stalls to fetch the correct entry from the page table. This results in a penalty of about 4 cycles to fetch the TLB entry from main memory. However, it is certainly worth the penalty compared to not having a TLB and having to access memory between every instruction to test whether the page is in memory. In fact, on the Lab 6 mystery program, the TLB had an over 99% hit rate after 3000 cycles completed.

To simplify things, we decided to implement a page table with a maximum of 32 pages, which meant using 5 bits for our virtual page number. The bottom 6 bits of the virtual address make up the page offset, so we support pages of 64 words. The translation is also a direct linear mapping, so the virtual page number corresponds linearly to the physical page number. The operating system would usually take care of this. We included the TLB for the instruction accesses, but it could be used for the data accesses as well, and in this case it holds a dirty bit and a reference bit and uses a pseudo-random page replacement algorithm which can send kicked-out pages to be written to memory.

VERY IMPORTANT NOTE: TO MAKE THE DATAPATH (with TLB) WORK, THE PROGRAM MUST HAVE FOR ITS FIRST 32 INSTRUCTIONS THE NUMBERS 0, 1, 2, ..., 31, SO THE ACTUAL PROGRAM STARTS ON LINE 33. THIS IS NECESSARY BECAUSE THAT'S WHERE THE PAGE TABLE WOULD BE FILLED.

Extra Stuff

The Monitor module traces the number of cycles that pass. There are also a number of hit/miss counters that record the hit/miss rates for the cache, write buffer, and TLB.

Performance Summary

Top 3 …
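As a rough illustration of the address translation described in the Translation Lookaside Buffer section, the Python sketch below splits a virtual address into a 5-bit virtual page number and a 6-bit page offset and looks the page up in an 8-entry TLB. It is only a behavioral model under stated assumptions, not the team's VHDL: addresses are treated as word addresses, the 4-cycle miss penalty is the approximate figure quoted above, random eviction stands in for the pseudo-random replacement mentioned for the data case, and the names `split_address`, `TLB`, and `translate` are hypothetical.

```python
import random

PAGE_OFFSET_BITS = 6   # 64-word pages, as stated in the report
VPN_BITS = 5           # page table of at most 32 pages
TLB_ENTRIES = 8        # the TLB holds up to 8 entries
MISS_PENALTY = 4       # "about 4 cycles" per the report (approximate)


def split_address(vaddr):
    """Split a word address into (virtual page number, page offset)."""
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = (vaddr >> PAGE_OFFSET_BITS) & ((1 << VPN_BITS) - 1)
    return vpn, offset


class TLB:
    """Hypothetical behavioral model of a small fully associative TLB."""

    def __init__(self, page_table, seed=0):
        self.page_table = page_table      # index = VPN, value = PPN
        self.entries = {}                 # cached VPN -> PPN translations
        self.rng = random.Random(seed)    # assumption: random eviction
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Return (physical address, stall cycles) for a virtual address."""
        vpn, offset = split_address(vaddr)
        stall = 0
        if vpn in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            stall = MISS_PENALTY          # datapath stalls to read the page table
            if len(self.entries) >= TLB_ENTRIES:
                victim = self.rng.choice(list(self.entries))
                del self.entries[victim]
            self.entries[vpn] = self.page_table[vpn]
        ppn = self.entries[vpn]
        return (ppn << PAGE_OFFSET_BITS) | offset, stall


# Direct linear mapping: virtual page n maps to physical page n.
linear_page_table = list(range(1 << VPN_BITS))

if __name__ == "__main__":
    tlb = TLB(linear_page_table)
    for addr in (1221, 1222, 64, 1221):
        print(addr, split_address(addr), tlb.translate(addr))
    print("hits:", tlb.hits, "misses:", tlb.misses)
```

With the linear mapping, the physical address always equals the virtual address, so the only observable effect of the TLB in this model is the stall on a miss, which matches the report's point that the penalty is paid rarely once the working set of pages is resident.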
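The Write Buffer section's claim, that back-to-back stores eventually fill the 4-word FIFO while stores separated by other instructions never stall, can be checked with a small cycle-count model. The sketch below is a simplification under assumptions that are not in the report: one instruction issues per cycle, memory begins draining a store in the cycle it is enqueued, each store occupies memory for 3 cycles, and nothing else competes for the memory. The names `tick` and `simulate` are hypothetical.

```python
from collections import deque

BUFFER_WORDS = 4   # write buffer size from the report
STORE_CYCLES = 3   # cycles memory needs per store, per the report


def tick(pending):
    """Advance memory by one cycle: work on the oldest buffered store."""
    if pending:
        pending[0] -= 1
        if pending[0] == 0:
            pending.popleft()   # store has reached memory; free the slot


def simulate(trace):
    """Run a trace of 'sw' / other ops; return (total cycles, stall cycles)."""
    pending = deque()           # FIFO of remaining cycles for each buffered store
    cycles = stalls = 0
    for op in trace:
        # A store must wait while the buffer is full.
        while op == "sw" and len(pending) == BUFFER_WORDS:
            tick(pending)
            cycles += 1
            stalls += 1
        if op == "sw":
            pending.append(STORE_CYCLES)
        tick(pending)
        cycles += 1
    return cycles, stalls


if __name__ == "__main__":
    print("back-to-back stores:", simulate(["sw"] * 8))
    print("stores with R-format ops between:", simulate(["sw", "add", "add"] * 8))
```

In this model a store followed by two other instructions is fully drained before the next store arrives, which is why loops such as those in merge sort and quick sort benefit most.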


Berkeley COMPSCI 152 - CS 152 FINAL PROJECT
