CS 152 Computer Architecture
Final Project Report
Spring 2001, Prof. Kubiatowicz

Section: 2-4 pm Wednesday
TA: Ed Liao

Team members: Lin Zhou, Sonkai Lao, Wen Hui Guan, Wilma Yeung
14255590, 14429797, 12270914, 14499996

Table of Contents

Abstract
Division of Labor
Detailed Strategies
Results
Conclusion
Appendix I (notebooks)
Appendix II (schematics)
Appendix III (VHDL files)
Appendix IV (testing)
Appendix V (delay time table)

Abstract

We implemented a deeply pipelined processor with seven stages, along with a branch predictor, separate instruction and data caches, and a stream buffer. Starting from the original 5-stage pipeline, we divided the instruction fetch stage into two stages and did the same for the execution stage. In the first fetch stage, the instruction addressed by the PC is read from the instruction cache; in the second fetch stage, the branch decision is predicted by the branch predictor placed there.

Because the ALU is the functional unit with the worst delay (15 ns), we split it into two 16-bit adders to improve performance. Each adder is paired with a logical unit, and the two pairs are placed in different execution stages. With this implementation, our pipelined processor can sustain a cycle time of 22.5 ns with the ideal memory system.

To optimize our memory system, we implemented two cache subsystems, one for data and one for instructions. Furthermore, because accesses to the instruction cache are frequently sequential, we attached a stream buffer to it.

Finally, the performance metrics we used for the caches are the hit time and the miss penalty. The hit time for both the data and instruction caches is 7 ns. The miss penalty for the data cache is estimated at 124.5 ns, while that for the instruction cache is 422 ns, based on the results of running the mystery program with the ideal memory.

Division of Labor

Project Design Areas: Processor datapath and controllers; Cache system controllers and arbiter; Branch Target predictor; Tracer monitor; Forwarding Unit; Report write-up

Team Member(s) Involved: Sonkai Lao, Lin Zhou, Wilma Yeung, Wen Hui Guan, Sonkai Lao, Wen Hui Guan, Wilma Yeung, Lin Zhou, Wen Hui Guan

Detailed Strategy

a. General Design Strategy

Memory System

1. Instruction Cache and Stream Buffer

A stream buffer is added to reduce compulsory and capacity misses, since instructions tend to execute sequentially in groups of four to five between branches and jumps. We designed the instruction cache and the stream buffer with fully associative lookup, so each cache block and each buffer block has its own comparator.

For the instruction cache, we implemented a FIFO replacement policy, enforced by a counter that selects the block to replace in the fashion of a clock. When a read misses in both the cache and the buffer, the buffer is flushed. The counter advances after the instruction cache has read in the 2 sequential words of a block. The buffer controller then fills the buffer with the next 4 sequential words. This sequence of reads corresponds to two memory accesses under our implementation of a 3-word burst request.

Based on this implementation, the miss penalty is about 124.5 ns, whereas a hit in the cache takes 7 ns. If the requested word misses in the cache but hits in the buffer, it costs only one extra ns to bring the word to the NPC.

2. Data Cache (Re-design with fully associative lookup, burst request, and write back)

We implemented a fully associative access method for the data cache, along with a write-back write policy. Since we employ two 32-bit DRAM banks in parallel, we connect the data cache to the memory via a 64-bit bus. We also implemented a FIFO replacement policy, enforced, as for the instruction cache, by a counter that selects the block to replace in the fashion of a clock. The counter therefore advances only on a read miss or a write miss.

To increase the performance of the cache, we designed it so that when a write to the DRAM is required, the cache writes only one word to memory instead of two. When there is a read miss and a block is selected for replacement, however, two words are read into the block to take advantage of spatial locality. This also reduces the overhead of writes, since a word is written back to the DRAM only if its dirty bit is set. We therefore implemented each block with two dirty bits.

Write misses and write hits are slightly more complicated, because we must update a single 32-bit word over a 64-bit bus while keeping the implementation simple. On a write miss, the block selected for replacement is handled in the same manner as on a read miss: we request a read of two words from the DRAMs and set the valid bits after writing the dirty word(s), if any, to memory. However, to combine these two scenarios and fit them onto a 64-bit input bus, we need to consider several write cases. Our approach is to keep two valid bits and two dirty bits for each block. For example, on a write hit we simply update the appropriate word and set its dirty bit. On a compulsory write miss, we need to update the tag, the cache word, and the corresponding valid bit. Keeping two valid bits therefore lets us do non-allocate on write to reduce the overhead.

Because write data can come from either the CPU or the DRAM, the target block may be empty (no valid bits set), and the input arrives on a 64-bit bus, we must choose the 64-bit input data appropriately. The data cache controller therefore distinguishes the input source and generates a 3-bit source selection signal to choose the input data.

3. Data Cache Controller Design

As the data cache design section shows, write hits and read hits are handled similarly. A read hit involves selecting the right word to output to the CPU; a write hit requires updating the appropriate word. Both cases can be handled in the same state by recognizing the type of hit and generating the corresponding selection signals.

For both kinds of miss, the cache requests permission to access the DRAM through the arbiter described below. It has to wait for the grant signal from the arbiter before performing the write or read. The diagram below shows that a hit is handled in the START state. When a miss is triggered, the controller goes to either the GRANT W or the GRANT R state until the write or read permission is granted. Once permission is granted, the controller would make a

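The instruction cache and stream buffer behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the team's VHDL design: the class name, the default block count, and the choice to leave a word in the buffer on a buffer hit (rather than moving it into the cache) are all hypothetical. The 2-word blocks, 4-word buffer, flush on a double miss, and clock-style FIFO counter follow the report.

```python
from collections import deque

WORDS_PER_BLOCK = 2   # the report's cache blocks hold 2 sequential words
BUFFER_WORDS = 4      # the stream buffer holds the next 4 sequential words


class InstructionCacheModel:
    """Toy model of a fully associative cache with FIFO replacement
    and a stream buffer. Names and sizes are illustrative assumptions."""

    def __init__(self, num_blocks=4):      # num_blocks is an assumed size
        self.tags = [None] * num_blocks    # block address held by each block
        self.victim = 0                    # FIFO counter: next block to replace
        self.buffer = deque()              # word addresses in the stream buffer

    def access(self, word_addr):
        """Return 'cache_hit', 'buffer_hit', or 'miss' for a word address."""
        block_addr = word_addr // WORDS_PER_BLOCK
        if block_addr in self.tags:        # stands in for one comparator per block
            return "cache_hit"
        if word_addr in self.buffer:       # assumption: buffer contents unchanged
            return "buffer_hit"
        # Miss in both: refill one cache block, advance the FIFO counter,
        # flush the buffer, and prefetch the next 4 sequential words.
        self.tags[self.victim] = block_addr
        self.victim = (self.victim + 1) % len(self.tags)
        first_next = (block_addr + 1) * WORDS_PER_BLOCK
        self.buffer = deque(range(first_next, first_next + BUFFER_WORDS))
        return "miss"
```

A double miss in this model fetches 2 + 4 = 6 words, which matches the two 3-word burst requests mentioned in the report.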

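The per-word valid and dirty bits of the data cache can be sketched in the same way. The report does not give the exact update rules, so the method names and the handling of a write to an empty block below are assumptions made for illustration; the two-word block, the per-word bits, and the write-back of dirty words only come from the report.

```python
class DataCacheBlockModel:
    """Toy model of one 2-word data cache block with per-word
    valid and dirty bits. Method names are illustrative assumptions."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.tag = None
        self.words = [0, 0]
        self.valid = [False, False]
        self.dirty = [False, False]

    def fill(self, tag, words):
        """Read-miss fill: both words arrive from DRAM over the 64-bit bus."""
        self.tag = tag
        self.words = list(words)
        self.valid = [True, True]
        self.dirty = [False, False]

    def write(self, tag, offset, value):
        """CPU write of one 32-bit word (offset 0 or 1).

        Covers a write hit and, as an assumption, a write to an empty
        block, where only the written word becomes valid."""
        if self.tag is not None and self.tag != tag:
            raise ValueError("block holds another tag; evict it first")
        self.tag = tag
        self.words[offset] = value
        self.valid[offset] = True
        self.dirty[offset] = True

    def evict(self):
        """Return the (offset, value) pairs that must be written back,
        i.e. only the words whose dirty bit is set, then clear the block."""
        writes = [(i, self.words[i]) for i in range(2)
                  if self.valid[i] and self.dirty[i]]
        self._reset()
        return writes
```

In this model a block that was filled and never written evicts with no DRAM writes, and a block with one modified word writes back only that word.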
Berkeley COMPSCI 152 - CS 152 Computer Architecture Final Project Report
