CS 152 Computer Architecture
Final Project Report
Spring 2001
Prof. Kubiatowicz
Section: 2-4 pm Wednesday
TA: Ed Liao

Team members:
Lin Zhou 14255590
Sonkai Lao 14429797
Wen Hui Guan 12270914
Wilma Yeung 14499996

Table of Contents:
Abstract ............................................. 2
Division of Labor .................................... 2
Detailed Strategy .................................... 2
Results .............................................. 12
Conclusion ........................................... 12
Appendix I (notebooks) ............................... 13
Appendix II (schematics) ............................. 13
Appendix III (VHDL files) ............................ 14
Appendix IV (testing) ................................ 14
Appendix V (delay time table) ........................ 15

Appendix V (delay time table):

Component                          Delay time
Stream buffer controller           6ns
Predict entry                      3ns
Hazard controller (forwarding)     5ns
Branch table controller            6ns
Data cache controller              6ns
Instruction cache controller       6ns

Abstract:

We implemented a deeply pipelined processor with seven stages, along with a branch predictor, separate instruction and data caches, and a stream buffer. Starting from the original 5-stage pipeline, we divided the instruction fetch stage into two, and did the same for the execution stage. In the first fetch stage the instruction cache is read at the PC, while the branch decision is predicted in the second fetch stage. Since the ALU is the functional unit that accounts for the worst delay time (15ns), to achieve optimal performance we broke it down into two 16-bit adders. Each adder is paired with a logical unit and placed in a different execution stage. We also added a branch predictor to the second stage.
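The carry hand-off between the two 16-bit adders can be illustrated with a small behavioral model. This is a hypothetical Python sketch (the project itself is implemented in VHDL, and all names here are invented for illustration): the low half is summed in the first execution stage, the carry-out is latched, and the high half consumes it in the second execution stage.

```python
# Hypothetical behavioral model of the split ALU: a 32-bit add performed
# as two 16-bit halves, with the carry latched between execution stages.

MASK16 = 0xFFFF

def ex1_low_half(a, b):
    """First execution stage: add the low 16 bits, produce a carry-out."""
    s = (a & MASK16) + (b & MASK16)
    return s & MASK16, s >> 16          # (low result, carry into stage 2)

def ex2_high_half(a, b, carry_in):
    """Second execution stage: add the high 16 bits plus the latched carry."""
    s = (a >> 16) + (b >> 16) + carry_in
    return s & MASK16                   # result wraps modulo 2**32

def pipelined_add(a, b):
    low, carry = ex1_low_half(a, b)     # cycle N, first execution stage
    high = ex2_high_half(a, b, carry)   # cycle N+1, second execution stage
    return (high << 16) | low

# A carry generated in the low half ripples across the stage boundary:
assert pipelined_add(0x0000FFFF, 0x00000001) == 0x00010000
```

The point of the split is that each stage now contains only a 16-bit carry chain, roughly halving the adder's contribution to the cycle time.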
With this implementation, our pipelined processor can sustain a cycle time of 22.5ns with an ideal memory system.

To optimize our memory system, we implemented two cache subsystems, one for data and the other for instructions. Furthermore, because memory accesses to the instruction cache are frequently sequential, we attached a stream buffer to it. Finally, the performance metrics that we used for the caches are hit time and miss penalty. The hit time for both the data and instruction caches is 7ns. The miss penalty for the data cache is estimated at 124.5ns, while that for the instruction cache is 422ns, based on the results of running the mystery program with the ideal memory.

Division of Labor:

Project Design Areas                       Team Member(s) Involved
Processor datapath and controllers         Sonkai Lao, Lin Zhou
Cache system, controllers, and arbiter     Wilma Yeung, Wen Hui Guan
Branch target predictor                    Sonkai Lao
Tracer monitor                             Wen Hui Guan
Forwarding unit                            Wilma Yeung
Report write-up                            Lin Zhou, Wen Hui Guan

Detailed Strategy:

a. General Design Strategy

Memory System

(1) Instruction Cache and Stream Buffer

A stream buffer is added to reduce compulsory and capacity misses, since instructions tend to execute sequentially in groups of four to five between branches and jumps. We designed both the instruction cache and the stream buffer with a fully associative lookup method; accordingly, we associate each cache and buffer block with a comparator.

For the instruction cache, we implement a FIFO replacement policy, enforced by a counter that selects the block to replace in the fashion of a clock. When a read miss occurs in both the cache and the buffer, the buffer is flushed. The counter advances after the instruction cache reads in 2 sequential words for a block. The buffer controller then fills the buffer with the next 4 sequential words. This sequence of reads corresponds to two memory accesses under our implementation of burst requests of 3 words.
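The lookup and refill sequence just described can be sketched as a small behavioral model. This is a hypothetical Python model, not the team's VHDL; block counts, the `memory` list, and all names are invented for illustration, and timing is reduced to a hit/miss label.

```python
# Hypothetical model of the instruction cache: fully associative lookup,
# clock-style FIFO victim counter, and a stream buffer that is flushed
# whenever both the cache and the buffer miss.

class ICache:
    def __init__(self, num_blocks=4, buffer_words=4):
        self.blocks = [None] * num_blocks   # each block: (tag, [word0, word1])
        self.fifo = 0                       # counter selecting the next victim
        self.buffer = []                    # stream buffer: (addr, word) pairs
        self.buffer_words = buffer_words

    def read(self, addr, memory):
        tag = addr // 2                     # 2 words per block
        # Fully associative lookup: a comparator per block checks every tag.
        for blk in self.blocks:
            if blk is not None and blk[0] == tag:
                return blk[1][addr % 2], 'cache hit'
        # A word sitting in the stream buffer costs one extra ns to forward.
        for a, w in self.buffer:
            if a == addr:
                return w, 'buffer hit'
        # Miss in both: flush the buffer, refill the victim block with 2
        # sequential words, advance the FIFO counter, then have the buffer
        # controller prefetch the next 4 sequential words.
        base = tag * 2
        self.blocks[self.fifo] = (tag, [memory[base], memory[base + 1]])
        self.fifo = (self.fifo + 1) % len(self.blocks)
        self.buffer = [(base + 2 + i, memory[base + 2 + i])
                       for i in range(self.buffer_words)]
        return memory[addr], 'miss'
```

A short sequential run shows the intended behavior: the first fetch misses and triggers the refill plus prefetch, the second word hits in the cache, and the following words hit in the stream buffer.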
Based on this implementation, the miss penalty is about 124.5ns. When a hit is found in the cache, the hit time is 7ns. If the requested word misses in the cache but hits in the buffer, it costs only one extra ns to bring the word to the NPC.

(2) Data Cache Re-design (fully associative, burst request, and write-back)

We implement a fully associative access method for the data cache, along with a write-back write policy. Since we employ two 32-bit DRAM banks in parallel, we connect the data cache to the memory via a 64-bit bus. We also implement a FIFO replacement policy, enforced by a counter that selects the block to replace in the fashion of a clock, as for the instruction cache. The counter advances only when there is a read miss or a write miss.

To increase the performance of the cache, we designed it so that when a write to the DRAM is required, the cache writes only one word, instead of two, to the memory. However, when there is a read miss and a block is selected for replacement, two words are read into the block to take advantage of spatial locality. This also reduces the write overhead, since a word is written back to the DRAM only if the dirty bit corresponding to that word is set. We therefore implement each block with two dirty bits.

Write misses and write hits involve a little complication, since we need to update only one word (32 bits) via the 64-bit bus while keeping the implementation simple. On a write miss, the block selected for replacement is handled in the same manner as on a read miss: we request a read of two words from the DRAMs, writing the dirty word(s) back to the memory first if any are set, and then set the valid bit. However, to put these two scenarios together and fit them into a 64-bit input bus, we need to consider
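The per-word dirty-bit scheme can be sketched as follows. This is a hypothetical Python sketch under assumed names (the real design is in VHDL): one data-cache block holds 2 words with a dirty bit per 32-bit word, so an eviction writes back only the word(s) actually modified.

```python
# Hypothetical model of one data-cache block: 2 words per block (matching
# the 64-bit refill bus), with one dirty bit per 32-bit word so that a
# write-back sends only modified words to the DRAMs.

class DCacheBlock:
    def __init__(self, tag, words):
        self.tag = tag
        self.words = list(words)        # 2 words, refilled over the 64-bit bus
        self.dirty = [False, False]     # one dirty bit per word

    def write(self, offset, value):
        """Write hit: update one word and mark only that word dirty."""
        self.words[offset] = value
        self.dirty[offset] = True

    def write_back(self, memory):
        """On eviction, write only dirty words back; returns DRAM writes."""
        writes = 0
        for i in (0, 1):
            if self.dirty[i]:
                memory[self.tag * 2 + i] = self.words[i]
                self.dirty[i] = False
                writes += 1
        return writes
```

Evicting a block in which only one word was written then costs a single DRAM write instead of two, which is the overhead reduction the design is after.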

