Berkeley COMPSCI 152 - Project - Definitely Outta Hand

This preview shows page 1-2 out of 5 pages.


Definitely Outta Hand (D’OH!)

Andrew Yee
Richard Allen
Nathan Wooster
James Hillman
TA: Kelvin Lwin
December 9, 1999

1) Introduction and Summary

a) Feature Summary: The Definitely Outta Hand (D’OH!) processor is based on the MIPS Instruction Set. Its features include: a five-stage pipeline with forwarding, two 16-word, 2-way set associative data and instruction caches implementing a write back policy in fast page mode, a 4 block victim cache, and non-blocking loads with a reorder buffer.

b) Overall top-level block diagram of the processor:

[Block diagram not reproduced. Labeled components: DRAM & Controller, Arbiter, Instruction Cache, Data Cache, Data Cache Controller, Instruction Cache Controller, Processor, Victim Cache, MSHR, Victim Cache Controller.]

c) A performance summary for the final test programs:

Using lab6_mystery as a benchmark:
Lab 6 Processor CPI: 5.43
Lab 7 Processor CPI: 4.93 -> 9.2% reduction in CPI

2) Description of Features

Write back policy: When a block in the cache is replaced by another block coming up from memory, we look at the “dirty bit” associated with the block. If the dirty bit is set, we need to write the block back to main memory, since the copy in the cache is more up to date. We chose this option because it saves time when the replaced block has not been written to (is not dirty). With the other option, “write through,” we would write to main memory on every store rather than only when a modified block is replaced.

Fast page mode: We hoped to gain an advantage in speed by grabbing two words from DRAM using two CAS_L signals in rapid succession. This is more complex, since we can only grab successive even-odd address pairs, rather than grabbing only the address needed, one word at a time.

Victim cache: This cache holds 8 words of data and sits between the data cache and the DRAM module.
We implemented this module in order to minimize the ping-pong effect, where data items bounce between the data cache and memory.

Non-blocking loads (MSHR): We chose this so that instructions that do not depend on the result of a load can pass the memory stage and go directly to the reorder buffer, instead of having the entire pipeline stall on every load regardless of dependencies. This can potentially save cycles.

Reorder buffer: This feature is closely coupled with the non-blocking load scheme. As the non-load-dependent instructions go past the MSHR, we need to put them into the reorder buffer so that we can commit results to the register file in the proper order. This reduces stalls after a load word.

3) Performance Summary

a) Critical Path

i) Top 3 critical paths in the processor:
1) DRAM controller - 26 ns per half-cycle = 52 ns
2) Transition from stall to no-stall - 26 ns per half-cycle = 52 ns
MSHR -> DcacheCtrl -> MSHR -> StallCtrl -> MUX -> Reorder Buffer
3) JAL - 16 ns per half-cycle = 32 ns
register -> MUX -> adder -> Reorder Buffer

ii) Latencies from memory:
There is only a 10 ns latency to get data or instructions from the cache. There is a 9 cycle latency from a cache miss until the data or instruction arrives. If the missed item happens to be in the victim cache, the latency is only 4 cycles. Furthermore, there is a five cycle latency for an instruction to get through the pipeline (provided no stalls occur).
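
As a rough illustration of how the two miss latencies above combine, the following Python sketch computes an average data-cache miss latency. The 4-cycle and 9-cycle figures come from the text; the fraction of misses that hit in the victim cache is a hypothetical parameter, since the report does not state a measured value, and the function name is chosen here for illustration only.

```python
# Cycle counts taken from the latency figures quoted in the report.
VICTIM_HIT_CYCLES = 4  # data-cache miss that is found in the victim cache
DRAM_MISS_CYCLES = 9   # miss that has to be serviced from DRAM


def average_miss_latency(victim_hit_fraction: float) -> float:
    """Average cycles per data-cache miss.

    victim_hit_fraction is an assumed input (not reported in the text):
    the share of data-cache misses that are found in the victim cache.
    """
    if not 0.0 <= victim_hit_fraction <= 1.0:
        raise ValueError("victim_hit_fraction must be between 0 and 1")
    return (victim_hit_fraction * VICTIM_HIT_CYCLES
            + (1.0 - victim_hit_fraction) * DRAM_MISS_CYCLES)


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 1.0):
        print(f"victim hit fraction {fraction:.2f}: "
              f"{average_miss_latency(fraction):.2f} cycles per miss")
```

With no victim cache hits, every miss costs the full 9 cycles; each miss caught by the victim cache saves 5 cycles under this simple model, which ignores any overlap provided by the non-blocking loads.
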
If there is a load word and an instruction bypasses it (because of our non-blocking load system), there can be up to a 9 cycle delay before its result is written from the reorder buffer to the register file.

b) Performance Analysis

i) Comparison with the Lab 6 processor:

Using lab6_mystery as a benchmark:
Lab 6 Processor CPI: 5.43
Lab 7 Processor CPI: 4.93 -> 9.2% reduction in CPI
Lab 6 cycle time = 52 ns
Lab 7 cycle time = 52 ns -> 0% performance increase
Lab 6 executed 2118 instructions
Lab 7 executed 2118 instructions
Lab 6 took 11500 cycles
Lab 7 took 10441 cycles
Lab 6 took 598,000 ns
Lab 7 took 542,932 ns

Using Lab5_mystery as a benchmark:
Almost identical performance

ii) Explanations for better/worse performance?
The Lab 7 processor performed better mostly due to a combination of the non-blocking load scheme and the program we used as a benchmark. Since the lab6_mystery program relied on sequential non-dependent loads, non-blocking loads were very beneficial. On a different program, the results would be less encouraging. Our performance was greatly diminished by the fact that we improperly did operations on both edges of the clock within the memory system. Because of the way we allowed rising edge operations to leak into the processor, things in several spots had to be completed in half a cycle, thus doubling our cycle time.

4) Testing Philosophy

First of all, we tested the individual blocks before integrating them into the rest of the processor. We used testbenches and command files with stimuli and probed the outputs, comparing them with our expected outputs, and attempted to test all the common cases as well as any special cases we could think of. The memory was tested in multiple layers: we added it one component at a time, testing the system before the next component was added.

We used the “mystery programs” from the previous labs as tests for our processor. We looked at waves for timing issues, which was very useful.
We could probe values on buses at specific times. Also, the annotations with the values on the schematic were very useful at times. Finally, to verify the correct output from our processor, we ran the “mystery programs” on SPIM and compared the
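
The lab6_mystery figures quoted in the Performance Analysis section can be cross-checked with a short calculation. The Python sketch below recomputes CPI and execution time from the reported cycle counts, instruction count, and cycle time; the function and variable names are chosen here for illustration and are not part of the original project.

```python
# Figures reported for the lab6_mystery benchmark.
INSTRUCTIONS = 2118
CYCLE_TIME_NS = 52
CYCLES = {"Lab 6": 11500, "Lab 7": 10441}


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction."""
    return cycles / instructions


def exec_time_ns(cycles: int, cycle_time_ns: int) -> int:
    """Total execution time in nanoseconds."""
    return cycles * cycle_time_ns


def relative_reduction(old: float, new: float) -> float:
    """Fractional reduction going from old to new."""
    return (old - new) / old


if __name__ == "__main__":
    for name, cycles in CYCLES.items():
        print(f"{name}: CPI = {cpi(cycles, INSTRUCTIONS):.2f}, "
              f"time = {exec_time_ns(cycles, CYCLE_TIME_NS):,} ns")
    reduction = relative_reduction(CYCLES["Lab 6"], CYCLES["Lab 7"])
    speedup = CYCLES["Lab 6"] / CYCLES["Lab 7"]
    print(f"Cycle count reduction: {reduction:.1%}")
    print(f"Speedup (Lab 6 time / Lab 7 time): {speedup:.3f}")
```

Because the instruction count and cycle time are the same for both processors, the reported 9.2% figure corresponds to the reduction in cycle count (and therefore in CPI and execution time); expressed as a speedup ratio, the same data give roughly 1.10.
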

