Berkeley COMPSCI 152 - Processor With Nearly Everything Dual

CS 152 Spring 2004
Lab 6 : Final Project
PWNED
Processor With Nearly Everything Dual
Jason McDowell(ab) - Charlie Talley(bd) - Philip Kwan(bj) - Jeff Hao(bk) - John Tsai(bl)
Spokesperson - Charlie Talley

Abstract - Our final processor is a dual-issue, fully superscalar CPU with branch prediction and a pseudo-victim cache. It has a 5-stage pipeline and can issue any two arithmetic instructions in parallel. Two memory instructions can also be issued simultaneously. How a control instruction is issued depends on which pipeline it maps to: if it maps to the second pipeline, it is stalled and the previous instruction is issued alone. Our processor uses a bi-modal branch predictor to help with instruction issue, and a victim cache without data swapping to reduce the cache miss penalty. Issuing two instructions at a time makes our processor more prone to stalls and data hazards, but it still improves overall CPI.

Division of Labor - Philip handled write buffer modifications and testing, datapath modifications, and overall processor testing. Jason's duties included creating and testing the branch predictor, along with overall processor testing. Jeff handled issuer modifications, arbiter modifications, victim cache and IO space modifications, and synthesis plus board testing. Charlie worked on the victim cache and IO modifications, component and processor testing, along with spokesperson duties. John's responsibilities included controller modifications, cache modifications, and overall processor testing.

Detailed Strategy - We chose to make our processor superscalar instead of deepening the pipeline. We felt that going superscalar was the best option in terms of implementation complexity and payoff. By issuing two instructions at a time, we can lower the average CPI below 1; a dual-issue pipeline that never stalled would reach a CPI of 0.5.
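As a rough illustration of why dual issue can bring the average CPI below 1, the sketch below computes CPI from the fraction of issue cycles that send two instructions down the pipelines and the average number of stall cycles per issue cycle. The function name, its parameters, and the sample values are hypothetical and chosen only for illustration; they are not measurements from this project.

```python
def average_cpi(dual_issue_fraction: float, stalls_per_issue_cycle: float = 0.0) -> float:
    """Cycles per instruction for a machine that issues 1 or 2 instructions per issue cycle.

    dual_issue_fraction: share of issue cycles in which two instructions are issued.
    stalls_per_issue_cycle: average extra stall cycles charged to each issue cycle.
    Both parameters are illustrative assumptions, not figures from the report.
    """
    if not 0.0 <= dual_issue_fraction <= 1.0:
        raise ValueError("dual_issue_fraction must be between 0 and 1")
    if stalls_per_issue_cycle < 0.0:
        raise ValueError("stalls_per_issue_cycle must be non-negative")
    # Each issue cycle retires one instruction, plus a second one when a pair is issued.
    instructions_per_issue_cycle = 1.0 + dual_issue_fraction
    # Each issue cycle costs one cycle plus its share of stall cycles.
    cycles_per_issue_cycle = 1.0 + stalls_per_issue_cycle
    return cycles_per_issue_cycle / instructions_per_issue_cycle
```

With every cycle dual-issued and no stalls, `average_cpi(1.0)` gives the ideal 0.5. With pairs issued in half of the cycles and 0.2 stall cycles per issue cycle, `average_cpi(0.5, 0.2)` is about 0.8, still below the single-issue ideal of 1.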
By choosing superscalar over a deeper pipeline, we chose to deal with more complex control logic for data hazards and structural hazards, rather than with more complex timing issues. Once we had decided on superscalar, we discussed the merits of making our processor fully superscalar. Because memory operations are fairly common, we decided to do so, giving the processor the ability to execute two memory instructions at the same time.

Our processor has two pipelines, referred to hereafter as the first and second pipeline. The issuer issues two instructions unless one of them is a control instruction, which always goes into the first pipeline. If the control instruction maps to the second pipeline, the previous instruction is issued in the first pipeline along with a nop in the second pipeline. If the control instruction maps to the first pipeline, it is issued there, and the instruction immediately following it is issued in the second pipeline. This preserves the single-instruction delay slot convention that our processor followed throughout the semester.

If the control instruction is a branch, the branch predictor helps to eliminate the one-clock-cycle delay needed after a branch has been issued. By implementing a branch predictor with a success rate of over 90%, we keep our pipeline filled with instructions most of the time, even right after a branch has been issued. After implementing the branch predictor, we added jump prediction to it to further reduce control delay penalties.

To allow two instructions to be executed simultaneously, i.e. to be fully superscalar, we added a second ALU along with more datapath components such as muxes and registers. We also dual-ported the register file, instruction cache, data cache, and level 0 boot ROM.
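The report does not describe the internals of its bi-modal branch predictor. As a point of reference, a bimodal predictor is conventionally a table of 2-bit saturating counters indexed by low-order bits of the branch address, and the Python model below sketches that general scheme. The table size, the initial counter value, the word-aligned indexing, and all names are assumptions made for illustration; the jump prediction mentioned above is not modeled.

```python
class BimodalPredictor:
    """Software model of a table of 2-bit saturating counters (illustrative only)."""

    def __init__(self, index_bits: int = 4):
        # index_bits is an assumed table size, not taken from the report.
        self.mask = (1 << index_bits) - 1
        # Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
        # Every entry starts at 1 (weakly not taken), an arbitrary choice.
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        # Assumes word-aligned instructions, so the two low PC bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor: BimodalPredictor, outcomes, pc: int = 0x40) -> float:
    """Fraction of correct predictions for one branch at a hypothetical address."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(outcomes)


if __name__ == "__main__":
    # A loop branch taken nine times and then not taken, repeated ten times.
    pattern = ([True] * 9 + [False]) * 10
    print(accuracy(BimodalPredictor(), pattern))
```

Once the counter for such a loop branch has warmed up, only the loop exit is mispredicted. This is a property of the generic scheme on a made-up pattern, not a reproduction of the success rate reported for this project.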
Doubling up the pipeline added more stalling conditions, which we tested carefully to ensure proper forwarding for all combinations of issued instructions. One important job for our controller was coordinating stores with the write buffer. Since there is only one write buffer, the controller must notice when two stores are being issued down the pipelines together. Only one of them can be allowed to write to the write buffer, while the other must stall until the write buffer has been emptied by the arbiter.

Our initial plan was also to implement a victim cache to reduce data cache miss penalties. After drafting a plan for fitting the victim cache into our pipeline, we concluded that, given the time allotted, we might not be able to implement and debug data swapping between the data cache and the victim cache. Swapping creates a stall condition that would have to be handled by the controller, making the controller more complicated. In the end we decided to implement a pseudo-victim cache: one whose contents are mutually exclusive with those of the data cache, but which does not swap blocks with it. Time permitting, we planned to implement data swapping after getting our processor functioning.

Issuer - The issuer checks for hazards between the next two instructions in the instruction stream before they are issued. Instructions with EX to EX, MEM to EX, or MEM to MEM dependencies can't be issued together in the same cycle, since our datapath has no forwarding paths to handle them. Instead, the first instruction in the pair is issued in the first pipeline and the second pipeline is stalled. Pairs of instructions that can't be issued together are shown in Figure 1.
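The issuer's pairing check can be thought of as a predicate over the next two instructions. The Python model below is a minimal sketch of that idea, covering the unissuable pairs that the report lists in Figure 1. The real issuer is hardware; the `Instr` record, the opcode sets, the register numbering, and the treatment of register 0 as a hardwired zero (a MIPS-style assumption) are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical opcode groups; the report does not list the supported instructions.
CONTROL_OPS = {"beq", "bne", "j", "jal", "jr"}
MULT_OPS = {"mult", "multu"}
HILO_READ_OPS = {"mfhi", "mflo"}


@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[int] = None      # register written, if any
    srcs: Tuple[int, ...] = ()      # registers read


NOP = Instr("nop")


def can_dual_issue(first: Instr, second: Instr) -> bool:
    """Return True if `second` may be issued in the same cycle as `first`."""
    # Control instructions are only issued into the first pipeline.
    if second.op in CONTROL_OPS:
        return False
    # A multiply followed by a read of its result (mfhi or mflo).
    if first.op in MULT_OPS and second.op in HILO_READ_OPS:
        return False
    # The second instruction reads a register the first one writes. This covers
    # both a load followed by a use of the loaded value and a result calculated
    # in execute that is needed in execute. Register 0 is assumed hardwired to zero.
    if first.dest not in (None, 0) and first.dest in second.srcs:
        return False
    return True


def issue(first: Instr, second: Instr) -> Tuple[Instr, Instr]:
    """Return the instructions sent to the (first, second) pipelines this cycle."""
    if can_dual_issue(first, second):
        return first, second
    # Otherwise only the earlier instruction is issued; the second slot gets a nop.
    return first, NOP
```

For example, `issue(Instr("lw", dest=8, srcs=(29,)), Instr("add", dest=9, srcs=(8, 10)))` pairs the load with `NOP`, because the add reads the register being loaded. A branch in the first slot followed by an independent instruction is issued as a pair, matching the delay slot convention described earlier.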
First Pipeline                  Second Pipeline
Any instruction                 Branch or Jump
Load instruction                Instruction that uses the loaded value
Multiply                        mfhi or mflo
Result calculated in execute    Result needed in execute

Figure 1 : Unissuable instruction combinations

When the issuer issues two instructions, it always issues the earlier instruction into the first pipeline, allowing the controller to know which instruction inside the pipelines goes before the other. This ensures proper forwarding and order of execution. When the issuer issues only one instruction, that instruction is placed into the first pipeline, and a nop is sent down the second pipeline. The unissued instruction is scheduled for the first pipeline on the next cycle, and the next instruction in the instruction stream is scheduled into the second pipeline. If the pipelines are stalled for any reason, the issuer issues zero instructions.

The issuer restricts branches to only be issued into the first

