CS 152 Spring 2004
Lab 6: Final Project
PWNED: Processor With Nearly Everything Dual

Jason McDowell (ab), Charlie Talley (bd), Philip Kwan (bj), Jeff Hao (bk), John Tsai (bl)
Spokesperson: Charlie Talley

Abstract

Our final processor is a dual-issue, fully superscalar CPU with branch prediction and a pseudo-victim cache. It has a five-stage pipeline and can issue any two arithmetic instructions in parallel. Memory instructions can also be issued simultaneously. Whether a control instruction can be issued depends on which pipeline it maps to. If it maps to the second pipeline, it is stalled and the preceding instruction is issued alone. The processor uses a bimodal branch predictor to help keep instructions flowing through the issuer, and a victim cache without data swapping to reduce the cache miss penalty. Issuing two instructions at a time makes the processor more prone to stalls and data hazards, but overall it achieves an improved CPI.

Division of Labor

Philip handled write buffer modifications and testing, datapath modifications, and overall processor testing. Jason's duties included creating and testing the branch predictor, along with overall processor testing. Jeff handled issuer modifications, arbiter modifications, victim cache and I/O space modifications, synthesis, and board testing. Charlie worked on the victim cache, I/O modifications, and component and processor testing, along with spokesperson duties. John's responsibilities included controller modifications, cache modifications, and overall processor testing.

Detailed Strategy

We chose to make our processor superscalar instead of deepening the pipeline. We felt that going superscalar offered the best trade-off between implementation complexity and payoff. By issuing two instructions at a time, we lower the average CPI to below 1. In choosing superscalar over a deeper pipeline, we accepted more complex control logic for data and structural hazards in exchange for avoiding more complex
timing issues.

Once we had decided to go superscalar, we discussed the merits of making the processor fully superscalar. Because memory operations are fairly common, we chose a fully superscalar design that can execute two memory instructions at the same time.

The processor has two pipelines, referred to hereafter as the first and second pipelines. The issuer issues two instructions unless one of them is a control instruction, which must always go into the first pipeline. If the control instruction maps to the second pipeline, the preceding instruction is issued in the first pipeline along with a nop in the second. If the control instruction maps to the first pipeline, it is issued there, and the instruction immediately following it is issued in the second pipeline. This preserves the single branch delay slot convention our processor has followed throughout the semester.

If the control instruction is a branch, the branch predictor helps eliminate the one-cycle delay otherwise needed after a branch is issued. With a predictor whose success rate is 90% or higher, the pipeline stays filled with instructions most of the time, even immediately after a branch. After implementing the branch predictor, we extended it to predict jumps as well, further reducing control penalties.

To allow two instructions to execute simultaneously (i.e., to be fully superscalar), we added a second ALU along with additional datapath components such as muxes and registers. We also dual-ported the register file, instruction cache, data cache, and level 0 boot ROM. Doubling the pipeline introduced more stall conditions, and we tested them carefully to ensure correct forwarding for every combination of issued instructions.

One important job of the controller is coordinating stores with the write buffer. Since there is only one write buffer, the controller must
be smart enough to notice when two stores are issued down the pipelines together. Only one of them can write to the write buffer, while the other must stall until the arbiter empties the write buffer.

Our initial plan also included a victim cache to reduce data cache miss penalties. After drafting a plan for integrating the victim cache into the pipeline, we concluded that, in the time allotted, we might not be able to implement and debug data swapping between the data cache and the victim cache. Swapping introduces a stall condition that the controller would have to handle, making it more complicated. In the end, we implemented a pseudo-victim cache: one whose contents are mutually exclusive with the data cache's but that does not swap blocks with it. Time permitting, we planned to add data swapping once the processor was functioning.

Issuer

The issuer checks for hazards between the next two instructions in the instruction stream before they are issued. Pairs with EX-to-EX, MEM-to-EX, or MEM-to-MEM dependencies can't be issued together in the same cycle, because the datapath has no forwarding paths to handle them. Instead, the first instruction in the pair is issued in the first pipeline and the second pipeline is stalled. Pairs of instructions that can't be issued together are shown in Figure 1.

First pipeline                  Second pipeline
Any instruction                 Branch or jump
Load instruction                Instruction that uses the loaded value
Multiply                        mfhi or mflo
Result calculated in execute    Result needed in execute

Figure 1: Unissuable instruction combinations

When the issuer issues two instructions, it always places the earlier instruction in the first pipeline, so the controller knows which of the two instructions in the pipelines comes first. This ensures correct forwarding and order of execution. When the issuer issues only one instruction, that instruction is placed in the first pipeline and a nop is sent down the second pipeline. The
unissued instruction is scheduled for the first pipeline on the next cycle, and the next instruction in the instruction stream is scheduled into the second pipeline. If the pipelines are stalled for any reason, the issuer issues zero instructions.

The issuer allows branches to be issued only in the first pipeline. If a branch instruction is decoded for the second pipeline, the issuer issues only the first instruction in the pair, so that the branch can be rescheduled into the first pipeline on the next cycle.
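The pairing rules in Figure 1 and the branch restriction can be summarized as a small behavioral model. The Python sketch below is an illustration, not the team's HDL issuer. The `Instr` record, its `ex_srcs`/`mem_srcs` fields, and the opcode sets are hypothetical. The sketch assumes MIPS-style timing: ALU operands and address bases are read in EX, and a store's data register is read in MEM. Under that assumption, an ALU result feeding a store's data (EX-to-MEM) is treated as pairable, because it is not among the dependencies the text lists as unissuable.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded-instruction record; field names are illustrative."""
    op: str
    dest: Optional[int] = None        # register written, if any
    ex_srcs: Tuple[int, ...] = ()     # registers read in EX (ALU operands, address base)
    mem_srcs: Tuple[int, ...] = ()    # registers read in MEM (store data)

NOP = Instr("nop")
CONTROL_OPS = {"beq", "bne", "j", "jal", "jr"}   # assumed opcode subset
LOAD_OPS = {"lw"}
MULT_OPS = {"mult", "multu"}
HILO_READS = {"mfhi", "mflo"}

def can_pair(first: Instr, second: Instr) -> bool:
    """Return False for the unissuable combinations listed in Figure 1."""
    if second.op in CONTROL_OPS:
        return False                  # branches and jumps may only issue in the first pipeline
    if first.op in MULT_OPS and second.op in HILO_READS:
        return False                  # mfhi/mflo cannot see the multiply's HI/LO in the same cycle
    if first.dest in (None, 0):
        return True                   # no register result to forward ($0 is never a dependency)
    if first.dest in second.ex_srcs:
        return False                  # EX-to-EX or MEM-to-EX: no same-cycle forwarding path
    if first.op in LOAD_OPS and first.dest in second.mem_srcs:
        return False                  # MEM-to-MEM: store data comes from a load in the same cycle
    return True

def issue(a: Instr, b: Instr, stalled: bool = False) -> Tuple[Instr, Instr, int]:
    """Return (first-pipeline instr, second-pipeline instr, instructions consumed)."""
    if stalled:
        return NOP, NOP, 0            # pipelines stalled: issue nothing
    if can_pair(a, b):
        return a, b, 2                # the earlier instruction always goes in the first pipeline
    return a, NOP, 1                  # b is rescheduled into the first pipeline next cycle

if __name__ == "__main__":
    add = Instr("add", dest=1, ex_srcs=(2, 3))
    sub = Instr("sub", dest=4, ex_srcs=(1, 5))       # reads $1 in EX
    sw = Instr("sw", ex_srcs=(29,), mem_srcs=(1,))   # stores $1
    beq = Instr("beq", ex_srcs=(1, 0))
    print(issue(add, sub)[2])   # 1: EX-to-EX dependency
    print(issue(add, sw)[2])    # 2: EX-to-MEM is not in the unissuable list
    print(issue(add, beq)[2])   # 1: the branch must move to the first pipeline
```

Because a branch decoded in the first slot still pairs with the following instruction, the delay-slot instruction is issued alongside it in the second pipeline whenever `can_pair` allows.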
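The abstract and strategy sections mention a bimodal branch predictor but do not describe how it is organized. For reference, a textbook bimodal predictor is a table of 2-bit saturating counters indexed by low-order PC bits, and the Python sketch below models one. The table size, the indexing (PC bits above the 2-bit byte offset), and the weakly-not-taken initial state are assumptions, not details of this design. The jump prediction the team added is not modeled.

```python
class BimodalPredictor:
    """Textbook bimodal predictor (assumed parameters, not the team's exact design).

    Each entry is a 2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken.
    """

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)   # start every entry weakly not taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask              # drop the byte offset of word-aligned PCs

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)   # saturate at strongly taken
        else:
            self.counters[i] = max(0, self.counters[i] - 1)   # saturate at strongly not taken

if __name__ == "__main__":
    bp = BimodalPredictor()
    outcomes = ([True] * 9 + [False]) * 10   # a loop branch: taken 9 times, then falls through
    correct = 0
    for taken in outcomes:
        correct += bp.predict(0x40) == taken
        bp.update(0x40, taken)
        # the predictor learns from the real outcome after each prediction
    print(f"{correct}/{len(outcomes)} correct")
```

Once a loop branch's counter has saturated, the predictor mispredicts only on the loop exit. The single not-taken outcome moves the counter from 3 to 2, so the next entry into the loop is still predicted taken. With only 4 index bits, branches whose PCs differ by a multiple of 64 bytes share a counter, which is one reason real tables are larger.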