Superscalar Processors
by Sherri Sparks

Overview
1. What are superscalar processors?
2. Program representation, dependencies, & parallel execution
3. Microarchitecture of a typical superscalar processor
4. A look at 3 superscalar implementations
5. Conclusion: the future of superscalar processing

What are superscalars and how do they differ from pipelines?
- In simple pipelining, you are limited to fetching a single instruction into the pipeline per clock cycle. This causes a performance bottleneck.
- Superscalar processors overcome the one-instruction-per-clock-cycle limit of simple pipelines and can fetch multiple instructions during the same clock cycle. They also employ advanced techniques such as "branch prediction" to ensure an uninterrupted stream of instructions.

Development & History of Superscalars
- Pipelining was developed in the late 1950s and became popular in the 1960s.
- Examples of early pipelined architectures are the CDC 6600 and the IBM 360/91 (Tomasulo's algorithm).
- Superscalars appeared in the mid-to-late 1980s.

Instruction Processing Model
- A superscalar needs to maintain software compatibility. The assembly instruction set was chosen as the level at which to maintain compatibility, because that choice does not affect existing software.
- It also needs to maintain at least a semblance of a "sequential execution model" for programmers, who rely on the concept of sequential execution in software design.
- A superscalar processor may execute instructions out of order at the hardware level, but execution must *appear* sequential at the programming level.

Superscalar Implementation
- Instruction fetch strategies that simultaneously fetch multiple instructions, often by using branch prediction techniques.
- Methods for determining data dependencies and keeping track of register values during execution.
- Methods for issuing multiple instructions in parallel.
- Resources for parallel execution of many instructions, including multiple pipelined functional units and memory hierarchies capable of simultaneously servicing multiple memory references.
- Methods for communicating data values through memory via load and store instructions.
- Methods for committing the process state in correct order. This is to maintain the outward appearance of sequential execution.

From Sequential to Parallel…
- Parallel execution often results in instructions completing non-sequentially.
- Speculative execution means that some instructions may be executed when they would not have been executed at all according to the sequential model (e.g., after an incorrect branch prediction).
- To maintain the outward appearance of sequential execution for the programmer, storage cannot be updated immediately. The results must be held in temporary status until storage can be updated. Meanwhile, these temporary results must be usable by dependent instructions. When it is determined that the sequential model would have executed an instruction, the temporary results are made permanent by updating the outward state of the machine. This process is called "committing" the instruction. (A rough sketch of in-order commit appears near the end, below.)

Dependencies
Parallel execution introduces 2 types of dependencies:
- Control dependencies, due to incrementing or updating the program counter in response to conditional branch instructions.
- Data dependencies, due to resource contention, as instructions may need to read / write the same storage or memory locations.

Overcoming Control Dependencies Example
Blocks are initiated into the "window of execution":

Block 1:
L2: move r3,r7
    lw   r8,(r3)
    add  r3,r3,4
    lw   r9,(r3)
    ble  r8,r9,L3

Block 2:
    move r3,r7
    sw   r9,(r3)
    add  r3,r3,4
    sw   r8,(r3)
    add  r5,r5,1

Block 3:
L3: add  r6,r6,1
    add  r7,r7,4
    blt  r6,r4,L2

Control Dependencies & Branch Prediction
- To gain the most parallelism, control dependencies due to conditional branches have to be overcome.
- Branch prediction attempts to overcome this by predicting the outcome of a branch and speculatively fetching and executing instructions from the predicted path. (A minimal sketch of one common prediction scheme follows this list.)
- If the predicted path is correct, the speculative status of the instructions is removed and they affect the state of the machine like any other instructions.
- If the predicted path is wrong, recovery actions are taken so as not to incorrectly modify the state of the machine.
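The slides do not name a specific prediction algorithm, so the following is only an illustrative sketch, in C, of one widely used scheme: a table of 2-bit saturating counters indexed by the branch's program counter. The table size, indexing, and function names are assumptions, not part of the slides.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative 2-bit saturating-counter branch predictor (an assumption,
 * not the slides' design). Each counter holds 0..3: values 0-1 predict
 * not-taken, values 2-3 predict taken. */
#define PRED_ENTRIES 1024

static uint8_t pred_table[PRED_ENTRIES];     /* starts at 0 = strongly not-taken */

static unsigned pred_index(uint32_t pc) {
    return (pc >> 2) & (PRED_ENTRIES - 1);   /* drop byte offset, mask to table size */
}

/* Called at fetch time to decide which path to fetch speculatively. */
bool predict_taken(uint32_t pc) {
    return pred_table[pred_index(pc)] >= 2;
}

/* Called when the branch resolves; a wrong prediction would also trigger
 * recovery (squashing the speculative instructions), which is not shown. */
void predictor_update(uint32_t pc, bool taken) {
    uint8_t *ctr = &pred_table[pred_index(pc)];
    if (taken && *ctr < 3) (*ctr)++;
    if (!taken && *ctr > 0) (*ctr)--;
}

int main(void) {
    uint32_t pc = 0x400100;                          /* hypothetical branch address */
    bool outcomes[] = { true, true, false, true };   /* made-up branch history */
    for (int i = 0; i < 4; i++) {
        printf("predicted %s, actual %s\n",
               predict_taken(pc) ? "taken" : "not taken",
               outcomes[i] ? "taken" : "not taken");
        predictor_update(pc, outcomes[i]);
    }
    return 0;
}

After a couple of taken outcomes the counter saturates high, so a single not-taken occurrence does not flip the prediction; that hysteresis is the reason for using two bits rather than one.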
Data Dependencies
- Data dependencies occur because instructions may access the same register or memory location.
- There are 3 types of data dependencies, or "hazards" (a small classification sketch in C appears further below):
  - RAW ("read after write"): occurs because a later instruction can only read a value after a previous instruction has written it.
  - WAR ("write after read"): occurs when an instruction needs to write a new value into a storage location but must wait until all preceding instructions needing to read the old value have done so.
  - WAW ("write after write"): occurs when multiple instructions update the same storage location; it must appear that these updates occur in the proper sequence.

Data Dependency Example
    move r3,r7
    lw   r8,(r3)
    add  r3,r3,4
    lw   r9,(r3)
    ble  r8,r9,L3

- RAW: lw r8,(r3) can read r3 only after move r3,r7 has written it.
- WAR: add r3,r3,4 must not overwrite r3 until lw r8,(r3) has read the old value.
- WAW: move r3,r7 and add r3,r3,4 both write r3; their updates must appear to occur in program order.

Parallel Execution Method
1. Instructions are fetched using branch prediction to form a dynamic stream of instructions.
2. Instructions are examined for dependencies, and dependencies are removed.
3. Examined instructions are dispatched to the "window of execution". (These instructions are no longer in sequential order, but are ordered according to their data dependencies.)
4. Instructions are issued from the window (a minimal issue sketch follows this list).
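To make step 4 slightly more concrete, here is a minimal sketch in C of issuing from a window of execution, under the assumption that each window entry simply records which earlier entries it depends on. The structure, field names, and one-cycle completion model are illustrative assumptions; real issue hardware (single queue, multiple queues, reservation stations) is covered by later slides.

#include <stdbool.h>
#include <stdio.h>

/* Toy "window of execution": each entry lists the window entries it
 * depends on and may issue once all of them have completed, regardless
 * of program order. Sizes and fields are illustrative assumptions. */
#define WINDOW_SIZE 8
#define MAX_DEPS    2

typedef struct {
    const char *text;
    int  deps[MAX_DEPS];   /* indices of entries this one waits on, -1 = none */
    bool completed;
} WindowEntry;

static bool ready(const WindowEntry *e, const WindowEntry *win) {
    for (int i = 0; i < MAX_DEPS; i++)
        if (e->deps[i] >= 0 && !win[e->deps[i]].completed)
            return false;
    return true;
}

/* One issue cycle: every entry whose dependencies were satisfied at the
 * start of the cycle issues; in this toy model it also completes, so its
 * dependents become ready on the next cycle. */
static int issue_cycle(WindowEntry *win, int count) {
    bool issue_now[WINDOW_SIZE] = { false };
    int issued = 0;
    for (int i = 0; i < count; i++)
        if (!win[i].completed && ready(&win[i], win))
            issue_now[i] = true;
    for (int i = 0; i < count; i++) {
        if (issue_now[i]) {
            printf("  issue: %s\n", win[i].text);
            win[i].completed = true;
            issued++;
        }
    }
    return issued;
}

int main(void) {
    /* The first four instructions of the example above, with only their
     * true (RAW) dependencies recorded. */
    WindowEntry win[] = {
        { "move r3,r7",  { -1, -1 }, false },
        { "lw r8,(r3)",  {  0, -1 }, false },   /* needs r3 from entry 0 */
        { "add r3,r3,4", {  0, -1 }, false },   /* needs r3 from entry 0 */
        { "lw r9,(r3)",  {  2, -1 }, false },   /* needs r3 from entry 2 */
    };
    for (int cycle = 1; ; cycle++) {
        printf("cycle %d:\n", cycle);
        if (issue_cycle(win, 4) == 0)
            break;
    }
    return 0;
}

In cycle 2, lw r8 and add r3 issue together because their only remaining dependency (the first move) has completed; the WAR hazard between them is assumed to have been removed already, which is what register renaming (covered by later slides) accomplishes.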

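Returning to the data dependency example above: the three hazard classes can be stated mechanically in terms of which registers each instruction reads and writes. The sketch below, in C, does exactly that for register operands only; the instruction encoding and names used here are illustrative assumptions, and memory locations are ignored.

#include <stdio.h>

/* Simplified instruction: one destination register and up to two source
 * registers (-1 means unused). Register numbers only; memory operands
 * are not modeled in this sketch. */
typedef struct {
    const char *text;
    int dest;
    int src1, src2;
} Insn;

static int reads(const Insn *i, int reg) {
    return i->src1 == reg || i->src2 == reg;
}

/* Print the hazards that a later instruction has with respect to an
 * earlier one, following the RAW / WAR / WAW definitions above. */
static void classify_hazards(const Insn *earlier, const Insn *later) {
    if (earlier->dest >= 0 && reads(later, earlier->dest))
        printf("RAW on r%d: \"%s\" -> \"%s\"\n", earlier->dest, earlier->text, later->text);
    if (later->dest >= 0 && reads(earlier, later->dest))
        printf("WAR on r%d: \"%s\" -> \"%s\"\n", later->dest, earlier->text, later->text);
    if (earlier->dest >= 0 && earlier->dest == later->dest)
        printf("WAW on r%d: \"%s\" -> \"%s\"\n", earlier->dest, earlier->text, later->text);
}

int main(void) {
    /* Three instructions from the data dependency example. */
    Insn move_r3 = { "move r3,r7",  3, 7, -1 };
    Insn lw_r8   = { "lw r8,(r3)",  8, 3, -1 };
    Insn add_r3  = { "add r3,r3,4", 3, 3, -1 };

    classify_hazards(&move_r3, &lw_r8);    /* RAW on r3         */
    classify_hazards(&lw_r8,   &add_r3);   /* WAR on r3         */
    classify_hazards(&move_r3, &add_r3);   /* RAW and WAW on r3 */
    return 0;
}

Only the RAW dependency is a true data dependency; WAR and WAW are name dependencies on the register, which register renaming (covered by later slides) can eliminate.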

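Finally, the "From Sequential to Parallel…" notes above describe holding results in temporary status and committing them once the sequential model would have executed the instruction. The later slide titles mention a reorder buffer; as a rough illustration only (the buffer layout, sizes, and field names are assumptions), in-order commit from a circular reorder buffer might look like this in C.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Rough sketch of in-order commit from a circular reorder buffer (ROB).
 * Results are held in the ROB entry until the entry reaches the head and
 * has finished executing; only then is the architectural register file
 * updated, so updates appear to happen in program order. */
#define ROB_SIZE 16
#define NUM_REGS 32

typedef struct {
    bool     valid;       /* entry is occupied */
    bool     finished;    /* execution has produced a (temporary) result */
    int      dest_reg;    /* architectural destination register */
    uint32_t result;      /* temporary result, not yet visible to the programmer */
} RobEntry;

typedef struct {
    RobEntry entries[ROB_SIZE];
    int      head;                   /* oldest instruction in program order */
    uint32_t arch_regs[NUM_REGS];    /* committed (architectural) register state */
} Rob;

/* Commit as many instructions as possible, oldest first. Commit stops at
 * the first unfinished entry, preserving the sequential appearance. */
static int rob_commit(Rob *rob) {
    int committed = 0;
    while (rob->entries[rob->head].valid && rob->entries[rob->head].finished) {
        RobEntry *e = &rob->entries[rob->head];
        rob->arch_regs[e->dest_reg] = e->result;   /* make the result permanent */
        printf("commit: r%d <- %u\n", e->dest_reg, e->result);
        e->valid = false;
        rob->head = (rob->head + 1) % ROB_SIZE;
        committed++;
    }
    return committed;
}

int main(void) {
    Rob rob = { 0 };
    /* Two instructions in program order; the younger one finishes first,
     * but neither commits until the older one has also finished. */
    rob.entries[0] = (RobEntry){ true, false, 3, 7  };   /* e.g. move r3,r7 */
    rob.entries[1] = (RobEntry){ true, true,  8, 42 };   /* e.g. lw r8,(r3) */

    printf("committed %d\n", rob_commit(&rob));   /* 0: oldest not finished yet */
    rob.entries[0].finished = true;
    printf("committed %d\n", rob_commit(&rob));   /* 2: both, in program order  */
    return 0;
}

Until commit, the values live only in the reorder buffer, so a mispredicted branch or an exception can discard them without the programmer ever observing an out-of-order update.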