ECEN 6253 Advanced Digital Computer Design, Superscalar Overview, January 13, 2006

Outline: Superscalar Overview; Instruction Fetch; Fetching Executable Instructions; Misalignment; Instruction Decode; Pre decoding; Dispatch; Execution; Multiple Execution Units; Table 1, Probability of No Bank Conflict for Two Load/Store Units; Execution Unit Mix; Writeback Busses; Complete and Retire; Exceptions and Interrupts.

Superscalar Overview

The TEM pipeline (fig. 4-10, p. 191) serves as a model for discussion of the organization of superscalar processors. Although the TEM pipeline looks like a linear pipeline, it is actually a dynamic pipeline, since the buffers between stages are complex multi-entry buffers. The Execute stage includes multiple diversified execution pipelines. Not all superscalar designs have precisely these stages, but they must implement the operations corresponding to these generalized stages.

Instruction Fetch

This stage must get s instructions out of the instruction cache (I-cache) during each clock cycle. These instructions are called the “fetch group.” The execution order of instructions usually corresponds to the consecutive order in which they are stored in memory. Recall that consecutive memory locations are combined into cache blocks that are stored in the cache as a unit. A cache block corresponds to a physical row of memory cells in the cache that can be read during one clock cycle (fig. 4-11a, p. 192). The cache block must therefore hold at least s instructions, the number of instructions in the fetch group.
The high degree of sequentiality of instruction addresses makes it feasible to store consecutive cache blocks with the same tag (fig. 4-11b). The consecutive cache blocks are called a “cache line.” It normally requires more than one clock cycle to read more than one block from the same cache line.

As long as the instructions to be fetched from cache (the fetch group) are contained in a single cache block, the Fetch stage can provide enough instructions to keep the rest of the processor busy. Unfortunately, the presence of jumps and taken branches in the fetch group causes two problems.

1. Instructions after the jump or taken branch should not be executed.
2. The jump or branch address can be into the middle of a cache block, that is, the desired fetch group is misaligned with the cache blocks that the cache can provide.

Fetching Executable Instructions. Let us assume that the Fetch stage fetches s instructions from consecutive memory locations. Let p be the probability that an instruction is a taken branch or an unconditional jump.

Let us further assume that the Fetch stage can predict whether the branches are taken (we will discuss how in more detail later). Untaken branches are not a problem since they cause fetching to continue from consecutive memory locations. Taken branches or unconditional jumps cause fetching to continue at some other memory location outside of the current fetch group.

If the first instruction in the fetch group is a taken branch or an unconditional jump, then the rest of the instructions in the fetch group should not be executed. This gives a probability of p that only one executable instruction (the branch or jump) is fetched.

$$P_1 = p$$

If the first instruction in the fetch group is not a taken branch and not an unconditional jump, and the second instruction is, then only the first two instructions in the fetch group should be executed.
This has a probability of $1-p$ for the first instruction and $p$ for the second instruction. Assuming these probabilities are independent gives the probability of fetching two executable instructions.

$$P_2 = (1-p)\,p$$

If the first two instructions in the fetch group are not taken branches and not unconditional jumps, and the third instruction is, then the probability of fetching three executable instructions is the following.

$$P_3 = (1-p)^2\,p$$

The pattern continues up to the case where the first taken branch or unconditional jump is the second-to-last of the s instructions in the fetch group.

$$P_{s-1} = (1-p)^{s-2}\,p$$

If none of the first $s-1$ instructions is a taken branch or an unconditional jump, all s instructions are executable, because the last instruction in the fetch group is executable whether or not it is itself a taken branch or unconditional jump.

$$P_s = (1-p)^{s-1}\,\bigl(p + (1-p)\bigr) = (1-p)^{s-1}$$

We can now determine the average number of fetched instructions that are executable,

$$s_E = 1 \cdot P_1 + 2 \cdot P_2 + \dots + (s-1) \cdot P_{s-1} + s \cdot P_s$$

$$s_E = p + 2(1-p)\,p + \dots + (s-1)(1-p)^{s-2}\,p + s\,(1-p)^{s-1}$$

which can be simplified using the geometric series formulas to

$$s_E = \frac{1 - (1-p)^s}{p}$$

[Figure: $s_E$ plotted against $s$, with the asymptotes $s_E = s$ and $s_E = 1/p$.]

When the fetch group is small (s well below 1/p), almost all of the instructions are executable ($s_E \approx s$). For large fetch groups ($s > 1/p$), the average number of executable instructions can never be any larger than 1/p.

$$s_E < 1/p$$

General purpose instruction streams typically have at least 10% taken branches and unconditional jumps (p = 0.1). The average number of executable instructions in a fetch group from a single cache line therefore cannot be larger than about 10, even if we fetch an infinite number of instructions at once.

The limit on the number of executable instructions in a fetch group would seem to be a serious limitation for large superscalar processors. More research is needed since superscalar processors are reaching the size where this limit is important. One solution might be to use a multi-port cache capable of reading several different cache lines at once.
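As a numerical check on the fetch-group analysis above, the following minimal Python sketch evaluates the average number of executable instructions two ways: by summing k times the probability of exactly k executable instructions, and by the closed form (1 - (1-p)^s)/p. The function names `p_executable`, `s_e_sum`, and `s_e_closed` are illustrative choices and do not come from the notes; the sketch relies on the same independence assumption as the text.

```python
def p_executable(k: int, s: int, p: float) -> float:
    """Probability that exactly k of the s fetched instructions are executable.

    Assumes, as in the text, that each instruction is independently a taken
    branch or unconditional jump with probability p.
    """
    if not 1 <= k <= s:
        raise ValueError("k must be in 1..s")
    if k < s:
        # First k-1 instructions fall through, the k-th redirects the fetch.
        return (1 - p) ** (k - 1) * p
    # All s are executable when the first s-1 fall through; the last one
    # counts regardless of whether it is a taken branch or jump.
    return (1 - p) ** (s - 1)


def s_e_sum(s: int, p: float) -> float:
    """Average executable instructions, computed term by term."""
    return sum(k * p_executable(k, s, p) for k in range(1, s + 1))


def s_e_closed(s: int, p: float) -> float:
    """Average executable instructions from the closed form."""
    return (1 - (1 - p) ** s) / p


if __name__ == "__main__":
    p = 0.1  # the 10% figure quoted in the text
    for s in (1, 2, 4, 8, 16, 32, 64):
        print(f"s={s:3d}  sum={s_e_sum(s, p):8.4f}  "
              f"closed={s_e_closed(s, p):8.4f}  bound 1/p={1 / p:.1f}")
```

Running the script prints both estimates side by side for several fetch-group sizes; they should agree to within rounding, and for p = 0.1 they level off near the 1/p = 10 bound as s grows.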
An interesting alternative is a trace cache. A trace cache stores instructions in the order that instructions are executed (an instruction trace) instead of storing them in program order as in a normal cache. The first time an instruction stream is executed, the fetch group limitation would slow the processor down. When the instruction stream is repeated, a trace cache hit can provide many more executable instructions.

Misalignment.


O-K-State ECEN 6253 - Lecture Notes
