UMD CMSC 411 - Lecture 7 Instruction Level Parallelism 1
CMSC 411 - Computer Systems Architecture
Lecture 7: Instruction Level Parallelism 1 (Compiler Techniques)
(Slides adapted from Patterson, CS252 S05)

Outline
• ILP
• Compiler techniques to increase ILP
• Loop Unrolling
• Static Branch Prediction
• Dynamic Branch Prediction
• Overcoming Data Hazards with Dynamic Scheduling
• Tomasulo Algorithm
• Conclusion

Recall from Pipelining
• Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls
– Ideal pipeline CPI: measure of the maximum performance attainable by the implementation
– Structural hazards: HW cannot support this combination of instructions
– Data hazards: instruction depends on the result of a prior instruction still in the pipeline
– Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

Instruction-Level Parallelism
• Instruction-Level Parallelism (ILP): overlap the execution of instructions to improve performance
• 2 approaches to exploit ILP:
1. Rely on hardware to help discover and exploit the parallelism dynamically
– Pentium 4, AMD Opteron, IBM Power
2. Rely on software technology to find parallelism, statically at compile time
– Itanium 2 / IA-64

Instruction-Level Parallelism (ILP)
• Basic Block (BB) ILP is quite small
– BB: a straight-line code sequence with no branches in except to the entry and no branches out except at the exit
– Average dynamic branch frequency of 15% to 25% => 4 to 7 instructions execute between a pair of branches
– Plus, instructions in a BB are likely to depend on each other
• Need ILP across multiple basic blocks

Loop-Level Parallelism
• Simplest form: loop-level parallelism exploits parallelism among iterations of a loop.
– Example:
    for (i = 1; i <= 1000; i = i + 1)
        x[i] = x[i] + y[i];
• Exploit loop-level parallelism by "unrolling" the loop, either
– dynamically, via branch prediction, or
– statically, via loop unrolling by the compiler
(Another way is vectors, to be covered later)

Loop-Level Parallelism (continued)
• Determining dependences is critical
• If 2 instructions are
– parallel, they can execute simultaneously in a pipeline of arbitrary depth without causing any stalls (assuming no structural hazards)
– dependent, they are not parallel and must be executed in order, although they may often be partially overlapped

Data Dependence and Hazards
• InstrJ is data dependent (aka true dependent) on InstrI if:
1. InstrJ tries to read an operand before InstrI writes it
    I: add r1,r2,r3
    J: sub r4,r1,r3
2. or InstrJ is data dependent on InstrK, which is dependent on InstrI
• If two instructions are data dependent, they cannot execute simultaneously or be completely overlapped
• Data dependence in an instruction sequence reflects data dependence in the source code; the effect of the original data dependence must be preserved
• If a data dependence causes a hazard in the pipeline, that is a Read After Write (RAW) hazard

ILP and Data Dependencies, Hazards
• HW/SW must preserve the illusion of program order: the order instructions would execute in if executed sequentially as determined by the original source program
– dependences are a property of programs
• Presence of a dependence indicates the potential for a hazard, but
– the actual hazard and the length of any stall are properties of the pipeline
• Importance of the data dependencies:
1) indicates the possibility of a hazard
2) determines the order in which results must be calculated
3) sets an upper bound on how much parallelism can possibly be exploited
• HW/SW goal: exploit parallelism by preserving program order only where it affects the outcome of the program

Name Dependence #1: Anti-dependence
• Name dependence: 2 instructions use the same register or memory location, called a name, but there is no flow of data between the instructions associated with that name; there are 2 versions of name dependence
• InstrJ writes an operand before InstrI reads it:
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
Called an "anti-dependence" by compiler writers. This results from reuse of the name "r1".
• If an anti-dependence causes a hazard in the pipeline, that is a Write After Read (WAR) hazard

Name Dependence #2: Output dependence
• InstrJ writes an operand before InstrI writes it:
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
Called an "output dependence" by compiler writers. This also results from reuse of the name "r1".
• If an output dependence causes a hazard in the pipeline, that is a Write After Write (WAW) hazard
• Instructions involved in a name dependence can execute simultaneously if the name used in the instructions is changed so the instructions do not conflict
– Register renaming resolves name dependences for registers
– Either by the compiler or by HW

Control Dependencies
• Every instruction is control dependent on some set of branches, and, in general, these control dependencies must be preserved to preserve program order
    if p1 {
        S1;
    }
    if p2 {
        S2;
    }
• S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1

Control Dependence Ignored
• Control dependence need not always be preserved
– willing to execute instructions that should not have been executed, thereby violating the control dependences, if we can do so without affecting the correctness of the program
• Instead, 2 properties critical to program correctness are
– exception behavior, and
– data flow

Exception Behavior
• Preserving exception behavior: any changes in instruction execution order must not change how exceptions are raised in the program (no new exceptions)
• Example (assume branches are not delayed):
    DADDU R2,R3,R4
    BEQZ  R2,L1
    LW    R1,0(R2)
    L1:
• Problem with moving LW before BEQZ?

Data Flow
• Data flow: the actual flow of data values among instructions that produce results and those that consume them
– branches make the flow dynamic; they determine which instruction is the supplier of data
• Example:
    DADDU R1,R2,R3
    BEQZ  R4,L
    DSUBU R1,R5,R6
    L: …
    OR    R7,R1,R8
• Does OR depend on DADDU or DSUBU? Must preserve data flow on execution
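The loop from the loop-level-parallelism slide is the classic unrolling candidate: each iteration touches only its own x[i] and y[i], so iterations are independent. A minimal C sketch of what the compiler's static unrolling (by a factor of 4) looks like at the source level; function names here are illustrative, not from the slides:

```c
#include <stddef.h>

/* Straightforward loop: one branch per element. */
void add_arrays(double *x, const double *y, size_t n) {
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] + y[i];
}

/* The same loop unrolled by 4, roughly as a compiler would emit it:
 * one branch per 4 elements, and the four adds in the body are
 * independent, so they can be overlapped in the pipeline. */
void add_arrays_unrolled(double *x, const double *y, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        x[i]     = x[i]     + y[i];
        x[i + 1] = x[i + 1] + y[i + 1];
        x[i + 2] = x[i + 2] + y[i + 2];
        x[i + 3] = x[i + 3] + y[i + 3];
    }
    for (; i < n; i++)          /* clean-up loop for the leftover elements */
        x[i] = x[i] + y[i];
}
```

The clean-up loop is the price of unrolling when the trip count is not a multiple of the unroll factor; the two functions compute identical results.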

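The register renaming mentioned on the name-dependence slides can be mimicked at the source level: the WAR hazard on r1 disappears once the second writer gets a fresh name. A hedged C sketch of the slides' anti-dependence example, with local variables standing in for registers (the function names and calling convention are illustrative assumptions):

```c
/* The slides' WAR example, with C variables standing in for registers:
 *   I: sub r4,r1,r3    -- I reads r1
 *   J: add r1,r2,r3    -- J overwrites r1: WAR (anti-dependence) with I
 *   K: mul r6,r1,r7    -- K truly (RAW) depends on J
 */
void sequence_with_war(int r1, int r2, int r3, int r7, int *r4, int *r6) {
    *r4 = r1 - r3;      /* I must read r1 before J overwrites it */
    r1  = r2 + r3;      /* J reuses the name r1                  */
    *r6 = r1 * r7;      /* K consumes J's result                 */
}

/* After renaming: J writes a fresh name r1b, so I and J no longer share
 * a register and can be scheduled in either order (here J runs first). */
void sequence_renamed(int r1, int r2, int r3, int r7, int *r4, int *r6) {
    int r1b = r2 + r3;  /* J, now safely hoisted above I */
    *r4 = r1 - r3;      /* I still reads the original r1 */
    *r6 = r1b * r7;     /* K reads the renamed register  */
}
```

Both versions produce identical r4 and r6 values; only the true (RAW) dependence from J to K constrains the order, which is exactly why hardware renaming (e.g., in Tomasulo's algorithm, covered later) is legal.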

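The RAW/WAR/WAW taxonomy from the slides can be written down as a small decision procedure over each instruction's destination and source registers. A sketch under the simplifying assumption that each instruction writes exactly one register; the function and parameter names are hypothetical, not from the lecture:

```c
/* Hazard that a later instruction J forms with an earlier instruction I,
 * given each instruction's destination register and source-register list. */
const char *classify_hazard(int i_dest, const int *i_src, int i_nsrc,
                            int j_dest, const int *j_src, int j_nsrc) {
    for (int k = 0; k < j_nsrc; k++)
        if (j_src[k] == i_dest)
            return "RAW";   /* J reads what I writes: true dependence  */
    for (int k = 0; k < i_nsrc; k++)
        if (i_src[k] == j_dest)
            return "WAR";   /* J writes what I reads: anti-dependence  */
    if (j_dest == i_dest)
        return "WAW";       /* both write the same name: output dep.   */
    return "none";
}
```

Run against the three register examples on the slides, this returns "RAW" for the data-dependence pair, "WAR" for the anti-dependence triple, and "WAW" for the output-dependence triple.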