6.035, Fall 2005
Lectures 14 & 15: Instruction Scheduling
Saman Amarasinghe, MIT 6.035 (©MIT Fall 1998)

Simple Machine Model
• Instructions are executed in sequence
  – Fetch, decode, execute, store results
  – One instruction at a time
• For branch instructions, start fetching from a different location if needed
  – Check the branch condition
  – The next instruction may come from a new location given by the branch instruction

Simple Execution Model
• 5-stage pipeline: ifetch → decode → execute → memory → writeback
• Fetch: get the next instruction
• Decode: figure out what that instruction is
• Execute: perform the ALU operation (the address calculation, in a memory op)
• Memory: do the memory access, in a memory op
• Write Back: write the results back

Simple Execution Model
[Pipeline diagram: unpipelined, Inst 1 completes all of IF, DE, EXE, MEM, WB before Inst 2 begins; pipelined, Inst 1–5 overlap in time, each instruction entering IF one cycle after the previous one.]

From a Simple Machine Model to a Real Machine Model
• Many pipeline stages
  – Pentium: 5
  – Pentium Pro: 10
  – Pentium IV (130nm): 20
  – Pentium IV (90nm): 31
• Different instructions take different amounts of time to execute
• Hardware stalls the pipeline if an instruction uses a result that is not ready
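The payoff shown in the pipeline diagram can be checked with a small cycle-count sketch. This is a minimal illustration, assuming the idealized 5-stage model from the slides (one instruction enters the pipe per cycle, no stalls); the function names are mine:

```python
# Idealized cycle counts for the 5-stage pipeline (IF, DE, EXE, MEM, WB).
STAGES = ["IF", "DE", "EXE", "MEM", "WB"]

def unpipelined_cycles(n_instructions: int) -> int:
    # Each instruction occupies all 5 stages before the next one starts.
    return n_instructions * len(STAGES)

def pipelined_cycles(n_instructions: int) -> int:
    # A new instruction enters IF every cycle; the last one needs 4 more
    # cycles to drain through the remaining stages.
    return len(STAGES) + (n_instructions - 1)

print(unpipelined_cycles(5), pipelined_cycles(5))  # 25 9
```

For the 5 instructions in the diagram, pipelining cuts 25 cycles down to 9, which is why keeping the pipeline full matters so much to the scheduler.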
Real Machine Model cont.
• Most modern processors have multiple execution units (superscalar)
  – If the instruction sequence is right, multiple operations happen in the same cycle
  – Even more important to have the right instruction sequence

Constraints On Scheduling
• Data dependencies
• Control dependencies
• Resource constraints

Data Dependency between Instructions
• If two instructions access the same variable, they can be dependent
• Kinds of dependencies
  – True: write → read
  – Anti: read → write
  – Output: write → write
• What to do if two instructions are dependent?
  – The order of execution cannot be reversed
  – This reduces the possibilities for scheduling

Computing Dependencies
• For basic blocks, compute dependencies by walking through the instructions
• Identifying register dependencies is simple
  – is it the same register?
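The register check above ("is it the same register?") can be made concrete. A minimal sketch, assuming an instruction is represented as a pair of (registers written, registers read) sets — this encoding is mine, not the lecture's:

```python
# Classify register dependencies between an instruction i1 and a later
# instruction i2, following the slides' taxonomy (true / anti / output).
# An instruction here is a pair: (set of registers written, set of registers read).

def classify_dependencies(i1, i2):
    """Return the kinds of dependence from i1 to a later instruction i2."""
    writes1, reads1 = i1
    writes2, reads2 = i2
    kinds = set()
    if writes1 & reads2:
        kinds.add("true")    # write -> read
    if reads1 & writes2:
        kinds.add("anti")    # read -> write
    if writes1 & writes2:
        kinds.add("output")  # write -> write
    return kinds

# 1: r2 = *(r1 + 4)   writes {r2}, reads {r1}
# 3: r4 = r2 + r3     writes {r4}, reads {r2, r3}
print(classify_dependencies(({"r2"}, {"r1"}), ({"r4"}, {"r2", "r3"})))  # {'true'}
```

Any non-empty result means the scheduler must not reverse the order of the two instructions.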
• For memory accesses
  – simple: base + offset1 ?= base + offset2
  – data dependence analysis: a[2i] ?= a[2i+1]
  – interprocedural analysis: global ?= parameter
  – pointer alias analysis: p1 ?= p2

Representing Dependencies
• Using a dependence DAG, one per basic block
• Nodes are instructions, edges represent dependencies

  1: r2 = *(r1 + 4)
  2: r3 = *(r1 + 8)
  3: r4 = r2 + r3
  4: r5 = r2 - 1

  [Dependence DAG: edges 1→3, 2→3, and 1→4, each labeled with latency 2.]

• An edge is labeled with the latency:
  – v(i → j) = delay required between the initiation times of i and j, minus the execution time required by i

Example
  1: r2 = *(r1 + 4)
  2: r3 = *(r2 + 4)
  3: r4 = r2 + r3
  4: r5 = r2 - 1

  [Dependence DAG: edges 1→2, 1→3, 2→3, and 1→4, each labeled with latency 2.]

Another Example
  1: r2 = *(r1 + 4)
  2: *(r1 + 4) = r3
  3: r3 = r2 + r3
  4: r5 = r2 - 1

  [Dependence DAG: true edges 1→3 and 1→4 on r2; anti edges 1→2 on the memory location and 2→3 on r3.]

Control Dependencies and Resource Constraints
• For now, let's only worry about basic blocks
• For now, let's look at simple pipelines

Example
  1: lea var_a, %rax     1 cycle
  2: add $4, %rax        1 cycle
  3: inc %r11            1 cycle
  4: mov 4(%rsp), %r10   3 cycles
  5: add %r10, 8(%rsp)   4 cycles
  6: and 16(%rsp), %rbx  4 cycles
  7: imul %rax, %rbx     3 cycles

  Results in: 1 2 3 4 st st 5 6 st st st 7

List Scheduling Algorithm
• Idea
  – Do a topological sort of the dependence DAG
  – Consider when an instruction can be scheduled without causing a stall
  – Schedule the instruction if it causes no stall and all its predecessors are already scheduled
• Optimal list scheduling is NP-complete
  – Use heuristics when necessary

List Scheduling Algorithm
• Create a dependence DAG of a basic block
• Topological Sort
  – READY = nodes with no predecessors
  – Loop until READY is empty
    • Schedule each node in READY when it causes no stalling
    • Update READY

Heuristics for selection
• Heuristics for selecting from the READY list
  – pick the node with the longest path to a leaf in the dependence graph
  – pick a node with the most immediate successors
  – pick a node that can go to a less busy pipeline (in a superscalar)

Heuristics for selection
• pick the node with the longest path to a leaf in the dependence graph
• Algorithm (for node x)
  – if x has no successors, dx = 0
  – otherwise dx = MAX(dy + cxy) over all successors y of x
  – compute in reverse breadth-first visitation order

Heuristics for selection
• pick a node with the most immediate successors
• Algorithm (for node x)
  – fx = number of successors of x

Example
  1: lea var_a, %rax     1 cycle
  2: add $4, %rax        1 cycle
  3: inc %r11            1 cycle
  4: mov 4(%rsp), %r10   3 cycles
  5: add %r10, 8(%rsp)   4 cycles
  6: and 16(%rsp), %rbx  4 cycles
  7: imul %rax, %rbx     3 cycles
  8: mov %rbx, 16(%rsp)  3 cycles
  9: lea var_b, %rax     1 cycle

Example
  [Dependence DAG annotated with d (longest path to a leaf, e.g. d=7 for node 6, d=5 for node 1, d=4 for node 2, d=3 for nodes 4 and 7, d=0 for the leaves) and f (number of immediate successors); tracing the READY list yields the schedule 6 1 2 4 7 3 5 8 9.]

Example
  Original order:  1 2 3 4 st st 5 6 st st st 7 8 9
  Scheduled order: 6 1 2 4 7 3 5 8 9
  Results in: 14 cycles vs 9 cycles

Resource Constraints
• Modern machines have many resource constraints
• Superscalar architectures:
  – can run a few parallel operations
  – But
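The list-scheduling algorithm and the longest-path-to-a-leaf heuristic above can be sketched end to end. This is an illustrative implementation only: the edge set and latencies below are my reading of the slides' 9-instruction example (true dependencies only), not a model of any real machine.

```python
# List scheduling on the slides' 9-instruction example.
# LATENCY[i]: cycles before i's result is usable; EDGES (i, j): j needs i's result.
LATENCY = {1: 1, 2: 1, 3: 1, 4: 3, 5: 4, 6: 4, 7: 3, 8: 3, 9: 1}
EDGES = {(1, 2), (2, 7), (4, 5), (6, 7), (7, 8)}
NODES = sorted(LATENCY)

def longest_path_to_leaf(nodes, edges, latency):
    """d[x] = 0 for leaves, else max(d[y] + c(x, y)) over successors y of x."""
    succ = {n: [j for i, j in edges if i == n] for n in nodes}
    d = {}
    def visit(x):
        if x not in d:
            d[x] = max((visit(y) + latency[x] for y in succ[x]), default=0)
        return d[x]
    for n in nodes:
        visit(n)
    return d

def list_schedule(nodes, edges, latency):
    """Single-issue schedule: among READY nodes, prefer no-stall, then large d."""
    preds = {n: {i for i, j in edges if j == n} for n in nodes}
    d = longest_path_to_leaf(nodes, edges, latency)
    done, order, avail = set(), [], {}   # avail[i]: cycle i's result is usable
    cycle = 1
    while len(order) < len(nodes):
        ready = [n for n in nodes if n not in done and preds[n] <= done]
        no_stall = [n for n in ready if all(avail[p] <= cycle for p in preds[n])]
        pool = no_stall or ready             # stall only when everything stalls
        pick = max(pool, key=lambda n: (d[n], -n))   # tie-break: lowest number
        cycle = max([cycle] + [avail[p] for p in preds[pick]])
        avail[pick] = cycle + latency[pick]
        done.add(pick)
        order.append(pick)
        cycle += 1
    return order, cycle - 1

order, cycles = list_schedule(NODES, EDGES, LATENCY)
print(order, cycles)   # [6, 1, 2, 4, 7, 3, 5, 8, 9] 9
```

Under these assumed latencies the heuristic reproduces the slides' schedule 6 1 2 4 7 3 5 8 9 with no stalls, and the d values (e.g. d=7 for node 6, d=5 for node 1) match the annotated DAG.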