15-411 Lecture © Seth Copen Goldstein 2000-1 1 15-745 © Seth Copen Goldstein 2000-5 1 15-745 Software Pipelining Copyright © Seth Copen Goldstein 2000-8 (some slides borrowed from T Callahan & M. Voss) 15-745 © Seth Copen Goldstein 2000-5 2 Software Pipelining • Software pipelining is an IS technique that reorders the instructions in a loop. – Possibly moving instructions from one iteration to the previous or the next iteration. – Very large improvements in running time are possible. • The first serious approach to software pipelining was presented by Aiken & Nicolau. – Aiken’s 1988 Ph.D. thesis. – Impractical as it ignores resource hazards (focusing only on data-dependence constraints). • But sparked a large amount of follow-on research. 15-745 © Seth Copen Goldstein 2000-5 3 Goal of SP • Increase distance between dependent operations by moving destination operation to a later iteration A: a ← ld [d] B: b ← a * a C: st [d], b D: d ← d + 4 Assume all have latency of 2 A B C D 15-745 © Seth Copen Goldstein 2000-5 4 Can we decrease the latency? • Lets unroll A: a ← ld [d] B: b ← a * a C: st [d], b D: d ← d + 4 A1: a ← ld [d] B1: b ← a * a C1: st [d], b D1: d ← d + 4 A B C D A1 B1 C1 D115-411 Lecture © Seth Copen Goldstein 2000-1 2 15-745 © Seth Copen Goldstein 2000-5 5 Rename variables A: a ← ld [d] B: b ← a * a C: st [d], b D: d1 ← d + 4 A1: a1 ← ld [d1] B1: b1 ← a1 * a1 C1: st [d1], b1 D1: d ← d1 + 4 A B C D A1 B1 C1 D1 15-745 © Seth Copen Goldstein 2000-5 6 Schedule A: a ← ld [d] B: b ← a * a C: st [d], b D: d1 ← d + 4 A1: a1 ← ld [d1] B1: b1 ← a1 * a1 C1: st [d1], b1 D1: d ← d1 + 4 A B C D A1 B1 C1 D1 A B C D1 D A1 B1 C1 15-745 © Seth Copen Goldstein 2000-5 7 Unroll Some More A: a ← ld [d] B: b ← a * a C: st [d], b D: d1 ← d + 4 A1: a1 ← ld [d1] B1: b1 ← a1 * a1 C1: st [d1], b1 D1: d2 ← d1 + 4 A2: a2 ← ld [d2] B2: b2 ← a2 * a2 C2: st [d2], b2 D2: d ← d2 + 4 A B C D A1 B1 C1 D1 A2 B2 C2 D2 A B C D2 D A1 B1 C1 D1 A2 B2 C2 15-745 © Seth Copen Goldstein 2000-5 8 Unroll Some More A: a ← ld [d] B: b ← a * a C: st [d], b D: d1 ← d + 4 A1: a1 ← ld [d1] B1: b1 ← a1 * a1 C1: st [d1], b1 D1: d2 ← d1 + 4 A2: a2 ← ld [d2] B2: b2 ← a2 * a2 C2: st [d2], b2 D2: d ← d2 + 4 A B C D A1 B1 C1 D1 A2 B2 C2 D3 A B C D3 D A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D2 A3 B3 C315-411 Lecture © Seth Copen Goldstein 2000-1 3 15-745 © Seth Copen Goldstein 2000-5 9 One More Time A B C D A1 B1 C1 D1 A2 B2 C2 D3 A B C D4 D A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D2 A3 B3 C3 A4 B4 C4 15-745 © Seth Copen Goldstein 2000-5 10 Can Rearrange A B C D A1 B1 C1 D1 A2 B2 C2 D3 A B C D4 D A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D2 A3 B3 C3 A4 B4 C4 15-745 © Seth Copen Goldstein 2000-5 11 Rearrange A B C D A1 B1 C1 D1 A2 B2 C2 D3 A B C D3 D A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D2 A3 B3 C3 A: a ← ld [d] B: b ← a * a C: st [d], b D: d1 ← d + 4 A1: a1 ← ld [d1] B1: b1 ← a1 * a1 C1: st [d1], b1 D1: d2 ← d1 + 4 A2: a2 ← ld [d2] B2: b2 ← a2 * a2 C2: st [d2], b2 D2: d ← d2 + 4 15-745 © Seth Copen Goldstein 2000-5 12 Rearrange A B C D A1 B1 C1 D1 A2 B2 C2 D3 A B C D3 D A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D2 A3 B3 C3 A: a ← ld [d] B: b ← a * a C: st [d], b D: d1 ← d + 4 A1: a1 ← ld [d1] B1: b1 ← a1 * a1 C1: st [d1], b1 D1: d2 ← d1 + 4 A2: a2 ← ld [d2] B2: b2 ← a2 * a2 C2: st [d2], b2 D2: d ← d2 + 415-411 Lecture © Seth Copen Goldstein 2000-1 4 15-745 © Seth Copen Goldstein 2000-5 13 SP Loop A: a ← ld [d] B: b ← a * a D: d1 ← d + 4 A1: a1 ← ld [d1] D1: d2 ← d1 + 4 C: st [d], b B1: b1 ← a1 * a1 A2: a2 ← ld [d2] D2: d ← d2 + 4 B2: b2 ← a2 * a2 C1: st [d1], b1 D3: d2 ← d1 + 4 C2: st [d2], b2 A B C C C D3 D A1 B1 B1 B1 C1 D1 A2 A2 A2 B2 C2 D2 D2 D2 Prolog Body Epilog 15-745 © Seth Copen Goldstein 2000-5 14 Goal of SP • Increase distance between dependent operations by moving destination operation to a later iteration A B C dependencies in initial loop A B C iteration i i+1 i+2 after SP 15-745 © Seth Copen Goldstein 2000-5 15 Goal of SP • Increase distance between dependent operations by moving destination operation to a later iteration • But also, to uncover ILP across iteration boundaries! 15-745 © Seth Copen Goldstein 2000-5 16 Example Assume operating on a infinite wide machine A0 A1 B0 A2 B1 C0 A3 B2 C1 B3 C2 C3 A0 A1 B0 Ai Bi-1 Ci-2 Bi Ci-1 Ci15-411 Lecture © Seth Copen Goldstein 2000-1 5 15-745 © Seth Copen Goldstein 2000-5 17 Example Assume operating on a infinite wide machine A0 A1 B0 Ai Bi-1 Ci-2 Bi Ci-1 Ci Prolog epilog loop body 15-745 © Seth Copen Goldstein 2000-5 18 for (i=0; i<N; i++) { Ai Bi Ci } Dealing with exit conditions i=0 if (i >= N) goto done A0 B0 if (i+1 == N) goto last i=1 A1 if (i+2 == N) goto epilog i=2 loop: Ai Bi-1 Ci-2 i++ if (i < N) goto loop epilog: Bi Ci-1 last: ci done: 15-745 © Seth …
View Full Document