MIT 6.035 - Loop Optimizations

This preview shows page 1-2-3-4-30-31-32-33-34-62-63-64-65 out of 65 pages.



Spring 2010
Loop Optimizations
Instruction Scheduling
Saman Amarasinghe, 6.035, MIT Fall 1998

Outline
- Scheduling for loops
- Loop unrolling
- Software pipelining
- Interaction with register allocation
- Hardware vs. compiler
- Induction variable recognition
- Loop invariant code motion

Scheduling Loops
- Loop bodies are small.
- But a lot of time is spent in loops due to the large number of iterations.
- Need better ways to schedule loops.

Loop Example
Machine:
- One load/store unit
  - load: 2 cycles
  - store: 2 cycles
- Two arithmetic units
  - add: 2 cycles
  - branch: 2 cycles
  - multiply: 3 cycles
- Both units are pipelined (initiate one op each cycle).

Source code:
    for i = 1 to N
        A[i] = A[i] * b

Assembly code (the slide labels the address operands of the first mov as base, %rdi, and offset, %rax):
    loop:
        mov  (%rdi,%rax), %r10
        imul %r11, %r10
        mov  %r10, (%rdi,%rax)
        sub  $4, %rax
        jz   loop

Dependence graph (d values; edge latencies 2, 3, 0, 2 from top to bottom):
    mov  d=7
    imul d=5
    mov  d=2
    sub  d=2
    jz   d=0

Schedule: 9 cycles per iteration
[Schedule diagram: slot assignments for mov, imul, sub and the branch; layout not recoverable from the extraction]

Loop Unrolling
Unroll the loop body a few times.
- Pros:
  - Create a much larger basic block for the body.
  - Eliminate a few loop bounds checks.
- Cons:
  - Much larger program.
  - Setup code (# of iterations < unroll factor).
  - Beginning and end of the schedule can still have unused slots.

Loop Example (body unrolled twice)
    loop:
        mov  (%rdi,%rax), %r10
        imul %r11, %r10
        mov  %r10, (%rdi,%rax)
        sub  $4, %rax
        mov  (%rdi,%rax), %r10
        imul %r11, %r10
        mov  %r10, (%rdi,%rax)
        sub  $4, %rax
        jz   loop

Dependence graph (d values; edge latencies 2, 3, 0, 2, 2, 3, 0, 2 from top to bottom):
    mov d=14
    mul d=12
    mov d=9
    sub d=9
    mov d=7
    mul d=5
    mov d=2
    sub d=2
    jz  d=0

Schedule: 8 cycles per iteration
[Schedule diagram: layout not recoverable from the extraction]

Loop Unrolling
- Rename registers
  - Use different registers in different iterations.

Loop Example (second copy of the body renamed from %r10 to %rcx)
    loop:
        mov  (%rdi,%rax), %r10
        imul %r11, %r10
        mov  %r10, (%rdi,%rax)
        sub  $4, %rax
        mov  (%rdi,%rax), %rcx
        imul %r11, %rcx
        mov  %rcx, (%rdi,%rax)
        sub  $4, %rax
        jz   loop

The d values are unchanged (14, 12, 9, 9, 7, 5, 2, 2, 0): both copies still go through %rax.

Loop Unrolling
- Rename registers
  - Use different registers in different iterations.
- Eliminate unnecessary dependencies
  - Again, use more registers to eliminate true, anti and output dependencies.
  - Eliminate dependent chains of calculations when possible.

Loop Example (separate offset registers %rax and %rbx, each stepped by 8)
    loop:
        mov  (%rdi,%rax), %r10
        imul %r11, %r10
        mov  %r10, (%rdi,%rax)
        sub  $8, %rax
        mov  (%rdi,%rbx), %rcx
        imul %r11, %rcx
        mov  %rcx, (%rdi,%rbx)
        sub  $8, %rbx
        jz   loop

Dependence graph (d values):
    mov d=5
    mul d=3
    mov d=0
    sub d=0
    mov d=7
    mul d=5
    mov d=2
    sub d=2
    jz  d=0

Schedule: 4.5 cycles per iteration
[Schedule diagram: layout not recoverable from the extraction]

Software Pipelining
- Try to overlap multiple iterations so that the slots will be filled.
- Find the steady-state window so that all the instructions of the loop body are executed, but from different iterations.

Loop Example
Assembly code:
    loop:
        mov  (%rdi,%rax), %r10
        imul %r11, %r10
        mov  %r10, (%rdi,%rax)
        sub  $4, %rax
        jz   loop

Schedule:
[Schedule diagram: the loop body's instructions overlapped across iterations, labeled by iteration number (mov, mov1, mov2, mov3, mov4, mul, mul1, mul2, mul3, sub, jz, jz1); layout not recoverable from the extraction]
…
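The d values shown in the dependence graphs above can be reproduced as the longest latency-weighted path from each instruction to the end of the loop body. The following is a minimal sketch of that calculation, not the course's own code. The function name `priorities`, the node names (`load`, `store`, `load1`, and so on), and the assignment of the latencies 2, 3, 0, 2 to consecutive edges of the chain are assumptions read off the flattened slide text.

```python
def priorities(nodes, edges):
    """Return d[n]: the longest latency-weighted path from n to any sink.

    nodes: list of instruction names (hypothetical labels, not from the slides)
    edges: list of (src, dst, latency) dependence edges
    """
    succs = {n: [] for n in nodes}
    for src, dst, lat in edges:
        succs[src].append((dst, lat))

    d = {}

    def visit(n):
        # Memoized depth-first walk; the dependence graph is acyclic.
        if n not in d:
            d[n] = max((lat + visit(m) for m, lat in succs[n]), default=0)
        return d[n]

    for n in nodes:
        visit(n)
    return d


# Single loop body: load -> imul -> store -> sub -> jz,
# with edge latencies 2, 3, 0, 2 (assumed order, as listed on the slide).
single_nodes = ["load", "imul", "store", "sub", "jz"]
single_edges = [
    ("load", "imul", 2),
    ("imul", "store", 3),
    ("store", "sub", 0),
    ("sub", "jz", 2),
]
single_d = priorities(single_nodes, single_edges)

# Body unrolled twice without splitting the offset register:
# one long chain with latencies 2, 3, 0, 2, 2, 3, 0, 2.
unrolled_nodes = ["load1", "imul1", "store1", "sub1",
                  "load2", "imul2", "store2", "sub2", "jz"]
unrolled_lats = [2, 3, 0, 2, 2, 3, 0, 2]
unrolled_edges = [
    (unrolled_nodes[i], unrolled_nodes[i + 1], unrolled_lats[i])
    for i in range(len(unrolled_lats))
]
unrolled_d = priorities(unrolled_nodes, unrolled_edges)

print(single_d)
print(unrolled_d)
```

Under these assumptions the single-body chain gives 7, 5, 2, 2, 0 and the unrolled chain gives 14, 12, 9, 9, 7, 5, 2, 2, 0, matching the slide values. A list scheduler would typically pick the ready instruction with the largest d first.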

