Unformatted text preview:

CSE 5317Lecture 26: Instruction scheduling22 April 2010Nate NystromUTAThursday, April 22, 2010• Final exam• Tuesday, May 11, 2-4:30pm• NH 110• Project 5 (4) due Apr 30• 0% late penalty until May 4 2pm• 100% penalty afterwardThursday, April 22, 2010Instruction scheduling• Note: none of this applies to the project• including branch delay slots• SPIM ignores instruction latency issuesThursday, April 22, 2010Instruction-level parallelism• Modern processors can execute several adjacent instructions simultaneously• pipelined machines• very long instruction word machines (VLIW)• superscalar machines• dynamic scheduling/out-of-order machines•ILP limited by execution constraints• data dependence constraints• resource constraints (“hazards”)• control hazardsThursday, April 22, 2010Execution constraints• Data dependence constraints• if instruction A computes a value that is read by instruction B, then B cannot execute before A completes• lw $t2, -28($fp)• add $t1, $t2, $t3• Resource hazards• limited number of functional units• limited instruction issue• limited register setThursday, April 22, 2010Instruction scheduling• Purpose is to order instructions to maximize ILP• keep all resources busy every cycle• eliminate data dependences and resource hazards if necessary• NP-complete (and bad in practice)• need heuristicsThursday, April 22, 2010Instruction scheduling• Open area of research• Most optimizing compilers perform good local IS, simple global IS• Biggest opportunities are scheduling code for loopsThursday, April 22, 2010Should the compiler do IS?• Many modern processors perform dynamic reordering of instructions•called out-of-order execution• use additional registers and register renaming to eliminate dependences• more complex hardware, longer cycle times, more power• some newer multicore processors don’t do OOOEThursday, April 22, 2010von Neumann model• Instruction starts only after predecessor completes• Not very efficient• “von Neumann bottleneck”• “memory wall”!"#$%"&' ()*+&,-.)/&0-123*)4/&5666#" "7/3*89:*4-/&(:+)2914/;<=+)8)&>8)&?>/@&24AA)8)/*&*):+/4B9)3&A-8&7(CD(*411&>/&-.)/&>8)>&-A&8)3)>8:+C<E-3*&-.*4?4F4/;&:-?.41)83&.)8A-8?&;--2&1-:>1&7(G&>/2&-/1@&34?.1)&;1-H>1&7(C<=+)&H4;;)3*&-..-8*9/4*4)3&>8)&4/&3:+)2914/;&*+)&:-2)&A-8&1--.3C!"#$%"&' ()*+&,-.)/&0-123*)4/&5666#" I(+-912&*+)&,-?.41)8&J-&7(K< E>/@&?-2)8/&?>:+4/)3&.)8A-8?&2@/>?4:&8)-82)84/;&-A&4/3*89:*4-/3CDL13-&:>11)2&M-9*#-A#-82)8&)N):9*4-/OPQQQRSCDT-*&@)*&:1)>8&U+)*+)8&*+43&43&>&;--2&42)>CDV8-W< QQQR&:>/&93)&>224*4-/>1&8);43*)83&>/2&8);43*)8&8)/>?4/;&*-&)14?4/>*)&2>*>&2).)/2)/:)3&*+>*&/-&>?-9/*&-A&3*>*4:&7(&:>/&>::-?.143+C<T-&/))2&*-&8):-?.41)&.8-;8>?3&U+)/&+>82U>8)&:+>/;)3CD ,-/W<QQQR&?)>/3&?-8)&:-?.1)N&+>82U>8)&P>/2&*+93&1-/;)8&:@:1)&*4?)3&>/2&?-8)&U>**>;)SC<L/2&:>/X*&H)&-.*4?>1&34/:)&7(&43&TV#:-?.1)*)C!"#$%"&' ()*+&,-.)/&0-123*)4/&5666#" $Y+>*&U)&U411&:-Z)8<(:+)2914/;&H>34:&H1-:[3D\43*&3:+)2914/;D\-/;#1>*)/:@&-.)8>*4-/3D J)1>@&31-*3<(:+)2914/;&A-8&:193*)83&>8:+4*):*98)3< (-A*U>8)&V4.)14/4/;&P/)N*&U))[S<Y+>*&U)&/))2&*-&[/-UD .4.)14/)&3*89:*98)D2>*>&2).)/2)/:4)3D8);43*)8&8)/>?4/;!"#$%"&' ()*+&,-.)/&0-123*)4/&5666#" ]<7/&*+)&Z-/&T)9?>//&?-2)1&-A&)N):9*4-/&>/&4/3*89:*4-/&3*>8*3&-/1@&>A*)8&4*3&.8)2):)33-8&:-?.1)*)3C<=+43&43&/-*&>&Z)8@&)AA4:4)/*&?-2)1&-A&)N):9*4-/CDZ-/&T)9?>//&H-**1)/):[&-8&*+)&?)?-8@&U>11Ctimeinstr 1 instr 27/3*89:*4-/&(:+)2914/;Thursday, April 22, 2010Pipelines• Processors today use pipelines to allow overlap of instructions• Divide execution of an instruction into stages; each stage performed by separate part of the processor• Pentium 4: 20-stage pipeline• multicore: pipelines getting shorter (less logic, less power)•Each stage completes its operation in one cycle• shorter than the von Neumann cycle• Instruction still takes the same amount of time to execute!"#$%"&' ()*+&,-.)/&0-123*)4/&5666#" 78/3*9:;*4-/&<4.)14/)3=>1?-3*&@11&.9-;)33-93&*-2@A&:3)&4/3*9:;*4-/3&.4.)14/)3&*-&@11-B&-C)91@.&-D&4/3*9:;*4-/3&E<)/*4:?&%&+@3&@&56&3*@F)&.4.)14/)GGGHI=J+)&)K);:*4-/&-D&@/&4/3*9:;*4-/&43&24C42)2&4/*-&3*@F)3L&)@;+&3*@F)&43&.)9D-9?)2&MA&@&3).@9@*)&.@9*&-D&*+)&.9-;)33-9I=N@;+&-D&*+)3)&3*@F)3&;-?.1)*)3&4*3&-.)9@*4-/&4/&-/)&;A;1)&E3+-9*)9&*+)&*+)&;A;1)&4/&*+)&C-/&O):?@//&?-2)1HI=>/&4/3*9:;*4-/&3*411&*@P)3&*+)&3@?)&*4?)&*-&)K);:*)IF D E M WF: Fetch instruction from cache or memory.D: Decode instruction.E: Execute. ALU operation or address calculation.M: Memory access.W: Write back result into register.timeinstr!"#$%"&' ()*+&,-.)/&0-123*)4/&5666#" !68/3*9:;*4-/&<4.)14/)3=Q-B)C)9R&B)&-C)91@.&*+)3)&3*@F)3&4/&*4?)&*-&;-?.1)*)&@/&4/3*9:;*4-/&)C)9A&;A;1)IF D E M Wtimeinstr 1F D E M WF D E M WF D E M WF D E M Winstr 2instr 3instr 4instr 5F D E M WF D E M Winstr 6instr 7Filling thepipelineDraining thepipelineSteady state%!"#$%"&' ()*+&,-.)/&0-123*)4/&5666#" !!<4.)14/)&Q@S@923=(*9:;*:9@1&Q@S@923T*B-&4/3*9:;*4-/3&/))2&*+)&3@?)&9)3-:9;)&@*&*+)&3@?)&*4?)T?)?-9A&-9&D:/;*4-/@1&:/4*3&4/&@&3:.)93;@1@9I=U@*@&Q@S@923T@/&4/3*9:;*4-/3&/))23&*+)&9)3:1*3&-D&@&.9)C4-:3&4/3*9:;*4-/9!&V&95&W&9X9%&V&9!&W&9!9!&V&Y95Z9%&V&9!&W&9!T3-1C)2&MA&D-9B@924/F&@/2[-9&3*@114/FT;@;+)&?433\=,-/*9-1&Q@S@923T]:?.&^&M9@/;+&@229)33&/-*&P/-B/&:/*41&1@*)9&4/&.4.)14/)T3-1C)2&MA&2)1@A&31-*&@/2[-9&.9)24;*4-/!"#$%"&' ()*+&,-.)/&0-123*)4/&5666#" !5_:?.[`9@/;+&U)1@A&(1-*E3H=,-/*9-1&+@S@923R&4I)I&]:?.[M9@/;+&4/3*9:;*4-/3IF D E M WF D E M WF D E M WF D E M Wjump/branchinstr 2instr 3instr 4unconditional jump address available only after Decode.conditional branch address available only after Execute.Thursday, April 22, 2010PipelinesOverlap the stages:!"#$%"&' ()*+&,-.)/&0-123*)4/& 5666#" 78/3*9:;*4-/&<4.)14/)3=>1?-3*&@11&.9-;)33-93&*-2@A&:3)&4/3*9:;*4-/3&.4.)14/)3&*-&@11-B&-C)91@.&-D&4/3*9:;*4-/3&E<)/*4:?&%&+@3& @&56&3*@F)&.4.)14/)GGGHI=J+)&)K);:*4-/&-D&@/&4/3*9:;*4-/&43&24C42)2&4/*-&3*@F)3L&)@;+&3*@F)&43&.)9D-9?)2&MA&@&3).@9@*)&.@9*&-D&*+)& .9-;)33-9I=N@;+&-D&*+)3)&3*@F)3&;-?.1)*)3&4*3&-.)9@*4-/&4/&-/)&;A;1)&E3+-9*)9&*+)&*+)&;A;1)&4/&*+)&C-/&O):?@//&?-2)1HI=>/&4/3*9:;*4-/&3*411&*@P)3&*+)&3@?)&*4?)&*-&)K);:*)IF D E M WF: Fetch instruction from cache or memory.D: Decode instruction.E: Execute. ALU operation or address calculation.M: Memory access.W: Write back result into register.timeinstr!"#$%"&' ()*+&,-.)/&0-123*)4/& 5666#" !68/3*9:;*4-/&<4.)14/)3=Q-B)C)9R&B)&-C)91@.&*+)3)&3*@F)3&4/&*4?)&*-&;-?.1)*)&@/&4/3*9:;*4-/&)C)9A&;A;1)IF D E M Wtimeinstr 1F D E M WF D


View Full Document

UT Arlington CSE 5317 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?