Register(Alloca.on(Deconstructed(David&Ryan&Koes&Seth&Copen&Goldstein&12th%Interna+onal%Workshop%on%So3ware%and%Compilers%for%Embedded%Systems%April%24,%2009%Register&Alloca9on&Problem&… v = 1 w = v + 3 x = w + v u = v t = u + x print(x); print(w); print(t); print(u); … eax(ebx(ecx(edx(esi(edi(ebp(esp(2&Register&Alloca9on&• Graph&coloring&• Linear&scan&• Op9mal&frameworks&• “Move&elimina9on”&allocators&3&Spill&Code&Op9miza9on&spill&to&memory&to&meet®ister&availability&Move&Inser9on&insert&moves&to&make&assignment&easy&(or&leave&in&SSA&form)&Assignment&assign&variables&to®isters;&aLempt&to&maximize&move&coalescing&Ques9ons&• &What&is&the&penalty&of&decomposing®ister&alloca9on&into&individual&components?&&&• What&is&the&individual&impact&of&each&component&on&code&quality?&&&• How&far&from&op9mal&are&exis9ng&heuris9cs?&4&Our(goal(is(to(answer(these(ques.ons.(An&op9mal®ister&alloca9on&framework&is&used&to&empirically&evaluate&the&importance&of&the&components&of®ister&alloca9on,&the&impact&of&component&integra9on,&and&the&effec9veness&of&exis9ng&heuris9cs.&&Outline&• Mo9va9on&• Register&Alloca9on&Components&– Move&Inser9on&– Coalescing&– Spilling&– Assignment&• Methodology&• Results&• Conclusion&5&Move&Inser9on&• Addi9onal&move&instruc9ons&can&simplify&assignment&problem&• Can&eliminate&need&to&spill&• Only&indirect&impact&on&code&quality&6&L1:!write !b!read !a!write !c!read !b!write !a!read !c!branch L1!a&a&b&c&a&a&b&c&move r0 -> r1!Live& RA&Move&Inser9on&Evalua9on&full:&&move&instruc9ons&may&be&inserted&at&any&program&point&&limited:&move&instruc9ons&may&be&inserted&only&at&the&entry&and&exit&of&basic&blocks&none:(no®ister‐to‐register&move&instruc9ons&are&generated&by&the&allocator&7&Coalescing&• Eliminate&move&instruc9ons&by&assigning&each&operand&to&the&same&loca9on&• Can&be&performed&as&separate&pass&– lose&ability&to&coalesce&with&physical®isters&– lose&ability&to&coalesce&“uncoalescables”&8&t1 !← !!← t1!t2 !← t1!!← t2!t3 !← !!← t3!t3 !← t3!!← t3!Pre‐alloc(Coalescing(t3 !← r0! t1 !← !!← t1!t2 !← t1!!← t2!!← t1!Physical(Reg( “Uncoalescable”(Coalescing&Evalua9on&integrated(op.mal:(move&coalescing&is&solved&op9mally&as&part&of&the&complete®ister&alloca9on&problem&integrated(op.mal(ignoring(uncoalescable:&the®ister&allocator&fully&op9mizes&only&those&move&instruc9ons&iden9fied&as&coalescable&prior&to®ister&alloca9on&separate(op.mal:(move&coalescing&is&solved&op9mally&as&a&separate&problem&prior&to&alloca9on&separate(aggressive:(a&greedy&heuris9c&aggressively&eliminates&coalescable&moves&prior&to®ister&alloca9on&none:&no&coalescing&is&performed&9&Spilling&• Can&be&performed&as&a&separate&pass&– spill&variables&to&memory&to&meet®ister&needs&at&each&program&point&– if&move&and&swap&inser9ons&are&allowed,&assignment&is&now&possible&10&a&a&b&c&d®(mem(MAXLIVE(≤(#REG(=(2(Spilling&Evalua9on&integrated(op.mal:&spill&code&genera9on&is&solved&op9mally&as&part&of&the&complete®ister&alloca9on&problem&separate(op.mal:&the&spill&code&genera9on&problem&(reducing&max&liveness&to&meet®ister&availability)&is&solved&op9mally&as&a&standalone&problem&&separate(heuris.c:&the&spill&code&genera9on&problem&is&solved&as&a&standalone&problem&using&a&heuris9c&algorithm&11&Assignment&• assign&physical®ister(s)&to&each&variable&at&every&program&point&• may&change&assignment&of&variable&by&inser9ng&move&instruc9on&• if&spilling&and&coalescing&are&performed&separately,&leaves&assignment&• op9mizes&for®ister&preferences&12&Assignment&Evalua9on&integrated(op.mal:&assignment&is&solved&op9mally&as&part&of&the&complete®ister&alloca9on&problem&graph(heuris.c:&a&graph‐coloring&based&heuris9c&is&used&to&assign®isters&to&the&results&of&spill&code&genera9on;&move&instruc9ons&may&be&inserted&to&improve&colorability&linear(scan(heuris.c:&a&linear&scan&based&heuris9c&is&used&to&assign®ister&to&the&results&of&spill&code&genera9on;&move&instruc9ons&may&be&inserted&to&improve&colorability&&13&Methodology&Implement&op9mal®ister&alloca9on&framework&in&LLVM&2.4&Consider&four&target&architectures&and&two&code&quality&metrics&14&x86& x86‐64&Thumb& ARM&CISC(RISC(More®isters&Fewer®isters&Limita9ons&• Self‐selec9ng&bias&in&results&– limited&to&those&func9ons&where&an&op9mal&solu9on&can&be&found&in&reasonable&9meframe&– however,&qualita9ve&results&do¬&appear&to&change&as&more&9me&is&allowed&for&op9mal&allocator&• Implement&swap&using&memory&loca9on&• Performance&metric&necessarily&inexact&(weighted&sum&of&memory&opera9ons)&• Evaluate&performance&only&on&desktop&processors&15&Results:&Code&Size&• Evaluate&subset&of&Mibench&• Consider&all&func9ons&where&op9mal&solu9ons&can&be&found&in&<10&minutes&– more&than&70%&coverage&of&func9ons&• Report&code&size&increase&rela9ve&to&fully&op9mal&(1.0&best&possible&result)&16&Results:&Code&Performance&• Evaluate&subset&of&SPEC2006&• Op9mize&only&cri9cal(>85%&of&running&9me)&func9ons&• Intel&Core&2&Quad&(Q6600)&@&2.4GHz&&• Report&geometric&mean&rela9ve&to&fully&op9mal&model&&• Possible&to&do&beLer&than&op9mal&due&to&limita9ons&of&metric&17&Move&Inser9on:&Code&Size&18&1&1.0005&1.001&1.0015&1.002&1.0025&x86&x86‐64&ARM&Thumb&Code(Size(Rela.ve(to(Op.mal(No&Move&Inser9on&Limited&Move&Inser9on&Move&Inser9on:&Code&Performance&19&0.98&1&1.02&1.04&1.06&1.08&1.1&Time&Loads&Stores&Instruc9ons&Time&Loads&Stores&Instruc9ons&x86&x86‐64&Rela.e(to(Op.mal(Alloca.on(No&Move&Inser9on&Limited&Move&Inser9on&Coalescing:&Code&Size&20&1&1.002&1.004&1.006&1.008&1.01&x86&x86‐64&ARM&Thumb&Code(Size(Rela.ve(to(Op.mal(No&Coalescing&Separate&Op9mal&Coalescing&Separate&Aggressive&Coalescing&Integrated&Coalescing&‐&Ignore&Uncoalescables&1.044&1.075& 1.11&
or
We will never post anything without your permission.
Don't have an account? Sign up