Points-To Analysis and Memory Disambiguation Cody HartwigElie KrevatBackground: Pointer Analysis Goal: Determine the set of storage locations that a pointer might reference Related techniques: Alias Analysis – Determine if 2 pointers alias the same mutable memory location Flow-insensitive vs. flow-sensitive May-alias vs. must-alias Escape Analysis – Determine the dynamic scope and lifetime of a pointer Pointer analysis is hard, but essential for enabling compiler optimizations.Example Optimizations CSE needs info on what is read/written:*p = a + b;x = a + b; Reaching definitions and constant propagation:x = 5;*p = 42;y = x; Register variable promotion:int *p = &s.a;s.b = 0;*(p + i) = 1;... = s.b; Scheduling optimizations to hide memory latency Improves IA-64 data speculationPractical Considerations Alias problem is undecidable [Landi 1992] Simplest assumption not very useful: “Everything may alias” Andersen vs. Steensgaard: points-to analysis Both are flow-insensitive and context-insensitive Differ in points-to set construction Andersen: many out edges, one variable per node Steensgaard: one out edge, many variables per nodea = &b;b = &c;a = &d;d = &e;Andersen Steensgaardaab cb cd e(b,d) (c,e)Memory Disambiguation Pointer analysis is just one component of a compiler’s memory disambiguator Little published on complete framework: How do optimizations interact with and benefit from memory disambiguation? How does this affect program performance? What are the most effective disambiguation techniques?On the Importance of Points-To Analysis and Other Memory Disambiguation Methods For C ProgramsRakesh Ghiya, Daniel Lavery, and David SehrPLDI 2001Disambiguation Framework DISAM: DISambiguation using Abstract Memory locations Maintains high-level symbolic representation LOC represents global/local vars, registers, etc. All LOCs independent Set of possible memory locations LOC setDisambiguation FrameworkDisambiguation Methods Intraprocedural (local) methods Direct memory analysis (direct) Includes symbol structure type analysis Simple base and offset analysis (sbo) Indirect memory analysis (indirect) Local points-to analysis (lpt) Array data-dependence analysis (array) Interprocedural methods Global address-taken analysis (global) Whole-program points-to analysis (wpt) Requiring user assertion for compliance with ANSI C Type-based disambiguation (type)WPT Framework WPT implements Andersen algorithm Standard optimizations to reduce cubic complexity: More precise structure handling to distinguish fields, but not instancesstruct foo (int *p; int *q;} s1, s2;int x,y;s1.p = &x;s2.q = &y; Identification of malloc-like functions Determine if malloc is unconditional Determine that address is not taken/stored elsewhere Assignment statements visited in sorted topological orders1.p & s2.p point to xs1.q & s2.q point to yLPT Framework Uses same analysis engine as WPT Conservative assumptions necessary for global vars and function call effects Symbolic location nloc represents all address-escaped varsExperiments 12 C/C++ benchmarks run on IA-64 hardware Highest-optimization level compilation Data speculation turned off! Data collected: Memory reference characteristics and points-to sets Disambiguation queries (number and type) Hash duplicate disambiguation queries Incremental results for each disambiguation methodQuery Results85% of queries independent, 14% maybeIndependent queries by methodEach technique useful for some benchmarkPerf Improvement per methodOverall program performance improvement of 12%Conclusions Suite of disambiguation methods provides different tools effective in different situations Optimizations to Andersen points-to analysis make algorithm runtime acceptable in practice 85% of queries found independent with disambiguation framework 14% maybe independent references leave some room for improvement unrecognizable malloc wrappers indirect calls with many possible targetsUltra-fast Aliasing Analysis using CLA: A Million Lines of C a SecondNevin HeintzeOlivier TardieuMotivation “…given a million+ lines of C code, and a proposed change of the form ‘change the type of this object from type1 to type2’, find all the other objects whose type may need to be changed to ensure ‘type consistency’…”Exampleshort x;short y, z;short *p, v, w;y = x;z = y+1;p = &v;*p = z;w = 1;Exampleint x;short y, z;short *p, v, w;y = x;z = y+1;p = &v;*p = z;w = 1;Complications Requires analysis of pointers Based on points-to analysis of Andersen[1] Must deal with vast amounts of code Modular analysis Defer work to preserve memory and timePoints-to Analysis Unification based (Steensgaard) Assignment unifies graph nodes x = y; // unifies nodes for x and y Less accurate, faster Subset-relationship based (Andersen) Assignment creates subset relationship x = y; // creates constraint x ⊇⊇⊇⊇ y More accurate, slowerDeduction RulesyxyxeeeeeePeeeeP*x eye&yxPexeyyxeeePpxxxxx→→→→=→=→→=→→∈∈===== derivecan weif point tocan )in ( if )in (if )in * (if &:,, and When Asn;|Asn:: Exp*|Exp:: &|*|::313221212121ExpProgramProgramAsnExpstruct handling Field-independent struct is treated as unstructured memory region Field-based struct is treated as separate variablesstruct Example------s=B.y;r gets &z---r=B.x;---q gets &zq=A.y;p gets &zp gets &zp=A.x;assign to xassign to AA.x=&z;Field-basedField-independentStatementHow to scale? These analyses are easy to implement for small programs Large programs are considerably more difficult Time constraints Memory constraintsNaïve Approach Paste all source files together Load information from large pasted file Analyze information Doesn’t scale beyond few thousand LOC3-phase approach Compile parse source files extract assignments, function calls/returns/defns write object file (database) Link Merge all object files Analyze Use points-to analysis contained in merged object fileAnalysis Graph algorithm for Andersen’s method Graph contains node for every variable in program Edges of graph show possible sources (dependence) between nodes. If x = y appears, then there is an edge xy Graph remains in pre-transitive
View Full Document