UCLA COMSCI 232 - HeintzeTardieu-pldi01 - D3051156

Home> Schools> University of California, Los Angeles> Computer Science (COMSCI) > COMSCI 232> HeintzeTardieu-pldi01

UCLA COMSCI 232 - HeintzeTardieu-pldi01

School name University of California, Los Angeles

Course Comsci 232- Static Program Analysis

Pages 10

Download Save

Unformatted text preview:

Ultra-fast Aliasin9 Analysis using CLA: A Million Lines of C Code in a Second I Nevin Heintze Research, Agere Systems (formerly Lucent's Microeleetronics Division) nch©agere, com Olivier Tardieu Ecole des Mines, Paris olivier, tardieu@mines, org ABSTRACT We describe the design and implementation of a system for very fast points-to analysis. On code bases of about a million lines of unpreprocessed C code, our system performs field- based Andersen-style points-to analysis in less than a second and uses less than 10MB of memory. Our two main contri- butions are a database-centric analysis architecture called compile-link-analyze (CLA), and a new algorithm for imple- menting dynamic transitive closure. Our points-to analysis system is built into a forward data-dependence analysis tool that is deployed within Lucent to help with consistent type modifications to large legacy C code bases. 1. INTRODUCTION The motivation for our work is the following software main- tenance/development problem: given a million+ lines of C code, and a proposed change of the form "change the type of this object (e.g. a variable or struct field) from typel to type2", find all other objects whose type may need to be changed to ensure the "type consistency" of the code base. In particular, we wish to avoid data loss through implicit narrowing conversions. To solve this problem, we need a global data-dependence analysis that in effect performs a forward data-dependence analysis (Section 2 describes this analysis, and how it differs from other more standard de- pendence analyses in the literature.). A critical part of this dependence analysis is an adequate treatment of pointers: for assignments such as *p = x we need to determine what objects p could point to. This kind of aliasing analysis is commonly called points-to analysis in the literature [4]. The scalability of points-to analysis has been a subject of inten- sive study over the last few years [5, 8, 21, 11, 23]. However the feasibility of building interactive tools that employ some form of "sufficiently-accurate" pointer analysis on million line code-bases is still an open question. The paper has two main contributions. The first is an archi- 1This is a substantially revised version of [16]. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advan- tage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PLDI 2001 6/01 Snowbird, Utah, USA © 2001 ACM ISBN 1-58113-414-2/01/06... $5.00 tecture for analysis systems that utilizes ideas from indexed databases. We call this architecture compile-link-analyze (CLA), in analogy with the standard compilation process. This architecture provides a substrate on which we can build a variety of analyses (we use it to implement a number of different algorithms for Andersen-style points~to analysis, dependence analysis and a unification-style points-to anal- ysis, all using a common database format for representing programs). It scales to large code bases and supports sepa- rate and/or parallel compilation of collections of source files. Also, its indexing structures support rapid dynamic loading of just those components of object files that are needed for a specific analysis, and moreover after reading a component we have the choice of keeping it in memory or discarding it and re-reading it if we even need it again (this is used to reduce the memory footprint of an analysis). We describe CLA in detail in Section 4, and discuss how it differs from other approaches in the literature, such as methods where sepaxate files are locally analyzed in isolation and then the individual results are combined to analyze an entire code- base. The second contribution is a new algorithm for implement- ing dynamic transitive closure (DTC). Previous algorithms in the literature for Andersen's analysis are based on a tran- sitively closed constraint graph e.g. [4, 10, 11, 21, 23, 22]. In contrast, our algorithm is based on a pre-transitive graph i.e. we maintain the graph in a form that is not transitively closed. When information about a node is requested, we must perform a graph reachability computation (as opposed to just looking up the information at the node itself in the case of a transitively closed constraint graph). A direct im- plementation of the pre-transitive graph idea is impractical. We show how two optimizations - caching of teachability computations, and cycle elimination - yield aa efficient al- gorithm. Cycle elimination has previously been employed in the context of transitively closed graph and shown to re- sult in significant improvement [11], however in that work the cost of finding cycles is non-trivial and so completeness of cycle detection is sacrificed in order to contain its cost. However, in the pre-transitive setting, cycle detection is es- sentially free during graph teachability. We describe our algorithm in detail in Section 5. Section 6 presents various measurements of the performance of our system. For the Lucent code bases for which our sys- tem is targeted, runtimes are typically less than a second (800MHz Pentium) and space utilization is about 10MB. ................ 254These code bases are in excess of a million lines of code (un- commented non-blank lines of source, before pre-processing). On gimp (a publicly available code base of about 440K lines), our system performs field-based Andersen-style points- to analysis in about a second (800MHz Pentium) and uses about 12MB. We also present data to illustrate the space advantages of CLA. 2. MOTIVATING APPLICATION- DEPEN- DENCE ANALYSIS Our points-to analysis system is built into a forward data- dependence analysis tool that is deployed within Lucent to help with consistent type modifications to large legacy C code bases. The basic problem is as follows: suppose that the range of

View Full Document


School:
Email:
New Password:
Confirm Password:

UCLA COMSCI 232 - HeintzeTardieu-pldi01

Sign up for free to view:

Please select your school