UMD CMSC 838Z - Applying Classification Techniques to Remotely-Collected Program Execution Data

Applying Classification Techniques to Remotely-Collected Program Execution Data
Murali Haran (Penn State University); Alan Karr, Ashish Sanil (National Institute of Statistical Sciences); Adam Porter (University of Maryland); Alessandro Orso (Georgia Institute of Technology)
Presented by Alex Orso at ESEC-FSE, September 2005
This work was supported in part by NSF awards CCF-0205118 to NISS, CCR-0098158 and CCR-0205265 to the University of Maryland, and CCR-0205422, CCR-0306372, and CCR-0209322 to Georgia Tech.

Testing & Analysis after Deployment
[Figure: program P is deployed to many users; field data collected from their executions feeds software-engineering tasks.]
• [Pavlopoulou99] Test adequacy: residual coverage data
• [Hilbert00] Usability testing: GUI interactions
• [Dickinson01] Failure classification: caller/callee profiles
• [Bowring02] Coverage analysis: partial coverage data
• [Orso03] Impact analysis: dynamic slices
• [Liblit05] Fault localization: various profiles (returns, …)
• …

Tradeoffs of T&A after Deployment
• In-house
  (+) Complete control (measurements, reruns, …)
  (-) Small fraction of behaviors
• In the field
  (+) All (exercised) behaviors
  (-) Little control
    • Only partial measures, no reruns, …
    • In particular, no oracles
    • Currently, mostly crashes

Our Goal
Provide a technique for automatically identifying failures
• Mainly, in the field
• Useful in-house too
  • Automatically generated test cases

Overview
• Motivation and Goal
• General Approach
• Empirical Studies
• Conclusion and Future Work

Background: Classification Techniques
• Classification -> supervised learning -> machine learning
• Many existing techniques (logistic regression, neural networks, tree-based classifiers, SVMs, …)
[Figure: training: labeled objects (obj 1 … obj n, with labels x, y, z) are fed to a learning algorithm, which builds a model; classification: the resulting classifier assigns a predicted label to a new object obj i. Here the objects are executions, the features are execution data, the labels are pass/fail, and the learning algorithm is random forests.]

Background: Random Forests Classifiers
• Tree-based classifiers
  • Partition predictor space in hyper-rectangular regions
  • Regions are assigned a label
  (+) Easy to interpret
  (-) Unstable
• Random forests [Breiman01]
  • Integrate many (500) tree classifiers
  • Classification via a voting scheme
  (+) Easy to interpret
  (+) Stable
[Figure: example decision tree with splits such as size ≥ 14.5, size ≥ 8.5, time ≤ 111, and time > 55; an input like (size=10, time=80) is routed down the tree to a pass or fail leaf.]

Our Approach
Some critical open issues
• What data should we collect?
• What tradeoffs exist between different types of data?
• How reliable/generalizable are the statistical analyses?
[Figure: Training (in-house): an instrumentor produces P inst from P; running the test cases on P inst yields runtime execution data plus pass/fail labels, which form the training set from which the learning algorithm builds the model (a random forest). Classification (in the field): users run P inst, and their runtime execution data is fed to the classifier, which outputs predicted pass/fail labels.]

Specific Research Questions
• RQ1: Can we reliably classify program outcomes using execution data?
• RQ2: If so, what type of execution data should we collect?
• RQ3: How can we reduce runtime data collection overhead while still producing accurate and reliable classifications?
⇒ Set of exploratory studies
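To make the train-and-classify pipeline above concrete, here is a minimal sketch, assuming Python with scikit-learn rather than the authors' original tooling: execution profiles (one count vector per run) labeled pass/fail train a 500-tree random forest in-house, which then predicts labels for unlabeled field runs. The data, sizes, and variable names (training_profiles, field_profiles, and so on) are synthetic and purely illustrative.

```python
# Minimal sketch of the in-house training / in-the-field classification pipeline.
# Assumes Python + scikit-learn (not the authors' original tooling); the data is
# synthetic: one row per execution, one column per counter, labels are pass/fail.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# --- Training (in-house): labeled runs of the instrumented program ----------
training_profiles = rng.poisson(lam=5, size=(707, 3000))   # e.g., method counts
training_labels = rng.choice(["pass", "fail"], size=707, p=[0.9, 0.1])

# Random forest of 500 trees, matching the slide's "integrate many (500) tree
# classifiers" with a majority-vote classification scheme.
model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(training_profiles, training_labels)

# --- Classification (in the field): unlabeled runs from deployed instances --
field_profiles = rng.poisson(lam=5, size=(10, 3000))
predicted_labels = model.predict(field_profiles)            # "pass" / "fail"
print(predicted_labels)
```

In the studies that follow, the training labels come from comparing each run against the "golden version" of the subject program; field runs arrive without labels.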
Overview
• Motivation and Goal
• General Approach
• Empirical Studies
• Conclusion and Future Work

Experimental Setup (I)
Subject program
• JABA bytecode analysis library
• 60 KLOC, 400 classes, 3,000 methods
• 19 single-fault versions ("golden version" + 1 real fault)
Training set
• 707 test cases (7 drivers applied to 101 input programs)
• Collected various kinds of execution data (e.g., counts for throws, catch blocks, basic blocks, branches, methods, call edges, …)
• "Golden version" used to label passing/failing runs

Experimental Setup (II)
[Figure: ideally, the model built in-house from the training set would classify actual users' runs in the field, predicting a pass/fail outcome for each.]
Ideal setting, but
• Expensive
• Difficult to get enough data points
• Oracle problem
⇒ Simulate users' runs
[Figure: the training set is repeatedly split, with 2/3 of the runs used to build the random-forest model and the remaining 1/3 standing in for users' runs; accuracy is measured as classification error (misclassification rate) on the held-out third.]

RQ1 & RQ2: Can We Classify at All? How?
• RQ1: Can we reliably classify program outcomes using execution data?
• RQ2: Assuming we can classify program outcomes, what type of execution data should we collect?
• We first considered a specific kind of execution data: basic-block counts (~20K), a simple measure that is intuitively related to faults
[Figure: each execution (exec i) is represented by its vector of basic-block counts plus a pass/fail label.]
• Results: classification error estimates were always close to 0!
• But time overhead was ~15% and the data volume was not negligible
⇒ Other kinds of execution data

RQ1 & RQ2: Can We Classify at All? How? (continued)
• We considered other kinds of execution data:
  • Basic-block counts already yielded almost perfect predictors ⇒ richer data not considered
  • Counts for throws, catch blocks, methods, and call edges
• Results
  • Throw and catch-block counts are poor predictors
  • Method counts produced nearly perfect models
    • As accurate as block counts, but much cheaper to collect
    • 3,000 methods vs. 20,000 blocks (overhead < 5%)
  • Branch and call-edge counts were equally accurate, but more costly to collect than method counts
Preliminary conclusion (1): It is possible to classify program runs; method counts provided high accuracy at low cost.

RQ3: Can We Collect Less Information?
• The method-count models used between 2 and 7 method counts. Great for instrumentation, but…
• Two alternative hypotheses
  • Few methods are relevant -> specific methods must be chosen well
  • Many, redundant methods -> method selection is less important
• To investigate, performed 100 random samplings (illustrated in the sketch after this list)
  • Took 10% random samples of the method counts and rebuilt the models
  • Models were excellent 90% of the time
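The evaluation protocol and the RQ3 sampling study can be sketched roughly as follows, again assuming Python with scikit-learn and synthetic stand-in data: each of 100 rounds draws a random 10% of the method-count features, trains a random forest on 2/3 of the runs, and measures the misclassification rate on the held-out 1/3 that simulates users' runs. The function and variable names here (sampling_study, profiles, labels) are illustrative, not from the original work.

```python
# Sketch of the RQ3 study: are models built from a random 10% of the method
# counts still accurate? Assumes Python + scikit-learn; the profiles and labels
# below are synthetic stand-ins for the real execution data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def sampling_study(profiles, labels, n_rounds=100, feature_fraction=0.10, seed=0):
    """Misclassification rate of each round's reduced-feature random forest."""
    rng = np.random.default_rng(seed)
    n_features = profiles.shape[1]
    n_kept = max(1, int(feature_fraction * n_features))
    errors = []
    for _ in range(n_rounds):
        # Keep a random 10% of the method-count features.
        kept = rng.choice(n_features, size=n_kept, replace=False)
        # 2/3 of the runs train the model; the held-out 1/3 simulates users' runs.
        X_train, X_test, y_train, y_test = train_test_split(
            profiles[:, kept], labels, test_size=1 / 3,
            random_state=int(rng.integers(2**31 - 1)))
        model = RandomForestClassifier(n_estimators=500,
                                       random_state=int(rng.integers(2**31 - 1)))
        model.fit(X_train, y_train)
        errors.append(np.mean(model.predict(X_test) != y_test))
    return np.array(errors)

# Synthetic stand-in: 707 runs x 3000 method counts, roughly 10% failing runs.
rng = np.random.default_rng(1)
profiles = rng.poisson(lam=5, size=(707, 3000))
labels = rng.choice(["pass", "fail"], size=707, p=[0.9, 0.1])
errors = sampling_study(profiles, labels)
print(f"misclassification rate: mean={errors.mean():.3f}, max={errors.max():.3f}")
```

On the real JABA data the authors report near-perfect models in roughly 90% of such sampling rounds; the synthetic data above only illustrates the protocol, not that result.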

