Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation

C.K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V.J. Reddi, K. Hazelwood
Presented by: Michael Laurenzano
San Diego Supercomputer Center
Performance Modeling and Characterization Lab (PMaC)

What is Program Instrumentation?
• Inserting extra code into an application to observe its behavior
  – Example: Cache Simulation

      for (int i = 0; i < LENGTH; i++) {
          CacheSim(&A[i]); A[i] = (double)i;
          CacheSim(&B[i]); B[i] = (double)i;
          CacheSim(&C[i]); C[i] = (double)i;
      }

Uses of Program Instrumentation
• Code profiles
  – Basic block/instruction count
  – Operation results
• Microarchitectural study
  – Branch outcomes
  – Memory addresses
• Bug checking
  – Memory leaks
  – Uninitialized data
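To make the memory-address use case concrete, here is a minimal sketch of what a routine like the CacheSim call in the cache simulation example might do. The slides do not show its body, so everything here is an assumption: a direct-mapped cache, the geometry constants, the counter names, the array length, and the helper instrumented_loop are all hypothetical.

```c
#include <stdint.h>

#define LENGTH     1024  /* array length (assumed; the slide leaves it open) */
#define LINE_BYTES 64    /* simulated cache line size (assumed) */
#define NUM_SETS   256   /* direct-mapped, one line per set (assumed) */

static uintptr_t     tags[NUM_SETS];   /* line number currently held by each set */
static int           valid[NUM_SETS];  /* nonzero once a set holds a line */
static unsigned long hits, misses;     /* running totals */

/* Hypothetical analysis routine: record whether addr hits in the simulated cache. */
static void CacheSim(const void *addr)
{
    uintptr_t line = (uintptr_t)addr / LINE_BYTES;
    unsigned  set  = (unsigned)(line % NUM_SETS);

    if (valid[set] && tags[set] == line) {
        hits++;
    } else {
        misses++;            /* cold miss or conflict miss: install the line */
        valid[set] = 1;
        tags[set]  = line;
    }
}

static double A[LENGTH], B[LENGTH], C[LENGTH];

/* The loop from the slide: each store is preceded by a call into the simulator. */
static void instrumented_loop(void)
{
    for (int i = 0; i < LENGTH; i++) {
        CacheSim(&A[i]); A[i] = (double)i;
        CacheSim(&B[i]); B[i] = (double)i;
        CacheSim(&C[i]); C[i] = (double)i;
    }
}
```

Calling instrumented_loop() once makes 3 * LENGTH simulator calls; the split between hits and misses depends on where the linker places the three arrays, so no particular ratio should be expected.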
Pin System Layout
[Figure: Pin system layout diagram; the component callouts are listed below]
• The code being analyzed
• Tells us where and how to perform analysis
• Combines application and pintool code to create instrumented code
• Stores the instrumented code created by the JIT
• Controls execution, maintains data structures, tracks program state

Simplified Instrumentation
• Transfer control to VM at an application control transfer
• Look for instrumented version of branch target in code cache
  – If found: execute instrumented code
  – If not: compile the code, insert into code cache, execute new code
• Repeat

Trace Linking
• Transfer control directly between traces
  – Branch target must be known statically
  – Target trace must be present in code cache
[Figure: panels "Regular Execution", "Pin w/o Trace Linking", "Pin w/ Trace Linking"; labels: Sequence 1, Sequence 2, Trace 1, Trace 2, Virtual Machine]

Trace Linking (Indirect)
• "Unknown" targets are usually somewhat predictable
  – Function typically returns to a few locations (few call sites)
  – Indirect jump usually goes to a few locations
• Try several predicted targets to see if we can avoid VM intervention
  – Short target lists are maintained for each indirect branch
  – If we exhaust this list, use the VM

Function Cloning
• Most common indirect control transfer is a function return
• Create a function instance for each call site
  – Return address is then unique and known for each function instance
  – Turns this indirect control transfer into a direct control transfer
  – Code bloat!
• Implemented by keeping a call stack for each instrumented instruction sequence
  – Keep last 4 in call stack
  – Call stack represented as a 64-bit integer

Register Bindings
• Register re-allocation occurs so that Pin can use registers
  – The register bindings can be different from one trace to the next
• When compiling, keep register bindings from the previous trace if possible
• When linking traces, modify the register bindings before going to the next trace
  – Usually only a few registers are mismatched in practice

Optimization – Inlined Analysis Routines
[Figure: "Without Inlining" (Application, Bridge Routine, Analysis Routine) vs. "With Inlining" (Application, Bridge Code, Analysis Code, Bridge Code, Application)]
• 2 fewer calls and 2 fewer returns
• Other optimizations: constant folding, code relocation

Optimization – eflags Register Liveness
• The x86 eflags register is treated as a bit-vector containing state information
  – This register can be modified as a side-effect of some instructions
• eflags might not be live when we reach analysis routine
  – If this is the case, we do not need to save/restore it

Optimization – Call Scheduling
• User can specify that the routine be put anywhere in the particular scope
  – Anywhere in instruction, basic block, function, program, etc.
• Pin can schedule the call according to best performance
  – Perhaps at a point where few registers need to be saved
  – How well will this actually work?

Basic Pin Overhead
[Figure]

Effectiveness of Optimizations
[Figure]
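As a rough illustration of the lookup, compile, execute cycle on the Simplified Instrumentation slide, the sketch below models a toy VM dispatch loop without trace linking. It is not Pin's implementation: the four-block "application", the integer addresses, and every name (app_next, jit_compile, execute_trace, vm_run, the counters) are hypothetical stand-ins chosen for the example.

```c
#include <assert.h>

#define NUM_BLOCKS 4
#define EXIT_ADDR  (-1)

/* One code-cache slot per application address (toy model). */
typedef struct {
    int compiled;  /* nonzero once an instrumented copy is in the cache */
    int target;    /* application address the copy was compiled from */
} Trace;

static Trace code_cache[NUM_BLOCKS];
static int compiles, cache_hits, executed;  /* statistics gathered by the VM */
static int loop_iters;                      /* application state */

/* Toy application: 0 -> 1 -> 2, block 2 loops back to 1 twice, then 3 exits. */
static int app_next(int addr)
{
    switch (addr) {
    case 0:  return 1;
    case 1:  return 2;
    case 2:  return (++loop_iters < 3) ? 1 : 3;
    default: return EXIT_ADDR;
    }
}

/* "Compile" a block: here that only means marking the cache slot as filled. */
static void jit_compile(int addr)
{
    code_cache[addr].compiled = 1;
    code_cache[addr].target   = addr;
    compiles++;
}

/* Run the cached copy; the counter increment stands in for inserted analysis code. */
static int execute_trace(const Trace *t)
{
    assert(t->compiled);  /* only code from the cache is ever executed */
    executed++;
    return app_next(t->target);
}

/* Without trace linking, every control transfer comes back through this loop. */
static void vm_run(int entry)
{
    int addr = entry;
    while (addr != EXIT_ADDR) {
        if (code_cache[addr].compiled)
            cache_hits++;       /* found: execute instrumented code */
        else
            jit_compile(addr);  /* not found: compile and insert into the cache */
        addr = execute_trace(&code_cache[addr]);
    }
}
```

In this model vm_run(0) compiles each of the four blocks once and serves the repeated loop iterations from the cache. Trace linking, as described on the later slides, would remove the return to vm_run for transfers whose target is known statically and already cached.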