Performance Debugging
CS295, Lecture 8
Alex Aiken

The Problem
• We have mostly looked at functionality
  – Debugging/checking input-output behavior
  – Lots of ideas/tools available and being developed
• Equally important is debugging performance
  – Debugging/checking resource utilization
  – But only a handful of ideas/tools

Outline
• Profiling a program execution
• Profiling sets of executions
• Where are the performance bugs?
• Unsolved problems

gprof: Execution Profiling
• Goal: accurately account for the time spent in each function
• Produce a report like:
  – function f: 60%
  – function g: 10%
  – function h: 5%

Problems
• Functions are called from multiple places
  – f calls g
  – h calls g
  – How do we charge the time spent in g to f and h?
• The cost of a function varies
  – Different execution times for f on different calls
• Instrumentation must not take much time
  – Critical, as we are measuring time

Simplifying Assumption I
• All calls to a function f take the same time
  – Only seriously flawed if the cost of f is highly correlated with the call site
  – It may well be, though
• Consider a sorting function qsort(comp, l)
  – The time for the sort varies greatly depending on
    • the cost of the comparison operator comp
    • the length of the list l

Simplifying Assumption II
• Treat mutually recursive functions identically
  – If f calls g and g calls f, then f and g are equated
  – i.e., their reports will be the same
• Why?
  – Not an unreasonable approximation if recursive calls are relatively deep in actual runs
  – Makes the call graph acyclic
  – Makes distributing the time much easier

Profile Data
• Gather two kinds of data:
• Execution frequency of each function
  – Set random timer interrupts
  – On an interrupt, record the current function
  – Keep a vector of counters C_f, C_g, …, one per function
• Who calls whom
  – On a function call, record the caller and callee
  – Increment Count_{caller calls callee} in a hash table

Self-Time
• Estimate of the percentage of time spent in function f
  – C_f is the number of samples in f
  – C = Σ_f C_f is the total number of samples
• Then S_f = (C_f / C) × (total time)
• S_f is the self-time
  – Time spent in the body of f

Total Time
• The total time spent in f is
  – the self-time of f, plus
  – the time spent in functions called by f

  T_f = C_f + Σ_{f calls g} T_g × (Count_{f calls g} / Count_g)

  where Count_g = Σ_h Count_{h calls g}

Notes
• The formula assumes
  – no recursive functions
  – each call to a function costs the same
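To make the attribution concrete, here is a minimal Python sketch of the bookkeeping just described, using invented function names, sample counts, and call counts (gprof itself is a C tool in the GNU toolchain; nothing below is its actual implementation). It computes self-time from the sample counts and then propagates total time over an acyclic call graph using T_f = C_f + Σ_{f calls g} T_g × (Count_{f calls g} / Count_g).

```python
from functools import lru_cache

# Invented profile data for illustration: timer samples per function (C_f) and
# caller->callee call counts (Count_{caller calls callee}).
samples = {"main": 2, "f": 60, "g": 25, "h": 13}
calls = {("main", "f"): 1, ("main", "h"): 1, ("f", "g"): 40, ("h", "g"): 10}

total_samples = sum(samples.values())                            # C = sum_f C_f
self_time = {f: c / total_samples for f, c in samples.items()}   # S_f = C_f / C

# Count_g: total number of calls made to g, from any caller.
incoming = {}
for (caller, callee), n in calls.items():
    incoming[callee] = incoming.get(callee, 0) + n

@lru_cache(maxsize=None)
def total_time(f):
    """T_f = C_f + sum over callees g of T_g * (Count_{f calls g} / Count_g).

    Assumes the call graph is acyclic, as in the slides."""
    t = samples[f]
    for (caller, callee), n in calls.items():
        if caller == f:
            t += total_time(callee) * n / incoming[callee]
    return t

for f in samples:
    print(f"{f}: self {self_time[f]:.0%}, total {total_time(f) / total_samples:.0%}")
```

With these made-up numbers the callers of g are charged proportionally to how often they call it (f gets 40/50 of g's time, h gets 10/50), and main's total time works out to 100% of the samples.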
Example
• The report includes
  – Time for self & called functions
  – Time divided up by
    • time for each site the function is called from
    • time for each call site in the function

gprof Summary
• Strengths
  – Attributes time to individual program components
  – Based on a single run
  – Standard approach to performance profiling
• Weaknesses
  – Assumes uniform time for calls, recursive functions
  – Measurement effects distort the time of small functions
    • sometimes very substantially

Typical Usage
• Run gprof
• Optimize the worst offenders
• Repeat until the profile is flat
  – Time spread out about evenly among many functions
• What now?

Motivation
• gprof has another weakness
  – If a run has no performance problem, the profile looks fine
  – Dynamic analysis of one run can't find problems that don't happen in that (or any single) run
• What could we learn from multiple executions?

Trend Profiling
• We can learn something about asymptotic behavior
• Idea
  – Run the program
  – Plot execution time vs. input size
  – Fit a curve to the data
• The fitted curve is the empirical computational complexity

Picture
  [plot: execution time vs. input size, with a fitted curve]

Comments
• Fits will be approximate
  – There is noise in the data
  – Must have a notion of "good fit"
• The fit also depends heavily on
  – the notion of time
  – the notion of input size
• Not obvious how to fit curves
  – What kinds of curves should we consider?

Time
• Using machine time is problematic
• Consider two commands:
    > time foo input
    5 seconds
    > time foo -UNUSED=0 input
    6 seconds
• What happened?

Use a Repeatable Notion of Time
• One idea
  – Count of basic block executions
• Keep a vector of counters
  – One per basic block
  – Count how many times the basic block executes
• Advantages
  – Independent of low-level variations in time
  – Repeatable
  – Instrumentation does not perturb the measurements

Notion of Input Size
• One idea
  – Byte count of the program input
• Disadvantages
  – Doesn't account for structure in the input
  – Example
    • a routine that scans the input looking for Foo's
    • does something expensive with each Foo
    • cost depends much more on the number of Foo's than on the total size of the input
• Advantages
  – Simple, universal
  – Byte count is often correlated with cost

Garbage In, Garbage Out
• We will use
  – Time: basic block execution counts
  – Input size: byte size of the input
• If these measures are not reasonable for an application, we get poor fits

What Curves?
• It's not obvious what family of curves to fit
• Many programs have complex performance
  – Different pieces have different time complexity
  – Even the asymptotic behavior of one component may be hard to describe
• Goal
  – Simple descriptions
  – Focus on the high-order term

Power Law Profiling
• Transform the scale to
  – log of basic block counts
  – log of input size
• Fit lines to the data in the log-log space
• Why?
  – Performance of n^k becomes k log n
  – A line when drawn log-log
  – A power law

  [plot: log time vs. log input size]

Properties of Power Law Profiling
• Low-dimensional
  – Requires estimating only two parameters: slope and intercept
  – Higher-dimensional models are prone to overfitting
• Minimizes relative error
  – Tolerates larger errors on larger inputs
• Focuses attention on the high-order term

Computing Power Law Profiles
• Linear fits: y = a + bx
  – minimize Σ_i (y_i − (a + b·x_i))²
• Power law fits: y = a·x^b
  – minimize Σ_i (log y_i − (log a + b·log x_i))²
  – (worked sketches of these fits appear after the slides below)

Example: Selection Sort
• y = 0.45·n^2.02
• Mean relative error 0.4%

Quicksort

An Idea
• Since we have counts for each basic block
• We can compute a power law for each block
• Allows us to
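To illustrate the least-squares formulation from "Computing Power Law Profiles" above, here is a small sketch of a power-law fit y = a·x^b done as a line fit in log-log space. The input sizes and basic-block counts below are invented; with real data they would come from instrumented runs of the program.

```python
# Power-law fit by least squares in log-log space:
# minimize sum_i (log y_i - (log a + b * log x_i))^2.
import numpy as np

x = np.array([100, 200, 400, 800, 1600])            # input sizes (bytes), invented
y = np.array([4.6e3, 1.8e4, 7.3e4, 2.9e5, 1.2e6])   # basic-block counts per run, invented

# polyfit on the logged data returns [slope, intercept]; the slope is the
# exponent b and the intercept is log a.
b, log_a = np.polyfit(np.log(x), np.log(y), 1)
a = np.exp(log_a)
print(f"empirical complexity: y = {a:.3g} * n^{b:.2f}")   # roughly 0.46 * n^2 here
```

Because the fit is done on logarithms, a given absolute error on a large input contributes no more than the same relative error on a small one, which is the "minimizes relative error" property noted above.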

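As a rough, end-to-end analogue of the selection-sort example above (y = 0.45·n^2.02 on the slide), the sketch below counts comparisons, a repeatable stand-in for basic-block executions, while sorting random inputs of several sizes, then fits a power law the same way and reports the mean relative error. The constants it prints will differ from the slide's numbers; this is an illustration, not the original experiment.

```python
# Trend-profile a toy program: measure "time" as comparison counts, vary the
# input size, and fit counts = a * n^b.  Expect an exponent close to 2.
import random
import numpy as np

def selection_sort_comparisons(a):
    """Selection-sort a in place, returning the number of element comparisons."""
    comparisons = 0
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return comparisons

sizes = np.array([100, 200, 400, 800, 1600])
counts = np.array([selection_sort_comparisons([random.random() for _ in range(n)])
                   for n in sizes])

b, log_a = np.polyfit(np.log(sizes), np.log(counts), 1)
a = np.exp(log_a)
pred = a * sizes ** b
print(f"fit: comparisons = {a:.2f} * n^{b:.2f}")
print("mean relative error:", np.mean(np.abs(pred - counts) / counts))
```

The same fit can be repeated per basic block (or, here, per counter) once counts are gathered at that granularity, which is the direction the final slide of this preview is heading.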
