Copyright © 2007 by Robert Sedgewick and Kevin Wayne.1Analysis of AlgorithmsReferences: Algorithms in Java, Chapter 2 Intro to Programming in Java, Section 4.1overviewcase studyformulating hypotheses2Running timeCharles Babbage (1864)As soon as an Analytic Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will arise - By what course of calculation can these results be arrived at by the machine in the shortest time? - Charles BabbageAnalytic Enginehow many times do you have to turn the crank?3OverviewAnalysis of algorithms: framework for comparing algorithms and predicting performance.Scientific method.!Observe some feature of the universe.!Hypothesize a model that is consistent with observation.!Predict events using the hypothesis.!Verify the predictions by making further observations.!Validate the theory by repeating the previous steps until the hypothesis agrees with the observations.Universe = computer itself. 4overviewcase studyformulating hypotheses5Case study: SortingSorting problem:!Given N items, rearrange them in ascending order.!Applications: commercial databases, statistics, databases, data compression, computational biology, computer graphics, scientific computing, ... HauserHongHsuHayesHaskellHanleyHornet......HanleyHaskellHauserHayesHongHornetHsu......6Insertion sort.!Brute-force sorting solution.!Move left-to-right through array.!Exchange next element with larger elements to its left, one-by-one.Insertion sort7Insertion sort.!Brute-force sorting solution.!Move left-to-right through array.!Exchange next element with larger elements to its left, one-by-one.Insertion sortpublic static void InsertionSort(double[] a){ int N = a.length; for (int i = 0; i < N; i++) for (int j = i; j > 0; j--) if (a[j] < a[j-1]) exch(a, j, j-1); else break;}8Insertion sort: ObservationObserve and tabulate operation counts for various values of N.!concentrate on most frequently performed operation(comparisons for sorting)!Data source: N random numbers between 0 and 1.398 million40,00099 million20,00025 million10,0006 million5,000ComparisonsN1600 million80,0009Data analysis. Plot # comparisons vs. input size on log-log scale.Regression. Fit line through data points ! a Nb.Hypothesis. # comparisons grows quadratically with input size ! N2/4.Insertion sort: Experimental hypothesis0162503250048750650000 125,000 250,000 375,000 500,000 Actual Fittedslope10Insertion sort: Prediction and verificationExperimental hypothesis. # comparisons ! N2/4.Prediction. 400 million comparisons for N = 40,000.Observations.Prediction. 10 billion comparisons for N = 200,000.Observation.9.997 billion200,000ComparisonsN399.7 million40,000401.6 million40,000400.0 million40,000ComparisonsN401.3 million40,000Agrees.Agrees.11Experimental vs. theoretical hypothesesExperimental hypothesis.!Measure running times, plot, and fit curve.!Model useful for predicting.Theoretical hypothesis.!Analyze algorithm to estimate # comparisons as a function of:–number of elements N to sort–average or worst case input!Model useful for predicting and explaining.Difference. Theoretical model is independent of a particular machine or compiler; applies to machines not yet built.12Insertion sort: Theoretical hypothesisWorst case. [descending]!Iteration i requires i comparisons.!Total = 0 + 1 + 2 + … + N-2 + N-1 ! N2/2.Average case. [random]!Iteration i requires i/2 comparisons on average.!Total = 0 + 1/2 + 2/2 + … + (N-1)/2 ! N2/4.E F G H I J D C B AA C D F H J E B I Gii13Insertion sort: Theoretical hypothesisTheoretical hypothesis. Validation. Theory agrees with observations.RandomDescendingAscendingInputAverageWorstBestAnalysis1/4 N21/2 N2NComparisons1/6 N3/2--Stddev037500750001125001500000 125,000 250,000 375,000 500,000AscendingRandomDescending14Insertion sort: ObservationObserve and tabulate actual running time for various values of N.!Data source: N random numbers between 0 and 1.!Machine: Apple G5 1.8GHz with 1.5GB memory running OS X.Goal: use models to predict running time.5.6 seconds400 million40,0001.5 seconds99 million20,0000.43 seconds25 million10,0000.13 seconds6.2 million5,00023 secondsTimeComparisonsN1.6 billion80,000145 seconds10 billion200,00015Data analysis. Plot total running time vs. input size on log-log scale.Regression fit validates hypothesis that total running time is ~cN2A scientific connection between program and natural world.Insertion sort: A last check037.575.0112.5150.00 50,000 100,000 150,000 200,00016Timing in JavaWall clock. Measure time between beginning and end of computation.!Manual: Skagen wristwatch.!Automatic: Stopwatch.java library.Stopwatch.tic();. . .double elapsed = StopWatch.toc();public class Stopwatch { private static long start; public static void tic() { start = System.currentTimeMillis(); } public static double toc() { long stop = System.currentTimeMillis(); return (stop - start) / 1000.0; }}17Measuring running timeFactors that affect running time.!Machine.!Compiler.!Algorithm.!Input data.More factors.!Caching.!Garbage collection.!Just-in-time compilation.!CPU used by other processes.Bottom line. Often difficult to get precise measurements.18SummaryAnalysis of algorithms: framework for comparing algorithms and predicting performance.Scientific method.!Observe some feature of the universe.!Hypothesize a model that is consistent with observation.!Predict events using the hypothesis.!Verify the predictions by making further observations.!Validate the theory by repeating the previous steps until the hypothesis agrees with the observations.Remaining question. How to formulate a hypothesis? 19overviewcase studyformulating hypotheses20Types of hypothesesWorst case running time. Obtain bound on largest possible running time of algorithm on any input of a given size N.!Easy to obtain an initial estimate, harder to refine!Draconian view: real instances may not come close to worst case Average case running time. Obtain bound on running time of algorithm on random input as a function of input size N.!Hard to accurately model real instances by random distributions.!Randomized algorithm: create random distribution.Amortized running time. Worst-case bound on running time of any sequence of N operations.21Estimating the Running TimeTotal running time: sum of cost " frequency for all of the basic ops.!Cost depends on
View Full Document