Unformatted text preview:

Slide 1Work and Span (Recap)SortingQUICKSORTQUICKSORTQUICKSORTQUICKSORTParallelizing QuicksortParallel Quicksort (Basic)PerformanceMeasure Work/Span EmpiricallyAnalyzing QuicksortAnalyzing QuicksortAnalyzing QuicksortThe Master Method (Optional)Master Method — CASE 1Master Method — CASE 2Master Method — CASE 3Master Method SummaryMERGESORTMerging Two Sorted ArraysParallel Merge SortWork of Merge SortSpan of Merge SortParallelism of Merge SortParallel MergeParallel MergeSpan of Parallel MergeWork of Parallel MergeAnalysis of Work RecurrenceAnalysis of Work RecurrenceAnalysis of Work RecurrenceParallelism of P_MergeParallel Merge SortParallel Merge SortParallelism of P_MergeSortBreadth First SearchBreadth First SearchBreadth First SearchBreadth First SearchBreadth First SearchBreadth First SearchBreadth First SearchBreadth First SearchParallel BFSParallel BFSParallel BFSParallel BFSParallel BFS Caveats1CS 240A : Examples with Cilk++Thanks to Charles E. Leiserson for some of these slides•Divide & Conquer Paradigm•Solving recurrences •Sorting: Quicksort and Mergesort•Graph traversal: Breadth-First Search2TP = execution time on P processorsT1 = work T∞ = span**Also called critical-path lengthor computational depth.Speedup on p processors∙T1/Tp Parallelism∙T1/T∞Work and Span (Recap)3Sorting∙Sorting is possibly the most frequently executed operation in computing!∙Quicksort is the fastest sorting algorithm in practice with an average running time of O(N log N), (but O(N2) worst case performance)∙Mergesort has worst case performance of O(N log N) for sorting N elements∙Both based on the recursive divide-and-conquer paradigm4QUICKSORT∙Basic Quicksort sorting an array S works as follows:If the number of elements in S is 0 or 1, then return.Pick any element v in S. Call this pivot.Partition the set S-{v} into two disjoint groups:♦S1 = {x  S-{v} | x  v}♦S2 = {x  S-{v} | x  v}Return quicksort(S1) followed by v followed by quicksort(S2)5QUICKSORT132134563231457814Select Pivot1321345632314578146QUICKSORT132134563231457814Partition around Pivot1314213231455678347QUICKSORT131421323145567834 Quicksort recursively13 14 21 3231 34 45 56 7813 14 21 3231 34 45 56 788Parallelizing Quicksort∙Serial Quicksort sorts an array S as follows:If the number of elements in S is 0 or 1, then return.Pick any element v in S. Call this pivot.Partition the set S-{v} into two disjoint groups:♦S1 = {x  S-{v} | x  v}♦S2 = {x  S-{v} | x  v}Return quicksort(S1) followed by v followed by quicksort(S2)Not necessarily so !Not necessarily so !9template <typename T>void qsort(T begin, T end) { if (begin != end) { T middle = partition( begin, end, bind2nd( less<typename iterator_traits<T>::value_type>(), *begin ) ); cilk_spawn qsort(begin, middle); qsort(max(begin + 1, middle), end); cilk_sync; }}Parallel Quicksort (Basic)•The second recursive call to qsort does not depend on the results of the first recursive call•We have an opportunity to speed up the call by making both calls in parallel.10Performance∙./qsort 500000 -cilk_set_worker_count 1>> 0.083 seconds∙./qsort 500000 -cilk_set_worker_count 16>> 0.014 seconds∙Speedup = T1/T16 = 0.083/0.014 = 5.93∙./qsort 50000000 -cilk_set_worker_count 1>> 10.57 seconds∙./qsort 50000000 -cilk_set_worker_count 16>> 1.58 seconds∙Speedup = T1/T16 = 10.57/1.58 = 6.6711Measure Work/Span Empirically∙cilkscreen -w ./qsort 50000000Work = 21593799861 Span = 1261403043 Burdened span = 1261600249 Parallelism = 17.1189 Burdened parallelism = 17.1162 #Spawn = 50000000 #Atomic instructions = 14∙cilkscreen -w ./qsort 500000 Work = 178835973Span = 14378443Burdened span = 14525767Parallelism = 12.4378 Burdened parallelism = 12.3116#Spawn = 500000#Atomic instructions = 8workspan ws;ws.start();sample_qsort(a, a + n);ws.stop();ws.report(std::cout);12Analyzing Quicksort131421323145567834 Quicksort recursively13 14 21 3231 34 45 56 7813 14 21 3231 34 45 56 78Assume we have a “great” partitioner that always generates two balanced sets13∙Work:T1(n) = 2T1(n/2) + Θ(n)2T1(n/2) = 4T1(n/4) + 2 Θ(n/2)….….n/2 T1(2) = n T1(1) + n/2 Θ(2)T1(n)= Θ(n lg n)∙ Span recurrence: T∞(n) = T∞(n/2) + Θ(n)Solves to T∞(n) = Θ(n) Analyzing Quicksort+•P a r t i t i o n i n g •n o t p a r a l l e l !14Analyzing Quicksort∙Indeed, partitioning (i.e., constructing the array S1 = {x  S-{v} | x  v}) can be accomplished in parallel in time Θ(lg n)∙Which gives a span T∞(n) = Θ(lg2n )∙And parallelism Θ(n/lg n)∙Basic parallel qsort can be found under $cilkpath/examples/qsort Parallelism:T1(n)T∞(n)= Θ(lg n)Not much !Not much !Way better !Way better !15The Master Method (Optional)The Master Method for solving recurrences applies to recurrences of the formT(n) = a T(n/b) + f(n) , where a ≥ 1, b > 1, and f is asymptotically positive.IDEA: Compare nlogba with f(n) .IDEA: Compare nlogba with f(n) .* The unstated base case is T(n) = (1) for sufficiently small n.*16Master Method — CASE 1nlogba ≫ f(n)nlogba ≫ f(n)T(n) = a T(n/b) + f(n)T(n) = a T(n/b) + f(n)Specifically, f(n) = O(nlogba – ε) for some constant ε > 0 .Solution: T(n) = Θ(nlogba) .17Master Method — CASE 2nlogba ≈ f(n)nlogba ≈ f(n)Specifically, f(n) = Θ(nlogbalgkn) for some constant k ≥ 0.Solution: T(n) = Θ(nlogbalgk+1n)) .T(n) = a T(n/b) + f(n)T(n) = a T(n/b) + f(n)Ex(qsort): a =2, b=2, k=0  T1(n)=Θ(n lg n)18Master Method — CASE 3nlogba ≪ f(n)nlogba ≪ f(n)Specifically, f(n) = Ω(nlogba + ε) for some constant ε > 0, and f(n) satisfies the regularity condition that a f(n/b) ≤ c f(n) for some constant c < 1.Solution: T(n) = Θ(f(n)) .T(n) = a T(n/b) + f(n)T(n) = a T(n/b) + f(n)Example: Span of qsortExample: Span of qsort19Master Method SummaryCASE 1: f (n) = O(nlogba – ε), constant ε > 0 T(n) = Θ(nlogba) .CASE 2: f (n) = Θ(nlogba lgkn), constant k  0  T(n) = Θ(nlogba lgk+1n) .CASE 3: f (n) = Ω(nlogba + ε), constant ε > 0, and regularity condition T(n) = Θ(f(n)) .T(n) = a T(n/b) + f(n)T(n) = a T(n/b) + f(n)20MERGESORT∙Mergesort is an example of a recursive sorting algorithm.∙It is based on the divide-and-conquer paradigm∙It uses the merge operation as its fundamental component (which takes in two sorted sequences and produces a single sorted sequence) ∙Simulation of Mergesort∙Drawback


View Full Document

UCSB CS 240A - Work and Span

Download Work and Span
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Work and Span and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Work and Span 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?