CSE332: Data Abstractions
Lecture 19: Analysis of Fork-Join Parallel Programs

Dan Grossman
Spring 2010

Outline
•Where are we
•What else looks like this?
•Examples
•Reductions
•Even easier: Data Parallel (Maps)
•Maps in ForkJoin Framework
•Digression on maps and reduces
•Trees
•Linked lists
•Analyzing algorithms
•Work and Span
•The DAG
•Our simple examples
•More interesting DAGs?
•Connecting to performance
•Definitions
•Division of responsibility
•What that means (mostly good news)
•Examples
•Amdahl's Law (mostly bad news) [two slides]
•Why such bad news
•Plots you gotta see
•All is not lost
•Moore and Amdahl

Where are we
Done:
•How to use fork and join to write a parallel algorithm
•Why using divide-and-conquer with lots of small tasks is best
 – Combines results in parallel
•Some Java and ForkJoin Framework specifics
 – More pragmatics in section and posted notes
Now:
•More examples of simple parallel programs
•Arrays & balanced trees support parallelism; linked lists don't
•Asymptotic analysis for fork-join parallelism
•Amdahl's Law

What else looks like this?
•We saw that summing an array went from O(n) sequential to O(log n) parallel (assuming a lot of processors and a very large n!)
 – An exponential speed-up in theory

[Figure: array elements combined pairwise by a balanced binary tree of + operations]

•Anything that can use results from two halves and merge them in O(1) time has the same property…

Examples
•Maximum or minimum element
•Is there an element satisfying some property (e.g., is there a 17)?
•Left-most element satisfying some property (e.g., first 17)
 – What should the recursive tasks return?
 – How should we merge the results? (see the sketch after this slide)
•In project 3: corners of a rectangle containing all points
•Counts, for example, the number of strings that start with a vowel
 – This is just summing with a different base case
 – Many problems are!
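The slide leaves the two merge questions open. As one concrete answer, here is a minimal editorial sketch of a maximum-element reduction in the ForkJoin style this lecture uses; it is not code from the slides, and the class name FindMax and the cutoff value are assumptions.

  import java.util.concurrent.ForkJoinPool;
  import java.util.concurrent.RecursiveTask;

  // Each task returns the maximum of its subrange; a parent merges
  // the two child results with one Math.max call, an O(1) combine.
  class FindMax extends RecursiveTask<Integer> {
    static final int SEQUENTIAL_CUTOFF = 1000; // illustrative value; tune per machine
    int lo, hi;
    int[] arr;
    FindMax(int l, int h, int[] a) { lo = l; hi = h; arr = a; }
    protected Integer compute() {
      if (hi - lo < SEQUENTIAL_CUTOFF) {
        int max = arr[lo]; // assumes a nonempty range
        for (int i = lo + 1; i < hi; i++)
          max = Math.max(max, arr[i]);
        return max;
      } else {
        int mid = (hi + lo) / 2;
        FindMax left  = new FindMax(lo, mid, arr);
        FindMax right = new FindMax(mid, hi, arr);
        left.fork();                    // run the left half in another task
        int rightMax = right.compute(); // run the right half in this task
        int leftMax  = left.join();     // wait for the left half
        return Math.max(leftMax, rightMax); // O(1) merge
      }
    }
  }

  // Usage (given some int[] arr):
  //   int max = new ForkJoinPool().invoke(new FindMax(0, arr.length, arr));

So the recursive tasks return the answer for their half, and the merge is a single Math.max call, which is exactly the O(1) combine the previous slide requires.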
Reductions
•Computations of this form are called reductions (or reduces?)
•They take a set of data items and produce a single result
•Note: recursive results don't have to be single numbers or strings; they can be arrays or objects with multiple fields
 – Example: Histogram of test results
 – Example on project 3: Kind of like a 2-D histogram
•While many reductions can be parallelized thanks to nice properties like the associativity of addition, some things are inherently sequential
 – How we process arr[i] may depend entirely on the result of processing arr[i-1]

Even easier: Data Parallel (Maps)
•While reductions are a simple pattern of parallel programming, maps are even simpler
 – Operate on a set of elements to produce a new set of elements (no combining of results)
 – For arrays, this is so trivial that some hardware has direct support
•Canonical example: vector addition

  int[] vector_add(int[] arr1, int[] arr2) {
    assert (arr1.length == arr2.length);
    int[] result = new int[arr1.length];
    // FORALL is pseudocode: every iteration may run in parallel
    FORALL (i = 0; i < arr1.length; i++) {
      result[i] = arr1[i] + arr2[i];
    }
    return result;
  }

Maps in ForkJoin Framework
•Even though there is no result-combining, creating many small tasks still helps with load balancing
 – Maybe not for vector-add, but it does for more compute-intensive maps
 – The forking is O(log n), whereas in theory other approaches to vector-add are O(1)

  import java.util.concurrent.ForkJoinPool;
  import java.util.concurrent.RecursiveAction;

  class VecAdd extends RecursiveAction {
    static final int SEQUENTIAL_CUTOFF = 1000; // illustrative value; tune per machine
    int lo; int hi;
    int[] res; int[] arr1; int[] arr2;
    VecAdd(int l, int h, int[] r, int[] a1, int[] a2) {
      lo = l; hi = h; res = r; arr1 = a1; arr2 = a2;
    }
    protected void compute() {
      if (hi - lo < SEQUENTIAL_CUTOFF) {
        for (int i = lo; i < hi; i++)
          res[i] = arr1[i] + arr2[i];
      } else {
        int mid = (hi + lo) / 2;
        VecAdd left  = new VecAdd(lo, mid, res, arr1, arr2);
        VecAdd right = new VecAdd(mid, hi, res, arr1, arr2);
        left.fork();     // run the left half in another task
        right.compute(); // run the right half in this task
        left.join();     // wait for the left half to finish
      }
    }
  }

  static final ForkJoinPool fjPool = new ForkJoinPool();

  int[] add(int[] arr1, int[] arr2) {
    assert (arr1.length == arr2.length);
    int[] ans = new int[arr1.length];
    fjPool.invoke(new VecAdd(0, arr1.length, ans, arr1, arr2));
    return ans;
  }

Digression on maps and reduces
•You may have heard of Google's "map/reduce"
 – Or the open-source version, Hadoop
•Idea: perform maps and reduces on data using many machines
 – The system takes care of distributing the data and managing fault tolerance
 – You just write code to map one element and to reduce elements to a combined result
•Separates how to do recursive divide-and-conquer from what computation to perform
 – An old idea in higher-order programming (see 341) transferred to large-scale distributed computing
 – A complementary approach to declarative queries (see 344)

Trees
•Our basic patterns so far – maps and reduces – work just fine on balanced trees
 – Divide-and-conquer on each child rather than on array subranges
 – Correct for unbalanced trees too, but then we won't get much speed-up
•Example: minimum element in an unsorted but balanced binary tree in O(log n) time, given enough processors
•How to do the sequential cut-off?
 – Store number-of-descendants at each node (easy to maintain), as in the sketch below
 – Or, I guess, you could approximate it with, e.g., the AVL height
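As promised above, here is a minimal editorial sketch of the minimum-element reduction over a binary tree, using the stored size field for the sequential cutoff. It is not from the slides: the TreeNode class, all names, and the cutoff constant are assumptions.

  import java.util.concurrent.RecursiveTask;

  // Hypothetical node type: 'size' caches the number of descendants,
  // as the slide suggests, so each task knows when to go sequential.
  class TreeNode {
    int data;
    int size;            // number of nodes in this subtree
    TreeNode left, right;
  }

  class TreeMin extends RecursiveTask<Integer> {
    static final int SEQUENTIAL_CUTOFF = 1000; // illustrative value
    final TreeNode node;
    TreeMin(TreeNode n) { node = n; }

    // Plain sequential minimum, used below the cutoff.
    // MAX_VALUE serves as the identity for an empty subtree.
    static int seqMin(TreeNode n) {
      if (n == null) return Integer.MAX_VALUE;
      return Math.min(n.data, Math.min(seqMin(n.left), seqMin(n.right)));
    }

    protected Integer compute() {
      if (node == null) return Integer.MAX_VALUE;
      if (node.size < SEQUENTIAL_CUTOFF) return seqMin(node);
      TreeMin left  = new TreeMin(node.left);
      TreeMin right = new TreeMin(node.right);
      left.fork();                    // left subtree in another task
      int rightMin = right.compute(); // right subtree in this task
      int leftMin  = left.join();
      return Math.min(node.data, Math.min(leftMin, rightMin));
    }
  }

On a balanced tree the recursion depth is O(log n), which is where the slide's O(log n) bound comes from; on a badly unbalanced tree the depth degrades toward O(n), which is why the slide says unbalanced trees stay correct but lose the speed-up.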
Linked lists
•Can you parallelize maps or reduces over linked lists?
 – Example: increment all elements of a linked list
 – Example: sum all elements of a linked list

[Figure: a linked list with nodes b, c, d, e, f and front and back pointers]

•Once again, data structures matter!
•For parallelism, balanced trees are generally better than lists, because we can reach all the data exponentially faster: O(log n) vs. O(n)
 – Trees have the same flexibility as lists compared to arrays

Analyzing algorithms
•Parallel algorithms still need to be:
 – Correct
 – Efficient
•For our algorithms so far, correctness is "obvious", so we'll focus on efficiency
 – We still want asymptotic bounds
 – We want to analyze the algorithm without regard to a specific number of processors
 – The key "magic" of the ForkJoin Framework is getting expected run-time performance that is asymptotically optimal for the available number of processors
  •Lets us just analyze our algorithms given this "guarantee"

Work and Span
Let T_P be the running time if there are P processors available.
Two key measures of run-time for a fork-join computation:
•Work: how long it would take 1 processor = T_1
 – Just "sequentialize" all the recursive forking
•Span: how long it would take infinitely many processors = T_∞
 – The …
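To tie these definitions back to the array-summing example from the start of the lecture, here is a short worked instance (an editorial addition, using only the definitions above):

\[
T_1 = O(n) \qquad T_\infty = O(\log n)
\]

With one processor every addition runs in sequence, so the work is linear; with unboundedly many processors only the longest chain of dependent combines remains, namely the depth of the balanced combine tree, so the span is logarithmic. This matches the earlier claim that summing goes from O(n) sequential to O(log n) parallel.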

