UT Arlington CSE 3302 - Lecture Notes - D2781399

Home> Schools> University of Texas at Arlington> Computer Science & Engineering (CSE) > CSE 3302> Lecture Notes

DOC PREVIEW

UT Arlington CSE 3302 - Lecture Notes

School name University of Texas at Arlington

Course Cse 3302- Programming Languages

Pages 44

This preview shows page 1-2-3-21-22-23-42-43-44 out of 44 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 44 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

CSE 3302Lecture 21: ForkJoin and X1018 Nov 2010Nate NystromUTAWhat else looks like this?•Saw summing an array went from O(n) sequential to O(log n) parallel (assuming a lot of processors and very large n!)– An exponential speed-up in theory2+ + ++ ++ + ++ + +++ ++• Anything that can use results from two halves and merge them in O(1) time has the same property…Examples• Maximum or minimum element• Is there an element satisfying some property (e.g., is there a 17)?• Left-most element satisfying some property (e.g., first 17)– What should the recursive tasks return?– How should we merge the results?• Corners of a rectangle containing all points (a “bounding box”)• Counts, for example, number of strings that start with a vowel– This is just summing with a different base case– Many problems are!3Reductions• Computations of this form are called reductions (or reduces?)• They take a set of data items and produce a single result• Note: Recursive results don’t have to be single numbers or strings. They can be arrays or objects with multiple fields.– Example: Histogram of test results• While many can be parallelized due to nice properties like associativity of addition, some things are inherently sequential–How we process arr[i] may depend entirely on the result of processing arr[i-1]4Even easier: Data Parallel (Maps)• While reductions are a simple pattern of parallel programming, maps are even simpler– Operate on set of elements to produce a new set of elements (no combining results)– For arrays, this is so trivial some hardware has direct support• Canonical example: Vector addition5int[] vector_add(int[] arr1, int[] arr2){ assert (arr1.length == arr2.length); result = new int[arr1.length]; len = arr.length; FORALL(i=0; i < arr.length; i++) { result[i] = arr1[i] + arr2[i]; } return result;}Maps in ForkJoin Framework• Even though there is no result-combining, it still helps with load balancing to create many small tasks– Maybe not for vector-add but for more compute-intensive maps– The forking is O(log n) whereas theoretically other approaches to vector-add is O(1)6class VecAdd extends RecursiveAction { int lo; int hi; int[] res; int[] arr1; int[] arr2; VecAdd(int l,int h,int[] r,int[] a1,int[] a2){ … } protected void compute(){ if(hi – lo < SEQUENTIAL_CUTOFF) { for(int i=lo; i < hi; i++) res[i] = arr1[i] + arr2[i]; } else { int mid = (hi+lo)/2; VecAdd left = new VecAdd(lo,mid,res,arr1,arr2); VecAdd right= new VecAdd(mid,hi,res,arr1,arr2); left.fork(); right.compute(); } }}static final ForkJoinPool fjPool = new ForkJoinPool();int[] add(int[] arr1, int[] arr2){ assert (arr1.length == arr2.length); int[] ans = new int[arr1.length]; fjPool.invoke(new VecAdd(0,arr.length,ans,arr1,arr2); return ans;}def add(arr1: Array[Int], arr2: Array[Int]) = { val ans = Array.make[Int](arr1.length); finish foreach (i in ans.region) { ans(i) = arr1(i) + arr2(i); } return ans;}• X10• new OO programming language from IBM• designed for high-performance computing, multicore• Fine-grained concurrency • Distribution• Partitioned global address space• Atomicity operations7Maps in X10X10• http://x10-lang.org• Current version 2.1• Language still under development• Compiles to Java• slow!• Compiles to C++ on top of MPI, LAPI, sockets• fast!X10 library• Features I won’t talk about:• constrained types• generics• arrays and region algebraX10 design (non-)goals• Not trying to hide concurrency• this never works• Not trying to hide the multiprocessor• this never works either• Not trying to pretend concurrency is just a minor extension• this gives lousy concurrencyBasic X10 concurrency• Three constructs:•async S –!spawn a new activity•finish S –!wait until all activities in S terminate•atomic S – do S without interruption•async and finish similar to spawn and sync in CilkFunicular [this name will change!]• X10 concurrency features as a Scala library• http://ranger.uta.edu/~nystrom/funicular• Examples below use Scala syntax• To use:• import funicular._Simple concurrent array sumdef add(arr1: Array[Int], arr2: Array[Int]) = { val ans = Array.make[Int](arr1.length); finish for (i in ans.region) async { ans(i) = arr1(i) + arr2(i); } return ans;}Async•async S –!spawn a new activity• much lighter-weight than creating a thread• both syntactically and performance-wiseFinish•finish S –!wait until all activities spawned in S terminate• Also collects any exceptions thrown by spawned activitiesDeadlock•Using async and finish guarantees no deadlock• A parent activity can wait on spawned children• But, children can’t wait on parent• No wait-cycle => no deadlockParallel loops• X10:var v: Int = 0foreach ((i) in 1..100) { v += i;}•Funicular:var v = 0for (i <- 1 to 100 async) { v += i}Each loop iteration runs in parallel.Loop spawns 100 activities.def add(arr1: Array[Int], arr2: Array[Int]) = { val ans = Array.make[Int](arr1.length); finish foreach (i in ans.region) { ans(i) = arr1(i) + arr2(i); } return ans;}Determinate concurrency• This is what to aim for• Order of execution does not change result•X10 does not guarantee determinacy• but can often get itAtomicity failvar v = 0for (i <- 1 to 100 async) { v += i}Atomicity failvar v = 0for (i <- 1 to 100 async) { v += i}Race condition: concurrent accesses to v.Atomicity goodvar v = 0for (i <- 1 to 100 async) { atomic { v += i }}Atomicity goodvar v = 0for (i <- 1 to 100 async) { atomic { v += i }}Use atomic to guarantee only one activity at a time will execute code.Atomic vs. synchronized• Compare: synchronized (this) { v += i }• vs. atomic { v += i }• No need to specify which lock to acquire.Atomicity• Most operations are not atomic• atomicity is expensive•Use atomic sparinglyAtomic and deadlock•Can deadlock if code in an atomic statement can block• X10 prevents this from occurring• Cannot perform blocking operations within an atomic statement• e.g., I/O•Funicular: no checking that atomic statements do not cause deadlock• A library function, not a language featureAtomicity bettervar v = new AtomicIntegerfor (i <- 1 to 100 async) { v.getAndAdd(i)}• Use built-in atomic types for faster atomic operations•Not as “clean” as atomic, though.Array

View Full Document