DOC PREVIEW
UW-Madison ME 964 - The Design of a Task Parallel Library

This preview shows page 1-2-3-4-5 out of 15 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

The Design of a Task Parallel LibraryDaan Leijen, Wolfram Schulte, and Sebastian BurckhardtMicrosoft Research{daan,schulte,sburckha}@microsoft.comAbstractThe Task Parallel Library (TPL) is a library for .NET thatmakes it easy to take advantage of potential parallelism ina program. The library relies heavily on generics and del-egate expressions to provide custom control structures ex-pressing structured parallelism such as map-reduce in userprograms. The library implementation is built around the no-tion of a task as a finite CPU-bound computation. To capturethe ubiquitous apply-to-all pattern the library also introducesthe novel concept of a replicable task. Tasks and replica-ble tasks are assigned to threads using work stealing tech-niques, but unlike traditional implementations based on theTHE protocol, the library uses a novel data structure calleda ‘duplicating queue’. A surprising feature of duplicatingqueues is that they have sequentially inconsistent behavioron architectures with weak memory models, but capture thisnon-determinism in a benign way by sometimes duplicatingelements. TPL ships as part of the Microsoft Parallel Exten-sions for the .NET framework 4.0, and forms the foundationof Parallel LINQ queries (however, note that the productizedTPL library may differ in significant ways from the basicdesign described in this article).Categories and Subject Descriptors D.3.3 [ProgrammingLanguages]: Concurrent Programming StructuresGeneral Terms Languages, DesignKeywords Domain specific languages, parallelism, workstealing1. IntroductionMany existing applications can be naturally decomposed sothat they can be executed in parallel. Take for example thefollowing (na¨ıve) method for multiplying two matrices:Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. To copy otherwise, to republish, to post on servers or to redistributeto lists, requires prior specific permission and/or a fee.OOPSLA 2009, October 25–29, 2009, Orlando, Florida, USA.Copyrightc 2009 ACM 978-1-60558-734-9/09/10. .. $10.00void MatrixMult(int size, double[,] m1,double[,] m2, double[,] result){for(int i = 0; i < size; i++){for(int j = 0; j < size; j++){result[i, j] = 0;for(int k = 0; k < size; k++){result[i, j] += m1[i, k] * m2[k, j];}}}}In this example, the outer iterations are independent of eachother and can potentially be done in parallel. A straight-forward way to parallelize this algorithm would be to usesize threads, where each thread would execute the two innerloops with its corresponding iteration index — but wait: thatwould be prohibitively expensive, since each thread needs astack and other book keeping information. It is better to usea small pool of threads, and assign to them a set of tasks toexecute, where a task is a finite CPU bound computation,like the body of a for loop. So instead of size threads weshould create size tasks, each of them executing the two in-ner loops for its iteration index. The tasks would be executedby n threads, where n is typically the number of processors.Using tasks instead of threads has many benefits – not onlyare they more efficient, they also abstract away from the theunderlying hardware and the OS specific thread scheduler.The Task Parallel Library (TPL) enables programmers toeasily introduce this potential task parallelism in a program.Using TPL we can replace the outer for loop of the matrixmultiplication with a call to the static Parallel.For method:void ParMatrixMult(int size, double[,] m1,double[,] m2, double[,] result){Parallel.For(0, size, delegate(int i){for(int j = 0; j < size; j++){result[i, j] = 0;for(int k = 0; k < size; k++){result[i, j] += m1[i, k] * m2[k, j];}});});}227The Parallel.For construct is a static method of the Parallelclass that takes three arguments: the first two argumentsspecify the iteration domain (between 0 and size), and thelast argument is a delegate expression that is called for eachindex in the domain. The delegate represents the task toexecute. A delegate expression is the C#equivalent of ananonymous function definition or lambda expression. In thiscase, the delegate takes the iteration index as its first argu-ment and executes the unchanged loop body of the origi-nal algorithm. Note that in C#the delegate passed to theParallel.For loop automatically captures m1, m2, result,and size. This automatic capture of free variables makesit particularly easy to experiment with introducing concur-rency in existing programs. In Section 6.1 we show to write aparallel matrix computation by hand without using the TPLand compare its performance.TPL can be viewed as a small embedded domain specificlanguage and its methods behave like custom control struc-tures in the language (Hudak 1996). We argue that the neces-sary ingredients for such library approach in strongly typedlanguages are parametric polymorphism (generics) and first-class anonymous functions (delegates).Moreover, through the abstraction provided by these twoingredients, we can implement all the high level controlstructures in terms of just two primitive concepts – tasks andreplicable tasks. This guarantees that the library abstractionshave consistent semantics and behave regularly.The parallel abstractions offered by the library only ex-press potential parallelism, but they do not guarantee it.For example, on a single-processor machine, Parallel.Forloops are executed sequentially, closely matching the perfor-mance of strictly sequential code. On a dual-core machine,however, the library might use two worker threads to exe-cute the loop in parallel, depending on the workload andconfiguration. Since no concurrency is guaranteed, the li-brary is specifically intended for speeding up CPU boundcomputations, and not for asynchronous programming. TPLalso does not help to correctly synchronize parallel code thatuses shared memory. Other mechanisms, such as locks, areneeded to protect concurrent modifications to shared mem-ory, and it is the programmer’s responsibility to ensure thatthe code can be safely executed in parallel.Under the hood, tasks and replicable tasks are assignedby the runtime to special worker threads. TPL uses standardwork-stealing for this work distribution (Frigo et al. 1998)where tasks are held in a thread local task queue.


View Full Document

UW-Madison ME 964 - The Design of a Task Parallel Library

Documents in this Course
Load more
Download The Design of a Task Parallel Library
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view The Design of a Task Parallel Library and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view The Design of a Task Parallel Library 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?