Penn CIT 594 - Parallelism

Outline
- Parallelism
- The RAM model
- Approaches to parallelism
- The PRAM model
- The CTA model
- Consequences of CTA
- Costs of parallelism
- Overhead
- Amdahl's law
- Consequences of Amdahl's law
- Idle time
- Dependencies
- Parallelism in Java
- Concurrency in Java, I
- Concurrency in Java, II
- Functional programming
- Why functional programming?
- What's happening now?
- The End

Jan 15, 2019

Parallelism
Can we make it faster?

The RAM model
- The RAM (Random Access Machine) model of computation assumes:
  - There is a single processing unit
  - There is an arbitrarily large amount of memory
  - Accessing any arbitrarily chosen (i.e. random) memory location takes unit time
- This simple model is a very useful guide for algorithm design
- For maximum efficiency, "tuning" to the particular hardware is required
- The RAM model breaks down when its assumptions are violated
  - If an array is so large that only a portion of it fits in memory (the rest is on disk), very different sorting algorithms should be used

Approaches to parallelism
- The basic question is: do the processing units share memory, or do they send messages to one another?
- A thread consists of a single flow of control, a program counter, a call stack, and a small amount of thread-specific data
  - Threads share memory, and communicate by reading and writing to that memory
  - This is thread-based or shared-memory parallel processing
  - Java "out of the box" is thread-based
- A process is like a thread, but it has its own private memory
  - Processes (sometimes called actors) send messages to one another
  - This is message-passing parallel processing

The PRAM model
- An obvious extension to the RAM model is the PRAM (Parallel Random Access Machine) model, which assumes:
  - There are multiple processing units
  - There is an arbitrarily large amount of memory
  - Accessing any memory location takes unit time
- The third assumption is "good enough" for many in-memory sequential programs, but not good enough for parallel programs
  - If the processing units share memory, then complicated and expensive synchronization mechanisms must be used
  - If the processing units do not share memory, then each has its own (fast) local memory, and communicates with other processes by sending messages to them, which is much slower (especially over a network!)
- Bottom line: because there seems to be no way to meet the unit-time assumption, the PRAM model is seriously broken!

The CTA model
- The CTA (Candidate Type Architecture) model makes these assumptions:
  - There are P standard sequential processors, each with its own local memory
  - One of the processors may be acting as "controller," doing things like initialization and synchronization
  - Processors can access non-local memory over a communication network
  - Non-local memory is between 100 and 10,000 times slower to access than local memory (based on common architectures)
  - A processor can make only a very small number (maybe 1 or 2) of simultaneous non-local memory accesses

Consequences of CTA
- The CTA model does not specify how many processors are available
  - The programmer does not need to plan for some specific number of processors
  - More processors may cause the code to execute somewhat more quickly
- The CTA model does specify a huge discrepancy between local and non-local memory access
  - The programmer should minimize the number of non-local memory accesses

Costs of parallelism
- It would be great if having N processors meant our programs would run N times as fast, but...
  - There is overhead involved in setting up the parallelism, which we don't need to pay for a sequential program
  - There are parts of any program that cannot be parallelized
  - Some processors will be idle because there is nothing for them to do
  - Processors have to contend for the same resources, such as memory, and may have to wait for one another

Overhead
- Overhead is any cost incurred by the parallel algorithm but not by the corresponding sequential algorithm
  - Communication among threads and processes (a single thread has no other threads with which to communicate)
  - Synchronization: one thread or process has to wait for results or events from another thread or process
  - Contention for a shared resource, such as memory
    - Java's synchronized is used to wait for a lock to become free
  - Extra computation to combine the results of the various threads or processes
  - Extra memory may be needed to give each thread or process the memory required to do its job

Amdahl's law
- Some proportion P of a program can be made to run in parallel, while the remaining (1 - P) must remain sequential
- If there are N processors, then the computation can be done in (1 - P) + P/N time
- The maximum speedup is then 1 / ((1 - P) + P/N)
- As N goes to infinity, the maximum speedup is 1 / (1 - P)
- For example, if P = 0.75, the maximum speedup is 1 / 0.25, or four times

Consequences of Amdahl's law
- If 75% of a process can be parallelized, and there are four processors, then the possible speedup is
  1 / ((1 - 0.75) + 0.75/4) = 2.286
- But with 40 processors (ten times as many) the speedup is only
  1 / ((1 - 0.75) + 0.75/40) = 3.721
- This has led many people (including Amdahl) to conclude that having lots of processors won't help very much
- However... for many problems, as the data set gets larger:
  - The inherently sequential part of the program remains (fairly) constant
  - Thus, the sequential proportion (1 - P) becomes smaller
  - So: the greater the volume of data, the more speedup we can get

Idle time
- Idle time results when:
  - There is a load imbalance: one process may have much less work to do than another
  - A process must wait for access to memory or some other shared resource
- Data in registers is most quickly accessed
- Data in a cache is next most quickly accessed
  - A level 1 cache is the fastest, but also the smallest
  - A level 2 cache is larger, but slower
- Memory (RAM) is much slower
- Disk access is very much slower

Dependencies
- A dependency is when one thread or process requires the result of another thread or process
- Example: (a + b) * (c + d)
  - The additions can be done in parallel
  - The multiplication must wait for the results of the additions
  - Of course, at this level, the hardware itself handles the parallelism
- Threads or processes that depend on results from other threads or processes must wait for those results

Parallelism in Java
- Java uses the shared-memory model
- There are various competing Java packages (such as Akka and Kilim) to support message passing, but
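The Amdahl's law figures above (2.286 for 4 processors, 3.721 for 40) can be checked with a few lines of code. This is a minimal sketch, not part of the original slides; the class and method names (`Amdahl`, `speedup`) are illustrative.

```java
// Sketch of Amdahl's law: maximum speedup = 1 / ((1 - p) + p / n),
// where p is the parallelizable proportion and n the processor count.
public class Amdahl {
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        System.out.printf("%.3f%n", speedup(0.75, 4));   // prints 2.286
        System.out.printf("%.3f%n", speedup(0.75, 40));  // prints 3.721
    }
}
```

Note how diminishing returns show up directly: multiplying the processor count by ten improves the speedup by well under a factor of two, because the sequential 25% dominates.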

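The dependency example (a + b) * (c + d) can also be expressed at the software level. The following sketch (not from the slides; the class name `DependencyDemo` is illustrative) uses `CompletableFuture` to run the two independent additions in parallel, while the multiplication waits for both:

```java
import java.util.concurrent.CompletableFuture;

public class DependencyDemo {
    static int compute(int a, int b, int c, int d) {
        // The two additions have no dependency on each other,
        // so they may run in parallel on different threads.
        CompletableFuture<Integer> left  = CompletableFuture.supplyAsync(() -> a + b);
        CompletableFuture<Integer> right = CompletableFuture.supplyAsync(() -> c + d);
        // The multiplication depends on both sums: thenCombine
        // waits for the two results before applying (x, y) -> x * y.
        return left.thenCombine(right, (x, y) -> x * y).join();
    }

    public static void main(String[] args) {
        System.out.println(compute(1, 2, 3, 4)); // prints 21
    }
}
```

For two integer additions the parallelism overhead far outweighs any gain, which is exactly the slides' point: at this granularity the hardware handles it, and explicit threading only pays off for coarser work.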

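Since Java "out of the box" is thread-based, here is a minimal shared-memory sketch tying together threads and `synchronized` (illustrative code, not from the slides; the class name `SharedCounter` is made up). Two threads update one shared field; `synchronized` provides the locking that prevents lost updates:

```java
public class SharedCounter {
    private int count = 0;

    // synchronized makes each increment atomic: a thread must acquire
    // the object's lock first, waiting if another thread holds it.
    public synchronized void increment() { count++; }

    public synchronized int get() { return count; }

    public static void main(String[] args) throws InterruptedException {
        SharedCounter counter = new SharedCounter();
        // Both threads communicate by writing to the same shared memory.
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) counter.increment();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(counter.get()); // prints 20000
    }
}
```

The waiting at the lock is precisely the synchronization and contention overhead listed earlier: correctness costs time that a sequential version would not pay.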