Contents
- Parallelism
- The RAM model
- Approaches to parallelism
- The PRAM model
- The CTA model
- Consequences of CTA
- Costs of parallelism
- Overhead
- Amdahl's law
- Consequences of Amdahl's law
- Idle time
- Dependencies
- Parallelism in Java
- Concurrency in Java, I
- Concurrency in Java, II
- Functional programming
- Why functional programming?
- What's happening now?
- The End

Jan 15, 2019

Parallelism
Can we make it faster?

The RAM model
- The RAM (Random Access Machine) model of computation assumes:
  - There is a single processing unit
  - There is an arbitrarily large amount of memory
  - Accessing any arbitrarily chosen (i.e. random) memory location takes unit time
- This simple model is a very useful guide for algorithm design
- For maximum efficiency, "tuning" to the particular hardware is required
- The RAM model breaks down when its assumptions are violated
  - If an array is so large that only a portion of it fits in memory (the rest is on disk), very different sorting algorithms should be used

Approaches to parallelism
- The basic question is: do the processing units share memory, or do they send messages to one another?
- A thread consists of a single flow of control, a program counter, a call stack, and a small amount of thread-specific data
  - Threads share memory, and communicate by reading and writing to that memory
  - This is thread-based or shared-memory parallel processing
  - Java "out of the box" is thread-based
- A process is a thread that has its own private memory
  - Processes (sometimes called actors) send messages to one another
  - This is message-passing parallel processing

The PRAM model
- An obvious extension to the RAM model is the Parallel Random Access Machine (PRAM) model, which assumes:
  - There are multiple processing units
  - There is an arbitrarily large amount of memory
  - Accessing any memory location takes unit time
- The third assumption is "good enough" for many in-memory sequential programs, but not good enough for parallel programs
  - If the processing units share memory, then complicated and expensive synchronization mechanisms must be used
  - If the processing units do not share memory, then each has its own (fast) local memory, and communicates with other processes by sending messages to them (much slower--especially over a network!)
- Bottom line: because there seems to be no way to meet the unit-time assumption, the PRAM model is seriously broken!

The CTA model
- The Candidate Type Architecture model makes these assumptions:
  - There are P standard sequential processors, each with its own local memory
  - One of the processors may act as "controller," doing things like initialization and synchronization
  - Processors can access non-local memory over a communication network
  - Non-local memory is between 100 and 10,000 times slower to access than local memory (based on common architectures)
  - A processor can make only a very small number (maybe 1 or 2) of simultaneous non-local memory accesses

Consequences of CTA
- The CTA model does not specify how many processors are available
  - The programmer does not need to plan for some specific number of processors
  - More processors may cause the code to execute somewhat more quickly
- The CTA model does specify a huge discrepancy between local and non-local memory access
  - The programmer should minimize the number of non-local memory accesses

Costs of parallelism
- It would be great if having N processors meant our programs would run N times as fast, but...
  - There is overhead involved in setting up the parallelism, which we don't need to pay for in a sequential program
  - There are parts of any program that cannot be parallelized
  - Some processors will be idle because there is nothing for them to do
  - Processors have to contend for the same resources, such as memory, and may have to wait for one another

Overhead
- Overhead is any cost incurred by the parallel algorithm but not by the corresponding sequential algorithm:
  - Communication among threads and processes (a single thread has no other threads with which to communicate)
  - Synchronization: one thread or process has to wait for results or events from another thread or process
  - Contention for a shared resource, such as memory
    - Java's synchronized is used to wait for a lock to become free
  - Extra computation to combine the results of the various threads or processes
  - Extra memory may be needed to give each thread or process the memory required to do its job

Amdahl's law
- Some proportion P of a program can be made to run in parallel, while the remaining (1 - P) must remain sequential
- If there are N processors, then the computation can be done in (1 - P) + P/N time
- The maximum speedup is then 1 / ((1 - P) + P/N)
- As N goes to infinity, the maximum speedup approaches 1 / (1 - P)
- For example, if P = 0.75, the maximum speedup is 1/0.25, or a factor of four

Consequences of Amdahl's law
- If 75% of a program can be parallelized, and there are four processors, then the possible speedup is
  1 / ((1 - 0.75) + 0.75/4) = 2.286
- But with 40 processors--ten times as many--the speedup is only
  1 / ((1 - 0.75) + 0.75/40) = 3.721
- This has led many people (including Amdahl) to conclude that having lots of processors won't help very much
- However... for many problems, as the data set gets larger:
  - The inherently sequential part of the program remains (fairly) constant
  - Thus, the sequential proportion (1 - P) becomes smaller
  - So: the greater the volume of data, the more speedup we can get

Idle time
- Idle time results when:
  - There is a load imbalance--one process may have much less work to do than another
  - A process must wait for access to memory or some other shared resource
- Data in registers is accessed most quickly
- Data in a cache is accessed next most quickly
  - A level 1 cache is the fastest, but also the smallest
  - A level 2 cache is larger, but slower
- Memory--RAM--is much slower
- Disk access is very much slower

Dependencies
- A dependency is when one thread or process requires the result of another thread or process
- Example: (a + b) * (c + d)
  - The additions can be done in parallel
  - The multiplication must wait for the results of the additions
  - Of course, at this level, the hardware itself handles the parallelism
- Threads or processes that depend on results from other threads or processes must wait for those results

Parallelism in Java
- Java uses the shared-memory model
- There are various competing Java packages (such as Akka and Kilim) to support message passing, but
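The Amdahl's-law arithmetic above is easy to verify with a few lines of Java. This is a minimal sketch; the class and method names are mine, not from the lecture.

```java
// Amdahl.java -- checks the speedup figures from the slides.
// Class and method names are illustrative, not part of the lecture.
public class Amdahl {
    // Maximum speedup with parallel proportion p and n processors:
    // 1 / ((1 - p) + p/n)
    public static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        System.out.printf("P=0.75, N=4:  %.3f%n", speedup(0.75, 4));   // 2.286
        System.out.printf("P=0.75, N=40: %.3f%n", speedup(0.75, 40));  // 3.721
        // As N grows without bound, the speedup approaches 1/(1 - 0.75) = 4
    }
}
```

Note how weakly the speedup responds to the tenfold increase in processors: the sequential 25% dominates.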
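The (a + b) * (c + d) dependency example can also be expressed in software. Here is a hedged sketch using Java's CompletableFuture (the class name DependencyDemo is mine): the two additions may run in parallel, and thenCombine expresses the dependency of the multiplication on both results.

```java
// DependencyDemo.java -- the additions run in parallel; the multiplication
// depends on both, so it must wait. Names are illustrative, not from the lecture.
import java.util.concurrent.CompletableFuture;

public class DependencyDemo {
    public static int compute(int a, int b, int c, int d) {
        CompletableFuture<Integer> left  = CompletableFuture.supplyAsync(() -> a + b);
        CompletableFuture<Integer> right = CompletableFuture.supplyAsync(() -> c + d);
        // thenCombine runs only after both futures complete: the dependency
        return left.thenCombine(right, (x, y) -> x * y).join();
    }

    public static void main(String[] args) {
        System.out.println(compute(1, 2, 3, 4)); // (1+2) * (3+4) = 21
    }
}
```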
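Java's shared-memory, thread-based model can be illustrated with a small example (a sketch; the names are mine, not from the lecture). It also shows several of the overhead costs listed earlier: creating threads, synchronizing on join, and the extra computation to combine partial results. Giving each thread its own slot in the partial array avoids contention for a shared accumulator.

```java
// ParallelSum.java -- sums an array using several threads over shared memory.
// Names are illustrative, not from the lecture.
public class ParallelSum {
    public static long sum(long[] data, int nThreads) throws InterruptedException {
        long[] partial = new long[nThreads];          // one slot per thread: no contention
        Thread[] workers = new Thread[nThreads];      // overhead: thread creation
        int chunk = (data.length + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            final int lo = t * chunk;
            final int hi = Math.min(data.length, lo + chunk);
            workers[t] = new Thread(() -> {
                long s = 0;
                for (int i = lo; i < hi; i++) s += data[i];  // reads shared memory
                partial[id] = s;                             // writes only its own slot
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();  // synchronization: wait for all results
        long total = 0;                     // extra computation: combine partial results
        for (long s : partial) total += s;
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] data = new long[100];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data, 4)); // 5050
    }
}
```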