Parallel and Distributed Processing CSE 8380ContentsWhat is a Model?Why Models?Models in Problem SolvingAn InterfaceModels in this classPRAM ModelThe PRAM model and its variations (cont.)Analysis of AlgorithmsAnalysis of Sequential AlgorithmsAnalysis of parallel algorithmAnalysis of parallel algorithm The NC-class and P-completenessSimulating multiple accesses on an EREW PRAMSimulating multiple accesses on an EREW PRAM (cont.)Parallel AlgorithmsSlide 17Enumeration SortSort-CRCW AssumptionsSort-CRCWAlgorithm DetailsComplexityExample: sort (9, 4, 6)Computer Science and EngineeringParallel and Distributed ProcessingCSE 8380February 3, 2005February 3, 2005Session 7Session 7Computer Science and EngineeringContentsAbstract ModelsPRAM ModelComplexity AnalysisIntroduction to Parallel AlgorithmsSortingComputer Science and EngineeringWhat is a Model?According to Webster’s Dictionary, a model is “a description or analogy used to help visualize something that cannot be directly observed.”According to The Oxford English Dictionary, a model is “a simplified or idealized description or conception of a particular system, situation or process.”Computer Science and EngineeringWhy Models?In general, the purpose of Modeling is to capture the salient characteristics of phenomena with clarity and the right degree of accuracy to facilitate analysis and prediction.Megg, Matheson and Tarjan (1995)Computer Science and EngineeringModels in Problem SolvingComputer Scientists use models to help design problem solving tools such as:Fast AlgorithmsEffective Programming EnvironmentsPowerful Execution EnginesComputer Science and EngineeringA model is an interface separating high level properties from low level onesAn InterfaceApplicationsArchitecturesProvidesoperationsRequires implementationMODELComputer Science and EngineeringModels in this classShared Memory ModelDistributed Memory ModelComputer Science and EngineeringPRAM ModelSynchronized Read Compute Write CycleEREWERCWCREWCRCWComplexity: T(n), P(n), C(n)ControlPrivateMemoryP1PrivateMemoryP2PrivateMemoryPpGlobalMemoryComputer Science and EngineeringThe PRAM model and its variations (cont.)There are different modes for read and write operations in a PRAM.Exclusive read(ER)Exclusive write(EW)Concurrent read(CR)Concurrent write(CW)CommonArbitraryMinimumPriorityBased on the different modes described above, the PRAM can be further divided into the following four subclasses.EREW-PRAM modelCREW-PRAM modelERCW-PRAM modelCRCW-PRAM modelComputer Science and EngineeringAnalysis of AlgorithmsSequential AlgorithmsTime ComplexitySpace ComplexityAn algorithm whose time complexity is bounded by a polynomial is called a polynomial-time algorithm. An algorithm is considered to be efficient if it runs in polynomial time.Computer Science and EngineeringAnalysis of Sequential AlgorithmsNPPNP-completeNP-hardThe relationships among P, NP, NP-complete, NP-hardComputer Science and EngineeringAnalysis of parallel algorithmPerformance of a parallel algorithm is expressed in terms of how fast it is and how much resources it uses when it runs. Run time, which is defined as the time during the execution of the algorithmNumber of processors the algorithm uses to solve a problemThe cost of the parallel algorithm, which is the product of the run time and the number of processorsComputer Science and EngineeringAnalysis of parallel algorithmThe NC-class and P-completenessNPPNP-completeNCP-completeNP-hardThe relationships among P, NP, NP-complete, NP-hard, NC, and P-completeComputer Science and EngineeringSimulating multiple accesses on an EREW PRAMBroadcasting mechanism:P1 reads x and makes it known to P2.P1 and P2 make x known to P3 and P4, respectively, in parallel.P1, P2, P3 and P4 make x known to P5, P6, P7 and P8, respectively, in parallel.These eight processors will make x know to another eight processors, and so on.Computer Science and EngineeringSimulating multiple accesses on an EREW PRAM (cont.)Simulating Concurrent read on EREW PRAM with eight processors using Algorithm Broadcast_EREWxxx P1(a)xxxx P2(b)xxxxx P3(c)xxxxxxxxx P5(d)x P4x P6x P7x P8LLLLComputer Science and EngineeringParallel AlgorithmsConstructsProcessor PiForallWhere Do in ParallelOthersComputer Science and EngineeringSimulating multiple accesses on an EREW PRAM (cont.)Algorithm Broadcast_EREWProcessor P1y (in P1’s private memory) xL[1] yfor i=0 to log p-1 doforall Pj, where 2i +1 < j < 2i+1 do in parallely (in Pj’s private memory) L[j-2i]L[j] yendforendforComputer Science and EngineeringEnumeration SortGiven a list on n numbers a1, a2, …, anWe try to find the position of each element ai in the sorted list by computing the number of elements smaller than itIt ci elements are smaller than ai, then it is the (ci+1)th element in the sorted listIf 2 or more elements have the same value, the element with the largest index in the unsorted list will be considered the largest in the sorted list.Computer Science and EngineeringSort-CRCW AssumptionsTo sort n elements, we use n2 processors (n rows and n columns)Pi,j processor in row i, column jConcurrent write sum of all valuesA[1..n] array of elements in global memoryC[1..n] array to store number of elements smaller than every element in AComputer Science and EngineeringSort-CRCWTwo steps1. Each row of processors i computes C[i], the number of elements smaller than A[i]. Each processor Pi,j compares A[i] and A[j], then updates C[i] appropriately2. The first row in each Pi,1 row places places A[i] in its proper position in the sorted list (C[i] + 1)Computer Science and EngineeringAlgorithm DetailsDetail of two step Algorithm/* step 1 */forall Pi,j, where 1 < i, j<n do in parallelif A[i] > A[j] or (A[i] = A[j] and i > j) thenC[i] 1elseC[i] 0endifendfor/* step 2 */forall Pi,l, where 1 < i<n do in parallelA[C[i] +1] A [i] endforComputer Science and EngineeringComplexity Run time: T(n) = O(1)Number of processors: P(n) = n2Cost: c(n) = O(n2)Is it cost optimal?No! (sequential sort can be done in O(n log n)Computer Science and EngineeringExample: sort (9, 4, 6)P1,1P1,2P1,3649A =9 & 9 9 & 4 9 & 6P2,1P2,2P2,34 & 9 4 & 4 4 & 6P3,1P3,2P3,36 & 9 6 & 4 6 & 6102C =964A =Concurrent write SUMT(n) = O(1)P(n) = n2C(n) = T(n) * P(n) =
View Full Document