Slide 1Slide 2Slide 3Slide 4Slide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Slide 18Divide and ConquerYan GuWhat is Divide and Conquer? An effective approach to designing fast algorithms in sequential computation is the method known as divide and conquer. Stratege ----Divide: a problem to be solved is broken into a number of subproblems of the same form as the original problems;Conquer: the subproblems are then solved independently, usually recursively;Combine: finally, the solutions to the subproblems are combined to provide the answer to the original problem.Classic example Sorting algorithmsmergesortquicksortpurposeDivide and conquer can be sued successfully in parallel computation;Derive efficient parallel algorithms by using divide and conquer algorithm.Parallel Computation modelPRAM Parallel Random Access Machine -------- an abstract machine for designing the algorithms applicable to parallel computersInstructions: ER: Exclusive Read CR: Concurrent Read EW: Exclusive Write CW: Concurrent WriteSearchingProblem: given a file of n records, with the ith record, 1<=i<=n, consisting of: 1. a key field containing an integer Si. 2. several other DATA fields containing information.RAM BINARY SEARCHAlgorithm RAM BINARY SEARCH(S,x,k)Step1: (1.1) i<- 1 (1.2) h<- n (1.3) k<- 0Step2: while I =< h do (2.1) m<- (i+h)/2 (2.2) if x=Sm then (i) k<- m (ii) i<- h+1 else if x<Sm then h<- m-1 else i<- m+1 end if end if end whiletime complexity: O(logn)Search on the PRAM Assume that the file of n records is stored in the shared memory of a PRAM WITH n processors P1, P2,…, PN, where 1 < N =<n suppose now that N < n. the sequence S={S1, S2, …, Sn} is subdivided into N subsequence of length n/N each, and processor Pi is assigned the subsequence {S(i-1)(n/N)+1, …, Si(n/N)} , the algorithm is as follows:1. All processors read x.2. those processors Pi for which S(i-1)(n/N)+1 =< x =< Si(n/N) perform algorithm RAM BINARY SEARCH on their assigned subsequence;3. those processor Pl for which x=S(l-1)(n/N)+j use a MIN CW TO write (l-1)(n/N)+j in k.Time complexity : O (log (n/N) )Parallel Binary SearchIn the parallel binary search, there are N processors, and we can extend binary search to become an (N+1) – way search. The algorithm will consist of several stages. At each stage, the sequence still under consideration is divided into (N+1) subsequence of equal length. The N processors simultaneously test the elements at the boundary between adjacent subsequences for equality with x. (the sequence S is assumed to be sorted from left to right in nondecreasing order).Parallel Binary SearchAlgorithm PRAM SEARCH (S, x, k) Step1 : (1.1) q<- 1 (1.2) r<- n (1.3) k<- 0 (1.4) g<- log(n+1)/log(N+1)Step2: while (q=<r and k=0) do (2.1) j(0) <- q-1 (2.2) for i=1 to N do in parallel (i) j(i) <- (q-1)+i(N+1)g-1 (ii) if j(i)=< r then if x=Sj(i) then k<- j(i) else if x< Sj(i) then ei <- left else ei <- right end if end if else (a) j(i) <- r+1 (b) ei <-left end if (iii) if ei-1 <>ei then (a) q <- j(i-1) +1 (b) r <- j(i)-1 end if (iv) if (i=N and ei<> ei+1) then q <- j(i)+1 end if end for (2.3) g <- g-1 end while Time complexity: O( log N n)comparison RAM searchSequentially executed by single processorTime complexity: O (logn)Searching on the PRAMSequentially executed by several processors of N processorsTime complexity: O (log (n/N))PRAM search( Parallel Binary Search) parallelly executed by N processorsTime complexity: O (log Nn)) Comparison result: O (log Nn)) < O (log (n/N)) < O (log n)Conclusion: PRAM search( Parallel Binary Search) is most efficient algorithm among these three searching algorithms.MergingSuppose that two sequence of number X=(x1, x2,…., xn) and Y={y1,y2…ym} are given, each sorted in nondecreasing order, with n>= m>=1. the problem of merging X and Y calls for creating, from these two sequence, a third sequence of numbers Z={z1, z2,…, zn+m},also sorted into nondecreasing order, such that each element of X and each elements of Y appears exactly once in Z.Sequential algorithm: RAM MERGE (X,Y,Z) Time complexity : O (n)Ranking a sorted sequenceAlgorithm PRAM RANK (X,Y,R)If m< 4Then algorithm PRAM MODIFIED SEARCH compute the ranks of Y in X using N=|Y|1/2 processorselse (1) s<- |Y| ½ (2) for i=1 to s do in parallel (2.1) algorithm PRAM MODIFIED SEARCH compute the rank r(is) of yis in X using N=|Y|1/2 processors (2.2) r(0) <- 0 end for (3) for i=0 to s-1 do in parallel (3.1) Xi <- {Xr(is)+1, Xr(is+2)+2, …, Xr(i+1)s)} (3.2) Yi <- {Yis+1, Yis+2, …, Y(i+1)s-1} (3.3) if r(is)=r((i+1)s) then Ri={0,0,..,0} else PRAM RANK(Xi,Yi,Ri) end if (3.4) for j=1 to s-1 do in parallel r(is+j) <- r(is) +ri(j) end for end ifRunning time of this algorithm: t(n)= t(n1/2)+o(1) =O(loglogn)Cost : C(n)=p(n) x t(n)=O (nloglogn)PRAM MERGEAlgorithm PRAM MERGE(X,Y,Z)Step1: (1.1) PRAM RANK (X,Y,R) (1.2) for i=1 to m do in parallel Zi+r(i) <- yi end forStep 2: (2.1) PRAM RANK (Y,X,R’) (2.2) for i=1 to n do in parallel Zi+r’(i) <- xi end for.Using O(n) processors, steps (1.1)and (2.1) run in O(loglogn) time, while steps(1.2) and (2.2) take constant time. Thus, for algorithm PRAM MERGE P(n) = O(n), t(n) = O(loglogn), c(n) = O(nloglogn),We can use O(n/loglogn) processor and runs in O(loglogn) time, thus , can lead to a cost of O(n), which is an optimal costComputing the convex hull Compute the convex hull of a set of points in the plane.Let Q={q1,q2,…, qn} be a finite sequence representing n points in the plane. The convex hull of Q, denoted CH(Q), is the convex polygon with the smallest area containing all the points of Q. thus, each Qi belong to Q either lies inside CH(Q) or is a corner of CH(Q).PRAM CONVEX HULLAlgorithm PRAM CONVEX HULL (n, Q,CH(Q))Step1: sort the
View Full Document