Pitt CS 1510 - LECTURE NOTES

1 Parallel and Distributed Computation

Parallel computation: tightly coupled processors that can communicate almost as quickly as they can perform a computation.

Distributed computation: loosely coupled processors for which communication is much slower than computation.

2 PRAM Model

A PRAM machine consists of p synchronous processors with shared memory. This model ignores synchronization and communication issues, and concentrates on the task of parallelizing the problem. One gets various variations of this model depending on how processors are permitted to access the same memory location at the same time.

• ER = Exclusive Read: only one processor can read a location in any one step.
• CR = Concurrent Read: any number of processors can read a location in a step.
• EW = Exclusive Write: only one processor can write a location in any one step.
• CW = Concurrent Write: any number of processors can write a location in a step. What if two processors try to write different values?
  – Common: all processors must be trying to write the same value.
  – Arbitrary: an arbitrary processor succeeds in the case of a write conflict.
  – Priority: the lowest-numbered processor succeeds.

The "right" model is probably an EREW PRAM, but we will study the other models as academic exercises. We will sometimes refer to algorithms by the type of model they are designed for, e.g. an EREW PRAM algorithm.

We define T(n, p) to be the parallel time for the algorithm under consideration with p processors on an input of size n. Let S(n) be the sequential time complexity. Then the efficiency of a parallel algorithm is defined by

    E(n, p) = S(n) / (p T(n, p))

Efficiencies range from 0 to 1; the best possible efficiency is 1. Generally we prefer algorithms whose efficiency is Ω(1).

The folding principle can be stated in two equivalent ways:

Time-based definition: For k ≥ 1 it must be the case that T(n, p) ≤ k T(n, kp). That is, increasing the number of processors by a factor of k reduces the time by at most a factor of k.
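The definitions above can be checked numerically. Here is a minimal Python sketch using a hypothetical cost function T(n, p) = n/p + log p and an assumed linear sequential time S(n) = n (both are illustrations, not tied to any particular algorithm):

```python
import math

def T(n, p):
    # Hypothetical parallel time: n/p units of local work plus a
    # log p combining phase (illustration only).
    return n / p + math.log2(p)

def efficiency(n, p, S=lambda n: n):
    # E(n, p) = S(n) / (p * T(n, p)); S(n) = n is an assumed
    # linear sequential time for this illustration.
    return S(n) / (p * T(n, p))

# Folding principle, time-based form: T(n, p) <= k * T(n, k*p).
for k in (2, 4, 8):
    assert T(1024, 64) <= k * T(1024, 64 * k)

# Fewer processors never hurt efficiency for this T.
assert efficiency(1024, 128) >= efficiency(1024, 1024)
```

For this T, throwing more processors at a fixed n drives efficiency down: the p log p term in the denominator of E(n, p) = n / (n + p log p) eventually dominates.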
Or equivalently, reducing the number of processors by a factor of k increases the time by at most a factor of k.

Efficiency-based definition: For k ≥ 1, E(n, p) ≥ E(n, kp). That is, more processors cannot increase efficiency, and there is no loss of efficiency if you decrease the number of processors.

3 The OR Problem

INPUT: bits b1, . . . , bn
OUTPUT: the logical OR of the bits

One can obtain an EREW algorithm with T(n, p) = n/p + log p using a divide-and-conquer algorithm that is perhaps best understood as a binary tree. Each of the p leaves of the binary tree holds n/p of the bits, which its processor ORs sequentially. Each internal node is a processor that ORs the outputs of its children. The efficiency of this EREW algorithm is

    E(n, p) = S(n) / (p T(n, p)) = n / (p (n/p + log p)) = n / (n + p log p)

which is Ω(1) if p = O(n / log n).

One can also obtain a CRCW Common algorithm with T(n, n) = Θ(1). In this algorithm each processor Pi first sets a shared variable answer to 0; then, if bi = 1, Pi sets answer to 1. The efficiency of this algorithm is E(n, n) = Θ(1).

4 MIN Problem

See sections 10.2.1 and 10.2.2.

INPUT: integers x1, . . . , xn
OUTPUT: the smallest xi

The results are essentially the same as for the OR problem. There is an EREW divide-and-conquer algorithm with E(n, n/ log n) = Θ(1). Note that this technique works for any associative operator (both OR and MIN are associative).

There is also a CRCW Common algorithm with T(n, p = n²) = 1 and E(n, p = n²) = 1/n. Here is the code for processor Pi,j, 1 ≤ i, j ≤ n, for the CRCW Common algorithm to compute the location of the minimum number:

    If x[i] <= x[j] then T[i,j] = 1 else T[i,j] = 0;
    And[i] = 1;
    If T[i,j] = 0 then And[i] = 0;
    If And[i] = 1 then Answer = i

What happens when the above code is run in the various CW models if there are two smallest numbers?

What happens in the various CW models if there are two smallest numbers and you just want to compute the value of the smallest number, that is, if the last line is changed to

    If And[i] = 1 then Answer = x[i]

5 Parallel Prefix Problem

INPUT: integers x1, . . . , xn
OUTPUT: S1, . . .
, Sn, where Si = x1 + x2 + · · · + xi.

We give a divide-and-conquer algorithm: solve the problem for the even-indexed xi's and the odd-indexed xi's separately. Then

    S2i = (x1 + x3 + · · · + x2i−1) + (x2 + x4 + · · · + x2i)

and

    S2i−1 = (x1 + x3 + · · · + x2i−1) + (x2 + x4 + · · · + x2i−2)

This gives an algorithm with time T(n, n) = log n on an EREW PRAM. This can be improved to T(n, n/ log n) = log n, thus giving Θ(1) efficiency.

Note that dividing into the first half and the last half is more difficult, because the sum of the first half becomes a bottleneck that all of the processors for the last half want to access. Also note that this technique works for any associative operator.

6 All Pairs Shortest Path Problem

INPUT: a directed edge-weighted graph G with no negative cycles.
OUTPUT: the length D[i, j] of the shortest path between every pair of vertices i and j.

First consider the following sequential code:

    For i = 1 to n do
      For j = 1 to n do
        D[i,j] = weight of edge (i, j)
    Repeat log n times
      For i = 1 to n do
        For j = 1 to n do
          For m = 1 to n do
            D[i,j] = min{D[i,j], D[i,m] + D[m,j]}

The correctness of this procedure can be seen using the following loop invariant: after t iterations of the repeat loop, for all i and j, D[i, j] equals the length of the shortest path between vertices i and j that uses 2^t or fewer edges.

A "parallel for loop" is a loop in which all iterations can be executed in parallel, for example:

    For i = 1 to n do C[i] = A[i] + B[i]

Question: Which of the loops above can be replaced by parallel for loops? Answer: the second and third, i.e. the loops over i and j inside the repeat. This gives T(n, p = n²) = n log n on a CREW PRAM. The fourth loop, over m, could be replaced by a parallel for loop on a machine with a concurrent write that always writes the smallest value. But note that the fourth loop is just computing a minimum, which is an associative operator. Thus, using the standard algorithm to compute the value of an associative operator in time log n with n/ log n processors, we get time T(n, n³/ log n) = log² n on a CREW PRAM (log n per repeat iteration, over log n iterations).

Question: What is the efficiency?
Answer: It depends on what you use for S(n):

• S(n) = T(n, 1). This measures the speed-up of the parallel algorithm over itself on one processor, but doesn't give the speed-up over the standard sequential algorithm.

• S(n) = the best achievable sequential time. But for almost all problems, the best achievable sequential time is not known.

• S(n) = the sequential time of the "standard" or "best known" sequential algorithm. But this has the odd property that the efficiency of a parallel algorithm can change when a new sequential algorithm is discovered.
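The sequential repeated min-plus squaring from Section 6 can be sketched directly in Python. This is a minimal illustration, not the lecture's own code; representing a missing edge by math.inf is an assumption of this sketch:

```python
import math

def apsp_min_plus(w):
    # w: n x n matrix of edge weights, with math.inf where there is no edge.
    # Returns d with d[i][j] = length of the shortest path from i to j.
    n = len(w)
    # d[i][i] = 0 (the empty path); otherwise the direct edge weight.
    d = [[0 if i == j else w[i][j] for j in range(n)] for i in range(n)]
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    # Loop invariant: after t rounds, d[i][j] is the length of the
    # shortest i-to-j path that uses 2**t or fewer edges.
    for _ in range(rounds):
        d = [[min(d[i][j], min(d[i][m] + d[m][j] for m in range(n)))
              for j in range(n)] for i in range(n)]
    return d
```

Each round is one min-plus "squaring" of the distance matrix; the inner generator over m is exactly the associative MIN that the parallel versions distribute across processors.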

