Unformatted text preview:

CSE 8383 - Advanced Computer ArchitectureContentsPRAM ModelThe PRAM model and its variations (cont.)Analysis of AlgorithmsAnalysis of Sequential AlgorithmsAnalysis of parallel algorithmAnalysis of parallel algorithm The NC-class and P-completenessSimulating multiple accesses on an EREW PRAMSimulating multiple accesses on an EREW PRAM (cont.)Slide 11MIMD Shared Memory SystemsShared MemoryClassificationUniform Memory Access (UMA)Non Uniform Memory Access (UMA)Cache Only Memory Architecture (COMA)Bus Based & switch based SM SystemsBus-based Shared MemoryUsing CachesGroup ActivitySlide 22Single Processor cachingCache Coherence PoliciesWriting in the cacheCache CoherenceSlide 27Write-invalidateWrite-UpdateSnooping ProtocolsWrite Invalidate Write ThroughWrite Through- Write Invalidate (cont.)Slide 33Example 1Complete the table (Write through write invalidate)Write Back- Write Invalidate (ownership)Write Back- Write Invalidate (ownership) (cont.)Ownership (cont.)Slide 39Example –2 Complete the table (Ownership)Write OnceWrite Once (cont.)Write Once (Cont.)Slide 44Write update and partial write throughWrite update and partial write through (cont.)Slide 47Slide 48Write Update Write BackWrite Update Write Back (cont.)Slide 51Slide 52CSE 8383 - Advanced Computer ArchitectureWeek-8Week of March 1, 2004engr.smu.edu/~rewini/8383ContentsPRAM Model (Cont.)Shared MemoryClassificationSnooping ProtocolsPRAM ModelSynchronized Read Compute Write CycleEREWERCWCREWCRCWComplexity: T(n), P(n), C(n)ControlPrivateMemoryP1PrivateMemoryP2PrivateMemoryPpGlobalMemoryThe PRAM model and its variations (cont.)There are different modes for read and write operations in a PRAM.Exclusive read(ER)Exclusive write(EW)Concurrent read(CR)Concurrent write(CW)CommonArbitraryMinimumPriorityBased on the different modes described above, the PRAM can be further divided into the following four subclasses.EREW-PRAM modelCREW-PRAM modelERCW-PRAM modelCRCW-PRAM modelAnalysis of AlgorithmsSequential AlgorithmsTime ComplexitySpace ComplexityAn algorithm whose time complexity is bounded by a polynomial is called a polynomial-time algorithm. An algorithm is considered to be efficient if it runs in polynomial time.Analysis of Sequential AlgorithmsNPPNP-completeNP-hardThe relationships among P, NP, NP-complete, NP-hardAnalysis of parallel algorithmPerformance of a parallel algorithm is expressed in terms of how fast it is and how much resources it uses when it runs. Run time, which is defined as the time during the execution of the algorithmNumber of processors the algorithm uses to solve a problemThe cost of the parallel algorithm, which is the product of the run time and the number of processorsAnalysis of parallel algorithmThe NC-class and P-completenessNPPNP-completeNCP-completeNP-hardThe relationships among P, NP, NP-complete, NP-hard, NC, and P-complete (if PNP and NC  P)Simulating multiple accesses on an EREW PRAMBroadcasting mechanism:P1 reads x and makes it known to P2.P1 and P2 make x known to P3 and P4, respectively, in parallel.P1, P2, P3 and P4 make x known to P5, P6, P7 and P8, respectively, in parallel.These eight processors will make x know to another eight processors, and so on.Simulating multiple accesses on an EREW PRAM (cont.)Simulating Concurrent read on EREW PRAM with eight processors using Algorithm Broadcast_EREWxxx P1(a)xxxx P2(b)xxxxx P3(c)xxxxxxxxx P5(d)x P4x P6x P7x P8LLLLSimulating multiple accesses on an EREW PRAM (cont.)Algorithm Broadcast_EREWProcessor P1y (in P1’s private memory)  xL[1]  yfor i=0 to log p-1 doforall Pj, where 2i +1 < j < 2i+1 do in parallely (in Pj’s private memory)  L[j-2i]L[j]  yendforendforMIMD Shared Memory SystemsInterconnection NetworksM M M MP P P P PShared Memory Single address spaceCommunication via read & writeSynchronization via locksClassificationMulti-portUniform memory Access (UMA)Non-uniform Memory Access (NUMA)Cache Only Memory Architecture (COMA)MP2P1Uniform Memory Access (UMA)CPCPCPCPMM M MBusNon Uniform Memory Access (UMA)MPMPMPInterconnection NetworkMPCache Only Memory Architecture (COMA)CPDCPDCPDCPDInterconnection NetworkBus Based & switch based SM SystemsGlobal Memory P C P C P C P C P C P C P CM M M MBus-based Shared MemoryCollection of wires and connectorsOnly one transaction at a timeBottleneck!! How can we solve the problem?Global MemoryP P P P PUsing CachesGlobal MemoryP1C1P2C2P3C3PnCn- Cache Coherence problem- How many processors?Group ActivityVariablesNumber of processors (n)Hit rate (h)Bus Bandwidth (B)Processor speed (V)Maximum number of processors n ?Group ActivitySingle Processor cachingPx x MemoryCacheHit: data in the cacheMiss: data is not in the cacheHit rate: hMiss rate: m = (1-h)Cache Coherence PoliciesWriting to Cache in 1 processor caseWrite ThroughWrite BackWriting in the cachePx xBeforeMemoryCachePx’ x’Write throughMemoryCachePx’ xWrite backMemoryCacheCache CoherenceP1xP2 P3 xPn xx-Multiple copies of x-What if P1 updates x?Cache Coherence PoliciesWriting to Cache in n processor caseWrite Update - Write ThroughWrite Invalidate - Write BackWrite Update - Write ThroughWrite Invalidate - Write BackWrite-invalidateP1xP2 P3 xxP1x’P2 P3 Ix’P1x’P2 P3 IxBefore Write ThroughWrite backWrite-UpdateP1xP2 P3 xxP1x’P2 P3 x’x’P1x’P2 P3 x’xBefore Write ThroughWrite backSnooping ProtocolsSnooping protocols are based on watching Snooping protocols are based on watching bus activities and carry out the appropriate bus activities and carry out the appropriate coherency commands when necessary. coherency commands when necessary. Global memory is moved in blocks, and each Global memory is moved in blocks, and each block has a state associated with it, which block has a state associated with it, which determines what happens to the entire determines what happens to the entire contents of the block. The state of a block contents of the block. The state of a block might change as a result of the operations might change as a result of the operations Read-Miss, Read-Hit, Write-Miss, and Read-Miss, Read-Hit, Write-Miss, and Write-Hit. Write-Hit.Write Invalidate Write ThroughMultiple processors can read block Multiple processors can read block copies from main memory safely until copies from main memory safely until one processor updates


View Full Document

SMU CSE 8383 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?