Parallel System Interconnections and Communications
Abdullah Algarni
2/26/2009

Outline
• Parallel Architectures
 – SISD
 – SIMD
 – MIMD
  ◦ Shared memory systems
  ◦ Distributed memory machines
• Physical Organization of Parallel Platforms
 – Ideal Parallel Computer
• Interconnection Networks for Parallel Computers
 – Static and Dynamic Interconnection Networks
 – Switches
 – Network interfaces

Outline (cont.)
• Network Topologies
 – Buses
 – Crossbars
 – Multistage Networks
 – Multistage Omega Network
 – Completely Connected Network
 – Linear Arrays
 – Meshes
 – Hypercubes
 – Tree-Based Networks
 – Fat Trees
 – Evaluating Interconnection Networks
• Grid Computing

Classification of Parallel Architectures
• SISD: single instruction, single data
 – Classical von Neumann architecture
• SIMD: single instruction, multiple data
• MIMD: multiple instructions, multiple data
 – The most common and most general parallel machine

Single Instruction Multiple Data (SIMD)
• Also known as array processors
• A single instruction stream is broadcast to multiple processors, each having its own data stream
 – Still used in graphics cards today

Multiple Instructions Multiple Data (MIMD)
• Each processor has its own instruction stream and input data
• MIMD is usually broken down further by memory organization:
 – Shared memory systems
 – Distributed memory systems

Shared memory systems
• All processes have access to the same address space
 – E.g. a PC with more than one processor
• Processes exchange data by writing and reading shared variables
• Advantage: shared memory systems are easy to program
 – Current standard in scientific programming: OpenMP
• Two kinds of shared memory systems are available today:
 – Symmetric multiprocessors (SMP)
 – Non-uniform memory access (NUMA) machines

Symmetric multiprocessors (SMPs)
• All processors share the same physical main memory
• Disadvantage: memory bandwidth per processor is limited
• Typical size: 2-32 processors

NUMA (non-uniform memory access) architectures
• There is more than one memory, and some memory is closer to a given processor than other memory
 – The whole memory is still addressable from all processors
• Advantage: reduces the memory-bandwidth limitation of SMPs
• Disadvantage: more difficult to program efficiently
• Caches are often used to reduce the effects of non-uniform memory access
• Largest example of this type: the Columbia supercomputer, an SGI Origin with 10,240 processors

Distributed memory machines
• Each processor has its own address space
• Processes communicate by explicit data exchange, using protocols such as:
 – Sockets
 – Message passing
 – Remote procedure call / remote method invocation
• Performance strongly depends on the quality and the topology of the network interconnect
• Two classes of distributed memory machines:
 1) Massively parallel processing systems (MPPs)
 2) Clusters

Physical Organization of Parallel Platforms

Ideal Parallel Computer
• A natural extension of the serial Random Access Machine (RAM) architecture is the Parallel Random Access Machine, or PRAM.
• PRAMs consist of p processors and a global memory of unbounded size that is uniformly accessible to all processors.
• Processors share a common clock but may execute different instructions in each cycle.
• Depending on how simultaneous memory accesses are handled, PRAMs can be divided into four subclasses:
 ◦ Exclusive-read, exclusive-write (EREW) PRAM
 ◦ Concurrent-read, exclusive-write (CREW) PRAM
 ◦ Exclusive-read, concurrent-write (ERCW) PRAM
 ◦ Concurrent-read, concurrent-write (CRCW) PRAM
• What does a concurrent write mean? Common resolution protocols:
 ◦ Common: write only if all values are identical
 ◦ Arbitrary: write the data from a randomly selected processor
 ◦ Priority: follow a pre-determined priority order
 ◦ Sum: write the sum of all data items

Physical Complexity of an Ideal Parallel Computer
• Processors and memories are connected via switches.
• Since these switches must operate in O(1) time at the level of words, for a system of p processors and m words the switch complexity is O(mp).

Brain simulation
• Imagine how long it would take to simulate the human brain.
• The human brain contains 100,000,000,000 (10^11) neurons, and each neuron receives input from about 1,000 others.
• Computing one change of brain "state" therefore requires about 10^14 calculations.
• Even if each calculation took only 1 μs, one state change would take about 10^8 seconds, roughly 3 years.
• Clearly, with O(mp) switch complexity for such large values of p and m, a true PRAM is not realizable.

Interconnection Networks for Parallel Computers
• Important metrics:
 – Latency: the minimal time to send a message from one processor to another
  ◦ Units: ms, μs
 – Bandwidth: the amount of data that can be transferred from one processor to another in a certain time frame
  ◦ Units: bytes/sec, KB/s, MB/s, GB/s; bits/sec, Kb/s, Mb/s, Gb/s

Important terms

Static and Dynamic Interconnection Networks
• Classification of interconnection networks: