UH COSC 6385 – Multi-Processors (I)

COSC 6385 Computer Architecture – Multi-Processors (I)
Edgar Gabriel
Spring 2010

Classification of Parallel Architectures: Flynn's Taxonomy
• SISD: Single instruction single data
  – Classical von Neumann architecture
• SIMD: Single instruction multiple data
• MISD: Multiple instructions single data
  – Non-existent, just listed for completeness
• MIMD: Multiple instructions multiple data
  – Most common and general parallel machine

Single Instruction Multiple Data
• Also known as array processors
• A single instruction stream is broadcast to multiple processors, each having its own data stream
  – Still used in some graphics cards today
[Figure: a control unit broadcasts one instruction stream to four processors, each operating on its own data stream]

Multiple Instructions Multiple Data (I)
• Each processor has its own instruction stream and input data
• Very general case – every other scenario can be mapped to MIMD
• Further breakdown of MIMD is usually based on the memory organization
  – Shared memory systems
  – Distributed memory systems

Shared memory systems (I)
• All processes have access to the same address space
  – E.g. a PC with more than one processor
• Data exchange between processes by writing/reading shared variables
  – Shared memory systems are easy to program
  – Current standard in scientific programming: OpenMP
• Two versions of shared memory systems available today
  – Symmetric multiprocessors (SMP)
  – Non-uniform memory access (NUMA) architectures

Symmetric multi-processors (SMPs)
• All processors share the same physical main memory
• Memory bandwidth per processor is the limiting factor for this type of architecture
• Typical size: 2–32 processors
[Figure: four CPUs attached to one shared memory]

NUMA architectures (I)
• Some memory is closer to a certain processor than other memory
  – The whole memory is still addressable from all processors
  – Depending on which data item a processor retrieves, the access time might vary strongly
[Figure: four nodes, each pairing two CPUs with a local memory, linked to one another]

NUMA architectures (II)
• Reduces the memory bottleneck compared to SMPs
• More difficult to program efficiently
  – E.g. first-touch policy: a data item will be located in the memory of the processor which uses it first (see the sketch below)
• To reduce the effects of non-uniform memory access, caches are often used
  – ccNUMA: cache-coherent non-uniform memory access architectures
• Largest example as of today: SGI Origin with 512 processors
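The consequence of the first-touch policy is easiest to see in code. The following is a minimal OpenMP sketch (the array size and variable names are illustrative assumptions, not from the slides): because each thread initializes the part of the array it will later work on, first-touch places each page in the memory local to the processor that uses it.

```c
#include <stdio.h>
#include <stdlib.h>

#define N 100000000L  /* illustrative array size */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double sum = 0.0;

    /* Parallel initialization: under a first-touch policy, each page of
       a[] ends up in the memory local to the thread that writes it first. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 1.0;

    /* Reusing the same static schedule means each thread now mostly
       reads data that resides in its local memory. */
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    free(a);
    return 0;
}
```

Had the initialization loop been sequential, every page would have been touched first by the master thread and placed in a single node's memory, turning that node into exactly the bottleneck NUMA is meant to avoid.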
Distributed memory machines (I)
• Each processor has its own address space
• Communication between processes by explicit data exchange
  – Sockets
  – Message passing
  – Remote procedure call / remote method invocation
[Figure: five CPU–memory pairs connected by a network interconnect]

Distributed memory machines (II)
• Performance of a distributed memory machine strongly depends on the quality and the topology of the network interconnect
  – Off-the-shelf technology: e.g. Fast Ethernet, Gigabit Ethernet
  – Specialized interconnects: Myrinet, InfiniBand, Quadrics, 10G Ethernet, …

Distributed memory machines (III)
• Two classes of distributed memory machines:
  – Massively parallel processing systems (MPPs)
    • Tightly coupled environment
    • Single system image (specialized OS)
  – Clusters
    • Off-the-shelf hardware and software components such as
      – Intel P4, AMD Opteron, etc.
      – Standard operating systems such as Linux, Windows, BSD UNIX

MIMD (III)
• Important metrics:
  – Latency:
    • minimal time to send a very short message from one processor to another
    • Units: ms, µs
  – Bandwidth:
    • amount of data which can be transferred from one processor to another in a certain time frame
    • Units: Bytes/sec, KB/s, MB/s, GB/s; Bits/sec, Kb/s, Mb/s, Gb/s; baud
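Both metrics are commonly measured with a ping-pong benchmark. Below is a minimal MPI sketch (iteration count and message size are illustrative assumptions): run with a very small message it approximates latency, with a large message it approximates bandwidth.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NITER 1000          /* illustrative repetition count */
#define MSGSIZE (1 << 20)   /* 1 MB; use 1 byte to estimate latency */

int main(int argc, char **argv)
{
    int rank;
    char *buf = malloc(MSGSIZE);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Ping-pong between ranks 0 and 1: rank 0 sends, rank 1 echoes. */
    double t0 = MPI_Wtime();
    for (int i = 0; i < NITER; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    /* Each iteration covers two message transfers, hence the factor 2. */
    double t = (MPI_Wtime() - t0) / (2.0 * NITER);

    if (rank == 0)
        printf("one-way time: %g s, bandwidth: %g MB/s\n",
               t, MSGSIZE / t / 1.0e6);

    MPI_Finalize();
    free(buf);
    return 0;
}
```

Compile with mpicc and launch two processes (e.g. `mpirun -np 2 ./pingpong`); the one-way time of the small-message run is the latency.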
Hybrid systems
• E.g. clusters of multi-processor nodes
[Figure: six nodes, each with two CPUs sharing a local memory, connected by a network interconnect]

Shared memory vs. distributed memory machines
• Shared memory machines:
  – Compiler directives (e.g. OpenMP)
  – Threads (e.g. POSIX Threads)
• Distributed memory machines:
  – Message Passing (e.g. MPI, PVM)
  – Distributed Shared Memory (e.g. UPC, Co-Array Fortran)
  – Remote Procedure Calls (RPC/RMI)
• Message passing is widely used in (scientific) parallel programming because of
  – the price/performance ratio of 'message passing hardware'
  – its very general concept
[Figure: an SMP (two CPUs on one memory) next to a distributed memory machine (CPU–memory pairs on a network interconnect)]

Performance Metrics (I)
• Speedup: how much faster does a problem run on p processors compared to 1 processor?
  S(p) = T_total(1) / T_total(p)
  – Optimal: S(p) = p (linear speedup)
• Parallel Efficiency: speedup normalized by the number of processors
  E(p) = S(p) / p
  – Optimal: E(p) = 1.0

Performance Metrics (II)
• Example: Application A takes 35 min. on a single processor, 27 min. on two processors, and 18 min. on 4 processors.
  S(2) = 35/27 = 1.29    E(2) = 1.29/2 = 0.645
  S(4) = 35/18 = 1.94    E(4) = 1.94/4 = 0.485

Amdahl's Law (I)
• Most applications have a (small) sequential fraction, which limits the speedup
• f: fraction of the code which can only be executed sequentially
  T_total = T_sequential + T_parallel = f · T_total + (1 − f) · T_total
  S(p) = T_total(1) / ( f · T_total(1) + (1 − f) · T_total(1) / p ) = 1 / ( f + (1 − f) / p )
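To tie the formulas together, here is a small, purely illustrative C program (function and variable names are assumptions made for this sketch) that recomputes the speedup and efficiency numbers from Performance Metrics (II) and evaluates Amdahl's bound for the sequential fractions shown in the plot below.

```c
#include <stdio.h>

/* Amdahl's Law: S(p) = 1 / (f + (1 - f) / p) */
static double amdahl(double f, int p)
{
    return 1.0 / (f + (1.0 - f) / p);
}

int main(void)
{
    /* Timings from the example: 35 min on 1, 27 min on 2, 18 min on 4. */
    double t1 = 35.0, t2 = 27.0, t4 = 18.0;
    printf("S(2) = %.2f  E(2) = %.3f\n", t1 / t2, (t1 / t2) / 2.0);
    printf("S(4) = %.2f  E(4) = %.3f\n", t1 / t4, (t1 / t4) / 4.0);

    /* Amdahl bound at p = 64 for the fractions used in the plot. */
    double fs[] = { 0.0, 0.05, 0.1, 0.2 };
    for (int i = 0; i < 4; i++)
        printf("f = %.2f: S(64) = %.1f\n", fs[i], amdahl(fs[i], 64));
    return 0;
}
```

Even a sequential fraction of 5% caps the speedup well below the linear f = 0 curve, which is what the plot below illustrates.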
Example for Amdahl's Law
[Figure: speedup as a function of the number of processors (1–64) for sequential fractions f = 0, f = 0.05, f = 0.1, and f = 0.2]

Amdahl's Law (II)
• Amdahl's Law assumes that the problem size is constant
• In most applications the sequential part is independent of the problem size, while the part which can be executed in parallel is not

Performance Metrics (III)
• Scaleup: ratio of the execution time of a problem of size n on 1 processor to the execution time of the same problem of size n · p on p processors