Parallel Processing III
15-319 Introduction to Cloud Computing, Spring 2010
10th Lecture, Feb 11th
Majd F. Sakr
Carnegie Mellon, Spring 2010 ©

Review
High BW & Speed Networks
Gigabit Ethernet
Myrinet
Infiniband
Lecture Outline
How to Parallelize
Automatic vs. Manual Parallelism
(1) Can the Problem be Parallelized?
(2) Hotspots & Bottlenecks
(3) Partitioning/Decomposition
(4) Communication
(4) Communication: Considerations
(5) Synchronization
(6) Data Dependencies
(7) Load Balancing
(8) Granularity
Limits and Costs of Parallel Computing
Parallel Computing Performance Analysis
Amdahl’s Law
Amdahl’s Law: Example
Using Amdahl’s Law in Analyzing Performance of Parallel Computing
Using Amdahl’s Law in Parallel Computing: Example
Parallelization Examples
Example: Array Processing
Parallelization Examples: Simple Heat Equation
References

Review
- Architectures
- Interconnect
http://www.phys.uu.nl/~steen/web03/sm-mimd.html

Review
- Shared Memory MIMD
http://www.phys.uu.nl/~steen/web03/sm-mimd.html

Review
- Distributed Memory MIMD
http://www.phys.uu.nl/~steen/web03/dm-mimd.html

Review
- Hybrids
- Cache-coherent NUMA
http://www.phys.uu.nl/~steen/web03/ccNUMA.html

High BW & Speed Networks
- Server and cluster backbones typically need fast interconnects
- Gigabit Ethernet
  - 10 Gigabit
  - 100 Gigabit
- Myrinet
- Infiniband
© Barcelona Supercomputing Center

Gigabit Ethernet
- Known as “IEEE Standard 802.3z”
- Offers 1 Gbps raw bandwidth
- Speed: 10 x the speed of Fast Ethernet, 100 x the speed of regular Ethernet
- 1 Gig Ethernet can run over UTP cables (1000BASE-T, specified in IEEE 802.3ab)
- 10 Gig Ethernet and 100 Gig Ethernet are emerging technologies and typically require fiber optic cables
Images: UTP (commons.wikimedia.org/wiki/File:UTP_cable.jpg), fiber optic cables (http://www.directindustry.com/prod/lapp-group/fiber-optic-cable-17287-404578.html)

Myrinet
- High-speed Local Area Network interconnect
- Typically requires two fiber optic cables per node (upstream and downstream)
- Offers low-latency networking with low protocol overhead at 1.9 Gbps (messages in the usec range)
- Next generation (Myri-10G) is 10 Gbps

Infiniband
- High-bandwidth interconnect, primarily for connecting processors to high-performance I/O devices
- InfiniBand offers point-to-point bidirectional serial links that form a switched fabric
- Up to 120 Gbps theoretical bandwidth
  (messages in the usec range)

Lecture Outline
- Parallel Computing Design Considerations
- Limits and Costs of Parallel Computing
- Parallel Computing Performance Analysis
- Examples of Problems Solved By Parallel Computing

How to Parallelize
- Automatic vs. Manual Parallelization
- Design Considerations
  1. Can the problem be parallelized?
  2. Program’s hotspots & bottlenecks?
  3. Partitioning
  4. Communications
  5. Synchronization
  6. Data Dependencies
  7. Load Balancing
  8. Granularity
  9. Input/Output

Automatic vs. Manual Parallelism
- Developing parallel programs has mostly been a manual process, which is complex, time-consuming, and error-prone.
- A parallelizing compiler or pre-processor can be used to parallelize serial code. Such a compiler usually works in one of two ways:
  - Fully Automatic: The compiler analyzes the source code and identifies the parts that can be parallelized.
  - Programmer Directed: The programmer uses compiler directives or flags to explicitly tell the compiler how to parallelize the code.

(1) Can the Problem be Parallelized?
- Parallelism inhibitors: control vs. data dependencies
- Examples:
  - Parallelizable problem: multiply each element of an array by 2
  - Non-parallelizable problem: Fibonacci sequence (each term depends on the two terms before it)
- Handling data dependencies
- Parallelism slow-down: communications bottleneck

(2) Hotspots & Bottlenecks
- Hotspots:
  - What are they? The parts of the program that account for most of the CPU usage
  - How to find them in the program? Profiling & performance analysis
  - Parallelization effort should focus on these spots
- Bottlenecks:
  - What are they?
    Slow areas
  - Can we redesign the algorithm to reduce or eliminate bottlenecks?
Source: http://scavenging.wordpress.com/2009/05/
Source: http://zubinmehta.files.wordpress.com/

(3) Partitioning/Decomposition
- Dividing the problem into chunks/parts of work that can be distributed to multiple tasks.
- The best partitioning is the one that requires the least I/O & communication between tasks
- Ways to partition?
  - Domain Decomposition
  - Functional Decomposition
http://tinypic.com/view.php?pic=a29ah0&s=3
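
The domain decomposition idea can be sketched with the parallelizable example from slide (1), multiplying each element of an array by 2. This is only a minimal illustration under stated assumptions: the names `partition`, `double_chunk`, and `parallel_double` are hypothetical and do not come from the lecture, and a thread pool is used purely to show the structure. In CPython, threads would not be expected to speed up CPU-bound work like this; a process pool or a message-passing library would be the more likely choice in practice.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_tasks):
    """Split data into num_tasks contiguous chunks of near-equal size."""
    base, extra = divmod(len(data), num_tasks)
    chunks, start = [], 0
    for i in range(num_tasks):
        # The first `extra` chunks take one additional element.
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def double_chunk(chunk):
    """Work done by one task; no element depends on any other element."""
    return [2 * x for x in chunk]


def parallel_double(data, num_tasks=4):
    """Domain decomposition: each task runs the same code on its own chunk."""
    chunks = partition(data, num_tasks)
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        # map() returns results in the same order as the input chunks.
        results = list(pool.map(double_chunk, chunks))
    # Reassemble the per-task results into one output list.
    return [y for part in results for y in part]


if __name__ == "__main__":
    print(parallel_double(list(range(10)), num_tasks=3))
```

Because the tasks share no data while they run, the only communication is handing out the chunks and collecting the results, which is the low-communication situation the slide describes as the best partitioning. Functional decomposition would instead give each task a different operation rather than a different slice of the data.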