Introduction to Parallel Processing

- Parallel Computer Architecture: Definition and Broad issues involved
- The Need And Feasibility of Parallel Computing
- Parallel Programming Models
- Flynn's 1972 Classification of Computer Architecture
- Current Trends In Parallel Architectures
- Scientific Supercomputing Trends
- CPU Performance and Technology Trends
- Parallelism in Microprocessor Generations
- Computer System Peak FLOP Rating History/Near Future
- The Goal of Parallel Processing
- Elements of Parallel Computing
- Factors Affecting Parallel System Performance
- Parallel Architectures History
- A Generic Parallel Computer Architecture
- Modern Parallel Architecture Layered Framework
- Shared Address Space Parallel Architectures
- Message-Passing Multicomputers: Message-Passing Programming Tools
- Data Parallel Systems
- Dataflow Architectures
- Systolic Architectures: Matrix Multiplication Systolic Array Example

(PCA Chapter 1.1, 1.2)

EECC756 - Shaaban, lec 1, Spring 2008

Parallel Computer Architecture

A parallel computer (or multiple processor system) is a collection of communicating processing elements (processors) that cooperate to solve large computational problems fast by dividing such problems into parallel tasks, exploiting Thread-Level Parallelism (TLP), i.e. Parallel Processing.

Broad issues involved:
- The concurrency and communication characteristics of parallel algorithms for a given computational problem (represented by dependency graphs).
- Computing Resources and Computation Allocation:
  - The number of processing elements (PEs), the computing power of each element, and the amount/organization of physical memory used.
  - What portions of the computation and data are allocated or mapped to each PE.
- Data access, Communication and Synchronization:
  - How the processing elements cooperate and communicate.
  - How data is shared/transmitted between processors.
  - Abstractions and primitives for cooperation/communication.
  - The characteristics and performance of the parallel system network (System interconnects).
- Parallel Processing Performance and Scalability Goals:
  - Maximize performance enhancement of parallelism: Maximize Speedup, by minimizing parallelization overheads and balancing workload on processors.
  - Scalability of performance to larger systems/problems.

Processor = Programmable computing element that runs stored programs written using a pre-defined instruction set.
Processing Elements = PEs = Processors.

A Generic Parallel Computer Architecture

[Figure: processing nodes, each containing processor(s) (P), memory (Mem) and a communication assist (CA), connected by a parallel machine network (custom or industry standard); also labeled: Operating System, Parallel Programming Environments.]

- Processing Nodes: each processing node contains one or more processing elements (PEs) or processor(s), a memory system, plus a communication assist (network interface and communication controller).
  - One or more processing elements or processors per node: custom or commercial microprocessors; single or multiple processors per chip; homogeneous or heterogeneous.
  - Network interface (AKA communication assist): custom or industry standard.
- Parallel machine network (System Interconnects): the function of a parallel machine network is to efficiently (i.e. at reduced communication cost) transfer information (data, results, ...) from source node to destination node as needed, to allow cooperation among parallel processing nodes to solve large computational problems divided into a number of parallel computational tasks.

Parallel Computer = Multiple Processor System.

The Need And Feasibility of Parallel Computing

- Application demands: more computing cycles/memory needed (Driving Force)
  - Scientific/Engineering computing: CFD, Biology, Chemistry, Physics, ...
  - General-purpose computing: Video, Graphics, CAD, Databases, Transaction Processing, Gaming, ...
  - Mainstream multithreaded programs are similar to parallel programs.
- Technology Trends:
  - Number of transistors on chip growing rapidly. Clock rates expected to continue to go up, but only slowly. Actual performance returns diminishing due to deeper pipelines.
  - Increased transistor density allows integrating multiple processor cores per chip, creating Chip-Multiprocessors (CMPs), even for mainstream computing applications (desktop/laptop).
- Architecture Trends:
  - Instruction-level parallelism (ILP) is valuable (superscalar, VLIW) but limited.
  - Increased clock rates require deeper pipelines with longer latencies and higher CPIs.
  - Coarser-level parallelism at the task or thread level (TLP), utilized in multiprocessor systems, is the most viable approach to further improve performance. This is the main motivation for the development of chip-multiprocessors (CMPs).
- Economics:
  - The increased utilization of commodity off-the-shelf (COTS) components in high performance parallel computing systems, instead of the costly custom components used in traditional supercomputers, leads to much lower parallel system cost.
    - Today's microprocessors offer high performance and have multiprocessor support, eliminating the need for designing expensive custom PEs.
    - Commercial System Area Networks (SANs) offer an alternative to custom, more costly networks.

Why is Parallel Processing Needed? Challenging Applications in Applied Science/Engineering

- Astrophysics
- Atmospheric and Ocean Modeling
- Bioinformatics
- Biomolecular simulation: Protein folding
- Computational Chemistry
- Computational Fluid Dynamics (CFD)
- Computational Physics
- Computer vision and image understanding
- Data Mining and Data-intensive Computing
- Engineering analysis (CAD/CAM)
- Global climate modeling and forecasting
- Material Sciences
- Military applications
- Quantum chemistry
- VLSI design

Such applications have very high (1) computational and (2) memory requirements that cannot be met with single-processor architectures. Many applications contain a large degree of computational parallelism.

Driving force for High Performance Computing (HPC) and multiple processor system development.

Why is Parallel Processing Needed? Scientific Computing Demands

Driving force for HPC and multiple processor system development.

[Figure: scientific computing demands; one axis is labeled "Memory Requirement".]

Computational and memory demands exceed the capabilities of even the fastest current uniprocessor systems (3-5 GFLOPS for uniprocessor).

Scientific Supercomputing Trends

- Proving ground and driver for innovative architecture and advanced high performance computing (HPC) techniques:
  - Market is much smaller relative to the commercial (desktop/server) segment.
  - Dominated by costly vector machines starting in the 70s through the 80s.
  - Microprocessors have made huge gains in the performance needed
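
The speedup goal stated earlier (maximize speedup by minimizing parallelization overheads and balancing workload on processors) can be illustrated with a small sketch. It assumes a simplified model that is not taken from the lecture: parallel time is the compute time of the busiest processor plus a fixed overhead, and speedup is sequential time divided by parallel time. The function name, the overhead parameter and the workloads are all hypothetical.

```python
def speedup(work_per_processor, overhead=0.0):
    """Speedup under an assumed simple model: T(1) / T(p).

    T(1) is the total work executed sequentially. T(p) is the time of
    the busiest processor plus a fixed parallelization overhead
    (an illustrative stand-in for communication/synchronization cost).
    """
    sequential_time = sum(work_per_processor)
    parallel_time = max(work_per_processor) + overhead
    return sequential_time / parallel_time


# Hypothetical workloads: 100 time units of work spread over 4 processors.
balanced = [25, 25, 25, 25]
imbalanced = [70, 10, 10, 10]

print(speedup(balanced))               # balanced, no overhead: equals the processor count
print(speedup(imbalanced))             # limited by the busiest processor
print(speedup(balanced, overhead=25))  # overhead cuts the balanced speedup in half
```

Under this model both load imbalance and overhead lengthen the parallel time, which is why the two appear together in the stated goal.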