RIT EECC 756 - Parallel Computer Architecture

This preview shows page 1-2-3-19-20-38-39-40 out of 40 pages.


Parallel Computer Architecture
A parallel computer is a collection of processing elements that cooperate to solve large problems fast.
Broad issues involved:
  - Resource Allocation:
      - Number of processing elements (PEs).
      - Computing power of each element.
      - Amount of physical memory used.
  - Data access, Communication and Synchronization:
      - How the elements cooperate and communicate.
      - How data is transmitted between processors.
      - Abstractions and primitives for cooperation.
  - Performance and Scalability:
      - Performance enhancement of parallelism: Speedup.
      - Scalability of performance to larger systems/problems.

EECC756 - Shaaban, lec 1, Spring 2002

The Need And Feasibility of Parallel Computing
  - Application demands: More computing cycles.
      - Scientific computing: CFD, Biology, Chemistry, Physics.
      - General-purpose computing: Video, Graphics, CAD, Databases, Transaction Processing, Gaming.
      - Mainstream multithreaded programs are similar to parallel programs.
  - Technology Trends:
      - Number of transistors on chip growing rapidly.
      - Clock rates expected to go up, but only slowly.
  - Architecture Trends:
      - Instruction-level parallelism is valuable but limited.
      - Coarser-level parallelism, as in MPs, is the most viable approach.
  - Economics:
      - Today's microprocessors have multiprocessor support, eliminating the need for designing expensive custom PEs.
      - Lower parallel system cost.
      - Multiprocessor systems to offer a cost-effective replacement of uniprocessor systems in mainstream computing.

Scientific Computing Demand
[Figure: Scientific Computing Demand]

Scientific Supercomputing Trends
  - Proving ground and driver for innovative architecture and advanced techniques:
      - Market is much smaller relative to the commercial segment.
      - Dominated by vector machines starting in the 70s.
      - Meanwhile, microprocessors have made huge gains in floating-point performance:
          - High clock rates.
          - Pipelined floating-point units.
          - Instruction-level parallelism.
          - Effective use of caches.
  - Large-scale multiprocessors replace vector supercomputers:
      - Well under way already.

Raw Uniprocessor Performance: LINPACK
[Figure: LINPACK MFLOPS (1 to 10,000, log scale) versus year (1975 to 2000) for CRAY and microprocessor-based systems, each at n = 100 and n = 1,000.]

Raw Parallel Performance: LINPACK
[Figure: LINPACK GFLOPS (0.1 to 10,000, log scale) versus year (1985 to 1996); series: MPP peak, CRAY peak.]

General Technology Trends
  - Microprocessor performance increases 50% to 100% per year.
  - Transistor count doubles every 3 years.
  - DRAM size quadruples every 3 years.
[Figure: Integer and FP performance (0 to 180) versus year (1987 to 1992) for Sun 4/260, MIPS M/120, MIPS M2000, IBM RS6000/540, HP 9000/750, and DEC alpha.]

Clock Frequency Growth Rate
[Figure: Clock rate in MHz (0.1 to 1,000, log scale) versus year (1970 to 2005), from the i4004 through the R10000.]
  - Currently increasing 30% per year.

Transistor Count Growth Rate
[Figure: Transistors per chip (1,000 to 100,000,000, log scale) versus year (1970 to 2005), from the i4004 through the R10000.]
  - 100 million transistors on chip by early 2000s A.D.
  - Transistor count grows much faster than clock rate: currently 40% per year.

System Attributes to Performance
  - Performance benchmarking is program-mix dependent.
  - Ideal performance requires a perfect machine/program match.
  - Performance measures:
      - Cycles per instruction (CPI).
      - Total CPU time:
          $T = C \times \tau = C / f = I_c \times CPI \times \tau = I_c \times (p + m \times k) \times \tau$
        where
          $I_c$ = instruction count
          $\tau$ = CPU cycle time
          $p$ = instruction decode cycles
          $m$ = memory cycles
          $k$ = ratio between memory and processor cycles
          $C$ = total program clock cycles
          $f$ = clock rate
      - MIPS rate:
          $\text{MIPS} = I_c / (T \times 10^6) = f / (CPI \times 10^6) = f \times I_c / (C \times 10^6)$
      - Throughput rate:
          $W_p = f / (I_c \times CPI) = (\text{MIPS} \times 10^6) / I_c$
  - Performance factors ($I_c$, $p$, $m$, $k$, $\tau$) are influenced by: instruction-set architecture, compiler design, CPU implementation and control, cache and memory hierarchy.

CPU Performance Trends
  - The microprocessor is currently the most natural building block for multiprocessor systems in terms of cost and performance.
[Figure: Performance (0.1 to 100, log scale) versus year (1965 to 1995) for supercomputers, mainframes, minicomputers, and microprocessors.]

Parallelism in Microprocessor VLSI Generations
[Figure: Transistors per chip (1,000 to 100,000,000, log scale) versus year (1970 to 2005), from the i4004 through the R10000, with three generations marked: bit-level parallelism, instruction-level, thread-level.]

The Goal of Parallel Computing
  - Goal of applications in using parallel machines: Speedup.
      $\text{Speedup}(p \text{ processors}) = \text{Performance}(p \text{ processors}) / \text{Performance}(1 \text{ processor})$
  - For a fixed problem size (input data set), performance = 1/time:
      $\text{Speedup}_{\text{fixed problem}}(p \text{ processors}) = \text{Time}(1 \text{ processor}) / \text{Time}(p \text{ processors})$

Elements of Modern Computers
[Diagram with the labels: Computing Problems; Algorithms and Data Structures; Mapping; Hardware Architecture; Programming; High-level Languages; Operating System; Binding (Compile, Load); Applications Software; Performance Evaluation.]

Elements of Modern Computers
1 Computing Problems:
  - Numerical Computing: Science and technology numerical problems demand intensive integer and floating-point computations.
  - Logical Reasoning: Artificial intelligence (AI) problems demand logic inferences, symbolic manipulations, and large space searches.
2 Algorithms and Data Structures:
  - Special algorithms and data structures are needed to specify the computations and communication present in computing problems.
  - Most numerical algorithms are deterministic, using regular data structures.
  - Symbolic processing may use heuristics or non-deterministic searches.
  - Parallel algorithm development requires interdisciplinary interaction.

Elements of Modern Computers
3 Hardware Resources:
  - Processors, memory, and peripheral devices form the hardware core of a computer system.
  - Processor instruction set, processor connectivity, and memory organization influence the system architecture.
4 Operating Systems:
  - Manages the allocation of resources to running processes.
  - Mapping to
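
Worked example (not part of the original slides): the performance relations given earlier in these notes, namely total CPU time T = Ic x CPI x cycle time with CPI = p + m x k, the MIPS rate, the throughput rate Wp, and fixed-problem speedup, can be checked with a few lines of Python. This is only a minimal sketch. The function names and the sample numbers (a 100 MHz clock, 2 million instructions, p = 2, m = 1, k = 2, and a parallel run time of 0.02 s) are assumptions chosen for illustration, not values from the lecture.

```python
def cpi_from_factors(p, m, k):
    """Effective cycles per instruction: CPI = p + m * k."""
    return p + m * k


def cpu_time(ic, cpi, f):
    """Total CPU time in seconds: T = Ic * CPI * tau, with tau = 1 / f."""
    tau = 1.0 / f
    return ic * cpi * tau


def mips_rate(ic, t):
    """MIPS rate = Ic / (T * 10^6)."""
    return ic / (t * 1e6)


def throughput_rate(f, ic, cpi):
    """Throughput rate Wp = f / (Ic * CPI), in programs per second."""
    return f / (ic * cpi)


def speedup(time_1, time_p):
    """Fixed-problem speedup = Time(1 processor) / Time(p processors)."""
    return time_1 / time_p


if __name__ == "__main__":
    # Hypothetical machine and program, for illustration only.
    f = 100e6          # clock rate in Hz
    ic = 2_000_000     # instruction count
    cpi = cpi_from_factors(p=2, m=1, k=2)

    t1 = cpu_time(ic, cpi, f)
    print("CPI            :", cpi)
    print("CPU time (s)   :", t1)
    print("MIPS rate      :", mips_rate(ic, t1))
    print("Throughput Wp  :", throughput_rate(f, ic, cpi))

    # Assume the same fixed-size problem takes 0.02 s on p processors.
    print("Speedup        :", speedup(t1, 0.02))
```

The two forms of the MIPS rate on the slide, Ic / (T x 10^6) and f / (CPI x 10^6), should agree for any consistent inputs, which makes a convenient sanity check when plugging in measured values.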

