
Parallel Computer Architecture

A parallel computer is a collection of processing elements that cooperate to solve large problems fast.

Broad issues involved:
- Resource allocation:
  - Number of processing elements (PEs).
  - Computing power of each element.
  - Amount of physical memory used.
- Data access, communication and synchronization:
  - How the elements cooperate and communicate.
  - How data is transmitted between processors.
  - Abstractions and primitives for cooperation.
- Performance and scalability:
  - Performance enhancement of parallelism: speedup.
  - Scalability of performance to larger systems/problems.

EECC756 - Shaaban, Lecture 1, Spring 2000 (3-7-2000)

The Need and Feasibility of Parallel Computing

- Application demands: more computing cycles.
  - Scientific computing: CFD, biology, chemistry, physics, ...
  - General-purpose computing: video, graphics, CAD, databases, transaction processing, gaming, ...
  - Mainstream multithreaded programs are similar to parallel programs.
- Technology trends:
  - Number of transistors on chip growing rapidly.
  - Clock rates expected to go up, but only slowly.
- Architecture trends:
  - Instruction-level parallelism is valuable but limited.
  - Coarser-level parallelism, as in multiprocessors (MPs), is the most viable approach.
- Economics:
  - Today's microprocessors have multiprocessor support, eliminating the need for designing expensive custom PEs.
  - Lower parallel system cost.
  - Multiprocessor systems are expected to offer a cost-effective replacement of uniprocessor systems in mainstream computing.

Scientific Computing Demand

[Figure: Scientific Computing Demand]

Scientific Supercomputing Trends

- Proving ground and driver for innovative architecture and advanced techniques:
  - Market is much smaller relative to the commercial segment.
  - Dominated by vector machines starting in the 1970s.
  - Meanwhile, microprocessors have made huge gains in floating-point performance:
    - High clock rates.
    - Pipelined floating-point units.
    - Instruction-level parallelism.
    - Effective use of caches.
- Large-scale multiprocessors replace vector supercomputers:
  - Well under way already.

Raw Uniprocessor Performance: LINPACK

[Figure: LINPACK MFLOPS (log scale, 1 to 10,000) versus year (1975 to 2000) for CRAY vector machines and microprocessors, each at problem sizes n = 100 and n = 1,000. Labeled systems include the CRAY 1s, Xmp/14se, Xmp/416, Ymp, C90 and T94, and microprocessor systems from the Sun 4/260 and MIPS M/120 to the IBM Power2/990, MIPS R4400, DEC Alpha and DEC 8200.]

Raw Parallel Performance: LINPACK

[Figure: LINPACK GFLOPS (log scale, 0.1 to 10,000) versus year (1985 to 1996), comparing MPP peak and CRAY peak. Labeled systems include ASCI Red, Paragon XP/S MP (6768 and 1024), Paragon XP/S, T3D, CM-5, CM-200, CM-2, Delta, iPSC/860, nCUBE/2 (1024), and the CRAY Xmp/416 (4), Ymp/832 (8), C90 (16) and T932 (32).]

General Technology Trends

- Microprocessor performance increases 50% to 100% per year.
- Transistor count doubles every 3 years.
- DRAM size quadruples every 3 years.

[Figure: integer and floating-point performance (0 to 180) from 1987 to 1992 for the Sun 4/260, MIPS M/120, MIPS M2000, IBM RS6000/540, HP 9000/750 and DEC Alpha.]

Clock Frequency Growth Rate

[Figure: clock rate in MHz (log scale, 0.1 to 1,000) versus year (1970 to 2005) for processors from the i4004, i8008 and i8080 through the i8086, i80286, i80386, Pentium100 and R10000.]

- Clock rate is currently increasing 30% per year.

Transistor Count Growth Rate

[Figure: transistors per chip (log scale, 1,000 to 100,000,000) versus year (1970 to 2005) for processors from the i4004 through the i8086, i80286, R2000, R3000, i80386, Pentium and R10000.]

- 100 million transistors on chip by the early 2000s.
- Transistor count grows much faster than clock rate: currently 40% per year.
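The growth figures above compound quickly. The short Python sketch below works through the arithmetic using only the rates quoted on these slides: 30% per year for clock rate, 40% per year for transistor count, and doubling or quadrupling every 3 years. The helper names `compound` and `annual_rate` are hypothetical and chosen for illustration.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Growth factor after `years` years at a fixed annual rate (0.30 means 30%)."""
    return (1.0 + rate_per_year) ** years


def annual_rate(factor: float, period_years: float) -> float:
    """Equivalent annual growth rate for growing by `factor` every `period_years` years."""
    return factor ** (1.0 / period_years) - 1.0


if __name__ == "__main__":
    # Rates quoted on the Clock Frequency and Transistor Count Growth Rate slides.
    clock_10y = compound(0.30, 10)
    transistors_10y = compound(0.40, 10)
    print(f"Over 10 years: clock rate x{clock_10y:.1f}, transistor count x{transistors_10y:.1f}")

    # Annual rates implied by the General Technology Trends slide.
    print(f"Doubling every 3 years    = {annual_rate(2, 3):.0%} per year")
    print(f"Quadrupling every 3 years = {annual_rate(4, 3):.0%} per year")
```

Over a decade, 40% per year gives roughly a 29-fold increase, compared with about 14-fold at 30% per year. Doubling every 3 years works out to about 26% per year, which is below the 40% per year quoted on the Transistor Count Growth Rate slide.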

System Attributes to Performance

- Performance benchmarking is program-mix dependent.
- Ideal performance requires a perfect machine/program match.
- Performance measures:
  - Cycles per instruction (CPI).
  - Total CPU time:
    $T = C \times \tau = C / f = I_c \times CPI \times \tau = I_c \times (p + m \times k) \times \tau$
    where
      $I_c$ = instruction count
      $\tau$ = CPU cycle time
      $p$ = instruction decode cycles
      $m$ = memory cycles
      $k$ = ratio between memory and processor cycles
      $C$ = total program clock cycles
      $f$ = clock rate
  - MIPS rate:
    $\text{MIPS} = \frac{I_c}{T \times 10^6} = \frac{f}{CPI \times 10^6} = \frac{f \times I_c}{C \times 10^6}$
  - Throughput rate (programs per second):
    $W_p = \frac{f}{I_c \times CPI} = \frac{\text{MIPS} \times 10^6}{I_c}$
- Performance factors ($I_c$, $p$, $m$, $k$, $\tau$) are influenced by instruction-set architecture, compiler design, CPU implementation and control, and the cache and memory hierarchy.

CPU Performance Trends

The microprocessor is currently the most natural building block for multiprocessor systems in terms of cost and performance.

[Figure: performance (log scale, 0.1 to 100) versus year (1965 to 1995) for supercomputers, mainframes, minicomputers and microprocessors.]

Parallelism in Microprocessor VLSI Generations

[Figure: transistors per chip (log scale, 1,000 to 100,000,000) versus year (1970 to 2005), from the i4004 to the Pentium and R10000, with successive eras labeled bit-level parallelism, instruction-level parallelism and thread-level parallelism.]

The Goal of Parallel Computing

- Goal of applications in using parallel machines: speedup.
  $\text{Speedup}(p \text{ processors}) = \frac{\text{Performance}(p \text{ processors})}{\text{Performance}(1 \text{ processor})}$
- For a fixed problem size (input data set), performance = 1/time, so:
  $\text{Speedup}_{\text{fixed problem}}(p \text{ processors}) = \frac{\text{Time}(1 \text{ processor})}{\text{Time}(p \text{ processors})}$

Elements of Modern Computers

[Figure: diagram linking Computing Problems, Algorithms and Data Structures, Mapping, Hardware Architecture, Programming, High-level Languages, Operating System, Binding (Compile, Load), Applications Software, and Performance Evaluation.]
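The CPU-time, MIPS and throughput relations from the System Attributes to Performance slide can be evaluated directly. The sketch below uses a hypothetical `Workload` class with made-up values for $I_c$, $p$, $m$, $k$ and $f$. These are illustrative numbers, not measurements. It checks that the three forms of the MIPS rate agree.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    ic: float    # instruction count I_c
    p: float     # instruction decode (processor) cycles per instruction
    m: float     # memory cycles per instruction
    k: float     # ratio of memory cycle time to processor cycle time
    f_hz: float  # clock rate f in Hz; cycle time tau = 1 / f

    @property
    def cpi(self) -> float:
        return self.p + self.m * self.k          # CPI = p + m * k

    @property
    def cycles(self) -> float:
        return self.ic * self.cpi                # C = I_c * CPI

    @property
    def cpu_time(self) -> float:
        return self.cycles / self.f_hz           # T = C * tau = C / f

    @property
    def mips(self) -> float:
        return self.ic / (self.cpu_time * 1e6)   # MIPS = I_c / (T * 10^6)

    @property
    def throughput(self) -> float:
        return self.f_hz / (self.ic * self.cpi)  # W_p = f / (I_c * CPI), programs/s


if __name__ == "__main__":
    # Hypothetical workload: 200 million instructions on a 500 MHz processor.
    w = Workload(ic=200e6, p=1.0, m=0.4, k=2.5, f_hz=500e6)
    print(f"CPI = {w.cpi:.2f}, T = {w.cpu_time:.3f} s, MIPS = {w.mips:.0f}")
    # The alternative forms of the MIPS rate should agree.
    assert math.isclose(w.mips, w.f_hz / (w.cpi * 1e6))
    assert math.isclose(w.mips, w.f_hz * w.ic / (w.cycles * 1e6))
```

The product $m \times k$ converts memory cycles into processor cycles, which is why it adds directly to $p$ in the CPI term.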
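The Goal of Parallel Computing slide defines speedup as a performance ratio. For a fixed problem size, this reduces to a ratio of execution times. The minimal sketch below shows that reduction. The functions `performance` and `speedup` are hypothetical names, and the run times are invented for illustration.

```python
def performance(time_s: float) -> float:
    """For a fixed problem size, performance is taken as 1 / execution time."""
    return 1.0 / time_s


def speedup(time_1: float, time_p: float) -> float:
    """Speedup(p) = Performance(p) / Performance(1), which reduces to Time(1) / Time(p)."""
    return performance(time_p) / performance(time_1)


if __name__ == "__main__":
    # Hypothetical run times (seconds) for one fixed input data set on p processors.
    times = {1: 120.0, 2: 62.0, 4: 33.0, 8: 19.0}
    for p, t in sorted(times.items()):
        print(f"p = {p}: speedup = {speedup(times[1], t):.2f}")
```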

Elements of Modern Computers

1. Computing Problems:
   - Numerical computing: science and technology numerical problems demand intensive integer and floating-point computations.
   - Logical reasoning: artificial intelligence (AI) demands logic inferences, symbolic manipulations and large space searches.
2. Algorithms and Data Structures:
   - Special algorithms and data structures are needed to specify the computations and communication present in computing problems.
   - Most numerical …


RIT EECC 756 - Parallel Computer Architecture
