BROCKPORT CPS 303 - Chapter 1: Introduction to High Performance Computing


CPS 303 High Performance Computing
Wensheng Shen
Department of Computational Science
SUNY Brockport

Chapter 1: Introduction to High Performance Computing
- von Neumann architecture
- CPU and memory speed
- Motivation for parallel computing
- Applications of parallel computing

1.1 von Neumann Architecture
- A fixed-program computer vs. a stored-program computer
- The dominant computer model for more than 40 years
- The CPU executes a stored program
- Operation is sequential: a sequence of read and write operations on the memory
- von Neumann proposed the use of ROM: a read-only stored program

John von Neumann
- Born December 28, 1903; died February 8, 1957
- Mastered calculus at the age of 8
- Graduate-level mathematics at the age of 12
- Obtained his Ph.D. at the age of 23
- Originated the stored-program concept

[Figure: a typical von Neumann architecture - the CPU (control unit and arithmetic logic unit) connected to memory, input devices, output devices, and external storage]

Modern personal computers:
1. Monitor
2. Motherboard
3. CPU (microprocessor)
4. Primary storage (RAM)
5. Expansion cards
6. Power supply
7. Optical disc drive
8. Secondary storage (hard disk)
9. Keyboard
10. Mouse
http://en.wikipedia.org/wiki/Personal_computer

Peripheral Component Interconnect (PCI) devices:
- Graphics cards
- Sound cards
- Network cards
- Modems

CISC and RISC machines
- CISC stands for complex instruction set computer: a single-bus system in which each individual instruction can execute several low-level operations, such as a memory load, an arithmetic operation, and a memory store.
- RISC stands for reduced instruction set computer: a two-bus system, with a data bus and an address bus.
- Both are SISD machines: a Single Instruction stream operating on a Single Data stream.

1.2 CPU and Memory Speed
Clock cycle times:
- Cray 1: 12 ns (1975)
- Cray 2: 6 ns (1986)
- Cray T-90: 2 ns (1997)
- Intel PC: 1 ns (2000)
- Today's PC: 0.3 ns (2006, Pentium 4)

Moore's law (1965): the number of transistors per square inch on integrated circuits has doubled roughly every two years since the integrated circuit was invented. How about the future?
(Will the price of computers with the same computing power fall by half every two years?) In a 2008 article in InfoWorld, Randall C. Kennedy, formerly of Intel, raised the counterpoint that software bloat can cancel hardware gains, using successive versions of Microsoft Office between 2000 and 2007 as his premise: despite the gains in computational performance over that period according to Moore's law, Office 2007 performed the same task at half the speed on a prototypical 2007 computer as Office 2000 did on a year-2000 computer.

CPU and memory speed comparison
- In 20 years, CPU speed (clock rate) has increased by a factor of 1,000
- DRAM speed has increased only by a factor smaller than 4
- CPU speed: 1-2 ns; cache speed: 10 ns; DRAM speed: 50-60 ns
- How do we feed data fast enough to keep the CPU busy?

Possible solutions
- A hierarchy of successively faster memory devices (multilevel caches)
- Exploiting locality of data reference
- Efficient programming can be an issue
- Parallel systems may provide (1) a large aggregate cache and (2) high aggregate bandwidth to the memory system

1.3 Price and Performance Comparison
The price of high-end CPUs rises sharply.
[Figure: Intel processor price/performance]

1.4 Computation for Special Purposes
Applications: weather forecasting, information retrieval, car and aircraft design, NASA space discovery.
Problems: insufficient memory; slow speed.

Example: predicting the weather of the US and Canada for the next two days.
- Area: 20 million square kilometers = 2.0x10^7 km^2 (a 5,000 km x 4,000 km region), with a 20 km vertical extent
- Mesh size: dx = dy = dz = 0.1 km
- Number of cells: n = (5000/0.1) x (4000/0.1) x (20/0.1) = 4x10^11
- Assuming it takes 100 calculations to determine the weather at a typical grid point, and we want to predict the condition at each hour for the next 48 hours, the total number of calculations is 4x10^11 x 100 x 48 ~ 2x10^15
- Assuming our computer can execute one billion (10^9) calculations per second, this takes 2x10^15 / 10^9 = 2x10^6 seconds, or about 23 days
- What if we increase the CPU speed to one trillion (10^12) calculations per second?
We would still need more than half an hour. And what happens if we want to predict the weather for the whole earth, or if we want a smaller grid size, dx = dy = dz = 0.05 km, for better accuracy?

The memory requirement
If we need 7 variables (u, v, w, p, T, rho, omega) at each location, the memory cost is 7 x 4x10^11 = 2.8x10^12 words; at 4 bytes per word, that is 1.12x10^13 bytes = 11,200 GBytes. There is also the data transfer latency among the CPU, registers, and memory to consider.

Possible solution: build a processor executing one trillion operations per second.

    for (i = 0; i < ONE_TRILLION; i++)
        z[i] = x[i] + y[i];

Each iteration must
- fetch x[i] and y[i],
- add them into z[i], and
- store z[i],

so at least 3x10^12 words must be transferred between registers and memory per second. Assume data are transferred at the speed of light, 3x10^8 m/s, and let r be the average distance of a word of memory from the CPU. Then r must satisfy

    3x10^12 x r meters = 3x10^8 meters/second x 1 second,

so r = 10^-4 meters. We need at least three trillion words of memory to store x, y, and z. Memory words are typically arranged in a rectangular grid in hardware. If we use a square grid with side length s and connect the CPU to the center of the square, the average distance from a memory location to the CPU is about s/2 = r, so s = 2x10^-4 meters. A typical row of this grid contains sqrt(3x10^12) ~ 1.7x10^6 words, so a single word of memory must fit into a square with side length of about

    (2x10^-4) / (1.7x10^6) ~ 10^-10 meters,

the size of an atom. That is to say, we would need to figure out how to represent a 32-bit word with a single atom. Building a computer that performs one trillion operations per second this way is extremely difficult.

Other solutions? To invite one hundred people to dinner, should we build one big table that seats everyone? No: use more tables. How do we perform the task of 10^15 calculations in minutes? Use more computers: we divide one big problem into many small subproblems.

1.5 Challenges: Communication
At a dinner, people sitting at different tables can walk around to talk to each other. How about data residing on different processors? How do processors communicate?

Tasks:
- Decide on and implement an interconnection network for the processors and memory modules
- Design and implement system software for the hardware
- Devise algorithms and data structures for solving our problem
- Divide the algorithms and data structures into subproblems
- Identify the communications that will be needed among the subproblems
- Assign subproblems to processors

