CS61C: Machine Structures
Lecture 42: Parallel Computing
2005-05-09
Andy Carle (cs61c-ta@inst)
inst.eecs.berkeley.edu/~cs61c
Carle, Spring 2005, UCB

The California legislature is currently working on a bill to ban remote hunting via the internet after the incorporation of a Texas company specializing in a unique combination of robotics, web cameras, and weapons. Years of Counter-Strike practice and I can't even get a meal out of it...

Scientific Computing

Traditional Science
  1. Produce theories and designs on "paper"
  2. Perform experiments or build systems
  - Has become difficult, expensive, slow, and dangerous for fields on the leading edge

Computational Science
  - Use ultra-high performance computers to simulate the system we're interested in

Acknowledgement
  - Many of the concepts and some of the content of this lecture were drawn from Prof. Jim Demmel's CS 267 lecture slides, which can be found at http://www.cs.berkeley.edu/~demmel/cs267_Spr05

Example Applications

Science
  - Global climate modeling
  - Biology: genomics, protein folding, drug design
  - Astrophysical modeling
  - Computational Chemistry
  - Computational Material Sciences and Nanosciences

Engineering
  - Semiconductor design
  - Earthquake and structural modeling
  - Computational fluid dynamics (airplane design)
  - Combustion (engine design)
  - Crash simulation

Business
  - Financial and economic modeling
  - Transaction processing, web services, and search engines

Defense
  - Nuclear weapons: test by simulation
  - Cryptography

Performance Requirements

Terminology
  - Flop: floating point operation
  - Flops/second: standard metric for expressing the computing power of a system

Global Climate Modeling
  - Divide the world into a grid (e.g. 10 km spacing)
  - Solve fluid dynamics equations to determine what the air has done at that point every minute
  - Requires about 100 Flops per grid point per minute
  - This is an extremely simplified view of how the atmosphere works; to be maximally effective you need to simulate many additional systems on a much finer grid

Performance Requirements (2)

Computational Requirements
  - To keep up with real time (i.e. simulate one minute per wall-clock minute): 8 Gflops/sec
  - Weather Prediction (7 days in 24 hours): 56 Gflops/sec
  - Climate Prediction (50 years in 30 days): 4.8 Tflops/sec
  - Climate Prediction Experimentation (50 years in 12 hours): 288 Tflops/sec

Perspective
  - Pentium 4 1.4 GHz, 1 GB RAM, 4x100 MHz FSB
  - About 320 Mflops/sec, effective
  - Climate Prediction would take about 1233 years

Reference: http://www.tc.cornell.edu/~lifka/Papers/SC2001.pdf

What Can We Do?

Wait?
  - Moore's law tells us things are getting better; why not stall for the moment?

Parallel Computing!

Prohibitive Costs

Rock's Law
  - The cost of building a semiconductor chip fabrication plant that is capable of producing chips in line with Moore's law doubles every four years

How fast can a serial computer be?

Consider a 1 Tflop/sec sequential machine:
  - Data must travel some distance, r, to get from memory to CPU
  - To get 1 data element per cycle, this means 10^12 times per second at the speed of light, c = 3 x 10^8 m/s. Thus r < c / 10^12 = 0.3 mm
  - So all of the data we want to process must be stored within 0.3 mm of the CPU

Now put 1 Tbyte of storage in a 0.3 mm x 0.3 mm area:
  - Each word occupies a square about 3 Angstroms on a side (roughly 9 square Angstroms), the size of a very small atom

Maybe someday, but it most certainly isn't going to involve transistors as we know them.

What is Parallel Computing?

  - Dividing a task among multiple processors to arrive at a unified (meaningful) solution

Recent History

  - Parallel Computing as a field exploded in popularity in the mid-1990s
  - This resulted in an "arms race" between universities, research labs, and governments to have the fastest supercomputer in the world
What is Parallel Computing? (continued)

  - For today, we will focus on systems with many processors executing identical code
  - How is this different from multiprogramming (which we've touched on some in this course)?
  - How is this different from distributed computing?

Current Champions

  System           Site                               Processors  Tflops/sec  Processor
  BlueGene/L       IBM/DOE, Rochester, United States       32768       70.72  0.7 GHz PowerPC 440
  Columbia         NASA/Ames, Mountain View, United States 10160       51.87  1.5 GHz SGI Altix
  Earth Simulator  Earth Simulator Ctr., Yokohama, Japan    5120       35.86  SX6 Vector

Data source: top500.org

Parallel Programming

  - Processes and Synchronization
  - Processor Layout
  - Other Challenges
      - Locality
      - Finding parallelism
      - Parallel Overhead
      - Load Balance

Processes

We need a mechanism to intelligently split the execution of a program.

Fork:

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();   /* 0 in the child, the child's pid in the parent */
        if (pid == 0) printf("I am the child\n");
        if (pid > 0)  printf("I am the parent\n");
        return 0;
    }

What will this print?

Processes (2)

We don't know! Two potential orderings:
  - "I am the child", then "I am the parent"
  - "I am the parent", then "I am the child"

This situation is a simple race condition. This type of problem can get far more complicated.

Modern parallel compilers and runtime environments hide the details of actually calling fork() and moving the processes to individual processors, but the complexity of synchronization remains.

Synchronization

  - How do processors communicate with each other?
  - How do processors know when to communicate with each other?
  - How do processors know which other processor has the information they need?
  - When you are done computing, which processor, or processors, have the answer?

Synchronization (2)

Some of the logistical complexity of these operations is reduced by standard communication frameworks:
  - Message Passing Interface (MPI)

Sorting out the issue of who holds what data can be made easier with the use of explicitly parallel languages:
  - Unified Parallel C (UPC)
  - Titanium (Parallel Java Variant)

Even with these tools, much of the skill and challenge of parallel programming is in resolving these problems.

Processor Layout

Processor Layout (2)

Generalized View
[Figure: processors (P1, P2, ..., Pn), memories (M), and a bus]

