CS 395T: Topics in Multicore Programming
University of Texas at Austin, Fall 2009

Administration
• Instructors:
  – Keshav Pingali (CS, ICES)
    • 4.126A ACES
    • Email: [email protected]
• TA:
  – Aditya Rawal
    • Email: [email protected]

Prerequisites
• Course in computer architecture
  – e.g., the book by Hennessy and Patterson
• Course in compilers
  – e.g., the book by Allen and Kennedy
• Self-motivation
  – willingness to learn on your own to fill in gaps in your knowledge

Why study parallel programming?
• Fundamental ongoing change in the computer industry
• Until recently: Moore's law(s)
  1. The number of transistors on a chip doubles every 1.5 years
     • Transistors were used to build complex superscalar processors, deep pipelines, etc. to exploit instruction-level parallelism (ILP)
  2. Processor frequency doubles every 1.5 years
     • Speed goes up by a factor of 10 roughly every 5 years (doubling every 1.5 years compounds to 2^(5/1.5) ≈ 10x over 5 years)
  ⇒ Many programs ran faster if you just waited a while.
• Fundamental change
  – Micro-architectural innovations for exploiting ILP are reaching their limits
  – Clock speeds are not increasing any more because of power problems
  ⇒ Programs will not run any faster if you wait.
• Let us understand why.
[Photo: Gordon Moore]

(1) Micro-architectural approaches to improving processor performance
• Add functional units
  – Superscalar is known territory
  – Diminishing returns from adding more functional blocks
  – Alternatives like VLIW have been considered and rejected by the market
• Wider data paths
  – Increasing bandwidth between functional units in a core makes a difference
  – Such as a comprehensive 64-bit design, but then where to?
[Figure: transistors per chip, 1970-2005, log scale from 1,000 to 100,000,000 (i4004, i8080, i8086, i80286, i80386, R2000, R3000, R10000, Pentium). From Paul Teisch, AMD.]

(1) Micro-architectural approaches (contd.)
• Deeper pipeline
  – A deeper pipeline buys frequency at the expense of increased branch misprediction and cache miss penalties
  – Deeper pipelines ⇒ higher clock frequency ⇒ more power
  – Industry is converging on a middle ground: 9 to 11 stages
    • Successful RISC CPUs are in the same range
• More cache
  – More cache buys performance until the working set of the program fits in cache
  – Exploiting caches requires help from the programmer/compiler, as we will see (see the tiling sketch below)
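As a concrete illustration of the cache point above, here is a minimal loop-tiling sketch. It is not from the lecture: the function names, the row-major matrix layout, and the tile size T are illustrative assumptions.

```cpp
#include <algorithm>
#include <vector>

// Naive matrix multiply (row-major N x N matrices): for each (i, j) it
// walks a column of B with stride N, so once N is large the working set
// no longer fits in cache and B is re-fetched from memory over and over.
void matmul_naive(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, int N) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}

// Tiled version: the same arithmetic, restructured to work on T x T
// blocks so each block is reused while it is still resident in cache.
// T is an illustrative tuning parameter, chosen so that roughly three
// T x T tiles of doubles fit in the target cache level.
void matmul_tiled(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, int N, int T) {
    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int kk = 0; kk < N; kk += T)
                for (int i = ii; i < std::min(ii + T, N); ++i)
                    for (int j = jj; j < std::min(jj + T, N); ++j)
                        for (int k = kk; k < std::min(kk + T, N); ++k)
                            C[i * N + j] += A[i * N + k] * B[k * N + j];
}
```

Compilers can sometimes perform this transformation automatically, but in general choosing tile sizes and loop orders is exactly the kind of programmer/compiler help the slide refers to.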
(2) Processor clock speeds
• Old picture:
  – Processor clock frequency doubled every 1.5 years
• New picture:
  – Power problems limit further increases in clock frequency (see the next couple of slides)
[Figure: clock rate (MHz), 1970-2000, log scale from 0.1 to 1000, showing the increase in clock rate.]

[Figure: static current vs. frequency. Embedded parts are fast and low power; parts pushed toward maximum frequency are fast and high power, with very high leakage. Static current rises non-linearly as processors approach maximum frequency.]

[Figure: power density (W/cm²) of Intel processors (4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, P6), 1970-2010, rising toward that of a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface. Source: Patrick Gelsinger, Intel.]

Recap
• Old picture:
  – Moore's law(s):
    1. The number of transistors doubled every 1.5 years
       – used to implement micro-architectural innovations for ILP
    2. Processor clock frequency doubled every 1.5 years
  ⇒ Many programs ran faster if you just waited a while.
• New picture:
  – Moore's law:
    1. The number of transistors still doubles every 1.5 years
  – But micro-architectural innovations for ILP are flat-lining
  – Processor clock frequencies are not increasing very much
  ⇒ Programs will not run faster if you wait a while.
• Questions:
  – Hardware: what do we do with all those extra transistors?
  – Software: how do we keep speeding up program execution?

One hardware solution: go multicore
• Use semiconductor technology improvements to build multiple cores without increasing clock frequency
  – does not require micro-architectural breakthroughs
  – non-linear scaling of power density with frequency will not be a problem
• Prediction: from now on, the number of cores will double every 1.5 years
(from Saman Amarasinghe, MIT)

Design choices
• Homogeneous multicore processors
  – a large number of identical cores
• Heterogeneous multicore processors
  – cores have different functionalities
• It is likely that future processors will be heterogeneous multicores
  – migrate important functionality into special-purpose hardware (e.g., codecs)
  – much more power-efficient than executing the program on a general-purpose core
  – trade-off: programmability

Problem: multicore software
• More aggregate performance for:
  – multi-tasking
  – transactional apps: many instances of the same app
  – multi-threaded apps (our focus)
• Problem:
  – Most apps are not multithreaded
  – Writing multithreaded code increases software costs dramatically
    • a factor of 3 for the Unreal game engine (Tim Sweeney, Epic Games)
• The great multicore software quest: can we write programs so that performance doubles when the number of cores doubles?
• Very hard problem for many reasons (see later)
  – Amdahl's law (a worked example appears at the end of these notes)
  – Locality
  – Overheads of parallel execution
  – Load balancing
  – …

"We are at the cusp of a transition to multicore, multithreaded architectures, and we still have not demonstrated the ease of programming the move will require… I have talked with a few people at Microsoft Research who say this is also at or near the top of their list [of critical CS research problems]."
– Justin Rattner, CTO, Intel

Parallel Programming
• The community has worked on parallel programming for more than 30 years
  – programming models
  – machine models
  – programming languages
  – …
• However, parallel programming is still a research problem
  – matrix computations, stencil computations, FFTs, etc. are well understood (see the stencil sketch at the end of these notes)
  – few insights for other applications
    • each new application is a "new phenomenon"
• Thesis: we need a science of parallel programming
  – analysis: a framework for thinking about parallelism in an application
  – synthesis: produce an efficient parallel implementation of an application
["The Alchemist", Cornelius Bega (1663)]

Analogy: science of electro-magnetism
• seemingly unrelated phenomena
• unifying abstractions
• specialized models that exploit structure

Course objective
• Create a science of parallel programming
  – Structure:
    • understand the patterns of parallelism and locality in applications
  – Analysis:
    • abstractions for reasoning about parallelism and locality in applications
    • programming models based on these abstractions
    • tools for quantitative estimates of parallelism and locality
  – …
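The Amdahl's law bullet on the "Problem: multicore software" slide deserves a worked example. The sketch below is not from the lecture, and the 10% serial fraction is an illustrative assumption: if a fraction s of the execution is inherently serial, p cores give a speedup of at most 1/(s + (1-s)/p).

```cpp
#include <cstdio>

// Amdahl's law: if a fraction s of the execution is inherently serial,
// the speedup on p cores is at most 1 / (s + (1 - s) / p), which tends
// to 1 / s as p grows, no matter how many cores are added.
double amdahl_speedup(double s, int p) {
    return 1.0 / (s + (1.0 - s) / p);
}

int main() {
    const double s = 0.10;  // illustrative assumption: 10% of the program is serial
    for (int p = 1; p <= 64; p *= 2)
        std::printf("cores = %2d  max speedup = %5.2f\n", p, amdahl_speedup(s, p));
    return 0;
}
```

With s = 0.10, 2 cores give about 1.8x, 16 cores about 6.4x, and 64 cores only about 8.8x; no core count can exceed 1/s = 10x. This is one reason doubling performance when the number of cores doubles is so hard.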

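For the "well-understood" class of computations mentioned on the Parallel Programming slide, here is a minimal sketch of a 5-point Jacobi stencil parallelized with OpenMP. The code is illustrative, not course material; the function name and the grid layout are assumptions.

```cpp
#include <vector>

// One Jacobi-style sweep of a 5-point stencil on an n x n grid (row-major).
// Every interior point of `out` depends only on the previous grid `in`,
// so the iterations of the outer loop are independent and can be spread
// across cores; the only synchronization is the implicit barrier at the
// end of the parallel loop. Compile with -fopenmp (the pragma is ignored
// otherwise and the code runs sequentially).
void jacobi_sweep(const std::vector<double>& in, std::vector<double>& out, int n) {
    #pragma omp parallel for
    for (int i = 1; i < n - 1; ++i)
        for (int j = 1; j < n - 1; ++j)
            out[i * n + j] = 0.25 * (in[(i - 1) * n + j] + in[(i + 1) * n + j] +
                                     in[i * n + j - 1] + in[i * n + j + 1]);
}
```

The dependence structure here is fixed and regular, which is why such kernels are well understood; it is applications whose dependences emerge only at runtime that remain the research problem the slide describes.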
