Slide 1: Parallel Languages: Past, Present and Future
Katherine Yelick, U.C. Berkeley and Lawrence Berkeley National Lab
HOPL 2007

Slide 2: Internal Outline
• Two components: control and data (communication/sharing)
• One key question: how much to virtualize, i.e., hide the machine?
• Tradeoff: hiding improves programmability (productivity) and portability, while exposing gives programmers control to improve performance
• Importance of machine trends
• Future: partitioned vs. cache-coherent (cc) shared memory
• Transactions will save us?
• PGAS: what is it? What about OpenMP?
• Looking ahead towards multicore: these are not SMPs. Partitioned vs. cc shared memory
• What works for performance: nothing virtualized *at runtime*, except Charm++
• Open problem: load balancing with locality

Slide 3: Two Parallel Language Questions
• What is the parallel control model? Data parallel (single thread of control), dynamic threads, or single program multiple data (SPMD)
• What is the model for sharing/communication? Shared memory (load/store) or message passing (send/receive)
• Synchronization is implied for message passing, but not for shared memory

Slide 4: [Diagram only: machine classes, including vector machines, distributed memory machines, and DSM]

Slide 5: [Chart: Top500 performance (SUM, #1, #500) and desktop trend with exponential fit, 1993-2014, on a log scale from 10 MFlop/s to 1 Eflop/s. Annotations: a 1 PFlop system (100K cores?) in 6-8 years; "Petaflop Desktop by 2026?" in 8-10 years. Slide source: Horst Simon, LBNL]

Slide 6: HPC Programming: Where Are We?
• BG/L at LLNL has 64K processor cores
• There were 68K transistors in the MC68000
• A BG/Q system with 1.5M processors may have more processors than there are logic gates per processor
• Trend towards simpler cores, but more of them
• HPC application developers write programs that are as complex as describing where every single bit must move between the transistors in the MC68000
• We need to at least get to the "assembly language" level
(Slide source: Horst Simon and John Shalf, LBNL/NERSC)

Slide 7: A Brief History of Languages
• When vector machines were king
  • Parallel "languages" were loop annotations (IVDEP)
  • Performance was fragile, but there was good user support
• When SIMD machines were king
  • Data parallel languages were popular and successful (CMF, *Lisp, C*, ...)
  • Quite powerful: they can handle irregular data (sparse matrix-vector multiply)
  • Irregular computation is less clear (multi-physics, adaptive meshes, backtracking search, sparse matrix factorization)
• When shared memory multiprocessors (SMPs) were king
  • Shared memory models, e.g., OpenMP and POSIX Threads, were popular
• When clusters took over
  • Message passing (MPI) became dominant
We are at the mercy of hardware, but we'll take the blame.

Slide 8: Partitioned Global Address Space Languages
• Global address space: any thread may directly read/write data allocated by another → shared memory semantics
• Partitioned: data is designated local/remote → message passing performance model
• [Diagram: the address space of threads p0, p1, ..., pn; each thread has private variables (l:) plus a globally visible partition (x: 1, x: 5, x: 7, y: 0) reachable from any thread through global pointers (g:)]
• 3 older languages: UPC, CAF, and Titanium
  • All three use an SPMD execution model
  • Success: in the current NSF PetaApps RFP, procurements, etc.
  • Why: portable (multiple compilers, including source-to-source); simple compiler/runtime; performance sometimes better than MPI
• 3 newer HPCS languages: X10, Fortress, and Chapel
  • All three use a dynamic parallelism model with data parallel constructs
• Challenge: improvement over past models that are just large enough
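To make the PGAS bullets above concrete, here is a minimal vector-sum sketch in UPC, one of the three older languages the slide lists. It is my illustration, not from the talk: the names (v, part), the size N, and the compile line are assumptions, and it presumes a static thread count fixed at compile time.

```c
/* Minimal PGAS sketch in UPC. Illustrative only; assumes a static
 * thread count at compile time, e.g. (hypothetical file name):
 *   upcc -T4 -o vecsum vecsum.upc && upcrun ./vecsum              */
#include <upc.h>
#include <stdio.h>

#define N 1024

shared int v[N];            /* lives in the global address space; default
                               cyclic layout gives v[i] affinity to
                               thread i % THREADS (the "partitioned" part) */
shared int part[THREADS];   /* one partial sum per thread */

int main(void) {
    int i, t;

    /* upc_forall runs iteration i on the thread with affinity to &v[i],
       so each thread initializes and sums only its own partition */
    upc_forall (i = 0; i < N; i++; &v[i])
        v[i] = i;

    part[MYTHREAD] = 0;
    upc_forall (i = 0; i < N; i++; &v[i])
        part[MYTHREAD] += v[i];

    upc_barrier;            /* make all partial sums visible */

    /* "Global address space": thread 0 reads data with affinity to
       other threads through ordinary array syntax (one-sided reads) */
    if (MYTHREAD == 0) {
        int total = 0;
        for (t = 0; t < THREADS; t++)
            total += part[t];
        printf("sum = %d (expected %d)\n", total, (N - 1) * N / 2);
    }
    return 0;
}
```

The final loop is the "shared memory semantics" half of the slide: remote partial sums are read like ordinary loads rather than via explicit messages. The affinity expression in upc_forall is the "message passing performance model" half: the partitioning tells the programmer which accesses are local and cheap.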
Slide 9: Open Problems
• Can load balance if we don't care about locality (Cilk; see the sketch after these slides)
• Can we mix in locality?
• If the user places the work explicitly, can we move it? They can unknowingly overload resources at the "place" because of an execution schedule chosen by the runtime
• Can generate SPMD from data parallel (ZPL, NESL, HPF)
  • But those performance results depend on pinning
  • E.g., if a program is compiled for and run on P processors, what happens when another task needs to use some of them?
• Can multicore support better programming models?
  • A multicore chip is not an SMP (and certainly not a cluster)
  • 10-100x higher bandwidth on chip
  • 10-100x lower latency on chip
• Are transactions a panacea?

Slide 10: Predictions
• Parallelism will explode
  • The number of cores will double every 12-24 months
  • Petaflop (million processor) machines will be common in HPC by 2015 (all Top500 machines will have this)
• Performance will become a software problem
  • Parallelism and locality will be key concerns for many programmers, not just an HPC problem
• A new programming model will emerge for multicore programming
  • Can one language cover the laptop-to-Top500 space?
• Locality will continue to be important
  • On-chip to off-chip, as well as node to node
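For slide 9's first bullet, here is a minimal sketch of the dynamic load balancing the slide credits to Cilk. This is my illustration in Cilk Plus / OpenCilk syntax, not from the talk: spawn exposes parallelism and the work-stealing runtime spreads it across idle workers, but nothing in the program says where work runs, which is exactly the missing locality control the slide asks about.

```c
/* Cilk-style divide-and-conquer: the programmer declares what may run
 * in parallel; the runtime's work-stealing scheduler balances the load.
 * Compile e.g. with OpenCilk:  clang -fopencilk fib.c                 */
#include <cilk/cilk.h>
#include <stdio.h>

long fib(int n) {
    if (n < 2)
        return n;
    /* The spawned child may be stolen by any idle worker; the data it
       touches lands wherever that worker runs, i.e., placement (and
       hence locality) is fully virtualized by the runtime. */
    long a = cilk_spawn fib(n - 1);
    long b = fib(n - 2);     /* the parent keeps working locally */
    cilk_sync;               /* join: wait for the spawned child */
    return a + b;
}

int main(void) {
    printf("fib(35) = %ld\n", fib(35));
    return 0;
}
```

Contrast this with the UPC sketch earlier: UPC makes locality explicit through affinity but leaves load balance to the programmer, while Cilk virtualizes placement and gets load balance for free. Combining the two is the "can we mix in locality?" question on slide 9.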