Berkeley COMPSCI C267 - Future Trends in High Performance Computing

Slide outline: Berkeley Lab Mission · Key Message · Overview · Supercomputing Ecosystem (2005) · Traditional Sources of Performance Improvement are Flat-Lining (2004) · Roadrunner Breaks the Pflop/s Barrier · Cray XT5 at ORNL: 1 Pflop/s in November 2008 · Cores per Socket · Performance Development · Performance Development Projection · Concurrency Levels · Moore's Law reinterpreted · Multicore comes in a wide variety · What's Next? · A Likely Trajectory: Collision or Convergence? · Trends for the next five years up to 2014 · Impact on Software · A Likely Future Scenario (2014) · Why MPI will persist · What will be the "?" in MPI+? · What's Wrong with MPI Everywhere? · PGAS Languages · Performance Advantage of One-Sided Communication · Autotuning · Multiprocessor Efficiency and Scaling (auto-tuned stencil kernel; Oliker et al., IPDPS'08) · Autotuning for Scalability and Performance Portability · The Likely HPC Ecosystem in 2014 · DARPA Exascale Study · ... and the power costs will still be staggering · Extrapolating to Exaflop/s in 2018 · Processor Technology Trend · Consumer Electronics has Replaced PCs as the Dominant Market Force in CPU Design · Green Flash: Ultra-Efficient Climate Modeling · Design for Low Power: More Concurrency · Green Flash Strawman System Design · Climate System Design Concept Strawman Design Study · Summary on Green Flash · Summary

Future Trends in High Performance Computing 2009-2018
Horst Simon, Lawrence Berkeley National Laboratory and UC Berkeley
Seminar at Princeton University, April 6, 2009

Berkeley Lab Mission
• Solve the most pressing and profound scientific problems facing humankind:
  – basic science for a secure energy future;
  – understanding living systems to improve the environment, health, and energy supply;
  – understanding matter and energy in the universe.
• Build and safely operate leading scientific facilities for the nation.
• Train the next generation of scientists and engineers.

Key Message
Computing is changing more rapidly than ever before, and scientists have an unprecedented opportunity to change computing directions.

Overview
• Turning point in 2004
• Current trends, and what to expect until 2014
• Long-term trends until 2019

Supercomputing Ecosystem (2005)
Commercial off-the-shelf (COTS) technology, "clusters", and a 12-year base of legacy MPI applications. (From my presentation at ISC 2005.)

Traditional Sources of Performance Improvement are Flat-Lining (2004)
• New constraint: 15 years of exponential clock-rate growth have ended.
• Moore's Law reinterpreted: how do we use all of those transistors to keep performance increasing at historical rates?
• Industry response: the number of cores per chip doubles every 18 months, instead of the clock frequency.
(Figure courtesy of Kunle Olukotun, Lance Hammond, Herb Sutter, and Burton Smith.)
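The "cores double every 18 months" response is easy to quantify. Below is a minimal C sketch of what that growth curve implies; the 2004 starting point of two cores per chip is an illustrative assumption, not a figure from the slides.

```c
/* Sketch: project cores per chip if core counts double every 18 months.
 * The baseline (2 cores in 2004) is an assumed, illustrative value. */
#include <stdio.h>
#include <math.h>

int main(void) {
    const double base_year = 2004.0, base_cores = 2.0;
    for (int year = 2004; year <= 2018; year += 2) {
        double doublings = (year - base_year) / 1.5;  /* 18 months = 1.5 years */
        printf("%d: ~%.0f cores per chip\n",
               year, base_cores * pow(2.0, doublings));
    }
    return 0;
}
```

At that rate a chip reaches hundreds of cores within a decade, which is why the later slides talk about systems with millions of concurrent threads.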
Supercomputing Ecosystem (2005), revisited in 2008
COTS technology, "clusters", and the legacy MPI application base remain, but PCs and desktop systems are no longer the economic driver, and the architecture and programming model are about to change.

Roadrunner Breaks the Pflop/s Barrier
• 1,026 Tflop/s on LINPACK, reported on June 9, 2008
• 6,948 dual-core Opterons plus 12,960 Cell BE processors
• 80 TB of memory
• Built by IBM; installed at LANL

Cray XT5 at ORNL: 1 Pflop/s in November 2008

  Jaguar                           Total      XT5      XT4
  Peak performance (Tflop/s)       1,645    1,382      263
  AMD Opteron cores              181,504  150,176   31,328
  System memory (TB)                 362      300       62
  Disk bandwidth (GB/s)              284      240       44
  Disk space (TB)                 10,750   10,000      750
  Interconnect bandwidth (TB/s)      532      374      157

The two systems will be combined after acceptance of the new XT5 upgrade; each is linked to the file system through 4x DDR InfiniBand.

Cores per Socket
[Chart of cores per socket.]

Performance Development
[TOP500 performance-development chart, annotated at 1.1 Pflop/s and 12.64 Tflop/s.]

Performance Development Projection
[Projection chart; ISC'08, Dresden.]

Concurrency Levels
[Chart; annotation: "Jack's notebook".]

Moore's Law reinterpreted
• The number of cores per chip will double every two years.
• Clock speed will not increase (it may even decrease).
• We need to handle systems with millions of concurrent threads.
• We need to handle inter-chip as well as intra-chip parallelism.

Multicore comes in a wide variety
– Multiple parallel general-purpose processors (GPPs)
– Multiple application-specific processors (ASPs)
"The processor is the new transistor." [Rowen]
The Intel 4004 (1971) was a 4-bit processor with 2,312 transistors running at ~100 KIPS, built in 10-micron PMOS on an 11 mm² die; today thousands of processor cores fit on a single die. Examples:
• Sun Niagara: 8 GPP cores (32 threads)
• Intel IXP2800 network processor: 1 GPP core plus 16 ASPs (128 threads) [block diagram not reproduced]
• IBM Cell: 1 GPP (2 threads) plus 8 ASPs
• Picochip DSP: 1 GPP core plus 248 ASPs
• Cisco CRS-1: 188 Tensilica GPPs

What's Next?
[Chart; source: Jack Dongarra, ISC 2008.]

A Likely Trajectory: Collision or Convergence?
[Diagram after Justin Rattner, Intel, ISC 2008: on an axis of parallelism, CPUs move from multi-threading through multi-core to many-core; on an axis of programmability, GPUs move from fixed-function through partially programmable to fully programmable; the two trajectories may meet in a future processor by 2012.]

Trends for the next five years, up to 2014
• After a period of rapid architectural change, we will likely settle on a future standard processor architecture.
• A good bet: Intel will continue to be a market leader.
• The impact of this disruptive change on software and systems architecture is not yet clear.

Impact on Software
• We will need to rethink and redesign our software, a challenge similar to the 1990-1995 transition to clusters and MPI.

A Likely Future Scenario (2014)
System: cluster + many-core node. Programming model: MPI + ? (after Don Grice, IBM, Roadrunner presentation, ISC 2008). The "?" is not message passing: hybrid and many-core technologies will require new approaches such as PGAS languages and autotuning (see the hybrid sketch below).
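The slide deliberately leaves the "?" open; threading within the node (e.g., OpenMP) is one commonly discussed candidate for it. Here is a minimal hybrid MPI+OpenMP sketch of the "cluster + many-core node" scenario. It assumes an MPI library supporting MPI_THREAD_FUNNELED and an OpenMP-capable compiler; it is an illustration, not an answer endorsed by the slides.

```c
/* Sketch of the hybrid "MPI + X" model with OpenMP as the assumed "X":
 * MPI ranks across nodes, OpenMP threads across the cores of each node. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank;
    /* Request thread support; FUNNELED means only the main thread calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* Typical mapping: one MPI rank per node, one OpenMP thread per core. */
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```

Built with something like `mpicc -fopenmp hybrid.c` and launched with one rank per node, this keeps the legacy MPI layer intact while exposing intra-node parallelism to the threading layer.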
Why MPI will persist
• Obviously MPI will not disappear in five years.
• By 2014 there will be 20 years of legacy software in MPI.
• New systems are not sufficiently different to lead to a new programming model.

What will be the "?" in MPI+?
• Likely ...
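The outline shows the deck turning next to PGAS languages and the performance advantage of one-sided communication. The sketch below illustrates the one-sided style with MPI-2 RMA (MPI_Win_create / MPI_Put); PGAS languages such as UPC build the same put/get model into the language itself. It illustrates the technique and is not code from the slides.

```c
/* Sketch: one-sided data transfer with MPI-2 RMA. Rank 0 writes directly
 * into rank 1's exposed memory window; no matching receive is needed. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, nprocs;
    double buf = 0.0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Expose one double per process for remote access. */
    MPI_Win_create(&buf, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);            /* open an access epoch */
    if (rank == 0 && nprocs > 1) {
        double val = 3.14;
        MPI_Put(&val, 1, MPI_DOUBLE, /* target rank */ 1,
                /* displacement */ 0, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);            /* complete all transfers */

    if (rank == 1) printf("rank 1 received %.2f via MPI_Put\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Because the sender needs no cooperation from the receiver's CPU, one-sided transfers can overlap communication with computation; that decoupling is part of the performance advantage the slide title points to.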

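Autotuning, the other approach named on the scenario slide (and the subject of the auto-tuned stencil results by Oliker et al., IPDPS'08, listed in the outline), replaces hand-tuned, machine-specific kernels with an empirical search over code variants. The toy C sketch below shows the core idea; the kernel, the candidate block sizes, and the timing harness are all illustrative assumptions.

```c
/* Sketch: autotuning as empirical search. Time several cache-blocking
 * choices for a toy kernel and keep the fastest; real autotuners search
 * far larger spaces (unrolling, prefetching, SIMD variants, ...). */
#include <stdio.h>
#include <time.h>

#define N 1024
static double a[N][N], b[N][N];

static void sweep(int bs) {            /* toy blocked array sweep */
    for (int ii = 0; ii < N; ii += bs)
        for (int jj = 0; jj < N; jj += bs)
            for (int i = ii; i < ii + bs && i < N; i++)
                for (int j = jj; j < jj + bs && j < N; j++)
                    b[i][j] = 0.5 * a[i][j];
}

int main(void) {
    int candidates[] = {16, 32, 64, 128, 256};
    int best = candidates[0];
    double best_t = 1e30;

    sweep(candidates[0]);              /* warm-up run to fill caches */
    for (int k = 0; k < 5; k++) {
        clock_t t0 = clock();
        sweep(candidates[k]);
        double t = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("block %3d: %.4f s\n", candidates[k], t);
        if (t < best_t) { best_t = t; best = candidates[k]; }
    }
    printf("autotuned block size: %d\n", best);
    return 0;
}
```

Running the search once per machine, at build or install time, is what gives autotuned libraries their performance portability, the property the deck's later slides name.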
