Introduction to Research


Introduction to Research 2011
Ashok Srinivasan, Florida State University
www.cs.fsu.edu/~asriniva

Images from ORNL, IBM, and NVIDIA: part of the machine room at ORNL; the Cell processor powers Roadrunner at LANL; NVIDIA GPUs power Tianhe-1A in China.

Outline
- Research: High Performance Computing, Applications and Software
- Multicore processors; massively parallel processors
- Computational nanotechnology
- Simulation-based policy making
- Potential Research Topics

Research Areas
- High Performance Computing, Applications in Computational Sciences, Scalable Algorithms, Mathematical Software
- Current topics: Computational Nanotechnology, HPC on Multicore Processors, Massively Parallel Applications
- New topics: Simulation-based policy analysis
- Old topics: Computational Finance, Parallel Random Number Generation, Monte Carlo Linear Algebra, Computational Fluid Dynamics, Image Compression

Importance of Supercomputing
- Fundamental scientific understanding: nano-materials, drug design
- Solution of bigger problems: climate modeling
- More accurate solutions: automobile crash tests
- Solutions with time constraints: disaster mitigation
- Study of complex interactions for policy decisions: urban planning

Some Applications
- Increasing relevance to industry: in 1993, fewer than 30% of the top 500 supercomputers were commercial; now 57% are.
- Commercial application areas: finance and insurance, medicine, aerospace and automobiles, telecom, oil exploration, shoes (Nike), potato chips, toys
- Scientific application areas: weather prediction, earthquake modeling, epidemic modeling, materials, energy, computational biology, astrophysics

Supercomputing Power
- The amount of parallelism is increasing too, with the high end having over 200,000 cores.

Geographic Distribution
- North America has over half of the top 500 systems, but Europe and East Asia also have a significant share.
- China is determined to be a supercomputing superpower; two of its national supercomputing centers have top-five supercomputers.
- Japan has the top machine and two in the top five, and is planning a $1.3 billion exascale supercomputer in 2020.

Asian Supercomputing Trends

Challenges in Supercomputing
- Hardware can be obtained with enough money, but obtaining good performance on large systems is difficult: some DOE applications ran at 1% efficiency on 10,000 cores.
- Applications will soon have to deal with a million threads, and with a billion at the exascale.
- Do not think of supercomputing as a means of solving current problems faster, but as a means of solving problems we earlier thought we could not solve.
- Software tools are needed to make using these machines easier.

Architectural Trends
- Massive parallelism: 10K-processor systems will be commonplace, and the large end already has over 500K processors.
- Single-chip multiprocessing: all processors will be multicore.
- Heterogeneous multicore processors: the Cell used in the PS3, GPGPU, the 80-core processor from Intel; processors with hundreds of cores are already commercially available.
- Distributed environments, such as the Grid, although it is hard to get good performance on these systems.

Accelerating Applications with GPUs
- Over a hundred cores per GPU; memory latency is hidden with thousands of threads; a GPU can accelerate a traditional computer to a teraflop.
- GPU cluster at FSU: Quantum Monte Carlo applications; algorithms such as linear algebra, FFT, compression, etc.

Small Discrete Fourier Transforms (DFT) on GPUs
- GPUs are effective for large DFTs but not for small ones; however, they can be effective for a large number of small DFTs, which is useful for AFQMC.
- We use the asymptotically slow matrix-multiplication-based DFT for very small sizes and combine it with mixed-radix for larger sizes.
- We use asynchronous memory transfer to deal with host-device data transfer overhead.

Comparison of DFT Performance
- Comparison of 512 simultaneous DFTs without host-device data transfer: 2-D DFTs and 3-D DFTs.

Petascale Quantum Monte Carlo
- Originally a DOE-funded project involving collaboration between ORNL, UIUC, Cornell, UTK, CWM, and NCSU; now funded by ORAU/ORNL.
- Goal: scale Quantum Monte Carlo applications to petascale (one million gigaflops) machines through load balancing, fault tolerance, and other optimizations.

Load Balancing
- In current implementations, such as QWalk and QMCPack, cores send excess walkers to cores with fewer walkers.
- In the new algorithm (based on the alias method), cores may send more than their excess and may receive walkers even if they originally had an excess.
- Load can be balanced with each core receiving from at most one other core, which is also optimal in the maximum number of walkers received; the total number of walkers sent may be twice the optimal.

Performance Comparison
- Mean number of walkers migrated and maximum number of receives, compared with QWalk.

Process-Node Affinity
- Node allocation is not necessarily ideal for minimizing communication, so process-node affinity can be important.
- Allocated nodes for a 12,000-core run on Jaguar.

Load Balancing with Affinity
- Renumbering the nodes improves load balancing and AllGather time.
- Basic load balancing vs. load balancing after renumbering; results on Jaguar.

Potential Research Topics
- High Performance Computing on Multicore Processors: Algorithms, Applications, and
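The 1% efficiency figure in the Challenges section is consistent with Amdahl's law when even a tiny fraction of the work is serial. The slides do not cite Amdahl's law; this is a standard model added here for illustration:

```python
def parallel_efficiency(serial_fraction, num_cores):
    """Amdahl's law: speedup = 1 / (s + (1 - s) / p); efficiency = speedup / p."""
    speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / num_cores)
    return speedup / num_cores

# With just 1% serial work, 10,000 cores run at roughly 1% efficiency.
print(parallel_efficiency(0.01, 10_000))
```

The takeaway matches the slide's warning: scaling to millions of threads requires driving the serial (and poorly parallelized) fraction toward zero, not just buying more hardware.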
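The matrix-multiplication-based DFT mentioned in the small-DFT section turns a batch of small transforms into one dense multiply, which is exactly the kind of operation GPUs do well. The slides do not show the actual GPU kernels; this is a CPU-side NumPy sketch with illustrative function names:

```python
import numpy as np

def dft_matrix(n):
    """The n x n DFT matrix F, with F[j, k] = exp(-2*pi*i*j*k / n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def batched_small_dft(signals):
    """Transform many length-n signals (one per row) with a single matrix multiply."""
    n = signals.shape[1]
    return signals @ dft_matrix(n)  # F is symmetric, so no transpose is needed

# 512 simultaneous small DFTs, as in the performance-comparison slide.
x = np.random.rand(512, 8) + 1j * np.random.rand(512, 8)
assert np.allclose(batched_small_dft(x), np.fft.fft(x, axis=1))
```

An O(n^2) DFT loses to an O(n log n) FFT asymptotically, but for very small n the dense multiply has better constants and far more uniform parallelism, which is why the slides combine it with mixed-radix FFTs for larger sizes.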
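The load-balancing properties described above (each core receives from at most one other core, and a donor may send more than its excess and later receive) arise from an alias-method-style pairing of overloaded and underloaded cores. A serial sketch, assuming the walker total divides evenly among cores; the real implementation runs across MPI ranks and its exact details are not in the slides:

```python
def plan_transfers(counts):
    """Return (src, dst, n_walkers) transfers that equalize walker counts.

    Each core appears as a destination at most once. A donor that drops
    below the average after sending is later topped up by a single receive,
    which is why the total walkers sent can exceed the minimum.
    """
    avg = sum(counts) // len(counts)  # sketch assumes an even split exists
    counts = list(counts)
    under = [i for i, c in enumerate(counts) if c < avg]
    over = [i for i, c in enumerate(counts) if c > avg]
    transfers = []
    while under:
        dst = under.pop()
        src = over.pop()
        need = avg - counts[dst]
        transfers.append((src, dst, need))
        counts[dst] = avg       # dst is settled and never receives again
        counts[src] -= need     # src may now be under the average itself
        if counts[src] > avg:
            over.append(src)
        elif counts[src] < avg:
            under.append(src)
    return transfers
```

For example, `plan_transfers([10, 2, 6, 2])` equalizes every core at 5 walkers, with each core receiving in at most one transfer, mirroring the "optimal in maximum number of walkers received" claim.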

