ME964: High Performance Computing for Engineering Applications

"I hear there's rumors on the Internets that we're going to have a draft." (G. W. Bush)

© Dan Negrut, 2011, ME964 UW-Madison

HPC vs. HTC
CUDA Programming Model
Accessing Newton & Building CUDA Apps
February 01, 2011

Slide 2: Before We Get Started…
- Last time:
  - Wrapped up the overview of parallel computing
  - Amdahl's Law of diminishing returns
  - Flynn's taxonomy of computer architectures
  - Types of parallel computing architectures
- Today:
  - HPC vs. HTC
  - Start the discussion of GPU programming and CUDA
  - Building CUDA apps in Visual Studio 2008; running apps through the HPC scheduler on Newton
- Assignment 2 posted:
  - Problem 1: making sure you can run a CUDA job
  - Problem 2: a simple problem -> run a basic job on the GPU (one block with four parallel threads)
  - Use the forum to post questions/answers/comments
  - Due: Feb 8, 11:59 PM, by email to [email protected]

Slide 3: High Performance Computing (HPC) vs. High Throughput Computing (HTC)
- High Performance Computing:
  - The topic of interest in this class
  - The idea: run one executable as fast as you can
  - You might spend one month running one DFT job, or a week on a CFD job…
- High Throughput Computing:
  - The idea: run as many applications as you can, possibly at the same time on different machines
  - Example: bone analysis in ABAQUS. You have uncertainty in the length of the bone (20 possible lengths), in the material of the bone (10 values of Young's modulus), and in the loading of the bone (50 force values with different magnitude/direction). Grand total: 20 x 10 x 50 = 10,000 ABAQUS runs
  - We have 1400 workstations hooked up together on campus -> use Condor to schedule the 10,000 independent ABAQUS jobs and have them run on scattered machines overnight
  - Example: folding@home -> volunteer your machine to run an MD simulation when it's idle

Slide 4: High Performance Computing (HPC) vs. High Throughput Computing (HTC)
- High Performance Computing:
  - Usually one cluster (e.g., Newton) or one massively parallel architecture (e.g., IBM Blue Gene or Cray) dedicated to running one large application that requires a lot of memory, a lot of compute power, and a lot of communication
  - Example: due to long-range electrostatic interactions, each particle in an MD simulation must keep track of a large number of particles it interacts with, and needs to query where those other particles are at every time step of the numerical integration
  - What is crucial is the interconnect between the processing units
  - Typically a fast dedicated interconnect (e.g., InfiniBand), which operates at 40 Gb/s; Euclid@UW-Madison: 1 Gb/s Ethernet; Blue Waters@UIUC: 100 GB/s; Tianhe-1 claims double the speed of InfiniBand
  - Typically uniform hardware components: e.g., 10,000 Intel Xeon 5520 processors, or 64 Tesla C2050 cards, etc.
  - Comes at a premium $$$

Slide 5: High Performance Computing (HPC) vs. High Throughput Computing (HTC)
- High Throughput Computing:
  - Usually a collection of heterogeneous compute resources linked through a slow connection, most likely Ethernet
  - Example: 120 Windows workstations in the CAE labs (all sorts of machines, some new, some old)
  - When CAE machine 58 runs an ABAQUS bone simulation, no communication is needed with CAE machine 83, which runs a different ABAQUS scenario
  - You don't need to spend any money; you can piggyback on resources that are willing to make themselves available
  - Very effective for Monte Carlo type analyses

Slide 6: High Performance Computing (HPC) vs. High Throughput Computing (HTC)
- You can do HPC on a configuration that has a slow interconnect
  - It will run very, very slowly…
- You can do HTC on an IBM Blue Gene
  - You need the right licensing system in place to "check out" 10,000 ABAQUS licenses
  - You will use the processors but waste the fast interconnect that made the machine expensive in the first place
- The University of Wisconsin-Madison is well known for the pioneering work in HTC done by Professor Miron Livny in CS
  - The UW-Madison solution for HTC: Condor, used by a broad spectrum of organizations in academia and industry
  - Other commercial solutions are now available for HTC: PBS Works, from Altair
- Google and Amazon are heavily invested in the HTC idea
- The line between HPC and HTC is blurred when it comes to cloud computing
  - Cloud computing: you rely on hardware resources made available by a third party; it is the solution of choice today for HTC. If the machines in the cloud are linked by a fast interconnect, one day it might make sense to run HPC jobs there as well…

Slide 7: End: Overview of H&S for parallel computing. Beginning: GPU Computing, CUDA Programming Model

Slide 8: Acknowledgements
- Many slides herein include material developed at the University of Illinois Urbana-Champaign by Professor W. Hwu and Adjunct Professor David Kirk (the latter was Chief Scientist at NVIDIA back in the day)
- The slides are used with the permission of the authors, which is gratefully acknowledged
- Slides that include material produced by Professors Hwu and Kirk carry an HK-UIUC logo in the lower left corner of the slide
- Several other slides are lifted from other sources, as indicated along the way

Slide 9: Why GPU Computing in ME964?
- The class is devoted to High Performance Computing in Engineering Applications
- GPU computing is not quite High Performance Computing (HPC)
- However, it shares with HPC the important aspect that they both draw on parallel…
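Assignment 2's Problem 2 asks for a basic GPU job with one block of four parallel threads. A minimal sketch of what such a launch can look like in CUDA's extended C follows; the kernel name and what it computes are illustrative assumptions, not the assignment's actual required code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: each of the four threads in the single block
// writes its own thread index into the output array.
__global__ void helloKernel(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

int main()
{
    const int N = 4;        // four parallel threads, as in Problem 2
    int h_out[N];
    int *d_out = nullptr;

    cudaMalloc((void **)&d_out, N * sizeof(int));

    // Launch configuration <<<1, N>>>: one block, four threads.
    helloKernel<<<1, N>>>(d_out);

    // cudaMemcpy synchronizes with the kernel before copying back.
    cudaMemcpy(h_out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    for (int i = 0; i < N; ++i)
        printf("thread %d wrote %d\n", i, h_out[i]);
    return 0;
}
```

Compiled with nvcc and run on a CUDA-capable device (such as Newton's Tesla cards), each of the four threads reports its own index.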


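The recap above mentions Amdahl's Law of diminishing returns. For reference, with p the fraction of the work that can be parallelized and N the number of processors, the speedup is bounded by the serial fraction:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

For example, if only p = 0.9 of a job parallelizes, no number of processors can deliver more than a tenfold speedup.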