ME964 High Performance Computing for Engineering Applications

Building CUDA apps under Visual Studio / Accessing Newton / CUDA Programming Model / CUDA API
February 03, 2011

"We live in a society exquisitely dependent on science and technology, in which hardly anyone knows anything about science and technology." - Carl Sagan

(c) Dan Negrut, 2011, ME964 UW-Madison

Before We Get Started...
- Last time:
  - HPC vs. HTC
  - Quick overview of the CPU/GPU hardware and the latencies/bandwidths of relevance
  - GPGPU, GPU programming, and CUDA
  - Started discussion of building CUDA apps in Visual Studio 2008
- Today:
  - Andrew: wrap up building CUDA apps in Visual Studio 2008
  - Andrew: running apps through the HPC scheduler on Newton
  - Finish the quick overview of the CUDA programming model
  - Start discussion of the CUDA API
- HW due next Tu: two problems
  - Includes reading two papers: Amdahl and Manferdelli (links on the class website)

Setting up a Visual Studio 2008 CUDA Project

Before We Get Started...
- Assumptions:
  - Visual Studio 2008 installed
  - CUDA Toolkit and GPU Computing SDK 3.2 installed
  - (Optional) Developer Drivers installed
- Overview:
  - Set system environment variables for missing DLLs
  - Run the CUDA VS Wizard
  - Set up project properties
  - Compile and run an example CUDA program

Environment Setup
- Why?
  Some DLLs are missing from %PATH%
- Add ;%NVSDKCOMPUTE_ROOT%\C\common\lib to the system PATH environment variable
  - Under: My Computer -> System Properties -> Advanced -> Environment Variables
- Alternatively: copy any missing DLLs to the output dir

Configuring the Project
- Install the CUDA VS Wizard* (http://goo.gl/Fh55o)
- Start Visual Studio, File -> New Project
- CUDA{32|64} -> CUDAWinApp, give it a name
- Next -> select Empty Project -> Finish
- Right-click the project -> Properties (if using shrUtils from Nvidia):
  - Linker -> General: add '$(NVSDKCOMPUTE_ROOT)\shared\lib' to Add'l Lib Dirs
  - Linker -> Input: add shrUtils{32|64}{|D}.lib to Add'l Deps (choose 32 or 64; nothing for release, D for debug)
  - CUDA Build Rule -> General: add '$(NVSDKCOMPUTE_ROOT)\shared\inc' to Add'l Include Dirs
* (Preferable) Use the CUDA build rules directly, but the Wizard sets various other options for you

Writing Code
- F7 to compile, F5 to debug (if you have a CUDA-capable GPU)
- Once it works, copy to Newton (next)

Helpful Hints (Compiling)
- Missing header (*.h) files: add the path to the 'include' dir containing the header
- Symbol not found: add the *.lib file under dependencies; you may also need to add the library dir

Helpful Hints (When Running the Program)
- *.dll not found: either add the DLL's dir to the system path or copy the DLL to the program dir
- A black (cmd) window opens, then quickly disappears: run from a command prompt, or (dangerous if on the cluster) add a system("pause") at the end

Helpful Hints (cont'd)
- For code completion ('Intellisense') and pretty colors:
  - Run the *.reg file in %CUDA_PATH%\extras\visual_studio_integration
  - In Visual Studio, Tools -> Options -> Text Editor -> File Extension; Extension: cu, Editor: Microsoft Visual C++

Running a CUDA App on the GPU Cluster (Newton)

Some Quick Notes...
- You must be inside the College of Engineering firewall to access Newton:
  - From a CAE lab
  - On UWNet-Engineering wireless
  - Via the Engineering VPN (not WiscVPN)
  - Via Remote Desktop through Kaviza - http://remote.cae.wisc.edu/dt/
- User accounts are managed by CAE
  - All students registered for the class are eligible
  - Auditors from
outside the CoE can request a temporary account
- You will see 'ME964' under your groups in My.CAE once you have access to the cluster
- This presentation covers how to submit jobs via Remote Desktop
- A future presentation will show how to install HPC Pack and submit jobs from your local machine

Getting Started
- Copy all the files for your program to Newton: \\newton.msvc.wisc.edu\Data\ME964\%username% (replace %username% with your CAE username)

Remote Desktop to Newton
- Computer: newton.msvc.wisc.edu
- User name: ENGR\%username%

Start HPC Job Manager
- Start -> All Programs -> Microsoft HPC Pack 2008 R2

Creating a Job
- New Single-Task Job
- Command line: the program to run
- Working directory: \\newton\Data\ME964\%username%
- Give a filename for STDERR and STDOUT; STDIN is optional
- Select the number of cores to use
  - Your program must be written to take advantage of them (e.g., via threads)

Creating a Job - Notes
- Single-Task Job is the simplest to set up
- New Job gives you much more control
- Parametric Sweep lets you run the same program with different parameters (Monte Carlo)
- GPUs are not currently reserved - be careful
  - We are working on this; HPC Pack does not natively let you do it
- All files for your program must reside in the working directory - unlike Condor, HPC Pack does not take care of this for you

Finishing Up
- After submitting, you can monitor the job's progress in the Job Manager
- Once it finishes (or fails), double-click it for more info:
  - Which tasks finished/failed
  - Task outputs
  - Where each task ran

Why Did It Fail?
- Most common: libraries not installed
- 2nd most common: compiled using the wrong version of the CUDA Toolkit
  - When in doubt, reinstall from \\newton.msvc.wisc.edu\Data\Downloads\Drivers
- Does it run locally?
- Check the log files for STDOUT and STDERR
- Still not working?
  Contact Andrew

Other Notes
- Programs must be able to finish/die without any user interaction, otherwise they will hang
- OpenGL/DirectX are not available
  - TCC Mode is enabled; the cards are only good for number crunching
- Also see: http://technet.microsoft.com/en-us/library/ff919670(WS.10).aspx

Back to the Overview of the CUDA Programming Model

A Simple C-CUDA Program
- You want to add two vectors A and B and store the result in a vector C
- Assume that the size of the vectors is N = 512
- Here's how things get done (some details omitted, such as #define N 512)

Execution Configuration: Grids and Blocks
- A kernel is executed as a grid of blocks of threads
  - All threads in a kernel can access several device data memory spaces
- A block [of threads] is a batch of threads that can cooperate with each other by:
  - Synchronizing their execution
  - Efficiently sharing data through a low-latency shared memory
- Threads from two different blocks cannot cooperate!!!
  - This has important software design implications

[Figure: the host launches Kernel 1 as Grid 1, a 3x2 arrangement of blocks, Block (0,0) through Block (2,1), and Kernel 2 as Grid 2; an inset expands Block (1,1) of Grid 2 into its 2D arrangement of threads]
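The "A Simple C-CUDA Program" slide above omits the code itself. Filled in, the vector addition might look like the following sketch (error checking and the `if (i < N)` guard's edge cases kept minimal; written against the Toolkit 3.2-era runtime API, and the choice of 4 blocks of 128 threads is one of several valid configurations):

```cuda
#include <cstdio>

#define N 512

// Kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *A, const float *B, float *C) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < N)
        C[i] = A[i] + B[i];
}

int main() {
    float hA[N], hB[N], hC[N];
    for (int i = 0; i < N; i++) { hA[i] = (float)i; hB[i] = 2.0f * i; }

    // Allocate device copies and ship the inputs over.
    float *dA, *dB, *dC;
    size_t bytes = N * sizeof(float);
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Execution configuration: a grid of 4 blocks of 128 threads (4*128 = 512 = N).
    vecAdd<<<4, 128>>>(dA, dB, dC);

    // Copy the result back and clean up.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    printf("C[0] = %f, C[N-1] = %f\n", hC[0], hC[N - 1]);
    return 0;
}
```

Put this in a *.cu file so the CUDA build rule picks it up; it compiles with nvcc and runs on any CUDA-capable GPU, including the Newton nodes.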