ME964 High Performance Computing for Engineering Applications

Building CUDA apps under Visual Studio / Accessing Newton / CUDA Programming Model / CUDA API
February 03, 2011

"We live in a society exquisitely dependent on science and technology, in which hardly anyone knows anything about science and technology." - Carl Sagan

(c) Dan Negrut, 2011, ME964 UW-Madison

Before We Get Started...
- Last time:
  - HPC vs. HTC
  - Quick overview of the CPU/GPU hardware and the latencies/bandwidths of relevance
  - GPGPU, GPU programming, and CUDA
  - Started discussion of building CUDA apps in Visual Studio 2008
- Today:
  - Andrew: wrap up building CUDA apps in Visual Studio 2008
  - Andrew: running apps through the HPC scheduler on Newton
  - Finish the quick overview of the CUDA programming model
  - Start discussion of the CUDA API
- HW due next Tu: two problems
  - Includes reading two papers: Amdahl and Manferdelli (links on the class website)

Setting up a Visual Studio 2008 CUDA Project

Before We Get Started...
- Assumptions:
  - Visual Studio 2008 installed
  - CUDA Toolkit and GPU Computing SDK 3.2 installed
  - (Optional) Developer Drivers installed
- Overview:
  - Set system environment variables for missing DLLs
  - Run the CUDA VS Wizard
  - Set up project properties
  - Compile and run an example CUDA program

Environment Setup
- Why?
  Some DLLs are missing from %PATH%
- Add ;%NVSDKCOMPUTE_ROOT%\C\common\lib to the system PATH environment variable
  - Under: My Computer -> System Properties -> Advanced -> Environment Variables
- Alternatively: copy any missing DLLs to the output dir

Configuring the Project
- Install the CUDA VS Wizard* (http://goo.gl/Fh55o)
- Start Visual Studio, File -> New Project
- CUDA{32|64} -> CUDAWinApp, give it a name
- Next -> select Empty Project -> Finish
- Right-click the project -> Properties (if using shrUtils from Nvidia):
  - Linker -> General: add '$(NVSDKCOMPUTE_ROOT)\shared\lib' to Add'l Lib Dirs
  - Linker -> Input: add shrUtils{32|64}{|D}.lib to Add'l Deps (choose 32 or 64; nothing for release, D for debug)
  - CUDA Build Rule -> General: add '$(NVSDKCOMPUTE_ROOT)\shared\inc' to Add'l Include Dirs
* (Preferable) Use the CUDA build rules directly, but the Wizard sets various other options for you

Writing Code
- F7 to compile, F5 to debug (if you have a CUDA-capable GPU)
- Once it works, copy to Newton (next)

Helpful Hints (Compiling)
- Missing header (*.h) files: add the path to the 'include' dir containing the header
- Symbol not found: add the *.lib file under dependencies; you may also need to add the library dir

Helpful Hints (When Running the Program)
- *.dll not found: either add the DLL's dir to the system path or copy the DLL to the program dir
- A black (cmd) window opens, then quickly disappears: run from a command prompt, or (dangerous if on the cluster) add a system("pause") at the end

Helpful Hints (cont'd)
- For code completion ('Intellisense') and pretty colors:
  - Run the *.reg file in %CUDA_PATH%\extras\visual_studio_integration
  - In Visual Studio, Tools -> Options -> Text Editor -> File Extension; Extension: cu, Editor: Microsoft Visual C++

Running a CUDA App on the GPU Cluster (Newton)

Some Quick Notes...
- You must be inside the College of Engineering firewall to access Newton:
  - From a CAE lab
  - On UWNet-Engineering wireless
  - Via the Engineering VPN (not WiscVPN)
  - Via Remote Desktop through Kaviza - http://remote.cae.wisc.edu/dt/
- User accounts are managed by CAE
  - All students registered for the class are eligible
  - Auditors from
outside the CoE can request a temporary account
- You will see 'ME964' under your groups in My.CAE once you have access to the cluster
- This presentation covers how to submit jobs via Remote Desktop
- A future presentation will show how to install HPC Pack and submit jobs from your local machine

Getting Started
- Copy all the files for your program to Newton: \\newton.msvc.wisc.edu\Data\ME964\%username% (replace %username% with your CAE username)

Remote Desktop to Newton
- Computer: newton.msvc.wisc.edu
- User name: ENGR\%username%

Start HPC Job Manager
- Start -> All Programs -> Microsoft HPC Pack 2008 R2

Creating a Job
- New Single-Task Job
- Command line: the program to run
- Working directory: \\newton\Data\ME964\%username%
- Give a filename for STDERR and STDOUT; STDIN is optional
- Select the number of cores to use
  - Your program must be written to take advantage of them (e.g., via threads)

Creating a Job - Notes
- Single-Task Job is the simplest to set up
- New Job gives you much more control
- Parametric Sweep lets you run the same program with different parameters (Monte Carlo)
- GPUs are not currently reserved - be careful
  - We are working on this; HPC Pack does not natively let you do it
- All files for your program must reside in the working directory - unlike Condor, HPC Pack does not take care of this for you

Finishing Up
- After submitting, you can monitor the job's progress in the Job Manager
- Once it finishes (or fails), double-click it for more info:
  - Which tasks finished/failed
  - Task outputs
  - Where each task ran

Why Did It Fail?
- Most common: libraries not installed
- 2nd most common: compiled using the wrong version of the CUDA Toolkit
  - When in doubt, reinstall from \\newton.msvc.wisc.edu\Data\Downloads\Drivers
- Does it run locally?
- Check the log files for STDOUT and STDERR
- Still not working?
  Contact Andrew

Other Notes
- Programs must be able to finish/die without any user interaction, otherwise they will hang
- OpenGL/DirectX are not available
  - TCC Mode is enabled; the cards are only good for number crunching
- Also see: http://technet.microsoft.com/en-us/library/ff919670(WS.10).aspx

Back to the Overview of the CUDA Programming Model

A Simple C-CUDA Program
- You want to add two vectors A and B and store the result in a vector C
- Assume that the size of the vectors is N = 512
- Here's how things get done (some details omitted, such as #define N 512)

Execution Configuration: Grids and Blocks
- A kernel is executed as a grid of blocks of threads
  - All threads in a kernel can access several device data memory spaces
- A block [of threads] is a batch of threads that can cooperate with each other by:
  - Synchronizing their execution
  - Efficiently sharing data through a low-latency shared memory
- Threads from two different blocks cannot cooperate!!!
  - This has important software design implications

[Figure: the host launches Kernel 1 as Grid 1, a 3x2 arrangement of blocks, Block (0,0) through Block (2,1), and Kernel 2 as Grid 2; an inset expands Block (1,1) of Grid 2 into its 2D arrangement of threads]
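The "A Simple C-CUDA Program" slide above omits the code itself. Filled in, the vector addition might look like the following sketch (error checking and the `if (i < N)` guard's edge cases kept minimal; written against the Toolkit 3.2-era runtime API, and the choice of 4 blocks of 128 threads is one of several valid configurations):

```cuda
#include <cstdio>

#define N 512

// Kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *A, const float *B, float *C) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < N)
        C[i] = A[i] + B[i];
}

int main() {
    float hA[N], hB[N], hC[N];
    for (int i = 0; i < N; i++) { hA[i] = (float)i; hB[i] = 2.0f * i; }

    // Allocate device copies and ship the inputs over.
    float *dA, *dB, *dC;
    size_t bytes = N * sizeof(float);
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Execution configuration: a grid of 4 blocks of 128 threads (4*128 = 512 = N).
    vecAdd<<<4, 128>>>(dA, dB, dC);

    // Copy the result back and clean up.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    printf("C[0] = %f, C[N-1] = %f\n", hC[0], hC[N - 1]);
    return 0;
}
```

Put this in a *.cu file so the CUDA build rule picks it up; it compiles with nvcc and runs on any CUDA-capable GPU, including the Newton nodes.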