UW-Madison ME 964 - The CUDA Compiler Driver NVCC

Last modified on: 11/5/2007

Document Change History

Version  Date        Responsible      Reason for Change
beta     01-15-2007  Juul VanderSpek  Initial release
0.1      05-25-2007  Juul VanderSpek  CUDA 0.1 release
1.0      06-13-2007  Juul VanderSpek  CUDA 1.0 release
1.1      10-12-2007  Juul VanderSpek  CUDA 1.1 release

Introduction

Overview

CUDA programming model

The CUDA Toolkit targets a class of applications whose control part runs as a process on a general-purpose computer (Linux, Windows), and which use an NVIDIA GPU as a coprocessor for accelerating SIMD parallel jobs. Such jobs are 'self-contained', in the sense that they can be executed and completed by a batch of GPU threads entirely without intervention by the 'host' process, thereby gaining optimal benefit from the parallel graphics hardware.

Dispatching GPU jobs from the host process is supported by the CUDA Toolkit in the form of remote procedure calling. The GPU code is implemented as a collection of functions in a language that is essentially 'C', but with some annotations for distinguishing them from the host code, plus annotations for distinguishing the different types of data memory that exist on the GPU. Such functions may have parameters, and they can be 'called' using a syntax that is very similar to regular C function calling, but slightly extended for being able to specify the matrix of GPU threads that must execute the 'called' function. During its lifetime, the host process may dispatch many parallel GPU tasks. See Figure 1, ref […].

CUDA sources

Hence, source files for CUDA applications consist of a mixture of conventional C++ 'host' code plus GPU 'device' functions.
The CUDA compilation trajectory separates the device functions from the host code, compiles the device functions using proprietary NVIDIA compilers/assemblers, compiles the host code using any general-purpose C/C++ compiler that is available on the host platform, and afterwards embeds the compiled GPU functions as load images in the host object file. In the linking stage, specific CUDA runtime libraries are added to support remote SIMD procedure calling and to provide explicit GPU manipulation such as allocation of GPU memory buffers and host-GPU data transfer.

Purpose of nvcc

This compilation trajectory involves several splitting, compilation, preprocessing, and merging steps for each CUDA source file, and several of these steps are subtly different for different modes of CUDA compilation (such as compilation for device emulation, or the generation of 'fat device code binaries'). It is the purpose of the CUDA compiler driver nvcc to hide the intricate details of CUDA compilation from developers. Additionally, instead of being a specific CUDA compilation driver, nvcc mimics the behavior of general-purpose compiler drivers (such as gcc): it accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process. All non-CUDA compilation steps are forwarded to a general C compiler that is available on the current platform, and in case this compiler is an instance of the Microsoft Visual Studio compiler, nvcc will translate its options into appropriate 'cl' command syntax.
This extended behavior plus 'cl' option translation is intended to support application build and make scripts when these must be portable across Linux and Windows platforms.

/* --------------------------- target code ------------------------------ */
struct acosParams {
    float *arg;
    float *res;
    int    n;
};

__global__ void acos_main (struct acosParams parms)
{
    int i;
    for (i = threadIdx.x; i < parms.n; i += ACOS_THREAD_CNT) {
        parms.res[i] = acosf(parms.arg[i]);
    }
}

/* --------------------------- host code ------------------------------ */
int main (int argc, char *argv[])
{
    cudaError_t cudaStat;
    float* acosRes = 0;
    float* acosArg = 0;
    float* arg = malloc(N*sizeof(arg[0]));
    float* res = malloc(N*sizeof(res[0]));
    struct acosParams funcParams;

    ... fill arguments array 'arg' ....

    cudaStat = cudaMalloc ((void **)&acosArg, N * sizeof(acosArg[0]));
    cudaStat = cudaMalloc ((void **)&acosRes, N * sizeof(acosRes[0]));
    cudaStat = cudaMemcpy (acosArg, arg, N * sizeof(arg[0]),
                           cudaMemcpyHostToDevice);

    funcParams.res = acosRes;
    funcParams.arg = acosArg;
    funcParams.n   = N;

    acos_main<<<1,ACOS_THREAD_CNT>>>(funcParams);

    cudaStat = cudaMemcpy (res, acosRes, N * sizeof(res[0]),
                           cudaMemcpyDeviceToHost);

    ... process result array 'res' ....
}

Figure 1: Example of CUDA source file

Compilation Phases

Nvcc identification macro

Nvcc predefines the macro __CUDACC__. This macro can be used in sources to test whether they are currently being compiled by nvcc.

Nvcc phases

A compilation phase is a logical translation step that can be selected by command line options to nvcc.
A single compilation phase can still be broken up by nvcc into smaller steps, but these smaller steps are 'just' implementations of the phase: they depend on seemingly arbitrary capabilities of the internal tools that nvcc uses, and all of these internals may change with a new release of the CUDA Toolkit. Hence, only compilation phases are stable across releases, and although nvcc provides options to display the compilation steps that it executes, these are for debugging purposes only and must not be copied into build scripts.

Nvcc phases are selected by a combination of command line options and input file name suffixes, and the execution of these phases may be modified by other command line options. In phase selection, the input file suffix defines the phase input, while the command line option defines the required output of the phase. A later chapter provides a full explanation of the nvcc command line options, and another explains more on the different input and intermediate file types. The following paragraphs list the recognized file name suffixes and the supported compilation phases.

Supported input file suffixes

The following table defines how nvcc interprets its input files:

.cu    CUDA source file, containing host code and device functions
.cup   Preprocessed CUDA source file, containing …

