OpenCL
Mike [email protected]
Oregon State University
mjb – November 21, 2011
Oregon State University Computer Graphics

Reaching the Promised Land
[Figure: chart with axes "Speed" and "General Programmability"; labels: NVIDIA GPUs, CUDA, Knights Corner, Intel CPUs]
But the problem is that you have to use a vendor-specific API.

OpenCL – Vendor-independent GPU Programming
Your OpenCL code is translated along one of three vendor paths:
• OpenCL for Knights Corner Systems: KC Compiler and Linker, producing KC code
• or OpenCL for NVIDIA Systems: CUDA Compiler and Linker, producing CUDA code
• or OpenCL for AMD/ATI Systems: CTM Compiler and Linker, producing CTM code
This happens in the vendor-specific driver.

OpenCL
• A C-like language, originally proposed by Apple, now an industry standard
• Like CUDA, OpenCL can share data with OpenGL
• You write one program, but designate a C/C++ part of it to run on the CPU and an OpenCL part to run on the GPU
• You can’t ask for threads in the OpenCL part, but the translation process might create them for you. (Also, you can use them in the CPU part via OpenMP, pthreads, etc.)

OpenCL wants you to break the problem down

    // Serial C version: one loop walks over all n elements.
    void
    mul( int n, float *a, float *b, float *c )
    {
        for ( int i = 0; i < n; i++ )
            c[i] = a[i] * b[i];
    }

    // OpenCL version: each work-item handles the single element picked by its global id.
    kernel void
    mul( global float *a, global float *b, global float *c )
    {
        int id = get_global_id( 0 );
        c[id] = a[id] * b[id];
    }

OpenCL also supports vector parallelism
Part of OpenCL is vector-oriented, meaning that it can perform a single instruction on multiple data values at the same time (SIMD).
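Looking back at the mul example for a moment: the step from the serial loop to the kernel can be mimicked in plain C. The sketch below is only a CPU emulation under stated assumptions, not real OpenCL. The names mul_work_item and launch_mul are hypothetical: the first plays the role of the kernel body, and the second stands in for the runtime that supplies each work-item with its id (the job get_global_id(0) does in the kernel).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one OpenCL work-item: it computes exactly one
   element, selected by id (the value get_global_id(0) would return). */
void mul_work_item(size_t id, const float *a, const float *b, float *c)
{
    c[id] = a[id] * b[id];
}

/* Hypothetical stand-in for the runtime: it "launches" n work-items.
   A real device may run them in any order and many at the same time;
   this emulation simply runs them one after another. */
void launch_mul(size_t n, const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t id = 0; id < n; id++)
        mul_work_item(id, a, b, c);
}
```

Calling launch_mul(n, a, b, c) produces the same result as the serial mul. The loop has not disappeared; it has moved out of the code you write and into whatever launches the work-items, which is why the kernel itself has no n parameter and no for statement.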
Vector data types are: charn, intn, floatn, where n = 2, 4, 8, or 16.

    float4 f, g;
    f = (float4)( 1.f, 2.f, 3.f, 4.f );

    float16 a16, x16, y16, z16;

    f.x = 0.;
    f.xy = g.zw;
    x16.s89ab = f;

    float16 a16 = x16 * y16 +
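The component selections used above (f.xy = g.zw and x16.s89ab = f) can be spelled out in plain C to show which elements move where. This is a rough sketch under stated assumptions: float4_emu, float16_emu and the three helper functions are hypothetical names invented for this illustration, and the per-component loop only imitates what an OpenCL device may do as a single vector operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C stand-ins for OpenCL's float4 and float16. */
typedef struct { float x, y, z, w; } float4_emu;
typedef struct { float s[16]; } float16_emu;

/* f.xy = g.zw : copy g's last two components into f's first two. */
void assign_xy_from_zw(float4_emu *f, const float4_emu *g)
{
    assert(f != NULL && g != NULL);
    f->x = g->z;
    f->y = g->w;
}

/* x16.s89ab = f : write f into components 8, 9, 10 (a) and 11 (b). */
void assign_s89ab(float16_emu *x16, const float4_emu *f)
{
    assert(x16 != NULL && f != NULL);
    x16->s[8]  = f->x;
    x16->s[9]  = f->y;
    x16->s[10] = f->z;
    x16->s[11] = f->w;
}

/* x16 * y16 : component-wise product. In OpenCL this is one vector
   expression; here it takes sixteen scalar multiplies. */
float16_emu mul16(float16_emu x, float16_emu y)
{
    float16_emu r;
    for (int i = 0; i < 16; i++)
        r.s[i] = x.s[i] * y.s[i];
    return r;
}
```

Components of a 16-wide vector are numbered with hexadecimal digits, so s89ab names elements 8 through 11. The other twelve elements of x16 are left untouched by that assignment.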