UT CS 395T - OpenCL



OpenCL

Introduction
- Open standard for parallel programming across heterogeneous devices
- Devices can include CPUs, GPUs, embedded processors, etc.; OpenCL uses all the processing resources available
- Includes a language based on C99 for writing kernels, and an API used to define and control the devices
- Parallel computing through both task-based and data-based parallelism

OpenCL Design Goals
- Use all computational resources in the system
- Platform independence
- Provide a data- and task-parallel computational model
- Provide a programming model that abstracts the specifics of the underlying hardware
- Specify the accuracy of floating-point computations
- Support both desktop and handheld/portable devices

OpenCL Platform Model
- A host is connected to one or more OpenCL devices
- Each device consists of one or more cores
- Execution per processor may be SIMD or SPMD
- Contexts group devices together and enable inter-device communication
[Figure: a host connected to Device A (CPU), Device B (GPU), and Device C (DSP), with the devices grouped into contexts]

OpenCL Programming Model
- Kernel: the basic unit of execution; data parallel
- Program: a collection of kernels and other related functions
- Kernels are executed across a collection of work-items, one work-item per computation
- Work-items are grouped together into work-groups
- The work-items of a work-group execute together on one device
- Multiple work-groups are executed independently
- Applications queue kernel instances for execution in order, but the instances may be executed in order or out of order

OpenCL Task-Parallel Kernels
- Some compute devices can also execute task-parallel kernels
- A task-parallel kernel executes as a single work-item
- It is implemented either as a kernel in OpenCL C or as a native C/C++ function

OpenCL Memory Model
- Private memory is available per work-item
- Local memory is shared within a work-group
- Work-items within a work-group can synchronize; there is no synchronization between work-groups
- Global/constant memory is accessible by all work-items, but access is not synchronized
- Host memory is accessed through the CPU
- Memory management is explicit: data should be moved from host -> global -> local memory and back

OpenCL Objects
- Devices: the multiple cores on a CPU/GPU are treated together as a single device
  - Kernels are executed across all cores in a data-parallel manner
- Contexts: enable sharing between different devices
  - Devices must be within the same context to share resources
- Queues: used for submitting work; each queue feeds a single device
- Buffers: simple chunks of memory, like arrays; read-write access
- Images: 2D/3D data structures
  - Accessed using read_imagef()/read_imagei()/read_imageui() and write_imagef()/write_imagei()/write_imageui()
  - Within a kernel, a given image can be either read or written, but not both

OpenCL Kernel Objects
- A kernel is declared with the __kernel qualifier
- A kernel object encapsulates a kernel function
- Kernel objects are created after the program executable is built
- Execution:
  - Set the kernel arguments (clSetKernelArg())
  - Enqueue the kernel (e.g., clEnqueueNDRangeKernel())
  - Kernels are executed asynchronously
- Events are used to track execution status
  - Used for synchronizing the execution of two kernels
  - clWaitForEvents(), clEnqueueMarker(), etc.

OpenCL Program Objects
- A program object encapsulates:
  - A program source or binary
  - The list of devices and the latest successfully built executable for each device
  - The list of kernel objects
- Kernel source can be supplied as a string with clCreateProgramWithSource() and compiled at runtime with clBuildProgram(), which keeps the application platform independent
- Overhead: compiling programs can be expensive
- OpenCL allows reusing precompiled binaries (clCreateProgramWithBinary())

Overall Pipeline
[Figure: overall OpenCL pipeline]

OpenCL C Language
- Derived from ISO C99
- Not supported: standard C99 headers, function pointers, recursion, variable-length arrays, bit fields
- Added features:
  - Work-items, work-groups, vector types, synchronization
  - Address space qualifiers
  - Optimized image access
  - Built-in functions specific to OpenCL
- Data types:
  - char, uchar, short, ushort, int, uint, long, ulong
  - bool, intptr_t, ptrdiff_t, size_t, uintptr_t, half
  - image2d_t, image3d_t, sampler_t
  - Vector types: portable, varying length (2, 4, 8, 16), endian safe; e.g., char2, ushort4, int8, float16, double2

OpenCL C Language
- Work-item and work-group functions: get_work_dim(), get_global_size(), get_group_id(), get_local_id()
- Vector operations and components are predefined as a language feature
- Kernel functions: get_global_id(dim) returns the global ID of the calling work-item in dimension dim
- Conversions:
  - Explicit: convert_destType<_sat><_roundingMode>
  - Reinterpret: as_destType
  - Scalar and pointer conversions follow C99 rules
  - No implicit conversions/casts for vector types

OpenCL C Language
- Address spaces:
  - Kernel pointer arguments must use the global, local, or constant address space
  - Function-scope variables default to the private address space
  - image2d_t and image3d_t objects are always in the global address space
  - Program-scope (global) variables must be in the constant address space
  - Casting between different address spaces is undefined

Summary
- Portable and high-performance framework
- For computationally intensive algorithms
- Access to all computational resources
- Well-defined memory/computational model
- An efficient parallel programming language
- C99 with extensions for task and data parallelism
- Set of built-in functions for synchronization, math, and memory operations
- Open standard for parallel computing across heterogeneous collection of
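The Programming Model slide says a kernel runs across many work-items grouped into work-groups, and the OpenCL C slides list get_global_id(), get_group_id() and get_local_id(). The sketch below is a plain-C emulation, not OpenCL host code: it runs a 1-D vector-add kernel serially, using hypothetical stand-ins for the work-item built-ins (dimension 0 only, no global offset) and empty macros so the __kernel and __global qualifiers compile as ordinary C. The names emu_group_id, emu_local_id, emu_local_size and run_vec_add are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulation state: a real OpenCL runtime supplies these IDs.
   Here a serial host loop sets them before running each work-item. */
static size_t emu_group_id, emu_local_id, emu_local_size;

/* 1-D stand-ins for the OpenCL C work-item built-ins; dim is ignored. */
static size_t get_group_id(unsigned dim) { (void)dim; return emu_group_id; }
static size_t get_local_id(unsigned dim) { (void)dim; return emu_local_id; }
static size_t get_local_size(unsigned dim) { (void)dim; return emu_local_size; }

/* With no global offset: global ID = group ID * local size + local ID. */
static size_t get_global_id(unsigned dim) {
    return get_group_id(dim) * get_local_size(dim) + get_local_id(dim);
}

/* Let the OpenCL C qualifiers compile as plain C. */
#define __kernel
#define __global

/* Kernel body written as in OpenCL C: each work-item adds one element. */
__kernel void vec_add(__global const float *a, __global const float *b,
                      __global float *c) {
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}

/* Serial stand-in for enqueuing a 1-D range of global_size work-items split
   into work-groups of local_size. Work-groups run one after another here;
   a real device may run them in any order or concurrently. */
static void run_vec_add(const float *a, const float *b, float *c,
                        size_t global_size, size_t local_size) {
    /* OpenCL 1.x requires global_size to be a multiple of local_size. */
    assert(local_size > 0 && global_size % local_size == 0);
    emu_local_size = local_size;
    for (emu_group_id = 0; emu_group_id < global_size / local_size; ++emu_group_id)
        for (emu_local_id = 0; emu_local_id < local_size; ++emu_local_id)
            vec_add(a, b, c);
}
```

For example, run_vec_add(a, b, c, 8, 4) launches two work-groups of four work-items each; the work-item with group ID 1 and local ID 2 handles element 1 * 4 + 2 = 6. On a real device the same kernel would be launched through clSetKernelArg() and clEnqueueNDRangeKernel() instead of a host loop.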

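The Memory Model slide says local memory is shared within a work-group, work-items in a group can synchronize, and work-groups cannot synchronize with each other. A common pattern that follows from these rules is a two-stage reduction: each work-group sums its slice in local memory and writes one partial result to global memory, and the partial results are combined afterwards. The C sketch below emulates that serially on the host; the function name reduce_sum and the MAX_LOCAL cap are hypothetical, and comments mark where OpenCL C would call barrier(CLK_LOCAL_MEM_FENCE).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCAL 256 /* hypothetical cap on the emulated work-group size */

/* Serial emulation of a two-stage sum reduction.
   Stage 1, per work-group: load global -> local, tree-reduce in local memory,
   and write one partial sum back to global memory.
   Stage 2, on the host: add the partial sums, since work-groups cannot
   synchronize with one another inside a kernel. */
float reduce_sum(const float *global_in, size_t n, size_t local_size,
                 float *global_partials) {
    assert(local_size > 0 && local_size <= MAX_LOCAL);
    assert((local_size & (local_size - 1)) == 0); /* power of two */
    assert(n % local_size == 0);

    size_t num_groups = n / local_size;
    for (size_t g = 0; g < num_groups; ++g) {
        float local_mem[MAX_LOCAL]; /* plays the role of a __local array */

        /* Each work-item copies one element from global to local memory. */
        for (size_t lid = 0; lid < local_size; ++lid)
            local_mem[lid] = global_in[g * local_size + lid];
        /* OpenCL C: barrier(CLK_LOCAL_MEM_FENCE); here the loop above has
           already completed every work-item's load. */

        /* Tree reduction: halve the number of active work-items each step. */
        for (size_t stride = local_size / 2; stride > 0; stride /= 2) {
            for (size_t lid = 0; lid < stride; ++lid)
                local_mem[lid] += local_mem[lid + stride];
            /* OpenCL C: another barrier(CLK_LOCAL_MEM_FENCE) after each step. */
        }

        global_partials[g] = local_mem[0]; /* work-item 0 writes the group's result */
    }

    /* Stage 2: combine the per-group results outside the kernel. */
    float total = 0.0f;
    for (size_t g = 0; g < num_groups; ++g)
        total += global_partials[g];
    return total;
}
```

With 16 inputs and a local size of 4, the emulation produces four partial sums, one per work-group, which the host then adds. Combining the partial sums outside the kernel is a direct consequence of the missing synchronization between work-groups.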

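The OpenCL C data-type slides describe vector types such as float4, with vector operations and component access built into the language. OpenCL C writes a + b, a * 2.0f, and swizzles such as a.wzyx directly on vector values. Plain C has no such types, so the sketch below only mimics those semantics with a hypothetical float4_t struct and hypothetical helpers add4, scale4 and swizzle4.

```c
#include <assert.h>

/* Hypothetical plain-C stand-in for OpenCL C's float4: s[0]..s[3] play the
   role of the components .s0..s3, also named .x, .y, .z, .w. */
typedef struct { float s[4]; } float4_t;

/* OpenCL C: a + b adds vector operands component by component. */
static float4_t add4(float4_t a, float4_t b) {
    float4_t r;
    for (int k = 0; k < 4; ++k) r.s[k] = a.s[k] + b.s[k];
    return r;
}

/* OpenCL C: a * f with a scalar f widens f to every component. */
static float4_t scale4(float4_t a, float f) {
    float4_t r;
    for (int k = 0; k < 4; ++k) r.s[k] = a.s[k] * f;
    return r;
}

/* OpenCL C: swizzles such as a.wzyx (or a.s3210) build a new vector from
   selected components; here the component indices are passed explicitly. */
static float4_t swizzle4(float4_t v, int i0, int i1, int i2, int i3) {
    const int idx[4] = {i0, i1, i2, i3};
    float4_t r;
    for (int k = 0; k < 4; ++k) {
        assert(idx[k] >= 0 && idx[k] < 4); /* a 4-vector has only .s0 to .s3 */
        r.s[k] = v.s[idx[k]];
    }
    return r;
}
```

Here swizzle4(a, 3, 2, 1, 0) corresponds to a.wzyx in OpenCL C, returning the components of a in reverse order.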

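The Conversions bullet distinguishes explicit value conversions (convert_destType with optional _sat and rounding-mode suffixes) from bit reinterpretation (as_destType). The C sketch below mimics three of them: convert_uchar_sat, convert_int_rte and as_uint. The names match OpenCL C built-ins, but the bodies are hypothetical re-implementations written for illustration, assuming a 32-bit IEEE 754 float; they are not how an OpenCL compiler implements them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C re-implementations named after OpenCL C built-ins. */

/* convert_uchar_sat: out-of-range values clamp to 0 or 255, whereas a plain
   C cast to unsigned char wraps modulo 256. */
static unsigned char convert_uchar_sat(int x) {
    if (x < 0) return 0;
    if (x > 255) return 255;
    return (unsigned char)x;
}

/* convert_int_rte: round to nearest, ties to even. This sketch only accepts
   inputs in [-2^31, 2^31), so the result always fits in an int. */
static int convert_int_rte(float x) {
    assert(x >= -2147483648.0f && x < 2147483648.0f);
    double d = x;
    long long lo = (long long)d;           /* truncates toward zero */
    if (d < 0 && d != (double)lo) lo -= 1; /* now lo = floor(d) */
    double frac = d - (double)lo;          /* in [0, 1) */
    if (frac > 0.5 || (frac == 0.5 && lo % 2 != 0))
        lo += 1;                           /* ties go to the even neighbor */
    return (int)lo;
}

/* as_uint: reinterpret the 32 bits of a float as a uint; no value
   conversion happens. Assumes float is a 32-bit IEEE 754 type. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "as_uint needs a 32-bit float");
static uint32_t as_uint(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

For example, convert_uchar_sat(300) gives 255 where a plain C cast of 300 to unsigned char gives 44; convert_int_rte(2.5f) gives 2 and convert_int_rte(3.5f) gives 4; and as_uint(1.0f) gives 0x3F800000, the raw bit pattern of 1.0f rather than the value 1.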