UT CS 395T - OpenCL - D1892959

School name University of Texas at Austin

Course CS 395T - Multicore Operating Systems Implementation

Pages 15

This preview shows page 1-2-3-4-5 out of 15 pages.

Unformatted text preview:

Outline
  Introduction
  OpenCL Design Goals
  OpenCL Platform Model
  OpenCL Programming Model
  OpenCL Task-Parallel Kernels
  OpenCL Memory Model
  OpenCL Objects
  OpenCL Kernel Objects
  OpenCL Program Objects
  Overall Pipeline
  OpenCL C Language
  Summary

OpenCL

Introduction
  Open standard for parallel programming across heterogeneous devices
  Devices can consist of CPUs, GPUs, embedded processors etc. – uses all the processing resources available
  Includes a language based on C99 for writing kernels and an API used to define and control the devices
  Parallel computing through task-based and data-based parallelism

OpenCL Design Goals
  Use all computational resources in the system
  Platform independence
  Provide a data- and task-parallel computational model
  Provide a programming model which abstracts the specifics of the underlying hardware
  Specify accuracy of floating-point computations
  Support both desktop and handheld/portable devices

OpenCL Platform Model
  Host connected to one or more OpenCL devices
  Device consists of one or more cores
  Execution per processor may be SIMD or SPMD
  Contexts group together devices and enable inter-device communication
  [Figure: host and contexts containing Device A (CPU), Device B (GPU) and Device C (DSP)]

OpenCL Programming Model
  Kernel – basic unit of execution – data parallel
  Program – collection of kernels and other related functions
  Kernels executed across a collection of work-items – one work-item per computation
  Work-items grouped together into workgroups
  Workgroups executed together on one device
  Multiple workgroups are executed independently
  Applications queue kernel instances for execution in-order, but they may be executed in-order or out-of-order

OpenCL Task-Parallel Kernels
  Some compute devices can also execute task-parallel kernels
  Execute as a single work-item
  Implemented as either a kernel in OpenCL C or a native C/C++ function

OpenCL Memory Model
  Private memory is available per work-item
  Local memory shared within workgroup
  No synchronization between workgroups
  Synchronization possible between work-items in a workgroup
  Global/constant memory for access by work-items – not synchronized
  Host memory – access through the CPU
  Memory management is explicit
  Data should be moved from host -> global -> local and back

OpenCL Objects
  Devices – multiple cores on CPU/GPU together taken as a single device
    Kernels executed across all cores in a data-parallel manner
  Contexts – enable sharing between different devices
    Devices must be within the same context to be able to share
  Queues – used for submitting work, one per device
  Buffers – simple chunks of memory like arrays; read-write access
  Images – 2D/3D data structures
    Access using read_image(), write_image()
    Either read or write within a kernel, but not both

OpenCL Kernel Objects
  Declared with a kernel qualifier
  Encapsulate a kernel function
  Kernel objects are created after the executable is built
  Execution
    Set the kernel arguments
    Enqueue the kernel
  Kernels are executed asynchronously
  Events used to track the execution status
    Used for synchronizing execution of two kernels
    clWaitForEvents(), clEnqueueMarker() etc.

OpenCL Program Objects
  Encapsulate
    A program source/binary
    List of devices and latest successfully built executable for each device
    List of kernel objects
  Kernel source specified as a string can be provided and compiled at runtime using clCreateProgramWithSource() – platform independence
  Overhead – compiling programs can be expensive
    OpenCL allows for reusing precompiled binaries

Overall Pipeline

OpenCL C Language
  Derived from ISO C99
    No standard headers, function pointers, recursion, variable-length arrays, bit fields
  Added features: work-items, workgroups, vector types, synchronization
  Address space qualifiers
  Optimized image access
  Built-in functions specific to OpenCL
  Data types
    char, uchar, short, ushort, int, uint, long, ulong
    bool, intptr_t, ptrdiff_t, size_t, uintptr_t, half
    image2d_t, image3d_t, sampler_t
    Vector types – portable, varying length (2, 4, 8, 16), endian safe
      char2, ushort4, int8, float16, double2 etc.

OpenCL C Language
  Work-item and workgroup functions
    get_work_dim(), get_global_size()
    get_group_id(), get_local_id()
  Vector operations and components are pre-defined as a language feature
  Kernel functions
    get_global_id() – returns the global ID of the calling work-item
  Conversions
    Explicit – convert_destType<_sat><_roundingMode>
    Reinterpret – as_destType
    Scalar and pointer conversions follow C99 rules
    No implicit conversions/casts for vector types

OpenCL C Language
  Address spaces
    Kernel pointer arguments must use global, local or constant
    Default for local variables is private
    image2d_t and image3d_t are always in global address space
    Global variables must be in constant address space
    Casting between different address spaces undefined

Summary
  Portable and high-performance framework
    Computationally intensive algorithms
    Access to all computational resources
    Well defined memory/computational model
  An efficient parallel programming language
    C99 with extensions for task and data parallelism
    Set of built-in functions for synchronization, math and memory operations
  Open standard for parallel computing across heterogeneous collection of
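
The programming model and the work-item functions described above can be illustrated with a small sketch. The code below is plain C, not OpenCL C, and does not use the OpenCL runtime: it serially emulates a one-dimensional launch to show how a work-item's global ID relates to its group ID and local ID (global ID = group ID * workgroup size + local ID, assuming no global offset). The names work_item_ids, vec_add_item and emulate_vec_add are hypothetical and chosen only for illustration. The check that the workgroup size evenly divides the global size is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the IDs an OpenCL C work-item would read via
   get_global_id(0), get_group_id(0) and get_local_id(0). */
typedef struct {
    size_t global_id;
    size_t group_id;
    size_t local_id;
} work_item_ids;

/* Per-work-item body: one work-item computes one output element.
   A comparable OpenCL C kernel would index with get_global_id(0). */
static void vec_add_item(work_item_ids ids,
                         const float *a, const float *b, float *c)
{
    c[ids.global_id] = a[ids.global_id] + b[ids.global_id];
}

/* Serially emulates a 1D launch of global_size work-items split into
   workgroups of local_size. Returns 0 on success, or -1 if local_size
   is zero or does not evenly divide global_size (assumption of this model). */
static int emulate_vec_add(size_t global_size, size_t local_size,
                           const float *a, const float *b, float *c)
{
    if (local_size == 0 || global_size % local_size != 0)
        return -1;

    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; ++group) {
        /* Work-items of one workgroup; a real device may run these in parallel. */
        for (size_t local = 0; local < local_size; ++local) {
            work_item_ids ids;
            ids.group_id = group;
            ids.local_id = local;
            ids.global_id = group * local_size + local;
            assert(ids.global_id < global_size);
            vec_add_item(ids, a, b, c);
        }
    }
    return 0;
}
```

Calling emulate_vec_add(8, 4, a, b, c) on three 8-element float arrays runs two workgroups of four work-items each and fills c with the element-wise sums. On a real device the loop iterations would be independent work-items, which is why the per-item body touches only the element selected by its own global ID.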
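
The two conversion forms listed above (explicit convert_destType with the optional _sat suffix, and reinterpreting as_destType) behave quite differently, which a short model may make clearer. The sketch below is plain C rather than OpenCL C. The functions model_convert_uchar_sat and model_as_uint are hypothetical stand-ins that mimic what convert_uchar_sat() and as_uint() do for scalar arguments. It assumes a 32-bit IEEE 754 float, as is typical on common platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models convert_uchar_sat(int): the value is converted, and anything
   outside the destination range is clamped to the nearest limit
   instead of wrapping around. */
static uint8_t model_convert_uchar_sat(int32_t x)
{
    if (x < 0)
        return 0;
    if (x > UINT8_MAX)
        return UINT8_MAX;
    return (uint8_t)x;
}

/* Models as_uint(float): the bit pattern is reused unchanged, so the
   numeric value is NOT preserved. Source and destination must have the
   same size. */
static uint32_t model_as_uint(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under these assumptions, model_convert_uchar_sat(300) yields 255 and model_convert_uchar_sat(-5) yields 0, while model_as_uint(1.0f) yields the bit pattern 0x3F800000 rather than the integer 1.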
