
Spring 2010
Prof. Hyesoon Kim

OpenCL Spec: http://www.khronos.org/registry/cl/specs/opencl-1.0.48.pdf

• OpenCL (Open Computing Language): a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors.
• Initiated by Apple Inc.; now supported by AMD, Intel, NVIDIA, etc.
• AMD gave up CTM (Close to Metal) and decided to support OpenCL
• NVIDIA will fully support OpenCL 1.0

[Figure: participating companies]

[Figure: OpenCL at the emerging intersection of CPU and GPU programming (heterogeneous computing)]
• CPUs: multiple cores driving performance increases; multi-processor programming
• GPUs: increasingly general-purpose data-parallel computing; improving numerical precision; graphics APIs and shading languages

• Supports both data- and task-based parallel programming models (CPU: task, GPU: data)
• Utilizes a subset of ISO C99 with extensions for parallelism
• Defines consistent numerical requirements based on IEEE 754
• Defines a configuration profile for handheld and embedded devices
• Efficiently interoperates with OpenGL, OpenGL ES, and other graphics APIs

• Software developers write parallel programs that will run on many devices
• Hardware developers target OpenCL
• Enables OpenCL on mobile and embedded silicon

• Platform Model
• Memory Model
• Execution Model
• Programming Model

• One host + one or more compute devices
  – Each compute device is composed of one or more compute units
  – Each compute unit is further divided into one or more processing elements

• OpenCL Program:
  – Kernels
    • Basic unit of executable code, similar to a C function
    • Data-parallel or task-parallel
  – Host Program
    • Collection of compute kernels and internal functions
    • Analogous to a dynamic library

• Kernel Execution
  – The host program invokes a kernel over an index space called an NDRange
    • NDRange = “N-Dimensional Range”
    • An NDRange can be a 1-, 2-, or 3-dimensional space
  – A single kernel instance at a point in the index space is called a work-item
    • Work-items have unique global IDs from the index space
    • Analogous to CUDA thread IDs
  – Work-items are further grouped into work-groups
    • Work-groups have a unique work-group ID
    • Work-items have a unique local ID within a work-group
    • Analogous to CUDA block IDs

• Total number of work-items = Gx × Gy
• Size of each work-group = Sx × Sy

• Contexts are used to contain and manage the state of the “world”
• Kernels are executed in contexts defined and manipulated by the host
  – Devices
  – Kernels: OpenCL functions
  – Program objects: kernel source and executable
  – Memory objects
• Command-queue: coordinates execution of kernels
  – Kernel execution commands
  – Memory commands: transfer or mapping of memory object data
  – Synchronization commands: constrain the order of commands
• Applications queue compute kernel execution instances
  – Queued in-order
  – Executed in-order or out-of-order
  – Events are used to implement appropriate synchronization of execution instances

• Shared memory
  – Relaxed consistency
  – (similar to CUDA)
• Global memory
  – Global memory in CUDA
• Constant memory
  – Constant memory in CUDA
• Local memory (local to a work-group)
  – Shared memory in CUDA
• Private memory (private to a work-item)
  – Local memory in CUDA

          Global               Constant             Local                Private
Host      Dynamic allocation   Dynamic allocation   Dynamic allocation   No allocation
          Read/write access    Read/write access    No access            No access
Kernel    No allocation        Static allocation    Static allocation    Static allocation
          Read/write access    Read-only access     Read/write access    Read/write access

• A relaxed-consistency memory model
  – Across work-items (threads): no consistency
  – Within a work-item (thread): load/store consistency (in-order execution)
  – Consistency of memory shared between commands is enforced through synchronization

• Define an N-dimensional computation domain
  – Each independent element of execution in an N-dimensional domain is called a work-item
  – The N-dimensional domain defines the total number of work-items that execute in parallel = global work size
• Work-items can be grouped together into a work-group
  – Work-items in a group can communicate with each other
  – Can synchronize execution among work-items in a group to coordinate memory access
• Execute multiple work-groups in parallel
  – Mapping of global work size to work-groups can be implicit or explicit

• The data-parallel execution model must be implemented by all OpenCL compute devices
• Users express parallelism by
  – using vector data types implemented by the device,
  – enqueuing multiple tasks, and/or
  – enqueuing native kernels developed using a programming model orthogonal to OpenCL.

• Synchronization among work-items in a single work-group
  – Similar to __syncthreads() in CUDA
• Synchronization points between commands and command-queues
  – Similar to multiple kernels in CUDA, but more general
  – Command-queue barrier
    • Ensures that all previously queued commands have executed and that their memory updates are visible
  – Waiting on an event

• OpenCL Platform layer: allows the host program to discover OpenCL devices and their capabilities and to create contexts.
• OpenCL Runtime: allows the host program to manipulate contexts once they have been created.
• OpenCL Compiler: creates program executables that contain OpenCL kernels.

• The platform layer allows applications to query for platform-specific features
• Querying platform info (e.g., the OpenCL profile)
• Querying devices
  – clGetDeviceIDs()
    • Finds out what compute devices are on the system
    • Device types include CPUs, GPUs, or accelerators
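
The relationship between global IDs, work-group IDs, and local IDs from the Kernel Execution slide can be illustrated with a small host-side simulation. This is only a sketch in plain Python, not OpenCL code: the function name enumerate_work_items and the example sizes are hypothetical, and it assumes a zero global offset and a global size that is an exact multiple of the work-group size in every dimension.

```python
from itertools import product


def enumerate_work_items(global_size, local_size):
    """Yield (global_id, group_id, local_id) for every work-item in an NDRange.

    global_size and local_size are tuples with 1, 2, or 3 entries.
    Assumption: zero global offset, and each global dimension is an
    exact multiple of the matching work-group dimension.
    """
    if len(global_size) != len(local_size) or not 1 <= len(global_size) <= 3:
        raise ValueError("NDRange must have 1, 2, or 3 matching dimensions")
    if any(g % s for g, s in zip(global_size, local_size)):
        raise ValueError("global size must be a multiple of work-group size")

    # Walk every point of the index space; each point is one work-item.
    for global_id in product(*(range(g) for g in global_size)):
        # The work-group ID says which group the work-item falls into.
        group_id = tuple(g // s for g, s in zip(global_id, local_size))
        # The local ID is the position of the work-item inside its group.
        local_id = tuple(g % s for g, s in zip(global_id, local_size))
        yield global_id, group_id, local_id


# Hypothetical 2-D example: Gx x Gy = 8 x 4 work-items, Sx x Sy = 4 x 2 per group.
items = list(enumerate_work_items((8, 4), (4, 2)))
total_work_items = len(items)                       # Gx * Gy
num_groups = len({group for _, group, _ in items})  # (Gx / Sx) * (Gy / Sy)
```

In each dimension the global ID equals group ID × work-group size + local ID, which is the same arithmetic a CUDA kernel performs with blockIdx, blockDim, and threadIdx.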



GT CS 4803 - Lecture Notes
