Rapidmind

Background on GPGPU

GPU Tutorial
• Goal: 3-D scene -> 2-D image
• 2 main stages:
  – Convert 3-D coordinates to 2-D windows (vertex processing)
  – Fill in the 2-D windows (fragment processing)

GPU Hardware Pipeline
• CPU side: the application issues vertices (3D) and graphics state
• GPU side: Transform & Light -> Assemble Primitives -> Rasterize -> Shade
  – Transform & Light produces transformed, lit vertices (2D)
  – Assemble Primitives produces screen-space triangles (2D)
  – Rasterize produces fragments (pre-pixels)
  – Shade produces the final pixels (color, depth)
• Video memory holds textures; render-to-texture feeds results back into the pipeline

GPU Parallelism
• Parallelism at the vertex and fragment calculations: a bank of vertex processors (vp) feeds the rasterizer, which feeds a larger bank of fragment processors (fp) writing into the frame buffer

GPU Programmability
• Vertex and fragment processors can be programmed
• Shader = a program written for the vertex or fragment calculations
• Vertex shaders: transformation, lighting
• Fragment shaders: texture, color, fog

GPU SIMD
• Vertex processors all run the SAME shader program
• Fragment processors all run the SAME shader program

GPU Drawbacks
• No integer data operands
• No integer operations – e.g.
bit shift, AND, OR, XOR, NOT
• No double-precision arithmetic
• Unusual programming model

GPU Improvement
• NVIDIA GeForce G80 – unified pipeline and shader
• CUDA – Compute Unified Device Architecture
• Unified stream processors
  – Vertices, pixels, geometry, physics
  – General-purpose floating-point processors
  – Scalar processors rather than vector processors

NVIDIA GeForce 8800: Facts and Motivations

Why Are GPUs So Fast?
• The GPU was originally specialized for math-intensive, highly parallel computation
• So more transistors can be devoted to data processing rather than data caching and flow control

Problem: GPGPU
• OLD: GPGPU – trick the GPU into general-purpose computing by casting the problem as graphics
  – Turn data into images ("texture maps")
  – Turn algorithms into image synthesis ("rendering passes")
• Promising results, but:
  – Tough learning curve, particularly for non-graphics experts
  – Potentially high overhead of the graphics API
  – Highly constrained memory layout and access model
  – Need for many passes drives up bandwidth consumption
• NEW: GPGPU – many high-level tools are now available
  – RapidMind, PeakStream (now acquired by Google), CUDA, …

Platform Overview and Programming Model

Platform overview
• RapidMind is a development and runtime platform that enables single-threaded, manageable applications to fully access multi-core processors.
• With RapidMind, developers continue to write code in standard C++ and use their existing skills, tools, and processes.
• The RapidMind platform then parallelizes the application across multiple cores and manages its execution.

Platform overview
• API
  – Intuitive, integrates with existing C++ compilers, and requires no new tools or workflow
• Platform
  – Code Optimizer analyzes and optimizes computations to remove overhead
  – Load Balancer plans and synchronizes work to keep all cores fully utilized
  – Data Manager reduces data bottlenecks
  – Logging/Diagnostics detects and reports performance bottlenecks
• Processor Support Modules
  – x86 processors from AMD and Intel
  – ATI/AMD and NVIDIA GPUs
  – Cell Blade, Cell Accelerator Board, PS3

SIMD (Single Instruction Multiple Data)
• All parallel execution units are synchronized – they respond to a single instruction from a single program counter
• Operates on vectors of data, all of the same type
  – The member elements of a vector must have the same meaning for the parallelism to be useful
• Achieves data-level parallelism

SPMD (Single Program Multiple Data)
• A subcategory of MIMD (Multiple Instruction Multiple Data)
• Tasks are split up and run simultaneously on different processors with different input data
• Processors run the program at independent points, as opposed to the lockstep execution of SIMD
• Usually refers to message passing rather than shared memory

GPU SIMD/SPMD
• The processors all share the same program counter and pipeline
  – When processor 1 is at instruction 23, all the processors are at instruction 23
• Limited support for control flow:
  – Each processor has its own execution mask that can conditionally be executed for one instruction
  – Thus, if a loop starts at instruction 10 and ends with a conditional branch at instruction 23, and just one processor has to continue looping while all 127 other processors are ready to leave the loop, the 127 are masked off from executing until the single processor has finally exited the loop
• More powerful than regular SIMD, but control flow still carries overhead

GPU SIMD cont.
• Sub-grouping reduces this impact, as each subgroup has its own program counter, set of masks, and processors. If the loop scenario occurs, only the processors in that group are affected – in a subgroup of 32 processors, say, 1 loops and the other 31 are masked off. The processors in the other subgroups are not affected.
• Note: this is believed to be a feature of the G80 that makes it more suitable for GPGPU.
  It is not very clear whether GLSL can make use of this or not.

Rapidmind SPMD
• Allows control flow in the kernel program
• More powerful than SIMD
• Example code:

    Program p;
    p = RM_BEGIN {
        In<Value3f> a, b;
        Out<Value3f> c;
        Value3f d = f(a, b);
        RM_IF ( all( a > 2.0f ) ) {
            c = d + a * 2.0f;
        } RM_ELSE {
            c = d - a * 2.0f;
        } RM_ENDIF;
    } RM_END;

• The control flow can be converted to corresponding control flow in GLSL, but the overhead of control flow (due to the hardware) still exists

Just-in-Time Compilation
• Converts the program definition into OpenGL code at runtime
• Program algebra: operations on the programs (discussed later)
• Two modes: retained mode / immediate mode

Just-in-Time Compilation
• First, it decides which "backend" should be responsible for the program's execution
  – Backends form the connection between the RapidMind platform and a particular piece of target hardware, e.g. the Cell BE, OpenGL-based GPUs, and a fallback backend
• Once a suitable backend has been chosen (a process that is generally instantaneous), it is asked to execute the program under the given conditions
  – The first time this is done generally causes the program object to be compiled for that particular backend, similar to the way a JIT environment behaves. Once a program has been compiled, it is …


UCLA COMSCI 239 - Rapidmind
