
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax +1 (212) 869-0481, or permissions@acm.org.

Brook for GPUs: Stream Computing on Graphics Hardware

Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, Pat Hanrahan
Stanford University

Abstract

In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications: the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing.
For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.

CR Categories: I.3.1 [Computer Graphics]: Hardware Architecture - Graphics processors; D.3.2 [Programming Languages]: Language Classifications - Parallel Languages

Keywords: Programmable Graphics Hardware, Data Parallel Computing, Stream Computing, GPU Computing, Brook

1 Introduction

In recent years, commodity graphics hardware has rapidly evolved from being a fixed-function pipeline into having programmable vertex and fragment processors. While this new programmability was introduced for real-time shading, it has been observed that these processors feature instruction sets general enough to perform computation beyond the domain of rendering. Applications such as linear algebra [Krüger and Westermann 2003], physical simulation [Harris et al. 2003], and a complete ray tracer [Purcell et al. 2002; Carr et al. 2002] have been demonstrated to run on GPUs.

Originally, GPUs could only be programmed using assembly languages. Microsoft's HLSL, NVIDIA's Cg, and OpenGL's GLslang allow shaders to be written in a high-level, C-like programming language [Microsoft 2003; Mark et al. 2003; Kessenich et al. 2003]. However, these languages do not assist the programmer in controlling other aspects of the graphics pipeline, such as allocating texture memory, loading shader programs, or constructing graphics primitives. As a result, the implementation of applications requires extensive knowledge of the latest graphics APIs as well as an understanding of the features and limitations of modern hardware. In addition, the user is forced to express their algorithm in terms of graphics primitives, such as textures and triangles.
As a result, general-purpose GPU computing is limited to only the most advanced graphics developers.

This paper presents Brook, a programming environment that provides developers with a view of the GPU as a streaming coprocessor. The main contributions of this paper are:

• The presentation of the Brook stream programming model for general-purpose GPU computing. Through the use of streams, kernels and reduction operators, Brook abstracts the GPU as a streaming processor.

• The demonstration of how various GPU hardware limitations can be virtualized or extended using our compiler and runtime system; specifically, the GPU memory system, the number of supported shader outputs, and support for user-defined data structures.

• The presentation of a cost model for comparing GPU vs. CPU performance tradeoffs to better understand under what circumstances the GPU outperforms the CPU.

2 Background

2.1 Evolution of Streaming Hardware

Programmable graphics hardware dates back to the original programmable framebuffer architectures [England 1986]. One of the most influential programmable graphics systems was the UNC PixelPlanes series [Fuchs et al. 1989], culminating in the PixelFlow machine [Molnar et al. 1992]. These systems embedded pixel processors, running as a SIMD processor, on the same chip as framebuffer memory. Peercy et al. [2000] demonstrated how the OpenGL architecture [Woo et al. 1999] can be abstracted as a SIMD processor. Each rendering pass implements a SIMD instruction that performs a basic arithmetic operation and updates the framebuffer atomically. Using this abstraction, they were able to compile RenderMan to OpenGL 1.2 with imaging extensions. Thompson et al.
[2002] explored the use of GPUs as a general-purpose vector processor by implementing a software layer on top of the graphics library that performed arithmetic computation on arrays of floating point numbers.

SIMD and vector processing operators involve a read, an execution of a single instruction, and a write to off-chip memory [Russell 1978; Kozyrakis 1999]. This results in significant memory bandwidth use. Today's graphics hardware executes small programs where instructions load and store data to local temporary registers rather than to memory. This is a major difference between the vector and stream processor abstraction [Khailany et al. 2001].

The stream programming model captures computational locality not present in the SIMD or vector models through the use of streams and kernels. A stream is a collection of records requiring similar computation while kernels are functions applied to each element of a stream. A streaming processor executes a kernel over all elements of an input stream, placing the

[Figure 1 diagram labels: Input Registers, Output Registers, Constants, Temp Registers, Textures, Shader Program]
Figure 1: Programming model for current programmable graphics hardware. A shader program operates on a single input element (vertex or fragment) stored in the input registers and writes the execution result into the output registers.

© 2004 ACM 0730-0301/04/0800-0777 $5.00
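
The contrast between the vector and stream abstractions described in this section can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not Brook code and not the authors' implementation: the names `vector_saxpy`, `saxpy_kernel`, and `run_kernel` are hypothetical, plain lists stand in for streams, and local variables stand in for temporary registers. It uses SAXPY (alpha * x + y), one of the applications named in the abstract.

```python
from typing import Callable, List, Sequence


def vector_saxpy(alpha: float, x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Vector-style SAXPY: each operation is a separate full pass over memory."""
    # Pass 1: a single instruction (multiply) applied to every element.
    # The intermediate array is written out in full.
    scaled = [alpha * xi for xi in x]
    # Pass 2: a single instruction (add) that reads the intermediate back in.
    return [si + yi for si, yi in zip(scaled, y)]


def saxpy_kernel(alpha: float, xi: float, yi: float) -> float:
    """Kernel: a function applied to one element of the input streams."""
    # The temporary lives in a local variable (a stand-in for a register),
    # so no intermediate array is ever materialized.
    scaled = alpha * xi
    return scaled + yi


def run_kernel(
    kernel: Callable[[float, float, float], float],
    alpha: float,
    x: Sequence[float],
    y: Sequence[float],
) -> List[float]:
    """Stream-style execution: apply the kernel to every element of the streams."""
    return [kernel(alpha, xi, yi) for xi, yi in zip(x, y)]
```

Both versions compute the same result. The difference the sketch tries to show is where the intermediate product lives: the vector version writes and rereads a whole temporary array, whereas the kernel version keeps it local to the per-element computation, which is the locality the stream model is said to capture.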

