Graphics Hardware
From last time…
Graphics from a system’s perspective
OpenGL 1.4 - Graphics Past
OpenGL 2.0 - Graphics Today
A GPU block diagram
Vertex processor capabilities
Vertex processor Inputs & Outputs
Fragment processor capabilities
Fragment processor Inputs & Outputs
GPU Programmability
Historical: GeForce 3 Vertex Processor
GPU/CPU Differences
Recent GPUs (GeForce 6)
Recent fragment processors
Fragment unit performance
Programming the Beast
High-level languages to the rescue
Explosion of GPU HLLs
Shading language uses
HLLs are still hard to use
Enter the GPU IDE
FX Composer & RenderMonkey
GPGPU
Graphics hardware future
Next time

Graphics Hardware
Computer Graphics
COMP 770 (236)
Spring 2007
Instructor: Brandon Lloyd
2/26/07

From last time…
■ Texture coordinates
■ Uses of texture maps
  ° reflectance and other surface parameters
  ° lighting
  ° geometry
■ Solid textures

Graphics from a system’s perspective
■ Graphics operations most frequently executed on a co-processor called a Graphics Processing Unit (GPU)
■ Dedicated buses between the “host” CPU and the GPU
  ° AGP, PCI Express
■ Separate GPU memory
  ° Framebuffer, textures, etc.
■ Shared memory with CPU

OpenGL 1.4 - Graphics Past
■ Fixed-function graphics pipeline
  ° every step neatly planned
■ PHILOSOPHY: Performance > Flexibility
■ Extended by committee
■ Why process anything other than polygons or the occasional pixel?
A fragment is a potential…
[Diagram: fixed-function pipeline. Labels: Host Commands; Vertex Transforms; Cull, Clip & Project; Process and Rasterize Primitive; Fragment Processing; Per-Fragment Operations; Frame Buffer Operations; Frame Buffer; Texture Memory; Read Back Control; Pixel Pack & Unpack; Display]

OpenGL 2.0 - Graphics Today
■ Programmable processing units
  ° Programmable vertex and fragment processors
  ° (Exposes what was always there beneath the covers)
■ Texture memory – general-purpose data storage
[Diagram: the same pipeline as on the previous slide, with Vertex Processor and Fragment Processor blocks in place of Vertex Transforms and Fragment Processing]

A GPU block diagram
■ GeForce 6 Series
■ Massive parallelism
  ° pipelining
  ° multiple data paths
■ Mix of programmable and hardwired function blocks
■ Simple inter-processor connectivity
■ High-bandwidth memory interfaces

Vertex processor capabilities
■ Lighting, Material and Geometry flexibility
■ Vertex programs replace the following parts of the pipeline:
  ° Vertex & Normal transformation
  ° Normalization and rescaling
  ° Per-Vertex Lighting Calculations
  ° Color application
  ° Texture coordinate generation & transformation
■ The vertex program does NOT replace:
  ° Perspective divide and viewport (NDC) mapping
  ° Clipping
  ° Backface culling
  ° Primitive assembly (Triangle setup, edge equations, etc.)

Vertex processor Inputs & Outputs
■ Vertex “shader” is supplied with a number of parameters
  ° Vertex parameters, OpenGL state, user supplied parameters
■ Results written into prearranged locations (registers) that are “understood” by later processing steps
[Diagram: Vertex Processor inputs and outputs]
■ Inputs
  ° Standard OpenGL attributes: glColor, glNormal, glVertex, glMultiTexCoord
  ° User-Defined Attributes
  ° User-Defined Uniform Variables: eyePosition, lightPosition, modelScaleFactor, etc.
  ° Standard OpenGL State: ModelViewMatrix, glLightSource[0,..n], glFogColor, glFrontMaterial, etc.
■ Outputs
  ° Standard OpenGL variables: Vertex & texture coords, Vertex color
  ° User-Defined variables: Model coordinates, Normals, hVector, toEyeVector, etc.

Fragment processor capabilities
■ Flexibility for texturing and per-pixel operations
■ Fragment programs replace the following parts of the OpenGL pipeline:
  ° Operations on interpolated values
  ° Texture access
  ° Texture application (modulate, add)
  ° Fog (color, depth)
  ° Color sums (blends, mattes)
  ° Pixel zoom
  ° Scale and bias
  ° Color table lookup
  ° Convolution
  ° Color matrix
■ The Fragment shader does NOT replace:
  ° Scan Conversion
  ° Coverage
  ° Scissor
  ° Alpha test
  ° Stencil test
  ° Logical ops
  ° Plane masking
  ° Histogram
  ° Pixel packing and unpacking
  ° Stipple
  ° Depth test
  ° Alpha blending
  ° Dithering
  ° Z-buffer replacement test

Fragment processor Inputs & Outputs
■ Fragment “shader” is supplied with a number of parameters
  ° fragment parameters, OpenGL state, user supplied parameters
■ Results written into prearranged locations (registers) that are “understood” by later processing steps
[Diagram: Fragment Processor inputs and outputs]
■ Inputs
  ° Standard Rasterizer attributes: color (r, g, b, a), depth (z), textureCoordinates
  ° User-Defined Attributes: Normals, modelCoord, density, etc.
  ° User-Defined Uniform Variables: eyePosition, lightPosition, modelScaleFactor, epsilon, etc.
  ° Texture Memory: Textures, Tables, Temp Storage
■ Outputs
  ° Standard OpenGL variables: FragmentColor, FragmentDepth

GPU Programmability
■ The first Vertex and Fragment programs were written in low-level, H/W-specific assembly languages
  ° specific capabilities (e.g. floating point only in Vertex shaders, fixed-point only in Fragment shaders)
■ Trend is toward Higher-Level languages
  ° GeForce 8800 has unified shaders (same capabilities for both Vertex and Fragment shaders)
[Diagram: Application; Vertex Processor; Process and Rasterize Primitive; Fragment Processor; Per Fragment & Frame Buffer Ops; Frame Buffer; with an Application Program, a Vertex “Shader” Program and a Fragment “Shader” Program]

Historical: GeForce 3 Vertex Processor
■ In the beginning, resources were limited
■ Difficult to do anything, even at the assembly level
■ Useful macros became the programming method of choice, for
  ° Vector-scalar mult
  ° Vector-vector add
  ° Dot-product
  ° Normalize
[Diagram: GeForce 3 vertex processor resources around a Vector Floating Pt Datapath]
  ° Vertex Attributes: 16x4 registers
  ° Vertex Program: 128 instrs
  ° Vertex Results: 15x4 registers
  ° Uniform Parameters (don’t change on each vertex): 96x4 registers
  ° Temp Registers: 12x4 registers

GPU/CPU Differences
■ Early GPUs offered no branching support
■ Conditional operations instead
  ° if (regA < 0) regB = regC
■ No general indirect access to memory (i.e. lookup tables, textures, etc.)
■ Limited Arrays (uniform parameters)
■ Fixed vector sizes (2, 3 & 4)

Recent GPUs (GeForce 6)
■ 512 total instructions
  ° 64K executed per primitive
■ Independent execution (MIMD)
■ Branching
■ Subroutine calls
■ Flexible per-vertex processing
■ General purpose vector registers
  ° 4-element (x,y,z,w)
■ Vertex Cache
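
The conditional operation `if (regA < 0) regB = regC` and the macro list for the GeForce 3 vertex processor can be illustrated with a small Python sketch. It is not taken from the slides. The tuple-of-four-floats register model and all helper names (`slt`, `select`, `conditional_move`, `dot`, `scale`, `add`, `normalize`) are assumptions chosen for illustration; programs of that era were written in hardware-specific assembly. The point is that both candidate values are always computed and a 0/1 mask picks between them, so no branch is needed.

```python
import math

ZERO = (0.0, 0.0, 0.0, 0.0)

def slt(a, b):
    """Set-on-less-than: 1.0 where a < b, else 0.0, per component."""
    return tuple(float(x < y) for x, y in zip(a, b))

def select(mask, if_set, if_clear):
    """Blend two registers with a 0/1 mask; no branch on the data path."""
    return tuple(m * s + (1.0 - m) * c
                 for m, s, c in zip(mask, if_set, if_clear))

def conditional_move(reg_a, reg_b, reg_c):
    """Per-component equivalent of: if (regA < 0) regB = regC."""
    return select(slt(reg_a, ZERO), reg_c, reg_b)

# Hypothetical "macros" built from the same few vector operations.
def dot(a, b):
    """Dot product of two registers."""
    return sum(x * y for x, y in zip(a, b))

def scale(s, v):
    """Vector-scalar multiply."""
    return tuple(s * x for x in v)

def add(a, b):
    """Vector-vector add."""
    return tuple(x + y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    return scale(1.0 / math.sqrt(dot(v, v)), v)

if __name__ == "__main__":
    reg_a = (-1.0, 2.0, 0.0, -3.0)
    reg_b = (1.0, 1.0, 1.0, 1.0)
    reg_c = (9.0, 9.0, 9.0, 9.0)
    print(conditional_move(reg_a, reg_b, reg_c))
    print(normalize((3.0, 0.0, 4.0, 0.0)))
```

Only components where `reg_a` is negative take the value from `reg_c`; the others keep the old `reg_b` value, which is how a per-component conditional behaves when no branching is available.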