CMU CS 15463 - Lecture

This preview shows page 1-2-14-15-30-31 out of 31 pages.

3D Graphics Hardware
15-463 Graphics II
Spring 1999

Topics
Graphics Architecture
Uniprocessor Acceleration
Front-End Multiprocessing
  – Pipelined
  – Parallel
Back-End Multiprocessing
  – Pipelined
  – Parallel

Graphics Architecture

Programming Interface
Database contains primitives from model
Immediate mode
  – Program calls library with individual primitives
  – Need to enumerate entire database for each frame
  – No temporal coherence between frames
Retained mode
  – Primitives are remembered between frames
  – Program calls library to make incremental changes
  – Can potentially exploit temporal coherence

Frame Buffer Organization
Dedicated bank of RAM for pixel values
  – Components are rgb, alpha, z, etc.
CPU or display processor writes pixels
Video controller reads pixels
  – Generates video signal for monitor
Contention between CPU/DP and video controller

DRAM Organization
Read or write just a few bits at a time
Multiple chips in parallel for more bits
Bits organized internally in a 2D grid
Row select and column select identify particular bits
  – Row select has extra set-up latency
  – No penalty for column select
Page mode chips exploit this

Video RAM (VRAM) (1983)
Extra bit port for video signal generation
  – Avoids contention with video controller
Buffer loaded during “stolen” memory cycle
  – Whole row can be copied in a single cycle
Multiple banks needed for big frame buffers
  – Single chip can’t keep up with video signal

Uniprocessor Acceleration

Peripheral Display Processor
Reads/writes frame buffer in place of CPU
Acceleration for 2D operations
Read/write many pixels in parallel
Built-in support for lines, circles, bitblt, etc.
No transformations, floating-point, etc.
Now limited only by speed of memory

CPU Extensions
Additional instructions in CPU for 3D graphics
Pioneered by i860 (1989)
  – 64-bit pixel operations: linear interpolation, z-buffer compare, conditional update
MMX (1995?)
  – Reinterprets x86 floating-point “registers”
  – Up to 8 simultaneous integer operations
      8x8bit, 4x16bit, 2x32bit, 1x64bit
      (Clamped) arithmetic, logical, pack/unpack

3DNow (1998)
Reinterprets MMX registers
Two simultaneous floating-point operations
  – Add, subtract, multiply, min, max, etc.
  – Reciprocal and reciprocal square root approximation
      Three instructions implement Newton’s method
      On-chip table lookup for initial approximation
      Two successive iterations to refine approximation
Superscalar execution permits 4 operations/cycle

Performance Barriers
Floating-point geometry processing
  – Transformation, clipping, lighting
Integer pixel processing
  – Scan conversion
Frame-buffer memory bandwidth
  – Z-buffer comparison, shading, alpha blending
What can we do about these?

Front-End Multiprocessing

Pipelining
Extra processors for distinct tasks
Primitives are processed in discrete stages
  – Need result of stage i to compute stage i+1
Process stage i of primitive j concurrently with stage i+1 of primitive j-1
Increase in throughput, (small) increase in latency
  – Individual latency usually isn’t important

Parallelism
Extra processors for uniform tasks
SIMD
  – One instruction affects multiple operands
  – “Disable bit” for conditional operations
  – Advantage: no extra control logic
  – Disadvantage: inflexible for non-uniform tasks
  – Examples: MMX, 3DNow
MIMD
  – Each processor has its own instruction stream
  – Synchronization/interconnect can be difficult

Pipeline Front End
Many distinct stages naturally lead to pipelining
CPU usually handles display traversal
Geometry subsystem is floating-point intensive
  – Transformations can be done in parallel
      By component
      By vector (vertex or normal)
  – Transformation bundled with trivial accept/reject
  – One processor per clip plane
  – Division by w is expensive
      Compute 1/w and multiply x, y, and z by that

Geometry Engine (1982)
Used in early SGI machines
~One chip per stage of front-end pipeline
  – Configuration word determines which stage
Each chip reads and writes command stream
  – Vertices inserted and deleted during clipping
Replaced by commodity chips in later SGIs
  – Weitek 3332 and Intel i860

Parallel Front End
Significantly harder than pipelining front end
  – Need to recombine streams at back end for ordered rendering algorithms
  – Processor contention over shared database
RealityEngine processes geometry in parallel
  – Command processor sends primitives to geometry engines in round robin order
  – Geometry engine is not pipelined (single i860)
  – Triangle bus broadcasts transformed triangles to back-end processors

Back-End Multiprocessing

Taxonomy
Object order: outer loop over objects
  – Z-buffer, depth-sort, and BSP-tree algorithms
Image order: outer loop over pixels
Image parallel = parallel object order
  – Parallel inner loop over image pixels
  – Partitioned image, logic-enhanced memory
Object parallel = parallel image order
  – Parallel inner loop over objects
  – Processor-per-primitive, tree-structured

Pipelined Object Order
Polygon-edge-span processor pipeline
  – Polygon unit finds x, z, and rgb deltas for each edge
  – Edge unit computes left/right x, z, and rgb bounds/deltas for each span
  – Span unit interpolates z and rgb for each pixel
      Implements z buffer compare
Poor load balance: span processor is bottleneck
  – Polygons*edges*spans*pixels
  – Pixel cache to increase memory bandwidth
      Excellent locality

Pipelined Image Order
Scan-line pipeline
  – Y sorter finds first scan line of each polygon
  – Active segment sorter sorts segments on scan line by x
      A segment is a span of a single polygon
  – Visible span generator compares z values of segments
  – Shader computes rgb values of visible spans
No frame buffer if hardware is fast enough (!)
Again, poor load balance: shader is bottleneck
  – Need more than 1 processor/stage: parallel processing

Parallel Object Order/Image Parallel
Multiple processors render pixels in parallel
  – Predominant back-end architecture
Contiguous partition
  – Each processor handles a specific block of pixels
  – Don’t look at primitives outside my region
Interleaved partition
  – Each processor handles every nth pixel
  – Must look at every primitive
  – Worst case isn’t as bad, best case isn’t as good
  – Predominant architecture

Parallel Object Order/Image Parallel
SIMD
  – All processors work on same nxn pixel block
  – Single control logic for all processors
  – Many conditional branches (complex algorithms) result in poor utilization
MIMD
  – Processors can work on pixels in different blocks
  – Requires control logic for each processor
  – Better utilization, but needs fast interconnect

RealityEngine
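
The 3DNow slide describes reciprocal approximation as a table lookup followed by two Newton iterations. A minimal Python sketch of that idea follows. The 16-entry table, the name `reciprocal`, and the mantissa normalization are assumptions made for illustration; they are not a description of the actual hardware.

```python
import math

TABLE_BITS = 4  # hypothetical table size: 2**4 = 16 entries
# Entry k approximates 1/m for mantissas m in [1 + k/16, 1 + (k+1)/16),
# using the reciprocal of the interval midpoint.
RECIP_TABLE = [1.0 / (1.0 + (k + 0.5) / (1 << TABLE_BITS))
               for k in range(1 << TABLE_BITS)]


def reciprocal(a: float, iterations: int = 2) -> float:
    """Approximate 1/a by table lookup plus Newton refinement."""
    if a == 0.0 or not math.isfinite(a):
        raise ValueError("a must be finite and non-zero")
    sign = -1.0 if a < 0.0 else 1.0
    m, e = math.frexp(abs(a))      # abs(a) == m * 2**e with m in [0.5, 1)
    m, e = m * 2.0, e - 1          # renormalize so m is in [1, 2)
    k = int((m - 1.0) * (1 << TABLE_BITS))
    x = RECIP_TABLE[k]             # initial approximation of 1/m
    for _ in range(iterations):
        x = x * (2.0 - m * x)      # Newton step for f(x) = 1/x - m
    return sign * math.ldexp(x, -e)  # undo the normalization


if __name__ == "__main__":
    for value in (3.0, -0.25, 1234.5):
        print(value, reciprocal(value), 1.0 / value)
```

Each Newton step squares the relative error, so a table that is good to a few bits reaches roughly single-precision accuracy after two iterations.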

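The contiguous and interleaved partitions can be compared by counting how many pixels of one primitive land on each processor. This is a sketch only: the function name `pixels_per_processor`, the vertical-strip layout, and interleaving by x coordinate alone are assumptions, since the slides say only "block of pixels" and "every nth pixel".

```python
def pixels_per_processor(bbox, width, nproc, scheme):
    """Count the pixels of a bounding box owned by each processor.

    bbox is (x0, y0, x1, y1) with inclusive pixel bounds.
    scheme is "contiguous" (vertical strips) or "interleaved" (x mod nproc).
    """
    if width % nproc != 0:
        raise ValueError("width must be a multiple of nproc in this sketch")
    if scheme not in ("contiguous", "interleaved"):
        raise ValueError("unknown scheme")
    x0, y0, x1, y1 = bbox
    strip = width // nproc          # strip width for the contiguous case
    counts = [0] * nproc
    for _y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            owner = x // strip if scheme == "contiguous" else x % nproc
            counts[owner] += 1
    return counts


if __name__ == "__main__":
    box = (0, 0, 7, 3)  # an 8x4 pixel primitive on a 16-pixel-wide screen
    print(pixels_per_processor(box, 16, 4, "contiguous"))   # [16, 16, 0, 0]
    print(pixels_per_processor(box, 16, 4, "interleaved"))  # [8, 8, 8, 8]
```

In the contiguous case two processors never see the primitive but the other two do all the work. In the interleaved case every processor must look at it and the load is even, which matches "worst case isn't as bad, best case isn't as good".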

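The span unit in the polygon-edge-span pipeline interpolates z and rgb across one span and applies the z-buffer compare per pixel. A rough software sketch, assuming that smaller z means nearer and that the edge unit supplies inclusive left/right bounds; `draw_span` and its parameters are hypothetical names:

```python
def draw_span(zbuf, cbuf, y, x_left, x_right, z_left, z_right, c_left, c_right):
    """Interpolate z and rgb across one span, writing only pixels that pass
    the z-buffer compare (assumption: smaller z is nearer to the viewer)."""
    steps = x_right - x_left
    # Per-pixel deltas; a one-pixel span needs no stepping.
    dz = (z_right - z_left) / steps if steps else 0.0
    dc = [(r - l) / steps if steps else 0.0 for l, r in zip(c_left, c_right)]
    z = z_left
    c = list(c_left)
    for x in range(x_left, x_right + 1):
        if z < zbuf[y][x]:                       # z-buffer compare
            zbuf[y][x] = z                       # conditional update
            cbuf[y][x] = tuple(round(v) for v in c)
        z += dz                                  # incremental interpolation
        c = [v + d for v, d in zip(c, dc)]


if __name__ == "__main__":
    zbuf = [[float("inf")] * 5]
    cbuf = [[(0, 0, 0)] * 5]
    draw_span(zbuf, cbuf, 0, 0, 4, 0.0, 4.0, (0, 0, 0), (40, 80, 120))
    draw_span(zbuf, cbuf, 0, 0, 4, 2.5, 2.5, (255, 255, 255), (255, 255, 255))
    print(zbuf[0])
    print(cbuf[0])
```

Every pixel costs a read and possibly a write of both buffers, which is why the slides name the span processor and its memory bandwidth as the bottleneck.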