Image processing on GPUs
Rahul Narain
COMP790-058: GPGP
March 7, 2007

Image processing
• Image = 2D array of color values (1D or 3D)
• Most image processing algorithms are inherently parallel
  - Do "the same thing" for every pixel
• Memory intensive with coherent lookups

Image processing
  2D image                  → 2D texture
  Per-pixel operations      → Fragment program
  Memory intensive          → Fast texture lookup
  Accuracy is not critical  → Good!
Image processing maps well to GPUs

Image processing on GPUs
[Diagram: GPU image-processing pipeline]
• CPU: screen-aligned quad of output image size; input image bound to texture
• Vertices → pixels, processed using fragment shader
• Read input pixels via texture lookup
• Output in texture or framebuffer

Topics
• Color correction
• Convolution
• Wavelet transforms
• Anisotropic diffusion and depth of field
• HDR and tone mapping

Color correction
• Brightness/contrast, hue/saturation, gamma, thresholding, Levels and Curves, …

Color correction
• Process each pixel independently: $t : \mathbb{R}^3 \to \mathbb{R}^3$
• Usually process each channel independently: $t_R, t_G, t_B : \mathbb{R} \to \mathbb{R}$
• Pass three lookup tables as a 1D RGB texture: $g_R[x,y] = t_R[f_R[x,y]]$

Convolution
$g[x,y] = \sum_{i,j} f[x+i,\, y+j]\; h[i,j]$
• Pass kernel $h$ and sampling coordinates $[i,j]$ as uniform data arrays
• Requires $N$ or $N^2$ texture lookups per pixel
  - Used to be a problem on old graphics cards
  - EXT_convolution is only supported by SGI

Convolution
Convolution with limited texture lookups:
1. Clear output buffer
2. For each pass:
   1. In vertex program, generate $k$ texture coordinates corresponding to adjacent pixels
   2. In fragment program, compute partial sum of $k$ terms and add to output buffer
Requires $N/k$ passes

Convolution
• Now only limited by fragment program instruction length
• All texture lookups access nearby pixels
  - Very fast due to cache coherence

Convolution
• Fialka and Čadík: NVIDIA GeForce 6600
• GPU outperforms CPU in all cases

Convolution
• 3D convolution for volume data
• Current GPUs don't allow high-precision 3D textures
  - Load slices into several 2D textures instead
• Multiple passes to loop over slices
• Only 16 textures can be bound at a time
  - Use multi-pass algorithm if kernel is wider in z

Non-linear filtering
• Median filter: $g[x,y] = \operatorname{median}\{\, f[x+i,\, y+j] \,\}$
• Can be done naïvely for smallish filter sizes
  - Known fast algorithms are not parallelizable
• Even then, the naïve GPU version is faster than the fast CPU version
• Viola et al.: 1.17× speedup on 5×5×5 volume filter using NVIDIA GeForce FX 5800

Non-linear filtering
• Bilateral filter:
  $g[x] = k^{-1} \sum_{x'} f[x']\; h_s[x'-x]\; h_r[f[x']-f[x]]$
  $k = \sum_{x'} h_s[x'-x]\; h_r[f[x']-f[x]]$
• Naïve approach: 1.52× speedup [Viola et al.]
• Paris and Durand's fast approximation [2006] should be parallelizable on GPU

Wavelet transforms
• Multi-resolution decomposition of a signal
• Basis functions are localized in both position and frequency

Wavelet transforms
[Diagram: filter bank]
• Decomposition: starting from $f$, each level convolves $c_j$ with $h$ and with $g$ and downsamples by 2 ($\ast h \downarrow 2$, $\ast g \downarrow 2$), giving $c_{j-1}$ and $d_{j-1}$; repeated for $c_{j-2}, d_{j-2}$, $c_{j-3}, d_{j-3}$, …
• Reconstruction: each level upsamples $c_{j-1}$ and $d_{j-1}$ by 2 and convolves with $h$ and $g$ ($\uparrow 2 \ast h$, $\uparrow 2 \ast g$) to recover $c_j$, ending at $f$

Wavelet transforms
• All wavelet coefficients stored in a texture
  - Two for ping-pong
• Each pass reads/writes a subset of the texture
• Convolutions are separable

Wavelet transforms
• Forward DWT:
  $c_{j-1}[n] = \sum_k h[k]\; c_j[2n-k]$, $\quad d_{j-1}[n] = \sum_k g[k]\; c_j[2n+1-k]$
  $\mathbf{z}_{j-1} = [\,\mathbf{c}_{j-1}\;\; \mathbf{d}_{j-1}\,]$
• Boundary extension using indirection texture

Wavelet transforms
• Inverse DWT:
  $c_j[n] = \sum_k h[k]\; c'_{j-1}[(n-k)/2] + \sum_k g[k]\; d'_{j-1}[(n-k)/2]$
• Two cases depending on whether $n$ is even
• Avoid conditionals using precomputed indirection texture

Wavelet transforms
• Wong et al.: NVIDIA GeForce 7800 GTX
• Performance gain over CPU for large images

Diffusion
• Diffuse intensities over image at varying rates
• Anisotropic diffusion
  - low diffusion at edges
• Depth of field
  - radius of confusion

Diffusion
$u' = \nabla \cdot (g\, \nabla u)$
• Discretize differential equation over pixel grid
  - Finite differences in space
  - Implicit 1st-order Euler in time
• Solve linear system of equations per iteration:
  $\mathbf{A}_k(\mathbf{u}_k)\; \mathbf{u}_{k+1} = \mathbf{r}_k(\mathbf{u}_k)$

Diffusion
• $\mathbf{A}$ is sparse, banded with known structure
• Don't want to represent whole matrix in memory
• Structure of $\mathbf{A}$ allows simplification

Diffusion
Rumpf and Strzodka [2001]:
• Use Jacobi or conjugate gradient iterations, e.g.
  $\mathbf{x}_{i+1} = F(\mathbf{x}_i) = \mathbf{D}^{-1}\,(\mathbf{r} - (\mathbf{A} - \mathbf{D})\, \mathbf{x}_i)$
• Corresponds directly to image blending
• Can be implemented directly in OpenGL!
• NVIDIA GeForce 3: 8 ms per iteration on 256×256 image

Diffusion
1. Upload original image $\mathbf{u}_0$ to texture
2. For each timestep $k$:
   1. Initialize r.h.s. $\mathbf{r}_k$ (usually equals $\mathbf{u}_k$)
   2. (If necessary) calculate image of diffusion coefficients $\mathbf{g}_k$ using lookup table
   3. Initialize $\mathbf{x}_0 = \mathbf{r}_k$
   4. For each iteration $i$: calculate $\mathbf{x}_{i+1} = F(\mathbf{x}_i)$ using image blending
   5. Store the solution $\mathbf{u}_{k+1} = \mathbf{x}_{i+1}$

Diffusion
Kass et al. [2005]:
• Approximate by two 1D diffusions instead
• $n$ linear systems for $n$ rows, tridiagonal $\mathbf{A}$'s
• Represent $\mathbf{A}$'s using 3 channels of each row of 2D texture
• Solve in parallel using cyclic reduction
• NVIDIA GeForce 7800: 0.15 s for 1024×1024

Diffusion
1. Gaussian elimination on odd rows in parallel
2. Copy smaller system of even rows to new texture; solve recursively
3. Propagate solution to odd rows

HDR
• OpenEXR: half datatype = 16-bit floating point
• Identical to native half datatype on GPUs
• Floating-point textures allow HDR

Tone mapping
• Displaying HDR images on LDR devices
• Reduce the dynamic range of an HDR image while "looking the same"
• Several techniques
• Reinhard et al.'s method has been implemented in real-time on the GPU

Tone mapping
• Compute log average luminance
• Rescale pixel luminances by average
• Find local average luminance of each pixel
  - Convolve with Gaussian filters of various widths
  - Compare to find best scale for each pixel
• Apply transfer function based on per-pixel local average luminance

Tone mapping
First pass
• Compute log average luminance
  - Sum over entire image by repeated reduction
Several passes
• Convolve rescaled image with Gaussian filters of various widths and compare
  - Accumulate results for "best" scale in texture
Final pass
• Apply transfer function

Tone mapping
• Goodnight et al.: ATI Radeon 9800
• GPU is faster than CPU in all cases

Conclusion
• GPUs significantly accelerate image processing
  - Pixel-level parallelism
  - High memory
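
Example (multi-pass convolution). The Convolution slides describe clearing an output buffer and then accumulating partial sums of k kernel terms per pass. The sketch below imitates that scheme on the CPU in plain Python so the arithmetic can be checked; it is not GPU code. The function name convolve_multipass, the (offset, weight) tap list, and the clamp-to-edge border handling are assumptions made for illustration and do not come from the slides.

```python
def convolve_multipass(image, taps, k):
    """Accumulate a convolution in passes of at most k taps each.

    image: list of rows of floats, indexed image[y][x].
    taps:  list of ((i, j), weight) pairs, i.e. the kernel h[i, j].
    k:     number of lookups allowed per pass (hypothetical hardware limit).
    """
    height, width = len(image), len(image[0])
    # Step 1: clear the output buffer.
    out = [[0.0] * width for _ in range(height)]
    # Step 2: one pass per group of k taps.
    for start in range(0, len(taps), k):
        batch = taps[start:start + k]
        for y in range(height):
            for x in range(width):
                partial = 0.0
                for (i, j), weight in batch:
                    # Assumption: clamp-to-edge addressing at the borders.
                    sx = min(max(x + i, 0), width - 1)
                    sy = min(max(y + j, 0), height - 1)
                    partial += image[sy][sx] * weight
                # Add this pass's partial sum to the output buffer.
                out[y][x] += partial
    return out


# A 3x3 box filter applied to a single bright pixel.
box = [((i, j), 1.0 / 9.0) for j in (-1, 0, 1) for i in (-1, 0, 1)]
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
single_pass = convolve_multipass(impulse, box, k=9)
three_passes = convolve_multipass(impulse, box, k=4)
```

With 9 taps and k = 4 the loop runs three passes, and the accumulated result matches the single-pass result up to floating-point rounding.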
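
Example (Jacobi iteration for one implicit diffusion step). The Diffusion slides solve a sparse linear system per timestep with iterations of the form x_{i+1} = D^{-1}(r - (A - D) x_i). The following CPU sketch applies that update to a small grid. Several details are assumptions for illustration only: the 4-neighbour finite-difference stencil, edge weights taken as the average of the two pixels' diffusion coefficients, zero-flux borders, and the names jacobi_diffusion_step, tau, and iterations.

```python
def jacobi_diffusion_step(u, g, tau=1.0, iterations=200):
    """Approximate one implicit Euler step of u' = div(g grad u) with Jacobi.

    u: image as a list of rows of floats (also used as right-hand side r).
    g: per-pixel diffusion coefficients, same shape as u.
    """
    height, width = len(u), len(u[0])
    r = [row[:] for row in u]   # r.h.s. equals the current image
    x = [row[:] for row in r]   # initial guess x_0 = r
    for _ in range(iterations):
        new = [[0.0] * width for _ in range(height)]
        for y in range(height):
            for col in range(width):
                diag = 1.0      # diagonal entry of A for this pixel
                off = 0.0       # -(A - D) x, i.e. weighted neighbour sum
                for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                    ny, nx = y + dy, col + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        w = tau * 0.5 * (g[y][col] + g[ny][nx])
                        diag += w
                        off += w * x[ny][nx]
                # x_{i+1} = D^{-1} (r - (A - D) x_i)
                new[y][col] = (r[y][col] + off) / diag
        x = new
    return x


# Diffuse a single bright pixel with uniform coefficients.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0
coeffs = [[1.0] * 5 for _ in range(5)]
smoothed = jacobi_diffusion_step(image, coeffs)
```

Each new pixel value is a weighted blend of the right-hand side and the neighbouring pixels, which is why the slides note that the iteration corresponds to image blending. Because the system is strictly diagonally dominant, the Jacobi iteration converges for any positive tau in this sketch.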
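
Example (cyclic reduction for a tridiagonal system). The three steps listed for the Kass et al. approach (eliminate odd rows, solve the smaller even-row system recursively, propagate back to the odd rows) can be sketched serially as below. This is a minimal illustration under assumptions, not the authors' GPU implementation: the function name cyclic_reduction, the zero-based indexing (so "even rows" means indices 0, 2, 4, ...), and the four-list representation are choices made here. On the GPU each loop would run over all rows in parallel.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction.

    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] == 0 and c[-1] == 0.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Step 1: use the odd rows to eliminate odd unknowns from even rows.
    ra, rb, rc, rd = [], [], [], []
    for i in range(0, n, 2):
        na, nb, nc, nd = 0.0, b[i], 0.0, d[i]
        if i - 1 >= 0:
            m = a[i] / b[i - 1]
            na = -m * a[i - 1]
            nb -= m * c[i - 1]
            nd -= m * d[i - 1]
        if i + 1 < n:
            m = c[i] / b[i + 1]
            nb -= m * a[i + 1]
            nc = -m * c[i + 1]
            nd -= m * d[i + 1]
        ra.append(na)
        rb.append(nb)
        rc.append(nc)
        rd.append(nd)

    # Step 2: the even rows form a smaller tridiagonal system.
    even = cyclic_reduction(ra, rb, rc, rd)

    # Step 3: propagate the solution back to the odd rows.
    x = [0.0] * n
    x[0::2] = even
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - a[i] * x[i - 1] - c[i] * right) / b[i]
    return x


# One row of an implicit 1D diffusion system with coefficient lam.
lam = 0.5
size = 7
lower = [0.0] + [-lam] * (size - 1)
diag = [1.0 + 2.0 * lam] * size
upper = [-lam] * (size - 1) + [0.0]
rhs = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
solution = cyclic_reduction(lower, diag, upper, rhs)
```

Each level halves the number of unknowns, so a row of n pixels needs about log2(n) reduction levels. The sketch performs no pivoting, so it assumes a well-conditioned (for example, diagonally dominant) system such as the diffusion matrix used here.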
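
Example (log average luminance and a global transfer function). The Tone mapping slides compute a log average luminance by repeated reduction, rescale by it, and then apply a transfer function. The sketch below covers only those global steps and deliberately omits the per-pixel scale selection with Gaussian filters. The names pairwise_sum, log_average_luminance, and tone_map, the small offset delta, the key value 0.18, and the transfer function L / (1 + L) are assumptions chosen for illustration in the spirit of Reinhard et al.'s operator; the slides do not give these specifics.

```python
import math


def pairwise_sum(values):
    """Sum by repeated halving, mirroring a multi-pass GPU reduction."""
    values = list(values)
    while len(values) > 1:
        if len(values) % 2 == 1:
            values.append(0.0)  # pad so every element has a partner
        values = [values[i] + values[i + 1]
                  for i in range(0, len(values), 2)]
    return values[0]


def log_average_luminance(lum, delta=1e-6):
    """exp of the mean log luminance; delta avoids log(0) (assumed value)."""
    logs = [math.log(delta + value) for value in lum]
    return math.exp(pairwise_sum(logs) / len(lum))


def tone_map(lum, key=0.18):
    """Rescale by the log average, then compress with L / (1 + L)."""
    average = log_average_luminance(lum)
    scaled = [key * value / average for value in lum]
    return [value / (1.0 + value) for value in scaled]


# Luminances spanning six orders of magnitude (flat list of pixels).
hdr = [0.01, 1.0, 100.0, 10000.0]
ldr = tone_map(hdr)
```

The compressed values stay below 1 and keep the ordering of the input luminances, which is the minimum a display transfer function needs here.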


UNC-Chapel Hill COMP 790 - Image processing on GPUs
