The GPU market


THE COA-415 GPGPU
General purpose graphics processing unit

The GPU market
- The integrated GPU
  - 90% of the GPU market is integrated graphics [1]
  - Unless you manufacture motherboards, there's no way in
- The dedicated GPU
  - The two main players are NVIDIA and ATI
  - Both companies are pursuing GPGPU technology
[1] http://www.anandtech.com/show/2339/23

The GPGPU
- General Purpose Graphics Processing Unit
- GPUs excel at floating point processing and parallel computing.
- The drawbacks are high power consumption and performance that is limited by the latency of data transfers between the GPGPU and the CPU.
- But how did the GPU evolve into the GPGPU?

History of the Graphics Processing Unit
- Modern graphics processors are a relatively new technology, starting out as VGA controllers as recently as 15 years ago.
- The catalyst for graphics processing technology was the advent of new and better semiconductors that allowed more operations to be performed.
- APIs such as OpenGL and Microsoft's DirectX were developed to let programmers write code that could be executed on the GPU.

Overview of DirectX and OpenGL
- OpenGL
  - Developed so programmers could write one set of code that runs on a wide range of graphics cards.
  - First released in 1992 and later supported on Windows 95.
  - Developed by Silicon Graphics and maintained by the Khronos Group.
- DirectX
  - Developed by Microsoft.
  - Made to be more hardware specific.
  - Also developed originally for Windows 95.

The COA-415 GPGPU
- To be effective as a general purpose graphics processing unit, our device must do two things:
  - First, it must communicate with the CPU and be able to transfer data and instructions back and forth.
  - It must also be able to perform complex single and double precision floating point operations using parallel processing.
- This way the GPGPU will be able to quickly execute floating point operations that would take a long time to execute on the CPU.

Intel or AMD?
- For the COA-415 GPGPU we have decided to use the Intel architecture, since it lends itself more to sharing CPU memory with the GPU.
[Figure: Intel and AMD system block diagrams. Labels include Intel CPU, AMD CPU, CPU core, North Bridge, South Bridge, Chipset, GPU, GPU Memory, DDR2]

CUDA
- The COA-415 GPGPU will implement the CUDA standard.
- Compute Unified Device Architecture, developed by NVIDIA
- Parallel computing architecture
- Lets C and C++ code run on the GPU directly, bypassing the graphics APIs normally used to program it
- The SDK currently supports wrappers for Python, Java, FORTRAN and MATLAB.

Is this a real car?
http://www.randomcontrol.com/theatre?movie=images/randomcontrol/gallery/arion/02.%20video/03.mov

Processing Flow on CUDA
Steps
1. The data to be processed is copied from main memory to the GPU's memory.
2. The instruction is sent from the CPU to the GPU.
3. The process is executed in parallel on all the cores of the GPU.
4. The processed data is copied back to main memory for the CPU to use again.

Why CUDA?
- CUDA allows stream processing that is not restricted to graphics.
- CUDA can handle thread counts in the thousands.
- CUDA implements shared memory between threads.

CUDA GPU vs. CPU
- Unlike a CPU, which implements anywhere between 1 and 16 cores, a GPU can support hundreds of cores and process data in parallel across them.

Common Operations
- Most of the operations that a GPU handles are on floating point numbers.
- IEEE 754 standard 32-bit single and 64-bit double precision
  - 32-bit: bit 31 = sign bit, bits 30-23 = exponent, bits 22-0 = mantissa
  - 64-bit: bit 63 = sign bit, bits 62-52 = exponent, bits 51-0 = mantissa
- IEEE 754 operations
  - Arithmetic: add, subtract, multiply, divide, square root, etc.
  - Load and store operations
  - Comparison operations
  - Rounding

The Typical GPU vs. the GPGPU
- The typical GPU doesn't fetch instructions; rather, it is told to do an operation.
- The typical graphics pipeline: Input, Vertex Shader, Geometry Shader, Rasterizer, Pixel Shader, Output
- Which stages are fixed function and which are programmable?

The Typical GPU vs. the GPGPU (cont.)
- Unlike the traditional GPU, which processes instructions in a sequence, the GPGPU takes in a matrix-like data structure and maps it across blocks of cores within the GPU's architecture.
- This breaks up a larger problem into many small problems which can be computed in parallel.

Example Program
    // Fragment: width, height, image (host data) and d_odata (device
    // output buffer) are assumed to be defined elsewhere.
    cudaArray* cu_array;
    texture<float, 2> tex;

    // Allocate array
    cudaChannelFormatDesc description = cudaCreateChannelDesc<float>();
    cudaMallocArray(&cu_array, &description, width, height);

    // Copy image data to array (a cudaArray needs cudaMemcpyToArray,
    // not plain cudaMemcpy)
    cudaMemcpyToArray(cu_array, 0, 0, image, width*height*sizeof(float), cudaMemcpyHostToDevice);

    // Bind the array to the texture
    cudaBindTextureToArray(tex, cu_array);

    // Run kernel, rounding the grid up so every pixel is covered
    dim3 blockDim(16, 16, 1);
    dim3 gridDim((width + blockDim.x - 1) / blockDim.x, (height + blockDim.y - 1) / blockDim.y, 1);
    kernel<<< gridDim, blockDim, 0 >>>(d_odata, height, width);

    // Unbind the array from the texture
    cudaUnbindTexture(tex);

    __global__ void kernel(float* odata, int height, int width)
    {
        unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
        unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;
        // Skip threads that fall outside the image
        if (x < width && y < height) {
            float c = tex2D(tex, x, y);
            odata[y*width+x] = c;
        }
    }
http://en.wikipedia.org/wiki/CUDA

Example chipset
The NVIDIA GTX-480
http://www.nvidia.com/object/product_geforce_gtx_480_us.html

GPU Engine Specs:
  CUDA Cores                        480
  Graphics Clock (MHz)              700
  Processor Clock (MHz)             1401
  Texture Fill Rate (billion/sec)   42

Memory Specs:
  Memory Clock (MHz)                1848
  Standard Memory Config            1536 MB GDDR5
  Memory Interface Width            384-bit
  Memory Bandwidth (GB/sec)         177.4

The Even Cooler NVIDIA Tesla C2050
  Form Factor                                          9.75" PCIe x16 form factor
  # of Tesla GPUs                                      1
  # of CUDA Cores                                      448
  Frequency of CUDA Cores                              1.15 GHz
  Double Precision floating point performance (peak)   515 Gflops
  Single Precision floating point performance (peak)   1.03 Tflops
  Total Dedicated Memory                               Tesla C2050: 3GB GDDR5; Tesla C2070: 6GB GDDR5
  Memory Speed                                         1.5 GHz
  Memory Interface                                     384-bit
  Memory Bandwidth                                     144 GB/sec
  Power Consumption                                    247W TDP
  System Interface                                     PCIe x16 Gen2
  Thermal Solution                                     Active Fansink

Real World Applications
- The obvious application of the GPGPU is processor intensive 3D
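
As a host-side illustration of the IEEE 754 layouts listed under Common Operations, the following plain C sketch splits a single or double precision value into its sign, exponent and mantissa fields. It is not part of the original slides: the type names (float_fields, double_fields) and function names (decode_float, decode_double) are hypothetical, and it assumes a platform where float is 32 bits and double is 64 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative containers for the three IEEE 754 fields (hypothetical names). */
typedef struct {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 30-23, biased by 127 */
    uint32_t mantissa;  /* bits 22-0 */
} float_fields;

typedef struct {
    uint64_t sign;      /* bit 63 */
    uint64_t exponent;  /* bits 62-52, biased by 1023 */
    uint64_t mantissa;  /* bits 51-0 */
} double_fields;

/* Split a 32-bit single precision value into its fields. */
float_fields decode_float(float value)
{
    uint32_t bits;
    float_fields f;

    /* Assumption: float is a 32-bit IEEE 754 type on this platform. */
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &value, sizeof bits);   /* copy the raw bit pattern */

    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}

/* Split a 64-bit double precision value into its fields. */
double_fields decode_double(double value)
{
    uint64_t bits;
    double_fields d;

    /* Assumption: double is a 64-bit IEEE 754 type on this platform. */
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &value, sizeof bits);

    d.sign     = bits >> 63;
    d.exponent = (bits >> 52) & 0x7FFu;
    d.mantissa = bits & 0xFFFFFFFFFFFFFull;
    return d;
}
```

For example, -6.5 is -1.625 x 2^2, so decode_float(-6.5f) gives sign 1, exponent 129 (2 plus the bias of 127) and mantissa 0x500000. Using memcpy to read the bit pattern avoids the undefined behavior of casting a float pointer to an integer pointer.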

