UW-Madison ME 964 - Parallel Computing with MATLAB


Slide 1: Parallel Computing with MATLAB
Narfi Stefansson
Parallel Computing Development Manager
MathWorks

Slide 2: Agenda
- Products and terminology
- GPU capabilities
- Multi-process capabilities
- How are customers using this?

Slide 3: Parallel Computing with MATLAB on CPU
[Diagram labels: User's Desktop, Parallel Computing Toolbox; Compute Cluster, MATLAB Distributed Computing Server, MATLAB Workers]

Slide 4: Evolving With Technology Changes
[Diagram labels: Single processor; Multicore; Multiprocessor; Cluster; Grid, Cloud; GPU]

Slide 5: Why GPUs and why now?
- Double support
  – Single/double performance in line with expectations
- Operations are IEEE compliant
- Cross-platform support now available

Slide 6: What came in R2010b?
- Parallel Computing Toolbox
  – GPU support
  – Broader distributed array algorithm support (QR, rectangular \)
- MATLAB Distributed Computing Server
  – GPU support
  – Run as user with MathWorks job manager
  – Non-shared file system support
- Simulink®
  – Real-Time Workshop® support with PCT and MDCS

Slide 7: What came in R2011a?
- Parallel Computing Toolbox
  – Deployment of local workers
  – More GPU support
  – More distributed array algorithm support
- MATLAB Distributed Computing Server
  – Enhanced support for Microsoft HPC Server
  – More GPU support
  – Remote service start in Admin Center

Slide 8: GPU Support
- Call GPU(s) from MATLAB or toolbox/server worker
- Support for CUDA devices with compute capability 1.3 and up

Slide 9: Programming Parallel Applications

    Level of control   Required effort
    Minimal            None
    Some               Straightforward
    Extensive          Involved

Slide 10: Summary of Options for Targeting GPUs

    Level of control   Parallel options
    Minimal            Use GPU arrays with MATLAB built-in functions
    Some               Execute custom functions on elements of the GPU array
    Extensive          Create kernels from existing CUDA code and PTX files

Slide 11: GPU Array Functionality
- Array data stored in GPU device memory
- Algorithm support for over 100 functions
- Integer, single, double, real and complex support

Slide 12: Example: GPU Arrays

    >> A = someArray(1000, 1000);
    >> G = gpuArray(A);   % Push to GPU memory
    …
    >> F = fft(G);
    >> x = G\b;
    …
    >> z = gather(x);     % Bring back into MATLAB

Slide 13: GPUArray Function Support
- >100 functions supported
  – fft, fft2, ifft, ifft2
  – Matrix multiplication (A*B)
  – Matrix left division (A\b)
  – LU factorization
  – ' and .' (transpose)
  – abs, acos, …, minus, …, plus, …, sin, …
  – conv, conv2, filter
  – indexing

Slide 14: GPU Array benchmarks

    A\b*     Tesla C1060   Tesla C2050 (Fermi)   Quad-core Intel CPU   Ratio (Fermi:CPU)
    Single   191           250                   48                    5:1
    Double   63.1          128                   25                    5:1
    Ratio    3:1           2:1                   2:1

* Results in Gflops, matrix size 8192x8192. Limited by card memory. Computational capabilities not saturated.

Slide 15: GPU Array benchmarks

    MTIMES   Tesla C1060   Tesla C2050 (Fermi)   Quad-core Intel CPU   Ratio (Fermi:CPU)
    Single   365           409                   59                    7:1
    Double   75            175                   29                    6:1
    Ratio    4.8:1         2.3:1                 2:1

    FFT      Tesla C1060   Tesla C2050 (Fermi)   Quad-core Intel CPU   Ratio (Fermi:CPU)
    Single   50            99                    2.29                  43:1
    Double   22.5          44                    1.47                  30:1
    Ratio    2.2:1         2.2:1                 1.5:1

Slide 16: Example: arrayfun: Element-Wise Operations

    >> y = arrayfun(@foo, x);   % Execute on GPU

    function y = foo(x)
    y = 1 + x.*(1 + x.*(1 + x.*(1 + ...
        x.*(1 + x.*(1 + x.*(1 + x.*(1 + ...
        x.*(1 + x./9)./8)./7)./6)./5)./4)./3)./2);

Slide 17: Some arrayfun benchmarks
[Chart legend: CPU[4] = multithreading enabled; CPU[1] = multithreading disabled]
Note: Due to memory constraints, a different approach is used at N=15 and above.

Slide 18: Example: Invoking CUDA Kernels

    % Setup
    kern = parallel.gpu.CUDAKernel('myKern.ptx', cFcnSig)

    % Configure
    kern.ThreadBlockSize=[512 1];
    kern.GridSize=[1024 1024];

    % Run
    [c, d] = feval(kern, a, b);

Slide 19: Example: Corner Detection on the CPU

    % 1. Calculate derivatives
    dx = cdata(2:end-1,3:end) - cdata(2:end-1,1:end-2);
    dy = cdata(3:end,2:end-1) - cdata(1:end-2,2:end-1);
    dx2 = dx.*dx;
    dy2 = dy.*dy;
    dxy = dx.*dy;

    % 2. Smooth using convolution
    gaussHalfWidth = max( 1, ceil( 2*gaussSigma ) );
    ssq = gaussSigma^2;
    t = -gaussHalfWidth : gaussHalfWidth;
    gaussianKernel1D = exp(-(t.*t)/(2*ssq))/(2*pi*ssq); % The Gaussian 1D filter
    gaussianKernel1D = gaussianKernel1D / sum(gaussianKernel1D);
    smooth_dx2 = conv2( gaussianKernel1D, gaussianKernel1D, dx2, 'valid' );
    smooth_dy2 = conv2( gaussianKernel1D, gaussianKernel1D, dy2, 'valid' );
    smooth_dxy = conv2( gaussianKernel1D, gaussianKernel1D, dxy, 'valid' );

    % 3. Calculate score
    det = smooth_dx2 .* smooth_dy2 - smooth_dxy .* smooth_dxy;
    trace = smooth_dx2 + smooth_dy2;
    score = det - 0.25*edgePhobia*(trace.*trace);

Slide 20: Example: Corner Detection on the GPU

    % 0. Move data to GPU
    cdata = gpuArray( cdata );

    % 1. Calculate derivatives
    dx = cdata(2:end-1,3:end) - cdata(2:end-1,1:end-2);
    dy = cdata(3:end,2:end-1) - cdata(1:end-2,2:end-1);
    dx2 = dx.*dx;
    dy2 = dy.*dy;
    dxy = dx.*dy;

    % 2. Smooth using convolution
    gaussHalfWidth = max( 1, ceil( 2*gaussSigma ) );
    ssq = gaussSigma^2;
    t = -gaussHalfWidth : gaussHalfWidth;
    gaussianKernel1D = exp(-(t.*t)/(2*ssq))/(2*pi*ssq); % The Gaussian 1D filter
    gaussianKernel1D = gaussianKernel1D / sum(gaussianKernel1D);
    smooth_dx2 = conv2( gaussianKernel1D, gaussianKernel1D, dx2, 'valid' );
    smooth_dy2 = conv2( gaussianKernel1D, gaussianKernel1D, dy2, 'valid' );
    smooth_dxy = conv2( gaussianKernel1D, gaussianKernel1D, dxy, 'valid' );

    % 3. Calculate score
    det = smooth_dx2 .* smooth_dy2 - smooth_dxy .* smooth_dxy;
    trace = smooth_dx2 + smooth_dy2;
    score = det - 0.25*edgePhobia*(trace.*trace);

    % 4. Bring data back
    score = gather( score );

Slide 21: arrayfun
Can execute entire scalar programs on the GPU (while, if, for, break, &, &&, …)

    function [logCount,t] = mandelbrotElem( x0, y0, r2, maxIter)
    % Evaluate the Mandelbrot function for a single element
    z0 = complex( x0, y0 );
    z = z0;
    count = 0;
    while count <= maxIter && (z*conj(z) <= r2)
        z = z*z + z0;
        count = count + 1;
    end
    % . . . Etc. . . .

Slide 22: Summary of Options for Targeting GPUs

    Level of control   Parallel options
    Minimal            Use GPU arrays with MATLAB built-in functions
    Some               Execute custom functions on elements of the GPU array
    Extensive          Create kernels from existing CUDA code and PTX files

Slide 23: Parallel Computing enables you to …
- Speed up computations (larger compute pool)
- Work with large data (larger memory pool)

Slide 24: Programming Parallel Applications

    Level of control   Parallel options
    Minimal            Support built into Toolboxes
    Some               High-level programming constructs (e.g. parfor, batch, distributed)
    Extensive          Low-level programming constructs (e.g. Jobs/Tasks, MPI-based)

Slide 25: Parallel Computing with MATLAB on CPU
[Diagram labels: Toolboxes, Blocksets; eight Workers]

Slide 26: Parallel Support in Optimization Toolbox
- Functions:
  – fmincon: Finds a constrained minimum of a function of several variables
  – fminimax: Finds a minimax solution of a function of several variables
  – fgoalattain: Solves the multiobjective goal attainment
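
As a rough illustration of the gpuArray / arrayfun / gather pattern from the GPU array and arrayfun examples (slides 12 and 16), the sketch below evaluates the slide's foo polynomial on the CPU and, if a supported device is present, on the GPU. It is a minimal sketch under assumptions, not the presenter's code: foo is rewritten as an anonymous function so everything fits in one script, the input vector x is a hypothetical test range, and the try/catch guard around gpuDeviceCount is only one way to skip the GPU path on machines without Parallel Computing Toolbox. The comparison against exp(x) relies on the observation that the nested expression is the degree-9 Taylor polynomial of exp written in Horner form.

```matlab
% Element-wise function from the arrayfun slide, written here as an
% anonymous function (an assumption; the slide defines it in a function file).
foo = @(x) 1 + x.*(1 + x.*(1 + x.*(1 + x.*(1 + x.*(1 + x.*(1 + x.*(1 + ...
           x.*(1 + x./9)./8)./7)./6)./5)./4)./3)./2);

x = linspace(0, 1, 1e5);            % hypothetical test input on [0, 1]

% CPU reference: compare the polynomial against exp(x).
yCpu   = foo(x);
errCpu = max(abs(yCpu - exp(x)));
fprintf('max |foo(x) - exp(x)| on [0,1] = %g\n', errCpu);

% Optional GPU path: skipped when the toolbox or a supported device is missing.
try
    useGpu = gpuDeviceCount > 0;
catch
    useGpu = false;
end

if useGpu
    xGpu = gpuArray(x);                   % push input to GPU memory
    yGpu = gather(arrayfun(foo, xGpu));   % run element-wise on GPU, bring back
    fprintf('max |GPU - CPU| = %g\n', max(abs(yGpu - yCpu)));
end
```

On [0, 1] the truncation error should be on the order of 1e-7, so the CPU and GPU results are expected to agree far more closely with each other than either does with exp(x). If a given release does not accept the anonymous handle in the GPU arrayfun call, moving foo into its own function file, as on the slide, is the safer form.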

