SBU CSE 591 - Case study - Iterative MRI Reconstruction

Unformatted text preview:

CSE 591: GPU ProgrammingCase Study: Iterative MRI ReconstructionKl M llKlaus MuellerComputer Science DepartmentStony Brook UniversityECE498ALECE498ALProgramming Massively Parallel Processorsgg yLt 1617 A li ti C St dLecture16-17: Application Case Study –Quantitative MRI Reconstruction© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign2ObjectiveObjective• To learn about computational thinking skills through a pggconcrete example– Problem formulation– Designing implementations to steer around limitations– Validating resultsUd di h i f i–Understanding the impact of your improvements• A top to bottom experience!© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign3AcknowledgementsAcknowledgementsSam S. Stone§, Haoran Yi§, Justin P. Haldar†, Deepthi Nandakumar, Bradley P. Sutton†, Zhi-Pei Liang†, Keith Thulburin*§C t f R li bl d†Bk Itittf§Center for Reliable and High-Performance Computing†Beckman Institute forAdvanced Science and TechnologyDepartment of Electrical and Computer EngineeringUniversity of Illinois at Urbana-Champaign© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign4* University of Illinois, Chicago Medical CenterOverviewOverview•Magnetic resonance imagingggg• Non-Cartesian Scanner Trajectory•Least-squares (LS) reconstruction algorithmLeastsquares (LS) reconstruction algorithm• Optimizing the LS reconstruction on the G80–Overcoming bottlenecksg– Performance tuning• Summaryy© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign5Reconstructing MR ImagesReconstructing MR ImagesCartesian Scan Data Spiral Scan DataGriddingGriddingFFT LSCartesian scan data + FFT: Slow scan, fast reconstruction, images may be poor© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign62Reconstructing MR ImagesReconstructing MR ImagesCartesian Scan Data Spiral Scan DataGridding1Gridding1FFT LSSpiral scan data + Gridding + FFT: Fast scan, fast reconstruction, better images© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign721 Based on Fig 1 of Lustig et al, Fast Spiral Fourier Transform for Iterative MR Image Reconstruction, IEEE Int’l Symp. on Biomedical Imaging, 2004Reconstructing MR ImagesReconstructing MR ImagesCartesian Scan Data Spiral Scan DataGriddingGriddingFFT Least-Squares (LS)Spiral scan data + LSSuperior images at expense of significantly more computation© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign82An Exciting Revolution - Sodium Map of th B ithe Brain• Images of sodium in the brain– Very large number of samples for increased SNR– Requires high-quality reconstruction• Enables study of brain-cell viability before anatomic changes occur in stroke and cancer treatment – within days!© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign9Courtesy of Keith Thulborn and Ian Atkinson, Center for MR Research, University of Illinois at ChicagoLeast-Squares ReconstructionLeastSquares ReconstructiondFFFHH=ρCompute Q for FHFAcquire Data• Q depends only on scanner configurationFHdd d dAcquire DataH•FHd depends on scan data• ρ found using linear solverHCompute FHd• Accelerate Q and FHd on G80– Q: 1-2 days on CPUFHd67h CPUFind ρ–FHd: 6-7 hours on CPU– ρ: 1.5 minutes on CPU© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign10for (m = 0; m < M; m++) {phiMag[m] = rPhi[m]*rPhi[m] +iPhi[m]*iPhi[m];for (m = 0; m < M; m++) {rMu[m] = rPhi[m]*rD[m] +iPhi[m]*iD[m];iPhi[m]*iPhi[m];for (n = 0; n < N; n++) {expQ = 2*PI*(kx[m]*x[n] +iPhi[m]*iD[m];iMu[m] = rPhi[m]*iD[m] –iPhi[m]*rD[m];ky[m]*y[n] +kz[m]*z[n]);rQ[n] +=phiMag[m]*cos(expQ);for (n = 0; n < N; n++) {expFhD = 2*PI*(kx[m]*x[n] +ky[m]*y[n] +kz[m]*z[n]);rQ[n] +=phiMag[m]*cos(expQ);iQ[n] +=phiMag[m]*sin(expQ);}}kz[m]*z[n]);cArg = cos(expFhD);sArg = sin(expFhD);(a) Q computationrFhD[n] += rMu[m]*cArg –iMu[m]*sArg;iFhD[n] +=iMu[m]*cArg +iFhD[n] + iMu[m] cArg +rMu[m]*sArg;}}(b) FHd computationQ v.s. FHD© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign11Algorithms to Acceleratefor (m = 0; m < M; m++) {rMu[m] = rPhi[m]*rD[m] +iPhi[m]*iD[m]• Scan data– M = # scan pointskx ky kz = 3D scan dataiPhi[m]*iD[m];iMu[m] = rPhi[m]*iD[m] –iPhi[m]*rD[m];–kx, ky, kz = 3D scan data• Pixel data– N = # pixelsfor (n = 0; n < N; n++) {expFhD = 2*PI*(kx[m]*x[n] +ky[m]*y[n] +kz[m]*z[n]);–x, y, z = input 3D pixel data–rFhD, iFhD= output kz[m]*z[n]);cArg = cos(expFhD);sArg = sin(expFhD);ppixel data• Complexity is O(MN)•Inner looprFhD[n] += rMu[m]*cArg –iMu[m]*sArg;iFhD[n] += iMu[m]*cArg +rMu[m]*sArg;•Inner loop– 13 FP MUL or ADD ops– 2 FP trig ops© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign12rMu[m] sArg;}}–12 loads, 2 storesFrom C to CUDA: Step 1What unit of work is assigned to each thread?What unit of work is assigned to each thread?for (m = 0; m < M; m++) {M [ ] Phi[ ]* D[ ] iPhi[ ]*iD[ ]rMu[m] = rPhi[m]*rD[m] + iPhi[m]*iD[m];iMu[m] = rPhi[m]*iD[m] – iPhi[m]*rD[m];for (n = 0; n < N; n++) {(; ;){expFhD = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);cArg = cos(expFhD);sArg = sin(expFhD);rFhD[n] += rMu[m]*cArg – iMu[m]*sArg;iFhD[n] += iMu[m]*cArg + rMu[m]*sArg;}}© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign13One Possibility__global__ void cmpFHd(float* rPhi, iPhi, phiMag, kx, ky, kz, x, y, z, rMu, iMu, int N) {One Possibilityint m = blockIdx.x * FHD_THREADS_PER_BLOCK + threadIdx.x;rMu[m] = rPhi[m]*rD[m] + iPhi[m]*iD[m];rMu[m] = rPhi[m]*rD[m] + iPhi[m]*iD[m];iMu[m] = rPhi[m]*iD[m] – iPhi[m]*rD[m];for (n = 0; n < N; n++) {expFhD = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);cArg = cos(expFhD); sArg = sin(expFhD);This code does notrFhD[n] += rMu[m]*cArg – iMu[m]*sArg;iFhD[n] += iMu[m]*cArg + rMu[m]*sArg;}}This code does not work correctly! The accumulation needs to use© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009University of Illinois, Urbana-Champaign14}needs to use atomic operation.One Possibility-Improved__global__ void cmpFHd(float* rPhi, iPhi, rD, iD, kx, ky, kz, x, y, z, rMu, iMu, int N) {One Possibility Improvedint m = blockIdx.x * FHD_THREADS_PER_BLOCK + threadIdx.x;float rMu_reg,


View Full Document
Download Case study - Iterative MRI Reconstruction
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Case study - Iterative MRI Reconstruction and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Case study - Iterative MRI Reconstruction 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?