
ECE 498AL Lecture 19: Performance Case Studies: Ion Placement Tool, VMD
Guest Lecture by John Stone
Theoretical and Computational Biophysics Group
NIH Resource for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology

© John E. Stone, 2007
ECE 498AL, University of Illinois, Urbana-Champaign

Slide titles:
Outline
Molecular Modeling: Ion Placement
Overview of Ion Placement Process
Overview of Direct Coulomb Summation (DCS) Algorithm
Direct Coulomb Summation (DCS) Algorithm Detail
DCS Computational Considerations
Single Slice DCS: Simple C Version (Slow, even for the CPU!)
DCS Algorithm Design Observations
An Approach to Writing CUDA Kernels
DCS Observations for GPU Implementation
CUDA DCS Implementation Overview
DCS CUDA Block/Grid Decomposition (non-unrolled)
DCS CUDA Block/Grid Decomposition (non-unrolled)
DCS Version 1: Const+Precalc (187 GFLOPS, 18.6 Billion Atom Evals/Sec)
DCS Version 1: Kernel Structure
DCS CUDA Block/Grid Decomposition (unrolled)
DCS CUDA Algorithm: Unrolling Loops
DCS CUDA Block/Grid Decomposition (unrolled)
DCS Version 2: Const+Precalc+Loop Unrolling (259 GFLOPS, 33.4 Billion Atom Evals/Sec)
DCS Version 2: Inner Loop
DCS Version 3: Const+Shared+Loop Unrolling+Precalc (268 GFLOPS, 36.4 Billion Atom Evals/Sec)
DCS Version 3: Kernel Structure
DCS Version 4: Const+Loop Unrolling+Coalescing (291.5 GFLOPS, 39.5 Billion Atom Evals/Sec)
DCS Version 4: Kernel Structure
DCS CUDA Block/Grid Decomposition (unrolled, coalesced)
Multi-GPU DCS Potential Map Calculation
Multi-GPU DCS Algorithm:
Multi-GPU DCS Performance
Multi-GPU DCS Performance: Initial Ion Placement Lattice Calculation
Multi-GPU DCS Performance: Time-averaged Electrostatics Calculation
Experiences Integrating CUDA Kernels Into VMD
VMD/CUDA Integration Observations
VMD/CUDA Integration Observations (2)
VMD/CUDA Integration Observations (3)
VMD/CUDA Code Organization
Summary
Questions?

Outline
• Explore CUDA versions of the direct Coulomb summation (DCS) algorithm
  – Used for ion placement and time-averaged electrostatic potential calculations
  – Some thoughts on how to approach writing CUDA kernels
  – Detailed look at a few CUDA implementations of DCS
  – Multi-GPU DCS potential map calculation
• Experiences integrating CUDA kernels into VMD

Molecular Modeling: Ion Placement
• Biomolecular simulations attempt to replicate in vivo conditions in silico
• Model structures are initially constructed in vacuum
• Solvent (water) and ions are added as necessary to reproduce the required biological conditions
• Computational requirements scale with the size of the simulated structure

Overview of Ion Placement Process
• Calculate the initial electrostatic potential map around the simulated structure, considering the contributions of all atoms
• Ions are then placed one at a time:
  – Find the voxel containing the minimum potential value
  – Add a new ion atom at the location of minimum potential
  – Add the potential contribution of the newly placed ion to the entire map
  – Repeat until the required number of ions has been added

Overview of Direct Coulomb Summation (DCS) Algorithm
• One of several ways to compute the electrostatic potentials on a grid, ideally suited for the GPU
• Approximation-based methods such as multilevel summation can achieve much higher performance at the cost of some numerical accuracy and flexibility
• We'll only discuss DCS for computing electrostatic maps:
  – conceptually simple algorithm well suited to the GPU
  – easy to fully explore
  – requires very little background knowledge, unlike other methods
• DCS: for each lattice point, sum potential contributions for all atoms in the simulated structure:
    potential += charge[i] / (distance to atom[i])

Direct Coulomb Summation (DCS) Algorithm Detail
• At each lattice point, sum potential contributions for all atoms in the simulated structure:
    potential += charge[i] / (distance to atom[i])
[Figure: atom[i], the distance to atom[i], and the lattice point being evaluated]

DCS Computational Considerations
• Suitability of direct Coulomb summation (DCS) for ion placement:
  – Highly data parallel
  – Single-precision FP arithmetic is adequate
  – Numerical accuracy can be further improved by compensated summation, spatially ordered summation groupings, etc.
• In a CPU-only ion placement implementation, 99% of the run time is consumed in the initial potential map calculation
• Interesting test case, since potential maps are also useful for both visualization and analysis
• Forms a template for similar spatially evaluated function summation algorithms in CUDA

Single Slice DCS: Simple C Version (Slow, even for the CPU!)

    #include <math.h>   /* sqrtf */

    /* dim3 is the CUDA vector type holding the grid extents (x, y, z). */
    void cenergy(float *energygrid, dim3 grid, float gridspacing, float z,
                 const float *atoms, int numatoms) {
      int i, j, n;
      int atomarrdim = numatoms * 4;  /* atoms[] holds x, y, z, charge per atom */
      /* Slice index. The slide uses k without declaring it; deriving it
         from z / gridspacing is an assumption. */
      int k = (int) (z / gridspacing);
      for (j=0; j<grid.y; j++) {
        float y = gridspacing * (float) j;
        for (i=0; i<grid.x; i++) {
          float x = gridspacing * (float) i;
          float energy = 0.0f;
          for (n=0; n<atomarrdim; n+=4) {
            /* calculate potential contribution of each atom */
            float dx = x - atoms[n  ];
            float dy = y - atoms[n+1];
            float dz = z - atoms[n+2];
            energy += atoms[n+3] / sqrtf(dx*dx + dy*dy + dz*dz);
          }
          energygrid[grid.x*grid.y*k + grid.x*j + i] = energy;
        }
      }
    }

DCS Algorithm Design Observations
• Ion placement maps require evaluation of ~20 potential lattice points per atom for a typical biological structure
• Atom list has the smallest memory footprint, best choice for the inner loop (both CPU and GPU)
• Lattice point coordinates are computed on-the-fly
• Atom coordinates are made relative to the origin of the potential map, eliminating redundant arithmetic
• Arithmetic can be significantly

