UW-Madison ECE 734 - ACCELERATING SPHERICAL HARMONIC TRANSFORMS ON THE NVIDIA® GPU

ACCELERATING SPHERICAL HARMONIC TRANSFORMS ON THE NVIDIA® GPU

Vikrant Soman
Department of Electrical Engineering
University of Wisconsin, Madison, Wisconsin, USA

Abstract

The Spherical Harmonic Transform is a critical computational kernel of the dynamics algorithms for numerical weather prediction and climate modeling. As atmospheric models push towards higher resolutions, it has become necessary to accelerate this computationally intensive transform. Previous work has attempted to parallelize and optimize the transform [1] [2] [3] [4], but none has exploited the advantages of NVIDIA's General-Purpose Graphics Processing Unit (GPGPU), a recent SIMD-type architecture. This paper describes a combined CPU-GPU implementation of the Spherical Harmonic Transform. Compared to the implementation discussed in [1], it reduces computation time while keeping the error low.

Keywords: Spherical Harmonic Transform, GPU, Parallel Computing

1 Introduction

The governing equations for a global spectral weather model are derived from the conservation laws of mass, momentum, and energy. Their main constituents are the vorticity, divergence, temperature, surface pressure, and moisture equations. The global fields are expanded in spherical harmonics, so the spherical harmonic transform is a critical computational kernel of the dynamics algorithms for numerical weather prediction and climate modeling. The transform projects grid point data on the sphere onto the spectral modes in an analysis step, and an inverse transform reconstructs grid point data from the spectral information in a synthesis step. As atmospheric models push towards higher resolutions, it has become necessary to accelerate this computationally intensive transform. This project takes a step in this direction by exploiting NVIDIA's Graphics Processing Unit (GPU), a recent SIMD-type architecture.
The rest of the paper is organized as follows. Section 2 covers prior work in this area. Section 3 describes the Spherical Harmonic Transform, highlighting its data-intensive parts, and also describes the architecture of the NVIDIA GPGPU. Section 4 gives an overview of the CPU-GPU implementation. Section 5 presents the results of the implementation, Section 6 concludes the paper, Section 7 gives directions for future work, and Section 8 lists the references. Appendices A, B, and C contain the code for the kernels implemented as part of this work.

2 Prior Work and Motivation

Parallelizing the computation of spherical harmonic transforms is not new. In fact, climate and weather modelers were among the first users of parallel computers. The calculation of the spherical harmonic transform is a two-step process, and the methods in [2] and [4] take advantage of this by parallelizing each step. Other methods approximate the original transform equations in a Fourier basis, since FFT implementations are computationally faster. The algorithm explained in [1] gives a matrix formulation of the spherical harmonic transform, which essentially makes the transform vectorizable. This is of interest here because the NVIDIA GPGPU can substantially reduce the run time of algorithms built on BLAS operations. This provides the motivation for this work: a combined CPU-GPU implementation of the vectorizable algorithm in [1] is expected to reduce computation time relative to a conventional CPU implementation.

3 A. Spherical Harmonic Transforms

Spherical Harmonic Transforms (SHTs) are essentially Fourier transforms on the sphere. For flows in a global domain, the preferred basis set for approximating functions on the sphere is the spherical harmonic basis.
The spherical harmonic transform projects grid point data on the sphere onto the spectral modes in an analysis step, and an inverse transform reconstructs grid point data from the spectral information in a synthesis step. The synthesis step is described in Equation (1). The analysis step is described by Equations (2) and (3), which consist of the computation of the Fourier coefficients ξ^m and the Legendre transform that incorporates the Gaussian weights corresponding to the Gaussian latitudes μ_j = sin(θ_j).

………. (1)
………… (2)
……….. (3)

Legendre functions arise as the angular solutions of Laplace's equation in spherical coordinates. In geodetic and other geopotential applications the Legendre functions can be represented as

……….

For a Gaussian grid, the triangular spectral truncation requires the number of longitudes I ≥ 3M + 1 and the number of latitudes J = I/2, where M is the modal truncation number. The code implemented in [1], which we accelerate, uses this truncation, although it is easy to extend it to other truncations.

The analysis step is summarized below.
1. Compute the Fourier coefficients using the direct Fourier transform.
2. Compute the spectral coefficients using the direct Legendre transform.
3. Perform calculations in the spectral domain.

The synthesis step is summarized below.
1. Input the spectral coefficients.
2. Compute the Fourier coefficients using the inverse Legendre transform.
3. Compute the Gaussian grid point values using the inverse Fourier transform.

Figure 1 shows both steps in the form of a flowchart.

Figure 1. Flowchart for analysis and synthesis
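The analysis and synthesis summaries above can be illustrated with a small, unoptimized Python sketch. It is not the code of [1] or of the appendices. The function names, the orthonormal Legendre normalization (integral of P² over μ equal to 1, without the Condon-Shortley phase), the rounding of I up to an even number, and the Fourier sign convention are all assumptions made for illustration. The formulas in the docstrings are the commonly used forms of the transform and are not recovered from Equations (1) to (3).

```python
import cmath
import math
import random


def grid_size(M):
    """Smallest even I with I >= 3M + 1, and J = I / 2 (evenness is an assumption)."""
    I = 3 * M + 1
    I += I % 2
    return I, I // 2


def legendre_pair(J, x):
    """Return P_J(x) and its derivative using the three-term recurrence (J >= 1)."""
    p0, p1 = 1.0, x
    for n in range(2, J + 1):
        p0, p1 = p1, ((2 * n - 1) * x * p1 - (n - 1) * p0) / n
    return p1, J * (x * p1 - p0) / (x * x - 1.0)


def gauss_legendre(J):
    """Gaussian latitudes mu_j (roots of P_J) and Gaussian weights w_j."""
    mu, w = [], []
    for k in range(1, J + 1):
        x = math.cos(math.pi * (k - 0.25) / (J + 0.5))  # initial guess for root k
        for _ in range(50):  # Newton iteration
            p, dp = legendre_pair(J, x)
            x -= p / dp
        p, dp = legendre_pair(J, x)
        mu.append(x)
        w.append(2.0 / ((1.0 - x * x) * dp * dp))
    return mu, w


def legendre_table(M, mu):
    """P[m][n] for 0 <= m <= n <= M, normalized so the integral of P^2 over mu is 1."""
    P = [[0.0] * (M + 1) for _ in range(M + 1)]
    s = math.sqrt(max(0.0, 1.0 - mu * mu))
    pmm = math.sqrt(0.5)
    for m in range(M + 1):
        if m > 0:
            pmm *= math.sqrt((2 * m + 1) / (2 * m)) * s
        P[m][m] = pmm
        prev2, prev1 = 0.0, pmm
        for n in range(m + 1, M + 1):
            a = math.sqrt((4 * n * n - 1) / (n * n - m * m))
            b = 0.0 if n == m + 1 else math.sqrt(((n - 1) ** 2 - m * m) / (4 * (n - 1) ** 2 - 1))
            prev2, prev1 = prev1, a * (mu * prev1 - b * prev2)
            P[m][n] = prev1
    return P


def synthesis(spec, M, mu, I):
    """grid[j][i] = sum over m, n of spec[m][n] * P[m][n](mu_j) * exp(i m lambda_i), real field."""
    grid = []
    for x in mu:
        P = legendre_table(M, x)
        # Inverse Legendre transform: Fourier coefficients on this latitude
        fm = [sum(spec[m][n] * P[m][n] for n in range(m, M + 1)) for m in range(M + 1)]
        row = []
        for i in range(I):
            lam = 2.0 * math.pi * i / I
            # Inverse Fourier transform; negative m folded in by conjugate symmetry
            row.append(fm[0].real + 2.0 * sum(
                (fm[m] * cmath.exp(1j * m * lam)).real for m in range(1, M + 1)))
        grid.append(row)
    return grid


def analysis(grid, M, mu, w):
    """spec[m][n] = sum over j of w_j * P[m][n](mu_j) * (1/I) sum over i of grid[j][i] * exp(-i m lambda_i)."""
    I = len(grid[0])
    spec = [[0j] * (M + 1) for _ in range(M + 1)]
    for j, x in enumerate(mu):
        P = legendre_table(M, x)
        for m in range(M + 1):
            # Step 1: direct Fourier transform along the latitude circle
            fm = sum(grid[j][i] * cmath.exp(-2j * math.pi * m * i / I) for i in range(I)) / I
            # Step 2: direct Legendre transform by Gaussian quadrature
            for n in range(m, M + 1):
                spec[m][n] += w[j] * fm * P[m][n]
    return spec


def round_trip_error(M, seed=0):
    """Largest coefficient error after synthesis followed by analysis."""
    rng = random.Random(seed)
    I, J = grid_size(M)
    mu, w = gauss_legendre(J)
    spec = [[0j] * (M + 1) for _ in range(M + 1)]
    for m in range(M + 1):
        for n in range(m, M + 1):
            re, im = rng.uniform(-1, 1), rng.uniform(-1, 1)
            spec[m][n] = complex(re, 0.0 if m == 0 else im)  # m = 0 modes are real
    back = analysis(synthesis(spec, M, mu, I), M, mu, w)
    return max(abs(back[m][n] - spec[m][n]) for m in range(M + 1) for n in range(m, M + 1))


if __name__ == "__main__":
    print(grid_size(42), round_trip_error(4))
```

With these conventions, a truncation of M = 42 gives a 128 by 64 grid, and a synthesis followed by an analysis should return the original coefficients up to round-off. The inner sums over n and j are the matrix-vector and matrix-matrix products that a BLAS-based formulation would hand to the GPU.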
B. NVIDIA® GPGPU Architecture and CUDA

The sudden surge in popularity of the GPGPU for algorithm computation within the engineering fraternity is due to the fact that the GPU is specialized for compute-intensive, highly parallel computation (exactly what graphics rendering is about) and is therefore designed so that more transistors are devoted to data processing than to data caching and flow control, as schematically illustrated in Figure 2.

Figure 2. CPU vs GPU.

More specifically, the GPU is especially well suited to problems that can be expressed as data-parallel computations (the same program is executed on many data elements in parallel) with high arithmetic intensity (the ratio of arithmetic operations to memory operations). Because the same program is executed for each data element, there is a lower requirement for sophisticated flow control; and because it is executed on many

