CS 267 Assignment 0
Douglas Mason

I. STUDENT BIOGRAPHY

A. Research Interests

This is my second year of graduate study for a PhD in physics, so at my tender stage of academic development my research interests are still wide open. At Berkeley and the National Lab, I simulate graphene nanoribbons in large ensembles to understand their conductive properties. I will complete this project soon and move on to density functional simulation and possibly quantum Monte Carlo methods. I am on exchange for a year from Harvard University; when I return in the fall, I will begin work on developing the principles behind quantum Monte Carlo.

B. Course Goals

My work in physics for the past few years has led me to develop large applications which have always proven to be limited by computational power. My imagination, it turns out, is just too big for the current state-of-the-art computers! And this is why I came to Berkeley: to absorb the perspectives and principles of its computer science department, in short, to learn what the true limitations are. I am taking this course to learn how I can break through some of these barriers through intelligent deployment of parallel computing principles. I hope to learn how much work it takes to actually implement a parallel solution, what the benefits are, and how to intelligently marshal my energies toward parallel computing to accomplish my scientific goals.

The most efficient means of acquiring this perspective, I imagine, is to take a problem I am interested in and just gosh darn try it in parallel!

II. PARALLEL APPLICATION

A. Introduction

Chemists and physicists are interested in computationally modelling the behavior of molecules and nanoparticles to study their ground states and thus their behavior with light. Such research is paramount in the pursuit of ideal molecular candidates for future solar cells[5].

Calculating the ground-state (lowest-energy) electronic structure of any molecule or particle is a widely studied problem, and a number of software packages have been developed to accomplish the goal. Efficiency of these programs is paramount, and since they are often run on large supercomputing clusters, parallel applications to speed the process have received wide attention. In this summary, I will examine one particular parallel application, the parallel three-dimensional Fourier transform, which underlies the Paratec and PWSCF packages[1, 3].

B. The Three-Dimensional Fast Fourier Transform (3DFFT)

The Paratec package, which forms the "core" of many electronic structure studies, spends roughly one-third of its time in BLAS3, one-third in hand-coded routines, and one-third in the three-dimensional Fourier transform[4]. It thus benefits significantly from improving the FFT's efficiency on the computing cluster.

Figure 1: Each panel depicts the problem sent to each of three processors (red, blue, green) running 3DFFT in parallel. The sphere of Fourier coefficients is visualized in the first panel (a), with radius d. The cube has length 2d on each side.

The key steps of the parallel 3DFFT, which aims to calculate the Fourier coefficients within a "g-vector sphere", are as follows, quoting Andrew Canning's paper with minor alterations (a short serial sketch after the list makes the column decomposition concrete):

1. We begin with vertical columns in the region of the g-vector sphere (Figure 1(a)). Each processor pads the ends of each column with zeros to form full-length z-columns on each processor (Figure 1(a) to Figure 1(b)). It is immaterial which processor receives which column.

2. Each processor performs one-dimensional FFTs on its set of z-columns.

3. The cylinder of data is now reorganized from z-columns to y-columns (ordered by their x,z indices), with each processor now holding a contiguous set of y-columns. Data from all processors is synchronized going from Figure 1(b) to Figure 1(c), as can be seen by the changes in color of the data elements. Each processor is given as closely as possible the same number of y-columns.

4. The y-columns are padded with zeros at the ends to form full-length columns (Figure 1(c) to Figure 1(d)). The complete data set is now a slab of dimension d in the x direction and 2d in the other directions.

5. Each processor performs one-dimensional FFTs on its set of y-columns.

6. The slab of data is now transformed from y-columns (x,z ordered) to x-columns (y,z ordered), with each processor now having a set of contiguous x-columns (Figure 1(d) to Figure 1(e)). Each processor is given as closely as possible the same number of x-columns. Communications are minimized at this step since most of the transformations are local to the processor, with only data at the interfaces of the colored blocks being communicated. In the ideal case, where there are complete (y,x) planes on each processor, the transpose can be done locally on each processor and there are no communications. Due to our choice of data layouts in the FFT, the main communications are in step 3, where the data set (the cylinder) is much smaller than the slab.

7. The x-columns are now padded at the ends with zeros so the global data set is now the complete cube of side 2d (see Figure 1(f)).

8. Each processor performs one-dimensional FFTs on its set of x-columns, producing the final distributed real-space representation of the wavefunction in x,y,z order.
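To make the decomposition concrete, here is a minimal serial sketch in Python/NumPy. The production code is Fortran 90 with MPI, so this illustrates only the mathematics behind steps 1-8, not Paratec's implementation; the grid size, sphere radius, and all names below are my own illustrative choices. It builds a sphere of Fourier coefficients and checks that three passes of one-dimensional FFTs, one per axis, reproduce the direct three-dimensional transform.

    import numpy as np

    # Illustrative sizes (not values from the paper): the full grid is the
    # "cube of side 2d" from step 7, and d is the g-vector sphere radius.
    n = 32                     # full grid is n x n x n
    d = n // 4                 # sphere radius in grid units

    # Fourier coefficients that are nonzero only inside the g-vector
    # sphere, as in Figure 1(a).
    g = np.fft.fftfreq(n) * n  # signed integer frequencies along one axis
    gx, gy, gz = np.meshgrid(g, g, g, indexing="ij")
    inside = gx**2 + gy**2 + gz**2 <= d**2
    rng = np.random.default_rng(0)
    coeffs = np.zeros((n, n, n), dtype=complex)
    coeffs[inside] = (rng.standard_normal(inside.sum())
                      + 1j * rng.standard_normal(inside.sum()))

    # Steps 1-2: zero-padded z-columns (the zeros are already in place
    # because coeffs is stored as a full cube), then 1D FFTs along z.
    data = np.fft.fft(coeffs, axis=2)

    # Steps 3-5: reorganize to y-columns and FFT along y. In the parallel
    # code this reorganization is the main communication step; serially it
    # is just a change of FFT axis.
    data = np.fft.fft(data, axis=1)

    # Steps 6-8: reorganize to x-columns and FFT along x, giving the
    # real-space representation.
    data = np.fft.fft(data, axis=0)

    # The three passes of 1D FFTs reproduce the direct 3D transform.
    assert np.allclose(data, np.fft.fftn(coeffs))

The savings described above come from what this sketch glosses over: the parallel code transforms only the columns that actually intersect the sphere in the early passes, and the two reorganizations are the distributed transposes whose communication costs steps 3 and 6 analyze.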
C. Performance

Figure 2: Performance of the implemented parallel 3DFFT on several systems. The code maintains relatively stable peak performance even as the number of processors grows large.

Paratec's 3DFFT routines are written in Fortran 90 and use the MPI library, and they benefit greatly from highly optimized (read: vectorized) one-dimensional FFT libraries for the processors on which they run. The code scales well to an extremely large number of processors, as evidenced in the figure above. The highest-performing cluster shown, the Franklin XT4, currently ranks seventh in the world[2]. Since this code is used in many electronic structure calculations, it would be difficult to ascertain the highest-performing cluster it has been tested on.
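Scaling claims like this are usually quantified as strong-scaling speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p. The short sketch below shows the bookkeeping; the timings are invented for illustration and are not measurements from Figure 2.

    # Hypothetical timings (seconds) for a fixed problem size; these are
    # made-up numbers, not Paratec or Franklin XT4 measurements.
    timings = {1: 120.0, 64: 2.4, 512: 0.36, 2048: 0.12}

    t1 = timings[1]
    for p in sorted(timings):
        speedup = t1 / timings[p]
        efficiency = speedup / p  # 1.0 would be ideal linear scaling
        print(f"p={p:5d}  speedup={speedup:8.1f}  efficiency={efficiency:5.2f}")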


D. Note

The Andrew Canning presentation "Parallel Methods for Nano/Materials Science Applications" can be found at
http://www.cs.berkeley.edu/~demmel/cs267_Spr06/Lectures/Lecture28/Lecture_28_Canning_Nanoscience_06.ppt

[1] PWSCF homepage - http://www.pwscf.org/.
[2] Top 500 list - http://www.top500.org/list/2008/11/100.
[3] NERSC homepage - http://www.nersc.gov/projects/paratec/, 2009.
[4] Andrew Canning. Parallel methods for nano-materials science
