CS 267 Assignment 0
Douglas Mason

I. STUDENT BIOGRAPHY

A. Research Interests

This is my second year of graduate study for a PhD in physics, so at my tender stage of academic development my research interests are still wide open. At Berkeley and the National Lab, I simulate graphene nanoribbons in large ensembles to understand their conductive properties. I will complete this project soon and move on to density functional simulation and possibly quantum Monte Carlo methods. I am on exchange for a year from Harvard University; when I return in the fall, I will begin work on developing the principles behind quantum Monte Carlo.

B. Course Goals

My work in physics for the past few years has led me to develop large applications which have always proven to be limited by computational power. My imagination, it turns out, is just too big for the current state-of-the-art computers! And this is why I came to Berkeley: to absorb the perspectives and principles of its computer science department, in short, to learn what the true limitations are. I am taking this course to learn how I can break through some of these barriers through intelligent deployment of parallel computing principles. I hope to learn how much work it takes to actually implement a parallel solution, what the benefits are, and how to intelligently marshal my energies toward parallel computing to accomplish my scientific goals.

The most efficient means of acquiring this perspective, I imagine, is to take a problem I am interested in and just gosh darn try it in parallel!

II. PARALLEL APPLICATION

A. Introduction

Chemists and physicists are interested in computationally modelling the behavior of molecules and nanoparticles to study their ground states and thus their behavior with light. Such research is paramount in the pursuit of ideal molecular candidates for future solar cells[5].

Calculating the ground-state (lowest-energy) electronic structure of any molecule or particle is a widely studied problem, and a number of software packages have been developed to accomplish the goal. Efficiency of these programs is paramount, and since they are often run on large supercomputing clusters, parallel applications to speed the process have received wide attention. In this summary, I will examine one particular parallel application, the parallel three-dimensional Fourier transform, which underlies the Paratec and PWSCF packages[1, 3].

B. The Three-Dimensional Fast Fourier Transform (3DFFT)

The Paratec package, which forms the "core" of many electronic structure studies, spends roughly one-third of its time in BLAS3, one-third in hand-coded routines, and one-third in the three-dimensional Fourier transform[4]. It thus benefits significantly from improving the FFT's efficiency on the computing cluster.

Figure 1: Each panel depicts the problem sent to each of three processors (red, blue, green) running 3DFFT in parallel. The sphere of Fourier coefficients is visualized in the first panel (a), with radius d. The cube has length 2d on each side.

The key steps of the parallel 3DFFT, which aims to calculate the Fourier coefficients within a "g-vector sphere", are as follows, quoting Andrew Canning's paper with minor alterations (a short serial sketch after the list makes the column decomposition concrete):

1. We begin with vertical columns in the region of the g-vector sphere (Figure 1(a)). Each processor pads the ends of each column with zeros to form full-length z-columns on each processor (Figure 1(a) to Figure 1(b)). It is immaterial which processor receives which column.

2. Each processor performs one-dimensional FFTs on its set of z-columns.

3. The cylinder of data is now reorganized from z-columns to y-columns (ordered by their x,z indices), with each processor now holding a contiguous set of y-columns. Data from all processors is synchronized going from Figure 1(b) to Figure 1(c), as can be seen by the changes in color of the data elements. Each processor is given as closely as possible the same number of y-columns.

4. The y-columns are padded with zeros at the ends to form full-length columns (Figure 1(c) to Figure 1(d)). The complete data set is now a slab of dimension d in the x direction and 2d in the other directions.

5. Each processor performs one-dimensional FFTs on its set of y-columns.

6. The slab of data is now transformed from y-columns (x,z ordered) to x-columns (y,z ordered), with each processor now having a set of contiguous x-columns (Figure 1(d) to Figure 1(e)). Each processor is given as closely as possible the same number of x-columns. Communications are minimized at this step since most of the transformations are local to the processor, with only data at the interfaces of the colored blocks being communicated. In the ideal case, where there are complete (y,x) planes on each processor, the transpose can be done locally on each processor and there are no communications. Due to our choice of data layouts in the FFT, the main communications are in step 3, where the data set (the cylinder) is much smaller than the slab.

7. The x-columns are now padded at the ends with zeros so the global data set is now the complete cube of side 2d (see Figure 1(f)).

8. Each processor performs one-dimensional FFTs on its set of x-columns, producing the final distributed real-space representation of the wavefunction in x,y,z order.
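To make the decomposition concrete, here is a minimal serial sketch in Python/NumPy. The production code is Fortran 90 with MPI, so this illustrates only the mathematics behind steps 1-8, not Paratec's implementation; the grid size, sphere radius, and all names below are my own illustrative choices. It builds a sphere of Fourier coefficients and checks that three passes of one-dimensional FFTs, one per axis, reproduce the direct three-dimensional transform.

    import numpy as np

    # Illustrative sizes (not values from the paper): the full grid is the
    # "cube of side 2d" from step 7, and d is the g-vector sphere radius.
    n = 32                     # full grid is n x n x n
    d = n // 4                 # sphere radius in grid units

    # Fourier coefficients that are nonzero only inside the g-vector
    # sphere, as in Figure 1(a).
    g = np.fft.fftfreq(n) * n  # signed integer frequencies along one axis
    gx, gy, gz = np.meshgrid(g, g, g, indexing="ij")
    inside = gx**2 + gy**2 + gz**2 <= d**2
    rng = np.random.default_rng(0)
    coeffs = np.zeros((n, n, n), dtype=complex)
    coeffs[inside] = (rng.standard_normal(inside.sum())
                      + 1j * rng.standard_normal(inside.sum()))

    # Steps 1-2: zero-padded z-columns (the zeros are already in place
    # because coeffs is stored as a full cube), then 1D FFTs along z.
    data = np.fft.fft(coeffs, axis=2)

    # Steps 3-5: reorganize to y-columns and FFT along y. In the parallel
    # code this reorganization is the main communication step; serially it
    # is just a change of FFT axis.
    data = np.fft.fft(data, axis=1)

    # Steps 6-8: reorganize to x-columns and FFT along x, giving the
    # real-space representation.
    data = np.fft.fft(data, axis=0)

    # The three passes of 1D FFTs reproduce the direct 3D transform.
    assert np.allclose(data, np.fft.fftn(coeffs))

The savings described above come from what this sketch glosses over: the parallel code transforms only the columns that actually intersect the sphere in the early passes, and the two reorganizations are the distributed transposes whose communication costs steps 3 and 6 analyze.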
C. Performance

Figure 2: Performance of the implemented parallel 3DFFT on several systems. The code maintains relatively stable peak performance even as the number of processors grows large.

Paratec's 3DFFT routines are written in Fortran 90 and use the MPI library, and they benefit greatly from highly optimized (read: vectorized) one-dimensional FFT libraries for the processors on which they run. The code scales well to an extremely large number of processors, as evidenced in the figure above. The highest-performing cluster shown, the Franklin XT4, currently ranks seventh in the world[2]. Since this code is used in many electronic structure calculations, it would be difficult to ascertain the highest-performing cluster it has been tested on.
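Scaling claims like this are usually quantified as strong-scaling speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p. The short sketch below shows the bookkeeping; the timings are invented for illustration and are not measurements from Figure 2.

    # Hypothetical timings (seconds) for a fixed problem size; these are
    # made-up numbers, not Paratec or Franklin XT4 measurements.
    timings = {1: 120.0, 64: 2.4, 512: 0.36, 2048: 0.12}

    t1 = timings[1]
    for p in sorted(timings):
        speedup = t1 / timings[p]
        efficiency = speedup / p  # 1.0 would be ideal linear scaling
        print(f"p={p:5d}  speedup={speedup:8.1f}  efficiency={efficiency:5.2f}")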


D. Note

The Andrew Canning presentation "Parallel Methods for Nano/Materials Science Applications" can be found at
http://www.cs.berkeley.edu/~demmel/cs267_Spr06/Lectures/Lecture28/Lecture_28_Canning_Nanoscience_06.ppt

[1] PWSCF homepage - http://www.pwscf.org/.
[2] Top 500 list - http://www.top500.org/list/2008/11/100.
[3] NERSC homepage - http://www.nersc.gov/projects/paratec/, 2009.
[4] Andrew Canning. Parallel methods for nano-materials science
