Berkeley COMPSCI C267 - Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code - D761126

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI C267> Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code

DOC PREVIEW

Berkeley COMPSCI C267 - Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code

School name University of California, Berkeley

Course Compsci C267- Applications of Parallel Computers

Pages 11

This preview shows page 1-2-3-4 out of 11 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Optimization and Evaluation of a Titanium Adaptive MeshRefinement CodeAmir Kamil Ben Schwarz Jimmy [email protected] [email protected] [email protected] 19, 2004AbstractAdaptive Mesh Refinement (AMR) is a grid-based ap-proach to approximating the solutions to partial differ-ential equations (PDEs). It is governed by the principlethat some portions of the grid require more computationto solve than other regions. AMR uses an adaptive ap-proach to adjust the resolution of the computational do-main to reflect the varying computational needs. A high-performance implementation of an AMR code is providedby the Chombo framework—a set of tools for implement-ing the finite difference method to solve PDEs. Chomboprovides a C++ implementation that uses the MessagePassing Interface (MPI) for communication.Titanium is a Java-based language for parallel program-ming that uses a shared memory space abstraction. It of-fers many features to assist a developer in crafting parallelprograms. However, the ease of use can often come at thecost of performance. In this paper, we evaluate an im-plementation of AMR provided by the Applied Numeri-cal Algorithms Group at the Lawrence Berkeley NationalLaboratories. We explain several optimizations that wehave performed on the code to improve its performance.Our contribution is an optimized implementation that runs23% faster than the original Titanium code. It is eithercomparable to or faster than the C++/MPI Chombo codeon all the platforms we have tested.1 IntroductionThis paper explores optimizations and analyses carriedout on an AMR code that is implemented using the Ti-tanium programming language [4]. Titanium provides ashared memory abstraction and many parallel program-ming constructs to simplify programming scientific codes.Unfortunately these features sometimes come at the priceof inefficiency. For example, Titanium allows a programto have both local and remote pointers; remote pointerspoint to objects residing on another machine. Pointer op-erations, such as dereferencing, can be very expensive,because at runtime a number of checks need to be per-formed, and possibly communication needs to be done.In order to write an efficient program, the software devel-oper using this abstraction must be aware of the intricatedetails of the underlying language framework. Our contri-bution is to analyze a Titanium implementation of AMRand optimize the constructs that are the sources of ineffi-ciency.The remainder of the paper is organized as follows.Section 2 describes related work. Section 3 explains thebasic AMR algorithm along with some important datastructures. Section 4 discusses the analysis we have per-formed on the initialization code—which accounts for alarge percentage of the overall running time—and severaloptimizations performed to it. Section 5 lists our profilingresults. A load balancing study is explained in Section 6.Our optimizations to the exchange method of the AMRcode is described in Section 7.2. The final performancedata is presented in Section 8, and conclusions are in Sec-tion 9.1 PROC 1PROC 3PROC 2Communication: Exchange of Borders Communication: InterpolationFigure 1: Sources of Communication in AMR2 Related WorkThe Titanium programming language provides the basisfor many of our optimizations [4, 2]. Titanium is a dialectof Java, an object-oriented language, however the AMR isnot written using an object oriented paradigm. AMR++,on the other hand, provides an object-oriented designfor the problem [3]. Chombo is a high-performanceAMR code provided by the Applied Numerical Algo-rithms Group at the Lawrence Berkeley National Labo-ratory [1].3 Synopsis of AMRAMR is a grid-based approach to approximating the solu-tions to PDEs. The grid is subdivided into smaller gridswhen a finer resolution is needed. A grid level refers toa set of regions with the same resolution; these regionsare rectangular, and may overlap slightly. When refine-ment is done, we increase the resolution and divide theregion into a set of smaller region. The smaller regionsneed not include all of the larger region. In particular,areas of the larger region which need not be further re-fined are not broken down. At any given level, the set ofregions are distributed among the processes. It is not a re-quirement that a processor contain all the finer resolutionregions within a larger box. The AMR is carried out bydoing multigrid V-cycles on all the regions.There are two sources of communication in the AMRcode, as shown in Figure 1. The first is when two regionsare near each other, and to perform the computation theyneed to retrieve values from each other. We shall referto this type of communication as exchange of shared bor-ders. In the figure, processor 2 and processor 3 share asmall region, and the overlapping part represents the ex-change communication that will take place. The secondsource of communication is when a finer resolution re-gion is contained within a region with a larger resolution,and they are owned by different processors. A basic op-eration in the multigrid method is interpolation betweenresolutions. More specifically, the region layout is con-structed top down starting with the most coarse grid. If afiner grid is needed, the coarse grid points are interpolatedto figure out the values for the smaller region. Vice-versa,coming “up” on the multigrid v-cycle, interpolation needsto be done to set the values of coarser grid points based onthe finer resolution points. This bidirection interpolationresults in communication between the processors contain-ing those grids.4 Initialization TuningIn the initialization stage a distributed data structure isbuilt. The data structure consists of all the box layoutsand their distribution among processors. Figure 2 showsa sample data structure for a simple two processor case.Note that each processor has a pointer to the data on ev-ery other processor. The intuitive way to implement this isto have each processor broadcast its pointers to the otherprocessors. In practice, however, broadcasts must pro-ceed serially so this can be a limiting factor. An alterna-tive approach is to exchange pointers to the top level of2Figure 2: Example of Data Structures Built During Initializationpointers and then use Titanium’s array copy mechanism,which can proceed in parallel. We have evaluated bothimplementations and the results are presented in Table 1.The

View Full Document

Berkeley COMPSCI C267 - Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-4 out of 11 pages.

Berkeley COMPSCI C267 - Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code

Sign up for free to view:

Please select your school