DOC PREVIEW
Berkeley COMPSCI 294 - Synthesis of Distributed Arrays in Titanium

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1 Introduction2 Background2.1 Titanium Background2.2 Distributed Arrays2.2.1 Distribution2.2.2 Replicated Portions2.2.3 Access Patterns3 Language Modifications3.1 Specification Language3.2 Using Distributed Arrays3.2.1 Declaration and Construction3.2.2 Array Access and Functions3.2.3 New Constructs4 Implementation4.1 Implementation Strategy4.2 Work in Progress4.2.1 Multilevel Arrays4.2.2 Titanium Limitations5 Applications5.1 Game of Life5.2 Knapsack6 Discussion6.1 Limitations6.2 Future Work6.2.1 Array Types6.2.2 Analyses and Optimizations7 Related Work8 ConclusionSynthesis of Distributed Arrays in TitaniumAmir KamilComputer Science Division, University of California, [email protected] 17, 20061 IntroductionIn parallel programs, data is distributed between all the threads in the program. Many times it is necessary for one thread toaccess data that is located on a different thread, requiring communication between threads. The cost of communication canbe a significant portion of the total running time of a parallel program (e.g. >40% for Titanium adaptive mesh refinementon 3 nodes [13]), and load imbalance can have a significant detrimental effect on performance. As a result, much of theeffort involved in writing parallel code is focused on optimizing the data structures and access patterns, and the resultingimplementation tends to be far removed from an abstract description of the structure. The end result is a large piece of codethat is difficult to understand from the points of view of both correctness and performance. The goal of this project is to allowa programmer to specify data structure details at a high level, reducing the required programming effort while still resulting ingood performance.Ignoring data distribution, most data structures used in parallel programming have a relatively simple abstract layout. Forexample, the main data structure may be just a multidimensional array. Distributing the structures across multiple threadsintroduces significant complexity. The following are a few issues that come up:• Which thread owns what pieces of the data structure?• Which parts of the data structure should be replicated?• When and how should consistency of the replicated parts be maintained?Ideally, these issues would be resolved independently of the algorithm that operates on the distributed data, decoupling thedata structure implementation from the algorithm.In this paper, we present a distributed array synthesizer for the Titanium language. The synthesizer allows specificationof arrays with varying answers to the above questions in a simple language, and generates the Titanium code to implementthe arrays. The implementation is abstracted from the user, so that the same algorithm can be performed on different types ofdistributed arrays.2 Background2.1 Titanium BackgroundThe Titanium programming language [14] is a high performance dialect of Java designed for distributed machines. It is asingle program, multiple data (SPMD) language, so all threads execute the same code image. In addition, Titanium hasa global address space abstraction, so that any thread can directly access memory on another thread. However, accessingmemory on another thread can be much slower than on the same thread, since network communication may be required.Titanium provides support for multidimensional arrays. Arrays are specified over N -dimensional rectangular domains,which consist of a lower bound, an upper bound, and a stride. For example, the domain [[1,1] : [5,5] : [2,2]]has a lower bound of [1,1], an upper bound of [5,5], and a stride of [2,2], so it consists of the set of points {[1,1],[3,3], [5,5]}.1Figure 1: Two possible distributions of a two dimensional grid.Support for distributed arrays, however, is weak in Titanium. They can be built from the ground up by allocating eachthread’s portion locally on that thread, and then exchanging pointers to these portions among all threads. Each thread’s portionmust be manually computed, and a programmer must manually translate an access to the distributed array into an access tothe right thread’s portion of the array. This process is very tedious and error-prone.2.2 Distributed ArraysDistributed arrays in general are characterized by multiple factors, including dimension, distribution, replicated portions, andaccess patterns. Dimension is obvious, but the rest require some explanation.2.2.1 DistributionArrays can be distributed among the available threads in many ways, and each dimension can be distributed in different ways.In this paper, we focus on regular distributions, of which the following are most common:• blocked: cells in a dimension are divided into blocks of a given size, and each block is given to a different thread. Theleft side of Figure 1 shows a distribution that is blocked in two dimensions, with a block size of 2 in each dimension.• cyclic: each thread gets every nth cell in a dimension, where n is the total number of threads. The right side of Figure1 shows a cyclic distribution in one dimension, where there are two threads total.• blocked-cyclic: similar to blocked, except that each thread gets every nth block in a dimension. In this paper, we donot support this distribution type.Not all dimensions of an array need be distributed. The right side of Figure 1 shows a distribution in which only one dimensionof a two-dimensional array is distributed.2.2.2 Replicated PortionsSome parts of a distributed array may be replicated, for decreased communication, for algorithmic purposes such as in multi-grid [6, 10, 7], or for lowering memory and computational requirements such as in adaptive mess refinement [13].In many parallel algorithms, the value of an array element depends on the values of its neighbors. At thread boundaries ina distributed array, these neighbors may be on a different thread. In order to decrease communication, ghost cells are used tocache these values. Ghost cells must updated whenever the cached value changes, so as to be coherent with the real cells thatthey correspond to.In multigrid algorithms, the distributed array consists of multiple refinement levels. Each level corresponds to the entirephysical domain represented by the array, but at a different resolution. For example, the bottom level of an array may containa cell for each point in the physical domain, the next level a cell for every four points, and so on. The refinement ratio is theratio of the resolutions between levels,


View Full Document

Berkeley COMPSCI 294 - Synthesis of Distributed Arrays in Titanium

Documents in this Course
"Woo" MAC

"Woo" MAC

11 pages

Pangaea

Pangaea

14 pages

Load more
Download Synthesis of Distributed Arrays in Titanium
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Synthesis of Distributed Arrays in Titanium and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Synthesis of Distributed Arrays in Titanium 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?