TITLE
Computational Sciences and Parallelism

BYLINE
Geoffrey Fox, School of Informatics and Computing and Pervasive Technology Institute, Indiana University, Bloomington, IN

SYNONYMS
Applications and Parallelism; Problem Architectures

DEFINITION
Here we ask which applications should run in parallel and, correspondingly, which areas of computational science will benefit from parallelism. In studying this we will discover which applications benefit from particular hardware and software choices. A driving principle is that in parallel programming one must map problems into software and then into hardware. The architectural differences between the source and target of these maps affect the efficiency and ease of parallelism.

DISCUSSION

Introduction

I have an application – can and should it be parallelized, and if so, how should this be done and what are appropriate target hardware architectures; what is known about clever algorithms and what are recommended software technologies? Fox introduced in [1] a general approach to this question by considering problems and the computer infrastructure on which they are executed as complex systems, namely collections of entities and connections between them governed by some laws. The entities can be illustrated by mesh points, particles, and data points for problems; cores, networks, and storage locations for hardware; and objects, instructions, and messages for software. The processes of deriving numerical models, generating the software to simulate the model, compiling the software, generating the machine code, and finally executing the program on particular hardware can be considered as maps between different complex systems. Many performance models and analyses have been developed, and these describe the quality of a map. We know that maps are essentially never perfect, and describing principles for quantifying this is a goal of this entry. At a high level, we understand that the architectures of the problem and of the hardware/software must match; given this, we have quantitative conditions that the performance of the parts of the hardware must be consistent with the problem. For example, if two mesh points in the problem are strongly connected, then the bandwidth between the hardware components to which they are mapped must be high.

In this discussion we want to describe the issues of parallelism, and here there are two particularly interesting general results. Firstly, we can usually define a space (the domain of entities) and a time associated with a complex system. Time is nature's time for the complex system that describes time-dependent simulations; for linear algebra, however, the time for that complex system is an iteration count. Note that for the simplest sequential computer hardware there is no space, just a time defined by the control flow. Thus in executing problems on computers one is typically mapping all or part of the space of the problem onto time for the computer, and parallel computing corresponds to the case where both problem and computer have a well-defined spatial extent. A mapping is usually neither 1:1 nor reversible, and "information is lost" as one maps one system into another. In particular, one fundamental reason why automatic parallelism can be hard is that the mapping of the problem into software has thrown away key information about the space-time structure of the original problem.
Language designers in this field try to find languages that preserve the key information needed for parallelism, while hardware designers design computers that can work around this loss of information. For example, the use of arrays in many data-parallel languages, from APL and HPF to Sawzall, can be viewed as a way to preserve the spatial structure of problems when expressed in these languages. In this article we will not discuss these issues in depth, but rather discuss what is possible with "knowledgeable users" mapping problems to computers or particular programming paradigms.

Simple Example

We consider the simple case of a problem whose complex-system spatial structure is represented as a 2D mesh. This arises in materials science when one considers local forces between a regular array of particles, or in the finite difference approach to solving Laplace's or Poisson's equation in two dimensions. There are many important subtleties, such as adaptive meshes and hierarchical multigrid methods, but in the simplest formulation such problems are set up as a regular grid of field values where the basic iterative update links nearest neighbors in two dimensions. If the points are labeled by an index pair (i,j), then Jacobi's method (not state of the art, but chosen because its simplicity allows a clear discussion) can be written

φ(i,j) is replaced by (φLeft + φRight + φUp + φDown) / 4    (1)

where φLeft = φ(i,j-1), and similarly for φRight, φUp, and φDown.

Such problems would usually be parallelized by a technique often called "domain decomposition" or "data parallelism," although these terms are not very precise. Parallelism is naturally found for such problems by dividing the domain into parts and assigning each part to a different processor, as seen in Figure 1. Here the problem, represented as a 16x16 mesh, is solved on a 4x4 mesh of processors. For problems coming from nature this geometric view is intuitive: in a weather simulation, say, the atmosphere over California evolves independently from that over Indiana and so can be simulated on separate processors. This is only true for short time extrapolations – eventually information flows between these sites and their dynamics are mixed. Of course it is the communication of data between the processors (either directly in a distributed memory or implicitly in a shared memory) that implements this eventual mixing.

Figure 1: Communication structure for the 2D complex-system example. The dots are the 256 points in the problem. The dashed lines show the division into 16 processors. The circled points are the halo or ghost grid points communicated to the processor they surround.

Such block data decompositions typically lead to an SPMD (Single Program Multiple Data) structure, with each processor executing the same code but on different data points and with differing boundary conditions. In this type of problem, processors at the edge of the (4x4) mesh do not see quite the same communication and compute complexity as the "general case" of an inside processor shown in Figure 1. For the local nearest neighbor
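To make the update of Eq. (1) concrete, the following is a minimal sequential sketch in C. The 16x16 mesh matches Figure 1, while the iteration count and the fixed boundary values are illustrative assumptions rather than anything prescribed by the entry.

```c
#include <string.h>

#define N     16     /* 16x16 mesh, as in Figure 1 (boundary rows/columns held fixed) */
#define ITERS 100    /* illustrative iteration count, not taken from the entry        */

/* One Jacobi sweep: each interior point is replaced by the average of its
 * four nearest neighbors, Eq. (1); boundary values are left unchanged.   */
static void jacobi_sweep(double phi[N][N])
{
    double next[N][N];
    memcpy(next, phi, sizeof next);
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            next[i][j] = 0.25 * (phi[i][j - 1] + phi[i][j + 1] +   /* left, right */
                                 phi[i - 1][j] + phi[i + 1][j]);   /* up, down    */
    memcpy(phi, next, sizeof next);
}

int main(void)
{
    double phi[N][N] = { { 0.0 } };   /* initial guess: zero everywhere          */
    for (int j = 0; j < N; j++)
        phi[0][j] = 1.0;              /* illustrative fixed values on one edge   */
    for (int it = 0; it < ITERS; it++)
        jacobi_sweep(phi);
    return 0;
}
```

The double-buffered next array reflects the defining property of Jacobi's method: every point is updated from the previous iteration's values, so all updates within a sweep are independent and can be done in parallel.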
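The halo (ghost) points of Figure 1 and the SPMD structure can likewise be sketched with MPI. For brevity this sketch assumes a one-dimensional strip decomposition (each rank owns a block of rows) rather than the 4x4 block decomposition of the figure, and it assumes the number of ranks divides the mesh size; both are simplifying assumptions, not the entry's prescription.

```c
#include <mpi.h>
#include <string.h>

#define N     16     /* global mesh is N x N, as in Figure 1              */
#define ITERS 100    /* illustrative iteration count, not from the entry  */

/* SPMD sketch: every rank runs this same program on its own strip of rows,
 * keeping one ghost (halo) row above and below its owned block.           */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                            /* assumes size divides N */
    int up   = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int down = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    /* local rows 1..rows are owned; rows 0 and rows+1 are halo copies */
    double phi[rows + 2][N], next[rows + 2][N];
    memset(phi, 0, sizeof phi);
    if (rank == 0)
        for (int j = 0; j < N; j++)
            phi[1][j] = 1.0;                        /* illustrative fixed boundary */

    /* the global top and bottom boundary rows are held fixed, so skip them */
    int ifirst = (rank == 0)        ? 2        : 1;
    int ilast  = (rank == size - 1) ? rows - 1 : rows;

    for (int it = 0; it < ITERS; it++) {
        /* exchange halo rows with neighboring ranks (the circled points of
         * Figure 1); MPI_PROC_NULL makes the edge ranks skip the transfer  */
        MPI_Sendrecv(phi[1],        N, MPI_DOUBLE, up,   0,
                     phi[rows + 1], N, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(phi[rows],     N, MPI_DOUBLE, down, 1,
                     phi[0],        N, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Jacobi update, Eq. (1), on locally owned interior points */
        for (int i = ifirst; i <= ilast; i++)
            for (int j = 1; j < N - 1; j++)
                next[i][j] = 0.25 * (phi[i][j - 1] + phi[i][j + 1] +
                                     phi[i - 1][j] + phi[i + 1][j]);
        for (int i = ifirst; i <= ilast; i++)
            memcpy(&phi[i][1], &next[i][1], (N - 2) * sizeof(double));
    }

    MPI_Finalize();
    return 0;
}
```

Every rank executes identical code; only its rank, and hence its neighbors and boundary handling, differs, which is exactly the SPMD pattern described above.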

