
Optimizing Distributed Application Performance Using Dynamic Grid Topology-Aware Load Balancing

Gregory A. Koenig and Laxmikant V. Kalé
Department of Computer Science
University of Illinois at Urbana-Champaign
{koenig,kale}@cs.uiuc.edu

Abstract

Grid computing offers a model for solving large-scale scientific problems by uniting computational resources owned by multiple organizations to form a single cohesive resource for the duration of individual jobs. Despite the appeal of using Grid computing to solve large problems, its use has been hindered by the challenges involved in developing applications that can run efficiently in Grid environments. One substantial obstacle to deploying Grid applications across geographically distributed resources is cross-site latency. While certain classes of applications, such as master-slave style or functional decomposition type applications, lend themselves well to running in Grid environments due to inherent latency tolerance, other classes of applications, such as tightly-coupled applications in which each processor regularly communicates with its neighboring processors, represent a significant challenge to deployment on Grids.

In this paper, we present a dynamic load balancing technique for Grid applications based on graph partitioning. This technique exploits knowledge of the topology of the Grid environment to partition the computation's communication graph in such a way as to reduce the volume of cross-site communication, thus improving the performance of tightly-coupled applications that are co-allocated across distributed resources. Our technique is particularly well suited to codes from disciplines like molecular dynamics or cosmology due to the non-uniform structure of communication in these types of applications. We evaluate the effectiveness of our technique when used to optimize the execution of a tightly-coupled classical molecular dynamics code called LeanMD deployed in a Grid environment.

1-4244-0910-1/07/$20.00 © 2007 IEEE.

1. Introduction

One of the attractive features of Grid computing [8, 9] is that resources in geographically distant places can be mobilized to meet computational needs as they arise. Software such as Globus [7] allows the creation of so-called "virtual organizations" in which computational resources owned by multiple physical organizations are united to form a single cohesive resource for the duration of a single computational job.

A particularly challenging issue when deploying Grid applications across geographically distributed computational resources is overcoming the effects of the latency between sites. While the interconnects used within today's clusters can typically deliver application-to-application latencies of a few microseconds, wide-area network latencies are usually measured in tens or hundreds of milliseconds. Certain classes of applications can achieve good performance in environments such as this. For example, applications that employ functional decomposition, such as climate models in which an atmosphere computation runs on one cluster and an ocean computation runs on another cluster, are very good candidates for deployment in Grid environments because the volume of communication traveling across cluster boundaries is much less than the volume of communication internal to each cluster. In contrast, some classes of applications present serious challenges to deployment in Grid computing environments. For example, tightly-coupled applications where each processor communicates with its neighboring processors during every iteration present a significant challenge. Coping with the effects of wide-area latency is critical for achieving good performance with these types of applications when deploying them in a Grid computing environment.

In previous work, we have shown that it is possible to achieve good performance with tightly-coupled applications in Grid computing environments by leveraging latency tolerance features in an adaptive middleware layer [19].
By decomposing applications into a large number of message-driven objects, runtime systems such as Charm++ and Adaptive MPI allow time that would otherwise be wasted waiting for communication with neighbors across a cluster boundary to be overlapped with useful work driven by objects within the local cluster.

The contribution of this paper is the demonstration that the dynamic load balancing capabilities of the Charm++ and Adaptive MPI systems can be used to further improve performance for tightly-coupled applications running in Grid computing environments. The technique developed for this work exploits knowledge of the communication topology of the Grid environment to partition the computation's communication graph in such a way as to reduce the volume of cross-site communication, thus improving the performance of tightly-coupled applications that are co-allocated across distributed resources. In this paper, we focus our examination of the effectiveness of this technique on codes from disciplines like molecular dynamics or cosmology. In these types of problems, elements in the problem domain (e.g., atoms, planets, etc.) interact with other elements within a specified cutoff distance; elements outside this cutoff distance exert little or no influence on a given element. This characteristic presents unique opportunities for load balancing by mapping work to the processors in a Grid computation such that the amount of wide-area communication needed can be reduced.

2. Enabling Technologies

In this section, we describe the enabling technologies upon which our work is based. These technologies include the Charm++ and Adaptive MPI runtime systems as well as the Virtual Machine Interface message layer.

2.1. Charm++ and Adaptive MPI

Charm++ [14] is a message-driven parallel programming language based on C++ and designed with the goal of enhancing programmer productivity by providing a high-level abstraction of a parallel computation while at the same time delivering good performance.
Programs written in Charm++ are decomposed into a number of cooperating objects called chares. Execution within chares is message-driven in a style similar to Charm++ contemporaries such as Active Messages [26], Fast Messages [22], and Nexus [10]. When a chare receives a message, the message triggers the execution of a corresponding method within the chare to handle the message asynchronously. Chares may be organized into indexed collections called chare arrays, and messages may be sent to individual

