RIT EECC 756 - Customized Dynamic Load Balancing

A Review of "Customized Dynamic Load Balancing for a Network of Workstations"
Taken from work done by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy
Computer Science Department, University of Rochester (June 1997)
Presenter: Jacqueline Ewell

Introduction

With the rapid advances in computational power, speed, cost, memory, and high-speed network technologies, a "Network of Workstations" provides an attractive, scalable alternative to custom parallel machines. Scalable distributed shared-memory architectures rely on the ability of each processor to maintain a balanced load. Load balancing involves assigning each processor an amount of work proportional to its performance, thereby minimizing the total execution time of the program. There are two types of load balancing: static and dynamic. Static load balancing allows the programmer to assign work to each processor before run time. The simplest approach is the static block-scheduling scheme, which assigns equal contiguous blocks of iterations to each processor. Another static scheme is the interleaved scheme, which assigns iterations in a cyclic fashion. This makes programming much easier and can solve many problems inherent in load balancing caused by processor heterogeneity and non-uniform loops. Static scheduling also avoids run-time scheduling overheads.

However, due to transient external loads applied by multiple users on a network of workstations, and to heterogeneity in processors (different speeds), memory (varying amounts of available memory), networks (varying communication costs among pairs of processors), and software (programs may have varying amounts of work in each iteration), a dynamic approach to load balancing is necessary. Dynamic load balancing delegates work during run time based on the measured performance of the networked set of workstations, keeping in mind the trade-off between task-switching and load-imbalance costs. The idea of Dynamic vs.
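As an illustration of the two static schemes described above, here is a minimal sketch (not from the paper; function names are mine) of block vs. interleaved assignment of loop iterations to processors:

```python
# Sketch: static block-scheduling vs. static interleaved scheduling.
# Iterations are numbered 0..n_iters-1; processors 0..n_procs-1.

def block_schedule(n_iters, n_procs):
    """Assign equal contiguous blocks of iterations to each processor."""
    size, rem = divmod(n_iters, n_procs)
    out, start = [], 0
    for p in range(n_procs):
        end = start + size + (1 if p < rem else 0)  # spread remainder over first procs
        out.append(list(range(start, end)))
        start = end
    return out

def interleaved_schedule(n_iters, n_procs):
    """Assign iterations cyclically: iteration i goes to processor i mod P."""
    return [list(range(p, n_iters, n_procs)) for p in range(n_procs)]
```

Interleaving tends to even out non-uniform loops (where later iterations cost more than earlier ones) because every processor receives a sample of iterations from across the whole range.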
Static load balancing is not new. However, many load-balancing schemes have been developed, each with specific applications in mind under varying program and system parameters. Wouldn't it be nice, then, to customize dynamic load balancing to the application in order to yield the best possible performance? The paper "Customized Dynamic Load Balancing for a Network of Workstations" identifies this problem and presents a hybrid compile-time and run-time modeling and decision process that selects the best scheme based on run-time feedback from the processors and their performance.

Dynamic Load Balancing (DLB) Strategies

When the execution time of loop iterations is not predictable at compile time, runtime dynamic scheduling should be used, at the additional run-time cost of managing task allocation. A number of models have been developed for dynamic loop scheduling. The task-queue model has been targeted toward shared-memory machines, while the diffusion model has been used for distributed-memory machines [4,5,6]. The task-queue model assumes a central task queue: when a processor finishes its assigned work, more work can be obtained from the queue. This approach can be self-guided, or a master may remove the work and allocate it to the processor. In the diffusion model, all work is initially delegated to the processors, and when an imbalance is detected between a processor and its neighbor, work is moved between them.

A third scheduling strategy, which is the basis of the paper under discussion, uses past performance information to predict future performance. Among these approaches have been the Global Distributed scheme, the Global Central scheme, the Local Distributed scheme, the Global Centralized scheme using automatic generation of parallel programs, the Local Distributed receiver-initiated scheme, and the Local Centralized scheme. In all cases, the load balancing involves a periodic exchange of performance information.
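The central task-queue model mentioned above can be sketched as follows (a hypothetical single-process illustration; the class and chunk representation are mine, not the paper's):

```python
from collections import deque

class CentralTaskQueue:
    """Sketch of the central task-queue model: a master holds chunks of
    loop iterations, and an idle processor pulls the next chunk."""

    def __init__(self, chunks):
        # each chunk is an (start, end) half-open iteration range
        self.queue = deque(chunks)

    def request_work(self):
        """Called by a processor that has finished its assigned work.
        Returns the next chunk, or None when the queue is drained."""
        return self.queue.popleft() if self.queue else None
```

A faster processor simply calls request_work more often, so load balance emerges without any explicit performance measurement.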
The method discussed in the paper implements, instead of a periodic exchange, an interrupt-based receiver. Its goal is to select, among the given schemes, the best load-balancing scheme at run time.

These strategies lie at the extreme points of two axes: global vs. local, and centralized vs. distributed. More schemes lie between the extreme points, and they will be explored in future work.

Global Strategy:

In the global scheme, all load-balancing decisions are made on a global scale: all processors receive an interrupt once one processor has completed its work, and then all processors send their performance profiles to the load balancer. In the Centralized scheme, the load balancer resides in one place, often on the master processor. After calculating the new distribution, the load balancer sends instructions to the processors that must send work to others; the receiving processors simply wait until they have received all the data necessary to proceed. In the Distributed scheme, the load balancer is replicated on all the processors. The profile information is broadcast to all processors; the receiving processors wait for the work while the sending processors ship the data. This method eliminates the need for instructions from the master processor.

Local Strategy:

In the local scheme, the processors are divided into groups of size K. Load-balancing decisions are made only within a group (K-block static partitioning; other approaches such as K-nearest-neighbor and dynamic partitioning may also be used). Profile information is exchanged only within a group. The Centralized scheme still has only one load balancer, rather than one per group; besides learning the performance profile of each processor, the load balancer must also keep track of which group each processor is in.
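The redistribution step at the heart of the centralized schemes (both global and local) can be sketched as follows; this is my own illustrative reduction, assuming the balancer splits the remaining work in proportion to each processor's measured speed:

```python
def redistribute(total_work, speeds):
    """Split total_work iterations among processors in proportion to
    their measured speeds (e.g., iterations/second from the profiles)."""
    total_speed = sum(speeds)
    shares = [int(total_work * s / total_speed) for s in speeds]
    # hand any integer-rounding remainder to the fastest processor
    shares[speeds.index(max(speeds))] += total_work - sum(shares)
    return shares
```

The balancer would then compare these target shares against each processor's current assignment to decide who sends work to whom.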
As for the Distributed scheme, each processor still maintains its own load balancer, and profiles are sent among all the processors in the group.

Tradeoffs:

Global vs. Local: In the Global scheme, since global information is available at synchronization time, the work distribution is nearly optimal, and convergence is faster than in the Local scheme. However, communication and synchronization costs are much higher in the Global scheme. In the Local scheme, even though the groups may initially be partitioned statically so that all groups are evenly matched in performance and speed, the performance of each processor may change over time. Thus, one

