RIT EECC 756 - Customized Dynamic Load Balancing


A Review of "Customized Dynamic Load Balancing for a Network of Workstations"

Contents: Introduction; Dynamic Load Balancing (DLB) Strategies (Global Strategy, Local Strategy, Tradeoffs); DLB Modeling and Decision Process (Modeling Parameters: Total Cost, Synchronization Cost, Cost of Calculating New Distribution, Cost of Data Movement; Decision Process); Runtime System; Experiments (Matrix Multiplication, Matrix Multiplication Results, TRFD, TRFD Results, Modeling Results); Conclusions; References

Taken from work done by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy
Computer Science Department, University of Rochester (June 1997)
Presenter: Jacqueline Ewell

Introduction

With the rapid advances in computational power, speed, cost, memory, and high-speed network technologies, a "Network of Workstations" provides an attractive, scalable alternative to custom parallel machines. Scalable distributed shared-memory architectures rely on the ability of each processor to maintain a load balance. Load balancing involves assigning each processor an amount of work proportional to its performance, thereby minimizing the total execution time of the program. There are two types of load balancing: static and dynamic. Static load balancing allows the programmer to delegate responsibility to each processor before run time. The simplest approach is the static block-scheduling scheme, which assigns equal blocks of iterations to each of the processors. Another static scheme is the static interleaved scheme, which assigns iterations in a cyclic fashion. This makes programming much easier and can solve many problems inherent in load balancing caused by processor heterogeneity and non-uniform loops. Static scheduling also avoids run-time scheduling overheads.
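The two static schemes can be sketched as follows — a minimal illustration (not from the paper) of how block and interleaved scheduling map loop iterations onto processors:

```python
def block_schedule(n_iters, n_procs):
    """Static block scheduling: each processor gets one contiguous,
    (nearly) equal block of iterations."""
    base, extra = divmod(n_iters, n_procs)
    schedule, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        schedule.append(list(range(start, start + size)))
        start += size
    return schedule

def interleaved_schedule(n_iters, n_procs):
    """Static interleaved scheduling: iterations are dealt out cyclically."""
    return [list(range(p, n_iters, n_procs)) for p in range(n_procs)]

# 8 iterations on 2 processors:
# block:       [[0, 1, 2, 3], [4, 5, 6, 7]]
# interleaved: [[0, 2, 4, 6], [1, 3, 5, 7]]
```

Interleaving helps when work grows or shrinks across iterations (non-uniform loops), since each processor samples the whole iteration range; neither scheme adapts once the program is running.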
However, due to transient external loads applied by multiple users on a network of workstations, and heterogeneity in processors (different speeds), memory (varying amounts of available memory), networks (varying costs of communication among pairs of processors), and software (programs may have varying amounts of work in each iteration), a dynamic approach to load balancing is necessary. Dynamic load balancing allows work to be delegated during run time based on the run-time performance of the networked workstations, keeping in mind the trade-off between task-switching and load-imbalance costs. The idea of dynamic vs. static load balancing is not new; many load-balancing schemes have been developed, each with specific applications in mind under varying program and system parameters. Therefore, wouldn't it be nice to customize dynamic load balancing depending on the application, to yield the best performance possible? The paper "Customized Dynamic Load Balancing for a Network of Workstations" identifies this problem and presents a hybrid compile-time and run-time modeling and decision process that selects the best scheme based on run-time feedback from the processors and their performance.

Dynamic Load Balancing (DLB) Strategies

When the execution time of loop iterations is not predictable at compile time, runtime dynamic scheduling should be used, at the additional runtime cost of managing task allocation. A number of models have been developed for dynamic loop scheduling. The task-queue model has been targeted toward shared-memory machines, while the diffusion model has been used for distributed-memory machines [4,5,6]. The task-queue model assumes a central task queue: when a processor finishes its assigned work, it obtains more work from the queue. This approach can be self-guided, or a master may remove the work and allocate it to a processor.
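The task-queue model can be sketched as a shared queue of iteration chunks from which idle processors pull more work — a hypothetical single-process simulation for illustration, not the paper's implementation:

```python
from collections import deque

def run_task_queue(n_iters, n_procs, chunk=4):
    """Central task queue: work is handed out in fixed-size chunks; a
    processor that finishes its chunk returns to the queue for more
    (the self-guided variant). Round-robin stands in for 'first idle'."""
    queue = deque(range(0, n_iters, chunk))   # start index of each chunk
    assigned = {p: [] for p in range(n_procs)}
    p = 0
    while queue:
        start = queue.popleft()               # idle processor takes the next chunk
        assigned[p].extend(range(start, min(start + chunk, n_iters)))
        p = (p + 1) % n_procs
    return assigned
```

The key property, unlike static scheduling, is that a fast (or lightly loaded) processor simply visits the queue more often, so the final assignment tracks actual run-time speed rather than a compile-time guess.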
In the diffusion model, all work is initially delegated to the processors, and when an imbalance is detected between a processor and its neighbor, work moves between them. A third scheduling strategy, which is the basis of the paper under discussion, uses past performance information to predict future performance. Among these approaches have been the Global Distributed scheme, the Global Centralized scheme, the Local Distributed scheme, the Global Centralized scheme using automatic generation of parallel programs, the Local Distributed receiver-initiated scheme, and the Local Centralized scheme. In all cases, load balancing involves periodic exchange of performance information. The method discussed in the paper implements an interrupt-based receiver instead of periodic exchange. Its goal is to select, among the given schemes, the best load-balancing scheme at run time. These strategies lie at the extreme points of two axes: global vs. local and central vs. distributed. More schemes lie between the extreme points, and they will be explored in future work.

Global Strategy:
In the global scheme, all load-balancing decisions are made on a global scale. That is, all processors receive an interrupt once one processor has completed its work; then all processors send their performance profiles to the load balancer. In the Centralized scheme, the load balancer resides in a central location, often on the master processor. After calculating the new distribution, the load balancer sends instructions to the processors that must send work to the others; the receiving processors simply wait until they have received all the data necessary to proceed. In the Distributed scheme, the load balancer is replicated on all the processors. The profile information is broadcast to all processors, and the receiving processors wait for the work while the sending processors ship the data. This method eliminates the need for instructions from the master processor.
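The central step in the global scheme can be sketched as: on interrupt, every processor reports its measured speed, and the load balancer computes a new distribution proportional to those speeds. The function name and the simple proportional rule are illustrative assumptions, not the paper's exact cost model:

```python
def redistribute(total_work, speeds):
    """Central load balancer step: split the remaining iterations in
    proportion to each processor's reported speed, so a faster (or less
    externally loaded) processor receives a larger share."""
    total_speed = sum(speeds)
    shares = [int(total_work * s / total_speed) for s in speeds]
    shares[0] += total_work - sum(shares)   # hand rounding remainder to processor 0
    return shares

# e.g. 100 remaining iterations, measured speeds [3, 1] -> shares [75, 25]
```

In the Centralized variant only the master runs this computation and issues send/receive instructions; in the Distributed variant every processor runs the same deterministic computation on the broadcast profiles, so all arrive at the same answer without instructions from a master.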
Local Strategy:
In the local scheme, the processors are divided into groups of size K, and load-balancing decisions are made only within a group (K-block static partitioning; other approaches such as K-nearest neighbor and dynamic partitioning may also be used). Profile information is exchanged only within a group. The Centralized scheme still has only one load balancer, rather than one per group; besides learning the performance profile of each processor, the load balancer must also keep track of which group each processor is in. As for the Distributed scheme, each processor still maintains its own load balancer, and profiles are sent only within the group.

[Figure: the strategies plotted on two axes — local vs. global, and centralized vs. distributed.]
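The K-block grouping behind the local strategy can be sketched as follows — hypothetical helpers, assuming a simple contiguous split of processor ranks into groups of size K:

```python
def k_block_groups(n_procs, k):
    """K-block static partitioning: contiguous groups of K processor ranks.
    Profile exchange and load-balancing decisions stay within each group."""
    return [list(range(i, min(i + k, n_procs))) for i in range(0, n_procs, k)]

def group_of(proc, k):
    """The group a processor belongs to -- the bookkeeping a single
    centralized load balancer must maintain in the local scheme."""
    return proc // k

# e.g. 8 processors, K = 4 -> groups [[0, 1, 2, 3], [4, 5, 6, 7]];
# processor 5 belongs to group 1
```

Keeping decisions within a group trades some balancing quality for lower communication cost: profile messages and work movement cross only K-1 peers instead of the whole network.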

