
Customized Dynamic Load Balancing for a Network of Workstations

Taken from work done by: Mohammed Javeed Zaki, Wei Li, Srinivasan Parthasarathy
Computer Science Department, University of Rochester
June 1997
Presenter: Jacqueline Ewell


Static vs. Dynamic Load Balancing

Static Load Balancing
  - allows the programmer to delegate work before runtime
  - can accommodate heterogeneous processors and nonuniform loops
  - avoids runtime scheduling overheads
  - needs to know all information about the workstations ahead of time

Dynamic Load Balancing
  - ability to delegate work based on the runtime performance of a Network of Workstations (NOW)
  - transient external loads by multiple users
  - heterogeneous processors, memory availability, network bandwidths and contentions, and software
  - these factors make dynamic load balancing the more logical choice


Dynamic Load Balancing Strategies

Task Queue Model
  - a centralized queue of work (work queue)

Diffusion Model
  - all work is delegated to the processors
  - when an imbalance is detected between a processor and its neighbor, work is moved

Predict future performance from past performance
  - exchange of performance information
  - four schemes: Global Centralized, Global Distributed, Local Centralized, Local Distributed


Dynamic Load Balancing Strategies

  Global:       all load balancing is done on a global scale
  Local:        processors are divided into groups of size K, and load balancing decisions are made within a group
  Centralized:  the load balancer is located on one processor
  Distributed:  the load balancer is replicated on every processor


Dynamic Load Balancing Strategies

[Figure: placement of the load balancer in the four schemes (Global Centralized, Global Distributed, Local Centralized, Local Distributed) over processors P1 ... Pn and groups G1, G2]


Strategy Tradeoffs

Global vs. Local
  - Global information is available at synchronization time, therefore work distribution is optimal
  - Global scheme: synchronization and communication cost is much higher
  - Local scheme: groups may sit idle while other groups are overloaded

Centralized vs. Distributed
  - Centralized scheme: one load balancer will hurt scalability
  - Centralized scheme: distribution calculations are on one processor, therefore done sequentially
  - Distributed: all-to-all exchange of performance profiles, therefore network contention could be a problem


DLB Modeling & Decision Process

Modeling Parameters
  Processor Parameters:    number of processors, normalized processor speed, number of neighbors
  Program Parameters:      data size, number of loop iterations, work per iteration, number of bytes to be communicated per iteration, time per iteration
  Network Parameters:      network latency, bandwidth, network topology
  External Load Modeling:  maximum load, duration of persistence of load


DLB Modeling & Decision Process (cont.)

  Total DLB Cost = Synchronization Cost
                 + Cost of Calculating New Distribution
                 + Cost of Sending Instructions (*)
                 + Cost of Data Movement

  (*) only applies to centralized schemes


DLB Modeling & Decision Process (cont.)

Synchronization Cost
  GCDLB:  one-to-all (P) + all-to-one (P)
  GDDLB:  one-to-all (P) + all-to-all (P^2)
  LCDLB:  one-to-all (K) + all-to-one (K)
  LDDLB:  one-to-all (K) + all-to-all (K^2)

Cost of Calculating New Distribution
  - usually very small

Cost of Sending Instructions
  = Number of Messages Sent × Latency

Cost of Data Movement
  = Number of Messages × Latency
    + (Number of Iterations Moved × Number of Bytes that need to be communicated per iteration) / Bandwidth


DLB Modeling & Decision Process (cont.)

  - Initially, work is divided equally among all processors
  - Synchronization
      - 1/P-th of the work has been done
      - load function is known
      - average effective speed is known
  - Performance Metric: number of iterations per second
  - Load function and other parameters are plugged into the model to select the best strategy
  - Work Movement: if the amount of work to be moved is above a threshold
  - Profitability Analysis: move work only if there is a 10% improvement in execution time


Experiment

  - Global schemes are best: the computation/communication ratio is high
  - More processors mean more synchronization cost, which favors the Local scheme; Global is still better at 16 processors
  - Centralized: the master's sequential redistribution, instruction sends, and delay factors add sufficient overheads to the Centralized scheme


Experiment

  - Amount of work per iteration is small
  - Local Distributed is favored
  - As data size increases, Global Distributed does better
  - On 16 processors, Local Distributed is the best
      - Local is better than Global since the computation/communication ratio is small
      - Distributed is better than Centralized


Modeling Results

[Figure: modeling results (not recoverable from the text preview)]


Conclusions

  - Different schemes are best for different applications
  - Customized dynamic load balancing is essential when transient external loads are introduced
  - Given the model, it is possible to select a good scheduling scheme


Future Work

  - Other dynamic load balancing schemes, not lying on the extremes, need to be incorporated into the model
      - instead of Local Centralized, have one master per group
  - Local schemes: work should be exchanged between different groups
  - Dynamic group memberships
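
Illustrative sketch (not from the original slides): the cost model from the "DLB Modeling & Decision Process" slides can be written out as a few small Python functions. All function, class, and parameter names here are hypothetical. The sketch also assumes that each synchronization message costs one network latency, which the slides do not state (they give only the message counts P, K, P^2, and K^2), and it reads the "10% improvement" rule as "predicted time after rebalancing, plus the DLB cost, must be at most 90% of the predicted time without rebalancing".

```python
from dataclasses import dataclass

SCHEMES = ("GCDLB", "GDDLB", "LCDLB", "LDDLB")


@dataclass
class Network:
    latency: float    # seconds per message
    bandwidth: float  # bytes per second


def sync_messages(scheme: str, p: int, k: int) -> int:
    """Message count at a synchronization point, following the slide's table."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    n = p if scheme.startswith("G") else k      # global uses P, local uses group size K
    one_to_all = n
    if scheme[1] == "C":                        # centralized: all-to-one
        return one_to_all + n
    return one_to_all + n * n                   # distributed: all-to-all


def data_movement_cost(messages: int, iterations_moved: int,
                       bytes_per_iteration: int, net: Network) -> float:
    """Messages * latency + (iterations moved * bytes per iteration) / bandwidth."""
    return messages * net.latency + iterations_moved * bytes_per_iteration / net.bandwidth


def total_dlb_cost(scheme: str, p: int, k: int, net: Network, calc_cost: float,
                   instr_messages: int, move_messages: int,
                   iterations_moved: int, bytes_per_iteration: int) -> float:
    """Sum of the four cost terms listed on the 'Total DLB Cost' slide."""
    # Assumption: every synchronization message costs one latency.
    sync = sync_messages(scheme, p, k) * net.latency
    # Instruction sends are charged only for centralized schemes.
    instr = instr_messages * net.latency if scheme[1] == "C" else 0.0
    move = data_movement_cost(move_messages, iterations_moved, bytes_per_iteration, net)
    return sync + calc_cost + instr + move


def is_profitable(time_without: float, time_with: float, dlb_cost: float,
                  min_gain: float = 0.10) -> bool:
    """Assumed reading of the 10% rule: rebalance only if it saves at least min_gain."""
    return time_with + dlb_cost <= (1.0 - min_gain) * time_without
```

Under these assumptions, a strategy could be chosen by evaluating total_dlb_cost for each of the four scheme names and keeping the one with the lowest predicted total execution time, then applying is_profitable before actually moving work.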



RIT EECC 756 - Customized Dynamic Load Balancing for a Network of Workstations
