DOC PREVIEW
Berkeley COMPSCI C267 - Load Balancing

This preview shows page 1-2-3-25-26-27-28-50-51-52 out of 52 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 52 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 52 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 52 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 52 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 52 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 52 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 52 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 52 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 52 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 52 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 52 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

04/18/2007 CS267 Lecture 24 1CS 267: Applications of Parallel ComputersLoad BalancingKathy Yelickwww.cs.berkeley.edu/~yelick/cs267_sp0704/18/2007 CS267 Lecture 24 2Outline• Motivation for Load Balancing• Recall graph partitioning as load balancing technique• Overview of load balancing problems, as determined by• Task costs• Task dependencies• Locality needs• Spectrum of solutions• Static - all information available before starting• Semi-Static - some info before starting• Dynamic - little or no info before starting• Survey of solutions• How each one works• Theoretical bounds, if any• When to use it04/18/2007 CS267 Lecture 24 3Load Imbalance in Parallel ApplicationsThe primary sources of inefficiency in parallel codes:• Poor single processor performance • Typically in the memory system• Too much parallelism overhead• Thread creation, synchronization, communication• Load imbalance• Different amounts of work across processors• Computation and communication• Different speeds (or available resources) for the processors• Possibly due to load on the machine• How to recognizing load imbalance• Time spent at synchronization is high and is uneven across processors, but not always so simple …04/18/2007 CS267 Lecture 24 4Measuring Load Imbalance• Challenges:• Can be hard to separate from high synch overhead• Especially subtle if not bulk-synchronous• “Spin locks” can make synchronization look like useful work• Note that imbalance may change over phases• Insufficient parallelism always leads to load imbalance• Tools like TAU can help (acts.nersc.gov)04/18/2007 CS267 Lecture 24 5Review of Graph Partitioning• Partition G(N,E) so that• N = N1U … U Np, with each |Ni| ~ |N|/p• As few edges connecting different Niand Nkas possible• If N = {tasks}, each unit cost, edge e=(i,j) means task i has tocommunicate with task j, then partitioning means• balancing the load, i.e. each |Ni| ~ |N|/p• minimizing communication volume• Optimal graph partitioning is NP complete, so we use heuristics (see earlier lectures)• Spectral• Kernighan-Lin• Multilevel• Speed of partitioner trades off with quality of partition• Better load balance costs more; may or may not be worth it• Need to know tasks, communication pattern before starting• What if you don’t?04/18/2007 CS267 Lecture 24 6Load Balancing OverviewLoad balancing differs with properties of the tasks (chunks of work):• Tasks costs• Do all tasks have equal costs?• If not, when are the costs known?• Before starting, when task created, or only when task ends• Task dependencies• Can all tasks be run in any order (including parallel)?• If not, when are the dependencies known?• Before starting, when task created, or only when task ends• Locality• Is it important for some tasks to be scheduled on the same processor (or nearby) to reduce communication cost?• When is the information about communication known?04/18/2007 CS267 Lecture 24 7Task Cost Spectrum, search04/18/2007 CS267 Lecture 24 8Task Dependency Spectrum04/18/2007 CS267 Lecture 24 9Task Locality Spectrum (Communication)04/18/2007 CS267 Lecture 24 10Spectrum of SolutionsA key question is when certain information about the load balancing problem is known.Leads to a spectrum of solutions:• Static scheduling. All information is available to scheduling algorithm, which runs before any real computation starts.• Off-line algorithms, eg graph partitioning• Semi-static scheduling. Information may be known at program startup, or the beginning of each timestep, or at other well-defined points. Offline algorithms may be used even though the problem is dynamic. • eg Kernighan-Lin• Dynamic scheduling. Information is not known until mid-execution.• On-line algorithms04/18/2007 CS267 Lecture 24 11Dynamic Load Balancing• Motivation for dynamic load balancing• Search algorithms as driving example• Centralized load balancing• Overview• Special case for schedule independent loop iterations• Distributed load balancing• Overview• Engineering• Theoretical results• Example scheduling problem: mixed parallelism• Demonstrate use of coarse performance models04/18/2007 CS267 Lecture 24 12Search• Search problems are often:• Computationally expensive• Have very different parallelization strategies than physical simulations.• Require dynamic load balancing• Examples:• Optimal layout of VLSI chips• Robot motion planning• Chess and other games (N-queens)• Speech processing• Constructing phylogeny tree from set of genes04/18/2007 CS267 Lecture 24 13Example Problem: Tree Search• In Tree Search the tree unfolds dynamically• May be a graph if there are common sub-problems along different paths• Graphs unlike meshes which are precomputed and have no ordering constraintsTerminal node (non-goal)Non-terminal nodeTerminal node (goal)04/18/2007 CS267 Lecture 24 14Sequential Search Algorithms• Depth-first search (DFS)• Simple backtracking• Search to bottom, backing up to last choice if necessary• Depth-first branch-and-bound• Keep track of best solution so far (“bound”)• Cut off sub-trees that are guaranteed to be worse than bound• Iterative Deepening• Choose a bound on search depth, d and use DFS up to depth d• If no solution is found, increase d and start again• Iterative deepening A* uses a lower bound estimate of cost-to-solution as the bound• Breadth-first search (BFS)• Search across a given level in the tree04/18/2007 CS267 Lecture 24 15Depth vs Breadth First Search• DFS with Explicit Stack• Put root into Stack• Stack is data structure where items added to and removed from the top only• While Stack not empty• If node on top of Stack satisfies goal of search, return result, else– Mark node on top of Stack as “searched”– If top of Stack has an unsearched child, put child on top of Stack, else remove top of Stack• BFS with Explicit Queue• Put root into Queue• Queue is data structure where items added to end, removed from front• While Queue not empty• If node at front of Queue satisfies goal of search, return result, else– Mark node at front of Queue as “searched”– If node at front of Queue has any unsearched children, put them all at end of Queue– Remove node at front from Queue04/18/2007 CS267 Lecture 24 16Parallel Search• Consider simple backtracking search• Try static load balancing: spawn each


View Full Document

Berkeley COMPSCI C267 - Load Balancing

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Load Balancing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Load Balancing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Load Balancing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?