DOC PREVIEW
Berkeley COMPSCI C267 - Load Balancing

This preview shows page 1-2-3-24-25-26-27-48-49-50 out of 50 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 50 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS 267: Applications of Parallel Computers Load BalancingOutlineLoad Imbalance in Parallel ApplicationsMeasuring Load ImbalanceReview of Graph PartitioningLoad Balancing OverviewTask Cost SpectrumTask Dependency SpectrumTask Locality Spectrum (Communication)Spectrum of SolutionsDynamic Load BalancingSearchExample Problem: Tree SearchSequential Search AlgorithmsDepth vs Breadth First SearchParallel SearchCentralized SchedulingCentralized Task Queue: Scheduling LoopsVariations on Self-SchedulingVariation 1: Fixed Chunk SizeVariation 2: Guided Self-SchedulingVariation 3: TaperingVariation 4: Weighted FactoringWhen is Self-Scheduling a Good Idea?Distributed Task QueuesDistributed Dynamic Load BalancingHow to Select a Donor ProcessorHow to Split WorkTheoretical Results (1)Theoretical Results (2)Distributed Task Queue ReferencesDiffusion-Based Load BalancingDiffusion-based load balancingMixed ParallelismMixed Parallelism StrategiesWhich Strategy to UseSwitch Parallelism: A Special CaseList of CS267 ProjectsCS267 ProjectsMore ProjectsExtra SlidesSimple Performance Model for Data ParallelismSlide 43Modeling PerformanceActual Speed of Sign Function EigensolverValues of Sigma (Problem Size for Half Peak)Best-First SearchBranch and Bound Search RevisitedSimulated Efficiency of EigensolverSimulated efficiency of Sparse Cholesky04/18/2007 CS267 Lecture 241CS 267: Applications of Parallel ComputersLoad BalancingKathy Yelickwww.cs.berkeley.edu/~yelick/cs267_sp0704/18/2007 CS267 Lecture 242Outline•Motivation for Load Balancing•Recall graph partitioning as load balancing technique•Overview of load balancing problems, as determined by•Task costs•Task dependencies•Locality needs•Spectrum of solutions•Static - all information available before starting•Semi-Static - some info before starting•Dynamic - little or no info before starting•Survey of solutions•How each one works•Theoretical bounds, if any•When to use it04/18/2007 CS267 Lecture 243Load Imbalance in Parallel ApplicationsThe primary sources of inefficiency in parallel codes:•Poor single processor performance •Typically in the memory system•Too much parallelism overhead•Thread creation, synchronization, communication•Load imbalance•Different amounts of work across processors•Computation and communication•Different speeds (or available resources) for the processors•Possibly due to load on the machine•How to recognizing load imbalance•Time spent at synchronization is high and is uneven across processors, but not always so simple …04/18/2007 CS267 Lecture 244Measuring Load Imbalance•Challenges:•Can be hard to separate from high synch overhead•Especially subtle if not bulk-synchronous•“Spin locks” can make synchronization look like useful work•Note that imbalance may change over phases•Insufficient parallelism always leads to load imbalance•Tools like TAU can help (acts.nersc.gov)04/18/2007 CS267 Lecture 245Review of Graph Partitioning•Partition G(N,E) so that•N = N1 U … U Np, with each |Ni| ~ |N|/p•As few edges connecting different Ni and Nk as possible•If N = {tasks}, each unit cost, edge e=(i,j) means task i has to communicate with task j, then partitioning means•balancing the load, i.e. each |Ni| ~ |N|/p•minimizing communication volume•Optimal graph partitioning is NP complete, so we use heuristics (see earlier lectures)•Spectral•Kernighan-Lin•Multilevel•Speed of partitioner trades off with quality of partition•Better load balance costs more; may or may not be worth it•Need to know tasks, communication pattern before starting•What if you don’t?04/18/2007 CS267 Lecture 246Load Balancing OverviewLoad balancing differs with properties of the tasks (chunks of work):•Tasks costs•Do all tasks have equal costs?•If not, when are the costs known?•Before starting, when task created, or only when task ends•Task dependencies•Can all tasks be run in any order (including parallel)?•If not, when are the dependencies known?•Before starting, when task created, or only when task ends•Locality•Is it important for some tasks to be scheduled on the same processor (or nearby) to reduce communication cost?•When is the information about communication known?04/18/2007 CS267 Lecture 247Task Cost Spectrum, search04/18/2007 CS267 Lecture 248Task Dependency Spectrum04/18/2007 CS267 Lecture 249Task Locality Spectrum (Communication)04/18/2007 CS267 Lecture 2410Spectrum of SolutionsA key question is when certain information about the load balancing problem is known.Leads to a spectrum of solutions:•Static scheduling. All information is available to scheduling algorithm, which runs before any real computation starts. •Off-line algorithms, eg graph partitioning•Semi-static scheduling. Information may be known at program startup, or the beginning of each timestep, or at other well-defined points. Offline algorithms may be used even though the problem is dynamic. •eg Kernighan-Lin•Dynamic scheduling. Information is not known until mid-execution. •On-line algorithms04/18/2007 CS267 Lecture 2411Dynamic Load Balancing•Motivation for dynamic load balancing•Search algorithms as driving example•Centralized load balancing•Overview•Special case for schedule independent loop iterations•Distributed load balancing•Overview•Engineering•Theoretical results•Example scheduling problem: mixed parallelism•Demonstrate use of coarse performance models04/18/2007 CS267 Lecture 2412Search•Search problems are often:•Computationally expensive•Have very different parallelization strategies than physical simulations.•Require dynamic load balancing•Examples:•Optimal layout of VLSI chips•Robot motion planning•Chess and other games (N-queens)•Speech processing•Constructing phylogeny tree from set of genes04/18/2007 CS267 Lecture 2413Example Problem: Tree Search•In Tree Search the tree unfolds dynamically•May be a graph if there are common sub-problems along different paths•Graphs unlike meshes which are precomputed and have no ordering constraintsTerminal node (non-goal)Non-terminal nodeTerminal node (goal)04/18/2007 CS267 Lecture 2414Sequential Search Algorithms•Depth-first search (DFS)•Simple backtracking•Search to bottom, backing up to last choice if necessary•Depth-first branch-and-bound•Keep track of best solution so far (“bound”)•Cut off sub-trees that are guaranteed to be worse than


View Full Document

Berkeley COMPSCI C267 - Load Balancing

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Load Balancing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Load Balancing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Load Balancing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?