DOC PREVIEW
Berkeley COMPSCI C267 - Load Balancing

This preview shows page 1-2-3-4-29-30-31-32-33-60-61-62-63 out of 63 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 63 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS 267: Applications of Parallel Computers Load BalancingLoad Imbalance in Parallel ApplicationsMeasuring Load ImbalanceLoad Balancing OverviewTask Cost SpectrumTask Dependency SpectrumTask Locality Spectrum (Communication)Spectrum of SolutionsDynamic Load BalancingSearchExample Problem: Tree SearchSequential Search AlgorithmsParallel SearchCentralized SchedulingCentralized Task Queue: Scheduling LoopsVariations on Self-SchedulingVariation 1: Fixed Chunk SizeVariation 2: Guided Self-SchedulingVariation 3: TaperingVariation 4: Weighted FactoringWhen is Self-Scheduling a Good Idea?Distributed Task QueuesDistributed Dynamic Load BalancingHow to Select a Donor ProcessorHow to Split WorkTheoretical Results (1)Theoretical Results (2)Distributed Task Queue ReferencesDiffusion-Based Load BalancingDiffusion-based load balancingMixed Parallelism: DigressionMixed Parallelism StrategiesWhich Strategy to UseSwitch Parallelism: A Special CaseSimple Performance Model for Data ParallelismPowerPoint PresentationValues of Sigma (Problem Size for Half Peak)Modeling PerformanceSimulated Efficiency of EigensolverSimulated efficiency of Sparse CholeskyActual Speed of Sign Function EigensolverGraph Partitioning for Load BalancingDefinition of Graph PartitioningApplicationsSparse Matrix Vector MultiplicationCost of Graph PartitioningFirst Heuristic: Repeated Graph BisectionOverview of Bisection HeuristicsEdge Separators vs. Vertex SeparatorsNodal Coordinates: How Well Can We Do?Nodal Coordinates: Inertial PartitioningInertial Partitioning: Choosing LInertial Partitioning: choosing L (continued)Nodal Coordinates: Random SpheresRandom Spheres: Well Shaped GraphsGeneralizing Lipton/Tarjan to Higher DimensionsStereographic ProjectionChoosing a Random SphereRandom Sphere Algorithm (Gilbert)Nodal Coordinates: SummaryExtraBest-First SearchBranch and Bound Search Revisited01/14/19 CS267, Yelick 1CS 267: Applications of Parallel ComputersLoad BalancingKathy Yelickhttp://www-inst.eecs.berkeley.edu/~cs26701/14/19 CS267, Yelick 2Load Imbalance in Parallel ApplicationsThe primary sources of inefficiency in parallel codes:•Poor single processor performance •Typically in the memory system•Too much parallelism overhead•Thread creation, synchronization, communication•Load imbalance•Different amounts of work across processors•Computation and communication•Different speeds (or available resources) for the processors•Possibly due to load on the machine•Recognizing load imbalance•Time spent at synchronization is high and is uneven across processors: don’t take min/max/med/mean of barrier times01/14/19 CS267, Yelick 3Measuring Load Imbalance•Challenges:•Can be hard to separate from high synch overhead•Especially subtle if not bulk-synchronous•“Spin locks” can make synchronization look like useful work•Note that imbalance may change over phases•Insufficient parallel always leads to load imbalance•Tools like TAU (shown) can help01/14/19 CS267, Yelick 4Load Balancing OverviewLoad balancing differs with properties of the tasks (chunks of work):•Tasks costs•Do all tasks have equal costs?•If not, when are the costs known?•Before starting, when task created, or only when task ends•Task dependencies•Can all tasks be run in any order (including parallel)?•If not, when are the dependencies known?•Before starting, when task created, or only when task ends•Locality•Is it important for some tasks to be scheduled on the same processor (or nearby) to reduce communication cost?•When is the information about communication known?01/14/19 CS267, Yelick 5Task Cost Spectrum01/14/19 CS267, Yelick 6Task Dependency Spectrum01/14/19 CS267, Yelick 7Task Locality Spectrum (Communication)01/14/19 CS267, Yelick 8Spectrum of SolutionsOne of the key questions is when certain information about the load balancing problem is knownLeads to a spectrum of solutions:•Static scheduling. All information is available to scheduling algorithm, which runs before any real computation starts. (offline algorithms)•Semi-static scheduling. Information may be known at program startup, or the beginning of each timestep, or at other well-defined points. Offline algorithms may be used even though the problem is dynamic.•Dynamic scheduling. Information is not known until mid-execution. (online algorithms)01/14/19 CS267, Yelick 9Dynamic Load Balancing•Motivation for dynamic load balancing•Search algorithms•Centralized load balancing•Overview•Special case for schedule independent loop iterations•Distributed load balancing•Overview•Engineering•Theoretical results•Load balancing in general•Example scheduling problem: mixed parallelism•Demonstrate use of coarse performance models01/14/19 CS267, Yelick 10Search•Search problems are often:•Computationally expensive•Have very different parallelization strategies than physical simulations.•Require dynamic load balancing•Examples:•Optimal layout of VLSI chips•Robot motion planning•Chess and other games (N-queens)•Speech processing•Eigenvalue search•Constructing phylogeny tree from set of genes01/14/19 CS267, Yelick 11Example Problem: Tree Search•In Tree Search the tree unfolds dynamically•May be a graph if there are common sub-problems along different paths•Graphs unlike meshes which are precomputed and have no ordering constraintsTerminal node (non-goal)Non-terminal nodeTerminal node (goal)01/14/19 CS267, Yelick 12Sequential Search Algorithms•Depth-first search (DFS)•Simple backtracking•Search to bottom, backing up to last choice if necessary•Depth-first branch-and-bound•Keep track of best solution so far (“bound”)•Cut off sub-trees that are guaranteed to be worse than bound•Iterative Deepening•Choose a bound on search depth, d and use DFS up to depth d•If no solution is found, increase d and start again•Iterative deepening A* uses a lower bound estimate of cost-to-solution as the bound•Breadth-first search (BFS)•Search across a given level in the tree01/14/19 CS267, Yelick 13Parallel Search•Consider simple backtracking search•Use static load balancing: spawn each new task on an idle processor, until all have a subtreeLoad balance on 2 processors Load balance on 4 processors01/14/19 CS267, Yelick 14Centralized Scheduling•Keep a queue of task waiting to be done•May be done by manager task•Or a shared data structure protected by locksTask


View Full Document

Berkeley COMPSCI C267 - Load Balancing

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Load Balancing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Load Balancing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Load Balancing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?