CMU CS 15740 - Lecture

Parallel Programming
Todd C. Mowry
CS 740
October 16 & 18, 2000

Contents
•Motivating Problems
•Simulating Ocean Currents
•Simulating Galaxy Evolution
•Rendering Scenes by Ray Tracing
•Parallel Programming Task
•Steps in Creating a Parallel Program
•Partitioning for Performance
•Load Balance and Synch Wait Time
•Deciding How to Manage Concurrency
•Dynamic Assignment
•Dynamic Tasking with Task Queues
•Determining Task Granularity
•Reducing Serialization
•Reducing Inherent Communication
•Domain Decomposition
•Reducing Extra Work
•Summary of Tradeoffs
•Impact of Programming Model
•Shared-Memory Implementation
•Message-Passing Implementations
•Case Studies
•Case 1: Simulating Ocean Currents
•Time Step in Ocean Simulation
•Partitioning
•Two Static Partitioning Schemes
•Impact of Memory Locality
•Impact of Line Size & Data Distribution
•Case 2: Simulating Galaxy Evolution
•Barnes-Hut
•Application Structure
•Partitioning
•Load Balancing
•A Partitioning Approach: ORB
•Another Approach: Costzones
•Barnes-Hut Performance

Topics
•Motivating Examples
•Parallel Programming for High Performance
•Impact of the Programming Model
•Case Studies
–Ocean simulation
–Barnes-Hut N-body simulation
CS 740 F'00 – 2 –

Motivating Problems
Simulating Ocean Currents
•Regular structure, scientific computing
Simulating the Evolution of Galaxies
•Irregular structure, scientific computing
Rendering Scenes by Ray Tracing
•Irregular structure, computer graphics
•Not discussed here (read in book)

Simulating Ocean Currents
•Model as two-dimensional grids
•Discretize in space and time
–finer spatial and temporal resolution => greater accuracy
•Many different computations per time step
–set up and solve equations
•Concurrency across and within grid computations
[Figure: (a) cross sections; (b) spatial discretization of a cross section]

Simulating Galaxy Evolution
•Simulate the interactions of many stars evolving over time
•Computing forces is expensive
•O(n^2) brute force approach
•Hierarchical methods take advantage of the force law:
  F = G * m1 * m2 / r^2
•Many time-steps, plenty of concurrency across stars within one
[Figure: a star on which forces are being computed; a star too close to approximate; a small group far enough away to approximate by its center of mass; a large group far enough away to approximate]

Rendering Scenes by Ray Tracing
•Shoot rays into scene through pixels in image plane
•Follow their paths
–they bounce around as they strike objects
–they generate new rays: ray tree per input ray
•Result is color and opacity for that pixel
•Parallelism across rays
All case studies have abundant concurrency

Parallel Programming Task
Break up computation into tasks
•assign tasks to processors
Break up data into chunks
•assign chunks to memories
Introduce synchronization for:
•mutual exclusion
•event ordering

Steps in Creating a Parallel Program
4 steps: Decomposition, Assignment, Orchestration, Mapping
•Done by programmer or system software (compiler, runtime, ...)
•Issues are the same, so assume programmer does it all explicitly
[Figure: sequential computation is decomposed into tasks, tasks are assigned to processes P0..P3, processes are orchestrated into a parallel program, and processes are mapped to processors p0..p3]

Partitioning for Performance
Balancing the workload and reducing wait time at synch points
Reducing inherent communication
Reducing extra work
Even these algorithmic issues trade off:
•Minimize comm. => run on 1 processor => extreme load imbalance
•Maximize load balance => random assignment of tiny tasks => no control over communication
•Good partition may imply extra work to compute or manage it
Goal is to compromise
•Fortunately, often not difficult in practice

Load Balance and Synch Wait Time
Limit on speedup:
  Speedup_problem(p) < Sequential Work / Max Work on any Processor
•Work includes data access and other costs
•Not just equal work, but must be busy at same time
Four parts to load balance and reducing synch wait time:
1. Identify enough concurrency
2. Decide how to manage it
3. Determine the granularity at which to exploit it
4. Reduce serialization and cost of synchronization

Deciding How to Manage Concurrency
Static versus Dynamic techniques
Static:
•Algorithmic assignment based on input; won't change
•Low runtime overhead
•Computation must be predictable
•Preferable when applicable (except in multiprogrammed/heterogeneous environment)
Dynamic:
•Adapt at runtime to balance load
•Can increase communication and reduce locality
•Can increase task management overheads

Dynamic Assignment
Profile-based (semi-static):
•Profile work distribution at runtime, and repartition dynamically
•Applicable in many computations, e.g. Barnes-Hut, some graphics
Dynamic Tasking:
•Deal with unpredictability in program or environment (e.g. Raytrace)
–computation, communication, and memory system interactions
–multiprogramming and heterogeneity
–used by runtime systems and OS too
•Pool of tasks; take and add tasks until done
•E.g. "self-scheduling" of loop iterations (shared loop counter)

Dynamic Tasking with Task Queues
Centralized versus distributed queues
Task stealing with distributed queues
•Can compromise comm and locality, and increase synchronization
•Whom to steal from, how many tasks to steal, ...
•Termination detection
•Maximum imbalance related to size of task
[Figure: (a) a centralized task queue Q — all processes insert tasks and all remove tasks; (b) distributed task queues Q0..Q3, one per process — P0..P3 insert into and remove from their own queues, and others may steal]

Determining Task Granularity
Task granularity: amount of work associated with a task
General rule:
•Coarse-grained => often less load balance
•Fine-grained => more overhead; often more communication and contention
Communication and contention actually affected by assignment, not size
•Overhead by size itself too, particularly with task queues

Reducing Serialization
Careful about assignment and orchestration (including scheduling)
Event synchronization
•Reduce use of conservative synchronization
–e.g. point-to-point instead of barriers, or granularity of pt-to-pt
•But fine-grained synch more difficult to program, more synch ops
Mutual exclusion
•Separate locks for separate data
–e.g. locking records in a database: lock per process, record, or field
–lock
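The ocean case study above discretizes the currents onto a 2-D grid and exploits concurrency within each grid computation. As a rough illustration (not from the lecture — the solver, grid size, and the names `jacobi_step` and `strip_partition` are all invented for this sketch), the code below does one Jacobi-style sweep, where every interior point becomes the average of its four neighbors; since each sweep reads only the old grid, all points are independent, and a static assignment can hand contiguous strips of rows to each process:

```python
def jacobi_step(grid):
    """One time step: each interior point becomes the average of its
    4 neighbors, reading only the old grid (so points are independent)."""
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])
    return new

def strip_partition(n, nprocs):
    """Static assignment: split the n-2 interior rows into contiguous
    strips, one strip per process."""
    rows = list(range(1, n - 1))
    chunk = (len(rows) + nprocs - 1) // nprocs
    return [rows[p * chunk:(p + 1) * chunk] for p in range(nprocs)]
```

For example, `strip_partition(10, 4)` assigns interior rows `[[1, 2], [3, 4], [5, 6], [7, 8]]` to four processes; each process would run the inner loops only over its own rows, communicating one boundary row with each neighbor per step.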
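The O(n^2) brute-force approach mentioned for the galaxy simulation applies the force law F = G·m1·m2 / r^2 to every pair of stars. The sketch below shows why it is O(n^2) and where the concurrency across stars lies (the outer-loop iterations are independent). It is a 2-D toy with G = 1 and no softening — real N-body codes differ in all of those respects, and `net_forces` is an illustrative name, not from the lecture:

```python
import math

G = 1.0  # gravitational constant, set to 1 purely for illustration

def net_forces(stars):
    """Net (fx, fy) on each star; stars are (mass, x, y) tuples.
    Every pair is visited, hence O(n^2) work.
    Outer-loop iterations are independent -> concurrency across stars."""
    n = len(stars)
    forces = [(0.0, 0.0)] * n
    for i in range(n):
        mi, xi, yi = stars[i]
        fx = fy = 0.0
        for j in range(n):
            if i == j:
                continue
            mj, xj, yj = stars[j]
            dx, dy = xj - xi, yj - yi
            r2 = dx * dx + dy * dy
            f = G * mi * mj / r2          # F = G m1 m2 / r^2
            r = math.sqrt(r2)
            fx += f * dx / r              # resolve along the pair direction
            fy += f * dy / r
        forces[i] = (fx, fy)
    return forces
```

Hierarchical methods such as Barnes-Hut cut this cost by replacing the inner loop's distant stars with a single center-of-mass interaction, exactly the "small group far enough away" case in the figure.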
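The "self-scheduling" of loop iterations mentioned under Dynamic Assignment can be sketched as follows: workers repeatedly fetch-and-increment a shared loop counter and run whichever iteration they claimed, so faster workers naturally pick up more iterations. This sketch uses Python threads with a lock standing in for a hardware fetch-and-add; the names `claim_next` and `NITERS` are illustrative, not from the lecture:

```python
import threading

NITERS = 100                 # total loop iterations to distribute

counter = 0                  # the shared loop counter
counter_lock = threading.Lock()
results = [0] * NITERS

def claim_next():
    """Atomically claim the next unclaimed iteration, or None when done."""
    global counter
    with counter_lock:       # stands in for an atomic fetch-and-add
        if counter >= NITERS:
            return None
        i = counter
        counter += 1
        return i

def worker():
    while True:
        i = claim_next()
        if i is None:        # counter exhausted -> this worker terminates
            break
        results[i] = i * i   # the "iteration body"

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```

Note the trade-off the slides point out: the shared counter is itself a serialization point and a source of contention, which is one reason distributed task queues are attractive at finer granularities.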
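The distributed-queue variant with task stealing can be sketched as a policy: each process pushes and pops tasks at the tail of its own deque, and a process with an empty deque steals from the head of a victim's deque. This is a single-threaded sketch of the policy only (no synchronization, and the round-robin victim choice is an arbitrary answer to the slide's "whom to steal from" question); `WorkerQueues` and `next_task` are invented names:

```python
import collections

class WorkerQueues:
    def __init__(self, nprocs):
        self.queues = [collections.deque() for _ in range(nprocs)]

    def push(self, p, task):
        self.queues[p].append(task)       # owner works at the tail

    def next_task(self, p):
        """Local pop if possible, otherwise try to steal; None means
        this probe found no work anywhere (termination detection would
        build on repeated failed probes)."""
        if self.queues[p]:
            return self.queues[p].pop()   # pop own tail: good locality
        n = len(self.queues)
        for v in range(n - 1):            # round-robin victim choice
            victim = (p + 1 + v) % n
            if self.queues[victim]:
                return self.queues[victim].popleft()  # steal from the head
        return None
```

Stealing from the head while the owner works at the tail is a common choice because the oldest tasks tend to be the largest and least cache-warm, which limits both the imbalance per steal and the locality lost.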
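The "separate locks for separate data" point at the end of the preview — e.g. a lock per database record rather than one lock for the whole database — can be sketched as below. Updates to different records then proceed in parallel, and only same-record updates serialize; the `Record`/`Database` names are illustrative, not from the lecture:

```python
import threading

class Record:
    def __init__(self, value=0):
        self.value = value
        self.lock = threading.Lock()   # per-record lock, not one global lock

class Database:
    def __init__(self, nrecords):
        self.records = [Record() for _ in range(nrecords)]

    def add(self, key, delta):
        rec = self.records[key]
        with rec.lock:                 # serializes only same-record updates
            rec.value += delta

# 64 concurrent updates spread over 8 records: updates to distinct
# records never wait on each other.
db = Database(8)
threads = [threading.Thread(target=db.add, args=(k % 8, 1)) for k in range(64)]
for t in threads: t.start()
for t in threads: t.join()
```

Finer granularities (a lock per field) reduce serialization further at the cost of more lock storage and more lock operations per update — the same trade-off the slides flag for fine-grained event synchronization.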

