DOC PREVIEW
CMU CS 15740 - Lecture

This preview shows page 1-2-20-21 out of 21 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Page 1Parallel Programming:OverviewTodd C. MowryCS 740September 20, 2007CS 740 F’07–2–OutlineMotivating Problems (application case studies)Steps in creating a parallel programWhat a simple parallel program looks like• In the three major programming models• What primitives must a system support?Later: Performance issues and architectural interactionsPage 2CS 740 F’07–3–Motivating ProblemsSimulating Ocean Currents• Regular structure, scientific computingSimulating the Evolution of Galaxies• Irregular structure, scientific computingRendering Scenes by Ray Tracing• Irregular structure, computer graphicsData Mining• Irregular structure, information processing• Not discussed here (read in book)CS 740 F’07–4–Simulating Ocean Currents• Model as two-dimensional grids• Discretize in space and time– finer spatial and temporal resolution => greater accuracy• Many different computations per time step– set up and solve equations• Concurrency across and within grid computations(a) Cross sections (b) Spatial discretization of a cross sectionPage 3CS 740 F’07–5–Simulating Galaxy Evolution• Simulate the interactions of many stars evolving over time• Computing forces is expensive• O(n2) brute force approach• Hierarchical Methods take advantage of force law: Gm1m2r2•Many time-steps, plenty of concurrency across stars within oneStar on which forcesare being computedStar too close toapproximateSmall group far enough away toapproximate to center of massLarge group farenough away toapproximate CS 740 F’07–6–Rendering Scenes by Ray Tracing• Shoot rays into scene through pixels in image plane• Follow their paths– they bounce around as they strike objects– they generate new rays: ray tree per input ray• Result is color and opacity for that pixel• Parallelism across raysAll case studies have abundant concurrencyPage 4CS 740 F’07–7–Creating a Parallel ProgramAssumption: Sequential algorithm is given• Sometimes need very different algorithm, but beyond scopePieces of the job:• Identify work that can be done in parallel• Partition work and perhaps data among processes• Manage data access, communication and synchronization•Note: work includes computation, data access and I/OMain goal: Speedup (plus low prog. effort and resource needs)Speedup (p) = For a fixed problem:Speedup (p) = Performance(p)Performance(1)Time(1)Time(p)CS 740 F’07–8–Steps in Creating a Parallel Program4 steps: Decomposition, Assignment, Orchestration, Mapping• Done by programmer or system software (compiler, runtime, ...)• Issues are the same, so assume programmer does it all explicitlyP0TasksProcesses ProcessorsP1P2P3p0p1p2p3p0p1p2p3PartitioningSequentialcomputationParallelprogramAssignmentDecompositionMappingOrchestrationPage 5CS 740 F’07–9–Some Important ConceptsTask:• Arbitrary piece of undecomposed work in parallel computation• Executed sequentially; concurrency is only across tasks• E.g. a particle/cell in Barnes-Hut, a ray or ray group in Raytrace• Fine-grained versus coarse-grained tasksProcess (thread):• Abstract entity that performs the tasks assigned to processes• Processes communicate and synchronize to perform their tasksProcessor:• Physical engine on which process executes• Processes virtualize machine to programmer– first write program in terms of processes, then map to processorsCS 740 F’07–10–DecompositionBreak up computation into tasks to be divided among processes• Tasks may become available dynamically• No. of available tasks may vary with timei.e. identify concurrency and decide level at which to exploit itGoal: Enough tasks to keep processes busy, but not too many• # of tasks available at a time is upper bound on achievable speedupPage 6CS 740 F’07–11–Limited Concurrency: Amdahl’s Law• Most fundamental limitation on parallel speedup• If fraction sof seq execution is inherently serial, speedup <= 1/s• Example: 2-phase calculation– sweep over n-by-ngrid and do some independent computation– sweep again and add each value to global sum• Time for first phase = n2/p• Second phase serialized at global variable, so time = n2• Speedup <= or at most 2• Trick: divide second phase into two– accumulate into private sum during sweep– add per-process private sum into global sum• Parallel time is n2/p+ n2/p+ p, and speedup at best 2n2n2p+ n2p2n22n2 + p2CS 740 F’07–12–Pictorial Depiction1p1p1n2/pn2pwork done concurrentlyn2n2Timen2/p n2/p(c)(b)(a)Page 7CS 740 F’07–13–Concurrency Profiles• Area under curve is total work done, or time with 1 processor• Horizontal extent is lower bound on time (infinite processors)• Speedup is the ratio: , base case: • Amdahl’s law applies to any overhead, not just limited concurrency•Cannot usually divide into serial and parallel partConcurrency15021924728631334338041544448350452656458963366270273302004006008001,0001,2001,400Clock cycle numberfkkfkkp∑k=1∞∑k=1∞1s + 1-spCS 740 F’07–14–Steps in Creating a Parallel Program4 steps: Decomposition, Assignment, Orchestration, Mapping• Done by programmer or system software (compiler, runtime, ...)• Issues are the same, so assume programmer does it all explicitlyP0TasksProcesses ProcessorsP1P2P3p0p1p2p3p0p1p2p3PartitioningSequentialcomputationParallelprogramAssignmentDecompositionMappingOrchestrationPage 8CS 740 F’07–15–AssignmentSpecifying mechanism to divide work up among processes• E.g. which process computes forces on which stars, or which rays• Together with decomposition, also calledpartitioning• Balance workload, reduce communication and management costStructured approaches usually work well• Code inspection (parallel loops) or understanding of application• Well-known heuristics•Staticversus dynamicassignmentAs programmers, we worry about partitioning first•Usuallyindependent of architecture or prog model• But cost and complexity of using primitives may affect decisionsAs architects, we assume program does reasonable job of itCS 740 F’07–16–Orchestration• Naming data• Structuring communication• Synchronization • Organizing data structures and scheduling tasks temporallyGoals• Reduce cost of communication and synch. as seen by processors• Preserve locality of data reference (incl. data structure organization)• Schedule tasks to satisfy dependences early• Reduce overhead


View Full Document

CMU CS 15740 - Lecture

Documents in this Course
leecture

leecture

17 pages

Lecture

Lecture

9 pages

Lecture

Lecture

36 pages

Lecture

Lecture

9 pages

Lecture

Lecture

13 pages

lecture

lecture

25 pages

lect17

lect17

7 pages

Lecture

Lecture

65 pages

Lecture

Lecture

28 pages

lect07

lect07

24 pages

lect07

lect07

12 pages

lect03

lect03

3 pages

lecture

lecture

11 pages

lecture

lecture

20 pages

lecture

lecture

11 pages

Lecture

Lecture

9 pages

Lecture

Lecture

10 pages

Lecture

Lecture

22 pages

Lecture

Lecture

28 pages

Lecture

Lecture

18 pages

lecture

lecture

63 pages

lecture

lecture

13 pages

Lecture

Lecture

36 pages

Lecture

Lecture

18 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

lecture

lecture

34 pages

lecture

lecture

47 pages

lecture

lecture

7 pages

Lecture

Lecture

18 pages

Lecture

Lecture

7 pages

Lecture

Lecture

10 pages

Lecture

Lecture

39 pages

Lecture

Lecture

11 pages

lect04

lect04

40 pages

Load more
Download Lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?