RIT EECC 756 - Steps in Creating a Parallel Program

Contents:
Steps in Creating a Parallel Program
Parallel Programming for Performance
Successive Refinement of Parallel Program Performance
Partitioning for Performance
Load Balancing and Synch Wait Time Reduction
Identifying Concurrency: Decomposition
Identifying Concurrency (continued)
Levels of Parallelism in VLSI Routing
Managing Concurrency: Assignment
Dynamic Assignment/Mapping
Dynamic Tasking with Task Queues
Impact of Dynamic Assignment
Assignment: Determining Task Granularity
Reducing Serialization
Implications of Load Balancing
Reducing Inherent Communication
Domain Decomposition
Domain Decomposition (continued)
Finding a Domain Decomposition
Implications of Communication-to-Computation Ratio
Reducing Extra Work (Overheads)
Summary of Parallel Algorithms Analysis
Limitations of Parallel Algorithm Analysis
Generic Multiprocessor Architecture
Extended Memory-Hierarchy View of Multiprocessors
Extended Hierarchy
Artifactual Communication in Extended Hierarchy
Communication and Replication
Working Set Perspective
Orchestration for Performance
Reducing Artifactual Communication
Exploiting Temporal Locality
Exploiting Spatial Locality
Spatial Locality Example
Tradeoffs with Inherent Communication
Example Performance Impact
Structuring Communication
Reducing Overhead
Reducing Network Delay
Reducing Contention
Types of Contention
Overlapping Communication
Summary of Tradeoffs
Relationship Between Perspectives
Components of Execution Time From Processor Perspective
Summary

EECC756 - Shaaban, lec # 5, Spring 2002, 3-28-2002

Steps in Creating a Parallel Program
• 4 steps: Decomposition, Assignment, Orchestration, Mapping.
• Performance goal of the steps: minimize the resulting execution time by:
  – Balancing computations across processors (every processor does the same amount of work).
  – Minimizing communication cost and other overheads associated with each step.

[Figure: a sequential computation is decomposed into tasks, the tasks are assigned to processes P0-P3 (decomposition + assignment = partitioning), the processes are orchestrated into a parallel program, and the program is mapped onto processors p0-p3.]

Parallel Programming for Performance
A process of successive refinement of the steps:
• Partitioning for performance:
  – Load balancing and synch wait time reduction.
  – Identifying & managing concurrency:
    • Static vs. dynamic assignment.
    • Determining optimal task granularity.
    • Reducing serialization.
  – Reducing inherent communication:
    • Minimizing the communication-to-computation ratio.
  – Efficient domain decomposition.
  – Reducing additional overheads.
• Orchestration/mapping for performance:
  – Extended memory-hierarchy view of multiprocessors:
    • Exploiting spatial locality / reducing artifactual communication.
    • Structuring communication.
    • Reducing contention.
    • Overlapping communication.

Successive Refinement of Parallel Program Performance
Partitioning is often independent of architecture, and may be done first:
– View the machine as a collection of communicating processors:
  • Balancing the workload across processes/processors.
  • Reducing the amount of inherent communication.
  • Reducing extra work needed to find a good assignment.
– The three issues above are conflicting.
Then deal with interactions with the architecture (orchestration, mapping):
– View the machine as an extended memory hierarchy:
  • Extra communication arises from architectural interactions.
  • The cost of communication depends on how it is structured.
– This may inspire changes in the partitioning.

Partitioning for Performance
• Balancing the workload across processes:
  – Reducing wait time at synch points.
• Reducing interprocess inherent communication.
• Reducing extra work needed to find a good assignment.
These algorithmic issues have extreme trade-offs:
– Minimize communication => run on 1 processor => extreme load imbalance.
– Maximize load balance => random assignment of tiny tasks => no control over communication.
– A good partition may imply extra work to compute or manage it.
• The goal is to compromise between the above extremes.

Load Balancing and Synch Wait Time Reduction
Limit on speedup:

  Speedup_problem(p) ≤ Sequential Work / Max Work on any Processor

– Work includes data access and other costs.
– Not just equal work: processors must be busy at the same time to minimize synch wait time.
Four parts to load balancing and reducing synch wait time:
1. Identify enough concurrency.
2. Decide how to manage it.
3. Determine the granularity at which to exploit it.
4. Reduce serialization and the cost of synchronization.

Identifying Concurrency: Decomposition
• Concurrency may be found by:
  – Examining the loop structure of the sequential algorithm.
  – Examining fundamental data dependences.
  – Exploiting understanding of the problem to devise algorithms with more concurrency.
• Data parallelism versus function parallelism:
• Data parallelism:
  – Parallel operation sequences performed on elements of large data structures, such as those resulting from parallelization of loops.
  – Usually easy to load balance.
  – Degree of concurrency usually increases with input or problem size.

Identifying Concurrency (continued)
Function or task parallelism:
– Entire large tasks (procedures) that can be done in parallel on the same or different data, e.g. different independent grid computations in Ocean.
– Software pipelining: different functions or software stages of the pipeline performed on different data:
  • As in video encoding/decoding, or polygon
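The load-balancing limit on speedup stated earlier can be made concrete with a small helper. This is an illustrative Python sketch, not code from the slides; the function name `speedup_bound` and the per-processor `work` list are assumptions:

```python
def speedup_bound(work):
    """Upper bound implied by load balance alone:
    Speedup_problem(p) <= sequential work / max work on any processor.
    `work` lists the work units (including data-access costs) per processor."""
    return sum(work) / max(work)

print(speedup_bound([25, 25, 25, 25]))  # perfectly balanced 4 processors: 4.0
print(speedup_bound([40, 20, 20, 20]))  # one overloaded processor: 2.5
```

Even with zero communication, the one overloaded processor caps speedup at 2.5 on 4 processors, which is why equalizing work (and busy time) is the first partitioning concern.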
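Data parallelism as described above, the same operation applied to disjoint chunks of one large data structure, as typically results from parallelizing a loop, can be sketched as follows. This is an illustrative Python/threads sketch, not course code; in CPython the GIL means it shows the decomposition and static-assignment structure rather than real speedup:

```python
from concurrent.futures import ThreadPoolExecutor

def scale_chunk(a, lo, hi):
    # One task: apply the same operation to the chunk [lo, hi) it owns.
    for i in range(lo, hi):
        a[i] *= 2.0

a = [float(i) for i in range(1000)]
workers = 4
per = len(a) // workers
# Static assignment: equal-sized contiguous chunks, one per worker,
# so the load is balanced and iterations need no communication.
chunks = [(t * per, len(a) if t == workers - 1 else (t + 1) * per)
          for t in range(workers)]
with ThreadPoolExecutor(max_workers=workers) as pool:
    for lo, hi in chunks:
        pool.submit(scale_chunk, a, lo, hi)
# Leaving the `with` block waits for every chunk: the synchronization point.
```

Note that the degree of concurrency here grows with `len(a)`, matching the slide's point that data parallelism usually scales with problem size.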
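Software pipelining, the second form of task parallelism above, can be sketched as two stages connected by a queue, each stage working on a different data item at the same time. This is an illustrative Python sketch; the stage bodies are placeholders, not the video encoding/decoding stages the slide mentions:

```python
from queue import Queue
from threading import Thread

SENTINEL = object()  # marks the end of the input stream

def stage1(inp, out):
    # First pipeline stage: transforms each item and passes it downstream.
    while (item := inp.get()) is not SENTINEL:
        out.put(item * 2)
    out.put(SENTINEL)

def stage2(inp, results):
    # Second stage consumes stage 1's output while stage 1 handles newer items.
    while (item := inp.get()) is not SENTINEL:
        results.append(item + 1)

q_in, q_mid, results = Queue(), Queue(), []
t1 = Thread(target=stage1, args=(q_in, q_mid))
t2 = Thread(target=stage2, args=(q_mid, results))
t1.start(); t2.start()
for x in range(5):
    q_in.put(x)        # different data items flow through the stages
q_in.put(SENTINEL)
t1.join(); t2.join()
print(results)         # [1, 3, 5, 7, 9]
```

Because each stage is a single thread reading a FIFO queue, item order is preserved; concurrency comes from the stages overlapping in time, not from splitting the data.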

