DOC PREVIEW
Synchronization Transformations for Parallel Computing

This preview shows page 1-2-16-17-18-33-34 out of 34 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Synchronization Transformations for Parallel ComputingMotivationTalk OutlineModel of ComputationReducing Synchronization OverheadPowerPoint PresentationSynchronization OptimizationLock CancellationAcquire Lock MovementRelease Lock MovementSynchronization Optimization AlgorithmInterprocedural Control Flow GraphAcquire Movement PathsRelease Movement PathsMigration Paths and Meeting EdgeIntersection of PathsCompensation NodesFinal ResultSynchronization Optimization Trade-OffFalse Exclusion PolicyExperimental ResultsLock OverheadContention OverheadPerformance Results : Barnes-HutPerformance Results: WaterPerformance Results: StringChoosing Best PolicySolution: Dynamic FeedbackDynamic FeedbackDynamic Feedback : Barnes-HutDynamic Feedback : WaterDynamic Feedback : StringRelated WorkConclusionsSynchronization TransformationsforParallel ComputingPedro DinizandMartin RinardDepartment of Computer ScienceUniversity of California, Santa Barbarahttp://www.cs.ucsb.edu/~{pedro,martin}MotivationParallel Computing Becomes Dominant Form of ComputationParallel Machines Require Parallel SoftwareParallel Constructs Require New Analysis and Optimization TechniquesOur GoalEliminate Synchronization OverheadTalk Outline•Motivation•Model of Computation•Synchronization Optimization Algorithm•Applications Experience•Dynamic Feedback•Related Work•ConclusionsModel of Computation•Parallel Programs•Serial Phases•Parallel Phases•Single Address Space•Atomic Operations on Shared Data•Mutual Exclusion Locks•Acquire Constructs•Release ConstructsAcqS1MutualExclusionRegionRelReducing Synchronization OverheadAcqS1S2RelS3RelAcqSynchronization OptimizationIdea:Replace Computations that Repeatedly Acquire and Release the Same Lock with a Computation that Acquires and Releases the Lock Only OnceResult:Reduction in the Number of Executed Acquire and Release ConstructsMechanism:Lock Movement Transformations andLock Cancellation TransformationsLock CancellationAcquire Lock MovementRelease Lock MovementSynchronization Optimization AlgorithmOverview:•Find Two Mutual Exclusion Regions With the Same Lock•Expand Mutual Exclusion Regions Using Lock Movement Transformations Until They are Adjacent•Coalesce Using Lock Cancellation Transformation to Form a Single Larger Mutual Exclusion RegionInterprocedural Control Flow GraphAcquire Movement PathsRelease Movement PathsMigration Paths and Meeting EdgeIntersection of PathsCompensation NodesFinal ResultSynchronization Optimization Trade-Of•Advantage: •Reduces Number of Executed Acquires and Releases•Reduces Acquire and Release Overhead•Disadvantage: May Introduce False Exclusion•Multiple Processors Attempt to Acquire Same Lock•Processor Holding the Lock is Executing Code that was Originally in No Mutual Exclusion RegionFalse Exclusion PolicyGoal: Limit Potential Severity of False ExclusionMechanism: Constrain the Application of Basic Transformations•Original: Never Apply Transformations•Bounded: Apply Transformations only onCycle-Free Subgraphs of ICFG•Aggressive: Always apply TransformationsExperimental Results•Automatic Parallelizing Compiler Based on Commutativity Analysis [PLDI’96]•Set of Complete Scientific Applications (C++ subset)•Barnes-Hut N-Body Solver (1500 lines of Code)•Liquid Water Simulation Code (1850 lines of Code)•Seismic Modeling String Code (2050 lines of Code)•Diferent False Exclusion Policies•Performance of Generated Parallel Code on Stanford DASH Shared-Memory MultiprocessorLock Overhead 0204060Percentage Lock OverheadBarnes-Hut (16K Particles)OriginalBoundedAggressivePercentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exculsion Locks0204060Percentage Lock OverheadWater (512 Molecules)OriginalBoundedAggressive0204060Percentage Lock OverheadString (Big Well Model)OriginalAggressiveContention OverheadContention PercentagePercentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors10002550750 4 8 12 16ProcessorsBarnes-Hut (16K Bodies)02550751000 4 8 12 16ProcessorsWater (512 Molecules)02550751000 4 8 12 16ProcessorsString (Big Well Model)OriginalBoundedAggressive0246810121416Speedup0 2 4 6 8 10 12 14 16Number of ProcessorsIdealAggressiveBoundedOriginalBarnes-Hut (16384 bodies)Performance Results : Barnes-HutPerformance Results: WaterIdealAggressiveBoundedOriginal02468101214160 2 4 6 8 10 12 14 16SpeedupNumber of ProcessorsWater (512 Molecules)Performance Results: StringString (Big Well Model)SpeedupNumber of Processors02468101214160 2 4 6 8 10 12 14 16IdealOriginalAggressiveChoosing Best Policy•Best False Exclusion Policy May Depend On•Topology of Data Structures•Dynamic Schedule Of Computation•Information Required to Choose Best Policy Unavailable at Compile Time•Complications•Diferent Phases May Have Diferent Best Policy•In Same Phase, Best Policy May Change Over TimeSolution: Dynamic Feedback•Generated Code Consists of•Sampling Phases: Measure Performance of Diferent Policies•Production Phases : Use Best Policy From Sampling Phase•Periodically Resample to Discover Changes in Best Policy•Guaranteed Performance BoundsDynamic FeedbackAggressiveOriginalBoundedTimeOverheadSampling Phase Production Phase Sampling PhaseAggressiveCodeVersionDynamic Feedback : Barnes-Hut 0246810121416Speedup0 2 4 6 8 10121416Number of ProcessorsIdealAggressiveDynamic FeedbackBoundedOriginalBarnes-Hut (16384 bodies)Dynamic Feedback : Water02468101214160 2 4 6 8 10 12 14 16SpeedupNumber of ProcessorsIdealBoundedOriginalAggressiveDynamic FeedbackWater (512 Molecules)Dynamic Feedback : StringString (BigWell Model) 02468101214160 2 4 6 8 10 12 14 16SpeedupNumber of ProcessorsIdealOriginalAggressiveDynamic FeedbackRelated Work•Parallel Loop Optimizations (e.g. [Tseng:PPoPP95])•Array-based Scientific Computations•Barriers vs. Cheaper Mechanisms•Concurrent Object-Oriented Programs (e.g. [PZC:POPL95])•Merge Access Regions for Invocations of Exclusive Methods•Concurrent Constraint Programming•Bring Together Ask and Tell Constructs•Efficient Synchronization Algorithms•Efficient Implementations of Synchronization PrimitivesConclusions•Synchronization Optimizations•Basic Synchronization Transformations for Locks•Synchronization Optimization Algorithm•Integrated into Prototype Parallelizing Compiler•Object-Based Programs with Dynamic Data Structures•Commutativity Analysis•Experimental


Synchronization Transformations for Parallel Computing

Download Synchronization Transformations for Parallel Computing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Synchronization Transformations for Parallel Computing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Synchronization Transformations for Parallel Computing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?