UVA CS 671 - Code Optimization of Parallel Programs (55 pages)

Previewing pages 1, 2, 3, 4, 25, 26, 27, 52, 53, 54, 55 of 55 page document View the full content.
View Full Document

Code Optimization of Parallel Programs



Previewing pages 1, 2, 3, 4, 25, 26, 27, 52, 53, 54, 55 of actual document.

View the full content.
View Full Document
View Full Document

Code Optimization of Parallel Programs

66 views


Pages:
55
School:
University Of Virginia
Course:
Cs 671 - Data-Flow Analysis
Unformatted text preview:

Code Optimization of Parallel Programs Vivek Sarkar Rice University vsarkar rice edu FPU ISU ISU FXU FXU FPU IDU IDU LSU IFU BXU L2 L3 Directory Control LSU L2 IFU BXU L2 Parallel Software Challenges Focus Area for this Talk Domain specific implicitly parallel programming models e g Matlab stream processing map reduce Sawzall Domain specific Programming Models Parallelism in middleware e g transactions relational databases web services J2EE containers Middleware Parallel application libraries e g linear algebra graphics imaging signal processing security Application Libraries Parallel Debugging and Performance Tools e g Eclipse Parallel Tools Platform TotalView Thread Checker Programming Tools Explicitly parallel languages e g OpenMP Java Concurrency NET Parallel Extensions Intel TBB CUDA Cilk MPI Unified Parallel C Co Array Fortran X10 Chapel Fortress Parallel intermediate representation optimization of synchronization data transfer automatic parallelization Languages Static Dynamic Optimizing Compilers Code partitioning for accelerators data transfer optimizations SIMDization space time scheduling power management Multicore Back ends Parallel Runtime System Libraries OS and Hypervisors Parallel runtime and system libraries for task scheduling synchronization parallel data structures 2 Virtualization scalable management of heterogeneous resources per core frequency power Outline Paradigm Shifts Anomalies in Optimizing Parallel Code Incremental vs Comprehensive Approaches to Code Optimization of Parallel Code Rice Habanero Multicore Software project 3 Our Current Paradigm for Code Optimization has served us well for Fifty Years Fortran Autocoder II ALPHA Translation Translation Translation IL OPTIMIZER IL Stretch Harvest Compiler Organization 1958 1962 REGISTER ALLOCATOR Source Compiling for Parallelism Fran Allen Turning Lecture June 2007 IL ASSEMBLER OBJECT CODE STRETCH 4 STRETCH HARVEST and has been adapted to meet challenges along the way Interprocedural analysis Array dependence analysis Pointer alias analysis Instruction scheduling software pipelining SSA form Profile directed optimization Dynamic compilation Adaptive optimization Auto tuning 5 but is now under siege because of parallelism Proliferation of parallel hardware Multicore manycore accelerators clusters Proliferation of parallel libraries and languages OpenMP Java Concurrency NET Parallel Extensions Intel TBB Cilk MPI UPC CAF X10 Chapel Fortress 6 Paradigm Shifts The Structure of Scientific Revolutions Thomas S Kuhn 1970 A paradigm is a scientific structure or framework consisting of Assumptions Laws Techniques Normal science is a puzzle solving activity governed by the rules of the paradigm It is uncritical of the current paradigm Crisis sets in when a series of serious anomalies appear The emergence of new theories is generally preceded by a period of pronounced professional insecurity Scientists engage in philosophical and metaphysical disputes A revolution or paradigm shift occurs when an an entire paradigm is replaced by another 7 Kuhn s History of Science Immature Science Revolution Normal Science Crisis Anomalies Revolution A new paradigm emerges Old Theory well established many followers many anomalies New Theory few followers untested new concepts techniques accounts for anomalies asks new questions Source www philosophy ed ac uk ug study ug phil sci1h phil sci files L10 Kuhn1 ppt 8 Some Well Known Paradigm Shifts Newton s Laws to Einstein s Theory of Relativity Ptolemy s geocentric view to Copernicus and Galileo s heliocentric view Creationism to Darwin s Theory of Evolution 9 Outline Paradigm Shifts Anomalies in Optimizing Parallel Code Incremental vs Comprehensive Approaches Rice Habanero Multicore Software project 10 What anomalies do we see when optimizing parallel code Examples 1 Control flow rules 2 Data flow rules 3 Load elimination rules 11 1 Control Flow Rules from Sequential Code Optimization Control Flow Graph Node Basic Block Edge Transfer of Control Flow Succ b successors of block b Pred b predecessors of block b Dominators Block d dominates block b if every sequential path from START to b includes d Dom b set of dominators of block b Every block has a unique immediate dominator parent in dominator tree 12 Dominator Tree Dominator Example START Control Flow Graph BB1 START BB2 BB3 BB4 BB1 T F BB2 STOP BB3 QuickTime and a decompressor are needed to see this picture BB4 QuickTime and a decompressor are needed to see this picture STOP 13 Anomalies in Control Flow Rules for Parallel Code Parallel Control Flow Graph BB1 parbegin BB2 BB3 parend BB4 BB1 FORK BB2 BB3 JOIN Does B4 have a unique immediate dominator Can the dominator relation be represented as a tree 14 BB4 2 Data Flow Rules from Sequential Code Optimization Example Reaching Definitions REACHin n set of definitions d s t there is a sequential path from d to n in the CFG and d is not killed along that path QuickTime and a decompressor are needed to see this picture 15 Anomalies in Data Flow Rules for Parallel Code S1 X1 parbegin Task 1 S2 X2 post ev2 S3 post ev3 S4 wait ev8 X4 Task 2 S5 S6 wait ev2 S7 X7 S8 wait ev3 post ev8 parend control sync What definitions reach COEND What if there were no synchronization edges QuickTime and a decompressor are needed to see this picture How should the data flow equations be defined for parallel code 16 3 Load Elimination Rules from Sequential Code Optimization A load instruction at point P T3 q is redundant if the value of q is available at point P T1 q T1 q T2 p T2 p T3 17 T3 Anomalies in Load Elimination Rules for Parallel Code Original Version Assume that p q and that p q 0 initially TASK 1 T1 q T2 p T3 q print T1 T2 T3 TASK 2 p 1 Question Is 0 1 0 permitted as a possible output Answer It depends on the programming model It is not permitted by Sequential Consistency Lamport 1979 But it is permitted by Location Consistency Gao Sarkar 1993 2000 18 Anomalies in Load Elimination Rules for Parallel Code After Load Elimination Assume that p q and that p q 0 initially TASK 1 T1 q T2 p T3 T1 print T1 T2 T3 TASK 2 p 1 Question Is 0 1 0 permitted as a possible output Answer Yes it will be permitted by Sequential Consistency if load elimination is performed 19 Outline Paradigm Shifts Anomalies in Optimizing Parallel Code Incremental vs Comprehensive Approaches to Code Optimization of Parallel Code Rice Habanero Multicore Software project 20 Incremental Approaches to coping with Parallel Code


View Full Document

Access the best Study Guides, Lecture Notes and Practice Exams

Loading Unlocking...
Login

Join to view Code Optimization of Parallel Programs and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Code Optimization of Parallel Programs and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?