Scientific Applications
Jing Jiang, Jayanth Gummaraju, Rohit Gupta
4/18/2003

Outline
- Application Study
- Vortex
- Architectural Issues
- Benchmarks

Applications
[Figure: Applications, taken from the Cray Research website, www.cray.com]

Example Application: Vortex
- N-body simulation
- O(N²) interactions
- Each processor holds N/P bodies
- Binary tree reduction
- In this example: 4096 bodies, 100 stages

[Figure: Code Size]

[Figure: Memory Requirements]

[Figure: Processing Requirements]

[Figure: I/O Requirements]

[Figure: Communication Requirements]

General Characteristics
- Number crunching: applications typically have a high ratio of arithmetic to memory operations
- Large data sets: the working set is also typically large, although this depends on the application
- Typically low temporal locality
- Depending on the regularity of the application, spatial locality can be high

Parallelism
- Lots of DLP, TLP, and ILP
- DLP: the same operation is performed on all bodies
- DLP to TLP: converting DLP to TLP gives more flexibility than DLP alone
- ILP: parallelism within threads
- Example, Vortex: mostly DLP

Architectural Issues

Performance Trends
- Scaling faster than Moore's law
[Figure taken from the Cray Research website, www.cray.com]

Processing Requirements
Two approaches to achieving computational capacity:
- Cluster systems: typically 100s to 1000s of processors
- Stream/vector systems: fewer custom-designed, highly powerful processors

Interconnection Networks
- Both bandwidth and latency are important
- Bus: needs an arbitration protocol; only one device can use it at a time
- Crossbar switch: every processor is connected to every memory; the O(N²) switch cost doesn't scale well
- Multistage switch: N×N switches built from smaller switches, e.g. a 16×16 switch built from 2 stages of 4×4 switches

Architectural Issues in Vortex
- Data memory: O(N)
- FLOPS: O(N²/P)
- I/O volume: O(N)
- Communication volume: O(NP)
- Communication count: O(P²)

Current Design Challenges
- System performance-to-cost ratio: millions of dollars to build
- Custom vs. cluster systems
- Programming model is not very intuitive
- I/O
- Scalability
- Power

Benchmarks
- DLAB: a suite measuring the performance of distributed resource-sharing systems on scientific applications
[Figure taken from the CCLRC UK website, http://www.cse.clrc.ac.uk]

Benchmarks
Two important performance measures:
- Peak performance: depends on the maximum computational capacity, e.g. Linpack
- Sustained performance: depends on the overall system architecture (interconnects, memory bandwidth); DLAB measures this and other characteristics

Memory Interconnects
[Figure taken from the NEC Earth Simulator website, www.nec.co.jp]
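The Vortex example combines O(N²) pairwise interactions, N/P bodies per processor, and a binary tree reduction. The sketch below shows one plausible way these pieces fit together; it is not the authors' code. Each simulated "processor" (rank) computes the contribution of its own N/P source bodies to all N targets, and the P partial arrays are then summed pairwise in ceil(log2 P) rounds. The 1-D softened inverse-square kernel `pair_term`, the constant `EPS2`, and all function names are hypothetical stand-ins, since the slides do not give the actual vortex interaction.

```python
import math
from typing import List

EPS2 = 1e-3  # hypothetical softening term so coincident bodies give a zero, not infinite, term


def pair_term(xi: float, xj: float) -> float:
    """Hypothetical 1-D softened inverse-square pull of body j on body i."""
    d = xj - xi
    return d / (d * d + EPS2) ** 1.5


def partial_sums(x: List[float], rank: int, nprocs: int) -> List[float]:
    """Contribution of the N/P source bodies owned by `rank` to all N targets."""
    n = len(x)
    chunk = n // nprocs  # assumes nprocs divides n evenly
    out = [0.0] * n
    for j in range(rank * chunk, (rank + 1) * chunk):  # sources owned by this rank
        for i in range(n):  # every target body: O(N * N/P) work per rank
            if i != j:
                out[i] += pair_term(x[i], x[j])
    return out


def tree_reduce(parts: List[List[float]]) -> List[float]:
    """Sum P per-rank arrays pairwise; finishes in ceil(log2 P) rounds."""
    parts = [p[:] for p in parts]  # work on copies
    step = 1
    while step < len(parts):
        for r in range(0, len(parts), 2 * step):
            if r + step < len(parts):  # rank r absorbs its partner r + step
                parts[r] = [a + b for a, b in zip(parts[r], parts[r + step])]
        step *= 2
    return parts[0]


def direct(x: List[float]) -> List[float]:
    """Serial O(N^2) reference used to check the parallel sketch."""
    n = len(x)
    return [sum(pair_term(x[i], x[j]) for j in range(n) if j != i) for i in range(n)]


if __name__ == "__main__":
    bodies = [((7 * k) % 13) * 0.1 for k in range(16)]  # 16 toy bodies
    P = 4
    forces = tree_reduce([partial_sums(bodies, r, P) for r in range(P)])
    ref = direct(bodies)
    print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(forces, ref)))
```

The tree reduction changes only the order of the additions, so the result matches the serial sum up to floating-point rounding. With P = 4 it takes two rounds instead of three sequential additions.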
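The Parallelism slide notes that Vortex is mostly DLP and that DLP can be converted to TLP. A minimal sketch of that conversion follows. The same per-body operation is written first as a single data-parallel map and then as contiguous chunks handed to threads. `update_body` is a hypothetical placeholder for the real per-body work, and `ThreadPoolExecutor` is used only to show the decomposition. In CPython the global interpreter lock means pure-Python arithmetic like this will not actually run faster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def update_body(v: float) -> float:
    """Hypothetical per-body update; identical for every body, hence DLP."""
    return 0.5 * v + 1.0


def run_dlp(values: Sequence[float]) -> List[float]:
    # Data-parallel view: one operation applied across the whole body array.
    return [update_body(v) for v in values]


def run_tlp(values: Sequence[float], nthreads: int) -> List[float]:
    # Thread-parallel view: cut the array into contiguous chunks, one per thread.
    n = len(values)
    bounds = [(t * n // nthreads, (t + 1) * n // nthreads) for t in range(nthreads)]

    def work(span: Tuple[int, int]) -> List[float]:
        lo, hi = span
        # Each thread runs an ordinary sequential loop over its own slice.
        # In compiled code, ILP would come from overlapping instructions here.
        return [update_body(v) for v in values[lo:hi]]

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        chunks = list(pool.map(work, bounds))  # map() returns results in input order
    return [y for chunk in chunks for y in chunk]


if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    print(run_tlp(data, 3) == run_dlp(data))
```

Threads do not have to execute in lockstep, so each chunk could take a different control path. That is the sense in which TLP is more flexible than DLP.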
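The Interconnection Networks slide contrasts the O(N²) crossbar with a multistage switch such as a 16×16 network built from two stages of 4×4 switches. The sketch below counts crosspoints for both designs. It also shows destination-tag routing, under the assumption of an Omega/butterfly-style wiring that the slides do not specify. All function names are illustrative.

```python
from typing import List


def stages(n: int, k: int) -> int:
    """Number of k x k stages needed for an n x n network (n must be a power of k)."""
    s, m = 0, 1
    while m < n:
        m *= k
        s += 1
    if m != n:
        raise ValueError("n must be a power of k")
    return s


def crossbar_crosspoints(n: int) -> int:
    # Every processor has its own crosspoint to every memory: N * N.
    return n * n


def multistage_crosspoints(n: int, k: int) -> int:
    # log_k(N) stages, each holding N/k switches of k * k crosspoints.
    return stages(n, k) * (n // k) * k * k


def route_ports(dest: int, n: int, k: int) -> List[int]:
    """Destination-tag routing (assumed Omega/butterfly-style wiring):
    the switch in stage i uses the i-th base-k digit of `dest`, most significant first."""
    s = stages(n, k)
    return [(dest // k ** (s - 1 - i)) % k for i in range(s)]


if __name__ == "__main__":
    print(crossbar_crosspoints(16), multistage_crosspoints(16, 4))  # 256 128
    print(route_ports(13, 16, 4))  # [3, 1] since 13 = 3*4 + 1
```

The gap widens with size. At N = 1024 a crossbar needs 1,048,576 crosspoints, while five stages of 4×4 switches need 20,480. The trade-off is that a multistage network adds a switch hop per stage and can block some permutations that a crossbar handles.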
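For the bus, the slide says an arbitration protocol is needed because only one device can drive the bus at a time. The slides do not name a policy. The sketch below assumes round-robin arbitration, one common choice, in which priority rotates to the device after the most recent winner so that no requester starves. The class and method names are hypothetical.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grants the shared bus to at most one requester per cycle.
    Assumed policy: priority rotates to the device after the last winner."""

    def __init__(self, ndevices: int) -> None:
        self.n = ndevices
        self.last = ndevices - 1  # so device 0 has priority on the first cycle

    def grant(self, requests: List[bool]) -> Optional[int]:
        # Scan devices starting just after the previous winner.
        for offset in range(1, self.n + 1):
            dev = (self.last + offset) % self.n
            if requests[dev]:
                self.last = dev
                return dev
        return None  # no requests: the bus is idle this cycle


if __name__ == "__main__":
    arb = RoundRobinArbiter(4)
    print([arb.grant([True] * 4) for _ in range(5)])  # [0, 1, 2, 3, 0]
```

Because only one grant is issued per cycle, total bus bandwidth is shared by all devices. This is the scaling limit that motivates crossbar and multistage networks.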
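The Architectural Issues in Vortex slide gives only asymptotic terms. The sketch below plugs in the example's 4096 bodies and 100 stages. Every hidden constant is set to 1, which is an assumption, so only the counts and ratios are meaningful, not absolute costs. The function names are illustrative.

```python
from typing import Dict


def total_pair_interactions(n: int, stages: int) -> int:
    # Each stage evaluates every unordered pair once: N(N-1)/2 pairs.
    return stages * n * (n - 1) // 2


def vortex_terms(n: int, p: int) -> Dict[str, int]:
    """Leading-order terms per stage with every constant set to 1 (an assumption)."""
    return {
        "memory": n,                    # O(N) data memory
        "flops_per_proc": n * n // p,   # O(N^2 / P)
        "io_volume": n,                 # O(N)
        "comm_volume": n * p,           # O(NP)
        "comm_count": p * p,            # O(P^2)
    }


if __name__ == "__main__":
    print(total_pair_interactions(4096, 100))  # 838656000
    a, b = vortex_terms(4096, 16), vortex_terms(4096, 32)
    print(a["flops_per_proc"] // b["flops_per_proc"], b["comm_count"] // a["comm_count"])  # 2 4
```

Doubling P from 16 to 32 halves the per-processor FLOPS but quadruples the communication count. At the same time, computation grows as N² while memory and I/O grow only as N, which is consistent with the high arithmetic-to-memory ratio listed under General Characteristics.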