CS 267: The Titanium Language
Kathy Yelick
http://titanium.cs.berkeley.edu
January 14, 2019, CS267 Lecture 12

Motivation: Target Problems
- Many modeling problems in astrophysics, biology, material science, and other areas require an enormous range of spatial and temporal scales.
- To solve interesting problems, one needs:
  - Adaptive methods
  - Large-scale parallel machines
- Titanium is designed for:
  - Structured grids
  - Locally-structured grids (AMR)
  - Particle/Mesh methods
Source: J. Bell, LBNL

Titanium Background
- Based on Java, a cleaner C++
  - Classes, automatic memory management, etc.
  - Compiled to C and then to machine code; no JVM
- Same parallelism model as UPC and CAF
  - SPMD parallelism
  - Dynamic Java threads are not supported
- Optimizing compiler
  - Analyzes global synchronization
  - Optimizes pointers, communication, and memory

Summary of Features Added to Java
- Multidimensional arrays: iterators, subarrays, copying
- Immutable ("value") classes
- Templates
- Operator overloading
- Scalable SPMD parallelism replaces threads
- Global address space with local/global reference distinction
- Checked global synchronization
- Zone-based memory management (regions)
- Libraries for collective communication, distributed arrays, bulk I/O, and performance profiling

Outline
- Titanium Execution Model
  - SPMD
  - Global Synchronization
  - Single
- Titanium Memory Model
- Support for Serial Programming
- Performance and Applications
- Compiler/Language Status

SPMD Execution Model
- Titanium has the same execution model as UPC and CAF.
- Basic Java programs may be run as Titanium programs, but all processors do all the work.
- E.g., parallel hello world:

    class HelloWorld {
      public static void main (String [] argv) {
        System.out.println("Hello from proc " + Ti.thisProc()
                           + " out of " + Ti.numProcs());
      }
    }

- Global synchronization is done using Ti.barrier().

Global and Local Views
- When writing parallel programs, especially SPMD programs, there are two kinds of functions:
  - Local: may be called independently by any thread; more than one may call it concurrently
  - Global/collective: all threads call these together
- The convention in UPC is to put "all_" in the name.
- A common source of bugs is a collective operation (barrier, broadcast, reduction, exchange) inside a branch or loop: if the condition differs across threads, some threads reach the collective and others never do, and the program deadlocks.
- The Titanium compiler proves that no such deadlocks exist, or else produces a compile-time error.

Barriers and Single
- To put a barrier (or equivalent) inside a method, you need to make the method "single" (aka "sglobal").
  - A "single" method is one called by all procs:
      public single static void allStep(...)
  - These single annotations on methods are optional, but useful in understanding compiler messages.
- To put a barrier (or single method) inside a branch or loop, you need to use a "single" variable for the branch or loop condition.
  - A "single" variable has the same value on all procs:
      int single timestep = 0;
- The compiler proves that all processors call barriers together: "Barrier Inference" [Gay & Aiken]

Explicit Communication: Broadcast
- Broadcast is a one-to-all communication:
      broadcast <value> from <processor>
- For example:
      int count = 0;
      int allCount = 0;
      if (Ti.thisProc() == 0) count = computeCount();
      allCount = broadcast count from 0;
- The processor number in the broadcast must be single; all constants are single.
  - All processors must agree on the broadcast source.
- The allCount variable could be declared single.
  - All processes have the same value of allCount after the broadcast.

Single Variable Example
- Barriers and single in an N-body simulation:

    class ParticleSim {
      public static void main (String [] argv) {
        int single allTimestep = 0;
        int single allEndTime = 100;
        for (; allTimestep < allEndTime; allTimestep++) {
          // read remote particles, compute forces on mine
          Ti.barrier();
          // write to my particles using new forces
          Ti.barrier();
        }
      }
    }

- Single methods are inferred by the compiler.

Outline
- Titanium Execution Model
- Titanium Memory Model
  - Global and Local References
  - Exchange: Building Distributed Data Structures
  - Region-Based Memory Management
- Support for Serial Programming
- Performance and Applications
- Compiler/Language Status

Global Address Space
- The globally shared address space is partitioned.
- References (pointers) are either local or global (meaning possibly remote).
- Object heaps are shared by default.
- Program stacks are private.
[Figure: the global address space across processes p0, p1, ..., pn, showing objects with fields x and y in the shared heaps, and local (l) and global (g) references on the private program stacks.]

Use of Global / Local
- Global references (pointers) may point to remote locations.
  - References are global by default (unlike UPC).
  - Easy to port shared-memory programs.
- Global pointers are more expensive than local ones.
  - True even when data is on the same processor.
  - Costs
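The SPMD, barrier, and broadcast slides can be approximated in plain Java, with one important caveat: Titanium does not use Java threads, and Ti.thisProc(), Ti.barrier(), and broadcast ... from ... are Titanium features with no direct Java equivalent. The sketch below is only an analogy, assuming N java.lang.Thread objects stand in for N SPMD processes and a java.util.concurrent.CyclicBarrier stands in for Ti.barrier(). The class name SpmdBroadcastSketch and the body of computeCount() (returning 42) are hypothetical. Process 0 publishes its value, every process waits at the barrier, and then every process reads the same value, which is the observable effect of allCount = broadcast count from 0.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Plain-Java analogy only (NOT Titanium): threads play the role of SPMD processes.
class SpmdBroadcastSketch {
    // Hypothetical stand-in for the slide's computeCount().
    static int computeCount() {
        return 42;
    }

    // Runs numProcs "processes" and returns the allCount value each one ended up with.
    static int[] run(int numProcs) {
        CyclicBarrier barrier = new CyclicBarrier(numProcs); // plays the role of Ti.barrier()
        int[] sharedSlot = new int[1];                       // where the source process publishes its value
        int[] allCount = new int[numProcs];                  // one result per "process"
        Thread[] procs = new Thread[numProcs];
        for (int p = 0; p < numProcs; p++) {
            final int thisProc = p;                          // plays the role of Ti.thisProc()
            procs[p] = new Thread(() -> {
                int count = 0;
                if (thisProc == 0) {
                    count = computeCount();
                    sharedSlot[0] = count;                   // broadcast source writes the value
                }
                try {
                    barrier.await();                         // every process must arrive here
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
                allCount[thisProc] = sharedSlot[0];          // every process reads the same value
            });
            procs[p].start();
        }
        for (Thread t : procs) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return allCount;
    }
}
```

For example, SpmdBroadcastSketch.run(4) returns four copies of 42. CyclicBarrier guarantees that writes made before await() are visible to other threads after they return from the same barrier trip, which is why the read after the barrier is safe. In the shared-memory JVM the array slot stands in for communication; in Titanium the broadcast value may actually have to travel between nodes.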
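The ParticleSim slide relies on two barriers per time step and on a loop bound that is single, so every process executes the same number of barriers. A plain-Java analogy of that structure is sketched below, again assuming threads for processes and CyclicBarrier for Ti.barrier(). The "physics" is a hypothetical toy: each process owns one integer "particle", reads its neighbor's value in the first phase, and overwrites only its own value in the second. The names ParticleSimSketch, Result, and barrierTrips are illustrative and not from the slides.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Plain-Java analogy only (NOT Titanium) of the two-barrier time-step loop.
class ParticleSimSketch {
    static final class Result {
        final int[] particles;   // final toy "particle" values
        final int barrierTrips;  // how many times all threads passed a barrier together
        Result(int[] particles, int barrierTrips) {
            this.particles = particles;
            this.barrierTrips = barrierTrips;
        }
    }

    static Result run(int numProcs, int endTime) {
        int[] particles = new int[numProcs];           // particles[p] is owned by "process" p
        for (int p = 0; p < numProcs; p++) particles[p] = p;
        int[] trips = {0};
        // The barrier action runs once per trip, in the last thread to arrive.
        CyclicBarrier barrier = new CyclicBarrier(numProcs, () -> trips[0]++);
        Thread[] procs = new Thread[numProcs];
        for (int p = 0; p < numProcs; p++) {
            final int me = p;
            procs[p] = new Thread(() -> {
                // endTime is identical in every thread, like an 'int single' bound,
                // so all threads execute the same sequence of barriers.
                for (int step = 0; step < endTime; step++) {
                    int incoming = particles[(me + 1) % numProcs]; // read a "remote" particle
                    awaitBarrier(barrier);                         // all reads finish before any write
                    particles[me] = incoming;                      // write only my own particle
                    awaitBarrier(barrier);                         // all writes finish before the next read
                }
            });
            procs[p].start();
        }
        for (Thread t : procs) joinThread(t);
        return new Result(particles, trips[0]);
    }

    private static void awaitBarrier(CyclicBarrier b) {
        try {
            b.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new RuntimeException(e);
        }
    }

    private static void joinThread(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }
}
```

With four threads and three steps, ParticleSimSketch.run(4, 3) rotates the toy values [0, 1, 2, 3] to [3, 0, 1, 2] and completes 6 barrier trips. Dropping either barrier would let a write race with a neighbor's read. If endTime differed between threads (that is, if the bound were not single), some threads would block forever in await() at a barrier the others never reach; this is the kind of deadlock that Titanium's single analysis rules out at compile time.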