Berkeley COMPSCI C267 - Titanium: A Java Dialect for High Performance Computing

Titanium: A Java Dialect for High Performance Computing
Katherine Yelick
U.C. Berkeley and LBNL
March 5, 2004, CS267 Lecture 12

Motivation: Target Problems
- Many modeling problems in astrophysics, biology, material science, and other areas require
  - Enormous range of spatial and temporal scales
- To solve interesting problems, one needs:
  - Adaptive methods
  - Large scale parallel machines
- Titanium is designed for
  - Structured grids
  - Locally-structured grids (AMR)
  - Unstructured grids (in progress)
Source: J. Bell, LBNL

Titanium Background
- Based on Java, a cleaner C++
  - Classes, automatic memory management, etc.
  - Compiled to C and then machine code, no JVM
- Same parallelism model as UPC and CAF
  - SPMD parallelism
  - Dynamic Java threads are not supported
- Optimizing compiler
  - Analyzes global synchronization
  - Optimizes pointers, communication, memory

Summary of Features Added to Java
- Multidimensional arrays: iterators, subarrays, copying
- Immutable (“value”) classes
- Templates
- Operator overloading
- Scalable SPMD parallelism replaces threads
- Global address space with local/global reference distinction
- Checked global synchronization
- Zone-based memory management (regions)
- Libraries for collective communication, distributed arrays, bulk I/O, performance profiling

Outline
- Titanium Execution Model
  - SPMD
  - Global Synchronization
  - Single
- Titanium Memory Model
- Support for Serial Programming
- Performance and Applications
- Compiler/Language Status

SPMD Execution Model
- Titanium has the same execution model as UPC and CAF
- Basic Java programs may be run as Titanium programs, but all processors do all the work.
- E.g., parallel hello world

    class HelloWorld {
      public static void main (String [] argv) {
        System.out.println("Hello from proc " + Ti.thisProc()
                           + " out of " + Ti.numProcs());
      }
    }

- Global synchronization done using Ti.barrier()

Barriers and Single
- Common source of bugs is barriers or other collective operations inside branches or loops

    barrier, broadcast, reduction, exchange

- A “single” method is one called by all procs

    public single static void allStep(...)

- A “single” variable has same value on all procs

    int single timestep = 0;

- Single annotation on methods is optional, but useful in understanding compiler messages
- Compiler proves that all processors call barriers together

Explicit Communication: Broadcast
- Broadcast is a one-to-all communication

    broadcast <value> from <processor>

- For example:

    int count = 0;
    int allCount = 0;
    if (Ti.thisProc() == 0) count = computeCount();
    allCount = broadcast count from 0;

- The processor number in the broadcast must be single; all constants are single.
  - All processors must agree on the broadcast source.
- The allCount variable could be declared single.
  - All will have the same value after the broadcast.

More on Single
- Global synchronization needs to be controlled

    if (this processor owns some data) {
      compute on it
      barrier
    }

- Hence the use of “single” variables in Titanium
- If a conditional or loop block contains a barrier, all processors must execute it
  - conditions must contain only single variables
- Compiler analysis statically enforces freedom from deadlocks due to barrier and other collectives being called non-collectively: "Barrier Inference" [Gay & Aiken]

Single Variable Example
- Barriers and single in N-body Simulation

    class ParticleSim {
      public static void main (String [] argv) {
        int single allTimestep = 0;
        int single allEndTime = 100;
        for (; allTimestep < allEndTime; allTimestep++) {
          // read remote particles, compute forces on mine
          Ti.barrier();
          // write to my particles using new forces
          Ti.barrier();
        }
      }
    }

- Single methods inferred by the compiler

Outline
- Titanium Execution Model
- Titanium Memory Model
  - Global and Local References
  - Exchange: Building Distributed Data Structures
  - Region-Based Memory Management
- Support for Serial Programming
- Performance and Applications
- Compiler/Language Status

Global Address Space
- Globally shared address space is partitioned
- References (pointers) are either local or global (meaning possibly remote)

[Figure: global address space over processors p0, p1, ..., pn. Object heaps are shared; program stacks are private. Stack entries are labeled l (local reference) and g (global reference); heap objects carry fields x and y.]

Use of Global / Local
- Global references (pointers) may point to remote locations
  - References are global by default
  - Easy to port shared-memory programs
- Global pointers are more expensive than local
  - True even when data is on the same processor
  - Costs of global:
    - space (processor number + memory address)
    - dereference time (check to see if local)
- May declare references as local
  - Compiler will automatically infer local when possible
  - This is an important performance-tuning mechanism

Global Address Space
- Processes allocate locally
- References can be passed to other processes

    class C { public int val; ... }

    C gv;        // global pointer
    C local lv;  // local pointer
    if (Ti.thisProc() == 0) {
      lv = new C();
    }
    gv = broadcast lv from 0;
    // data race
    gv.val = Ti.thisProc()+1;
    int winner = gv.val;

[Figure: Process 0 (HEAP0) and Process 1 (HEAP1), each with lv and gv; the object's field is labeled val: 0, and both processes show winner: 2.]

Aside on Titanium Arrays
- Titanium adds its own multidimensional array class for performance
- Distributed data structures are built using a 1D Titanium array
- Slightly different syntax, since Java arrays still exist in Titanium, e.g.:

    int [1d] a;
    a = new int [1:100];
    a[1] = 2*a[1] - a[0] - a[2];

- Will discuss these more later…

Explicit Communication: Exchange
- To create shared data structures
  - each processor builds its own piece
  - pieces are exchanged (for objects, just exchange pointers)
- Exchange primitive in Titanium

    int [1d] single allData;
    allData = new int [0:Ti.numProcs()-1];
    allData.exchange(Ti.thisProc()*2);

- E.g., on 4 procs, each will have copy of allData:

    allData: 0 2 4 6

Distributed Data Structures
- Building distributed arrays:

    Particle [1d] single [1d] allParticle =
        new Particle [0:Ti.numProcs()-1][1d];
    Particle [1d] myParticle =
        new Particle [0:myParticleCount-1];
    allParticle.exchange(myParticle);

- Now each processor has array of pointers, one to each processor’s chunk of particles

[Figure: processors P0, P1, P2; all-to-all broadcast.]

Region-Based Memory Management
- An advantage of Java over C/C++
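
The two-barrier timestep loop from the N-body slide can be emulated in plain Java to see what each barrier protects. This is a sketch under stated assumptions, not Titanium code: each Java thread stands in for one Titanium processor, and java.util.concurrent.CyclicBarrier stands in for Ti.barrier(). The names BarrierSketch and simulate, and the toy update rule (add the right neighbor's value), are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Plain-Java emulation of the SPMD timestep loop. NOT Titanium code:
// each thread plays the role of one Titanium processor (an assumption).
class BarrierSketch {

    // Runs `steps` timesteps on `numProcs` emulated processors and
    // returns the final per-processor values.
    static long[] simulate(int numProcs, int steps) {
        final long[] value = new long[numProcs];   // value[p] is "owned" by processor p
        for (int p = 0; p < numProcs; p++) {
            value[p] = p;
        }
        // Stands in for Ti.barrier(): all numProcs threads must arrive.
        final CyclicBarrier barrier = new CyclicBarrier(numProcs);

        Thread[] procs = new Thread[numProcs];
        for (int p = 0; p < numProcs; p++) {
            final int me = p;
            procs[p] = new Thread(() -> {
                try {
                    // The loop bound is identical on every thread, like a
                    // "single" variable, so all threads hit every barrier.
                    for (int t = 0; t < steps; t++) {
                        // Phase 1: read a remote value, compute locally.
                        long next = value[me] + value[(me + 1) % numProcs];
                        barrier.await();   // everyone has finished reading
                        // Phase 2: write only to my own element.
                        value[me] = next;
                        barrier.await();   // everyone has finished writing
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            procs[p].start();
        }
        for (Thread t : procs) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(simulate(4, 3)));  // prints [12, 16, 12, 8]
    }
}
```

Removing the first barrier would let a fast thread overwrite its element while a neighbor is still reading it; removing the second would let a fast thread start reading the next step before its neighbor has written. Unlike Titanium, Java does not check statically that every thread reaches each barrier, so a barrier placed under a thread-dependent condition would simply hang here.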

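
The exchange slide's result (each of 4 processors ends up with a copy of 0 2 4 6) can also be reproduced with a small plain-Java emulation. Again this is only a sketch, not the Titanium library: threads stand in for processors, a shared staging array plus a CyclicBarrier stands in for the exchange collective, and the names ExchangeSketch and exchange are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Plain-Java emulation of an exchange-style all-to-all. NOT Titanium code.
class ExchangeSketch {

    // Returns views[p]: the contents of allData as seen by emulated processor p.
    static int[][] exchange(int numProcs) {
        final int[] slots = new int[numProcs];     // one slot per contributor
        final int[][] views = new int[numProcs][];
        final CyclicBarrier barrier = new CyclicBarrier(numProcs);

        Thread[] procs = new Thread[numProcs];
        for (int p = 0; p < numProcs; p++) {
            final int me = p;
            procs[p] = new Thread(() -> {
                try {
                    // My contribution, mirroring exchange(Ti.thisProc()*2).
                    slots[me] = me * 2;
                    // Wait until every contribution is in place.
                    barrier.await();
                    // Take my own private copy of the full array.
                    views[me] = slots.clone();
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            procs[p].start();
        }
        for (Thread t : procs) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return views;
    }

    public static void main(String[] args) {
        // Each of the 4 emulated processors prints [0, 2, 4, 6].
        for (int[] view : exchange(4)) {
            System.out.println(Arrays.toString(view));
        }
    }
}
```

For objects, the same pattern would copy references rather than values, which is the sense in which the slides say pieces are exchanged by pointer.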
