Titanium and Java Parallelism
Arvind Krishnamurthy
Fall 2004

Titanium
- Take the best features of threads and MPI (just like Split-C)
  - global address space like threads (ease of programming)
  - SPMD parallelism like MPI (for performance)
  - local/global distinction, i.e., layout matters (for performance)
- Based on Java, a cleaner C++
  - classes, memory management
- Language is extensible through classes
  - domain-specific language extensions
  - support for grid-based computations, including adaptive mesh refinement (AMR)
- Optimizing compiler
  - compiled down to C
  - communication and memory optimizations
  - cache and other uniprocessor optimizations

Java: A Cleaner C++
- Java is an object-oriented language
  - classes (no standalone functions) with methods
  - inheritance between classes; multiple inheritance of interfaces only
- Syntax similar to C++

```
class Hello {
    public static void main(String[] argv) {
        System.out.println("Hello, world!");
    }
}
```

- Safe
  - strongly typed: checked at compile time, no unsafe casts
  - automatic memory management
- Titanium is (almost) a strict superset of Java

Java Objects
- Primitive scalar types: boolean, double, int, etc.
  - implementations store these on the program stack
  - access is fast
- Objects: user-defined and from the standard library
  - passed into functions by pointer value (object sharing)
  - have an implicit level of indirection (a pointer to the object)
  - simple model, but inefficient for small objects

[Figure: primitive values 2.6, 3 and true stored directly; a reference to a Complex object with fields r: 7.1, i: 4.3]

Java Object Example

```
class Complex {
    private double real;
    private double imag;

    public Complex(double r, double i) {
        real = r; imag = i;
    }

    public Complex add(Complex c) {
        return new Complex(c.real + real, c.imag + imag);
    }

    public double getReal() { return real; }
    public double getImag() { return imag; }
}

Complex c = new Complex(7.1, 4.3);
c = c.add(c);

class VisComplex extends Complex { ... }
```

Immutable Classes in Titanium
- For small objects, we would sometimes prefer to avoid the level of indirection
  - pass by value (copying the entire object)
  - especially when objects are immutable: their fields cannot change
  - extends the idea of primitive values (1, 4.2, etc.) to user-defined values
- Titanium introduces immutable classes
  - all fields are (implicitly) final
  - cannot inherit from (extend) or be inherited by other classes
  - must have a 0-argument constructor, e.g., Complex()

```
immutable class Complex { ... }
Complex c = new Complex(7.1, 4.3);
```

Arrays in Java
- Arrays in Java are objects
- Only 1D arrays are directly supported
- Array bounds are checked
- Multidimensional arrays built as arrays-of-arrays are slow

Multidimensional Arrays in Titanium
- New kind of multidimensional array added
  - indexed by Points (tuples of ints)
  - constructed over a set of Points, called a Domain
  - RectDomains are a special case of domains
  - Points, Domains and RectDomains are built-in immutable classes
- Points are specified by a tuple of ints
- RectDomains are given by: lower bound, upper bound [, stride]
- An array is declared by its number of dimensions and element type, and created by passing a domain

```
Point<2> lb = [1, 1];
Point<2> ub = [10, 20];
RectDomain<2> r = [lb : ub];
double [2d] a = new double [r];
```

Unordered iteration
- Reordering iterations helps performance
- Compilers can (in principle) do this, but it is hard in general
- Titanium adds unordered iteration on rectangular domains

```
foreach (p within r) { … }
```

  - p is a Point: a new point, scoped only within the foreach body
  - r is a previously declared RectDomain
- Foreach simplifies bounds checking as well
- Additional operations on domains and arrays subset and transform them

MatMul with Titanium Arrays

```
public static void matMul(double [2d] a, double [2d] b, double [2d] c) {
    foreach (ij within c.domain()) {
        double [1d] aRowi = a.slice(1, ij[1]);
        double [1d] bColj = b.slice(2, ij[2]);
        foreach (k within aRowi.domain()) {
            c[ij] += aRowi[k] * bColj[k];
        }
    }
}
```

Note that the code is unblocked.

Example: Domain

```
Point<2> lb = [0, 0];
Point<2> ub = [6, 4];
Point<2> s = [1, 1];
RectDomain<2> r = [lb : ub : [2, 2]];
RectDomain<2> r1 = [lb+s : ub+s : [2, 2]];
Domain<2> red = r + r1;
foreach (p in red) { ... }
```

[Figure: r spans (0, 0) to (6, 4); r + [1, 1] spans (1, 1) to (7, 5); red, their union, spans (0, 0) to (7, 5)]

- Domains in general are not rectangular
- Built using set operations: union (+), intersection (*), difference (-)
- The example is the red-black algorithm

SPMD Execution Model
- Java programs can be run as Titanium, but the result is that all processors do all the work
- E.g., parallel hello world

```
class HelloWorld {
    public static void main(String[] argv) {
        System.out.println("Hello from proc " + Ti.thisProc());
    }
}
```

- Barrier synchronization: Ti.barrier()

Safe Barriers
- All processors start together and execute the same code, but not in lock-step
- Sometimes they take different branches

```
if (Ti.thisProc() == 0) { … do setup … }
for (all data I own) { … compute on data … }
```

- A common source of bugs is placing barriers or other global operations (barrier, broadcast, reduction, exchange) inside branches or loops
- A "single" method is one called by all procs

```
public single static void allStep(…)
```

- A "single" variable has the same value on all procs

```
int single timestep = 0;
```

SPMD Execution Model
- Barriers and single in FishSimulation

```
class FishSim {
    public static single void main(String[] argv) {
        int single allTimestep = 0;
        int single allEndTime = 100;
        for (; allTimestep < allEndTime; allTimestep++) {
            // read all fish and compute forces on mine
            Ti.barrier();
            // write to my fish using new forces
            Ti.barrier();
        }
    }
}
```

- Single on methods may be inferred by the compiler

Global Address Space
- Processes allocate locally
- References can be passed to other processes

```
class C { … int val; … }

C gv;        // global pointer
C local lv;  // local pointer

if (Ti.thisProc() == 0) {
    lv = new C();
}
gv = broadcast lv from 0;
gv.val = …;  // full
… = gv.val;  // functionality
```

[Figure: process 0 and the other processes, each holding variables lv and gv, each with its own local heap]

Use of Global / Local
- Default is global
  - opposite of Split-C
  - easier to port shared-memory programs
  - harder to use sequential kernels
- Use local declarations in performance-critical sections
  - same trade-off as Split-C (same implementation as Split-C)
  - shared memory: no performance implications
  - distributed memory: saves the overhead of a few instructions when using a global reference to access a local object

Memory Management
- Garbage collection
  - reference counting
  - copying garbage collection, generational garbage collection, etc.
- Distributed GC
  - complex
  - potentially expensive
- Zone-based memory management
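The Java Objects slide says objects are passed into functions by pointer value, while primitives are copied. A minimal Java sketch of that difference follows. `MutableComplex` and `SharingDemo` are hypothetical names, and `MutableComplex` deliberately uses package-visible mutable fields (unlike the slide's `Complex`) so that the sharing is observable.

```java
/** Mutable complex number: a variable of this type holds a reference to the object. */
class MutableComplex {
    double real;
    double imag;

    MutableComplex(double r, double i) {
        real = r;
        imag = i;
    }
}

final class SharingDemo {
    /** The parameter is a copy of the caller's reference, so the caller's object changes. */
    static void doubleInPlace(MutableComplex c) {
        c.real *= 2;
        c.imag *= 2;
    }

    /** The parameter is a copy of the caller's primitive value, so the caller's variable is unchanged. */
    static void doubleInPlace(double x) {
        x *= 2;  // modifies only the local copy
    }
}
```

After `doubleInPlace(m)`, every variable that refers to the same object (for example an alias `MutableComplex alias = m;`) sees the doubled fields; after `doubleInPlace(d)` on a `double d`, `d` is unchanged.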
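The rules for Titanium immutable classes can be approximated in standard Java with a record (Java 16 or later), assuming that is an acceptable stand-in: its fields are final, and it can neither extend another class nor be extended. Unlike a Titanium immutable, a record is still a heap object reached through a reference, so this sketch illustrates the immutability rules, not the pass-by-value representation. `ImmutableComplex` is a hypothetical name.

```java
/**
 * Java approximation of Titanium's "immutable class Complex": fields are
 * final, the record cannot be extended, and equals() compares field values.
 * It is still a heap object reached through a reference.
 */
record ImmutableComplex(double real, double imag) {
    /** Mirrors the Titanium rule that immutables need a 0-argument constructor; yields 0 + 0i. */
    ImmutableComplex() {
        this(0.0, 0.0);
    }

    /** Returns a new value; neither operand changes. */
    ImmutableComplex add(ImmutableComplex c) {
        return new ImmutableComplex(real + c.real, imag + c.imag);
    }
}
```

Because `add` returns a fresh value, sharing a reference to an `ImmutableComplex` is indistinguishable from copying it, which is the property that lets Titanium pass such objects by value.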
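To make the Point / RectDomain / `double [2d]` model from Multidimensional Arrays in Titanium concrete, here is a rough two-dimensional Java emulation. `Point2`, `RectDomain2` and `DoubleArray2` are hypothetical classes, not Titanium's built-ins. The sketch assumes inclusive upper bounds (matching the corner labels in the Example: Domain figure) and stores the array as one flat row-major buffer with a bounds check on every access. The plain loop in `points()` is just one legal order for a `foreach`; Titanium's unordered iteration leaves the compiler free to pick another.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Titanium's Point<2>. */
record Point2(int x, int y) {}

/** Hypothetical stand-in for RectDomain<2>: [lb : ub : stride], upper bound inclusive. */
record RectDomain2(Point2 lb, Point2 ub, Point2 stride) {
    /** [lb : ub] with the default unit stride. */
    RectDomain2(Point2 lb, Point2 ub) {
        this(lb, ub, new Point2(1, 1));
    }

    private static int extent(int lo, int hi, int s) {
        return hi < lo ? 0 : (hi - lo) / s + 1;
    }

    int width()  { return extent(lb.x(), ub.x(), stride.x()); }
    int height() { return extent(lb.y(), ub.y(), stride.y()); }
    int size()   { return width() * height(); }

    boolean contains(Point2 p) {
        int dx = p.x() - lb.x(), dy = p.y() - lb.y();
        return dx >= 0 && dy >= 0 && p.x() <= ub.x() && p.y() <= ub.y()
                && dx % stride.x() == 0 && dy % stride.y() == 0;
    }

    /** The domain's points in row-major order (one legal order for foreach). */
    List<Point2> points() {
        List<Point2> ps = new ArrayList<>(size());
        for (int x = lb.x(); x <= ub.x(); x += stride.x()) {
            for (int y = lb.y(); y <= ub.y(); y += stride.y()) {
                ps.add(new Point2(x, y));
            }
        }
        return ps;
    }
}

/** Hypothetical stand-in for double [2d]: one flat row-major buffer, bounds-checked. */
final class DoubleArray2 {
    private final RectDomain2 dom;
    private final double[] data;

    DoubleArray2(RectDomain2 dom) {
        this.dom = dom;
        this.data = new double[dom.size()];
    }

    RectDomain2 domain() { return dom; }

    private int offset(Point2 p) {
        if (!dom.contains(p)) {
            throw new IndexOutOfBoundsException("point " + p + " not in domain");
        }
        int i = (p.x() - dom.lb().x()) / dom.stride().x();
        int j = (p.y() - dom.lb().y()) / dom.stride().y();
        return i * dom.height() + j;
    }

    double get(Point2 p) { return data[offset(p)]; }
    void set(Point2 p, double v) { data[offset(p)] = v; }
}
```

With the slide's bounds, `new RectDomain2(new Point2(1, 1), new Point2(10, 20))` has 10 × 20 = 200 points, and indexing outside it throws instead of silently reading another row.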
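For comparison with MatMul with Titanium Arrays, here is the same unblocked loop nest on Java arrays-of-arrays. It is a sketch with a hypothetical `MatMul` class. Java has no column slice, so column j of `b` is read as `b[k][j]`, which touches a different row object for every k; that extra indirection is one reason the Arrays in Java slide calls arrays-of-arrays slow.

```java
/** Unblocked c += a * b on Java arrays-of-arrays, following the Titanium loop order. */
final class MatMul {
    static void matMul(double[][] a, double[][] b, double[][] c) {
        int n = c.length, m = c[0].length, inner = b.length;
        if (a.length != n || a[0].length != inner || b[0].length != m) {
            throw new IllegalArgumentException("incompatible shapes");
        }
        for (int i = 0; i < n; i++) {              // foreach (ij within c.domain())
            for (int j = 0; j < m; j++) {
                double[] aRowi = a[i];             // like a.slice(1, ij[1]): row i, contiguous
                for (int k = 0; k < inner; k++) {  // foreach (k within aRowi.domain())
                    c[i][j] += aRowi[k] * b[k][j]; // column j of b: one row object per k
                }
            }
        }
    }
}
```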
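The red-black domain in Example: Domain can be checked with a small set-based sketch. `GridPoint` and `Domains` are hypothetical; Java `Set` operations stand in for Titanium's `+`, `*` and `-` on domains, and rectangle bounds are assumed inclusive. With stride [2, 2], `r` holds the 12 points whose coordinates are both even and `r1` the 12 points whose coordinates are both odd, so `red` has 24 points, each with an even coordinate sum.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical 2-D point; a record gives value-based equals/hashCode for use in sets. */
record GridPoint(int x, int y) {
    GridPoint plus(GridPoint o) {
        return new GridPoint(x + o.x, y + o.y);
    }
}

/** Hypothetical set-based stand-in for Titanium's general (non-rectangular) Domain<2>. */
final class Domains {
    /** Points of [lb : ub : stride], upper bound inclusive. */
    static Set<GridPoint> rect(GridPoint lb, GridPoint ub, GridPoint stride) {
        Set<GridPoint> s = new LinkedHashSet<>();
        for (int x = lb.x(); x <= ub.x(); x += stride.x()) {
            for (int y = lb.y(); y <= ub.y(); y += stride.y()) {
                s.add(new GridPoint(x, y));
            }
        }
        return s;
    }

    /** Titanium: a + b */
    static Set<GridPoint> union(Set<GridPoint> a, Set<GridPoint> b) {
        Set<GridPoint> r = new LinkedHashSet<>(a);
        r.addAll(b);
        return r;
    }

    /** Titanium: a * b */
    static Set<GridPoint> intersection(Set<GridPoint> a, Set<GridPoint> b) {
        Set<GridPoint> r = new LinkedHashSet<>(a);
        r.retainAll(b);
        return r;
    }

    /** Titanium: a - b */
    static Set<GridPoint> difference(Set<GridPoint> a, Set<GridPoint> b) {
        Set<GridPoint> r = new LinkedHashSet<>(a);
        r.removeAll(b);
        return r;
    }
}
```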
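The SPMD hello world from SPMD Execution Model can be mimicked inside one JVM by starting one thread per "processor", all running the same code. In this sketch (hypothetical `HelloSpmd` class), the thread's index stands in for `Ti.thisProc()` and `java.util.concurrent.CyclicBarrier` stands in for `Ti.barrier()`. Greetings print in no particular order, but after the barrier every thread should see all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

/** SPMD hello world with Java threads standing in for Titanium processes. */
final class HelloSpmd {
    /** Wraps CyclicBarrier.await so it can play the role of Ti.barrier(). */
    static void barrier(CyclicBarrier b) {
        try {
            b.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns, per proc, how many greetings it could see right after the barrier. */
    static int[] run(int procs) {
        List<String> greetings = new ArrayList<>();
        int[] seen = new int[procs];
        CyclicBarrier b = new CyclicBarrier(procs);
        Thread[] threads = new Thread[procs];
        for (int p = 0; p < procs; p++) {
            final int me = p;  // plays the role of Ti.thisProc()
            threads[p] = new Thread(() -> {
                String msg = "Hello from proc " + me;
                System.out.println(msg);
                synchronized (greetings) { greetings.add(msg); }
                barrier(b);  // nobody continues until every proc has greeted
                synchronized (greetings) { seen[me] = greetings.size(); }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }
}
```

If one thread skipped `barrier(b)` (for example because it sat inside an `if (me == 0)` branch), the others would wait forever, which is the bug the Safe Barriers slide warns about.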
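The two barriers per timestep in FishSim separate the read phase from the write phase. The sketch below assumes a toy force model (each fish is pulled toward all the others with a made-up constant 0.01), and `FishSimSketch` is a hypothetical name. Each thread owns a contiguous block of fish, and `steps` plays the role of the `single` variable `allEndTime`: every thread must run the same number of iterations, or some thread would wait at a barrier forever. Removing either barrier would let one thread read positions that another is updating, so the parallel result is compared with a sequential run.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

/** Two-phase SPMD timestep in the style of FishSim, using Java threads. */
final class FishSimSketch {
    /** Toy force (hypothetical model): pull fish i toward all the others. */
    static double force(double[] pos, int i) {
        double sum = 0.0;
        for (double p : pos) {
            sum += p - pos[i];
        }
        return 0.01 * sum;
    }

    static void barrier(CyclicBarrier b) {
        try {
            b.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    /** procs threads; thread p owns fish [p*n/procs, (p+1)*n/procs). */
    static double[] parallel(double[] init, int procs, int steps) {
        double[] pos = init.clone();                  // shared fish positions
        CyclicBarrier b = new CyclicBarrier(procs);
        Thread[] threads = new Thread[procs];
        for (int p = 0; p < procs; p++) {
            final int lo = p * pos.length / procs;
            final int hi = (p + 1) * pos.length / procs;
            threads[p] = new Thread(() -> {
                double[] mine = new double[hi - lo];  // forces on my fish
                for (int t = 0; t < steps; t++) {     // same step count on every thread
                    for (int i = lo; i < hi; i++) {
                        mine[i - lo] = force(pos, i); // read all fish, compute forces on mine
                    }
                    barrier(b);                       // all reads finish before any write
                    for (int i = lo; i < hi; i++) {
                        pos[i] += mine[i - lo];       // write to my fish using new forces
                    }
                    barrier(b);                       // all writes finish before the next read
                }
            });
            threads[p].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return pos;
    }

    /** Sequential reference: compute every force from the old positions, then update. */
    static double[] sequential(double[] init, int steps) {
        double[] pos = init.clone();
        double[] f = new double[pos.length];
        for (int t = 0; t < steps; t++) {
            for (int i = 0; i < pos.length; i++) f[i] = force(pos, i);
            for (int i = 0; i < pos.length; i++) pos[i] += f[i];
        }
        return pos;
    }
}
```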
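A shared-memory reading of the Global Address Space example: proc 0 allocates, the reference is published through a shared slot, and after a barrier every thread uses it for both writes and reads. This is only an analogy (hypothetical `BroadcastSketch` class and shared `slot` array): all Java references in one JVM behave like Titanium's global pointers, so there is no `local` qualifier and no per-process local heap. The publication relies on `CyclicBarrier`'s documented memory-consistency guarantee that actions before `await()` happen-before actions after the corresponding return in other threads.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

/** Shared-memory sketch of "gv = broadcast lv from 0" using Java threads. */
final class BroadcastSketch {
    /** Stand-in for the slide's class C. */
    static final class C {
        int val;
    }

    static void barrier(CyclicBarrier b) {
        try {
            b.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the value each proc reads through its copy of the broadcast reference. */
    static int[] run(int procs, int initial) {
        C[] slot = new C[1];                 // hypothetical shared slot used to broadcast
        int[] observed = new int[procs];
        CyclicBarrier b = new CyclicBarrier(procs);
        Thread[] threads = new Thread[procs];
        for (int p = 0; p < procs; p++) {
            final int me = p;
            threads[p] = new Thread(() -> {
                C lv = null;
                if (me == 0) {               // only proc 0 allocates
                    lv = new C();
                    lv.val = initial;
                    slot[0] = lv;
                }
                barrier(b);                  // after this, every proc sees proc 0's write
                C gv = slot[0];              // every proc now holds the same reference
                synchronized (gv) {
                    gv.val += 1;             // "gv.val = ...": write through the shared reference
                }
                barrier(b);                  // all writes finish before anyone reads
                observed[me] = gv.val;       // "... = gv.val": read through it
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return observed;
    }
}
```

With 4 threads and an initial value of 42, each thread increments the single shared object once, so every thread reads 46 after the second barrier.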