Yale CPSC 424 - Titanium and Java Parallelism - D2769330

School name Yale University

Course Cpsc 424- Parallel Programming Techniques

Pages 5

This preview shows page 1-2 out of 5 pages.

Titanium and Java Parallelism
Arvind Krishnamurthy
Fall 2004

Titanium
- Take the best features of threads and MPI (just like Split-C)
  - global address space like threads (eases programming)
  - SPMD parallelism like MPI (for performance)
  - local/global distinction, i.e., layout matters (for performance)
- Based on Java, a cleaner C++
  - classes, memory management
- Language is extensible through classes
  - domain-specific language extensions
  - support for grid-based computations, including adaptive mesh refinement (AMR)
- Optimizing compiler
  - compiled down to C
  - communication and memory optimizations
  - cache and other uniprocessor optimizations

Java: A Cleaner C++
- Java is an object-oriented language
  - classes (no standalone functions) with methods
  - inheritance between classes; multiple interface inheritance only
- Syntax similar to C++

    class Hello {
        public static void main (String [] argv) {
            System.out.println("Hello, world!");
        }
    }

- Safe
  - Strongly typed: checked at compile time, no unsafe casts
  - Automatic memory management
- Titanium is an (almost) strict superset of Java

Java Objects
- Primitive scalar types: boolean, double, int, etc.
  - implementations will store these on the program stack
  - access is fast
- Objects: user-defined and from the standard library
  - passed by pointer value (object sharing) into functions
  - have an implicit level of indirection (a pointer to the object)
  - simple model, but inefficient for small objects

[Figure: primitive values (2.6, 3, true) beside an object with fields r: 7.1 and i: 4.3]

Java Object Example

    class Complex {
        private double real;
        private double imag;
        public Complex(double r, double i) { real = r; imag = i; }
        public Complex add(Complex c) {
            return new Complex(c.real + real, c.imag + imag);
        }
        public double getReal() { return real; }
        public double getImag() { return imag; }
    }

    Complex c = new Complex(7.1, 4.3);
    c = c.add(c);

    class VisComplex extends Complex { ... }

Immutable Classes in Titanium
- For small objects, we would sometimes prefer to avoid the level of indirection
  - pass by value (copying of the entire object)
  - especially when objects are immutable, i.e., their fields are unchangeable
  - extends the idea of primitive values (1, 4.2, etc.) to user-defined values
- Titanium introduces immutable classes
  - all fields are final (implicitly)
  - cannot inherit from (extend) or be inherited by other classes
  - need to have a 0-argument constructor, e.g., Complex()

    immutable class Complex { ... }
    Complex c = new Complex(7.1, 4.3);

Arrays in Java
- Arrays in Java are objects
- Only 1D arrays are directly supported
- Array bounds are checked
- Multidimensional arrays as arrays-of-arrays are slow

Multidimensional Arrays in Titanium
- New kind of multidimensional array added
  - Indexed by Points (tuples of ints)
  - Constructed over a set of Points, called a Domain
  - RectDomains are a special case of Domains
  - Points, Domains and RectDomains are built-in immutable classes
- Points are specified by a tuple of ints
- RectDomains are given by: lower bound, upper bound [stride]
- An array is declared by its number of dimensions and element type, and created by passing a domain

    Point<2> lb = [1, 1];
    Point<2> ub = [10, 20];
    RectDomain<2> r = [lb : ub];
    double [2d] a = new double [r];

Unordered iteration
- Reordering iterations helps performance
- Compilers can (in principle) do this, but it is hard in general
- Titanium adds unordered iteration on rectangular domains

    foreach (p within r) { … }

  - p is a new Point, scoped only within the foreach body
  - r is a previously declared RectDomain
- Foreach simplifies bounds checking as well
- Additional operations on domains and arrays to subset and transform

MatMul with Titanium Arrays

    public static void matMul(double [2d] a, double [2d] b, double [2d] c) {
        foreach (ij within c.domain()) {
            double [1d] aRowi = a.slice(1, ij[1]);   // row i of a
            double [1d] bColj = b.slice(2, ij[2]);   // column j of b
            foreach (k within aRowi.domain()) {
                c[ij] += aRowi[k] * bColj[k];
            }
        }
    }

Note that the code is unblocked.

Example: Domain

    Point<2> lb = [0, 0];
    Point<2> ub = [6, 4];
    Point<2> s = [1, 1];
    RectDomain<2> r = [lb : ub : [2, 2]];
    RectDomain<2> r1 = [lb+s : ub+s : [2, 2]];
    Domain<2> red = r + r1;
    foreach (p in red) { ... }

[Figure: r spans (0, 0) to (6, 4); r + [1, 1] spans (1, 1) to (7, 5); red spans (0, 0) to (7, 5)]

- Domains in general are not rectangular
- Built using set operations
  - union, +
  - intersection, *
  - difference, -
- The example is from a red-black algorithm

SPMD Execution Model
- Java programs can be run as Titanium, but the result will be that all processors do all the work
- E.g., parallel hello world

    class HelloWorld {
        public static void main (String [] argv) {
            System.out.println("Hello from proc " + Ti.thisProc());
        }
    }

- Barrier synchronization: Ti.barrier()

Safe Barriers
- All processors start together and execute the same code, but not in lock-step
- Sometimes they take different branches

    if (Ti.thisProc() == 0) { … do setup … }
    for (all data I own) { … compute on data … }

- A common source of bugs is barriers or other global operations inside branches or loops
  - barrier, broadcast, reduction, exchange
- A "single" method is one called by all procs

    public single static void allStep(…)

- A "single" variable has the same value on all procs

    int single timestep = 0;

SPMD Execution Model
- Barriers and single in FishSimulation

    class FishSim {
        public static single void main (String [] argv) {
            int single allTimestep = 0;
            int single allEndTime = 100;
            for (; allTimestep < allEndTime; allTimestep++) {
                // read all fish and compute forces on mine
                Ti.barrier();
                // write to my fish using new forces
                Ti.barrier();
            }
        }
    }

- Single on methods may be inferred by the compiler

Global Address Space
- Processes allocate locally
- References can be passed to other processes

    class C { … int val; … }
    C gv;        // global pointer
    C local lv;  // local pointer
    if (thisProc() == 0) {
        lv = new C();
    }
    gv = broadcast lv from 0;
    gv.val = …;   // full
    … = gv.val;   // functionality

[Figure: Process 0 and the other processes, each with lv and gv variables and a local heap]

Use of Global / Local
- Default is global
  - opposite of Split-C
  - easier to port shared-memory programs
  - harder to use sequential kernels
- Use local declarations in performance-critical sections
  - same trade-off as Split-C (same implementation as Split-C)
  - shared memory: no performance implications
  - distributed memory: saves the overhead of a few instructions when using a global reference to access a local object

Memory Management
- Garbage collection
  - Reference counting
  - Copying garbage collection, generational garbage collection, etc.
- Distributed GC
  - Complex
  - Potentially expensive
- Zone-based memory management
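
The two-barrier loop from the FishSim slide can be approximated in plain Java with threads. The following is only a sketch under stated assumptions: it uses java.util.concurrent.CyclicBarrier in place of Ti.barrier(), one thread per "processor", one fish per thread, and an invented force rule. The class name SpmdBarrierSketch, the method simulate, and the arrays position and force are hypothetical and do not appear in the slides.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class SpmdBarrierSketch {
    // Runs `procs` threads that all execute the same loop (SPMD style).
    // Each timestep has a read phase and a write phase, separated by barriers.
    static double[] simulate(int procs, int steps) {
        final double[] position = new double[procs]; // one hypothetical "fish" per thread
        final double[] force = new double[procs];
        final CyclicBarrier barrier = new CyclicBarrier(procs); // stands in for Ti.barrier()
        Thread[] workers = new Thread[procs];
        for (int id = 0; id < procs; id++) {
            final int me = id; // plays the role of Ti.thisProc()
            workers[id] = new Thread(() -> {
                try {
                    for (int t = 0; t < steps; t++) {
                        // Read phase: read all fish, compute the force on mine.
                        double sum = 0.0;
                        for (int j = 0; j < procs; j++) {
                            sum += position[j];
                        }
                        force[me] = 1.0 + sum / procs; // invented force rule
                        barrier.await(); // nobody writes until everyone has read

                        // Write phase: update only my own fish.
                        position[me] += force[me];
                        barrier.await(); // nobody reads until everyone has written
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            workers[id].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return position;
    }
}
```

Every thread reaches both await() calls on every iteration, which is the property the "single" qualifier is meant to let the Titanium compiler check. Moving either await() inside a branch taken by only some threads would leave the others waiting forever.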

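
The red domain from the "Example: Domain" slide can be enumerated in plain Java to see which points it contains. This is a rough sketch, not Titanium's implementation: it models a strided RectDomain as an explicit set of points, assumes both bounds are inclusive (as the figure coordinates suggest), and models the + operator as set union. The names RedDomainSketch, rect, union, and redDomain are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RedDomainSketch {
    // Hypothetical stand-in for a strided 2D RectDomain: every point (x, y)
    // from (loX, loY) to (hiX, hiY) inclusive, stepping by `stride` on both axes.
    static Set<List<Integer>> rect(int loX, int loY, int hiX, int hiY, int stride) {
        Set<List<Integer>> points = new LinkedHashSet<>();
        for (int x = loX; x <= hiX; x += stride) {
            for (int y = loY; y <= hiY; y += stride) {
                points.add(Arrays.asList(x, y));
            }
        }
        return points;
    }

    // Models the Domain union operator (+) as plain set union.
    static Set<List<Integer>> union(Set<List<Integer>> a, Set<List<Integer>> b) {
        Set<List<Integer>> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return out;
    }

    // Mirrors the slide: r = [lb : ub : [2, 2]], r1 = r shifted by [1, 1], red = r + r1.
    static Set<List<Integer>> redDomain() {
        Set<List<Integer>> r = rect(0, 0, 6, 4, 2);
        Set<List<Integer>> r1 = rect(1, 1, 7, 5, 2);
        return union(r, r1);
    }
}
```

Under these assumptions every point in the result has an even coordinate sum, which is the checkerboard colouring a red-black algorithm relies on, and the union is not itself a rectangular strided domain.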