U of I CS 498 - Programming for Parallelism - D2128218

Home> Schools> University of Illinois> Computer Science (CS) > CS 498> Programming for Parallelism

DOC PREVIEW

U of I CS 498 - Programming for Parallelism

School name University of Illinois

Course Cs 498- Special Topics

Pages 28

This preview shows page 1-2-3-26-27-28 out of 28 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 28 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 28 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 28 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 28 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 28 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 28 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 28 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

1Programming for Parallelismand Locality withHierarchically Tiled ArraysProgram Optimization Fall 07University of Illinois,Urbana-ChampaignPeople working in this project:Ganesh Bikshandi, Jia Guo, Daniel Hoeflinger,Gheorghe Almasi*, Basilio B.Fraguela+, Maria J.Garzaran, David Padua, Christoph von Praun*University of Illinois,Urbana-Champaign,*IBM T. J. Watson Research Center,+Universidade de Coruna, Spain210/26/07 HTA 3Motivation Memory Hierarchy Determines performance Increasingly complex Registers – Cache (L1..L3) – Memory - Distributed Memory Given the importance of memory hierarchy,programming constructs that allow the control oflocality are necessary.10/26/07 HTA 4Contributions Design of a programming paradigm that facilitates control of locality (in sequential and parallel programs) communication (in parallel programs) Implementations in MATLAB and C++ Distributed Memory (Clusters) Shared Memory (Multi-core) Implementation of NAS benchmarks in our paradigm Evaluation of productivity and performance310/26/07 HTA 5Motivation Of Our ResearchHierarchically Tiled ArraysExamples and Additional FeaturesEvaluationConclusion & Future workOutline of the talk10/26/07 HTA 6 Tiled Array: Array partitioned into tiles Hierarchically Tiled Array: Tiled Array whosetiles are in-turn HTAs or Leaf HTAs.Leaf HTA: HTA with no further levels of tilingHierarchically Tiled Arrays (HTA)410/26/07 HTA 7Hierarchically Tiled ArraysAn example tiling2 X 2 tiles4 X 4 tiles2 X 2 tiles10/26/07 HTA 8Hierarchically Tiled Arrays2 X 2 tilessize = 64 X 64 elements4 X 4 tilessize = 32 X 32 elements2 X 2 tilessize = 8 X 8 elementsAn example size510/26/07 HTA 9Hierarchically Tiled Arrays2 X 2 tilessize = 64 X 64 elementsmap to distinct modulesof a cluster4 X 4 tilessize = 32 X 32 elementsmap to L1-cache2 X 2 tilessize = 8 X 8 elementsmap to registersAn example mapping10/26/07 HTA 10Why tiles as first class objects? Many algorithms are formulated in terms oftiles e.g. linear algebra algorithms (in LAPACK,ATLAS) What is special about HTAs, in particular? HTA provides a convenient way for theprogrammer to express these types of algorithms Its recursive nature makes it easy to fit in to anymemory hierarchy.610/26/07 HTA 11hta(MATRIX,{[delim1],[delim2],…},[processor mesh])P1P3P2P4aha = hta(a,{[1,4],[1,3]},[2,2]);haprocessormappingHTA Construction141310/26/07 HTA 12HTA Addressing710/26/07 HTA 13h{1,1:2} h{2,1} hierarchical HTA Addressingtiles10/26/07 HTA 14h{1,1:2} h{2,1} hierarchical h(3,4) flattenedHTA Addressingelements810/26/07 HTA 15h{1,1:2} h{2,1} hierarchical h{2,2}(1,2) hybridh(3,4) flattenedorHTA Addressing10/26/07 HTA 16F90 Conformabilitydim(lhs) == dim(rhs).and.size(lhs) == size(rhs)size(rhs) == 1910/26/07 HTA 17 HTA ConformabilityAll conformable HTAs can be operated using the primitive operations(add, subtract, etc) 10/26/07 HTA 18HTA Assignmenth{:,:} = t{:,:}h{1,:} = t{2,:}h{1,:}(2,:) = t{2,:}(1,:);orh(2,:) = t(3,:); htimplicit communication1010/26/07 HTA 19Data parallel functions in F90sinsin( )sin( )sin( ) sin( )sin( )sin( )sin( )sin( )sin( )10/26/07 HTA 20Mapr = map (@sin, h)Recursive@sin1110/26/07 HTA 21@sin @sin@sinMapr = map (@sin, h)@sinRecursive@sin10/26/07 HTA 22@sin@sin@sin@sin @sin@sin@sin@sin @sin @sin@sin@sinMapr = map (@sin, h)Recursive@sin@sin@sin@sin1210/26/07 HTA 23@max@max@max@max @max@max@max@max @max @max@max@maxReducer = reduce (@max, h)Recursive@max10/26/07 HTA 24Reducer = reduce (@max, h)@max@max@maxRecursive@max@max@max@max @max@max@max@max @max @max@max@max@max1310/26/07 HTA 25Reducer = reduce (@max, h)Recursive@max@max@max@max@max10/26/07 HTA 26repmat(h, [1, 3])circshift( h, [0, -1] )transpose(h)Higher level operations1410/26/07 HTA 27Motivation Of Our ResearchHierarchically Tiled ArraysExamples and Additional FeaturesEvaluationConclusion & Future workOutline of the talk10/26/07 HTA 28Blocked Recursive Matrix Multiplicationfunction c = matmul (A, B, C)if (level(A) == 0)C = matmul_leaf (A, B, C)else for i = 1:size(A, 1)for k = 1:size(A, 2)for j = 1:size(B, 2)C{i, j} = matmul(A{i, k}, B{k, j}, C{i,j});endendendendBlocked-Recursive implementationA{i, k}, B{k, j} and C{i, j} are sub-matrices of A, B and C.1510/26/07 HTA 29Matrix Multiplicationmatmul (AA{i, k}, BB{k, j}, CC{i, j})matmul_leaf (AAA{i, k}, BBB{k, j}, CCC{i, j})matmul (A, B, C)matmul (A{i, k}, B{k, j}, C{i, j})10/26/07 HTA 30SUMMA Matrix MultiplicationOuter-product methodfor k=1:n for i=1:n for j=1:n C(i,j)=C(i,j)+A(i,k)*B(k,j); end end endfor k=1:n C(:,:)=C(:, :)+A(:,k) * B(k, :);end1610/26/07 HTA 31SUMMA Matrix MultiplicationT2BT1A*function C = summa (A, B, C)for k=1:m T1 = repmat(A{:, k}, 1, m); T2 = repmat(B{k, :}, m, 1); C = C + T1 * T2; endHere, C{i,j}, A{i,k}, B{k,j}represent submatrices.The * operator representstile-by-tile matrixmultiplicationrepmatrepmat10/26/07 HTA 32Stencil Computations A common computation is a stencil – wherean element’s value is based on those of itselfand its neighbors. Let element X equal the average of itself andits neighbors A(X) = 1/3 * ( A(X-1) + A(X) + A(X+1) )1710/26/07 HTA 33Stencil Computations If tiles are distributed, communication isnecessary to read the value of neighbors atthe tile boundaries One solution: Shadow Regions10/26/07 HTA 34Shadow Regions Insert “padding” cells at the borders of tiles tohold the adjacency data Requires manual updates by the programmerto remain consistent The programmer overhead of this approachleads to a feature of HTAs, Overlapped Tiling1810/26/07 HTA 35Overlapped Tiling Overlapped HTAs are regular HTAs with afew exceptions: At creation, the programmer specifies the amountof overlap in each dimension Overlapped HTAs provide the functionality ofshadow regions without any manual updating bythe programmer The library handles any communication in thecorrect manner10/26/07 HTA 36StencilWith Explicit Shadow RegionsDo X=1,MA{1:N-1}(6) = A{2:N}(2)A{2:N}{1} = A{1:N-1}(5)A(X) = 1/3 * ( A(X-1) +A(X) +A(X+1) )EnddoWith Overlapped TilingDo X=1,MA(X) = 1/3 * ( A(X-1) +A(X) +A(X+1) )Enddo1910/26/07 HTA 37Dynamic Partitioning Another feature of HTAs is dynamicpartitioning Normally, the tiling structure of a HTA is setat creation; however, the partitioning forsome problems might be input dependent. Dynamic partitioning allows the programmerto: Add tiling to flat arrays Turn tiled arrays into

View Full Document