DOC PREVIEW
CSU CS 553 - Run-time Reordering Transformations

This preview shows page 1-2-3-4 out of 11 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Michelle Mills StroutDecember 5, 2005Run-time Reordering Transformations2Molecular DynamicsFinite Element AnalysisSparse Matrix ComputationsImportant Irregular Science & Engineering Applications3•10% of peak typical for irregular apps•BiCGSTAB solver on SGI Origin 2000•Performance degrades upon falling out of L2 cache•Processor-memory performance gap increasingLackluster Performance in Irregular Applications [Mahinthakumar & Saied 99]120.0110.0100.090.050.070.060.050.0Mflops102103104105Problem SizeTrue MflopsEffective Mflops140.0130.028%26%24%22%20%18%16%14%12%10%% of Peak Mflops102103104105Problem Size MflopsEffective Mflops4Challenge: unable to effectively reorder data and computation at compile-time in irregular applicationsApproach: run-time reordering transformationsGoal: Ideal Compiler/Run-time System for Irregular ApplicationsRun-time TransformationFrameworkTransformation 1Transformation 2Transformation N...Compiler Run-timeInspector/Executor CodeDecision Code5•Full Sparse Tiling (FST)-Data locality in Gauss-Seidel (ICCS01)(JHPCA03)-Parallelism for Gauss-Seidel (LCPC02)•Framework for run-time reordering transformations and their compositions (PLDI03)ContributionsRun-time TransformationFrameworkTransformation 1Transformation 2Transformation N...Compiler Run-timeInspector/Executor CodeDecision CodeFST6I.OverviewII.Introduction to run-time reordering transformationsIII.Motivation for composing such transformationsIV.Framework for creating compositionsV.Conclusions and Future WorkTalk Outline7Goal: Improve data locality and exploit parallelismHow? •Reordering data at runtime•Reordering iterations at runtimeResult: Performance improvements for irregular applicationsRun-time Reordering Transformations8abcdefgh•Z is the data array•x is the index array•Simple cache model-2 elements of Z in each cache line-2 cache lines of Z in cache at any one timeExample Loop with an Irregular Memory Referencefor i=0,7 Y[i] = Z[x[i]]xZi0MCache:1Mcdgh2 7M3M4M5M6MMabcdefgh37052702II. Introduction9Spatial locality occurs when memory locations mapped to the same cache-line are used before the cache line is evictedTemporal locality occurs when the same memory location is reused before its cache line is evictedData LocalityII. Introduction10xZ4 5 6 71 30 2abcdefgh37052702dhafcbegZ’01234124x’for i=0,7 Y[i] = Z[x[i]]dhafcbegZ’01234124x’for i=0,7 Y[i] = Z’[x’[i]]i0MCache:for i=0,7 Y[i] = Z’[x’[i]]i•reorder items in data array Z at run-time•update index array x with new locations•increases spatial localityRun-time Data Reordering Transformation1Sdh2 3 4 5 6 7M MMMSMdhafcbeg112 6 4 7 0 3 1 5x’i002235772 6 4 7 0 3 1 5x’i00223577for i’=0,7 Y’[i’] = Z[x’[i’]]i’for i’=0,7 Y’[i’] = Z[x’[i’]]0MCache:1Tabcdefgh4 5 6 732TSTMM MRun-time Iteration Reordering Transformation•reorder iterations of the i loop•remap index array x according to the new i loop ordering •increases temporal and spatial localityII. Introduction12•Inspector-Traverses through index arrays -Creates data and/or iteration reordering functions-Reorders data and updates any affected index arrays by applying reordering functions•ExecutorA transformed version of the original code which uses reordered data arrays and/or executes new iteration order Inspector/Executor[Saltz et al. 1994]13for i=0,7 ... x[i] ...for j=0,7 sigma[j] = ...for j=0,7 Z’[sigma[j]]=Z[j] x’[j]=sigma[x[j]]Original CodeInspectorExecutorfor i=0,7 Y[i] = Z[x[i]]for i=0,7 Y[i] = Z’[x’[i]]Generates data reordering function! "Reorder data and updates index arrayTraverses index array14I.OverviewII.Introduction to run-time reordering transformationsIII.Motivation for composing such transformationsIV.Framework for creating compositionsV.Conclusions and Future WorkTalk Outline15•Hand-generated inspector and executor code for three benchmarks•Used existing data and iteration reordering heuristics within one loop•Composed these with full sparse tiling to reorder iterations between loops•Measured performance improvements and overheadEffect of Composing Run-time Reordering Transformations16•IRREG, partial differential equation solver•Non-bonded force calculations in molecular dynamics-MOLDYN abstracted from CHARMM-NBF abstracted from GROMOS, different data structureSample Irregular Benchmarks[Han & Tseng 2000]for s=1,T for i=1,n ... = ...Z[i] endfor for j=1,m Z[l[j]] = ... Z[r[j]] = ... endfor for k=1,n ... += Z[k] endforendfor171 2 3 4 5 6Z Z1 Z4Z2 Z5 Z6Z31 2 3 4 5 6 7 8jData Mapping for j Loopfor s=1,T for i=1,n ... = ...Z[i] endfor for j=1,m Z[l[j]] = ... Z[r[j]] = ... endfor for k=1,n ... += Z[k] endforendfor181 2 3 4 5 6 7 8j1 2 3 4 5 6Z4 Z3Z5 Z6 Z1Z2Z'1 2 3 4 5 6Z Z1 Z4Z2 Z5 Z6Z31 2 3 4 5 6 7 8jData Reordering: CPACK [Ding & Kennedy 99]191 2 34 56 7 8j1 2 3 4 5 6Z' Z4 Z3Z5 Z6 Z1Z21 2 3 4 5 6 7 8j1 2 3 4 5 6Z4 Z3Z5 Z6 Z1Z2Z'Iteration Reordering: lexGroup [Ding & Kennedy 99]203 5 7 8j65 13ki4 5 32 6 11 24 64 2Dependences Between Loopsfor s=1,T for i=1,n ... = ...Z[i] endfor for j=1,m Z[l[j]] = ... Z[r[j]] = ... endfor for k=1,n ... += Z[k] endforendfor213 5 7 8j65 13ki4 5 32 6 11 24 64 2Iteration Reordering: Full Sparse Tiling (FST)3 5 7 8j65 13ki4 5 32 6 11 24 64 222Iteration & Data Reordering: Tile Packing (tilePack)3 5 7 8j65 13ki4 5 32 6 11 24 64 23 5 7 8j32 16ki4 2 65 3 124 64 5123for s=1,T for t=1,nt for i’ in sched(t,i) ... = ...Z’[i’] endfor for j’ in sched(t,j) Z’[l’[j’]] = ... endfor for k’ in sched(t,k) ... += Z’[k’] endfor endforendforExecutor for Composed Transformationfor s=1,T for i=1,n ... = ...Z[i] endfor for j=1,m Z[l[j]] = ... Z[r[j]] = ... endfor for k=1,n ... += Z[k] endforendfor24Effect of FST in CompositionIBM Power 3, 375MHz, 64 KB L1 cache 10MB 31MB 11MB 37MB 18MB61MB00.20.40.60.81CPACK, lexGroupCPACK, lexGroup, FSTGpart, lexGroupGpart, lexGroup, FSTIRREG NBF MOLDYNNormalized Execution Time for Executor25Normalized Execution Time for ExecutorEffect of FST in CompositionIntel Pentium 4, 1.7GHz, 8KB L1 cache10MB 31MB 11MB 37MB 18MB61MB00.20.40.60.81CPACK, lexGroupCPACK, lexGroup, FSTGpart, lexGroupGpart, lexGroup, FSTIRREG NBF MOLDYN26Inspector for FST CompositionsIBM Power 3 and Intel Pentium 4Break EvenMin (timesteps)Break EvenMax (timesteps)IRREG11161NBF518MOLDYN2527Parallelism with Full Sparse TilingcTile Dependence GraphFull


View Full Document

CSU CS 553 - Run-time Reordering Transformations

Download Run-time Reordering Transformations
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Run-time Reordering Transformations and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Run-time Reordering Transformations 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?