DOC PREVIEW
CMU CS 15745 - Lecture

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

15-745Optimizing For Data Locality - 1Seth Copen [email protected]@cs.cmu.EduCMUBased on “A Data Locality Optimizing Algorithm, Wolf & Lam PLDI ‘9115-745 © 2005 Seth Copen Goldstein 1Wolf & Lam, PLDI 91Outline• The Problem•Loop Transformations•Loop Transformations– dependence vectorsTr nsf rm ti ns–Transformations– Unimodular transformationsLli Ali•Locality Analysis•SRP15-745© 2005 Seth Copen Goldstein2The Issue• Improve cache reuse in nested loops•Canonical simple case: Matrix Multiply•Canonical simple case: Matrix Multiplyfor I1:= 1 to nfor I2:= 1 to nf1tfor I3:=1 to nC[I1,I3] += A[I1,I2] * B[I2,I3]I3I3I2+1I3I2I3I2+1=I1I215-745© 2005 Seth Copen Goldstein3In next iteration of I2previous data that could be reused has been replaced in cache.Tiling solves problemfor I1:= 1 to nfI1tfor I2:=1 to nfor I3:= 1 to nC[I1,I3]+=A[I1,I2] * B[I2,I3]for II:=1tonbysfor II2:=1 to n by sfor II3:= 1 to n by sfor I1:= 1 to nfor I2:=II2to min(II2+s-1,n)o2:2to (2s,)for I3:= II3to min(II3+ s - 1,n)C[I1,I3] += A[I1,I2] * B[I2,I3];I3I2I2+1 I3=II215-745© 2005 Seth Copen Goldstein4I1The Problem• How to increase locality by transforming loop nest• Matrix Mult is simple as it is both– legal to tile– advantageous to tile• Can we determine the benefit?(reuse vector space and locality vector space)• Is it legal (and if so, how) to transform loop?(idlfi)(unimodular transformations)15-745© 2005 Seth Copen Goldstein5Handy Representation: “Iteration Space”Iteration Space01ifor i = 0 to N-1for j = 0 to N-1A[i][j] = B[j][i];•each position represents an iterationj•each position represents an iterationVisitation Order in Iteration SpaceSpace01ifor i = 0 to N-1for j = 0 to N-1A[i][j] = B[j][i];•Note: iteration space is not data spacejNote: iteration space is not data spaceWhen Do Cache Misses Occur?for i = 0 to N-1for j= 0 to N-1jA[i][j] = B[j][i];ABi ij jWhen Do Cache Misses Occur?for i = 0 to N-1for j= 0 to N-1jA[i][j] = B[j][i];ABHitMissi ij jWhen Do Cache Misses Occur?ifor i = 0 to N-1for j = 0 to N-1A[i+j][0] = i*j;jjWhen Do Cache Misses Occur?iHitMissfor i = 0 to N-1for j = 0 to N-1A[i+j][0] = i*j;jjOptimizing the Cache Behavior of Array Accessesof Array Accesses• We need to answer the following questions:– when do cache misses occur?•use “locality analysis”– can we change the order of the iterations ( l d l ) d (or possibly data layout) to produce better behavior?l t th t f i lt ti•evaluate the cost of various alternatives– does the new ordering/layout still produce correct results?correct results?•use “dependence analysis”Examples of Loop TransformationsTransformations• Loop InterchangeCan improve locality• Cache Blocking•SkewingCan improve localitySkewing• Loop ReversalCan enable above•…(we will briefly discuss the first two)Loop Interchangefori=0toN-1forj=0toN-1for i= 0 to N-1for j = 0 to N-1A[j][i] = i*j;for j= 0 to N-1for i = 0 to N-1A[j][i] = i*j;i jHitMiss•(assuming N is large relative to cache size)j i(assuming N is large relative to cache size)Impact on Visitation Order in Iteration Spacein Iteration Spacefor i = 0 to N-1for JJ = 0 to N-1 by Bfori=0toN-1for j = 0 to N-1f(A[i],A[j]);for i 0 to N1for j = JJ to max(N-1,JJ+B-1) f(A[i],A[j]);iijjCache Blocking (aka “Tiling”)for i = 0 to N-1fj0t N1for JJ = 0 to N-1 by Bfor i = 0 to N-1for j= 0 to N-1f(A[i],A[j]);for j = JJ to max(N-1,JJ+B-1)f(A[i],A[j]);iiA[i] A[j]iiA[i] A[j]now we can exploit localityjjjjnow we can exploit localityCache Blocking (aka “Tiling”)for i = 0 to N-1fj0t N1for JJ = 0 to N-1 by Bfor i = 0 to N-1for j= 0 to N-1f(A[i],A[j]);for j = JJ to max(N-1,JJ+B-1)f(A[i],A[j]);iiA[i] A[j]iiA[i] A[j]now we can exploit temporal localityjjjjnow we can exploit temporal localityCache Blocking in Two DimensionsDimensionsfor i = 0 to N-1for j = 0 to N-1for JJ = 0 to N-1 by Bfor KK = 0 to N-1 by Bfor i = 0 to N-1fjJJt(N1JJ B1)for k = 0 to N-1c[i,k] += a[i,j]*b[j,k];for j = JJto max(N-1,JJ+B-1)for k = KK to max(N-1,KK+B-1)c[i,k] += a[i,j]*b[j,k];•brings square sub-blocks of matrix “b” into gqthe cache•completely uses them up before moving onPredicting Cache Behavior through “Locality Analysis”through Locality Analysis• Definitions:R–Reuse:accessing a location that has been accessed in the pastp–Locality:accessing a location that is now found in the cache• Key Insights– Locality only occurs when there is reuse!BUT d t il lt i llit–BUT, reuse does not necessarily result in locality.–Why not?Steps in Locality Analysis1. Find data reuse– if caches were infinitely large, we would be finished2. Determine “localized iteration space”– set of inner loops where the data accessed pby an iteration is expected to fit within the cache3. Find data locality:– reuse ⊇ localized iteration space ⊇ localitypyTypes of Data Reuse/Localityfori=0to2for i = 0 to 2for j = 0 to 100A[i][j] = B[j][0] + B[j+1][0];HitMissiA[i][j]iB[j][0]iB[j+1][0]jjjSpatial Temporal Group(spatial)Kinds of reuse and the factorfor i = 0 to N-1forj=0toN-1What kinds of reuse are there?A[i]?for j 0 to N1f(A[i],A[j]);A[i]?A[j]?15-745© 2005 Seth Copen Goldstein22Kinds of reuse and the factorfor I1:=0to5for I2:= 0 to 62A[I2+ 1] = 1/3 * (A[I2]+ A[I2+1]+A[I2+ 2])15-745© 2005 Seth Copen Goldstein23Kinds of reuse and the factorfor I1:=0to5for I2:= 0 to 62A[I2+ 1] = 1/3 * (A[I2]+ A[I2+1]+A[I2+ 2])self-temporal in 1, self-spatial in 2Also, group spatial in 2What is different about this and previous?for i = 0 to N-1for j = 0 to N-1f(A[i],A[j]);15-745© 2005 Seth Copen Goldstein24Uniformly Generated references• f and g are indexing functions: ZnÆ Zd–n is depth of loop nestn is depth of loop nest– d is dimensions of array, A•Two references A[f(i)] and A[g(i)] are •Two references A[f(i)] and A[g(i)] are uniformly generated iff(i) = Hi + cfAND g(i)=Hi+cg• H is a linear transformd tt t•cfand cgare constant vectors15-745© 2005 Seth Copen Goldstein25Eg of Uniformly generated setsh f ll l h for I1:=0to5for I2:= 0 to 6These references all belong to the same uniformly generated set: H = [ 0 1]2A[I2+ 1] = 1/3 * (A[I2]+ A[I2+1]+A[I2+ 2])A[I+1][01] +[1]I1A[I2+1][ 0 1 ] + [ 1 ]1I2A[I2] [ 0 1 ] + [ 0 ]I1I2A[I+2][01] +[2]I115-745© 2005 Seth Copen Goldstein26A[I2+2][ 0 1 ] + [ 2 ]I2Quantifying Reuse• Why should we quantify reuse?•How do we quantify locality?•How do we quantify locality?15-745© 2005 Seth Copen Goldstein27Quantifying


View Full Document

CMU CS 15745 - Lecture

Documents in this Course
Lecture

Lecture

14 pages

Lecture

Lecture

19 pages

Lecture

Lecture

8 pages

Lecture

Lecture

5 pages

Lecture

Lecture

6 pages

lecture

lecture

17 pages

Lecture 3

Lecture 3

12 pages

Lecture

Lecture

17 pages

Lecture

Lecture

18 pages

lecture

lecture

14 pages

lecture

lecture

8 pages

lecture

lecture

5 pages

Lecture

Lecture

19 pages

lecture

lecture

10 pages

Lecture

Lecture

20 pages

Lecture

Lecture

8 pages

Lecture

Lecture

7 pages

lecture

lecture

59 pages

Task 2

Task 2

2 pages

Handout

Handout

18 pages

Load more
Download Lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?