15-745Optimizing For Data Locality - 1Seth Copen [email protected]@cs.cmu.EduCMUBased on “A Data Locality Optimizing Algorithm, Wolf & Lam PLDI ‘9115-745 © 2005 Seth Copen Goldstein 1Wolf & Lam, PLDI 91Outline• The Problem•Loop Transformations•Loop Transformations– dependence vectorsTr nsf rm ti ns–Transformations– Unimodular transformationsLli Ali•Locality Analysis•SRP15-745© 2005 Seth Copen Goldstein2The Issue• Improve cache reuse in nested loops•Canonical simple case: Matrix Multiply•Canonical simple case: Matrix Multiplyfor I1:= 1 to nfor I2:= 1 to nf1tfor I3:=1 to nC[I1,I3] += A[I1,I2] * B[I2,I3]I3I3I2+1I3I2I3I2+1=I1I215-745© 2005 Seth Copen Goldstein3In next iteration of I2previous data that could be reused has been replaced in cache.Tiling solves problemfor I1:= 1 to nfI1tfor I2:=1 to nfor I3:= 1 to nC[I1,I3]+=A[I1,I2] * B[I2,I3]for II:=1tonbysfor II2:=1 to n by sfor II3:= 1 to n by sfor I1:= 1 to nfor I2:=II2to min(II2+s-1,n)o2:2to (2s,)for I3:= II3to min(II3+ s - 1,n)C[I1,I3] += A[I1,I2] * B[I2,I3];I3I2I2+1 I3=II215-745© 2005 Seth Copen Goldstein4I1The Problem• How to increase locality by transforming loop nest• Matrix Mult is simple as it is both– legal to tile– advantageous to tile• Can we determine the benefit?(reuse vector space and locality vector space)• Is it legal (and if so, how) to transform loop?(idlfi)(unimodular transformations)15-745© 2005 Seth Copen Goldstein5Handy Representation: “Iteration Space”Iteration Space01ifor i = 0 to N-1for j = 0 to N-1A[i][j] = B[j][i];•each position represents an iterationj•each position represents an iterationVisitation Order in Iteration SpaceSpace01ifor i = 0 to N-1for j = 0 to N-1A[i][j] = B[j][i];•Note: iteration space is not data spacejNote: iteration space is not data spaceWhen Do Cache Misses Occur?for i = 0 to N-1for j= 0 to N-1jA[i][j] = B[j][i];ABi ij jWhen Do Cache Misses Occur?for i = 0 to N-1for j= 0 to N-1jA[i][j] = B[j][i];ABHitMissi ij jWhen Do Cache Misses Occur?ifor i = 0 to N-1for j = 0 to N-1A[i+j][0] = i*j;jjWhen Do Cache Misses Occur?iHitMissfor i = 0 to N-1for j = 0 to N-1A[i+j][0] = i*j;jjOptimizing the Cache Behavior of Array Accessesof Array Accesses• We need to answer the following questions:– when do cache misses occur?•use “locality analysis”– can we change the order of the iterations ( l d l ) d (or possibly data layout) to produce better behavior?l t th t f i lt ti•evaluate the cost of various alternatives– does the new ordering/layout still produce correct results?correct results?•use “dependence analysis”Examples of Loop TransformationsTransformations• Loop InterchangeCan improve locality• Cache Blocking•SkewingCan improve localitySkewing• Loop ReversalCan enable above•…(we will briefly discuss the first two)Loop Interchangefori=0toN-1forj=0toN-1for i= 0 to N-1for j = 0 to N-1A[j][i] = i*j;for j= 0 to N-1for i = 0 to N-1A[j][i] = i*j;i jHitMiss•(assuming N is large relative to cache size)j i(assuming N is large relative to cache size)Impact on Visitation Order in Iteration Spacein Iteration Spacefor i = 0 to N-1for JJ = 0 to N-1 by Bfori=0toN-1for j = 0 to N-1f(A[i],A[j]);for i 0 to N1for j = JJ to max(N-1,JJ+B-1) f(A[i],A[j]);iijjCache Blocking (aka “Tiling”)for i = 0 to N-1fj0t N1for JJ = 0 to N-1 by Bfor i = 0 to N-1for j= 0 to N-1f(A[i],A[j]);for j = JJ to max(N-1,JJ+B-1)f(A[i],A[j]);iiA[i] A[j]iiA[i] A[j]now we can exploit localityjjjjnow we can exploit localityCache Blocking (aka “Tiling”)for i = 0 to N-1fj0t N1for JJ = 0 to N-1 by Bfor i = 0 to N-1for j= 0 to N-1f(A[i],A[j]);for j = JJ to max(N-1,JJ+B-1)f(A[i],A[j]);iiA[i] A[j]iiA[i] A[j]now we can exploit temporal localityjjjjnow we can exploit temporal localityCache Blocking in Two DimensionsDimensionsfor i = 0 to N-1for j = 0 to N-1for JJ = 0 to N-1 by Bfor KK = 0 to N-1 by Bfor i = 0 to N-1fjJJt(N1JJ B1)for k = 0 to N-1c[i,k] += a[i,j]*b[j,k];for j = JJto max(N-1,JJ+B-1)for k = KK to max(N-1,KK+B-1)c[i,k] += a[i,j]*b[j,k];•brings square sub-blocks of matrix “b” into gqthe cache•completely uses them up before moving onPredicting Cache Behavior through “Locality Analysis”through Locality Analysis• Definitions:R–Reuse:accessing a location that has been accessed in the pastp–Locality:accessing a location that is now found in the cache• Key Insights– Locality only occurs when there is reuse!BUT d t il lt i llit–BUT, reuse does not necessarily result in locality.–Why not?Steps in Locality Analysis1. Find data reuse– if caches were infinitely large, we would be finished2. Determine “localized iteration space”– set of inner loops where the data accessed pby an iteration is expected to fit within the cache3. Find data locality:– reuse ⊇ localized iteration space ⊇ localitypyTypes of Data Reuse/Localityfori=0to2for i = 0 to 2for j = 0 to 100A[i][j] = B[j][0] + B[j+1][0];HitMissiA[i][j]iB[j][0]iB[j+1][0]jjjSpatial Temporal Group(spatial)Kinds of reuse and the factorfor i = 0 to N-1forj=0toN-1What kinds of reuse are there?A[i]?for j 0 to N1f(A[i],A[j]);A[i]?A[j]?15-745© 2005 Seth Copen Goldstein22Kinds of reuse and the factorfor I1:=0to5for I2:= 0 to 62A[I2+ 1] = 1/3 * (A[I2]+ A[I2+1]+A[I2+ 2])15-745© 2005 Seth Copen Goldstein23Kinds of reuse and the factorfor I1:=0to5for I2:= 0 to 62A[I2+ 1] = 1/3 * (A[I2]+ A[I2+1]+A[I2+ 2])self-temporal in 1, self-spatial in 2Also, group spatial in 2What is different about this and previous?for i = 0 to N-1for j = 0 to N-1f(A[i],A[j]);15-745© 2005 Seth Copen Goldstein24Uniformly Generated references• f and g are indexing functions: ZnÆ Zd–n is depth of loop nestn is depth of loop nest– d is dimensions of array, A•Two references A[f(i)] and A[g(i)] are •Two references A[f(i)] and A[g(i)] are uniformly generated iff(i) = Hi + cfAND g(i)=Hi+cg• H is a linear transformd tt t•cfand cgare constant vectors15-745© 2005 Seth Copen Goldstein25Eg of Uniformly generated setsh f ll l h for I1:=0to5for I2:= 0 to 6These references all belong to the same uniformly generated set: H = [ 0 1]2A[I2+ 1] = 1/3 * (A[I2]+ A[I2+1]+A[I2+ 2])A[I+1][01] +[1]I1A[I2+1][ 0 1 ] + [ 1 ]1I2A[I2] [ 0 1 ] + [ 0 ]I1I2A[I+2][01] +[2]I115-745© 2005 Seth Copen Goldstein26A[I2+2][ 0 1 ] + [ 2 ]I2Quantifying Reuse• Why should we quantify reuse?•How do we quantify locality?•How do we quantify locality?15-745© 2005 Seth Copen Goldstein27Quantifying
View Full Document