Memory Optimizations
15-745 Optimizing Compilers, Spring 2006
Peter Lee

Reminders
- T3 is due this week: the last one! ;-)
- Suggested projects posted
- Reading list posted

(Memory hierarchy figure: CPU regs, cache, main memory, and disk, connected by transfers of 8B, 32B, and 8KB; access times range from a few nanoseconds near the top of the hierarchy (about 3 ns and 6 ns) to roughly 60 ns for main memory and 8 ms for disk. The cache and virtual memory bridge the levels; each level down is larger, slower, and cheaper.)

Improving cache performance
- Things to enhance: temporal locality, spatial locality.
- Things to minimize: conflicts (i.e., bad replacement decisions).

What the compiler can do
- Time: when is an object accessed?
- Space: where does an object exist in the address space?
- These are the two "levers" a compiler can try to manipulate.

Manipulating time and space
- Time: reordering computation. Try to determine when an object will be accessed, and predict a better time to access it.
- Space: changing data layout. Determine an object's shape and location, and determine a better layout.

Types of objects
- For small objects (scalars) and pointer-based objects, manipulating temporal and spatial locality is both difficult and usually not fruitful.
- Arrays, on the other hand, often provide many opportunities for significant optimizations.

Arrays
- Often accessed within loop nests, which makes it easy to understand "time".
- Have a predictable layout, which makes it easy to understand "space".

    double A[N][N], B[N][N];
    ...
    for i = 0 to N-1
      for j = 0 to N-1
        A[i][j] = B[j][i];

Iteration space
    for i = 0 to N-1
      for j = 0 to N-1
        A[i][j] = B[j][i];
- Each position in the i-j grid represents one loop iteration.
- Note: the iteration space is not the same as the data space!
- (figure: the iteration space and the visitation order in which the loop nest walks it)

When do cache misses occur?
    for i = 0 to N-1
      for j = 0 to N-1
        A[i][j] = B[j][i];
  (figure: miss patterns in the iteration space for the accesses to A and to B)

    for i = 0 to N-1
      for j = 0 to N-1
        A[i+j][0] = i*j;
  (figure: miss pattern in the iteration space)

Optimizing cache behavior
- When do cache misses occur? Use locality analysis.
- Can we change the visitation order to produce better behavior? Evaluate the costs.
- Does the new visitation order still produce correct results? Use dependence analysis.

Loop transformations
Many different ones have been proposed:
- loop interchange
- cache blocking
- skewing
- loop reversal
- ...

Loop interchange
    for i = 0 to N-1
      for j = 0 to N-1
        A[j][i] = i*j;
  becomes
    for j = 0 to N-1
      for i = 0 to N-1
        A[j][i] = i*j;
  (figure: miss vs. hit patterns; the original nest walks A in column order and misses on A, while the interchanged nest walks A in row order and mostly hits, assuming N is large relative to the cache line size)

Cache blocking (also known as "tiling")
    for i = 0 to N-1
      for j = 0 to N-1
        f(A[i], A[j]);
  becomes
    for JJ = 0 to N-1 by B
      for i = 0 to N-1
        for j = JJ to min(N-1, JJ+B-1)
          f(A[i], A[j]);
  (figure: access patterns of A[i] and A[j] before and after blocking)

Impact of tiling on visit order
  (same code as above; figure: visitation order of the iteration space for the original nest vs. the blocked nest, which sweeps the space in strips of B consecutive j values for each i)

Tiling in two dimensions
    for i = 0 to N-1
      for j = 0 to N-1
        for k = 0 to N-1
          c[i,k] += a[i,j]*b[j,k];
  becomes
    for JJ = 0 to N-1 by B
      for KK = 0 to N-1 by B
        for i = 0 to N-1
          for j = JJ to min(N-1, JJ+B-1)
            for k = KK to min(N-1, KK+B-1)
              c[i,k] += a[i,j]*b[j,k];
The goal is to bring square sub-blocks of b into the cache, using them up before moving on.
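As a concrete (though unofficial) illustration of the pseudocode above, the following C sketch shows the same two-dimensional tiling applied to matrix multiplication. The names, the matrix size N, and the tile size B are assumptions made for the example; B should be tuned so that a B-by-B block of b, plus the rows of a and c being streamed, fits in the cache.

    #include <stddef.h>

    #define N 1024   /* matrix dimension; chosen only for illustration */
    #define B 64     /* tile size; tune so a BxB block of b fits in the cache */

    static double a[N][N], b[N][N], c[N][N];

    /* Untiled version of the slides' loop nest: for each row index i, all of
       b is streamed through the cache, so b's elements are evicted long
       before they are reused. */
    void matmul(void)
    {
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                for (size_t k = 0; k < N; k++)
                    c[i][k] += a[i][j] * b[j][k];
    }

    /* Blocked (tiled) version: the two outer loops pick a BxB sub-block of b,
       and the inner loops use it for every row i before moving on. */
    void matmul_tiled(void)
    {
        for (size_t jj = 0; jj < N; jj += B)
            for (size_t kk = 0; kk < N; kk += B)
                for (size_t i = 0; i < N; i++)
                    for (size_t j = jj; j < jj + B && j < N; j++)
                        for (size_t k = kk; k < kk + B && k < N; k++)
                            c[i][k] += a[i][j] * b[j][k];
    }

With a row-major layout the inner k loop also gives spatial locality on c and b; the payoff of blocking typically grows once the matrices no longer fit in the cache.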
Locality analysis
- Reuse: accessing a location that has been accessed previously.
- Locality: accessing a location that is in the cache.
- Observe: locality only occurs when there is reuse! But reuse does not imply locality.

Steps in locality analysis
1. Find data reuse.
2. Determine the "localized iteration space": the set of inner loops where the data accessed by an iteration is expected to fit within the cache.
3. Find data locality: reuse ∩ localized iteration space ⇒ locality.

Types of reuse/locality
    for i = 0 to 2
      for j = 0 to 100
        A[i][j] = B[j][0] + B[j+1][0];
- A[i][j]: spatial reuse (successive j values touch the same cache line).
- B[j+1][0]: temporal reuse (the same element is accessed on every iteration of i).
- B[j][0]: group reuse (it re-touches elements already brought in by B[j+1][0]).

Reuse analysis
    for i = 0 to 2
      for j = 0 to 100
        A[i][j] = B[j][0] + B[j+1][0];
Map the n loop indices into the d array indices via the array indexing function f(i) = H*i + c, where i is the vector of loop indices, H is a d x n matrix, and c is a constant vector. For example, B[j+1][0] has H = [0 1; 0 0] and c = (1,0).

Finding temporal reuse
- Temporal reuse occurs between iterations i1 and i2 whenever H*i1 + c = H*i2 + c.
- Rather than worrying about individual values of i1 and i2, we say that reuse occurs along a direction vector r whenever H*r = 0.
- Solve by computing the nullspace of H.

Temporal reuse example
    for i = 0 to 2
      for j = 0 to 100
        A[i][j] = B[j][0] + B[j+1][0];
- For B[j][0], with H = [0 1; 0 0], reuse occurs between iterations (i1,j1) and (i2,j2) whenever H*(i1,j1) = H*(i2,j2).
- True whenever j1 = j2, regardless of the difference between i1 and i2, i.e., when the difference lies along the nullspace of [0 1; 0 0], which is span{(1,0)}.

A more complicated example
    for i = 0 to N-1
      for j = 0 to N-1
        A[i+j][0] = i*j;
- Nullspace of H = [1 1; 0 0] is span{(1,-1)}.

Computing spatial reuse
- Replace the last row of H with zeroes, creating Hs.
- Find the nullspace of Hs.
- Result: the vector along which we access the same row.

Spatial reuse example
    for i = 0 to 2
      for j = 0 to 100
        A[i][j] = B[j][0] + B[j+1][0];
- For A[i][j], H = [1 0; 0 1], so Hs = [1 0; 0 0].
- Nullspace of Hs = span{(0,1)}, i.e., we access the same row of A[i][j] along the inner loop.

More complicated example
    for i = 0 to N-1
      for j = 0 to N-1
        A[i+j] = i*j;
- Here H = [1 1], so Hs = [0 0].
- Nullspace of H = span{(1,-1)}; nullspace of Hs = span{(1,0),(0,1)}.

Group reuse
- Consider only "uniformly generated sets", i.e., references whose index expressions differ only by constant terms.
- Check whether they access the same cache line.
- Only the leading reference suffers the bulk of the cache misses.
    for i = 0 to 2
      for j = 0 to 100
        A[i][j] = B[j][0] + B[j+1][0];

Localized iteration space
Given a finite cache, when does reuse result in locality?
    for i = 0 to 2
      for j = 0 to 8
        A[i][j] = B[j][0] + B[j+1][0];
  localized in both the i and j loops (the data touched by the j loop is small enough that B[j+1][0] is still cached when the next i iteration reuses it)

    for i = 0 to 2
      for j = 0 to 1000000
        A[i][j] = B[j][0] + B[j+1][0];
  localized in the j loop only

Computing locality
- Reuse vector space ∩ localized vector space ⇒ locality vector space.
- Example:
    for i = 0 to 2
      for j = 0 to 100
        A[i][j] = B[j][0] + B[j+1][0];
- If both loops are localized: span{(1,0)} ∩ span{(1,0),(0,1)} ⇒ span{(1,0)}, i.e., the temporal reuse results in temporal locality.
- If only the innermost loop is localized: span{(1,0)} ∩ span{(0,1)} ⇒ span{}, i.e., no temporal locality.
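To tie the steps together, here is a compact worked version of the analysis above for the two references in the running example, written in LaTeX matrix notation. It restates the slides' calculation under the assumption that only the j loop is localized (the j = 0 to 1000000 variant); the spatial-locality step for A[i][j] is not spelled out on these slides but follows the same recipe.

    % B[j+1][0]: indexing function and temporal reuse direction
    \[
    f\!\begin{pmatrix} i \\ j \end{pmatrix}
      = \underbrace{\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}}_{H}
        \begin{pmatrix} i \\ j \end{pmatrix}
      + \begin{pmatrix} 1 \\ 0 \end{pmatrix},
    \qquad
    \operatorname{null}(H) = \operatorname{span}\{(1,0)\}
    \]

    % locality when only the j loop is localized
    \[
    \operatorname{span}\{(1,0)\} \cap \operatorname{span}\{(0,1)\} = \{\vec{0}\}
    \;\Rightarrow\; \text{no temporal locality for } B[j{+}1][0]
    \]

    % A[i][j]: spatial reuse via H_s
    \[
    H = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad
    H_s = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},\quad
    \operatorname{null}(H_s) = \operatorname{span}\{(0,1)\}
    \]

    % spatial locality for A[i][j]
    \[
    \operatorname{span}\{(0,1)\} \cap \operatorname{span}\{(0,1)\}
      = \operatorname{span}\{(0,1)\}
    \;\Rightarrow\; \text{spatial locality along the inner } j \text{ loop}
    \]

In the j = 0 to 8 variant both loops are localized, the intersection for B[j+1][0] becomes span{(1,0)}, and the temporal reuse does turn into temporal locality, matching the "both loops localized" case above.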

