Unformatted text preview:

Locality and CachingTopicsTopics Locality of reference Cache principles Multi-level cachesSystems I2LocalityPrinciple of Locality:Principle of Locality: Programs tend to reuse data and instructions near thosethey have used recently, or that were recently referencedthemselves. Temporal locality: Recently referenced items are likely to bereferenced in the near future. Spatial locality: Items with nearby addresses tend to bereferenced close together in time.Locality Example:• Data– Reference array elements in succession(stride-1 reference pattern):– Reference s um each iteration:• Instructions– Reference instructions in sequence:– Cycle through loop repeatedly:sum = 0;for (i = 0; i < n; i++)sum += a[i];return sum;Spatial localitySpatial localityTemporal localityTemporal locality3Locality ExampleClaim:Claim: Being able to look at code and get a qualitative Being able to look at code and get a qualitativesense of its locality is a key skill for a professionalsense of its locality is a key skill for a professionalprogrammer.programmer.Question:Question: Does this function have good locality? Does this function have good locality?int sumarrayrows(int a[M][N]){ int i, j, sum = 0; for (i = 0; i < M; i++) for (j = 0; j < N; j++) sum += a[i][j]; return sum;}4Locality ExampleQuestion:Question: Does this function have good locality? Does this function have good locality?int sumarraycols(int a[M][N]){ int i, j, sum = 0; for (j = 0; j < N; j++) for (i = 0; i < M; i++) sum += a[i][j]; return sum;}5Locality ExampleQuestion:Question: Can you permute the loops so that the Can you permute the loops so that thefunction scans the 3-d array function scans the 3-d array a[]a[] with a stride-1 with a stride-1reference pattern (and thus has good spatialreference pattern (and thus has good spatiallocality)?locality)?int sumarray3d(int a[M][N][N]){ int i, j, k, sum = 0; for (i = 0; i < M; i++) for (j = 0; j < N; j++) for (k = 0; k < N; k++) sum += a[k][i][j]; return sum;}6Memory HierarchiesSome fundamental and enduring properties ofSome fundamental and enduring properties ofhardware and software:hardware and software: Fast storage technologies cost more per byte and have lesscapacity. The gap between CPU and main memory speed is widening. Well-written programs tend to exhibit good locality.These fundamental properties complement each otherThese fundamental properties complement each otherbeautifully.beautifully.They suggest an approach for organizing memory andThey suggest an approach for organizing memory andstorage systems known as a storage systems known as a memory hierarchymemory hierarchy..7An Example Memory Hierarchyregisterson-chip L1cache (SRAM)main memory(DRAM)local secondary storage(local disks)Larger, slower, and cheaper (per byte)storagedevicesremote secondary storage(distributed file systems, Web servers)Local disks hold filesretrieved from disks onremote network servers.Main memory holds diskblocks retrieved from localdisks.off-chip L2cache (SRAM)L1 cache holds cache lines retrievedfrom the L2 cache memory.CPU registers hold words retrievedfrom L1 cache.L2 cache holds cache linesretrieved from main memory.L0:L1:L2:L3:L4:L5:Smaller,faster,and costlier(per byte)storage devices8CachesCache:Cache: A smaller, faster storage device that acts as a staging area A smaller, faster storage device that acts as a staging areafor a subset of the data in a larger, slower device.for a subset of the data in a larger, slower device.Fundamental idea of a memory hierarchy:Fundamental idea of a memory hierarchy: For each k, the faster, smaller device at level k serves as a cachefor the larger, slower device at level k+1.Why do memory hierarchies work?Why do memory hierarchies work? Programs tend to access the data at level k more often than theyaccess the data at level k+1. Thus, the storage at level k+1 can be slower, and thus larger andcheaper per bit. Net effect: A large pool of memory that costs as much as the cheapstorage near the bottom, but that serves data to programs at therate of the fast storage near the top. Use combination of small fast memory and big slow memory togive illusion of big fast memory.9Caching in a Memory Hierarchy0 1 2 34 5 6 78 9 10 1112 13 14 15Larger, slower, cheaper storagedevice at level k+1 is partitionedinto blocks.Data is copied betweenlevels in block-sized transferunits89 14 3Smaller, faster, more expensivedevice at level k caches a subset of the blocks from level k+1Level k:Level k+1:44410101010Request14Request12General Caching ConceptsProgram needs object d, which is storedProgram needs object d, which is storedin some block b.in some block b.Cache hitCache hit Program finds b in the cache at levelk. E.g., block 14.Cache missCache miss b is not at level k, so level k cachemust fetch it from level k+1.E.g., block 12. If level k cache is full, then somecurrent block must be replaced(evicted). Which one is the “victim”? Placement policy: where can the newblock go? E.g., b mod 4 Replacement policy: which blockshould be evicted? E.g., LRU9 30 1 2 34 5 6 78 9 10 1112 13 14 15Level k:Level k+1:141412144*4*12120 1 2 3Request124*4*1211General Caching ConceptsTypes of cache misses:Types of cache misses: Cold (compulsary) miss Cold misses occur because the cache is empty. Conflict miss Most caches limit blocks at level k+1 to a small subset(sometimes a singleton) of the block positions at level k. E.g. Block i at level k+1 must be placed in block (i mod 4) atlevel k+1. Conflict misses occur when the level k cache is large enough,but multiple data objects all map to the same level k block. E.g. Referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time. Capacity miss Occurs when the set of active cache blocks (working set) islarger than the cache.12Examples of Caching in the HierarchyHardware0On-Chip TLBAddresstranslationsTLBWebbrowser10,000,000Local diskWeb pagesBrowsercacheWeb cacheNetwork buffercacheBuffer cacheVirtualMemoryL2 cacheL1 cacheRegistersCache TypeWeb pagesParts of filesParts of files4-KB page32-byte block32-byte block4-byte wordWhat CachedWeb proxyserver1,000,000,000Remote serverdisksOS100Main memoryHardware1On-Chip L1Hardware10Off-Chip L2AFS/NFSclient10,000,000Local diskHardware+OS100Main memoryCompiler0 CPU registersManagedByLatency(cycles)Where Cached13Cache


View Full Document

UT CS 429H - Systems I Locality and Caching

Download Systems I Locality and Caching
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Systems I Locality and Caching and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Systems I Locality and Caching 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?