OSU CS 419 - Architecture: Caching Issues in Performance
Mike Bailey
[email protected]
mjb – November 14, 2011
Oregon State University Computer Graphics


Problem: The Path Between a CPU Chip and Off-chip Memory is Slow

The path between a CPU chip and off-chip main memory is relatively slow, forcing the CPU to wait for up to 200 clock cycles just to do a store to, or a load from, memory. Depending on your CPU's ability to process instructions out-of-order, it might go idle during this time. This is a huge performance hit!


Solution: Hierarchical Memory Systems, or "Cache"

The solution is to add intermediate memory systems between the CPU chip and main memory. The one closest to the CPU is small and fast. The memory systems get slower and larger as they get farther away from the CPU.


How is Cache Memory Actually Defined?

In computer science, a cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (due to longer access time) or to compute, compared to the cost of reading the cache. In other words, a cache is a temporary storage area where frequently accessed data can be stored for rapid access. Once the data is stored in the cache, future use can be made by accessing the cached copy rather than re-fetching or recomputing the original data, so that the average access time is shorter.
Cache, therefore, helps expedite data access that the CPU would otherwise need to fetch from main memory. -- Wikipedia


Cache and Memory are Named by "Distance Level" from the CPU

[Diagram: CPU Chip → L1 → L2 → Main Memory]


Storage Level Characteristics

                           L1                  L2                    Memory                     Disk
Type of Storage            On-chip registers   On-chip CMOS memory   Off-chip DRAM main memory  Disk
Typical Size               < 100 KB            < 8 MB                < 10 GB                    Many GBs
Typical Access Time (ns)   0.25 - 0.50         0.5 - 25.0            50 - 250                   5,000,000
Bandwidth (MB/sec)         50,000 - 500,000    5,000 - 20,000        2,500 - 10,000             50 - 500
Managed by                 Compiler            Hardware              OS                         OS

Adapted from: John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, 4th Edition, Morgan-Kaufmann, 2007.

Usually there are two L1 caches – one for Instructions and one for Data. You will often see this referred to in data sheets as "L1 cache: 64KB + 64KB" or "I and D cache".


Cache Hits and Misses

When the CPU asks for a value from memory, and that value is already in the cache, it can get it quickly. This is called a cache hit.

When the CPU asks for a value from memory, and that value is not already in the cache, it will have to go off the chip to get it. This is called a cache miss.

Performance programming should strive to avoid as many cache misses as possible. That's why it is very helpful to know the cache structure of your CPU.

Memory is arranged in blocks. The size of a block is typically 64 Kbytes. While a cache might be multiple kilo- or megabytes in size, bytes are transferred to it in much smaller quantities, each called a cache line. The size of a cache line is typically just 64 bytes.


Possible Cache Architectures

1. Fully Associative – cache lines from any block of memory can appear anywhere in cache.

2.
Direct Mapped – a cache line from a particular block of memory has only one place it could appear in cache. A memory block's cache line is:

      Cache line # = Memory block # % (# lines the cache has)

3. N-way Set Associative – a cache line from a particular block of memory can appear in a limited number of places in cache. Each "limited place" is called a set of cache lines. A set contains N cache lines. A memory block's set number is:

      Set # = Memory block # % (# sets the cache has)

The memory block can appear in any cache line in its set. N is typically 4 for L1 and 8 or 16 for L2.

Note that Direct Mapped can be thought of as 1-way Set Associative, and Fully Associative can be thought of as 1-set Set Associative.


What Happens When the Cache is Full and a New Piece of Memory Needs to Come In?

1. Random – randomly pick a cache line to remove.

2. Least Recently Used (LRU) – remove the cache line which has gone unaccessed the longest.

3. Oldest (FIFO, First-In-First-Out) – remove the cache line that has been there the longest.


Actual Caches Today are N-way Set Associative

This is like a good-news / bad-news joke:

Good news: N-way set associativity keeps one specific block of memory from hogging all the cache lines.

Bad news: it also means that a block your program uses much more heavily than the other blocks gets only the N cache lines assigned to it, over-using those while under-using the cache lines assigned to the other blocks.


Actual Cache Architecture

Let's try some reasonable numbers. Assume there is 1 GB of main memory, 512 KB of L2 cache, and 64-byte cache lines in a 16-way set-associative L2 cache. This means that there are

      512 KB / 64 bytes = 8,192

cache lines available all together in the cache.
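The two mapping formulas above can be sketched as code. This is a toy illustration (Python is my choice of language here, not the course's), using this example's 512 KB cache and 64-byte cache lines; the function names are hypothetical:

```python
# Toy versions of the two mapping rules, using this example's numbers.
CACHE_BYTES = 512 * 1024
LINE_BYTES  = 64
NUM_LINES   = CACHE_BYTES // LINE_BYTES      # 8,192 cache lines in all

def direct_mapped_line(block_num):
    # Cache line # = Memory block # % (# lines the cache has)
    return block_num % NUM_LINES

def set_associative_set(block_num, num_sets):
    # Set # = Memory block # % (# sets the cache has)
    return block_num % num_sets
```

For example, with 8,192 lines, memory blocks 5 and 8,197 both direct-map to cache line 5, which is exactly why two such blocks would evict each other.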
The "16-way" means that there are 16 cache lines per set, so the cache must be divided into

      8,192 / 16 = 512

sets, which makes each memory block need to contain

      1 GB / 512 = 1,073,741,824 bytes / 512 = 2,097,152 bytes = 2 MB

of main memory. What does this mean to you?


Actual Cache Architecture – Take-Home Message

A 2 MB memory block could be contained in

      2 MB / 64 bytes = 2,097,152 / 64 = 32,768

cache lines, but it only has 16 available. If your program is using data from just one block, and those data are coming from all over the block, back and forth, you will have many cache misses: those 16 cache lines will keep being re-loaded over and over, and performance will suffer.

If, instead, your program is using data from just one block and they are coming from just a (16 cache lines) × (64 bytes/cache line) = 1024-byte region, all of it can stay resident in the set at once.
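The take-home message can be demonstrated with a toy simulator. This is a sketch under the slides' assumptions, not a model of real hardware: it uses the block-based mapping from these slides (Set # = memory block # % # sets) with LRU replacement, and the example's numbers (512 KB cache, 64-byte lines, 16-way, 1 GB of memory). Real caches typically derive the set index directly from address bits, but the thrashing effect illustrated here is the same.

```python
# Toy LRU, 16-way set-associative cache following the slides' model.
from collections import OrderedDict

LINE_SIZE  = 64
WAYS       = 16
NUM_SETS   = (512 * 1024) // LINE_SIZE // WAYS   # 8,192 lines / 16 = 512 sets
BLOCK_SIZE = (1 << 30) // NUM_SETS               # 1 GB / 512 = 2 MB per block

class SetAssociativeCache:
    def __init__(self):
        # One LRU-ordered dict of resident cache-line tags per set.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]
        self.hits = self.misses = 0

    def access(self, addr):
        tag   = addr // LINE_SIZE                 # which 64-byte cache line
        s     = (addr // BLOCK_SIZE) % NUM_SETS   # Set # = block # % # sets
        lines = self.sets[s]
        if tag in lines:
            lines.move_to_end(tag)                # mark most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(lines) >= WAYS:
                lines.popitem(last=False)         # evict least recently used
            lines[tag] = True

# Scattered accesses all over one 2 MB block: 32 distinct cache lines
# cycling through only 16 ways, so every access misses.
bad = SetAssociativeCache()
for _ in range(10):
    for i in range(32):
        bad.access(i * 65536)          # 32 lines, same block, same set

# Accesses confined to a 1024-byte region: its 16 lines fit the 16 ways.
good = SetAssociativeCache()
for _ in range(10):
    for i in range(16):
        good.access(i * LINE_SIZE)

print(bad.misses, good.misses)         # prints "320 16"
```

The scattered pattern misses on all 320 accesses because LRU evicts each line just before it is needed again; the 1024-byte pattern misses only on the first 16 accesses and then hits from there on.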

