OSU CS 419 - Architecture: Caching Issues in Performance


Architecture: Caching Issues in Performance
Mike Bailey
[email protected]
Oregon State University
mjb – November 14, 2011
Oregon State University Computer Graphics

Problem: The Path Between a CPU Chip and Off-chip Memory is Slow

The path between the CPU chip and main memory is relatively slow, forcing the CPU to wait for up to 200 clock cycles just to do a store to, or a load from, memory. Depending on your CPU's ability to process instructions out-of-order, it might go idle during this time. This is a huge performance hit!

Solution: Hierarchical Memory Systems, or "Cache"

The solution is to add intermediate memory systems between the CPU chip and main memory. The one closest to the CPU is small and fast. The memory systems get slower and larger as they get farther away from the CPU.

How is Cache Memory Actually Defined?

In computer science, a cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (due to longer access time) or to compute, compared to the cost of reading the cache. In other words, a cache is a temporary storage area where frequently accessed data can be stored for rapid access. Once the data is stored in the cache, future use can be made by accessing the cached copy rather than re-fetching or recomputing the original data, so that the average access time is shorter.
Cache, therefore, helps expedite data access that the CPU would otherwise need to fetch from main memory.
-- Wikipedia

Cache and Memory are Named by "Distance Level" from the CPU

(Diagram: CPU chip containing the L1 and L2 caches, then off-chip main memory.)

Storage Level Characteristics

                          L1                 L2                 Memory                  Disk
Type of Storage           On-chip registers  On-chip CMOS mem.  Off-chip DRAM main mem. Disk memory
Typical Size              < 100 KB           < 8 MB             < 10 GB                 Many GBs
Typical Access Time (ns)  .25 - .50          .5 - 25.0          50 - 250                5,000,000
Bandwidth (MB/sec)        50,000 - 500,000   5,000 - 20,000     2,500 - 10,000          50 - 500
Managed by                Compiler           Hardware           OS                      OS

Adapted from: John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, 4th Edition, Morgan-Kaufmann, 2007.

Usually there are two L1 caches – one for Instructions and one for Data. You will often see this referred to in data sheets as "L1 cache: 64KB + 64KB" or "I and D cache".

Cache Hits and Misses

When the CPU asks for a value from memory, and that value is already in the cache, it can get it quickly. This is called a cache hit.

When the CPU asks for a value from memory, and that value is not already in the cache, it will have to go off the chip to get it. This is called a cache miss.

Performance programming should strive to avoid as many cache misses as possible. That's why it is very helpful to know the cache structure of your CPU.

Memory is arranged in blocks (Block 0, Block 1, Block 2, …). The size of a block is typically 64 Kbytes. While cache might be multiple kilo- or megabytes, the bytes are transferred in much smaller quantities, each called a cache line. The size of a cache line is typically just 64 bytes.

Possible Cache Architectures
1. Fully Associative – cache lines from any block of memory can appear anywhere in cache.

2. Direct Mapped – a cache line from a particular block of memory has only one place it could appear in cache. A memory block's cache line is:

   Cache line # = Memory block # % # lines the cache has

3. N-way Set Associative – a cache line from a particular block of memory can appear in a limited number of places in cache. Each "limited place" is called a set of cache lines. A set contains N cache lines. A memory block's set number is:

   Set # = Memory block # % # sets the cache has

   The memory block can appear in any cache line in its set. N is typically 4 for L1 and 8 or 16 for L2.

Note that Direct Mapped can be thought of as 1-Way Set Associative. Note that Fully Associative can be thought of as 1-Set Set-Associative.

What Happens When the Cache is Full and a New Piece of Memory Needs to Come In?

1. Random – randomly pick a cache line to remove.
2. Least Recently Used (LRU) – remove the cache line which has gone unaccessed the longest.
3. Oldest (FIFO, First-In-First-Out) – remove the cache line that has been there the longest.

Actual Caches Today are N-way Set Associative

This is like a good-news / bad-news joke:

Good news: This keeps one specific block of memory from hogging all the cache lines.

Bad news: It also means that a block that your program uses much more than the other blocks will need to over-use those N cache lines assigned to that block, and under-use the other cache lines assigned to the other blocks.

Actual Cache Architecture

Let's try some reasonable numbers.
Assume there is 1 GB of main memory, 512 KB of L2 cache, and 64-byte, 16-way cache lines in the L2 cache. This means that there are:

   512 KB / 64 bytes = 8,192

cache lines available all together in the cache. The "16-way" means that there are 16 cache lines per set, so the cache must be divided into:

   8,192 / 16 = 512

sets, which makes each memory block need to contain:

   1 GB / 512 = 1,073,741,824 / 512 = 2,097,152 bytes = 2 MB

of main memory. What does this mean to you?

Actual Cache Architecture – Take Home Message

A 2 MB memory block could be contained in:

   2 MB / 64 bytes = 2,097,152 / 64 = 32,768

cache lines, but it only has 16 available. If your program is using data from just one block, and the accesses are coming from all over the block and going back and forth, you will have many cache misses, and those 16 cache lines will keep being re-loaded and re-loaded. Performance will suffer.
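The two mapping formulas above can be sketched in a few lines of Python. The geometry used here (a 64-line, 4-way cache) is purely illustrative, not taken from any real CPU:

```python
# Sketch of the two mapping rules:
#   Direct Mapped:        cache line # = memory block # % # lines the cache has
#   N-way Set Associative: set #       = memory block # % # sets the cache has

def direct_mapped_line(block, n_lines):
    # The one place this block's cache line can live.
    return block % n_lines

def set_associative_set(block, n_sets):
    # The one set this block maps to; within the set, any of the N lines will do.
    return block % n_sets

# Illustrative geometry: a 64-line, 4-way cache has 64 / 4 = 16 sets.
n_lines, n_way = 64, 4
n_sets = n_lines // n_way

# Blocks that differ by a multiple of 16 land in the same set.
print(set_associative_set(5, n_sets))    # 5
print(set_associative_set(21, n_sets))   # 5
print(set_associative_set(37, n_sets))   # 5
```

This also makes the closing note on the previous slide concrete: with `n_way = 1` the set formula degenerates into the direct-mapped formula, and with one set every block maps everywhere, which is fully associative.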

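As a check on the arithmetic in the worked example, the same assumed numbers (1 GB of main memory, 512 KB of 16-way L2 cache with 64-byte lines) can be run through in code:

```python
# Verify the worked example: 1 GB memory, 512 KB L2, 64-byte lines, 16-way.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

memory_size = 1 * GB
cache_size  = 512 * KB
line_size   = 64
n_way       = 16

n_lines    = cache_size // line_size   # total cache lines in the L2
n_sets     = n_lines // n_way          # sets = lines / ways
block_size = memory_size // n_sets     # memory each block must cover

# Lines a whole block would need, vs. the n_way = 16 it actually gets.
lines_to_hold_block = block_size // line_size

print(n_lines, n_sets, block_size // MB, lines_to_hold_block)
# 8192 512 2 32768
```

The output matches the slide: 8,192 cache lines, 512 sets, 2 MB per memory block, and 32,768 lines needed to hold a whole block that in fact only gets 16.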

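A toy model of a single cache set with LRU replacement (a sketch, not a real cache simulator; the 2-way geometry is hypothetical) illustrates the bad-news half of the joke: once more hot blocks map to a set than the set has ways, every access misses:

```python
# Toy model of ONE set of an N-way set-associative cache with LRU
# replacement. Hypothetical geometry, for illustration only.
from collections import OrderedDict

class CacheSet:
    def __init__(self, n_way):
        self.n_way = n_way
        self.lines = OrderedDict()          # tag -> None, least-recent first

    def access(self, tag):
        if tag in self.lines:               # cache hit: refresh LRU order
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.n_way:   # set full: evict the LRU line
            self.lines.popitem(last=False)
        self.lines[tag] = None              # load the new line
        return False                        # cache miss

# Two hot blocks fit comfortably in a 2-way set:
s = CacheSet(n_way=2)
print(sum(s.access(t) for t in [0, 1, 0, 1, 0, 1]))   # 4 hits

# Three hot blocks thrash the same 2-way set -- each evicts the next:
s = CacheSet(n_way=2)
print(sum(s.access(t) for t in [0, 1, 2, 0, 1, 2]))   # 0 hits
```

This is exactly the take-home message above in miniature: the working set only has to exceed the N lines of its set by one for the hit rate to collapse, even though the rest of the cache sits idle.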