Cache
By Asher Moody
Fall 2008, CS 147

Cache Memory
• Physically placed between the CPU and main memory
• Stores commonly accessed data

Levels
• Most modern systems have Level 1 (L1) and Level 2 (L2) cache
• L1: smaller and faster cache
• L2: larger cache to compensate for the small size of L1
▫ In some cases there is also an L3

Multilevel Caches
• L1 caches are commonly…
▫ Split level: data and instructions are maintained separately
▫ With recent advancements they can be located on the CPU itself
• L2 and L3 caches are…
▫ Unified: hold both data and instructions

Why is it so fast?
• Faster electronics in a small size offsets the cost
• Fewer locations to access means faster access time
• Physically closer to the CPU, avoiding communication delays

Comparison
[Comparison table of memory technologies not captured in this text version]

Locality Principle
• 90/10 rule
▫ 90% of execution time is spent accessing only 10% of the memory
• This means that most executing programs reference memory from a small number of locations

Temporal Locality
• When a program references a memory location, it is likely to reference that same memory location again soon

Spatial Locality
• Memory located near a recently referenced memory location is more likely to be referenced than memory farther away
• Reasons:
▫ Loop iterations
▫ Commonly used subroutines

Terminology
• Hit: referenced location found in the cache
• Miss: referenced location not found in the cache
• Valid bit: indicates whether this slot holds valid data for the current program
• Dirty bit: marks modified data in the cache that must be written back to main memory

Cache Mapping Schemes
• Associative Mapped Cache
• Direct-Mapped Cache
• Set-Associative Mapped Cache

Associative Mapped Cache
• Uses a tag to associate the data with where it is stored in main memory
• Any main memory block can be mapped to any cache slot
• Tags are searched in parallel
[Diagram slide omitted]

Direct-Mapped Cache
• Each main memory block can be mapped to only one slot
• Each slot can receive more than one block
• Faster to access than an associative mapped cache
• Causes problems if two blocks mapped to the same slot need to be used frequently
▫ These two blocks would keep replacing each other
[Diagram slide omitted]

Set-Associative Mapped Cache
• Hybrid of fully associative and direct-mapped cache
• The cache is split into sets; each block maps directly to one set
• The block is then stored associatively within that set
[Diagram slide omitted]

Replacement Policies
• Least Recently Used (LRU): uses a time stamp
• Least Frequently Used (LFU): uses a frequency counter
• First-In First-Out (FIFO)
• Random
• Optimal

Measuring Cache Performance
• Hit Ratio = (number of times referenced words are in the cache) / (total number of memory accesses)
• Effective Access Time = [(# hits)(time per hit) + (# misses)(time per miss)] / (total number of memory accesses)

Cache Coherency
• Multiple caches for multiple processors can create a problem when one processor updates data without the other(s) knowing
• To ensure cache coherence, every cache should see the same value for a referenced location

Different Approaches to Coherency
• Directory based
▫ Shared data is placed in a common directory
▫ When loading, a cache must go through the directory
• Snooping
▫ Each cache watches the memory bus for any changes to lines it holds

False Sharing
• A false sharing situation arises when two processors read and write different locations in the same cache line
• Two operands that are not shared between processes end up in the same cache line
• Solution: avoid it by ensuring the operands are assigned to memory locations in different cache lines

References
Text
• Computer Architecture and Organization: An Integrated Approach by Miles J. Murdocca and Vincent P. Heuring
Websites
• http://en.wikipedia.org/wiki/Cache
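To make the direct-mapped slot arithmetic concrete, here is a small sketch of how an address splits into a tag, slot index, and byte offset. The geometry (64-byte blocks, 256 slots) is an assumed illustrative example, not taken from the slides:

```python
# Illustrative direct-mapped cache geometry (assumed, not from the slides):
# 256 slots of 64-byte blocks, i.e. a 16 KiB cache.
BLOCK_SIZE = 64
NUM_SLOTS = 256

def decompose(addr):
    """Split an address into (tag, slot index, byte offset)."""
    offset = addr % BLOCK_SIZE
    index = (addr // BLOCK_SIZE) % NUM_SLOTS
    tag = addr // (BLOCK_SIZE * NUM_SLOTS)
    return tag, index, offset

# Two addresses exactly one cache-size apart map to the same slot with
# different tags -- the conflict case where frequently used blocks would
# keep replacing each other.
print(decompose(0x12345))                           # (4, 141, 5)
print(decompose(0x12345 + BLOCK_SIZE * NUM_SLOTS))  # (5, 141, 5)
```

A set-associative cache softens exactly this conflict: the index would select a set of several slots instead of a single slot, so both blocks could reside in the cache at once.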
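The performance formulas on the "Measuring Cache Performance" slide can be checked with a quick worked example. The hit/miss counts and per-access times below are assumed numbers for illustration only:

```python
# Assumed workload: 900 hits and 100 misses out of 1000 accesses,
# 10 ns per hit and 100 ns per miss (illustrative values).
hits, misses = 900, 100
hit_time_ns, miss_time_ns = 10, 100

accesses = hits + misses
hit_ratio = hits / accesses
effective_access_time_ns = (hits * hit_time_ns + misses * miss_time_ns) / accesses

print(hit_ratio)                 # 0.9
print(effective_access_time_ns)  # 19.0
```

Note how a 10% miss rate nearly doubles the average access time relative to the 10 ns hit time, which is why high hit ratios matter so much for cache design.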