COMP 206: Computer Architecture and ImplementationOutlineThe Big Picture: Where are We Now?The Motivation for CachesThe Principle of LocalityLevels of the Memory HierarchyMemory Hierarchy: Principles of OperationMemory Hierarchy: TerminologyCache AddressingCache ShapesExamples of Cache ConfigurationsStorage Overhead of CacheCache OrganizationWrite Allocate versus Not AllocateBasics of Cache Operation: OverviewDetails of Simple Blocking CacheA-way Set-Associative CacheFully Associative CacheCache Block Replacement PoliciesCache Write PolicyWrite Buffer for Write ThroughWrite Buffer Saturation1COMP 206:COMP 206:Computer Architecture and Computer Architecture and ImplementationImplementationMontek SinghMontek SinghMon, Oct 31, 2005Mon, Oct 31, 2005Topic: Topic: Memory Hierarchy Design (HP3 Ch. 5)Memory Hierarchy Design (HP3 Ch. 5)(Caches, Main Memory and Virtual Memory)(Caches, Main Memory and Virtual Memory)2OutlineOutlineMotivation for CachesMotivation for CachesPrinciple of localityPrinciple of localityLevels of Memory HierarchyLevels of Memory HierarchyCache OrganizationCache OrganizationCache Read/Write PoliciesCache Read/Write PoliciesBlock replacement policiesBlock replacement policiesWrite-back vs. write-through cachesWrite-back vs. write-through cachesWrite buffersWrite buffersReading: HP3 Sections 5.1-5.2Reading: HP3 Sections 5.1-5.23The Five Classic Components of a ComputerThe Five Classic Components of a ComputerThis lecture (and next few): Memory SystemThis lecture (and next few): Memory SystemControlDatapathMemoryProcessorInputOutputThe Big Picture: Where are We The Big Picture: Where are We Now?Now?4MotivationMotivationLarge (cheap) memories (DRAM) are slowLarge (cheap) memories (DRAM) are slowSmall (costly) memories (SRAM) are fastSmall (costly) memories (SRAM) are fastMake the average access time smallMake the average access time smallservice most accesses from a small, fast memoryservice most accesses from a small, fast memoryreduce the bandwidth required of the large memoryreduce the bandwidth required of the large memoryProcessorMemory SystemCacheDRAMThe Motivation for CachesThe Motivation for Caches5The Principle of LocalityThe Principle of LocalityProgram accesses a relatively small portion of the address Program accesses a relatively small portion of the address space at any instant of timespace at any instant of timeExample: 90% of time in 10% of the codeExample: 90% of time in 10% of the codeTwo different types of localityTwo different types of localityTemporal Locality (locality in time): Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soonif an item is referenced, it will tend to be referenced again soonSpatial Locality (locality in space): Spatial Locality (locality in space): if an item is referenced, items close by tend to be referenced if an item is referenced, items close by tend to be referenced soonsoonAddress Space0 2nFrequencyof referenceThe Principle of LocalityThe Principle of Locality6CPU Registers500 Bytes0.25 ns~$.01Cache16K-1M Bytes1 ns~$.0001Main Memory64M-2G Bytes100ns~$.0000001Disk100 G Bytes5 ms10-5- 10-7 centsCapacityAccess TimeCost/bitTape/Network“infinite”secs.10-8 centsRegistersL1, L2, … CacheMemoryDiskTape/NetworkWordsBlocksPagesFilesStagingTransfer Unitprogrammer/compiler1-8 bytescache controller8-128 bytesOS4-64K bytesuser/operatorMbytesUpper LevelLower LevelFasterLargerLevels of the Memory HierarchyLevels of the Memory Hierarchy7Lower Level(Memory)Upper Level(Cache)To ProcessorFrom ProcessorBlk XBlk YMemory Hierarchy: Principles of Memory Hierarchy: Principles of OperationOperationAt any given time, data is copied between only At any given time, data is copied between only 2 adjacent levels2 adjacent levelsUpper Level (Cache): the one closer to the processorUpper Level (Cache): the one closer to the processorSmaller, faster, and uses more expensive technologySmaller, faster, and uses more expensive technologyLower Level (Memory): the one further away from the Lower Level (Memory): the one further away from the processorprocessorBigger, slower, and uses less expensive technologyBigger, slower, and uses less expensive technologyBlockBlockThe smallest unit of information that can either be The smallest unit of information that can either be present or not present in the two-level hierarchypresent or not present in the two-level hierarchy8Memory Hierarchy: TerminologyMemory Hierarchy: TerminologyHit:Hit: data appears in some block in the upper data appears in some block in the upper level (e.g.: Block X in previous slide) level (e.g.: Block X in previous slide) Hit Rate = fraction of memory access found in upper Hit Rate = fraction of memory access found in upper levellevelHit Time = time to access the upper levelHit Time = time to access the upper levelmemory access time + Time to determine hit/missmemory access time + Time to determine hit/missMiss:Miss: data needs to be retrieved from a block in data needs to be retrieved from a block in the lower level (e.g.: Block Y in previous slide)the lower level (e.g.: Block Y in previous slide)Miss Rate = 1 - (Hit Rate)Miss Rate = 1 - (Hit Rate)Miss Penalty: includes time to fetch a new block from Miss Penalty: includes time to fetch a new block from lower levellower levelTime to replace a block in the upper level from lower level + Time to replace a block in the upper level from lower level + Time to deliver the block the processorTime to deliver the block the processorHit Time: significantly less than Miss PenaltyHit Time: significantly less than Miss Penalty9Cache AddressingCache AddressingSet 0Set j-1Block 0 Block k-1 Replacement infoSector 0 Sector m-1 TagByte 0 Byte n-1 Valid Dirty SharedBlock/line is unit of allocationBlock/line is unit of allocationSector/sub-block is unit of transfer and coherenceSector/sub-block is unit of transfer and coherenceCache parameters Cache parameters jj, , kk, , mm, , nn are integers, and are integers, and generally powers of 2generally powers of 210Cache ShapesCache ShapesDirect-mapped(A = 1, S = 16)2-way set-associative(A = 2, S = 8)4-way set-associative(A = 4, S = 4)8-way set-associative(A = 8, S = 2)Fully associative(A = 16, S = 1)11Examples of Cache ConfigurationsExamples of Cache Configurations# Sets # Blocks # Sectors # Bytes Name1 k m n
View Full Document