Memory Hierarchy
Professor Jennifer Rexford
http://www.cs.princeton.edu/~jrex

Lecture outline:
• Motivation for Memory Hierarchy
• Simple Three-Level Hierarchy
• Widening Processor/Memory Gap
• An Example Memory Hierarchy
• Locality of Reference
• Locality Makes Caching Effective
• Caching in a Memory Hierarchy
• Cache Block Sizes
• Cache Hit and Miss
• Three Kinds of Cache Misses
• Cache Replacement
• Who Manages the Cache?
• Manual Allocation: Segmentation
• Automatic Allocation: Virtual Memory
• Making Good Use of Memory and Disk
• Virtual Address for a Process
• Virtual Memory for a Process
• Page Table to Manage the Cache
• “Miss” Triggers Page Fault Exception
• OS Handles the Page Fault
• VM as a Tool for Memory Protection
• Sharing Physical Memory
• Process-ID and Page Table Entries
• Page Tables in OS Memory...
• Measuring the Memory Usage
• VM as a Tool for Memory Management
• Conclusion

Slide 2: Goals of Today’s Lecture
• Memory hierarchy
  – From fast/expensive/small to slow/cheap/big memory technology
  – Registers, on-chip cache, off-chip cache, main memory, disk, tape
• Locality of reference
  – Spatial and temporal locality, of program data and instructions
  – Caching to store a small number of recently used memory blocks
• Virtual memory
  – Separation of virtual addresses from physical memory locations
  – Main memory as a cache of virtual pages from the disk
  – Memory protection from misbehaving user processes

Slide 3: Motivation for Memory Hierarchy
• Faster storage technologies are more costly
  – They cost more money per byte
  – They have lower storage capacity
  – They require more power and generate more heat
• The gap between processor and memory speed is widening
  – Processors have been getting faster and faster
  – Main memory speed is not improving as dramatically
• Well-written programs tend to exhibit good locality
  – Across time: repeatedly referencing the same variables
  – Across space: often accessing other variables located nearby

We want the speed of fast storage at the cost and capacity of slow storage.
Key idea: memory hierarchy!

Slide 4: Simple Three-Level Hierarchy
• Registers
  – Usually reside directly on the processor chip
  – Essentially no latency; referenced directly in instructions
  – Low capacity (e.g., 32-512 bytes)
• Main memory
  – Around 100 times slower than a processor clock cycle
  – Constant access time for any memory location
  – Modest capacity (e.g., 512 MB-2 GB)
• Disk
  – Around 100,000 times slower than main memory
  – Faster when accessing many bytes in a row
  – High capacity (e.g., 200 GB)

Slide 5: Widening Processor/Memory Gap
• The speed gap grew from 1986 to 2000
  – CPU speed improved ~55% per year
  – Main memory speed improved only ~10% per year
• Main memory is a major performance bottleneck
  – Many programs stall waiting for reads and writes to finish
• Resulting changes in the memory hierarchy
  – More registers (8 integer registers in the x86 vs. 128 in the Itanium)
  – Caches added between registers and main memory (on-chip level-1 cache and off-chip level-2 cache)

Slide 6: An Example Memory Hierarchy
(The original slide is a pyramid: smaller, faster, costlier-per-byte storage devices at the top; larger, slower, cheaper-per-byte devices at the bottom.)
• L0: CPU registers hold words retrieved from the L1 cache
• L1: on-chip L1 cache (SRAM) holds cache lines retrieved from the L2 cache
• L2: off-chip L2 cache (SRAM) holds cache lines retrieved from main memory
• L3: main memory (DRAM) holds disk blocks retrieved from local disks
• L4: local secondary storage (local disks) holds files retrieved from disks on remote network servers
• L5: remote secondary storage (tapes, distributed file systems, Web servers)

Slide 7: Locality of Reference
• Two kinds of locality
  – Temporal locality: recently referenced items are likely to be referenced in the near future
  – Spatial locality: items with nearby addresses tend to be referenced close together in time
• Locality example (the loop below)
  – Program data
    – Temporal: the variable sum
    – Spatial: element a[i+1] accessed soon after a[i]
  – Instructions
    – Temporal: cycle through the for-loop repeatedly
    – Spatial: reference instructions in sequence

  sum = 0;
  for (i = 0; i < n; i++)
      sum += a[i];
  return sum;
Slide 8: Locality Makes Caching Effective
• Cache
  – A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device
• Caching and the memory hierarchy
  – The storage device at level k is a cache for level k+1
  – Registers as a cache of the L1/L2 cache and main memory
  – Main memory as a cache for the disk
  – Disk as a cache of files from remote storage
• Locality of access is the key
  – Most accesses are satisfied by the first few (faster) levels
  – Very few accesses go down to the last few (slower) levels

Slide 9: Caching in a Memory Hierarchy
(The original slide is a figure: the smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1; the larger, slower, cheaper device at level k+1 is partitioned into blocks, and data is copied between levels in block-sized transfer units.)

Slide 10: Cache Block Sizes
• Fixed vs. variable size
  – Fixed-size blocks are easier to manage (the common case)
  – Variable-size blocks make more efficient use of storage
• Block size
  – Depends on access times of the level-(k+1) device
  – Larger block sizes further down in the hierarchy
  – E.g., disk seek times are slow, so disk pages are larger
• Examples
  – CPU registers: 4-byte words
  – L1/L2 cache: 32-byte blocks
  – Main memory: 4 KB pages
  – Disk: entire files

Slide 11: Cache Hit and Miss
• Cache hit
  – The program accesses a block available in the cache
  – Satisfy the access directly from the cache
  – E.g., a request for block 10, which is already cached
• Cache miss
  – The program accesses a block not available in the cache
  – Bring the item into the cache
  – E.g., a request for block 13, which is not cached
• Where to place the item?
• Which item to evict?

Slide 12: Three Kinds of Cache Misses
• Cold (compulsory) miss
  – Occurs because the block has not been accessed before
  – E.g., the first time a segment of code is executed
  – E.g., the first time a particular array is referenced
• Capacity miss
  – The set of active cache blocks (the “working set”) is larger than the cache
  – E.g., manipulating a 1200-byte array within a 1000-byte cache
• Conflict miss
  – Some caches limit the
    locations where a block can be stored
  – E.g., block i must be placed in cache location (i mod 4)
  – Conflicts occur when multiple blocks map to the same location(s)
  – E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time

Slide 13: Cache Replacement
• Evicting a block from the cache
  – A new block must be brought into the cache
  – Must ...