CMSC 611: Advanced Computer Architecture
Cache & Memory
Most slides adapted from David Patterson. Some from Mohomed Younis

Second Level Cache
• The previous techniques reduce the impact of the miss penalty on the CPU
– The L2 cache handles the cache-memory interface
• Measuring cache performance
• Local miss rate
– misses in this cache divided by the total number of memory accesses to this cache (MissRate_L2)
• Global miss rate (& biggest penalty!)
– misses in this cache divided by the total number of memory accesses generated by the CPU (MissRate_L1 x MissRate_L2)

AMAT = HitTime_L1 + MissRate_L1 x MissPenalty_L1
     = HitTime_L1 + MissRate_L1 x (HitTime_L2 + MissRate_L2 x MissPenalty_L2)

(The global miss rate is close to the single-level cache rate provided L2 >> L1)

Local vs. Global Misses
[Figure: relative execution time vs. block size of the second-level cache (bytes); 32-bit bus, 512KB cache]

L2 Cache Parameters
• The L1 cache directly affects the processor design and clock cycle: it should be simple and small
• The bulk of the optimization techniques can go easily to the L2 cache
• Miss-rate reduction is more practical for L2
• Considering the L2 cache can improve the L1 cache design, e.g.
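The two-level AMAT expansion above can be sanity-checked with a short calculation. The numbers below are illustrative assumptions, not values from the slides:

```python
# Illustrative (assumed) parameters, in CPU cycles.
hit_time_l1 = 1.0        # L1 hit time
miss_rate_l1 = 0.04      # L1 misses per CPU memory access
hit_time_l2 = 10.0       # L2 hit time
miss_rate_l2 = 0.25      # local L2 miss rate (misses per L2 access)
miss_penalty_l2 = 100.0  # cycles to service an L2 miss from main memory

# L1's miss penalty is the time to service the access from L2 onward,
# which is where the second line of the AMAT formula comes from.
miss_penalty_l1 = hit_time_l2 + miss_rate_l2 * miss_penalty_l2

amat = hit_time_l1 + miss_rate_l1 * miss_penalty_l1
global_miss_rate_l2 = miss_rate_l1 * miss_rate_l2  # misses per CPU access

print(amat)                 # 1 + 0.04 * (10 + 0.25 * 100) = 2.4 cycles
print(global_miss_rate_l2)  # 0.04 * 0.25 = 0.01
```

Note how the local L2 miss rate (0.25) looks alarming, while the global rate (0.01) shows that only 1% of CPU accesses actually reach main memory.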
use write-through if the L2 cache applies write-back

Average Access Time = Hit Time x (1 - Miss Rate) + Miss Time x Miss Rate

Reducing Hit Time
• Hit rate is typically very high compared to miss rate
– any reduction in hit time is magnified
• Hit time is critical: it affects the processor clock rate
• Three techniques to reduce hit time:
– Simple and small caches
– Avoid address translation during cache indexing
– Pipelining writes for fast write hits

Simple and Small Caches
• Design simplicity limits control logic complexity and allows shorter clock cycles
• On-chip integration decreases signal propagation delay, thus reducing hit time
– The Alpha 21164 has 8KB instruction and 8KB data caches and a 96KB second-level cache to reduce the clock rate

Avoiding Address Translation
• Send the virtual address to the cache?
– Called a Virtually Addressed Cache, or just Virtual Cache, vs. Physical Cache
– Every time the process is switched, the cache logically must be flushed; otherwise we get false hits
• Cost is the time to flush + "compulsory" misses from the empty cache
– Dealing with aliases (sometimes called synonyms)
• Two different virtual addresses map to the same physical address, causing unnecessary read misses or even RAW hazards
– I/O must interact with the cache, so it needs virtual addresses

Solutions
• Solution to aliases
– HW guarantees that every cache block has a unique physical address (simply check all cache entries)
– SW guarantee: the lower n bits must have the same address so that they overlap with the index; as long as this covers the index field and the cache is direct mapped, blocks must be unique; called page coloring
• Solution to cache flush
– Add a process identifier tag that identifies the process as well as the address within the process: cannot get a hit if the process is wrong

Impact of Using Process ID
• Miss rate vs.
virtually addressed cache size of a program, measured three ways:
– Without process switches (uniprocessor)
– With process switches, using a PID tag (PID)
– With process switches but without PID (purge)

Virtually Addressed Caches
[Figure: three organizations. Conventional organization: CPU -> TB -> $ -> MEM, with the VA translated to a PA before the cache access. Virtually addressed cache: CPU -> $ -> TB -> MEM; translate only on a miss; synonym problem. Overlapped organization: CPU accesses $ and TB in parallel; the $ access overlaps the VA translation, which requires the $ index to remain invariant across translation; the L2 $ carries PA tags. VA: virtual address, TB: translation buffer, PA: page address]

Indexing via Physical Addresses
• If the index is a physical part of the address, tag access can start in parallel with translation
• To get the best of the physical and virtual caches, use the page offset (not affected by address translation) to index the cache
• The drawback is that direct-mapped caches cannot be bigger than the page size (typically 4KB)
• To support bigger caches and use the same technique:
– Use higher associativity, since the tag size gets smaller
– OS implements page coloring, since it will fix a few least-significant bits in the address (move part of the index to the tag)

Pipelined Cache Writes
• In a cache read, tag check and block reading are performed in parallel, while writing requires validating the tag first
• The tag check can be performed in parallel with a previous cache update
– Pipeline the tag check and cache update as separate stages: the current write's tag check overlaps the previous write's cache update
– A "delayed write buffer" must be checked on reads; either complete the write or read from the buffer

Cache Optimization Summary

Technique                          MR  MP  HT  Complexity
Larger Block Size                  +   –       0
Higher Associativity               +       –   1
Victim Caches                      +           2
Pseudo-Associative Caches          +           2
HW Pre-fetching of Instr/Data      +           2
Compiler Controlled Pre-fetching   +           3
Compiler Reduce Misses             +           0
Priority to Read Misses                +       1
Sub-block Placement                    +   +   1
Early Restart & Critical Word 1st      +       2
Non-Blocking Caches                    +       3
Second Level Caches                    +       2
Small & Simple Caches              –       +   0
Avoiding Address Translation               +   2
Pipelining Writes                          +   1
(MR: miss rate, MP: miss penalty, HT: hit time)

Memory Hierarchy
[Figure: staging hierarchy; upper levels are faster, lower levels are larger]

Level         Capacity     Access Time  Transfer Unit          Managed By
Registers     100s bytes   <10s ns      operands (1-8 bytes)   prog./compiler
Cache         K bytes      10-40 ns     blocks (8-128 bytes)   cache controller
Main Memory   M bytes      70ns-1us     pages (512-4K bytes)   OS
Disk          G bytes      ms           files (Mbytes)         user/operator
Tape          infinite     sec-min

Virtual Memory
Cache             ->  Virtual memory
Block             ->  Page
Cache miss        ->  Page fault
Block addressing  ->  Address translation
• Using virtual addressing, main memory plays the role of a cache for disks
• The virtual space is much larger than the physical memory space
• Physical main memory contains only the active portion of the virtual space
• The address space can be divided into fixed-size blocks (pages) or variable-size blocks (segments)
[Figure: virtual addresses map through address translation to physical addresses or disk addresses]

Virtual Memory
• Advantages
– Allows efficient and safe sharing of memory among multiple programs
– Removes the programming burdens of a small, limited amount of main memory
– Simplifies program loading and avoids the need for a contiguous memory block
– Allows programs to be loaded at any
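The index-invariance requirement for overlapping cache access with translation can be sketched numerically. The cache and page parameters below are assumptions chosen so the index plus block offset exactly fills the page offset:

```python
# Sketch (assumed parameters): a direct-mapped, physically indexed cache
# can be accessed in parallel with translation only if its index bits
# fall entirely inside the page offset, which translation never changes.
PAGE_SIZE = 4096   # 4KB pages -> 12 page-offset bits
BLOCK_SIZE = 32    # bytes per cache block -> 5 offset bits
CACHE_SIZE = 4096  # direct-mapped cache no bigger than a page

num_blocks = CACHE_SIZE // BLOCK_SIZE
index_bits = num_blocks.bit_length() - 1   # log2(128) = 7
offset_bits = BLOCK_SIZE.bit_length() - 1  # log2(32)  = 5

def cache_index(addr):
    # The index is taken from bits below the page offset boundary.
    return (addr // BLOCK_SIZE) % num_blocks

# A virtual address and its translation share the page offset; only the
# bits above the page offset change, so the cache index is identical.
va = 7 * PAGE_SIZE + 0x123   # arbitrary virtual address
pa = 42 * PAGE_SIZE + 0x123  # same page offset after translation
print(cache_index(va) == cache_index(pa))  # True: index is translation-invariant
```

Growing CACHE_SIZE beyond PAGE_SIZE would push index bits above the page offset, which is exactly why the slide limits direct-mapped caches to the page size (or requires associativity/page coloring).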
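The PID-tag solution to the cache-flush problem can be illustrated with a toy model. The class and its fields are invented for this sketch, not part of any real design:

```python
# Minimal sketch (invented structure): a virtually addressed cache whose
# tags include a process ID, so a context switch cannot yield a false
# hit and the cache never needs to be flushed on a switch.
class VirtualCacheWithPID:
    def __init__(self, num_blocks=128, block_size=32):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.tags = [None] * num_blocks  # each entry: (pid, tag) or None

    def access(self, pid, vaddr):
        """Return True on a hit; on a miss, fill the block for this PID."""
        block = vaddr // self.block_size
        index = block % self.num_blocks
        tag = block // self.num_blocks
        if self.tags[index] == (pid, tag):
            return True
        self.tags[index] = (pid, tag)  # fill on miss
        return False

cache = VirtualCacheWithPID()
cache.access(pid=1, vaddr=0x1000)              # cold miss, fill
hit_same = cache.access(pid=1, vaddr=0x1000)   # same process: hit
hit_other = cache.access(pid=2, vaddr=0x1000)  # same VA, wrong PID: miss
print(hit_same, hit_other)  # True False
```

The second process maps the same virtual address to the same cache set, but the PID mismatch forces a miss, which is precisely the false hit the slide warns about.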
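The page-based address translation described under Virtual Memory splits an address into a virtual page number and an unchanged page offset. A minimal sketch, with an assumed 4KB page size and a toy page table:

```python
# Sketch of virtual-to-physical translation (toy page table; a real one
# is a hardware-walked structure, and a missing entry is a page fault).
PAGE_SIZE = 4096

page_table = {0: 5, 1: 9, 7: 2}  # virtual page number -> physical page number

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise KeyError("page fault: virtual page %d not resident" % vpn)
    # The page offset passes through unchanged; only the page number maps.
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x1234)))  # VPN 1 -> PPN 9, offset 0x234: 0x9234
```

A lookup on a non-resident page (e.g. VPN 3 here) raises the sketch's stand-in for a page fault, mirroring the cache-miss/page-fault correspondence in the slide.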

