COMP 206: Computer Architecture and Implementation
Montek Singh
Wed., Nov. 12, 2003

Topics:
1. Cache Performance (concl.)
2. Cache Coherence

Contents (slide titles):
Review: Improving Cache Performance
1. Fast Hit Times via Small, Simple Caches
2. Fast Hits by Avoiding Addr. Translation
Virtually Addressed Caches
3. Pipeline Write Hits
Cache Optimization Summary
Impact of Caches
Cache Coherence
Slide 10
Example of Cache Coherence
Example of Cache Coherence (contd)
Two “Snoopy” Protocols
Notation: Write-Through Cache
Notation: Write-Back Cache
Three-State Write-Invalidate Protocol
Understanding the Protocol
State Diagram of Cache Block (Part 1)
State Diagram of Cache Block (Part 2)
Comparison with Single WB Cache
Correctness of Three-State Protocol
Adding More Bits to Protocols
MESI Protocol
State Diag. of MESI Cache Block (Part 1)
State Diag. of MESI Cache Block (Part 2)
Comparison with Three-State Protocol
Comments on Write-Invalidate Protocols

Slide 2: Review: Improving Cache Performance
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.

Slide 3: 1. Fast Hit Times via Small, Simple Caches
Simple caches can be faster:
- cache hit time is increasingly a bottleneck to CPU performance
- set associativity requires complex tag matching => slower
- direct-mapped caches are simpler => faster => shorter CPU cycle times (see the sketch below)
  - tag check can be overlapped with transmission of data
Smaller caches can be faster:
- can fit on the same chip as the CPU => avoid the penalty of going off-chip
- for L2 caches, a compromise: keep tags on chip, and data off chip
  - fast tag check, yet greater cache capacity
- L1 data cache reduced from 16 KB in the Pentium III to 8 KB in the Pentium IV
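To see why the direct-mapped hit path is so simple, here is a minimal C sketch (not from the slides) of a direct-mapped lookup: the address is split into offset, index, and tag, and a hit requires exactly one tag comparison. The cache size, block size, and 32-bit addresses are assumed values chosen only for illustration.

```c
/* Minimal sketch of a direct-mapped cache lookup.
 * Sizes are illustrative assumptions: 16 KB cache, 32-byte blocks, 32-bit addresses. */
#include <stdint.h>
#include <stdbool.h>

#define BLOCK_SIZE  32                        /* bytes per block (assumed)  */
#define CACHE_SIZE  (16 * 1024)               /* total data bytes (assumed) */
#define NUM_BLOCKS  (CACHE_SIZE / BLOCK_SIZE) /* 512 lines                  */

struct cache_line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

static struct cache_line cache[NUM_BLOCKS];

/* Returns true on a hit and copies the requested byte into *out.
 * The index selects exactly one line (no search over ways), and the tag
 * is the only comparison needed; in hardware the data read can proceed
 * in parallel with this tag check, which is the slide's point. */
bool lookup(uint32_t addr, uint8_t *out)
{
    uint32_t offset = addr % BLOCK_SIZE;
    uint32_t index  = (addr / BLOCK_SIZE) % NUM_BLOCKS;
    uint32_t tag    = addr / (BLOCK_SIZE * NUM_BLOCKS);

    struct cache_line *line = &cache[index];
    if (line->valid && line->tag == tag) {
        *out = line->data[offset];
        return true;
    }
    return false;   /* miss: go to the next level of the hierarchy */
}
```

A set-associative lookup would instead compare the tag against every way of the selected set and then multiplex among the ways; that extra matching and selection logic is the added hit-time cost the slide refers to.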
Physical CacheBenefits: avoid translation from virtual to real address; saves timeBenefits: avoid translation from virtual to real address; saves timeProblems:Problems:Every time process is switched logically must flush the cache; otherwise get Every time process is switched logically must flush the cache; otherwise get false hitsfalse hits–Cost is time to flush + “compulsory” misses from empty cacheCost is time to flush + “compulsory” misses from empty cacheDealing with aliases (sometimes called synonyms); Dealing with aliases (sometimes called synonyms); Two different virtual addresses map to same physical addressTwo different virtual addresses map to same physical addressI/O must interact with cache, so need mapping to virtual addressI/O must interact with cache, so need mapping to virtual addressSome Solutions partially address these issuesSome Solutions partially address these issuesHW guarantee: each cache frame holds unique physical addressHW guarantee: each cache frame holds unique physical addressSW guarantee: lower n bits must have same address; as long as SW guarantee: lower n bits must have same address; as long as covers index field & direct mapped, they must be unique;covers index field & direct mapped, they must be unique;called called page coloringpage coloringSolution to cache flushSolution to cache flushAdd process identifier tag that identifies process as well as address Add process identifier tag that identifies process as well as address within process: can’t get a hit if wrong processwithin process: can’t get a hit if wrong process5Virtually Addressed CachesVirtually Addressed CachesCPUTLBCacheMEMVAPAPAConventionalOrganizationCPUCacheTLBMEMVAVAPAVirtually Addressed CacheTranslate only on missVATags63. Pipeline Write Hits3. Pipeline Write HitsWrite Hits take slightly longer than Read Hits:Write Hits take slightly longer than Read Hits:cannot parallelize tag matching with data transfercannot parallelize tag matching with data transfermust match tags before data is written!must match tags before data is written!Summary of Key Idea:Summary of Key Idea:pipeline the writespipeline the writescheck tag first; if match, let CPU resumecheck tag first; if match, let CPU resumelet the actual write take its timelet the actual write take its time7TechniqueTechniqueMRMRMPMPHTHT Complexity ComplexityLarger Block SizeLarger Block Size++––00Higher AssociativityHigher Associativity++––11Victim CachesVictim Caches++22Pseudo-Associative Caches Pseudo-Associative Caches ++22HW Prefetching of Instr/DataHW Prefetching of Instr/Data++22Compiler Controlled PrefetchingCompiler Controlled Prefetching++33Compiler Reduce MissesCompiler Reduce Misses++00Priority to Read MissesPriority to Read Misses++11Subblock Placement Subblock Placement ++++11Early Restart & Critical Word 1st Early Restart & Critical Word 1st ++22Non-Blocking CachesNon-Blocking Caches++33Second Level CachesSecond Level Caches++22Small & Simple CachesSmall & Simple Caches––++00Avoiding Address TranslationAvoiding Address Translation++22Cache Optimization SummaryCache Optimization Summary8Impact of CachesImpact of Caches1960-1985: Speed 1960-1985: Speed = ƒ(no. operations)= ƒ(no. operations)19971997Pipelined Pipelined Execution & Execution & Fast Clock RateFast Clock RateOut-of-Order Out-of-Order completioncompletionSuperscalar Superscalar Instruction IssueInstruction Issue1999: Speed = 1999: Speed = ƒ(non-cached memory accesses)ƒ(non-cached memory accesses)Has impact on:Has impact on:Compilers,

