COSC 6385 Computer Architecture - Memory Hierarchies (II)
Edgar Gabriel
Fall 2009

Cache Performance
Avg. memory access time = Hit time + Miss rate x Miss penalty
with
– Hit time: time to access a data item that is available in the cache
– Miss rate: ratio of the number of memory accesses leading to a cache miss to the total number of memory accesses
– Miss penalty: time/cycles required to make a data item available in the cache after a miss

Processor Performance
• CPU equation:
  CPU time = (Clock cycles CPU execution + Clock cycles memory stall) x clock cycle time
• Can avg. memory access time really be 'mapped' to CPU time?
  – Not all memory stall cycles are due to cache misses
    • We ignore that on the following slides
  – Depends on the processor architecture
    • In-order vs. out-of-order execution
    • For out-of-order processors, only the 'visible' (non-overlapped) portion of the miss penalty counts:
      Memory stall cycles/instruction = Misses/instruction x (total miss latency – overlapped miss latency)

Reducing cache miss penalty
• Five techniques
  – Multilevel caches
  – Critical word first and early restart
  – Giving priority to read misses over writes
  – Merging write buffer
  – Victim caches

Multilevel caches (I)
• Dilemma: should the cache be fast or should it be large?
• Compromise: multi-level caches
  – 1st level small, but at the speed of the CPU
  – 2nd level larger but slower
Avg. memory access time = Hit time L1 + Miss rate L1 x Miss penalty L1
and
Miss penalty L1 = Hit time L2 + Miss rate L2 x Miss penalty L2

Multilevel caches (II)
• Local miss rate: ratio of the number of misses in a cache to the total number of accesses to that cache
• Global miss rate: ratio of the number of misses in a cache to the total number of memory accesses generated by the CPU
  – 1st level cache: global miss rate = local miss rate
  – 2nd level cache: global miss rate = Miss rate L1 x Miss rate L2
• Design decisions for the 2nd level cache:
  1. Direct mapped or n-way set associative?
  2. Size of the 2nd level cache?
• Assumptions in order to decide question 1:
  – Hit time L2 cache:
    • Direct mapped cache: 10 clock cycles
    • 2-way set associative cache: 10.1 clock cycles
  – Local miss rate L2:
    • Direct mapped cache: 25%
    • 2-way set associative: 20%
  – Miss penalty L2 cache: 100 clock cycles
• Miss penalty with direct mapped L2 = 10 + 0.25 x 100 = 35 clock cycles
• Miss penalty with 2-way assoc. L2 = 10.1 + 0.2 x 100 = 30.1 clock cycles
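A small sketch (not from the slides) that reproduces the computation above in C; the function and parameter names are chosen for illustration only:

#include <stdio.h>

/* L1 miss penalty when backed by an L2 cache (formula from the slides):
 *   Miss penalty L1 = Hit time L2 + Local miss rate L2 x Miss penalty L2 */
static double l1_miss_penalty(double hit_time_l2, double local_miss_rate_l2,
                              double miss_penalty_l2)
{
    return hit_time_l2 + local_miss_rate_l2 * miss_penalty_l2;
}

int main(void)
{
    /* Numbers from the example: direct-mapped vs. 2-way set-associative L2 */
    double dm  = l1_miss_penalty(10.0, 0.25, 100.0);  /* 10   + 0.25 x 100 = 35   */
    double sa2 = l1_miss_penalty(10.1, 0.20, 100.0);  /* 10.1 + 0.20 x 100 = 30.1 */
    printf("direct mapped L2 : %.1f cycles\n", dm);
    printf("2-way assoc. L2  : %.1f cycles\n", sa2);
    return 0;
}

Despite its slightly higher hit time, the 2-way set-associative L2 wins here because its lower local miss rate avoids more of the expensive 100-cycle accesses to the next memory level.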
Multilevel caches (III)
• Multilevel inclusion: the 2nd level cache includes all data items which are in the 1st level cache
  – Applied if the size of the 2nd level cache >> size of the 1st level cache
• Multilevel exclusion: data in the L1 cache is never in the L2 cache
  – Applied if the 2nd level cache is only slightly bigger than the 1st level cache
  – A cache miss in L1 often leads to a swap of an L1 block with an L2 block

Critical word first and early restart
• In case of a cache miss, an entire cache block has to be loaded from memory
• Idea: don't wait until the entire cache block has been loaded, focus on the required data item
  – Critical word first:
    • Ask memory for the required data item first
    • Forward the data item to the processor
    • Fill up the rest of the cache block afterwards
  – Early restart:
    • Fetch the words of a cache block in normal order
    • Forward the requested data item to the processor as soon as it is available
    • Fill up the rest of the cache block afterwards

Giving priority to read misses over writes
• Write-through caches use a write buffer to speed up write operations
• The write buffer might contain a value required by a subsequent load operation
• Two possibilities for ensuring consistency:
  – A read resulting in a cache miss has to wait until the write buffer is empty
  – Check the contents of the write buffer and take the data item from the write buffer if it is available
• A similar technique is used in case of a cache-line replacement for n-way set associative caches

Merging write buffers
• Check in the write buffer whether multiple entries can be merged into a single one, e.g. four one-word writes to the sequential addresses 100, 108, 116 and 124:

  Without merging (one valid word per entry):
    Address   V  word 0     V  word 1     V  word 2     V  word 3
    100       1  Mem[100]   0             0             0
    108       1  Mem[108]   0             0             0
    116       1  Mem[116]   0             0             0
    124       1  Mem[124]   0             0             0

  With merging (all four words share one entry):
    100       1  Mem[100]   1  Mem[108]   1  Mem[116]   1  Mem[124]
    -         0             0             0             0
    -         0             0             0             0
    -         0             0             0             0
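A rough sketch of the merging idea in C (my own illustration, not code from the slides): each write-buffer entry covers one block of four 8-byte words, and a new write is merged into an existing entry when it falls into the same block. Unlike the figure above, this sketch aligns entries to block boundaries, which is an assumption on my part.

#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <stdio.h>

#define WORDS_PER_ENTRY 4                        /* assumed entry geometry */
#define WORD_SIZE       8
#define ENTRY_BYTES     (WORDS_PER_ENTRY * WORD_SIZE)
#define NUM_ENTRIES     4

typedef struct {
    bool     used;                       /* entry holds at least one word */
    uint64_t base;                       /* block-aligned start address   */
    bool     valid[WORDS_PER_ENTRY];     /* one valid bit per word        */
    uint64_t data [WORDS_PER_ENTRY];
} wbuf_entry;

static wbuf_entry wbuf[NUM_ENTRIES];

/* Insert a write; merge it into an existing entry if the address falls into
 * the same block, otherwise allocate a new entry. Returns false if full.   */
static bool wbuf_write(uint64_t addr, uint64_t value)
{
    uint64_t base = addr & ~(uint64_t)(ENTRY_BYTES - 1);
    int      word = (int)((addr & (ENTRY_BYTES - 1)) / WORD_SIZE);

    for (int i = 0; i < NUM_ENTRIES; i++) {       /* try to merge first    */
        if (wbuf[i].used && wbuf[i].base == base) {
            wbuf[i].valid[word] = true;
            wbuf[i].data [word] = value;
            return true;
        }
    }
    for (int i = 0; i < NUM_ENTRIES; i++) {       /* otherwise allocate    */
        if (!wbuf[i].used) {
            memset(&wbuf[i], 0, sizeof wbuf[i]);
            wbuf[i].used        = true;
            wbuf[i].base        = base;
            wbuf[i].valid[word] = true;
            wbuf[i].data [word] = value;
            return true;
        }
    }
    return false;                    /* buffer full: the CPU would stall   */
}

int main(void)
{
    /* the four writes from the example above all merge into one entry */
    uint64_t addrs[] = { 100, 108, 116, 124 };
    for (int i = 0; i < 4; i++)
        wbuf_write(addrs[i], (uint64_t)i);

    for (int i = 0; i < NUM_ENTRIES; i++)
        if (wbuf[i].used)
            printf("entry %d: base %llu, valid words %d %d %d %d\n", i,
                   (unsigned long long)wbuf[i].base,
                   wbuf[i].valid[0], wbuf[i].valid[1],
                   wbuf[i].valid[2], wbuf[i].valid[3]);
    return 0;
}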
Victim caches
• Question: how often is a cache block which has just been replaced by another cache block required again soon afterwards?
• Victim cache: fully associative cache between the 'real' cache and the memory, keeping blocks that have been discarded from the cache
  – Typically very small

Reducing miss rate
• Three categories of cache misses
  – Compulsory misses: the first access to a block can never be in the cache (cold start misses)
  – Capacity misses: the cache cannot contain all blocks required for the execution
    -> increase cache size
  – Conflict misses: a cache block has to be discarded because of the block replacement strategy
    -> increase cache size and/or associativity

Reducing miss rate (II)
• Five techniques to reduce the miss rate
  – Larger cache block size
  – Larger caches
  – Higher associativity
  – Way prediction and pseudo-associative caches
  – Compiler optimization

Larger block size
• A larger block size reduces compulsory misses
• Assuming that the cache size is constant, a larger block size also reduces the number of blocks
  – Increases conflict misses

Larger caches
• Reduce capacity misses
• Might increase hit time (e.g. if implemented as off-chip caches)
• Cost limitations

Higher Associativity

Way Prediction and Pseudo-associative caches
• Way prediction
  – Add a prediction bit to n-way set associative caches in order to predict which of the cache blocks in a set will be used
    • The predicted block is checked first on a data request
    • If the prediction was wrong, check the other entries
    • Speeds up the initial lookup if the prediction is correct
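To illustrate the lookup sequence just described, here is a minimal sketch in C under assumptions of mine (a 2-way set-associative cache with one predicted way per set; all names are hypothetical):

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_SETS 256
#define NUM_WAYS 2

typedef struct {
    bool     valid;
    uint64_t tag;
} cache_line;

static cache_line cache[NUM_SETS][NUM_WAYS];
static uint8_t    predicted_way[NUM_SETS];    /* one prediction per set */

/* Returns true on a hit. The predicted way is probed first (fast path);
 * only on a misprediction are the remaining ways checked, and the
 * predictor is updated to point at the way that actually hit.          */
static bool cache_lookup(uint32_t set, uint64_t tag)
{
    uint8_t p = predicted_way[set];
    if (cache[set][p].valid && cache[set][p].tag == tag)
        return true;                          /* fast hit in predicted way */

    for (uint8_t w = 0; w < NUM_WAYS; w++) {  /* slower path: other ways   */
        if (w == p) continue;
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            predicted_way[set] = w;           /* train the predictor       */
            return true;                      /* slow hit                  */
        }
    }
    return false;                             /* miss                      */
}

int main(void)
{
    /* place a block with tag 0x42 in set 5, way 1, and look it up twice */
    cache[5][1].valid = true;
    cache[5][1].tag   = 0x42;

    bool first  = cache_lookup(5, 0x42);   /* slow hit, trains the predictor */
    bool second = cache_lookup(5, 0x42);   /* now hits in the predicted way  */
    printf("hit: %d %d, predicted way for set 5: %d\n",
           first, second, predicted_way[5]);
    return 0;
}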

