Unformatted text preview:

Cache OrganizationTopicsTopics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on programmingSystems I2Cache VocabularyCapacityCapacityCache block (Cache block (aka aka cache line)cache line)AssociativityAssociativityCache setCache setIndexIndexTagTagHit rateHit rateMiss rateMiss rateReplacement policyReplacement policy3General Org of a Cache Memory•!•!• B–110•!•!• B–110validvalidtagtagset 0:B = 2b bytesper cache blockE lines per setS = 2s setst tag bitsper line1 valid bitper lineCache size: C = B x E x S data bytes•!•!••!•!• B–110•!•!• B–110validvalidtagtagset 1:•!•!••!•!• B–110•!•!• B–110validvalidtagtagset S-1:•!•!••!•!•Cache is an arrayof sets.Each set containsone or more lines.Each line holds ablock of data.4Addressing Cachest bits s bitsb bits0m-1<tag> <set index> <block offset>Address A:•!•!• B–110•!•!• B–110vvtagtagset 0:•!•!••!•!• B–110•!•!• B–110vvtagtagset 1:•!•!••!•!• B–110•!•!• B–110vvtagtagset S-1:•!•!••!•!•The word at address A is in the cache ifthe tag bits in one of the <valid> lines in set <set index> match <tag>.The word contents begin at offset <block offset> bytes from the beginning of the block.5Direct-Mapped CacheSimplest kind of cacheSimplest kind of cacheCharacterized by exactly one line per set.Characterized by exactly one line per set.validvalidvalidtagtagtag•!•!•set 0:set 1:set S-1:E=1 lines per setcache blockcache blockcache block6Accessing Direct-Mapped CachesSet selectionSet selection Use the set index bits to determine the set of interest.validvalidvalidtagtagtag•!•!•set 0:set 1:set S-1:t bits s bits0 0 0 0 10m-1b bitstag set index block offsetselected setcache blockcache blockcache block7Accessing Direct-Mapped CachesLine matching and word selectionLine matching and word selection Line matching: Find a valid line in the selected set with amatching tag Word selection: Then extract the word1t bits s bits100i01100m-1b bitstag set index block offsetselected set (i):(3) If (1) and (2), then cache hit,and block offset selectsstarting byte. =1?(1) The valid bit must be set= ?(2) The tag bits in the cacheline must match thetag bits in the address0110 w3w0w1w230 1 2 74 5 68Direct-Mapped Cache SimulationM=16 byte addresses, B=2 bytes/block,S=4 sets, E=1 entry/setAddress trace (reads):0 [00002], 1 [00012], 13 [11012], 8 [10002], 0 [00002]xt=1 s=2 b=1xx x1 0 m[1] m[0]v tag data0 [00002] (miss)(1)1 0 m[1] m[0]v tag data1 1 m[13] m[12]13 [11012] (miss)(3)1 1 m[9] m[8]v tag data8 [10002] (miss)(4)1 0 m[1] m[0]v tag data1 1 m[13] m[12]0 [00002] (miss)(5)0 M[0-1]11 M[12-13]11 M[8-9]11 M[12-13]10 M[0-1]11 M[12-13]10 M[0-1]19Why Use Middle Bits as Index?High-Order Bit IndexingHigh-Order Bit Indexing Adjacent memory lines would mapto same cache entry Poor use of spatial localityMiddle-Order Bit IndexingMiddle-Order Bit Indexing Consecutive memory lines map todifferent cache lines Can hold C-byte region of addressspace in cache at one time4-line CacheHigh-OrderBit IndexingMiddle-OrderBit Indexing000110110000000100100011010001010110011110001001101010111100110111101111000000010010001101000101011001111000100110101011110011011110111110Set Associative CachesCharacterized by more than one line per setCharacterized by more than one line per setvalid tagset 0:E=2 lines per setset 1:set S-1:•!•!•cache blockvalid tag cache blockvalid tag cache blockvalid tag cache blockvalid tag cache blockvalid tag cache block11Accessing Set Associative CachesSet selectionSet selection identical to direct-mapped cachevalidvalidtagtagset 0:validvalidtagtagset 1:validvalidtagtagset S-1:•!•!•t bits s bits0 0 0 0 10m-1b bitstag set index block offsetSelected setcache blockcache blockcache blockcache blockcache blockcache block12Accessing Set Associative CachesLine matching and word selectionLine matching and word selection must compare the tag in each valid line in the selected set.1 0110 w3w0w1w21 1001t bits s bits100i01100m-1b bitstag set index block offsetselected set (i):=1?(1) The valid bit must be set.= ?(2) The tag bits in oneof the cache lines mustmatch the tag bits inthe address(3) If (1) and (2), thencache hit, and block offset selectsstarting byte.30 1 2 74 5 613Cache Performance MetricsMiss RateMiss Rate Fraction of memory references not found in cache(misses/references) Typical numbers: 3-10% for L1 can be quite small (e.g., < 1%) for L2, depending on size, etc.Hit TimeHit Time Time to deliver a line in the cache to the processor (includestime to determine whether the line is in the cache) Typical numbers: 1-3 clock cycle for L1 5-12 clock cycles for L2Miss PenaltyMiss Penalty Additional time required because of a miss Typically 100-300 cycles for main memory14Memory System PerformanceAssume 1-level cache, 90% hit rate, 1 cycle hitAssume 1-level cache, 90% hit rate, 1 cycle hittime, 200 cycle miss penaltytime, 200 cycle miss penaltyAMAT = 21 cycles!!! - even though 90% only takeAMAT = 21 cycles!!! - even though 90% only takeone cycleone cycle! Taccess= (1" pmiss)thit+ pmisstmiss! tmiss= thit+ tpenaltyAverage Memory Access Time (AMAT)Average Memory Access Time (AMAT)15! CPI = 1.0 + lp + mp + rpMemory System Performance - IIHow does AMAT affect overall performance?How does AMAT affect overall performance?Recall the CPI equation (pipeline efficiency)Recall the CPI equation (pipeline efficiency) load/use penalty (lp) assumed memory access of 1 cycle Further - we assumed that all load instructions were 1 cycle More realistic AMAT (20+ cycles), really hurts CPI and overallperformance1.981.9821+121+10.30.30.300.30lplpLoad/UseLoad/Use6.616.61Total penaltyTotal penalty0.060.06331.01.00.020.02rprpReturnReturn0.160.16220.40.40.200.20mpmpMispredictMispredict4.414.4121210.70.70.300.30lplpLoadLoadProductProductStallsStallsConditionConditionFrequencyFrequencyInstructionInstructionFrequencyFrequencyNameNameCauseCause16! Taccess= (1" pmiss)thit+ pmisstmiss! tmiss= thit+ tpenaltyMemory System Performance - IIIHow to reduce AMAT?How to reduce AMAT? Reduce miss rate Reduce miss penalty Reduce hit timeThere have been numerous inventions targeting each ofThere have been numerous inventions targeting each ofthesethese17int sumarrayrows(int a[M][N]){ int i, j, sum = 0; for (i = 0; i < M; i++)


View Full Document

UT CS 429H - Cache Organization

Download Cache Organization
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Cache Organization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Cache Organization 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?