UT CS 429H - Cache Organization - D57707

Home> Schools> University of Texas at Austin> Computer Science (CS) > CS 429H> Cache Organization

UT CS 429H - Cache Organization

School name University of Texas at Austin

Course Cs 429h- Computer Organization and Architecture: Honors

Pages 19

Download Save

Unformatted text preview:

Cache OrganizationTopicsTopics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on programmingSystems I2Cache VocabularyCapacityCapacityCache block (Cache block (aka aka cache line)cache line)AssociativityAssociativityCache setCache setIndexIndexTagTagHit rateHit rateMiss rateMiss rateReplacement policyReplacement policy3General Org of a Cache Memory•!•!• B–110•!•!• B–110validvalidtagtagset 0:B = 2b bytesper cache blockE lines per setS = 2s setst tag bitsper line1 valid bitper lineCache size: C = B x E x S data bytes•!•!••!•!• B–110•!•!• B–110validvalidtagtagset 1:•!•!••!•!• B–110•!•!• B–110validvalidtagtagset S-1:•!•!••!•!•Cache is an arrayof sets.Each set containsone or more lines.Each line holds ablock of data.4Addressing Cachest bits s bitsb bits0m-1<tag> <set index> <block offset>Address A:•!•!• B–110•!•!• B–110vvtagtagset 0:•!•!••!•!• B–110•!•!• B–110vvtagtagset 1:•!•!••!•!• B–110•!•!• B–110vvtagtagset S-1:•!•!••!•!•The word at address A is in the cache ifthe tag bits in one of the <valid> lines in set <set index> match <tag>.The word contents begin at offset <block offset> bytes from the beginning of the block.5Direct-Mapped CacheSimplest kind of cacheSimplest kind of cacheCharacterized by exactly one line per set.Characterized by exactly one line per set.validvalidvalidtagtagtag•!•!•set 0:set 1:set S-1:E=1 lines per setcache blockcache blockcache block6Accessing Direct-Mapped CachesSet selectionSet selection Use the set index bits to determine the set of interest.validvalidvalidtagtagtag•!•!•set 0:set 1:set S-1:t bits s bits0 0 0 0 10m-1b bitstag set index block offsetselected setcache blockcache blockcache block7Accessing Direct-Mapped CachesLine matching and word selectionLine matching and word selection Line matching: Find a valid line in the selected set with amatching tag Word selection: Then extract the word1t bits s bits100i01100m-1b bitstag set index block offsetselected set (i):(3) If (1) and (2), then cache hit,and block offset selectsstarting byte. =1?(1) The valid bit must be set= ?(2) The tag bits in the cacheline must match thetag bits in the address0110 w3w0w1w230 1 2 74 5 68Direct-Mapped Cache SimulationM=16 byte addresses, B=2 bytes/block,S=4 sets, E=1 entry/setAddress trace (reads):0 [00002], 1 [00012], 13 [11012], 8 [10002], 0 [00002]xt=1 s=2 b=1xx x1 0 m[1] m[0]v tag data0 [00002] (miss)(1)1 0 m[1] m[0]v tag data1 1 m[13] m[12]13 [11012] (miss)(3)1 1 m[9] m[8]v tag data8 [10002] (miss)(4)1 0 m[1] m[0]v tag data1 1 m[13] m[12]0 [00002] (miss)(5)0 M[0-1]11 M[12-13]11 M[8-9]11 M[12-13]10 M[0-1]11 M[12-13]10 M[0-1]19Why Use Middle Bits as Index?High-Order Bit IndexingHigh-Order Bit Indexing Adjacent memory lines would mapto same cache entry Poor use of spatial localityMiddle-Order Bit IndexingMiddle-Order Bit Indexing Consecutive memory lines map todifferent cache lines Can hold C-byte region of addressspace in cache at one time4-line CacheHigh-OrderBit IndexingMiddle-OrderBit Indexing000110110000000100100011010001010110011110001001101010111100110111101111000000010010001101000101011001111000100110101011110011011110111110Set Associative CachesCharacterized by more than one line per setCharacterized by more than one line per setvalid tagset 0:E=2 lines per setset 1:set S-1:•!•!•cache blockvalid tag cache blockvalid tag cache blockvalid tag cache blockvalid tag cache blockvalid tag cache block11Accessing Set Associative CachesSet selectionSet selection identical to direct-mapped cachevalidvalidtagtagset 0:validvalidtagtagset 1:validvalidtagtagset S-1:•!•!•t bits s bits0 0 0 0 10m-1b bitstag set index block offsetSelected setcache blockcache blockcache blockcache blockcache blockcache block12Accessing Set Associative CachesLine matching and word selectionLine matching and word selection must compare the tag in each valid line in the selected set.1 0110 w3w0w1w21 1001t bits s bits100i01100m-1b bitstag set index block offsetselected set (i):=1?(1) The valid bit must be set.= ?(2) The tag bits in oneof the cache lines mustmatch the tag bits inthe address(3) If (1) and (2), thencache hit, and block offset selectsstarting byte.30 1 2 74 5 613Cache Performance MetricsMiss RateMiss Rate Fraction of memory references not found in cache(misses/references) Typical numbers: 3-10% for L1 can be quite small (e.g., < 1%) for L2, depending on size, etc.Hit TimeHit Time Time to deliver a line in the cache to the processor (includestime to determine whether the line is in the cache) Typical numbers: 1-3 clock cycle for L1 5-12 clock cycles for L2Miss PenaltyMiss Penalty Additional time required because of a miss Typically 100-300 cycles for main memory14Memory System PerformanceAssume 1-level cache, 90% hit rate, 1 cycle hitAssume 1-level cache, 90% hit rate, 1 cycle hittime, 200 cycle miss penaltytime, 200 cycle miss penaltyAMAT = 21 cycles!!! - even though 90% only takeAMAT = 21 cycles!!! - even though 90% only takeone cycleone cycle! Taccess= (1" pmiss)thit+ pmisstmiss! tmiss= thit+ tpenaltyAverage Memory Access Time (AMAT)Average Memory Access Time (AMAT)15! CPI = 1.0 + lp + mp + rpMemory System Performance - IIHow does AMAT affect overall performance?How does AMAT affect overall performance?Recall the CPI equation (pipeline efficiency)Recall the CPI equation (pipeline efficiency) load/use penalty (lp) assumed memory access of 1 cycle Further - we assumed that all load instructions were 1 cycle More realistic AMAT (20+ cycles), really hurts CPI and overallperformance1.981.9821+121+10.30.30.300.30lplpLoad/UseLoad/Use6.616.61Total penaltyTotal penalty0.060.06331.01.00.020.02rprpReturnReturn0.160.16220.40.40.200.20mpmpMispredictMispredict4.414.4121210.70.70.300.30lplpLoadLoadProductProductStallsStallsConditionConditionFrequencyFrequencyInstructionInstructionFrequencyFrequencyNameNameCauseCause16! Taccess= (1" pmiss)thit+ pmisstmiss! tmiss= thit+ tpenaltyMemory System Performance - IIIHow to reduce AMAT?How to reduce AMAT? Reduce miss rate Reduce miss penalty Reduce hit timeThere have been numerous inventions targeting each ofThere have been numerous inventions targeting each ofthesethese17int sumarrayrows(int a[M][N]){ int i, j, sum = 0; for (i = 0; i < M; i++)

View Full Document


School:
Email:
New Password:
Confirm Password:

UT CS 429H - Cache Organization

Sign up for free to view:

Please select your school