Harvey Mudd CS 105 - Cache Memories

Outline
  Cache Memories
  New Topic: Cache
  Slide 3
  Inserting an L1 Cache Between the CPU and Main Memory
  General Org of a Cache Memory
  Addressing Caches
  Direct-Mapped Cache
  Accessing Direct-Mapped Caches
  Slide 9
  Direct-Mapped Cache Simulation
  Why Use Middle Bits as Index?
  Set-Associative Caches
  Accessing Set Associative Caches
  Slide 14
  Multi-Level Caches
  Intel Pentium Cache Hierarchy
  Cache Performance Metrics
  Write Strategies
  Writing Cache-Friendly Code
  The Memory Mountain
  Memory Mountain Test Function
  Memory Mountain Main Routine
  Slide 23
  Ridges of Temporal Locality
  A Slope of Spatial Locality
  Matrix-Multiplication Example
  Miss-Rate Analysis for Matrix Multiply
  Layout of C Arrays in Memory (review)
  Matrix Multiplication (ijk)
  Matrix Multiplication (jik)
  Matrix Multiplication (kij)
  Matrix Multiplication (ikj)
  Matrix Multiplication (jki)
  Matrix Multiplication (kji)
  Summary of Matrix Multiplication
  Pentium Matrix Multiply Performance
  Improving Temporal Locality by Blocking
  Blocked Matrix Multiply (bijk)
  Blocked Matrix Multiply Analysis
  Pentium Blocked Matrix Multiply Performance
  Concluding Observations

Cache Memories
  Topics
    Generic cache memory organization
    Direct mapped caches
    Set associative caches
    Impact of caches on performance
  cache.ppt
  CS 105: Tour of the Black Holes of Computing

New Topic: Cache
  Buffer, between processor and memory
    Often several levels of caches
  Small but fast
    Old values will be removed from cache to make space for new values
  Capitalizes on spatial locality and temporal locality
    Spatial locality: If a value is used, nearby values are likely to be used.
    Temporal locality: If a value is used, it is likely to be used again soon.
  Parameters vary by system; unknown to programmer
    "Cache friendly" code

Cache Memories
  Cache memories are small, fast SRAM-based memories managed automatically in hardware.
    Hold frequently accessed blocks of main memory
  CPU looks first for data in L1, then in L2, then in main memory.
  Typical bus structure:
    [Figure labels: CPU chip, register file, ALU, L1 cache, bus interface, cache bus, L2 cache, system bus, I/O bridge, memory bus, main memory]

Inserting an L1 Cache Between the CPU and Main Memory
  The tiny, very fast CPU register file has room for four 4-byte words.
  The transfer unit between the CPU register file and the cache is a 4-byte block.
  The small fast L1 cache has room for two 4-word blocks (line 0 and line 1).
  The transfer unit between the cache and main memory is a 4-word block (16 bytes).
  The big slow main memory has room for many 4-word blocks.
    [Figure: main memory holding block 10 (a b c d), block 21 (p q r s), and block 30 (w x y z)]

General Org of a Cache Memory
  Cache is an array of sets.
  Each set contains one or more lines.
  Each line holds a block of data.
  Parameters:
    S = 2^s sets (set 0, set 1, ..., set S-1)
    E lines per set
    B = 2^b bytes per cache block (bytes 0, 1, ..., B-1)
    t tag bits per line
    1 valid bit per line
  Cache size: C = B x E x S data bytes

Addressing Caches
  Address A (bits m-1 down to 0):
    <tag> (t bits)  <set index> (s bits)  <block offset> (b bits)
  The word at address A is in the cache if the tag bits in one of the <valid> lines in set <set index> match <tag>.
  The word contents begin at offset <block offset> bytes from the beginning of the block.

Direct-Mapped Cache
  Simplest kind of cache
  Characterized by exactly one line per set (E = 1).
    [Figure: sets 0 through S-1, each holding one line with a valid bit, a tag, and a cache block]

Accessing Direct-Mapped Caches
  Set selection
    Use the set index bits to determine the set of interest.
    [Figure: an address with t tag bits, s set index bits, and b block offset bits; set index 00001 selects set 1]

Accessing Direct-Mapped Caches
  Line matching and word selection
    Line matching: Find a valid line in the selected set with a matching tag.
    Word selection: Then extract the word.
  (1) The valid bit must be set.
  (2) The tag bits in the cache line must match the tag bits in the address.
  (3) If (1) and (2), then cache hit, and block offset selects starting byte.
    [Figure: selected set (i) with valid bit 1 and tag 0110, holding bytes 0 through 7 with words w0 through w3; the address has tag 0110, set index i, and block offset 100]

Direct-Mapped Cache Simulation
  M = 16 byte addresses, B = 2 bytes/block, S = 4 sets, E = 1 entry/set
  Address bits: t = 1, s = 2, b = 1
  Address trace (reads): 0 [0000₂], 1 [0001₂], 13 [1101₂], 8 [1000₂], 0 [0000₂]

  Step  Address      Result  Cache contents after access (v, tag, data)
  (1)   0 [0000₂]    miss    set 0: 1, 0, M[0-1]
  (2)   1 [0001₂]    hit     set 0: 1, 0, M[0-1]
  (3)   13 [1101₂]   miss    set 0: 1, 0, M[0-1]; set 2: 1, 1, M[12-13]
  (4)   8 [1000₂]    miss    set 0: 1, 1, M[8-9]; set 2: 1, 1, M[12-13]
  (5)   0 [0000₂]    miss    set 0: 1, 0, M[0-1]; set 2: 1, 1, M[12-13]

Why Use Middle Bits as Index?
  High-Order Bit Indexing
    Adjacent memory lines would map to same cache entry
    Poor use of spatial locality
  Middle-Order Bit Indexing
    Consecutive memory lines map to different cache lines
    Can hold C-byte region of address space in cache at one time
  [Figure: a 4-line cache and the 16 addresses 0000 through 1111, shown under high-order bit indexing and under middle-order bit indexing]

Set-Associative Caches
  Characterized by more than one line per set
    [Figure, cut off in the preview: set 0 with valid and tag fields; E=2 lines
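
The Direct-Mapped Cache Simulation slide above can be replayed in a few lines of code. The following is a minimal sketch in Python, not code from the course: the names split_address and simulate_direct_mapped are made up for illustration, and the cache is modeled only as a valid bit and a tag per set (no data is stored). It assumes the slide's parameters, s = 2 set index bits and b = 1 block offset bit, with the remaining high-order bits used as the tag.

```python
def split_address(addr, s_bits, b_bits):
    """Split an address into (tag, set index, block offset).

    Layout follows the slides: <tag> <set index> <block offset>,
    with the block offset in the low-order bits.
    """
    offset = addr & ((1 << b_bits) - 1)
    set_index = (addr >> b_bits) & ((1 << s_bits) - 1)
    tag = addr >> (s_bits + b_bits)
    return tag, set_index, offset


def simulate_direct_mapped(trace, s_bits, b_bits):
    """Return "hit" or "miss" for each read in trace (E = 1 line per set)."""
    num_sets = 1 << s_bits
    valid = [False] * num_sets   # one valid bit per line
    tags = [0] * num_sets        # one tag per line
    results = []
    for addr in trace:
        tag, set_index, _ = split_address(addr, s_bits, b_bits)
        if valid[set_index] and tags[set_index] == tag:
            results.append("hit")
        else:
            # Miss: load the block, replacing whatever the set's only line held.
            results.append("miss")
            valid[set_index] = True
            tags[set_index] = tag
    return results


if __name__ == "__main__":
    # Address trace from the slide (hypothetical driver, for illustration).
    trace = [0, 1, 13, 8, 0]
    outcomes = simulate_direct_mapped(trace, s_bits=2, b_bits=1)
    for addr, outcome in zip(trace, outcomes):
        print(f"{addr:2d} [{addr:04b}] {outcome}")
```

With the slide's trace, addresses 0 and 8 share set 0 but carry different tags, so each evicts the other. That is why the final read of address 0 misses even though it was loaded at the start.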

