CORNELL CS 3410 - Caches

This preview shows pages 1-2, 16-18, and 33-34 of the 34-page document.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide outline (34 slides): Slide 1, Announcements, Goals for Today: caches, Performance, Memory Pyramid, Memory Hierarchy, Active vs Inactive Data, Insight of Caches, Locality, Locality, Memory Hierarchy, Cache Lookups (Read), Cache Organization, Three common designs, Direct Mapped Cache, Direct Mapped Cache, Tags and Offsets, A Simple Direct Mapped Cache, Direct Mapped Cache (Reading), Direct Mapped Cache Size, Cache Performance, Misses, Avoiding Misses, Slide 24, Misses, Avoiding Misses, Three common designs, A Simple Fully Associative Cache, Fully Associative Cache (Reading), Fully Associative Cache Size, Slide 31, Slide 32, Misses, Summary

Slide 1: Caches
Hakim Weatherspoon
CS 3410, Spring 2011
Computer Science, Cornell University
See P&H 5.1, 5.2 (except writes)

Slide 2: Announcements
HW3 available, due next Tuesday
• Work alone
• Be responsible with new knowledge
Use your resources
• FAQ, class notes, book, Sections, office hours, newsgroup, CSUGLab
Next six weeks
• Two homeworks and two projects
• Optional prelim1 tomorrow, Wednesday, in Phillips 101
• Prelim2 will be Thursday, April 28th
• PA4 will be the final project (no final exam)

Slide 3: Goals for Today: caches
Caches vs memory vs tertiary storage
• Tradeoffs: big & slow vs small & fast
  - Best of both worlds
• Working set: 90/10 rule
• How to predict the future: temporal & spatial locality
Cache organization, parameters and tradeoffs: associativity, line size, hit cost, miss penalty, hit rate
• Fully associative → higher hit cost, higher hit rate
• Larger block size → lower hit cost, higher miss penalty

Slide 4: Performance
CPU clock rates ~0.2ns – 2ns (5GHz – 500MHz)

Technology     Capacity   $/GB     Latency
Tape           1 TB       $.17     100s of seconds
Disk           2 TB       $.03     millions of cycles (ms)
SSD (Flash)    128 GB     $2       thousands of cycles (us)
DRAM           8 GB       $10      50-300 cycles (10s of ns)
SRAM off-chip  8 MB       $4000    5-15 cycles (few ns)
SRAM on-chip   256 KB     ???      1-3 cycles (ns)

Others: eDRAM aka 1-T SRAM, FeRAM, CD, DVD, ...
Q: Can we create the illusion of cheap + large + fast?

Slide 5: Memory Pyramid
• RegFile (100s of bytes): < 1 cycle access
• L1 Cache (several KB): 1-3 cycle access
• L2 Cache (½-32MB): 5-15 cycle access
• Memory (128MB – few GB): 50-300 cycle access
• Disk (many GB – few TB): 1,000,000+ cycle access
L3 becoming more common (eDRAM?)
Caches usually made of SRAM (or eDRAM)
These are rough numbers: mileage may vary for latest/greatest

Slide 6: Memory Hierarchy
Memory closer to processor
• small & fast
• stores active data
Memory farther from processor
• big & slow
• stores inactive data

Slide 7: Active vs Inactive Data
Assumption: most data is not active.
Q: How to decide what is active?
A: Some committee decides
A: Programmer decides
A: Compiler decides
A: OS decides at run-time
A: Hardware decides at run-time

Slide 8: Insight of Caches
Q: What is "active" data?
If Mem[x] was accessed recently...
• ... then Mem[x] is likely to be accessed soon: exploit temporal locality
• ... then Mem[x ± ε] is likely to be accessed soon: exploit spatial locality

Slide 9: Locality
Memory trace:
0x7c9a2b18 0x7c9a2b19 0x7c9a2b1a 0x7c9a2b1b 0x7c9a2b1c 0x7c9a2b1d 0x7c9a2b1e 0x7c9a2b1f
0x7c9a2b20 0x7c9a2b21 0x7c9a2b22 0x7c9a2b23 0x7c9a2b28 0x7c9a2b2c
0x0040030c 0x00400310 0x7c9a2b04 0x00400314 0x7c9a2b00 0x00400318 0x0040031c ...

int n = 4;
int k[] = { 3, 14, 0, 10 };

int fib(int i) {
    if (i <= 2) return i;
    else return fib(i-1) + fib(i-2);
}

int main(int ac, char **av) {
    for (int i = 0; i < n; i++) {
        printi(fib(k[i]));
        prints("\n");
    }
}

Slide 10: Locality
(figure: plot of accesses across the address space, with tick labels 0x00000000, 0x7c9a2b1f, and 0x00400318)

Slide 11: Memory Hierarchy
Memory closer to processor is fast but small
• usually stores a subset of memory farther away ("strictly inclusive")
• alternatives: strictly exclusive, mostly inclusive
• Transfer whole blocks (cache lines):
  - 4kb: disk ↔ ram
  - 256b: ram ↔ L2
  - 64b: L2 ↔ L1
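The fib program and its memory trace on the Locality slide show both kinds of locality in a real program. As an extra illustration (not part of the original deck), the C sketch below sums the same array two ways: the row-major loop touches consecutive addresses, so every byte of each fetched cache line gets used (spatial locality), while the column-major loop strides N*sizeof(int) bytes between accesses and wastes most of each line. The names N, a, sum_row_major, and sum_col_major are mine.

#include <stdio.h>

#define N 1024

static int a[N][N];

/* Row-major traversal: consecutive iterations touch adjacent addresses,
   so each cache line brought in for a[i][j] is fully used before it is
   evicted (good spatial locality). */
long sum_row_major(void) {
    long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major traversal: consecutive iterations are N*sizeof(int) bytes
   apart, so each access typically lands on a different cache line and the
   rest of the line goes unused (poor spatial locality). */
long sum_col_major(void) {
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void) {
    printf("%ld %ld\n", sum_row_major(), sum_col_major());
    return 0;
}

With 64-byte lines and 4-byte ints, the row-major loop needs a new line only once every 16 accesses, while the column-major loop can need a new line on every access.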
Slide 12: Cache Lookups (Read)
Processor tries to access Mem[x]
Check: is the block containing Mem[x] in the cache?
• Yes: cache hit
  - return requested data from the cache line
• No: cache miss
  - read block from memory (or a lower-level cache)
  - (evict an existing cache line to make room)
  - place new block in cache
  - return requested data, stalling the pipeline while all of this happens

Slide 13: Cache Organization
Cache has to be fast and dense
• Gain speed by performing lookups in parallel
  - but requires die real estate for lookup logic
• Reduce lookup logic by limiting where in the cache a block might be placed
  - but might reduce cache effectiveness
(figure: CPU connected to memory through a cache controller)

Slide 14: Three common designs
A given data block can be placed...
• ... in any cache line → Fully Associative
• ... in exactly one cache line → Direct Mapped
• ... in a small set of cache lines → Set Associative

Slides 15-16: Direct Mapped Cache
• Each block number mapped to a single cache line index
• Simplest hardware
(figures: memory blocks at 0x000000-0x000048 mapped onto a 2-line cache, then onto a 4-line cache)

Slide 17: Tags and Offsets
Assume sixteen 64-byte cache lines
0x7FFF3D4D = 0111 1111 1111 1111 0011 1101 0100 1101
Need meta-data for each cache line:
• valid bit: is the cache line non-empty?
• tag: which block is stored in this line (if valid)
Q: how to check if X is in the cache?
Q: how to clear a cache line?
(a worked address-decomposition sketch appears after the preview text)

Slide 18: A Simple Direct Mapped Cache
Using byte addresses in this example! Addr Bus = 5 bits
(figure: processor with registers $1-$4, a direct mapped cache with V/tag/data fields per line, and memory)
Access sequence:
  lb $1 ← M[ 1 ]
  lb $2 ← M[ 13 ]
  lb $3 ← M[ 0 ]
  lb $3 ← M[ 6 ]
  lb $2 ← M[ 5 ]
  lb $2 ← M[ 6 ]
  lb $2 ← M[ 10 ]
  lb $2 ← M[ 12 ]
Memory contents (address: value):
  0:101  1:103  2:107  3:109  4:113  5:127  6:131  7:137  8:139
  9:149  10:151  11:157  12:163  13:167  14:173  15:179  16:181
Hits: ___   Misses: ___
(a small simulation of this trace appears after the preview text)

Slide 19: Direct Mapped Cache (Reading)
(figure: the address is split into Tag | Index | Offset fields, indexing a cache array whose entries hold V | Tag | Block)

Slide 20: Direct Mapped Cache Size
n-bit index, m-bit offset
Q: How big is the cache (data only)?
Q: How much SRAM is needed (data + overhead)?
(a sizing sketch appears after the preview text)

Slide 21: Cache Performance
Cache Performance (very simplified):
L1 (SRAM): 512 x 64-byte cache lines, direct mapped
  - Data cost: 3 cycles per word access
  - Lookup cost: 2 cycles
Mem (DRAM): 4 GB
  - Data cost: 50 cycles per word, plus 3 cycles per consecutive word
Performance depends on: access time for a hit, miss penalty, hit rate
(a back-of-the-envelope AMAT sketch appears after the preview text)

Slide 22: Misses
Cache misses: classification
• The line is being referenced for the first time
  - Cold (aka Compulsory) Miss
• The line was in the cache, but has been evicted

Slide 23: Avoiding Misses
(the preview text ends here)
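As a companion to the Tags and Offsets slide, here is a minimal C sketch (not from the deck) that splits the slide's example address for a cache with sixteen 64-byte lines: 64-byte lines give 6 offset bits, 16 lines give 4 index bits, and the remaining high bits form the tag (22 bits if addresses are 32 bits wide, which is an assumption).

#include <stdio.h>
#include <stdint.h>

/* Sixteen 64-byte cache lines, as on the "Tags and Offsets" slide:
   offset = log2(64) = 6 bits, index = log2(16) = 4 bits,
   tag    = remaining high bits (22 for an assumed 32-bit address). */
#define OFFSET_BITS 6
#define INDEX_BITS  4

int main(void) {
    uint32_t addr   = 0x7FFF3D4D;  /* example address from the slide */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    /* prints: offset=0xd index=0x5 tag=0x1fffcf */
    printf("offset=0x%x index=0x%x tag=0x%x\n", offset, index, tag);
    return 0;
}

A lookup answers the slide's first question by comparing tag against the tag stored in line index and checking that line's valid bit; clearing a line only requires clearing its valid bit.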


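For the Direct Mapped Cache Size questions, the usual arithmetic is: data capacity = 2^n lines x 2^m bytes per line, and each line additionally stores a tag plus a valid bit. The helper below is a sketch of that calculation; the 32-bit address width and the single valid bit per line are assumptions, since the preview does not state them.

#include <stdio.h>
#include <stdint.h>

/* Size of a direct mapped cache with an n-bit index and m-bit offset.
   Assumes 32-bit addresses and one valid bit of overhead per line. */
void cache_size(unsigned n, unsigned m, unsigned addr_bits) {
    uint64_t lines      = 1ull << n;           /* 2^n cache lines             */
    uint64_t line_bytes = 1ull << m;           /* 2^m bytes of data per line  */
    unsigned tag_bits   = addr_bits - n - m;   /* what is left of the address */

    uint64_t data_bits     = lines * line_bytes * 8;
    uint64_t overhead_bits = lines * (tag_bits + 1);   /* tag + valid bit */

    printf("n=%u, m=%u: %llu bytes of data, %llu bits of SRAM in total\n",
           n, m,
           (unsigned long long)(data_bits / 8),
           (unsigned long long)(data_bits + overhead_bits));
}

int main(void) {
    /* The L1 from the Cache Performance slide: 512 lines (n=9) of 64 bytes (m=6). */
    cache_size(9, 6, 32);
    return 0;
}

For that configuration it reports 32 KB of data plus 512 x (17 + 1) = 9,216 bits of tag/valid overhead.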

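The A Simple Direct Mapped Cache slide is a fill-in exercise; the sketch below replays its load sequence and counts hits and misses. The cache geometry is not given in the preview, so the 4-line, 2-byte-block configuration used here is an assumption chosen to fit the 5-bit address bus.

#include <stdio.h>

/* Replay the access sequence from the "A Simple Direct Mapped Cache"
   exercise.  ASSUMED geometry: 4 lines of 2 bytes each, i.e. 1 offset
   bit and 2 index bits out of the 5-bit byte addresses on the slide. */
#define LINES       4
#define OFFSET_BITS 1
#define INDEX_BITS  2

int main(void) {
    int trace[] = { 1, 13, 0, 6, 5, 6, 10, 12 };  /* lb targets from the slide */
    int valid[LINES] = { 0 };
    int tag[LINES];
    int hits = 0, misses = 0;

    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++) {
        int addr  = trace[i];
        int index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1);
        int t     = addr >> (OFFSET_BITS + INDEX_BITS);

        if (valid[index] && tag[index] == t) {
            hits++;                   /* block already resident in its line */
        } else {
            misses++;                 /* fetch block, evicting whatever was there */
            valid[index] = 1;
            tag[index]   = t;
        }
    }
    printf("Hits: %d  Misses: %d\n", hits, misses);
    return 0;
}

Changing LINES, OFFSET_BITS, and INDEX_BITS to match the geometry used in lecture reproduces the slide's intended answer.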

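Finally, the Cache Performance slide's numbers drop into the usual average-memory-access-time formula, AMAT = hit time + miss rate x miss penalty. The sketch below uses the slide's L1 costs; the 90% hit rate and the reading of the DRAM cost as 50 cycles for the first word plus 3 cycles for each of the remaining 15 words of a 64-byte line are assumptions, not statements from the deck.

#include <stdio.h>

/* Very simplified AMAT, in the spirit of the "Cache Performance" slide:
     AMAT = hit_time + miss_rate * miss_penalty
   L1 hit  = 2-cycle lookup + 3-cycle data access = 5 cycles (slide numbers).
   Refill  = 50 cycles for the first word + 3 cycles per consecutive word,
             assumed to cover the 15 remaining words of a 64-byte line.
   The 90% hit rate is an assumed figure. */
int main(void) {
    double hit_time     = 2.0 + 3.0;
    double miss_penalty = 50.0 + 15.0 * 3.0;
    double hit_rate     = 0.90;

    double amat = hit_time + (1.0 - hit_rate) * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat);   /* 5 + 0.1 * 95 = 14.5 cycles */
    return 0;
}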