CORNELL CS 3410 - Study Notes


Outline: Slide 1; Announcements; Goals for Today: caches; Cache Performance; Misses; Avoiding Misses; Slide 7; Misses; Avoiding Misses; Three common designs; A Simple Fully Associative Cache; Fully Associative Cache (Reading); Fully Associative Cache Size; Slide 14; Slide 15; Misses; Summary; Cache Tradeoffs; Slide 19; Compromise; 2-Way Set Associative Cache; 3-Way Set Associative Cache (Reading); A Simple 2-Way Set Associative Cache; Comparing Caches; Remaining Issues; Eviction; Slide 27; Performance Comparison; Cache Design; A Real Example; A Real Example; Basic Cache Organization; Experimental Results; Tradeoffs; Slide 35; Cached Write Policies; Write Allocation Policies; A Simple 2-Way Set Associative Cache; How Many Memory References?; A Simple 2-Way Set Associative Cache; How Many Memory References?; Write-Back Meta-Data; Performance: An Example; Performance: An Example; Performance Tradeoffs; Write Buffering; Write Buffering; Write-through vs. Write-back; Cache-coherency; Slide 50; Cache Conscious Programming; Cache Conscious Programming; Summary; Summary

Caches
Hakim Weatherspoon
CS 3410, Spring 2011
Computer Science
Cornell University
See P&H 5.2 (writes), 5.3, 5.5

Announcements
HW3 is available, due next Tuesday
• HW3 has been updated. Use the updated version.
• Work alone
• Be responsible with new knowledge
Use your resources
• FAQ, class notes, book, Sections, office hours, newsgroup, CSUGLab
Next six weeks
• Two homeworks and two projects
• Optional prelim1 has been graded
• Prelim2 will be Thursday, April 28th
• PA4 will be the final project (no final exam)

Goals for Today: caches
Caches vs memory vs tertiary storage
• Tradeoffs: big & slow vs small & fast
  - Best of both worlds
• working set: 90/10 rule
• How to predict future: temporal & spatial locality
Cache organization, parameters and tradeoffs
associativity, line size, hit cost, miss penalty, hit rate
• Fully Associative → higher hit cost, higher hit rate
• Larger block size → lower hit cost, higher miss penalty

Cache Performance
Cache Performance (very simplified):
L1 (SRAM): 512 x 64 byte cache lines, direct mapped
  Data cost: 3 cycles per word access
  Lookup cost: 2 cycles
Mem (DRAM): 4GB
  Data cost: 50 cycles per word, plus 3 cycles per consecutive word
Performance depends on:
  Access time for hit, miss penalty, hit rate

Misses
Cache misses: classification
The line is being referenced for the first time
• Cold (aka Compulsory) Miss
The line was in the cache, but has been evicted

Avoiding Misses
Q: How to avoid…
Cold Misses
• Unavoidable? The data was never in the cache…
• Prefetching!
Other Misses
• Buy more SRAM
• Use a more flexible cache design

Bigger cache doesn’t always help…
Mem access trace: 0, 16, 1, 17, 2, 18, 3, 19, 4, …
Hit rate with four direct-mapped 2-byte cache lines?
With eight 2-byte cache lines?
With four 4-byte cache lines?
[Figure: memory addresses 0 through 21]

Misses
Cache misses: classification
The line is being referenced for the first time
• Cold (aka Compulsory) Miss
The line was in the cache, but has been evicted…
… because of some other access with the same index
• Conflict Miss
… because the cache is too small
• i.e. the working set of the program is larger than the cache
• Capacity Miss

Avoiding Misses
Q: How to avoid…
Cold Misses
• Unavoidable? The data was never in the cache…
• Prefetching!
Capacity Misses
• Buy more SRAM
Conflict Misses
• Use a more flexible cache design

Three common designs
A given data block can be placed…
• … in any cache line → Fully Associative
• … in exactly one cache line → Direct Mapped
• … in a small set of cache lines → Set Associative

A Simple Fully Associative Cache
[Figure: Processor, Fully Associative Cache, Memory]
Using byte addresses in this example! Addr Bus = 5 bits
Instruction trace:
  lb $1 ← M[ 1 ]
  lb $2 ← M[ 13 ]
  lb $3 ← M[ 0 ]
  lb $3 ← M[ 6 ]
  lb $2 ← M[ 5 ]
  lb $2 ← M[ 6 ]
  lb $2 ← M[ 10 ]
  lb $2 ← M[ 12 ]
Processor registers: $1, $2, $3, $4
Cache line fields: V, tag, data
Memory contents:
  Addr  Value
  0     101
  1     103
  2     107
  3     109
  4     113
  5     127
  6     131
  7     137
  8     139
  9     149
  10    151
  11    157
  12    163
  13    167
  14    173
  15    179
  16    181
Hits:    Misses:
A =

Fully Associative Cache (Reading)
[Figure: the address is split into Tag and Offset; each cache line holds V, Tag, and a 64-byte Block; four tag comparators (=) drive line select and hit?; word select picks 32 bits of data from the selected block]

Fully Associative Cache Size
Address fields: Tag | Offset, with an m bit offset and 2^n cache lines
Q: How big is cache (data only)?
Q: How much SRAM needed (data + overhead)?

Fully-associative reduces conflict misses...
… assuming good eviction strategy
Mem access trace: 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, …
Hit rate with four fully-associative 2-byte cache lines?
[Figure: memory addresses 0 through 21]

… but large block size can still reduce hit rate
vector add trace: 0, 100, 200, 1, 101, 201, 2, 202, …
Hit rate with four fully-associative 2-byte cache lines?
With two fully-associative 4-byte cache lines?

Misses
Cache misses: classification
Cold (aka Compulsory)
• The line is being referenced for the first time
Capacity
• The line was evicted because the cache was too small
• i.e. the working set of the program is larger than the cache
Conflict
• The line was evicted because of another access whose index conflicted

Summary
Caching assumptions
• small working set: 90/10 rule
• can predict future: spatial & temporal locality
Benefits
• big & fast memory built from (big & slow) + (small & fast)
Tradeoffs: associativity, line size, hit cost, miss penalty, hit rate
• Fully Associative → higher hit cost, higher hit rate
• Larger block size → lower hit cost, higher miss penalty
Next up: other designs; writing to caches

Cache Tradeoffs
                        Direct Mapped   Fully Associative
  Tag Size              + Smaller       - Larger
  SRAM Overhead         + Less          - More
  Controller Logic      + Less          - More
  Speed                 + Faster        - Slower
  Price                 + Less          - More
  Scalability           + Very          - Not Very
  # of conflict misses  - Lots          + Zero
  Hit rate              - Low           + High
  Pathological Cases?   - Common        ?

Set Associative Caches

Compromise
Set Associative Cache
• Each block number is mapped to a single cache line set index
• Within the set, the block can go in any line
[Figure: memory addresses 0x000000 through 0x00004c, in 4-byte steps, mapped onto a cache with two sets of three lines each (set 0: lines 0, 1, 2; set 1: lines 3, 4, 5)]

2-Way Set Associative Cache
Set Associative Cache
Like direct mapped cache
• Only need to check a few lines for each access… so: fast, scalable, low overhead
Like a fully associative cache
• Several places each block can go… so: fewer conflict misses, higher hit rate

3-Way Set Associative Cache (Reading)
[Figure: the address is split into Tag, Index, and Offset; three tag comparators (=) drive line select and hit?; word select picks 32 bits of data from the selected 64-byte block]

A Simple 2-Way Set Associative Cache
[Figure: Processor, 2-Way Set Associative Cache, Memory]
Instruction trace:
  lb $1 ← M[ 1 ]
  lb $2 ← M[ 13 ]
  lb $3 ← M[ 0 ]
  lb $3 ← M[ 6 ]
  lb $2 ← M[ 5 ]
  lb $2 ← M[ 6
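
The hit-rate questions posed for the access traces in these slides can be explored with a small simulator. What follows is a minimal sketch, not the course's reference solution. It assumes LRU eviction within each set (the slides only say "good eviction strategy"), truncates each trace to a short finite prefix, and uses a regular vector-add pattern i, 100 + i, 200 + i, which is an assumption about what the slide's trace intends. The names `simulate`, `interleaved`, and `vector_add` are hypothetical.

```python
from collections import OrderedDict


def simulate(trace, num_lines, line_size, ways):
    """Count (hits, misses) for a byte-address trace.

    ways == 1 models a direct-mapped cache; ways == num_lines models a
    fully associative one. Eviction is LRU within a set (an assumption).
    """
    num_sets = num_lines // ways
    # One ordered dict per set: keys are tags, ordered oldest-use first.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in trace:
        block = addr // line_size   # drop the offset within the line
        index = block % num_sets    # set that this block maps to
        tag = block // num_sets     # distinguishes blocks sharing a set
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)  # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:  # set is full: evict the LRU line
                lines.popitem(last=False)
            lines[tag] = None
    return hits, misses


# First eight accesses of the trace 0, 16, 1, 17, 2, 18, 3, 19
interleaved = [a for i in range(4) for a in (i, 16 + i)]
# Assumed regular vector-add pattern: i, 100 + i, 200 + i
vector_add = [a for i in range(4) for a in (i, 100 + i, 200 + i)]

configs = [
    ("four direct-mapped 2-byte lines", interleaved, 4, 2, 1),
    ("eight direct-mapped 2-byte lines", interleaved, 8, 2, 1),
    ("four direct-mapped 4-byte lines", interleaved, 4, 4, 1),
    ("four 2-way set-associative 2-byte lines", interleaved, 4, 2, 2),
    ("four fully associative 2-byte lines", interleaved, 4, 2, 4),
    ("vector add, four fully associative 2-byte lines", vector_add, 4, 2, 4),
    ("vector add, two fully associative 4-byte lines", vector_add, 2, 4, 2),
]
for name, trace, num_lines, line_size, ways in configs:
    hits, misses = simulate(trace, num_lines, line_size, ways)
    print(f"{name}: {hits} hits, {misses} misses")
```

On these short prefixes, every direct-mapped configuration misses on every access because addresses i and 16 + i always share an index, while the associative configurations hit on half of the accesses. For the vector-add pattern, the two larger lines thrash under LRU because three streams compete for two lines.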

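
The two questions on the "Fully Associative Cache Size" slide (data only, and data plus overhead) come down to simple bit counting. The sketch below assumes 32-bit addresses and counts only a tag and one valid bit per line as overhead; dirty bits and replacement metadata are ignored. The function name `cache_sram_bits` and its parameters are hypothetical.

```python
def cache_sram_bits(offset_bits, num_lines, ways, addr_bits=32):
    """Return (data_bits, overhead_bits, tag_bits) for a cache.

    Assumes num_lines and ways are powers of two, and that overhead is
    a tag plus one valid bit per line (no dirty or LRU bits).
    """
    line_bytes = 1 << offset_bits            # 2^m bytes per line
    num_sets = num_lines // ways
    index_bits = num_sets.bit_length() - 1   # log2 of a power of two
    tag_bits = addr_bits - index_bits - offset_bits
    data_bits = num_lines * line_bytes * 8
    overhead_bits = num_lines * (tag_bits + 1)
    return data_bits, overhead_bits, tag_bits


# Fully associative: one set, so no index bits and a (32 - m)-bit tag.
data, overhead, tag = cache_sram_bits(offset_bits=6, num_lines=512, ways=512)
print(f"fully associative: {data // 8} data bytes, "
      f"{overhead} overhead bits, {tag}-bit tag")

# Direct mapped with the same geometry (512 x 64-byte lines).
data, overhead, tag = cache_sram_bits(offset_bits=6, num_lines=512, ways=1)
print(f"direct mapped:     {data // 8} data bytes, "
      f"{overhead} overhead bits, {tag}-bit tag")
```

With an m-bit offset and 2^n lines, the data store is 2^n x 2^m bytes in either design; the fully associative version pays for the longer tag on every line.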

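
The "Cache Performance" slide says performance depends on hit access time, miss penalty, and hit rate. One standard way to combine them is average access time = hit time + miss rate x miss penalty. The sketch below plugs in the slide's cycle counts under several assumptions that the slide does not state: a hit costs lookup plus one word of data (2 + 3 cycles), a miss fills a whole 64-byte line of 4-byte words from DRAM, and the penalty is added on top of the hit time. The function names are hypothetical.

```python
def line_fill_cycles(words_per_line, first_word=50, next_word=3):
    """DRAM cycles to fetch one line: 50 for the first word, 3 per extra word."""
    return first_word + next_word * (words_per_line - 1)


def average_access_cycles(hit_rate, hit_cycles, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty."""
    return hit_cycles + (1.0 - hit_rate) * miss_penalty


WORD_BYTES = 4                      # assumption: 32-bit words
words_per_line = 64 // WORD_BYTES   # 64-byte lines, as on the slide
hit_cycles = 2 + 3                  # assumption: lookup cost + data cost
miss_penalty = line_fill_cycles(words_per_line)

for hit_rate in (0.90, 0.95, 0.99):
    cycles = average_access_cycles(hit_rate, hit_cycles, miss_penalty)
    print(f"hit rate {hit_rate:.2f}: about {cycles:.2f} cycles per access")
```

Under these assumptions the miss penalty is 95 cycles, so even a few percent of misses can dominate the average.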