CMU CS 15740 - lecture

Contents (full deck): The Memory Hierarchy; Computer System; The Tradeoff; Why is bigger slower?; Alpha 21164 Chip Photo; Alpha 21164 Chip Caches; Locality of Reference; Caching: The Basic Idea; How important are caches?; Accessing Data in Memory Hierarchy; Design Issues for Caches; Direct-Mapped Caches; Indexing into Direct-Mapped Cache; Direct-Mapped Cache Tag Matching; Properties of Direct Mapped Caches; Vector Product Example; Thrashing Example; Thrashing Example: Good Case; Thrashing Example: Bad Case; Set Associative Cache; Indexing into 2-Way Associative Cache; Associative Cache Tag Matching; Two-Way Set Associative Cache Implementation; Fully Associative Cache; Fully Associative Cache Tag Matching; Replacement Algorithms; Implementing RAND and FIFO; Write Policy; Write Policy (Cont.); Write Buffering; Multi-Level Caches; Bandwidth Matching; High Bandwidth Memory Systems; Cache Performance Metrics; Impact of Cache and Block Size; Impact of Associativity; Impact of Replacement Strategy; Impact of Write Strategy; Allocation Strategies; Allocation Strategies (Cont.); Qualitative Cache Performance Model; Interactions Between Program & Cache; Matmult Performance (Alpha 21164); Block Matrix Multiplication; Blocked Matrix Multiply (bijk); Blocked Matrix Multiply Analysis; Blocked matmult perf (Alpha 21164)

The Memory Hierarchy
CS 740, Sept. 29, 2000
Topics:
• The memory hierarchy
• Cache design

Computer System
[Figure: processor with cache, memory, and disks attached to a memory-I/O bus through I/O controllers; display and network devices also hang off I/O controllers and raise interrupts to the processor.]

The Tradeoff
(Numbers are for a 21264 at 700 MHz.)

Level       Size            Speed     $/Mbyte     Block size
registers   608 B           1.4 ns                4 B        (register reference)
L1 cache    128 kB          4.2 ns                4 B        (L1-cache reference)
L2 cache    512 kB - 4 MB   16.8 ns   $90/MB      16 B       (L2-cache reference)
memory      128 MB          112 ns    $2-6/MB     4-8 KB     (memory reference)
disk        27 GB           9 ms      $0.01/MB               (disk memory reference)

Moving down the hierarchy, each level is larger, slower, and cheaper. Data moves between adjacent levels in fixed-size chunks (8 B and 16 B transfers between the upper levels, managed by the caches; 4 KB pages between memory and disk, managed by virtual memory).

Why is bigger slower?
• Physics slows us down.
• Racing the speed of light (3.0x10^8 m/s):
  – clock = 500 MHz
  – How far can a signal travel in one clock cycle?
  – (3.0x10^8 m/s) / (500x10^6 cycles/s) = 0.6 m/cycle
  – For comparison: the 21264 is about 17 mm across.
• Capacitance:
  – Long wires have more capacitance.
  – Either more powerful (bigger) transistors are required, or the signal is slower.
  – Signal propagation delay grows with capacitance.
  – Going "off chip" adds an order of magnitude more capacitance.

Alpha 21164 Chip Photo
(Microprocessor Report, 9/12/94)
Caches: L1 data, L1 instruction, L2 unified, plus L3 off-chip.

Alpha 21164 Chip Caches
Caches: L1 data, L1 instruction, L2 unified, plus L3 off-chip.
[Die photo labels: Right Half L2 (two regions), L1 Instr., L1 Data, L2 Tags, L3 Control.]

Locality of Reference
Principle of Locality:
• Programs tend to reuse data and instructions near those they have used recently.
• Temporal locality: recently referenced items are likely to be referenced in the near future.
• Spatial locality: items with nearby addresses tend to be referenced close together in time.

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    *v = sum;

Locality in the example:
• Data: array elements are referenced in succession (spatial).
• Instructions: instructions are referenced in sequence (spatial); the loop body is cycled through repeatedly (temporal).

Caching: The Basic Idea
Main memory:
• Stores words (A-Z in the example).
Cache:
• Stores a subset of the words (4 in the example).
• Organized in lines:
  – multiple words per line
  – to exploit spatial locality.
Access:
• A word must be in the cache for the processor to access it.
[Figure: processor backed by a small, fast cache holding lines {A, B} and {G, H}, in front of a big, slow memory holding A through Z.]

How important are caches?
(Figure from Jim Keller, Compaq Corp.)
• 21264 floorplan
• Register files sit in the middle of the execution units.
• 64 KB instruction cache
• 64 KB data cache
• Caches take up a large fraction of the die.

Accessing Data in Memory Hierarchy
• Between any two levels, memory is divided into lines (aka "blocks").
• Data moves between levels on demand, in line-sized chunks.
• Invisible to the application programmer:
  – hardware is responsible for cache operation.
• Upper-level lines are a subset of lower-level lines.
[Figure: a high level holding lines a and b above a low level; accessing word w in line a is a hit, while accessing word v in line b is a miss that brings line b up from the low level.]

Design Issues for Caches
Key questions:
• Where should a line be placed in the cache? (line placement)
• How is a line found in the cache? (line identification)
• Which line should be replaced on a miss? (line replacement)
• What happens on a write? (write strategy)
Constraints:
• The design must be very simple:
  – hardware realization
  – all decision making within a nanosecond time scale.
• Want to optimize performance for "typical" programs:
  – do extensive benchmarking and simulation
  – many subtle engineering tradeoffs.

Direct-Mapped Caches
Simplest design: each memory line has a unique cache location.
Parameters:
• Line (aka block) size B = 2^b
  – number of bytes in each line
  – typically 2x to 8x the word size.
• Number of sets S = 2^s
  – number of lines the cache can hold.
• Total cache size = B*S = 2^(b+s).
Physical address:
• The address used to reference main memory.
• n bits reference N = 2^n total bytes.
• Partitioned into fields:
  – Offset: the lower b bits indicate which byte within the line.
  – Set index: the next s bits indicate which set holds the line.
  – Tag: the remaining t bits identify the line once it is in the cache.
n-bit physical address:  | tag (t bits) | set index (s bits) | offset (b bits) |

Indexing into Direct-Mapped Cache
• Use the set index bits to select one cache set.
[Figure: sets 0 through S-1, each holding a tag, a valid bit, and bytes 0 through B-1; the set index field of the physical address selects the set.]

Direct-Mapped Cache Tag Matching
Identifying the line:
• The stored tag must match the high-order bits of the address.
• The valid bit must be 1.
• The lower bits of the address select the byte or word within the cache line.
[Figure: the selected set's tag is compared against the address tag ("= ?") and its valid bit is checked ("= 1?").]
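To make the indexing and tag-matching steps concrete, here is a minimal C sketch of a direct-mapped lookup. It is illustrative only: the cache_line_t struct, the LINE_BITS/SET_BITS parameters, and the cache_lookup helper are hypothetical names chosen for this sketch (not from the slides), and a real cache performs these steps in hardware, in parallel.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical parameters: B = 2^b bytes per line, S = 2^s sets. */
    enum { LINE_BITS = 4,  LINE_SIZE = 1 << LINE_BITS,    /* b = 4,  B = 16 bytes   */
           SET_BITS  = 10, NUM_SETS  = 1 << SET_BITS };   /* s = 10, S = 1024 sets  */

    typedef struct {
        bool     valid;               /* valid bit                  */
        uint64_t tag;                 /* high-order address bits    */
        uint8_t  data[LINE_SIZE];     /* the cached line itself     */
    } cache_line_t;

    static cache_line_t cache[NUM_SETS];   /* one line per set: direct-mapped */

    /* Split a physical address into offset, set index, and tag, then check
     * the selected set.  Returns true on a hit and copies out the byte. */
    bool cache_lookup(uint64_t addr, uint8_t *byte_out)
    {
        uint64_t offset = addr & (LINE_SIZE - 1);                /* low b bits     */
        uint64_t set    = (addr >> LINE_BITS) & (NUM_SETS - 1);  /* next s bits    */
        uint64_t tag    = addr >> (LINE_BITS + SET_BITS);        /* remaining bits */

        cache_line_t *line = &cache[set];
        if (line->valid && line->tag == tag) {   /* tag match and valid = 1 */
            *byte_out = line->data[offset];
            return true;                         /* hit */
        }
        return false;                            /* miss: fetch line from next level */
    }

Because each address maps to exactly one set, the lookup needs only one tag comparison, which is why the slides call this the simplest design to build and to make fast.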
Properties of Direct Mapped Caches
Strength:
• Minimal control hardware overhead.
• Simple design.
• (Relatively) easy to make fast.
Weakness:
• Vulnerable to thrashing: two heavily used lines can share the same cache index, so each repeatedly evicts the other to make room.

Vector Product Example
Machine:
• DECStation 5000
• MIPS processor with a 64 KB direct-mapped cache, 16 B line size.
Performance:
• Good case: 24 cycles / element
• Bad case: 66 cycles / element

    float dot_prod(float x[1024], y[1024])
    {

(The preview ends mid-function here.)
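The preview cuts the function off at its opening brace. A plausible completion, consistent with the slide's setup (two 1024-element float arrays and one multiply-accumulate per element), is sketched below; the body is an assumption, not the slide's actual code.

    /* Hedged sketch of the likely dot-product body (assumption). */
    float dot_prod(float x[1024], float y[1024])
    {
        float sum = 0.0f;
        int i;
        for (i = 0; i < 1024; i++)
            sum += x[i] * y[i];   /* one element from each array per iteration */
        return sum;
    }

The good-case/bad-case gap is the thrashing effect developed in the "Thrashing Example" slides that follow in the deck: if x and y happen to map to the same sets of the 64 KB direct-mapped cache, x[i] and y[i] evict each other's line on every iteration, whereas in the good case each 16 B line is fetched once and serves four consecutive elements.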

