Berkeley COMPSCI 252 - Lecture Notes - Caches

EECS 252 Graduate Computer Architecture
Lec 12: Caches
David Culler
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~culler
http://www-inst.eecs.berkeley.edu/~cs252
1/28/2004, CS252-S05

Review: Who Cares About the Memory Hierarchy?
[Figure: processor vs. DRAM performance, 1980-2000. CPU performance ("Moore's Law") improves 60%/yr, DRAM 7%/yr; the processor-memory performance gap grows 50%/year.]
• Processor only thus far in course: CPU cost/performance, ISA, pipelined execution
• CPU-DRAM gap
  – 1980: no cache in µproc; 1995: 2-level cache on chip (1989: first Intel µproc with a cache on chip)
• "Less' Law?"

Review: What is a cache?
• Small, fast storage used to improve average access time to slow memory.
• Exploits spatial and temporal locality.
• In computer architecture, almost everything is a cache!
  – Registers: a cache on variables
  – First-level cache: a cache on second-level cache
  – Second-level cache: a cache on memory
  – Memory: a cache on disk (virtual memory)
  – TLB: a cache on page table
  – Branch prediction: a cache on prediction information?
[Figure: hierarchy from Proc/Regs through L1-Cache, L2-Cache, and Memory to Disk/Tape; faster toward the processor, bigger toward the bottom.]

Review: Terminology
• Hit: data appears in some block in the upper level (example: Block X)
  – Hit Rate: the fraction of memory accesses found in the upper level
  – Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
• Miss: data must be retrieved from a block in the lower level (Block Y)
  – Miss Rate = 1 - Hit Rate
  – Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
• Hit Time << Miss Penalty (500 instructions on the 21264!)
[Figure: upper-level memory holding Blk X and lower-level memory holding Blk Y, with data moving to and from the processor.]

Why it works
• Exploits the statistical properties of programs
• Locality of reference: temporal and spatial
• Simple hardware structure that observes program behavior and reacts to improve future performance
• Is the cache visible in the ISA?
• Average Memory Access Time:
  AMAT = HitTime + MissRate × MissPenalty
       = (HitTime_Inst + MissRate_Inst × MissPenalty_Inst)
       + (HitTime_Data + MissRate_Data × MissPenalty_Data)
[Figure: probability of access P(access, t) vs. address, illustrating locality.]

Block Placement
• Q1: Where can a block be placed in the upper level?
  – Fully associative
  – Set associative
  – Direct mapped

1 KB Direct Mapped Cache, 32 B blocks
• For a 2^N byte cache:
  – The uppermost (32 - N) bits are always the Cache Tag
  – The lowest M bits are the Byte Select (Block Size = 2^M)
[Figure: a 32-bit address split into Cache Tag (ex: 0x50), Cache Index (ex: 0x01), and Byte Select (ex: 0x00); the tag and a valid bit are stored as part of the cache "state" alongside the data array (Byte 0 ... Byte 1023).]

Review: Set Associative Cache
• N-way set associative: N entries for each Cache Index
  – N direct-mapped caches operate in parallel
  – How big is the tag?
• Example: two-way set associative cache
  – Cache Index selects a "set" from the cache
  – The two tags in the set are compared to the input in parallel
  – Data is selected based on the tag result
[Figure: two tag/data arrays indexed by Cache Index; both tags are compared against the address tag in parallel and ORed to produce Hit, with a mux (Sel1/Sel0) choosing the cache block.]

Q2: How is a block found if it is in the upper level?
• Index identifies the set of possibilities
• Tag on each block
  – No need to check index or block offset
• Increasing associativity shrinks index, expands tag
• Block Address = Tag, Index; followed by Block Offset
• Cache size = Associativity × 2^index_size × 2^offset_size

Q3: Which block should be replaced on a miss?
• Easy for direct mapped
• Set associative or fully associative:
  – Random
  – LRU (Least Recently Used)

Miss rates, LRU vs. Random:
  Assoc:    2-way           4-way           8-way
  Size      LRU     Ran     LRU     Ran     LRU     Ran
  16 KB     5.2%    5.7%    4.7%    5.3%    4.4%    5.0%
  64 KB     1.9%    2.0%    1.5%    1.7%    1.4%    1.5%
  256 KB    1.15%   1.17%   1.13%   1.13%   1.12%   1.12%

Q4: What happens on a write?
• Write through: the information is written to both the block in the cache and the block in the lower-level memory.
• Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  – Is the block clean or dirty?
• Pros and cons of each?
  – WT: read misses cannot result in writes
  – WB: no repeated writes to the same location
• WT is always combined with write buffers so the processor doesn't wait for lower-level memory
• What about on a miss? Write-no-allocate vs. write-allocate

Write Buffer for Write Through
• A write buffer is needed between the cache and memory
  – Processor: writes data into the cache and the write buffer
  – Memory controller: writes the contents of the buffer to memory
• The write buffer is just a FIFO
  – Typical number of entries: 4
  – Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
[Figure: Processor feeding both the Cache and the Write Buffer, which drains to DRAM.]

Review: Cache performance
• Miss-oriented approach to memory access:
  CPUtime = IC × (CPI_Execution + MemAccess/Inst × MissRate × MissPenalty) × CycleTime
• Separating out the memory component entirely:
  – AMAT = Average Memory Access Time
  – Effective CPI = CPI_ideal_mem + P_mem × AMAT
  CPUtime = IC × (CPI_AluOps × AluOps/Inst + MemAccess/Inst × AMAT) × CycleTime

Impact on Performance
• Suppose a processor executes at:
  – Clock Rate = 200 MHz (5 ns per cycle), ideal (no misses) CPI = 1.1
  – 50% arith/logic, 30% ld/st, 20% control
• Suppose that 10% of memory operations get a 50-cycle miss penalty
• Suppose that 1% of instructions get the same miss penalty
• CPI = ideal CPI + average stalls per instruction
  = 1.1 (cycles/ins)
  + [0.30 (DataMops/ins) × 0.10 (miss/DataMop) × 50 (cycles/miss)]
  + [1 (InstMop/ins) × 0.01 (miss/InstMop) × 50 (cycles/miss)]
  = (1.1 + 1.5 + 0.5) cycles/ins = 3.1
• 58% of the time the processor is stalled waiting for memory!
• AMAT = (1/1.3) × [1 + 0.01 × 50] + (0.3/1.3) × [1 + 0.1 × 50] = 2.54

Example: Harvard Architecture
• Unified vs. separate I&D (Harvard)
• Statistics (given in H&P):
  – 16 KB I&D: inst miss rate = 0.64%, data miss rate = 6.47%
  – 32 KB unified:
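The tag/index/byte-select split from the 1 KB direct-mapped cache slide can be sketched in code. This is a minimal illustration, not from the lecture; the function name `split_address` and the constants are chosen here to match the slide's parameters (2^N = 1024-byte cache, 2^M = 32-byte blocks, 32-bit addresses).

```python
# Parameters from the slide: 1 KB direct-mapped cache, 32 B blocks.
CACHE_SIZE = 1024                            # 2^N bytes, N = 10
BLOCK_SIZE = 32                              # 2^M bytes, M = 5
NUM_BLOCKS = CACHE_SIZE // BLOCK_SIZE        # 32 blocks -> 5 index bits

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1    # M = 5
INDEX_BITS = NUM_BLOCKS.bit_length() - 1     # 5
TAG_BITS = 32 - OFFSET_BITS - INDEX_BITS     # uppermost 32 - N = 22 bits

def split_address(addr: int):
    """Split a 32-bit address into (cache tag, cache index, byte select)."""
    byte_select = addr & (BLOCK_SIZE - 1)            # lowest M bits
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1) # next 5 bits
    tag = addr >> (OFFSET_BITS + INDEX_BITS)         # remaining upper bits
    return tag, index, byte_select

# The slide's example values: tag 0x50, index 0x01, byte select 0x00
# correspond to address 0x14020.
print(split_address(0x14020))  # (0x50, 0x1, 0x0)
```

Note that the slide's "Cache size = Associativity × 2^index_size × 2^offset_size" checks out here: 1 × 2^5 × 2^5 = 1024.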
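The answers to Q1-Q3 (placement, lookup by tag, LRU replacement) can be combined into a toy model. This is a hypothetical sketch, not lecture code; the class name `TwoWayCache` and its 4-set geometry are invented here for illustration, and only hit/miss behavior is modeled, not data storage.

```python
class TwoWayCache:
    """Toy 2-way set-associative cache with LRU replacement (tags only)."""

    def __init__(self, num_sets=4, block_size=32):
        self.num_sets = num_sets
        self.block_size = block_size
        # Each set holds up to 2 tags, ordered most-recently-used first.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (which fills the block)."""
        block = addr // self.block_size
        index = block % self.num_sets      # Q1/Q2: index selects the set
        tag = block // self.num_sets       # Q2: tag identifies the block
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.insert(0, tag)            # mark most recently used
            return True
        if len(ways) == 2:
            ways.pop()                     # Q3: evict the LRU way
        ways.insert(0, tag)
        return False

cache = TwoWayCache()
cache.access(0)      # miss (cold)
cache.access(128)    # miss; maps to the same set, second way
cache.access(0)      # hit; 0 becomes most recently used
cache.access(256)    # miss; same set again, evicts the LRU tag (128)
```

With direct mapping (one way per set), the third address mapping to a set would always evict; the two ways plus LRU let the working set {0, 128} survive one conflicting access.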
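The "Impact on Performance" arithmetic above can be reproduced directly. A minimal sketch using the slide's numbers (variable names are chosen here, not from the lecture); the AMAT line averages over the 1.3 memory accesses per instruction (1 instruction fetch + 0.3 data references).

```python
# Parameters from the worked example on the slide.
IDEAL_CPI = 1.1
MISS_PENALTY = 50            # cycles
DATA_OPS_PER_INST = 0.30     # loads/stores per instruction
DATA_MISS_RATE = 0.10        # 10% of memory operations miss
INST_ACCESS_PER_INST = 1.0   # one instruction fetch per instruction
INST_MISS_RATE = 0.01        # 1% of instruction fetches miss

# CPI = ideal CPI + average stalls per instruction
data_stalls = DATA_OPS_PER_INST * DATA_MISS_RATE * MISS_PENALTY      # 1.5
inst_stalls = INST_ACCESS_PER_INST * INST_MISS_RATE * MISS_PENALTY   # 0.5
cpi = IDEAL_CPI + data_stalls + inst_stalls                          # 3.1

# AMAT weighted over all 1.3 memory accesses per instruction.
accesses = INST_ACCESS_PER_INST + DATA_OPS_PER_INST
amat = (INST_ACCESS_PER_INST / accesses) * (1 + INST_MISS_RATE * MISS_PENALTY) \
     + (DATA_OPS_PER_INST / accesses) * (1 + DATA_MISS_RATE * MISS_PENALTY)

print(round(cpi, 2), round(amat, 2))  # 3.1 2.54
```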

