Berkeley COMPSCI 152 - Lecture 13 – Cache I

UC Regents Fall 2005 © UCB
CS 152 L13: Cache I

2005-10-13
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
CS 152 Computer Architecture and Engineering
Lecture 13 – Cache I
www-inst.eecs.berkeley.edu/~cs152/
A cosmic ray hits a DRAM cell ...
TAs: David Marquardt and Udam Saini

Last Time: DRAM design
4096 rows, selected by a 1 of 4096 decoder
2048 columns, each column 4 bits deep
33,554,432 usable bits (tester found good bits in bigger array)
12-bit row address input
8192 bits delivered by sense amps
Select requested bits, send off the chip
DRAM has high latency to first bit out. A fact of life.

Today: Caches and the Memory System
Memory Hierarchy: Technology motivation for caching.
Locality: Why caching works.
Cache design: Final project component.
[Figure: Processor (Control, Datapath), Memory, Input, Output]

1977: DRAM faster than microprocessors
Apple ][ (1977), Steve Wozniak and Steve Jobs
CPU: 1000 ns
DRAM: 400 ns

Since then: technology scaling ...
Circuit in 250 nm technology (introduced in 2000): H nanometers long
Same circuit in 180 nm technology (introduced in 2003): 0.7 x H nm
Each dimension 30% smaller. Area is 50% smaller.
Logic circuits use smaller C's, lower Vdd, and higher kn and kp to speed up clock rates.

DRAM scaled for more bits, not more MHz
Assume Ccell = 1 fF.
Bit line may have 2000 nFet drains; assume bit line C of 100 fF, or 100*Ccell.
Ccell holds Q = Ccell*(Vdd-Vth).
When we dump this charge onto the bit line, what voltage do we see?
dV = [Ccell*(Vdd-Vth)] / [100*Ccell]
dV = (Vdd-Vth) / 100 ≈ tens of millivolts!
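As a rough numeric check of the charge-sharing estimate above, the sketch below plugs sample values into the slide's formula. Ccell = 1 fF and the 100 fF line capacitance come from the slide. Vdd = 2.5 V and Vth = 0.5 V are assumed values chosen only for illustration, and the function name charge_share_dv is hypothetical. Like the slide, the sketch ignores the small contribution of Ccell to the total capacitance.

```python
def charge_share_dv(c_cell_fF, c_line_fF, vdd, vth):
    """Voltage swing seen when the cell charge is dumped onto the shared line.

    Follows the slide's approximation dV = Ccell*(Vdd-Vth) / Cline,
    which neglects Ccell in the denominator.
    """
    q_fC = c_cell_fF * (vdd - vth)   # stored charge, in femtocoulombs
    return q_fC / c_line_fF          # fC / fF = volts


# Ccell and the line capacitance are from the slide.
# Vdd and Vth are ASSUMED values for illustration only.
dv = charge_share_dv(c_cell_fF=1.0, c_line_fF=100.0, vdd=2.5, vth=0.5)
print(f"dV = {dv * 1000:.0f} mV")
```

With these assumed voltages the swing is 20 mV, which matches the slide's "tens of millivolts" claim.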
In practice, scale array to get a 60mV signal.

1980-2003, CPU speed outpaced DRAM ...
[Plot: Performance (1/latency), log scale from 10 to 10000, versus Year, 1980 to 2005. "The power wall" is marked at 2005.]
CPU: 60% per yr, 2X in 1.5 yrs
DRAM: 9% per yr, 2X in 10 yrs
Gap grew 50% per year
Q. How do architects address this gap?
A. Put smaller, faster "cache" memories between CPU and DRAM. Create a "memory hierarchy".

Caches: Variable-latency memory ports
[Figure: "From CPU" and "To CPU" ports on an upper level (small, fast) backed by a lower level (large, slow)]
Data in upper memory returned with lower latency.
Data in lower level returned with higher latency.

Cache replaces data, instruction memory
[Figure: pipelined datapath with stages IF (Fetch), ID (Decode), EX (ALU), MEM, WB]
Replace with Instruction Cache and Data Cache of DRAM main memory

Recall: Intel ARM XScale CPU (PocketPC)
32 KB Instruction Cache
32 KB Data Cache
180 nm process (introduced 2003)

2005 Memory Hierarchy: Apple iMac G5
iMac G5, 1.6 GHz, $1299.00

                   Reg    L1 Inst  L1 Data  L2     DRAM   Disk
Size               1K     64K      32K      512K   256M   80G
Latency (cycles)   1      3        3        11     160    1e7

Figure labels: Managed by compiler. Managed by hardware. Managed by OS, hardware, application.
Let programs address a memory space that scales to the disk size, at a speed that is usually as fast as register access.
Goal: Illusion of large, fast, cheap memory

PowerPC 970 FX
90 nm, 58 M transistors
[Die photo labels: Registers (1K), L1 (64K Instruction), L1 (32K Data), L2 (512K)]

Latency: A closer look

                   Reg    L1 Inst  L1 Data  L2     DRAM   Disk
Size               1K     64K      32K      512K   256M   80G
Latency (cycles)   1      3        3        11     160    1e7
Latency (sec)      0.6n   1.9n     1.9n     6.9n   100n   12.5m
Hz                 1.6G   533M     533M     145M   10M    80

Read latency: Time to return first byte of a random access
Architect's latency toolkit:
(1) Parallelism. Request data from N 1-bit-wide memories at the same time.
Overlaps latency cost for all N bits. Provides N times the bandwidth. Requests to N memory banks (interleaving) have potential of N times the bandwidth.
(2) Pipeline memory. If memory has N cycles of latency, issue a request each cycle, receive it N cycles later.

Programs with locality cache well ...
[Plot: Memory Address (one dot per access) versus Time, with regions labeled Spatial Locality, Temporal Locality, and Bad]
Source: Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)
Q. Point out bad locality behavior ...

The caching algorithm in one slide
Temporal locality: Keep most recently accessed data closer to processor.
Spatial locality: Move contiguous blocks in the address space to upper levels.

Caching terminology
Hit: Data appears in upper level block (Ex: Blk X)
Miss: Data retrieval from lower level needed (Ex: Blk Y)
Hit Rate: The fraction of memory accesses found in upper level.
Miss Rate: 1 - Hit Rate
Hit Time: Time to access upper level. Includes hit/miss check.
Miss penalty: Time to replace block in upper level + deliver to CPU
Hit Time << Miss Penalty

Admin: Final Xilinx Checkoff Tomorrow
Lab report due Monday, 11:59 PM.
Final project posted soon ...

Cache Design Example
Recall: Static Memory ...

Recall: Static Memory Cell Design
[Circuit figure labels: Wordline, Bitline, Bitline !, Vdd, Gnd]

SRAM array: simpler than DRAM array
[Figure: SRAM array with four Write Drivers]
Word and bit lines slow down as array grows larger!
Architects specify number of rows and columns.
Parallel Data I/O Lines
Add muxes to select subset of bits
How could we pipeline this memory?

Cache Design Example

CPU address space: An array of "blocks"
32-bit Memory Address: bits 31 to 5 answer "Which block?" (27 bits); bits 4 to 0 give the Byte # (5 bits).
Block #: 0, 1, 2, 3, 4, 5, 6, 7, ..., 2^27 - 1
32-byte blocks
The job of a cache is to hold a "popular" subset of blocks.

One Approach: Fully Associative Cache
Address fields: Cache Tag (27 bits), bits 31 to 5; Byte Select, bits 4 to 0 (Ex: 0x04)
[Figure: four cache lines, each with a Valid Bit, a Block # ("Tags", bits 26 to 0), and Cache Data (Byte 31 ... Byte 1, Byte 0). Four "=" comparators check the address tag against every stored tag to produce Hit.]
Cache Data holds 4 blocks
Return bytes of "hit" cache line
Ideal, but expensive ...
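The address split and the fully associative lookup described above can be sketched in software as follows. This is a minimal illustration, not the hardware design from the lecture: the names split_address, FullyAssociativeCache, lookup, and fill are hypothetical, the loop stands in for the four comparators that hardware would evaluate in parallel, and the caller picks which line to fill because the slides shown here do not specify a replacement policy.

```python
BLOCK_BITS = 5    # 32-byte blocks -> 5 byte-select bits (from the slide)
BLOCK_SIZE = 1 << BLOCK_BITS
NUM_LINES = 4     # the slide's example cache holds 4 blocks


def split_address(addr):
    """Split a 32-bit address into (27-bit tag / block number, 5-bit byte select)."""
    tag = (addr & 0xFFFFFFFF) >> BLOCK_BITS
    byte_select = addr & (BLOCK_SIZE - 1)
    return tag, byte_select


class FullyAssociativeCache:
    def __init__(self, num_lines=NUM_LINES):
        # Each line: valid bit, stored tag, and one 32-byte data block.
        self.lines = [
            {"valid": False, "tag": 0, "data": bytes(BLOCK_SIZE)}
            for _ in range(num_lines)
        ]

    def lookup(self, addr):
        """Return the addressed byte on a hit, or None on a miss."""
        tag, byte_select = split_address(addr)
        # Hardware compares every stored tag at once; this loop is a
        # sequential stand-in for those comparators.
        for line in self.lines:
            if line["valid"] and line["tag"] == tag:
                return line["data"][byte_select]
        return None

    def fill(self, index, addr, block):
        """Install a 32-byte block in line `index` (caller chooses the line)."""
        assert len(block) == BLOCK_SIZE
        tag, _ = split_address(addr)
        self.lines[index] = {"valid": True, "tag": tag, "data": bytes(block)}


cache = FullyAssociativeCache()
cache.fill(0, 0x124, bytes(range(BLOCK_SIZE)))  # block 9 now resident
print(cache.lookup(0x124))  # hit: tag 9, byte select 0x04
print(cache.lookup(0x140))  # miss: tag 10 is not resident
```

Any block can live in any line, which is why every lookup needs one comparator per line. That cost is what the slide means by "ideal, but expensive".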

