CS 152 Computer Architecture and Engineering
Lecture 13: Cache I
2005-10-13
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
TAs: David Marquardt and Udam Saini
www-inst.eecs.berkeley.edu/~cs152/
[Title image: A cosmic ray hits a DRAM cell]

CS 152 L13: Cache I, UC Regents Fall 2005, UCB

Last Time: DRAM design
DRAM has high latency to first bit out. A fact of life.
[Figure: DRAM array. A 12-bit row address input feeds a decoder that selects 1 of 4096 rows. 2048 columns, each column 4 bits deep. 33,554,432 usable bits (tester found good bits in bigger array). 8196 bits delivered by sense amps. Select requested bits, send off the chip.]

Today: Caches and the Memory System
Memory Hierarchy: Technology motivation for caching.
Locality: Why caching works.
Cache design: Final project component.
[Figure: Processor (Control, Datapath), Memory, Input, Output]

1977: DRAM faster than microprocessors
Apple II (1977). CPU: 1000 ns. DRAM: 400 ns.
[Photo: Steve Jobs, Steve Wozniak]

Since then: technology scaling
Circuit in 250 nm technology (introduced in 2000): H nanometers long.
Same circuit in 180 nm technology (introduced in 2003): 0.7 x H nm.
Each dimension 30% smaller. Area is 50% smaller.
Logic circuits use smaller C's, lower Vdd, and higher kn and kp to speed up clock rates.

DRAM scaled for more bits, not more MHz
Assume $C_{cell} = 1$ fF.
Word line may have 2000 nFet drains; assume word line C of 100 fF, or $100\,C_{cell}$.
$C_{cell}$ holds $Q = C_{cell}(V_{dd} - V_{th})$.
When we dump this charge onto the word line, what voltage do we see?
$dV = \frac{C_{cell}(V_{dd} - V_{th})}{100\,C_{cell}}$
$dV = \frac{V_{dd} - V_{th}}{100} \approx$ tens of millivolts
In practice, scale the array to get a 60 mV signal.

1980-2003: CPU speed outpaced DRAM
[Plot: Performance (1/latency), log scale, versus Year, 1980 to 2005]
CPU: 60% per yr, 2X in 1.5 yrs (until the power wall).
DRAM: 9% per yr, 2X in 10 yrs.
Gap grew 50% per year.
Q: How do architects address this gap?
A: Put smaller, faster "cache" memories between CPU and DRAM. Create a "memory hierarchy".

Caches: Variable-latency memory ports
Upper level: small, fast. Lower level: large, slow.
Data in upper memory returned with lower latency.
Data in lower level returned with higher latency.
[Figure: requests arrive from CPU, data returns to CPU]

Cache replaces data, instruction memory
Replace with Instruction Cache and Data Cache of DRAM main memory.
[Figure: pipeline datapath with stages IF (Fetch), ID (Decode), EX (ALU), MEM, WB]

Recall: Intel ARM XScale CPU (PocketPC)
32 KB Instruction Cache
32 KB Data Cache
180 nm process (introduced 2003)

2005 Memory Hierarchy: Apple iMac G5
                   Reg    L1 Inst  L1 Data  L2     DRAM   Disk
Size               1K     64K      32K      512K   256M   80G
Latency (cycles)   1      3        3        11     160    1e7
Reg: managed by compiler. L1 Inst, L1 Data, L2: managed by hardware. DRAM, Disk: managed by OS, hardware, application.
iMac G5, 1.6 GHz, $1299.00
Goal: Illusion of large, fast, cheap memory.
Let programs address a memory space that scales to the disk size, at a speed that is usually as fast as register access.

PowerPC 970 FX
90 nm, 58 M transistors
[Die photo: Registers (1K), L1 (64K Instruction), L1 (32K Data), 512K L2]

Latency: A closer look
Read latency: Time to return first byte of a random access.
                   Reg    L1 Inst  L1 Data  L2     DRAM   Disk
Size               1K     64K      32K      512K   256M   80G
Latency (cycles)   1      3        3        11     160    1e7
Latency (sec)      0.6n   1.9n     1.9n     6.9n   100n   12.5m
1/latency (Hz)     1.6G   533M     533M     145M   10M    80
Architect's latency toolkit:
(1) Parallelism. Request data from N 1-bit-wide memories at the same time. Overlaps latency cost for all N bits. Provides N times the bandwidth. Requests to N memory banks (interleaving) have potential of N times the bandwidth.
(2) Pipeline memory. If memory has N cycles of latency ...

Programs with locality cache well ...
[Plot: Memory Address (one dot per access) versus Time, with regions labeled Bad, Temporal Locality, and Spatial Locality]
Q: Point out bad locality.
Source: Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971).

The caching algorithm in one slide
Temporal locality: Keep most recently accessed data closer to processor.
Spatial locality: Move contiguous blocks in the address space to upper levels.

Caching terminology
Hit: Data appears in upper level block (ex: Blk X).
Hit Rate: The fraction of memory accesses found in upper level.
Hit Time: Time to access upper level. Includes hit/miss check.
Miss: Data retrieval from lower level needed (ex: Blk Y).
Miss Rate: 1 - Hit Rate.
Miss Penalty: Time to replace block in upper level + deliver to CPU.
Hit Time << Miss Penalty

Admin
Final Xilinx Checkoff Tomorrow.
Lab report due Monday, 11:59 PM.
Final project posted.

Cache Design Example
Recall: Static Memory

Recall: Static Memory Cell Design
[Figure: SRAM cell layout with Gnd, Vdd, Wordline, and two Bitlines]

SRAM array: simpler than DRAM array
Architects specify number of rows and columns.
Word and bit lines slow down as array grows larger!
Add muxes to select subset of bits.
How could we pipeline this?
[Figure: array with four Write Drivers and Parallel Data I/O Lines]

Cache Design Example

CPU address space: An array of "blocks"
32-bit Memory Address: bits 31-5 select which block (27 bits); bits 4-0 select the byte within the block (5 bits).
32-byte blocks, numbered 0, 1, 2, 3, 4, 5, 6, 7, ..., 2^27 - 1.
The job of a cache is to hold a "popular" subset of blocks.

One Approach: Fully Associative Cache
Ideal, but expensive.
Address: Cache Tag (27 bits) in bits 31-5, Byte Select in bits 4-0 (ex: 0x04).
Each cache line holds a Valid Bit, a block number ("Tag", bits 26-0), and Cache Data (Byte 31 ... Byte 1, Byte 0).
Holds 4 blocks.
On a hit: return bytes of "hit" cache line.

Building a cache with one comparator
Which …
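
The fully associative approach described above (27-bit cache tag, 5-bit byte select, four 32-byte blocks, one valid bit per line) can be sketched in software. This is a minimal illustration, not the lecture's design: the names `split_address` and `FullyAssociativeCache` are hypothetical, main memory is modeled as a Python `bytes` object, and the replacement policy (evict the least recently used block) is an assumption, since the slides shown here do not name one.

```python
from collections import OrderedDict

BLOCK_BYTES = 32   # 32-byte blocks, as in the lecture
OFFSET_BITS = 5    # log2(32): width of the byte-select field
NUM_BLOCKS = 4     # the example cache holds 4 blocks


def split_address(addr):
    """Split a 32-bit address into (27-bit tag, 5-bit byte select)."""
    addr &= 0xFFFFFFFF
    return addr >> OFFSET_BITS, addr & (BLOCK_BYTES - 1)


class FullyAssociativeCache:
    """Software model of a tiny fully associative cache (illustrative only)."""

    def __init__(self, memory, num_blocks=NUM_BLOCKS):
        self.memory = memory          # backing store standing in for DRAM
        self.num_blocks = num_blocks
        # tag -> block data; a tag being present plays the role of the valid bit.
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read_byte(self, addr):
        tag, offset = split_address(addr)
        if tag in self.lines:
            # Hit. Hardware would compare every stored tag in parallel.
            self.hits += 1
            self.lines.move_to_end(tag)   # mark as most recently used
        else:
            # Miss: fetch the whole block from the lower level.
            self.misses += 1
            if len(self.lines) >= self.num_blocks:
                # Assumed policy: evict the least recently used block.
                self.lines.popitem(last=False)
            base = tag << OFFSET_BITS
            self.lines[tag] = bytes(self.memory[base:base + BLOCK_BYTES])
        return self.lines[tag][offset]


if __name__ == "__main__":
    dram = bytes(range(256)) * 4          # 1 KB of made-up memory contents
    cache = FullyAssociativeCache(dram)
    cache.read_byte(0x04)                 # miss: block 0 is loaded
    cache.read_byte(0x05)                 # hit: same block (spatial locality)
    print(cache.hits, cache.misses)
```

Because any block may occupy any line, a hardware version needs one tag comparator per line, which is why the slide calls this approach ideal but expensive.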


Berkeley COMPSCI 152 - Lecture 13 – Cache I
