Berkeley COMPSCI 152 - Lecture 13 – Cache I

CS 152 Computer Architecture and Engineering
Lecture 13 – Cache I
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
TAs: Udam Saini and Jue Sun
2006-10-12 | www-inst.eecs.berkeley.edu/~cs152/
UC Regents Fall 2006 © UCB
[Title slide art: "A cosmic ray hits a DRAM cell ..."]

Today: Caches and the Memory System
- Memory hierarchy: the technology motivation for caching.
- Locality: why caching works.
- Cache design: a final-project component.
[Figure: the five classic computer components - Control, Datapath, Memory, Input, Output.]

1977: DRAM faster than microprocessors
The Apple ][ (1977), built by Steve Wozniak and Steve Jobs. CPU cycle time: 1000 ns. DRAM access time: 400 ns.

Since then: technology scaling ...
A circuit that is H nanometers long in 250 nm technology (introduced in 2000) is 0.7 x H nm long in 180 nm technology (introduced in 2003). Each dimension is 30% smaller, so area is about 50% smaller. Logic circuits use the smaller C's, lower Vdd, and higher kn and kp to speed up clock rates.

DRAM scaled for more bits, not more MHz
Assume Ccell = 1 fF. A bit line may have 2000 nFET drains on it; assume a bit-line capacitance of 100 fF, i.e. 100 * Ccell. A cell holds charge Q = Ccell * (Vdd - Vth). When we dump this charge onto the bit line, what voltage do we see?

  dV = [Ccell * (Vdd - Vth)] / [100 * Ccell] = (Vdd - Vth) / 100 ≈ tens of millivolts!

In practice, the array is scaled to get a 60 mV signal.

1980-2003: CPU speed outpaced DRAM ...
[Plot: performance (1/latency) on a log scale from 10 to 10000, vs. year from 1980 to 2005; the CPU curve ends at the 2005 "power wall".]
CPU performance grew 60% per year (2x in 1.5 years); DRAM grew 9% per year (2x in 10 years). The gap grew 50% per year.
Q. How do architects address this gap?
A. Put smaller, faster "cache" memories between the CPU and DRAM. Create a "memory hierarchy".
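The charge-sharing result and the growth-rate claim above are easy to check numerically. Below is a minimal Python sketch of the slides' arithmetic; the Vdd and Vth values are illustrative assumptions of mine, since the slide leaves them symbolic.

```python
# Quick numeric check of the two slide calculations above.
# Vdd and Vth are illustrative assumptions; the slide leaves them symbolic.

C_CELL = 1e-15            # DRAM cell capacitance: 1 fF (from the slide)
C_BITLINE = 100 * C_CELL  # bit-line capacitance: 100 fF = 100 * Ccell (from the slide)
VDD, VTH = 2.5, 0.5       # assumed supply and threshold voltages, in volts

# Charge sharing: the cell's charge Q = Ccell * (Vdd - Vth) is dumped
# onto the 100x-larger bit-line capacitance.
q_cell = C_CELL * (VDD - VTH)
dv = q_cell / C_BITLINE   # = (Vdd - Vth) / 100
print(f"bit-line swing: {dv * 1e3:.0f} mV")  # 20 mV: tens of millivolts

# CPU vs. DRAM growth rates from the slide: 60%/yr vs. 9%/yr.
cpu_growth, dram_growth = 1.60, 1.09
gap = cpu_growth / dram_growth - 1.0
print(f"gap grows {gap:.0%} per year")  # ~47%/yr, i.e. ~50% in round numbers
```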
Caches: variable-latency memory ports
[Figure: the CPU's address and data ports connect to a small, fast upper-level memory backed by a large, slow lower-level memory; blocks X and Y migrate between the two levels.]
Data in the upper level is returned to the processor with lower latency; data found only in the lower level is returned with higher latency.

Cache replaces data and instruction memory
[Figure: the five-stage pipeline - IF (Fetch), ID (Decode), EX (ALU), MEM, WB - with its instruction memory and data memory highlighted.]
Replace the pipeline's instruction memory and data memory with an Instruction Cache and a Data Cache of DRAM main memory.

Recall: Intel ARM XScale CPU (PocketPC)
32 KB Instruction Cache, 32 KB Data Cache, 180 nm process (introduced 2003).
Excerpt (IEEE Journal of Solid-State Circuits, vol. 36, no. 11, November 2001; Fig. 1 of the paper shows a process SEM cross section):

The process Vt was raised from [1] to limit standby power. Circuit design and architectural pipelining ensure low-voltage performance and functionality. To further limit standby current in handheld ASSPs, a longer poly target takes advantage of the Vt-versus-L dependence, and source-to-body bias is used to electrically limit transistor leakage in standby mode. All core nMOS and pMOS transistors utilize separate source and bulk connections to support this. The process includes cobalt disilicide gates and diffusions. Low source and drain capacitance, as well as 3-nm gate-oxide thickness, allow high-performance and low-voltage operation.

III. ARCHITECTURE
The microprocessor contains 32-kB instruction and data caches as well as an eight-entry coalescing writeback buffer. The instruction and data cache fill buffers have two and four entries, respectively. The data cache supports hit-under-miss operation, and lines may be locked to allow SRAM-like operation. Thirty-two-entry fully associative translation lookaside buffers (TLBs) that support multiple page sizes are provided for both caches. TLB entries may also be locked. A 128-entry branch target buffer improves branch performance in a pipeline deeper than earlier high-performance ARM designs [2], [3].

A. Pipeline Organization
To obtain high performance, the microprocessor core utilizes a simple scalar pipeline and a high-frequency clock. In addition to avoiding the potential power waste of a superscalar approach, functional design and validation complexity is decreased at the expense of circuit design effort. To avoid circuit design issues, the pipeline partitioning balances the workload and ensures that no one pipeline stage is tight. The main integer pipeline is seven stages, memory operations follow an eight-stage pipeline, and when operating in Thumb mode an extra pipe stage is inserted after the last fetch stage to convert Thumb instructions into ARM instructions. Since Thumb-mode instructions [11] are 16 b, two instructions are fetched in parallel while executing Thumb instructions. A simplified diagram of the processor pipeline is shown in Fig. 2 (microprocessor pipeline organization), where the state boundaries are indicated by gray. Features that allow the microarchitecture to achieve high speed are as follows.

The shifter and ALU reside in separate stages. The ARM instruction set allows a shift followed by an ALU operation in a single instruction. Previous implementations limited frequency by having the shift and ALU in a single stage. Splitting this operation reduces the critical ALU bypass path by approximately 1/3. The extra pipeline hazard introduced when an instruction is immediately followed by one requiring that its result be shifted is infrequent.

Decoupled instruction fetch. A two-instruction-deep queue is implemented between the second fetch and instruction decode pipe stages. This allows stalls generated later in the pipe to be deferred by one or more cycles in the earlier pipe stages, thereby allowing instruction fetches to proceed when the pipe is stalled, and also relieves stall speed paths in the instruction fetch and branch prediction units.

Deferred register dependency stalls. While register dependencies are checked in the RF stage, stalls due to these hazards are deferred until the X1 stage. All the necessary operands are then captured from result-forwarding busses as the results are returned to the register file.

One of the major goals of the design was to minimize the energy consumed to complete a given task. Conventional wisdom has been that shorter pipelines are more efficient due to re...

2005 Memory Hierarchy: Apple iMac G5
iMac G5, 1.6 GHz, $1299.00

  Level             Reg   L1 Inst   L1 Data   L2     DRAM   Disk
  Size              1K    64K       32K       512K   256M   80G
  Latency (cycles)  1     3         3         11     160    1e7

Let programs address a memory space that scales to the ...
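One way to read the table: fold its latencies into an average memory access time (AMAT). The sketch below uses the slide's cycle counts, but the miss rates are hypothetical assumptions of mine, not figures from the lecture.

```python
# Average memory access time (AMAT) through the iMac G5 hierarchy above.
# Latencies come from the slide's table; the miss rates are illustrative
# assumptions, not figures from the lecture.

L1_LATENCY = 3       # cycles (table)
L2_LATENCY = 11      # cycles (table)
DRAM_LATENCY = 160   # cycles (table)

L1_MISS_RATE = 0.05  # assumed: 5% of accesses miss in L1
L2_MISS_RATE = 0.20  # assumed: 20% of those also miss in L2

# Classic two-level AMAT model: every access pays the L1 latency; misses
# fall through and additionally pay the latency of the level below.
amat = L1_LATENCY + L1_MISS_RATE * (L2_LATENCY + L2_MISS_RATE * DRAM_LATENCY)
print(f"AMAT = {amat:.2f} cycles")  # 5.15 cycles, vs. 160 to reach DRAM every time
```

Under these assumptions an average access costs about 5 cycles instead of 160: the hierarchy lets a program address a large memory space at close to cache speed.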

