4/7/99 ©UCB Spring 1999 CS152 / Kubiatowicz

CS152 Computer Architecture and Engineering
Lecture 18: Memory and Caches
April 7, 1999
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
Lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

Lec 18.2: Recap: Who Cares About the Memory Hierarchy?

[Figure: Processor-DRAM memory gap (latency), 1980-2000. µProc performance grows 60%/yr (2X/1.5 yr, "Moore's Law"); DRAM latency improves 9%/yr (2X/10 yrs). Starting around 1982, the processor-memory performance gap grows about 50% per year.]

Lec 18.3: Recap: Static RAM Cell

[Figure: 6-Transistor SRAM cell, with bit and bit-bar lines and a word (row select) line; the pullups may be replaced to save area.]

° Write:
  1. Drive bit lines (bit = 1, bit-bar = 0)
  2. Select row
° Read:
  1. Precharge bit and bit-bar to Vdd
  2. Select row
  3. Cell pulls one line low
  4. Sense amp on the column detects the difference between bit and bit-bar

Lec 18.4: Recap: 1-Transistor Memory Cell (DRAM)

° Write:
  1. Drive bit line
  2. Select row
° Read:
  1. Precharge bit line to Vdd
  2. Select row
  3. Cell and bit line share charge
     - Very small voltage change on the bit line
  4. Sense (fancy sense amp)
     - Can detect changes of ~1 million electrons
  5. Write: restore the value
° Refresh:
  1. Just do a dummy read to every cell.

[Figure: 1-T DRAM cell: a row-select transistor connecting the bit line to a storage capacitor.]

Lec 18.5: Recap: Memory Hierarchy of a Modern Computer System

° By taking advantage of the principle of locality:
  • Present the user with as much memory as is available in the cheapest technology.
  • Provide access at the speed offered by the fastest technology.

[Figure: Processor (Control, Datapath, Registers, On-Chip Cache), Second-Level Cache (SRAM), Main Memory (DRAM), Secondary Storage (Disk), Tertiary Storage (Disk). Speed (ns): 1s, 10s, 100s, 10,000,000s (10s ms), 10,000,000,000s (10s sec). Size (bytes): 100s, Ks, Ms, Gs, Ts.]

Lec 18.6: Recap: Memory Systems

° Two different types of locality:
  • Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
  • Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
° By taking advantage of the principle of locality:
  • Present the user with as much memory as is available in the cheapest technology.
  • Provide access at the speed offered by the fastest technology.
° DRAM is slow but cheap and dense:
  • Good choice for presenting the user with a BIG memory system.
° SRAM is fast but expensive and not very dense:
  • Good choice for providing the user FAST access time.

Lec 18.7: The Big Picture: Where are We Now?

° The Five Classic Components of a Computer
° Today's topics:
  • Recap last lecture
  • Continue discussion of DRAM
  • Cache review
  • Advanced caches
  • Virtual memory
  • Protection
  • TLB
[Figure: Processor (Control, Datapath), Memory, Input, Output]

Lec 18.8: Classical DRAM Organization (square)

[Figure: a square RAM cell array with a row decoder driving the word (row) select lines and a column selector & I/O circuits block on the bit (data) lines; each intersection represents a 1-T DRAM cell.]

° Row and column address together:
  • Select 1 bit at a time

Lec 18.9: DRAM Logical Organization (4 Mbit)

° Square root of bits per RAS/CAS

[Figure: an 11-bit address A0...A10 feeds the row decoder of a 2,048 x 2,048 memory array and, via the column decoder, the sense amps & I/O driving the D and Q pins; a word line selects a row of storage cells.]

Lec 18.10: DRAM Physical Organization (4 Mbit)

[Figure: the array is split into blocks (Block 0 ... Block 3), each with its own 9:512 block row decoder; a shared row address and a column address select the cells, and each block contributes 8 I/Os toward the D and Q pins.]

Lec 18.11: Logic Diagram of a Typical DRAM

[Figure: a 256K x 8 DRAM with a 9-bit multiplexed address bus A, an 8-bit data bus D, and control inputs RAS_L, CAS_L, WE_L, OE_L.]

° Control signals (RAS_L, CAS_L, WE_L, OE_L) are all active low
° Din and Dout are combined (D):
  • WE_L asserted (low), OE_L deasserted (high)
    - D serves as the data input pin
  • WE_L deasserted (high), OE_L asserted (low)
    - D is the data output pin
° Row and column addresses share the same pins (A)
  • RAS_L goes low: pins A are latched in as the row address
  • CAS_L goes low: pins A are latched in as the column address
  • RAS/CAS edge-sensitive

Lec 18.12: DRAM Read Timing

[Timing diagram: early read cycle (OE_L asserted before CAS_L) and late read cycle (OE_L asserted after CAS_L); A carries the row address, then the column address; after the read access time and output enable delay, D goes from high-Z to data out, all within one DRAM read cycle time.]

° Every DRAM access begins at:
  • The assertion of RAS_L
° 2 ways to read: early or late v.
CAS (i.e., whether OE_L is asserted before or after CAS_L).

Lec 18.13: DRAM Write Timing

[Timing diagram: early write cycle (WE_L asserted before CAS_L) and late write cycle (WE_L asserted after CAS_L); A carries the row address, then the column address; data in must be valid on D for the WR access time, all within one DRAM write cycle time.]

° Every DRAM access begins at:
  • The assertion of RAS_L
° 2 ways to write: early or late v. CAS

Lec 18.14: Main Memory Performance

° Simple:
  • CPU, cache, bus, and memory are all the same width (32 bits)
° Interleaved:
  • CPU, cache, and bus are 1 word wide; memory is N modules (4 modules in the example, which is word-interleaved)
° Wide:
  • CPU/mux is 1 word; mux/cache, bus, and memory are N words (Alpha: 64 bits & 256 bits)

Lec 18.15: Main Memory Performance

° DRAM (read/write) cycle time >> DRAM (read/write) access time
  • ~2:1; why?
° DRAM (read/write) cycle time:
  • How frequently can you initiate an access?
  • Analogy: a little kid can only ask his father for money on Saturday
° DRAM (read/write) access time:
  • How quickly will you get what you want once you initiate an access?
  • Analogy: as soon as he asks, his father will give him the money
° DRAM bandwidth limitation analogy:
  • What happens if he runs out of money on Wednesday?

[Timeline: the access time is shorter than the cycle time; a new access cannot start until the full cycle time has elapsed.]

Lec 18.16: Increasing Bandwidth: Interleaving

Access pattern without interleaving:
  • CPU starts the access for D1; only after D1 is available can the access for D2 start.

Access pattern with 4-way interleaving:
  • Access bank 0, then bank 1, then bank 2, then bank 3; by then, we can access bank 0 again.

[Figure: CPU connected to memory banks 0 through 3.]