Memory Hierarchy and Cache Design 3: Main Memory Background

[Figure: Conventional DRAM system]

Main Memory Background

- Main memory uses DRAM (dynamic RAM)
  - Dynamic because it must be refreshed periodically, but requires only 1 transistor per bit
  - Addresses are divided into 2 parts:
    - RAS, or Row Access Strobe
    - CAS, or Column Access Strobe
- Cache uses SRAM (static RAM)
  - No refresh, but requires 6 transistors per bit
  - Address not divided, for fast access
- Comparison
  - Capacity: DRAM is 4 to 8 times that of SRAM
  - Cycle time: SRAM is 8 to 16 times faster than DRAM
  - Cost: SRAM is 8 to 16 times more expensive than DRAM

Performance Metrics of Main Memory

- Latency: affects cache miss penalty
  - Access time: time between the request and when the desired word arrives
  - Cycle time: minimum time between requests
- Bandwidth: affects I/O performance and cache miss penalty, especially when a large block is used in the L2 cache

Trends in DRAM

- Capacity improves by 60% per year
- Row access time improves by 7% per year

Main Memory Organizations

- Basic memory organization: one-word-wide bus
  - 4 clock cycles to send the address
  - 24 clock cycles for the access time per word
  - 4 clock cycles to send a word of data
- Example: cache block of 4 words
  - Miss penalty = 4 x (4 + 24 + 4) = 128 cycles

Faster Memory System

Cache miss penalty:

1. Wider Main Memory
2. Simple Interleaved Memory
3. Independent Memory Banks
4. Avoiding Memory Bank Conflicts
5. DRAM-specific Interleaving

First Technique: Wider Main Memory

- Two-word-wide bus: 2 x (4 + 24 + 4) = 64 cycles
- Four-word-wide bus: 1 x (4 + 24 + 4) = 32 cycles
- Drawbacks
  - Higher bus costs
  - Multiplexor
  - Reduced expandability
  - More frequent read-modify-writes in memories with error correction

Second Technique: Simple Interleaved Memory

[Figure: Four-way interleaving]

- Memory consists of several DRAM chips
  - Each chip is capable of autonomous operation
- Organize memory chips in banks and issue memory requests to all banks at the same time
  - Banks are one word wide
- Goal: deliver information from a new bank on each cycle
  - Need more banks than the number of cycles to access a bank
- As memory chip size increases, fewer chips are used
  - Constructing multiple banks becomes difficult
- Restricted expandability
  - Can increase memory only by doubling it
- Cache miss penalty: 4 + 24 + 4 x 4 = 44 cycles

Memory Interleaving

- Mapping addresses to banks affects the behavior of the memory system
  - Optimized for sequential access
  - May spread consecutive addresses to several banks
- Interleaving factor
  - Normally word interleaved
  - Can also be byte interleaved
  - Depends on the organization of the bank

Third Technique: Independent Memory Banks

- Multiple independent banks
- Multiple memory controllers
- Each bank uses separate address and data lines

Fourth Technique: Avoiding Memory Bank Conflicts

- Problem: even with 128 banks there are conflicts in the example below, since 512 is a multiple of 128
- Example:

    int x[256][512];
    for (j = 0; j < 512; j = j + 1)
        for (i = 0; i < 256; i = i + 1)
            x[i][j] = 2 * x[i][j];

- Software solutions
  - Loop interchange
  - Resizing the array
- Hardware solutions
  - Based on the Chinese Remainder Theorem
  - Prime number of banks
  - Bank number = address mod number of banks
  - Address within bank = address mod number of words in bank

Fifth Technique: DRAM-Specific Interleaving

- Multiple column accesses: page mode
  - DRAM buffers a row of bits inside the DRAM for column access, e.g. a 16 Kbit row for a 64 Mbit DRAM
  - Allows repeated column accesses without another row access
  - 64 Mbit DRAM: cycle time = 90 ns, optimized access = 25 ns
- New DRAMs. Example: RAMBUS
  - Each chip acts like a memory system
  - Uses a packet-switched bus
  - Is synchronous to the CPU clock
  - Returns a variable amount of data
  - Even performs its own refresh

Summary of Faster Memory Systems

1. Wider Main Memory
2. Simple Interleaved Memory
3. Independent Memory Banks
4. Avoiding Memory Bank Conflicts
5. DRAM-specific Interleaving
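
The miss-penalty figures quoted for the basic, wider, and simple interleaved organizations all follow from the same three timing parameters (4 cycles to send the address, 24 cycles per access, 4 cycles per word transferred). The following is a minimal sketch of that arithmetic, not code from the slides: the function and constant names are hypothetical, and the interleaved case assumes at least one bank per word of the block, so that all accesses overlap.

```c
#include <assert.h>

/* Timing parameters from the slides' example, in clock cycles. */
enum { ADDR_CYCLES = 4, ACCESS_CYCLES = 24, XFER_CYCLES = 4 };

/* Bus that is bus_words wide (1 = basic organization).
   Each bus-wide chunk of the block pays the full
   address + access + transfer sequence. */
int miss_penalty_wide(int block_words, int bus_words)
{
    assert(bus_words > 0 && block_words % bus_words == 0);
    return (block_words / bus_words)
         * (ADDR_CYCLES + ACCESS_CYCLES + XFER_CYCLES);
}

/* Simple interleaving, assuming one bank per word of the block:
   the address is sent once, all banks are accessed in parallel,
   and the words return one at a time over a one-word bus. */
int miss_penalty_interleaved(int block_words)
{
    assert(block_words > 0);
    return ADDR_CYCLES + ACCESS_CYCLES + block_words * XFER_CYCLES;
}
```

For a 4-word block, `miss_penalty_wide(4, 1)`, `miss_penalty_wide(4, 2)`, and `miss_penalty_wide(4, 4)` evaluate to 128, 64, and 32 cycles, and `miss_penalty_interleaved(4)` evaluates to 44 cycles, matching the numbers on the slides.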
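
The bank-conflict example in the fourth technique can be checked numerically. The sketch below counts how many distinct banks the inner loop touches while walking one column of a row-major array, using the slides' mapping bank number = address mod number of banks. It is an illustration under stated assumptions: addresses are word addresses starting at 0, the function name and the `MAX_BANKS` limit are hypothetical, and padding each row to 513 words is just one possible way of "resizing the array".

```c
#include <assert.h>
#include <string.h>

enum { ROWS = 256, MAX_BANKS = 1024 };

/* Count the distinct banks touched when walking column `col` of an
   array with ROWS rows and `row_words` words per row (row-major),
   with word-interleaved banks: bank = word address mod num_banks. */
int banks_touched_by_column(int col, int row_words, int num_banks)
{
    unsigned char seen[MAX_BANKS];
    int distinct = 0;

    assert(num_banks > 0 && num_banks <= MAX_BANKS);
    assert(col >= 0 && col < row_words);
    memset(seen, 0, sizeof seen);

    for (int i = 0; i < ROWS; i++) {
        int word_addr = i * row_words + col;  /* address of x[i][col] */
        int bank = word_addr % num_banks;
        if (!seen[bank]) {
            seen[bank] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

With 512-word rows and 128 banks, `banks_touched_by_column(0, 512, 128)` is 1: every access in the column lands in the same bank. A prime bank count spreads the same accesses out (`banks_touched_by_column(0, 512, 127)` is 127), as does padding the rows (`banks_touched_by_column(0, 513, 128)` is 128).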