CS 152 Computer Architecture and Engineering
Lecture 6 - Memory

Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152

Last time in Lecture 5
- Control hazards (branches, interrupts) are most difficult to handle as they change which instruction should be executed next
- Speculation commonly used to reduce effect of control hazards (predict sequential fetch, predict no exceptions)
- Branch delay slots make control hazard visible to software
- Precise exceptions: stop cleanly on one instruction, all previous instructions completed, no following instructions have changed architectural state
- To implement precise exceptions in pipeline, shift faulting instructions down pipeline to "commit" point, where exceptions are handled in program order

2/12/2008, CS152 Spring '08

CPU-Memory Bottleneck
[Figure: CPU connected to Memory]
Performance of high-speed computers is usually limited by memory bandwidth and latency
- Latency (time for a single access)
  - Memory access time >> Processor cycle time
- Bandwidth (number of accesses per unit time)
  - if fraction m of instructions access memory, then 1 + m memory references / instruction

Core Memory
- Core memory was first large-scale reliable main memory
  - invented by Forrester in late 40s at MIT for Whirlwind project
- Bits stored as magnetization polarity on small ferrite cores threaded onto 2-dimensional grid of wires
- Coincident current pulses on X and Y wires would write cell and also sense original state (destructive reads)
- Robust, non-volatile storage
- Used on space shuttle computers until recently
- Cores threaded onto wires by hand (25 billion a year at peak production)
- Core access time ~ 1 µs
[Figure: DEC PDP-8/E board, 4K words x 12 bits (1968)]

Semiconductor Memory, DRAM
- Semiconductor memory began to be competitive in early 1970s
  - Intel formed to exploit market for semiconductor memory
- First commercial DRAM was Intel 1103
  - 1Kbit of storage on single chip
  - charge on a capacitor used to hold value
- Semiconductor memory quickly replaced core in 70s

One Transistor Dynamic RAM
[Figure: 1-T DRAM cell. Schematic labels: word line, access transistor, bit line, storage capacitor (FET gate, trench, stack), VREF. Cross-section labels: TiN top electrode (VREF), Ta2O5 dielectric, W bottom electrode, poly word line, access transistor]

DRAM Architecture
[Figure: 2-dimensional array of memory cells (one bit each). A row address decoder takes N address bits and drives word lines Row 1 to Row 2^N; a column decoder and sense amplifiers take M address bits and select among bit lines Col 1 to Col 2^M; N+M address bits in, data D out]
- Bits stored in 2-dimensional arrays on chip
- Modern chips have around 4 logical banks on each chip
  - each logical bank physically implemented as many [...]

DRAM Packaging
[Figure: DRAM chip with clock and control signals (7), address lines carrying multiplexed row/column address (12), and data bus (4b, 8b, 16b, 32b)]
- DIMM (Dual Inline Memory Module) contains multiple chips with clock/control/address signals connected in parallel (sometimes need buffers to drive signals to all chips)
- Data pins work together to return wide word (e.g., 64-bit data bus using 16 x 4-bit parts)

DRAM Operation
Three steps in read/write access to a given bank
- Row access (RAS)
  - decode row address, enable addressed row (often multiple Kb in row)
  - bitlines share charge with storage cell
  - small change in voltage detected by sense amplifiers which latch whole row of bits
  - sense amplifiers drive bitlines full rail to recharge storage cells
- Column access (CAS)
  - decode column address to select small number of sense amplifier latches (4, 8, 16, or 32 bits depending on DRAM package)
  - on read, send latched bits out to chip pins
  - on write, change sense amplifier latches which then charge storage cells to required value
  - can perform multiple column accesses on same row without another row access (burst mode)
- Precharge
  - charges bit lines to known value, required before next row access
Each step has a latency of around 20ns in modern DRAMs.
Various DRAM standards (DDR, RDRAM) have different ways of encoding the signals for transmission to the DRAM, but all share same core architecture.

Double-Data Rate (DDR2) DRAM
[Figure: timing diagram from the Micron 256Mb DDR2 SDRAM datasheet. 200MHz clock; command sequence Row, Column, Precharge, Row; data transferred at 400Mb/s data rate]

Processor-DRAM Gap (latency)
[Figure: performance (log scale, 1 to 1000) versus time (1980 to 2000). CPU ("Moore's Law") improves 60%/year, DRAM improves 7%/year, so the processor-memory performance gap grows 50%/year]
Four-issue 2GHz superscalar accessing 100ns DRAM could execute 800 instructions during time for one memory access!

Little's Law
Throughput (T) = Number in Flight (N) / Latency (L)
[Figure: CPU, table of accesses in flight, Memory]
Example:
- Assume infinite-bandwidth memory
- 100 cycles / memory reference
- 1 + 0.2 memory references / instruction
- Table size = 1.2 * 100 = 120 entries
120 independent memory operations in flight!

Typical Memory Reference Patterns
[Figure: address versus time. Instruction fetches (n loop iterations); stack accesses (subroutine call, subroutine return, argument access); data accesses (vector access, scalar accesses)]

Common Predictable Patterns
Two predictable properties of memory references:
- Temporal Locality: If a location is referenced it is likely to be referenced again in the near future.
- Spatial Locality: If a location is referenced it is likely that locations near it will be referenced in the near future.

Memory Reference Patterns
[Figure: memory address (one dot per access) versus time, with regions marked Temporal Locality and Spatial Locality]
Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)

Multilevel Memory
Strategy: Reduce average latency using small, fast memories called caches.
Caches are a mechanism to reduce memory latency based on the empirical observation that the patterns of memory references made by a processor are often highly predictable:

    PC
     96        ...
    100  loop: ADD  r2, r1, r1
    104        SUBI r3, r3, #1
    108        BNEZ r3, loop
    112        ...

Memory Hierarchy
[Figure: CPU connected to A, a small, fast memory (RF, SRAM) that holds frequently used data, which is connected to B, a big, slow memory (DRAM)]
- capacity: Register << SRAM << DRAM (why?)
- latency: Register << SRAM << DRAM (why?)
- bandwidth: on-chip >> off-chip (why?)
On a data access:
- hit (data in fast memory): low latency access
- miss (data not in fast memory): long latency access

Management of Memory Hierarchy
- Small/fast storage, e.g., registers
  - Address usually specified in instruction
  - Generally implemented directly as a register file
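
The hit/miss behaviour from the Memory Hierarchy slide can be illustrated by replaying the instruction fetches of the three-instruction loop through a small fast memory. This is only a rough sketch under assumptions that are not in the slides: the fast memory holds 4 word addresses, uses least-recently-used replacement, a hit costs 1 cycle, and a miss costs 100 cycles (the figure borrowed from the Little's Law example). The names `replay` and `loop_trace` are hypothetical.

```python
from collections import OrderedDict


def replay(trace, capacity, hit_cycles=1, miss_cycles=100):
    """Replay an address trace through a small fast memory.

    Assumptions (not from the slides): LRU replacement, one address
    per entry, fixed hit and miss costs.
    Returns (hits, misses, total_cycles).
    """
    fast = OrderedDict()  # keys ordered from least to most recently used
    hits = misses = 0
    for addr in trace:
        if addr in fast:
            hits += 1
            fast.move_to_end(addr)        # mark as most recently used
        else:
            misses += 1
            fast[addr] = None             # bring the address into fast memory
            if len(fast) > capacity:
                fast.popitem(last=False)  # evict the least recently used entry
    total_cycles = hits * hit_cycles + misses * miss_cycles
    return hits, misses, total_cycles


def loop_trace(iterations):
    """Instruction fetch addresses for the loop at PC 100, 104, 108."""
    return [pc for _ in range(iterations) for pc in (100, 104, 108)]


hits, misses, cycles = replay(loop_trace(10), capacity=4)
print(hits, misses, cycles, cycles / (hits + misses))
```

With 10 iterations, only the first pass through the loop misses: 3 misses and 27 hits, 327 cycles in total, or 10.9 cycles per fetch on average instead of 100. This is temporal locality at work. If the assumed capacity is cut to 2 entries, LRU evicts each instruction just before it is needed again and every fetch misses.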
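
The two pieces of arithmetic quoted earlier in the lecture (800 instructions per DRAM access, and 120 accesses in flight from Little's Law) can be checked with a short sketch. The function names are hypothetical. The second function assumes one instruction issued per cycle, which the slide's example appears to imply but does not state.

```python
def instruction_slots_per_access(issue_width, clock_ghz, access_ns):
    """Instructions a processor could issue while one memory access completes.

    GHz times ns gives cycles, and each cycle offers issue_width slots.
    """
    cycles = clock_ghz * access_ns
    return issue_width * cycles


def accesses_in_flight(latency_cycles, refs_per_instruction,
                       instructions_per_cycle=1.0):
    """Little's Law rearranged as N = T * L.

    Throughput T is in memory references per cycle; the default of one
    instruction per cycle is an assumption of this sketch.
    """
    throughput = refs_per_instruction * instructions_per_cycle
    return throughput * latency_cycles


# Four-issue 2GHz superscalar accessing 100ns DRAM
print(instruction_slots_per_access(4, 2, 100))

# 100 cycles per memory reference, 1 + 0.2 memory references per instruction
print(round(accesses_in_flight(100, 1 + 0.2)))
```

The first call reproduces the 800 instructions from the Processor-DRAM Gap slide, and the second reproduces the 120-entry table from the Little's Law example.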