CS250 VLSI Systems Design
Lecture 9: Memory
John Wawrzynek, Krste Asanovic, with John Lazzaro and Brian Zimmer (TA)
UC Berkeley, Fall 2011

CMOS Bistable
[Figure: cross-coupled inverter pair in its two stable states, with the storage nodes at 1/0 and at 0/1, labeled "Flip State"]
- Cross-coupled inverters are used to hold state in CMOS.
- Static storage in a powered cell; no refresh needed.
- If a storage node leaks or is pushed slightly away from its correct value, the non-linear transfer function of the high-gain inverter removes the noise and recirculates the correct value.
- To write new state, the nodes have to be forced to the opposite state.

CMOS Transparent Latch
[Figure: latch schematic with D, optional input buffer, transmission gates driven by Clk and its complement, optional output buffer, and Q; schematic symbols for a latch transparent on clock high and a latch transparent on clock low]
- The latch is transparent (output follows input) when the clock is high, and holds the last value when the clock is low.
- A transmission gate switch with both pMOS and nMOS passes both ones and zeros well.

Latch Operation
[Figure: clock high, latch transparent; clock low, latch holding]

Flip-Flop as Two Latches
[Figure: two latches in series, the first sampling and the second holding, on opposite clock phases; schematic symbols]
- This is how standard cell flip-flops are built (usually with extra in/out buffers).

Small Memories from Stdcell Latches
[Figure: read address decoder, write address decoder, write address, write data, Clk, optional read output latch]
- Data is held in transparent-low latches.
- Write by clocking the latch.
- Add additional ports by replicating read and write port logic (multiple write ports need a mux in front of the latch).
- Combinational logic for the read port is synthesized.
- Optional read output latch.
- Expensive to add many ports.

6-Transistor SRAM (Static RAM)
[Figure: bitcell with wordline and complementary bitlines]
- Large on-chip memories are built from arrays of static RAM bitcells, where each bitcell holds a bistable (cross-coupled inverters) and two access transistors.
- Other clocking and access logic is factored out into the periphery.

Intel's 22nm SRAM cell
- 0.092 µm² SRAM cell for high density applications
- 0.108 µm² SRAM cell for low voltage applications
[Bohr, Intel, Sept 2009]

General SRAM Structure
[Figure: bitline prechargers; address decode and wordline driver; differential read sense amplifiers; differential write drivers; inputs Address, Clk, Write Enable, Write Data; output Read Data]
- Usually a maximum of 128-256 bits per row or column.

Address Decoder Structure
[Figure: address bits A0-A3 feed 2:4 predecoders (unary 1-of-4 encoding); predecoder outputs combine with a clocked word line enable to drive Word Line 0 through Word Line 15]

Read Cycle
[Figure: prechargers, storage cells, wordline from decoder, sense amp, and data output set-reset latch; timing waveforms for Clk, wordline clock, bitline differential, sense, and full-rail data, marked with phases 1-3]
1. Precharge bitlines and senseamp.
2. Pulse wordlines, develop bitline differential voltage.
3. Disconnect bitlines from senseamp, activate sense pulldown, develop full-rail data signals.
- Pulses are generated by internal self-timed signals, often using "replica" circuits representing critical paths.

Write Cycle
[Figure: prechargers, storage cells, wordline from decoder, write enable and write data drivers; timing waveforms marked with phases 1-2]
1. Precharge bitlines.
2. Open wordline, pull down one bitline full rail.
- Write enable can be controlled on a per-bit level. If the bitlines are not driven during a write, the cell retains its value (looks like a read to the cell).

Column Muxing at Sense Amps
[Figure: two bitline pairs selected by Sel0 and Sel1 sharing one sense amp]
- It is difficult to pitch-match the sense amp to the tight SRAM bitcell spacing, so often 2-8 columns share one sense amp.
- Impacts power dissipation, as multiple bitline pairs swing for each bit read.

Building Larger Memories
[Figure: tiled leaf arrays of bit cells with shared decoders (Dec) and I/O blocks]
- Large arrays are constructed by tiling multiple leaf arrays, sharing decoders and I/O circuitry (e.g., a sense amp attached to the arrays above and below).
- Leaf array size is limited to 128-256 bits per row/column due to the RC delay of wordlines and bitlines.
- Tiling also reduces power by only activating the selected sub-bank.
- In larger memories, delay and energy are dominated by I/O wiring.

Adding More Ports
[Figure: bitcell with WordlineA/WordlineB and BitA/BitB pairs forming differential read or write ports; optional single-ended read port with its own read wordline and read bitline]

Memory Compilers
- In an ASIC flow, memory compilers are used to generate layout for SRAM blocks in the design.
  - Often hundreds of memory instances in a modern SoC.
  - Memory generators can also produce built-in self-test (BIST) logic to speed manufacturing testing, and redundant rows/columns to improve yield.
- The compiler can be parameterized by number of words, number of bits per word, desired aspect ratio, number of sub-banks, degree of column muxing, etc.
  - Area, delay, and energy consumption are a complex function of the design parameters and the generation algorithm.
  - Worth experimenting with the design space.
- Usually only single read-or-write port SRAM and one-read-plus-one-write SRAM generators are available in an ASIC library.

Small Memories
- Compiled SRAM arrays usually have a high overhead due to peripheral circuits, BIST, and redundancy.
- Small memories are usually built from latches and/or flip-flops in a stdcell flow.
- The cross-over point is usually around 1K bits of storage.
- Should try the design both ways.

Memory Design Patterns

Multiport Memory Design Patterns
Often we require multiple access ports to a common memory.
- True Multiport Memory: as described earlier in the lecture, completely independent read and write port circuitry.
- Banked Multiport Memory: interleave lesser-ported banks to provide higher bandwidth.
- Stream-Buffered Multiport Memory: use a single wider access port to provide multiple narrower streaming ports.
- Cached Multiport Memory: use large single-port main memory but add
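
The banked multiport pattern listed above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the lecture's implementation: the class name `BankedMemory`, the choice of interleaving on the low-order address bits, and the policy of rejecting (rather than stalling) a request that loses a bank conflict are all hypothetical choices made for illustration.

```python
class BankedMemory:
    """Behavioral model: several single-ported banks, word-interleaved.

    Assumption: the bank is selected by the low-order address bits, so
    consecutive addresses fall in different banks.
    """

    def __init__(self, num_banks, words_per_bank):
        # Assumption: a power-of-two bank count, as a hardware decoder would use.
        assert num_banks > 0 and num_banks & (num_banks - 1) == 0
        self.num_banks = num_banks
        self.banks = [[0] * words_per_bank for _ in range(num_banks)]

    def _split(self, addr):
        # Low bits pick the bank, remaining bits pick the row inside it.
        return addr % self.num_banks, addr // self.num_banks

    def cycle(self, requests):
        """Apply one cycle of requests, listed in priority order.

        Each request is ("r", addr) or ("w", addr, data).
        Returns one result per request: the read data, True for a
        completed write, or None if the bank was already used this
        cycle (the requester would have to retry).
        """
        busy = set()  # banks already accessed in this cycle
        results = []
        for req in requests:
            bank, row = self._split(req[1])
            if bank in busy:
                results.append(None)  # bank conflict: single port is taken
                continue
            busy.add(bank)
            if req[0] == "w":
                self.banks[bank][row] = req[2]
                results.append(True)
            else:
                results.append(self.banks[bank][row])
        return results


mem = BankedMemory(num_banks=4, words_per_bank=64)
print(mem.cycle([("w", 0, 11), ("w", 1, 22)]))  # different banks: [True, True]
print(mem.cycle([("r", 0), ("r", 1)]))          # different banks: [11, 22]
print(mem.cycle([("r", 0), ("r", 4)]))          # both in bank 0:  [11, None]
```

The last call shows the cost of this pattern: the memory behaves like a multiport memory only while simultaneous accesses map to different banks, so the achieved bandwidth depends on the access pattern.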