Memories

• Memories in Verilog
• Memories on the FPGA
• External Memories
  -- SRAM (async, sync)
  -- DRAM
  -- Flash

6.111 Fall 2008, Lecture 7

Memories: a practical primer

• The good news: huge selection of technologies
  – Small & faster vs. large & slower
  – Every year capacities go up and prices go down
  – New kid on the block: high density, fast flash memories
    • Non-volatile, read/write, no moving parts! (robust, efficient)
• The bad news: perennial system bottleneck
  – Latencies (access time) haven't kept pace with cycle times
  – Memory is built in a separate technology from logic, so signals must travel between chips, and physical limitations (# of pins, R's, C's and L's) limit bandwidth
    • New hopes: capacitive interconnect, 3D ICs
  – Likely the limiting factor in cost & performance of many digital systems: designers spend a lot of time figuring out how to keep memories running at peak bandwidth
  – "It's the memory, stupid"

Memories in Verilog

• reg bit;                  // a single register
• reg [31:0] word;          // a 32-bit register
• reg [31:0] array[15:0];   // 16 32-bit regs
• wire [31:0] read_data,write_data;
  wire [3:0] index;

  // combinational (asynch) read
  assign read_data = array[index];

  // clocked (synchronous) write
  always @(posedge clock)
    array[index] <= write_data;

Multi-port Memories (aka regfiles)

  reg [31:0] regfile[30:0];   // 31 32-bit words

  // Beta register file: 2 read ports, 1 write
  wire [4:0] ra1,ra2,wa;
  wire [31:0] rd1,rd2,wd;

  assign ra1 = inst[20:16];
  assign ra2 = ra2sel ? inst[25:21] : inst[15:11];
  assign wa = wasel ? 5'd30 : inst[25:21];

  // read ports (register 31 always reads as zero)
  assign rd1 = (ra1 == 5'd31) ? 32'd0 : regfile[ra1];
  assign rd2 = (ra2 == 5'd31) ? 32'd0 : regfile[ra2];

  // write port
  always @(posedge clk)
    if (werf) regfile[wa] <= wd;

  assign z = ~| rd1;   // used in BEQ/BNE instructions

FIFOs

  // a simple synchronous FIFO (first-in first-out) buffer
  // Parameters:
  //   LOGSIZE  (parameter) FIFO has 1<<LOGSIZE elements
  //   WIDTH    (parameter) each element has WIDTH bits
  // Ports:
  //   clk      (input) all actions triggered on rising edge
  //   reset    (input) synchronously empties fifo
  //   din      (input, WIDTH bits) data to be stored
  //   wr       (input) when asserted, store new data
  //   full     (output) asserted when FIFO is full
  //   dout     (output, WIDTH bits) data read from FIFO
  //   rd       (input) when asserted, removes first element
  //   empty    (output) asserted when fifo is empty
  //   overflow (output) asserted when WR but no room, cleared on next RD
  module fifo #(parameter LOGSIZE = 2,   // default size is 4 elements
                          WIDTH = 4)     // default width is 4 bits
               (input clk,reset,wr,rd,
                input [WIDTH-1:0] din,
                output full,empty,overflow,
                output [WIDTH-1:0] dout);
    …
  endmodule

[Block diagram: FIFO with 1<<LOGSIZE locations. Inputs: clk, reset, din (WIDTH bits), wr, rd. Outputs: dout (WIDTH bits), full, empty, overflow.]

FIFOs in action

  // make a fifo with 8 8-bit locations
  fifo #(.LOGSIZE(3),.WIDTH(8))
    f8x8(.clk(clk),.reset(reset),
         .wr(wr),.din(din),.full(full),
         .rd(rd),.dout(dout),.empty(empty),
         .overflow(overflow));

FPGA memory implementation

• Regular registers in logic blocks
  – Piggy use of resources, but convenient & fast if small
• [Xilinx Virtex-II] use the LUTs:
  – Single port: 16x(1,2,4,8), 32x(1,2,4,8), 64x(1,2), 128x1
  – Dual port (1 R/W, 1R): 16x1, 32x1, 64x1
  – Can fake extra read ports by cloning memory: all clones are written with the same addr/data, but each clone can have a different read address
• [Xilinx Virtex-II] use block RAM:
  – 18K bits: 16Kx1, 8Kx2, 4Kx4; with parity: 2Kx(8+1), 1Kx(16+2), 512x(32+4)
  – Single or dual port
  – Pipelined (clocked) operations
  – Labkit XC2V6000: 144 BRAMs, 2592K bits total
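The cloning trick for extra read ports can be sketched in plain Verilog. This is a minimal illustration, not code from the lecture: the module name `ram_2r1w_clone`, its port names, and its default parameters are all hypothetical, and a synthesis tool may well perform the same duplication on its own when it sees one array read at two addresses.

```verilog
// Hypothetical example: a memory with 1 write port and 2 read ports,
// built from two single-read copies that are always written together.
module ram_2r1w_clone
  #(parameter LOGSIZE = 4,              // 1<<LOGSIZE locations (assumed default)
              WIDTH = 8)                // bits per location (assumed default)
   (input clk, we,
    input [LOGSIZE-1:0] waddr, raddr1, raddr2,
    input [WIDTH-1:0] wdata,
    output [WIDTH-1:0] rdata1, rdata2);

  // two identical copies of the memory contents
  reg [WIDTH-1:0] copy1[(1<<LOGSIZE)-1:0];
  reg [WIDTH-1:0] copy2[(1<<LOGSIZE)-1:0];

  // every write goes to both copies with the same addr/data
  always @(posedge clk)
    if (we) begin
      copy1[waddr] <= wdata;
      copy2[waddr] <= wdata;
    end

  // each copy serves its own read address (combinational read)
  assign rdata1 = copy1[raddr1];
  assign rdata2 = copy2[raddr2];
endmodule
```

The cost is one full copy of the storage per additional read port, which is why this approach is usually reserved for small memories.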
LUT-based RAMs

LUT-based RAM Modules

  // instantiate a LUT-based RAM module
  // note: the INIT literal below has 20 digits for a 16-bit constant;
  // Verilog keeps only the low-order 16 bits
  RAM16X1S #(.INIT(16'b01101111001101011100))   // msb first
    mymem (.D(din),.O(dout),.WE(we),.WCLK(clock_27mhz),
           .A0(a[0]),.A1(a[1]),.A2(a[2]),.A3(a[3]));

Tools will often build these for you…

From Lab 2:

  reg [7:0] segments;
  always @ (switch[3:0]) begin
    case (switch[3:0])
      4'h0: segments[6:0] = 7'b0111111;
      4'h1: segments[6:0] = 7'b0000110;
      4'h2: segments[6:0] = 7'b1011011;
      4'h3: segments[6:0] = 7'b1001111;
      4'h4: segments[6:0] = 7'b1100110;
      4'h5: segments[6:0] = 7'b1101101;
      4'h6: segments[6:0] = 7'b1111101;
      4'h7: segments[6:0] = 7'b0000111;
      4'h8: segments[6:0] = 7'b1111111;
      4'h9: segments[6:0] = 7'b1100111;
      4'hA: segments[6:0] = 7'b1110111;
      4'hB: segments[6:0] = 7'b1111100;
      4'hC: segments[6:0] = 7'b1011000;
      4'hD: segments[6:0] = 7'b1011110;
      4'hE: segments[6:0] = 7'b1111001;
      4'hF: segments[6:0] = 7'b1110001;
      default: segments[6:0] = 7'b0000000;
    endcase
    segments[7] = 1'b0;   // decimal point
  end

  =============================================
  *              HDL Synthesis                *
  =============================================
  Synthesizing Unit <lab2_2>.
      Related source file is "../lab2_2.v".
      ...
      Found 16x7-bit ROM for signal <$n0000>.
      ...
      Summary:
          inferred 1 ROM(s).
      ...
  Unit <lab2_2> synthesized.

  =============================================
  Timing constraint: Default path analysis
    Total number of paths / destination ports: 28 / 7
  -------------------------------------------------
  Delay:        7.244ns (Levels of Logic = 3)
  Source:       switch<3> (PAD)
  Destination:  user1<0> (PAD)

  Data Path: switch<3> to user1<0>
                         Gate    Net
    Cell:in->out fanout  Delay   Delay  Logical Name
    -------------------------------------------------
    IBUF:I->O       7    0.825   1.102  switch_3_IBUF
    LUT4:I0->O      1    0.439   0.517  Mrom__n0000_inst_lut4_01
    OBUF:I->O            4.361          user1_0_OBUF
    -------------------------------------------------
    Total                7.244ns (5.625ns logic, 1.619ns route)
                                 (77.7% logic, 22.3% route)

Block Memories (BRAMs)

(WDATA + WPARITY) * (LOCATIONS) = 18K bits
  WDATA:     1, 2, 4, 8, 16, 32
  WPARITY:   1, 2, 4
  LOCATIONS: 16K, 8K, 4K, 2K, 1K, 512

BRAM Operation

Source:
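Because block RAM operations are clocked, a memory intended for a BRAM is normally written with a registered read as well as a registered write. The following is a minimal sketch under that assumption. The module name `sync_ram` and its ports are hypothetical, the default 1Kx16 shape is chosen only to match one of the aspect ratios listed above, and whether a given synthesis tool actually maps it onto a block RAM depends on the tool and its settings.

```verilog
// Hypothetical example: single-port memory with synchronous write and
// synchronous read, the style of description tools can map to block RAM.
module sync_ram
  #(parameter LOGSIZE = 10,             // 1<<LOGSIZE locations (1K, assumed)
              WIDTH = 16)               // bits per location (assumed)
   (input clk, we,
    input [LOGSIZE-1:0] addr,
    input [WIDTH-1:0] din,
    output reg [WIDTH-1:0] dout);

  reg [WIDTH-1:0] mem[(1<<LOGSIZE)-1:0];

  always @(posedge clk) begin
    if (we) mem[addr] <= din;           // clocked write
    dout <= mem[addr];                  // clocked read: data appears one
                                        // cycle after the address is presented
  end
endmodule
```

With nonblocking assignments, a read and write to the same address in the same cycle returns the old contents on `dout`; the new value is visible on the following read.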