CS152
Computer Architecture and Engineering
Lecture 6
Verilog (finish)
Multiply, Divide, Shift

February 11, 2004
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

2/11/03 ©UCB Spring 2004 CS152 / Kubiatowicz Lec6.2

Review from last time
° Design Process
  • Design Entry: Schematics, HDL, Compilers
  • High Level Analysis: Simulation, Testing, Assertions
  • Technology Mapping: Turn design into physical implementation
  • Low Level Analysis: Check out Timing, Setup/Hold, etc.
° Verilog: Three programming styles
  • Structural: Like a Netlist
    - Instantiation of modules + wires between them
  • Dataflow: Higher Level
    - Expressions instead of gates
  • Behavioral: Hardware programming
    - Full flow-control mechanisms
    - Registers, variables
    - File I/O, console display, etc.

Verilog subtlety: Blocking Assignments
° Blocking Assignments:
  • Assignments happen more like in a programming language (sequential code)
  • Both right- and left-hand sides are evaluated completely
  • Execution waits until the assignment completes before going on
    - Can cause unexpected results when connecting output to logic in other always blocks.
    - Also a bit strange with delays on the left-hand side (LHS)
° Example:
    reg E, C;
    always @(posedge clk)
    begin
      E = ~A;
      C = ~E;   // uses the new E, so C gets A
    end
  [Figure: resulting circuit; signals A, E, C]

Verilog subtlety: Non-Blocking Assignments
° Non-blocking Assignments:
  • All right-hand sides are evaluated immediately
  • Then the assignments occur
  • If there are no delays, you often want output ports to be assigned with non-blocking assignments
° Example:
    reg E, C;
    always @(posedge clk)
    begin
      E <= ~A;
      C <= ~E;  // uses the old E (its value before the clock edge)
    end
  [Figure: resulting circuit; signals A, E, C]

Sequential Logic (Revisited: better scheduling)
Must be careful mixing zero-time blocking assignments and edge-triggering: it probably won't do what you expect when you connect it to other things!

    module FF (CLK, Q, D);
      input D, CLK;
      output Q; reg Q;
      always @ (posedge CLK) Q = #5 D;
    endmodule // FF
  Good: Outputs 5 units "after edge"

    module FF (CLK, Q, D);
      input D, CLK;
      output Q; reg Q;
      always @ (posedge CLK) Q <= D;
    endmodule // FF
  Good: Doesn't output until "after edge"

    module FF (CLK, Q, D);
      input D, CLK;
      output Q; reg Q;
      always @ (posedge CLK) #5 Q = D;
    endmodule // FF
  Probably not what you expect:
  • Hold time of 5 units
  • Glitches < 5 units ignored

A final word on Verilog
° Verilog does not turn hardware design into writing programs!
  • Since Verilog looks similar to programming languages, some think that they can design hardware by writing programs.
    - NOT SO.
  • Verilog is a hardware description language.
    - The best way to use it is to first figure out the circuit you want, then figure out how to describe it in Verilog.
  • The behavioral construct hides a lot of the circuit details, but you as the designer must still manage:
    - the structure
    - data communication
    - parallelism
    - timing of your design.
    - Not doing so leads to very inefficient designs!
° Read the document on non-blocking assignment in Verilog that I put up on the handouts page.
  Lots of very interesting things!

How to Program: FPGA Generic Design Flow
° Design Entry:
  • Create your design files using:
    - a schematic editor, or
    - a hardware description language (Verilog, VHDL)
° Design "implementation" on FPGA:
  • Partition, place, and route ("PPR") to create the bit-stream file
  • Divide into CLB-sized pieces, place into blocks, route to blocks
° Design verification:
  • Use a simulator to check function
  • Other software determines max clock frequency
  • Load onto FPGA device (cable connects PC to board)
    - Check operation at full speed in the real environment.

Idealized FPGA Logic Block
° 4-input Look Up Table (4-LUT)
  • Implements combinational logic functions
° Register
  • Optionally stores the output of the LUT
  • A latch determines whether the output reads the register or the LUT
  [Figure: logic block. INPUTS feed a 4-input "look up table" (4-LUT) followed by an FF; a 1/0 mux, controlled by a latch set by the configuration bit-stream, drives OUTPUT]

4-LUT Implementation
° An n-bit LUT is actually implemented as a 2^n x 1 memory:
  • The inputs choose one of 2^n memory locations.
  • The memory locations (latches) are normally loaded with values from the user's configuration bit-stream.
  • The inputs to the mux control are the CLB (Configurable Logic Block) inputs.
° The result is a general-purpose "logic gate".
  • An n-LUT can implement any function of n inputs!
  [Figure: 16 latches feeding a 16 x 1 mux; INPUTS select the mux, which drives OUTPUT. Latches are programmed as part of the configuration bit-stream]

LUT as general logic gate
° An n-LUT is a direct implementation of a function's truth table
° Each latch location holds the value of the function corresponding to one input combination

Example: 4-LUT
    INPUTS
    0000   F(0,0,0,0)   store in 1st latch
    0001   F(0,0,0,1)   store in 2nd latch
    0010   F(0,0,1,0)
    0011   F(0,0,1,1)
    0100
    0101
    0110
    0111
    1000
    1001
    1010
    1011
    1100
    1101
    1110
    1111

Example: 2-LUT
    INPUTS   AND   OR
    11       1     1
    10       0     1
    01       0     1
    00       0     0
Implements any function of 2 inputs.
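The "2^n x 1 memory plus mux" view of a LUT described above can be sketched in a few lines of behavioral Verilog. This is only an illustrative model, not how any particular FPGA vendor describes its logic block: the module name lut4, the parameter name CONFIG (standing in for the bits loaded from the configuration bit-stream), and the small testbench are all assumptions made for the example.

```verilog
// Behavioral sketch of a 4-LUT: a 16 x 1 memory indexed by the inputs.
// CONFIG is a hypothetical stand-in for the configuration bit-stream bits.
module lut4 #(parameter [15:0] CONFIG = 16'h0000) (
    input  wire [3:0] in,   // the four logic-block inputs
    output wire       out   // the selected truth-table entry
);
    wire [15:0] cfg = CONFIG;   // one bit per input combination
    assign out = cfg[in];       // inputs act as the mux select
endmodule

// Testbench: configure two copies as 4-input AND and 4-input OR.
module lut4_tb;
    reg  [3:0] in;
    wire       and_out, or_out;
    integer    i;

    // AND is 1 only for input 1111 (bit 15 of the table);
    // OR is 0 only for input 0000 (bit 0 of the table).
    lut4 #(.CONFIG(16'h8000)) u_and (.in(in), .out(and_out));
    lut4 #(.CONFIG(16'hFFFE)) u_or  (.in(in), .out(or_out));

    initial begin
        for (i = 0; i < 16; i = i + 1) begin
            in = i;   // low four bits of the loop counter
            #1;
            if (and_out !== (&in) || or_out !== (|in))
                $display("MISMATCH at in=%b", in);
        end
        $display("done");
        $finish;
    end
endmodule
```

Changing only the CONFIG value changes the function; the structure (table plus mux) stays the same, which is the sense in which the LUT acts as a general-purpose logic gate.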
How many functions of n inputs?

Additional application: Distributed RAM
° CLB LUT configurable as Distributed RAM
  • A LUT equals a 16x1 RAM
  • Implements single and dual ports
  • Cascade LUTs to increase RAM size
° Synchronous write
° Synchronous/Asynchronous read
  • Accompanying flip-flops are used for synchronous read
  [Figure: one LUT = RAM16X1S (ports D, WE, WCLK, A0-A3, O); two LUTs = RAM32X1S (A0-A4), or RAM16X2S (D0, D1, O0, O1), or dual-port RAM16X1D (D, WE, WCLK, A0-A3, SPO, DPRA0-DPRA3, DPO)]

Block RAM (Extra RAM not using LUTs)
° Most efficient memory implementation
  • Dedicated blocks of memory
° Ideal for most memory requirements
  • Virtex-E XCV2000 has 160? blocks
    - 4096 bits per block (4K x 1, 2K x 2, 1K x 4, 512 x 8, 256 x 16)
  • Use multiple blocks for larger memories
° Builds both single and true dual-port RAMs
° CORE Generator provides custom-sized block RAMs
  • Quickly generates optimized RAM implementation
  [Figure: Spartan-IIE true dual-port block RAM with Port A and Port B]

Additional Application: Shift Register
° Each LUT can be configured as a shift register
  • Serial in, serial out
° Saves resources: can use fewer than 16 FFs
° Faster: no routing
° Note: CAD tools determine whether a CLB is used as LUT, RAM, or shift register, rather than up to
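As a rough illustration of the serial-in, serial-out shift register mentioned on the last slide, here is a minimal behavioral sketch. Whether a CAD tool actually maps this onto a single LUT rather than 16 flip-flops depends on the tool and target device; the module name shift16, its port names, and the testbench are hypothetical and chosen only for this example.

```verilog
// 16-stage serial-in, serial-out shift register (behavioral sketch).
// Module and port names are hypothetical, not from the lecture.
module shift16 (
    input  wire clk,
    input  wire din,    // serial input
    output wire dout    // serial output, 16 clock edges behind din
);
    reg [15:0] stages = 16'b0;

    // Non-blocking assignment: every stage samples its old neighbor.
    always @(posedge clk)
        stages <= {stages[14:0], din};

    assign dout = stages[15];
endmodule

// Testbench: shift in a single 1 and check that it reaches dout
// after 16 rising clock edges.
module shift16_tb;
    reg  clk = 0, din = 0;
    wire dout;

    shift16 dut (.clk(clk), .din(din), .dout(dout));

    always #5 clk = ~clk;

    initial begin
        din = 1;
        @(posedge clk);              // edge 1: the 1 enters stage 0
        #1 din = 0;
        repeat (15) @(posedge clk);  // edges 2..16: the 1 moves to stage 15
        #1;
        if (dout !== 1'b1) $display("FAIL");
        else               $display("PASS");
        $finish;
    end
endmodule
```

The register update uses a non-blocking assignment, in line with the earlier advice on edge-triggered logic, so the result does not depend on the order in which the simulator evaluates the stages.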