VLSI Systems Design, CS250, Fall 2020
John Wawrzynek with Arya Reais Parsi
Lecture 04: Reconfigurable Architectures 2
CS250, UC Berkeley, Fall 20

FPGA Overview
Basic idea: a two-dimensional array of logic blocks and flip-flops, with a means for the user to configure (program):
1. the interconnection between the logic blocks,
2. the function of each block.
[Figure: simplified version of FPGA internal architecture (not scalable)]

Reconfigurable Fabric Architecture: Degrees of Freedom
1. Logic Blocks: capacity and internal structure of combinational logic circuits and state element(s); clustering and internal interconnect.
2. Interconnection Network Architecture: circuit switched, not packet switched; topology of the network.
3. Configuration Architecture: how programming information is loaded and distributed; configuration depth.
4. Hard blocks (RAM, ALUs, processor cores): function(s), count, and how they are integrated into the fabric.

Xilinx Virtex 5
Colors represent different types of resources: Logic, Block RAM, DSP (ALUs), Clocking, I/O, Serial I/O and PCI.
A routing fabric runs throughout the chip to wire everything together.
(EECS150 Lec02 SDS FPGAs, Spring 2013)

Configurable Logic Blocks (CLBs)
Slices define regular connections to the switching fabric, and to slices in CLBs above and below it on the die.

Primitive: 5-input Look-Up Tables (LUTs)
[Figure: 5-LUT with address inputs A[6:2] and output D; a table of latches holds one output bit for each input pattern from 00000 to 11111]
Computes any 5-input logic function.
Timing is independent of function.
Latches are set during configuration.

Virtex 6-LUTs: Composition of 5-LUTs
May be used as one 6-input LUT (D6 out), or as two 5-input LUTs (D6 and D5).

The simplest
view of a slice
Four 6-LUTs.
Four flip-flops.
Switching fabric may see combinational and registered outputs.
An actual Virtex slice adds many small features to this simplified diagram. We show them one by one.

Two 7-LUTs per slice
Extra multiplexers (F7AMUX, F7BMUX).
Extra inputs (AX and CX).

Or one 8-LUT per slice
Third multiplexer (F8MUX).
Third input (BX).

Extra muxes to choose LUT option
From eight 5-LUTs to one 8-LUT.
Combinational or registered outs.
Flip-flops unused by LUTs can be used standalone.

Virtex Vertical Logic
We can map ripple-carry addition onto the carry-chain block.
The carry-chain block is also useful for speeding up other adder structures and counters.

Putting it all together: a SLICEL
The previous slides explain all SLICEL features.
About 50% of the slices are SLICELs.
The other slices are SLICEMs, and have extra features.

Recall: 5-LUT architecture
[Figure: 5-LUT with address inputs A[6:2] and output D, built from 32 latches configured to 1 or 0]
Some parts of a logic design need many state elements.
SLICEMs replace normal 5-LUTs with circuits that can act like 5-LUTs, but can alternatively use the 32 latches as RAM, ROM, or shift registers.

A SLICEM 6-LUT
[Figure labels: memory data input; normal 6-LUT inputs; memory write address; normal 5/6-LUT outputs; memory data input; control output for chaining LUTs to make larger memories]
Synchronous write, asynchronous read.

SLICEL vs SLICEM
[Figure: SLICEL and SLICEM side by side]
SLICEM adds memory features to
LUTs, muxes.

Distributed RAM Primitives
All are built from a single slice or less.
Remember, though, that the SLICEM LUT is naturally only 1 read and 1 write port.

Configurable Interconnect
Design challenges:
topology: traversing long wires incurs delay and energy;
switches (transistors) add significant delay;
mapping time.
[Figure labels: switch matrix (could be more richly populated); connection block]

Xilinx FPGAs: tile interconnect detail

Other Topologies
[Figure: traditional interconnect compared with a Clos network, from flexlogic Inc.]
The Clos network uses about half the area of the traditional interconnect and uses only 5-7 metal routing layers.

Fat-Tree-Based Interconnect
Use Rent's rule for proper thickness.

Embedded Hard Blocks
Many important functions are not efficient when implemented in the reconfigurable fabric: multiplication, large memory, processor cores.
Dedicated blocks take relatively little area, so it is acceptable for some of them to go unused.

Xilinx Virtex 5
Colors represent different types of resources: Logic, Block RAM, DSP (ALUs), Clocking, I/O, Serial I/O and PCI.
A routing fabric runs throughout the chip to wire everything together.

Virtex DSP48E Slice
Efficient implementation of multiply, add, and bit-wise logical operations.

Block RAM Overview
36K bits of data total; can be configured as 2 independent 18Kb RAMs, or one 36Kb RAM.
Each 36Kb block RAM can be configured as a 64Kx1 (when cascaded with an adjacent 36Kb block RAM), 32Kx1, 16Kx2, 8Kx4, 4Kx9, 2Kx18, or 1Kx36 memory.
Each 18Kb block RAM can be configured as a 16Kx1, 8Kx2, 4Kx4, 2Kx9, or 1Kx18 memory.
Write and
Read are synchronous operations.
The two ports are symmetrical and totally independent (they can have different clocks), sharing only the stored data.
Each port can be configured in one of the available widths, independent of the other port.
The read port width can be different from the write port width for each port.
The memory content can be initialized or cleared by the configuration bitstream.

Ultra RAM Blocks

State of the Art: Xilinx FPGAs
Virtex UltraScale
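
The block RAM shapes listed in the Block RAM Overview can be checked with a little arithmetic. The Python sketch below is only an illustration: the names K, BRAM36_SHAPES, BRAM18_SHAPES, bits, address_bits, and widest_shape are made up for this example and are not part of any Xilinx tool. It shows how many address bits each depth needs, and that the x1, x2, and x4 shapes hold 32K (or 16K) bits while the x9, x18, and x36 shapes use the full 36K (or 18K) bits.

```python
K = 1024

# (depth, width) pairs as listed in the Block RAM overview.
# The cascaded 64Kx1 mode is left out because it spans two blocks.
BRAM36_SHAPES = [(32 * K, 1), (16 * K, 2), (8 * K, 4),
                 (4 * K, 9), (2 * K, 18), (1 * K, 36)]
BRAM18_SHAPES = [(16 * K, 1), (8 * K, 2), (4 * K, 4),
                 (2 * K, 9), (1 * K, 18)]


def bits(shape):
    """Total number of storage bits used by a (depth, width) shape."""
    depth, width = shape
    return depth * width


def address_bits(depth):
    """Number of address lines needed to select one of `depth` words."""
    return (depth - 1).bit_length()


def widest_shape(shapes, min_depth):
    """Widest listed shape with at least `min_depth` words, or None."""
    fits = [s for s in shapes if s[0] >= min_depth]
    return max(fits, key=lambda s: s[1]) if fits else None


if __name__ == "__main__":
    for depth, width in BRAM36_SHAPES:
        print(f"{depth // K}Kx{width}: {address_bits(depth)} address bits, "
              f"{bits((depth, width)) // K}K bits used")
```

For instance, a buffer that needs 3000 words fits the 4Kx9 shape of a 36Kb block, which is what widest_shape(BRAM36_SHAPES, 3000) returns under these assumptions.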
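
The look-up table slides earlier in the lecture describe a 5-LUT as a set of configuration latches addressed by the inputs, and a 6-LUT as a composition of 5-LUTs. The Python sketch below models that idea in software, as an illustration only: make_lut, lut_read, and lut6_from_lut5s are hypothetical names, and the mux-based composition is a simplified model of the idea rather than the actual Virtex circuit.

```python
def make_lut(func, n_inputs):
    """Build the configuration bits of an n-input LUT.

    The table has 2**n_inputs entries, one per input pattern, which is
    why a LUT can realize any function of its inputs.
    """
    table = []
    for addr in range(2 ** n_inputs):
        inputs = [(addr >> i) & 1 for i in range(n_inputs)]
        table.append(func(*inputs) & 1)
    return table


def lut_read(table, inputs):
    """Evaluate a LUT: the inputs form an address into the table.

    The lookup is identical whatever function is stored, mirroring the
    point that timing is independent of function.
    """
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]


def lut6_from_lut5s(lo, hi, inputs):
    """Model a 6-input LUT as two 5-LUT tables plus a 2:1 mux.

    The first five inputs address both tables; the sixth input selects
    which 5-LUT output is passed on.
    """
    *low5, sel = inputs
    return lut_read(hi if sel else lo, low5)


if __name__ == "__main__":
    # 5-input parity stored in a single 5-LUT.
    parity5 = make_lut(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e, 5)
    print(lut_read(parity5, [1, 0, 1, 1, 0]))
```

The same splitting trick (fix the extra input to 0 for one table and to 1 for the other) extends to the 7-LUT and 8-LUT options, which add further multiplexers on top.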