6.111 Fall 2004 Lecture 13, Slide 1
L13: Reconfigurable Logic Architectures
Acknowledgements:
R. Katz, “Contemporary Logic Design”, Addison Wesley Publishing Company, Reading, MA, 1993.
Frank Honore

Slide 2
History of Computational Fabrics
• Discrete devices: relays, transistors (1940s-50s)
• Discrete logic gates (1950s-60s)
• Integrated circuits (1960s-70s)
– e.g. TTL packages: Data Book for 100s of different parts
• Gate Arrays (IBM, 1970s)
– Transistors are pre-placed on the chip, and Place and Route software puts the chip together automatically. Only the interconnect is programmed (mask programming).
• Software Based Schemes (1970s-present)
– Run instructions on a general purpose core
• ASIC Design (1980s to present)
– Turn Verilog directly into layout using a library of standard cells
– Effective for high-volume production and efficient use of silicon area
• Programmable Logic (1980s to present)
– A chip that can be reprogrammed after it has been fabricated
– Examples: PALs, EPROM, EEPROM, PLDs, FPGAs
– Excellent support for mapping from Verilog

Slide 3
Reconfigurable Logic
• Logic blocks
– To implement combinational and sequential logic
• Interconnect
– Wires to connect inputs and outputs to logic blocks
• I/O blocks
– Special logic blocks at periphery of device for external connections
• Key questions:
– How to make logic blocks programmable? (after chip has been fabbed!)
– What should the logic granularity be?
– How to make the wires programmable? (after chip has been fabbed!)
– Specialized wiring structures for local vs.
  long distance routes?
– How many wires per logic block?
[Figure: a logic block with n inputs and m outputs, set by configuration bits, with a D flip-flop]

Slide 4
Programmable Array Logic (PAL)
• Based on the fact that any combinational logic can be realized as a sum-of-products
• PALs feature an array of AND-OR gates with programmable connections
[Figure: input signals enter the AND array (programming of product terms), which feeds the OR array (programming of sum terms) to produce the output signals]

Slide 5
Cypress PAL CE22V10

Slide 6
Inside the 22V10 PAL
• Each input pin (and its complement) sent to the AND array
• OR gates for each output can take 8-16 product terms, depending on output pin
• “Macrocell” block provides additional output flexibility...
Fixed OR array (not programmable)

Slide 7
Inside the 22V10 “Macrocell” Block
• Outputs may be registered or combinational, positive or inverted
• Registered output may be fed back to AND array for FSMs, etc.
Figure from Lattice Semiconductor, panel labels: b. Combinational/active low; d.
Combinational/active high

Slide 8
RAM Based Field Programmable Logic - Xilinx
[Figure: an array of Configurable Logic Blocks (CLBs) and switch matrices joined by programmable interconnect, surrounded by I/O Blocks (IOBs). IOB detail: pad, input buffer, output buffer, flip-flops, delay, slew rate control, passive pull-up/pull-down. CLB detail: function generators F, G, and H; inputs F1-F4, G1-G4, C1-C4 (H1, DIN, S/R, EC), and clock K; two flip-flops with S/R control; outputs X and Y]

Slide 9
The Xilinx 4000 CLB

Slide 10
Two 4-input Functions, Registered Output and a Two Input Function

Slide 11
5-input Function, Combinational Output

Slide 12
LUT Mapping
• N-LUT: direct implementation of a truth table, i.e. any function of N inputs
• N-LUT requires 2^N storage elements (latches)
• N inputs select one latch location (like a memory)
[Figure: 4-LUT example. The inputs select one of the latches, which are set by the configuration bitstream, to drive the output]
Why Latches and Not Registers?

Slide 13
Configuring the CLB as a RAM
Memory is built using latches, not FFs
Read is the same as a LUT function!
16x2

Slide 14
Xilinx 4000 Interconnect

Slide 15
Xilinx 4000 Interconnect Details
Wires are not ideal!

Slide 16
Add Bells & Whistles
[Figure labels: Hard Processor, I/O, BRAM, Gigabit Serial, Multiplier (18 bit x 18 bit, 36 bit), Programmable Termination, Impedance Control, Clock Mgmt]
Courtesy of David B.
Parlour, ISSCC 2004 Tutorial, “The Reality and Promise of Reconfigurable Computing in Digital Signal Processing”

Slide 17
Xilinx 4000 Flexible IOB
Adjust Transition Time
Adjust the Sampling Edge
Outputs through FF or bypassed

Slide 18
The Virtex II CLB (Half Slice Shown)

Slide 19
Adder Implementation
Y = A ⊕ B ⊕ Cin
LUT: A ⊕ B
1 half-slice = 1-bit adder
Dedicated carry logic
[Figure: half-slice with inputs A, B, Cin and carry output Cout]

Slide 20
Carry Chain
1 CLB = 4 slices = two 4-bit adders
64-bit adder: 16 CLBs
CLBs must be in the same column
[Figure: A[63:0] + B[63:0] = Y[63:0] with carry-out Y[64], built as a chain from CLB0 (A[3:0], B[3:0], Y[3:0]) through CLB1 (A[7:4], B[7:4], Y[7:4]) up to CLB15 (A[63:60], B[63:60], Y[63:60])]

Slide 21
Virtex II Features
• Double Data Rate registers
• Digital Clock Manager
• Embedded Multiplier
• Block SelectRAM

Slide 22
The Latest Generation: Virtex-II Pro
Courtesy Xilinx
• High-speed I/O
• Embedded PowerPC
• Embedded memories
• Hardwired multipliers
• FPGA Fabric

Slide 23
Altera FLEX 10K Family
8 LEs per LAB
SRAM-based programming

Slide 24
Altera Logic Element
The use of cascade chain

Slide 25
FLEX 10K Logic Array Block
FLEX 10K70: 9 rows (312 chan/row), 52 columns (24 chan/col)

Slide 26
FLEX 10K Embedded Array Block

Slide 27
Altera’s New Stratix Architecture
Embedded DSP feature: 9x9, 18x18, 36x36 with 52-bit accumulator
Up to 11,310 LEs, 10 Mbits RAM
10 LEs per LAB

Slide 28
Design Flow - Mapping
• Technology Mapping: Schematic/HDL to Physical Logic units
• Compile functions into basic LUT-based groups (function of target architecture)
always @(posedge Clock or negedge Reset)
begin
  if (!
      Reset)
    q <= 0;
  else
    q <= (a & b & c) | (b & d);
end
[Figure: inputs a, b, c, d feeding a LUT whose output drives a D flip-flop]

Slide 29
Design Flow - Placement & Route
• Placement: assign logic location on a particular device
[Figure: LUTs placed on the device]
• Routing: iterative process to connect CLB inputs/outputs and IOBs. Optimizes critical path delay; can take hours or days for large, dense designs
• Iterate placement if timing not met
• Satisfy timing? → Generate Bitstream to config device
• Challenge! Cannot use full chip for reasonable speeds (wires are not ideal). Typically no more than 50%
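
The mapping step on slide 28 can be made concrete with a small behavioral sketch. It assumes the whole expression (a & b & c) | (b & d) fits into a single 4-input LUT followed by one flip-flop, which is what the slide's figure suggests. The module name lut4_ff, the wire names init and lut_out, and the address ordering {d, c, b, a} are illustrative choices, not part of the lecture or of any vendor primitive.

```verilog
// Behavioral sketch: one 4-input LUT feeding one flip-flop.
// Names other than Clock, Reset, a, b, c, d, q are hypothetical.
module lut4_ff (
    input  wire Clock,
    input  wire Reset,          // active-low asynchronous reset, as on the slide
    input  wire a, b, c, d,
    output reg  q
);
    // 16 configuration bits holding the truth table of
    // (a & b & c) | (b & d), addressed by {d, c, b, a}.
    // The function is 1 at addresses 7, 10, 11, 14 and 15.
    wire [15:0] init = 16'hCC80;

    // The LUT acts as a 16:1 multiplexer over the configuration bits.
    wire lut_out = init[{d, c, b, a}];

    always @(posedge Clock or negedge Reset) begin
        if (!Reset)
            q <= 1'b0;
        else
            q <= lut_out;
    end
endmodule
```

For example, with {d, c, b, a} = 4'b0111 the LUT reads bit 7 of init, which is 1 because a, b and c are all high, so q becomes 1 on the next rising clock edge. Changing the logic function changes only the 16 stored bits, not the structure.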
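
Returning to the adder slides (19 and 20): each half-slice uses its LUT for A ⊕ B and dedicated carry logic for Cout. The sketch below is a behavioral model under one assumption not stated on the slides: the carry logic is a multiplexer that passes the incoming carry when A ⊕ B is 1 and forwards A otherwise. The module name carry_chain_adder and its signal names are hypothetical and do not correspond to Xilinx primitives.

```verilog
// Behavioral sketch of an N-bit adder built from per-bit
// "LUT + carry mux" stages. N = 4 matches one 4-bit adder.
module carry_chain_adder #(
    parameter N = 4
) (
    input  wire [N-1:0] a,
    input  wire [N-1:0] b,
    input  wire         cin,
    output wire [N-1:0] y,
    output wire         cout
);
    wire [N:0] carry;               // carry[i] enters bit i
    assign carry[0] = cin;

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : bit_slice
            // LUT function from the slide: A xor B
            wire p = a[i] ^ b[i];
            // Sum bit: Y = A xor B xor Cin
            assign y[i] = p ^ carry[i];
            // Assumed carry mux: propagate Cin if p is 1; otherwise
            // A equals B, so A itself is the carry out.
            assign carry[i+1] = p ? carry[i] : a[i];
        end
    endgenerate

    assign cout = carry[N];
endmodule
```

Instantiating it with N = 64 gives the 64-bit adder of slide 20, where cout plays the role of Y[64]. The model says nothing about placement; the slide's requirement that the CLBs sit in the same column is a property of the physical carry routing.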