System Level Tools to Accelerate FPGA Design for Signal Processing
Jim Hwang, Sr. Manager, DSP Software and Design Methodologies

Why FPGA DSP?
  High performance
  Flexibility
  Time to market
    Functional extensions to existing equipment
  Standard part
    No NRE
    Inventory issues
  Early system bring-up on hardware

The Highly Parallel Signal Processor
  Switch Matrix
  Embedded RISC CPU
  CLB, IOB, DCM
  Programmable Fabric
  300 MHz Synchronous Dual-Port RAM (BRAM)
  Up to 12.5 million gates
  Programmable I/Os with LVDS
  50 Multipliers
  18b x 18b multiplier, 300 MHz pipelined
  Impedance Controller (XCITE Impedance Control)
  3.125 Gb Serial

Exploiting Parallelism
  Conventional DSP Solutions
    New DSP architectures such as VLIW and superscalar have one goal: provide higher degrees of parallelism
    Architecture evolution along this design axis does not scale
      Too many MAC functional units make programming, compilers, and scheduling difficult
      The effective computing per chip area decreases
      Memories grow geometrically while the datapath does not

DSP Systems in FPGAs
  Device technology is only part of the solution
  The software and IP are complex and historically have been a barrier to entry
  Require design methodologies for
    Productivity
    Rapid design exploration
    Hardware abstraction
  Single source for the entire design development cycle
    Modeling
    Verification
    Implementation
  Automatic code generation

FPGA as DSP Platform
  Like other ASICs, the FPGA is largely a value proposition for very high performance applications
    Digital communications infrastructure
    Software defined radio
    Video, imaging
    SAR, adaptive arrays, beamforming
  Spartan-II family devices counter this trend
  This has profound implications for design methodology
    DSP microprocessors are very good at tasks where performance is not a problem
    Not targeting most of the terabytes of legacy DSP code (yet)

What Language?
  Do you use MATLAB?
  Why do we like it?
    Easy to learn
    Interpreted
    High-level abstractions
    Extensive libraries and built-in functions
    Rich facilities for data analysis and visualization
    Used in many signal processing textbooks
  But is it a good language for specifying hardware?
    Imperative language with sequential semantics
    No concurrency model
    Dynamically typed: flexible, but ...

System C
  Considerable activity advocating imperative, sequential languages for system level design
    C, C++, SystemC, CoWare, Synopsys, Cadence
  This is not a bad thing for
    Embedded systems
    High level (e.g. untimed, untyped) functional modeling
    Validation of complex systems
  This is not a good thing for
    Hardware description
    High performance DSP system design

System C
  Language designers give a lot of thought to semantics
    C/C++ semantics derived from microprocessor considerations
    A good language for hardware must model concurrency
    Object oriented principles are not a cure for semantic flaws
  Hardware synthesis from C is not a solved problem
    High performance circuits are carefully tuned to target technology
    All C-based design flows depend on design iteration to the point of RTL code before synthesis
    At this level of abstraction C/C++ becomes contorted
    VHDL may not be beautiful, but it models concurrency well

A Fallacy [1]
  Premise: software is easier than hardware; consequently, systems should be specified in the language of software engineers
  Empirical evidence to the contrary
    Software products invariably ship with more bugs than hardware products
    There are more software engineers at Xilinx than hardware engineers
  Conclusion: do not assume that imperative, sequential software languages are best suited for DSP hardware and system specification
  [1] Observed by Bob Brodersen, UCB

A Picture is Worth
  Classical DSP algorithm description
    Block diagrams
    Signal flow graphs
    Inherently concurrent
  Visual languages and development environments
    Synchronous Data Flow
    Ptolemy (UCB)
    SPW (Cadence)
    Simulink (MathWorks)
  A good match

Simulink
  Graphical simulation environment
    Continuous and discrete time dynamical systems
    Well suited for modeling hardware (and getting better)
    Block libraries for DSP, communications, image processing, digital control, and much more
  Open architecture
    Extensible
    Public APIs
    Amenable to programming in C, C++, Java
  MATLAB inside and underneath
    The implications should not be underestimated

System Generator for DSP
  Xilinx software for FPGA modeling and implementation
    FPGA interfaces provided in Simulink environment
    Libraries of functions for modeling DSP and other systems
    Automatic code generation of FPGA circuits
  Fast on-ramp into the FPGA
  System level abstractions create new opportunities in the lab
    Explore architectures for DSP algorithms
    Implementation issues (e.g. quantization, pipelining)
    Emphasize system level test and test bench methodologies
    Actually run the system in silicon

Advanced Hardware Lab, Circa 1984
  Got wirewrap?

Advanced Hardware Lab, 2002
  xc2v2000e FPGA
  Got System Generator?

System Generator for DSP
  Visual data flow paradigm
  Polymorphic block libraries
  Bit and cycle true modeling
  Seamlessly integrated with Simulink and MATLAB
    Test bench and data analysis
  Automatic code generation
    Synthesizable VHDL
    IP cores
    HDL test bench
    Project and constraint files

System Generator (cont.)
  Supports common Simulink idioms
    Data type propagation
    Polymorphic blocks
    Sample time propagation
  Block customization
    MATLAB hooks

SysGen Modeling: Analysis
  Observing quantization effects
  All fixed point data carries floating point values as well
  Data Narrowing
  Quantization Error Block

SysGen Modeling
  Automatic test bench generation for HDL simulation
    Behavioral
    Post mapping
    Post place and route
  Log test vectors at inputs and outputs
  Pull into generated HDL test bench

A Simple MAC Engine
  Use MATLAB to customize the data path
    $y_n = \sum_{i=0}^{N-1} h_i x_{n-i}$
  Select precision to avoid overflow
  Minimize resources required
  Parametric implementation
  Fine tune the accumulator to avoid stall
    $\left| \sum_{i=0}^{N-1} h_i x_{n-i} \right| \le 2^m \sum_{i=0}^{N-1} |h_i|$
    $\mathrm{AccumWidth} = m + \log_2 \sum_{i=0}^{N-1} |h_i|$

CORDIC Processor
  Versatile family of algorithms for computing functions
    Arctan, square root, division, logarithm
  Shift and add architecture
    Well suited to FPGA implementation
  Iteration equations
    $x_{i+1} = x_i - \delta_i y_i 2^{-i}$
    $y_{i+1} = y_i + \delta_i x_i 2^{-i}$
    $z_{i+1} = z_i - \delta_i \tan^{-1}(2^{-i})$
    $\delta_i = +1$ if $y_i < 0$, $-1$ otherwise

CORDIC Processor (cont.)
  Output precision depends on the number of PEs
    Trade area for precision
  Use MATLAB to parameterize the processor
    Propagate parameters down to blocks
      Data width
      Constant values
    Conditionally and iteratively instance PEs

LMS Adaptive Filter
  Block diagram signals: $x_n$, $y_n$, $W(z)$, $\epsilon_n$, $d_n$
    $Y_{n+1} = X_n^T W_n$
    $W_{n+1} = W_n + 2 \mu \epsilon_n X_n$
  Sum of product calculations can be computed in parallel

Pipelined LMS Algorithm
  De-couple the LMS update and FIR
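
The accumulator sizing on the Simple MAC Engine slide can be sketched in Python (the slide itself has MATLAB in mind). This is only a minimal sketch, assuming integer (already quantized) coefficients and signed two's-complement input samples. The names accumulator_width and worst_case_magnitude are hypothetical and are not part of System Generator. The width is rounded up to whole bits conservatively, so for nonzero coefficients it is never smaller than the input width plus the rounded-up log2 of the summed coefficient magnitudes.

```python
def worst_case_magnitude(coeffs, input_bits):
    """Largest possible |sum(h_i * x_{n-i})| for signed input_bits-bit samples."""
    # A signed two's-complement sample satisfies |x| <= 2**(input_bits - 1).
    max_abs_sample = 1 << (input_bits - 1)
    return max_abs_sample * sum(abs(h) for h in coeffs)


def accumulator_width(coeffs, input_bits):
    """Signed accumulator width in bits that is sufficient to avoid overflow.

    Assumption (not from the slides): coefficients are plain Python ints.
    """
    coeff_sum = sum(abs(h) for h in coeffs)
    # int.bit_length() equals floor(log2(s)) + 1 for s >= 1 and 0 for s == 0,
    # so 2**bit_length is strictly greater than the coefficient sum.
    return input_bits + coeff_sum.bit_length()


if __name__ == "__main__":
    taps = [3, -5, 7, 2]  # hypothetical integer filter coefficients
    width = accumulator_width(taps, input_bits=8)
    print("accumulator bits:", width)
    print("worst-case |y|:", worst_case_magnitude(taps, 8))
```

Deriving the width from the coefficient set, rather than fixing it by hand, is one way to realize the "parametric implementation" idea: changing the taps or the input precision resizes the accumulator automatically.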
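
The CORDIC slides can be illustrated with a small software model of the shift-and-add iteration. This is a sketch under stated assumptions, not the System Generator design: it uses the common textbook vectoring-mode sign convention, emulates the hardware shifts with floating-point multiplications by 2^-i, and assumes x > 0 on input. The names cordic_vectoring, cordic_gain, and num_pes are hypothetical; num_pes stands in for the number of processing elements (PEs).

```python
import math


def cordic_gain(num_pes):
    """Accumulated magnitude gain of num_pes CORDIC micro-rotations."""
    gain = 1.0
    for i in range(num_pes):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def cordic_vectoring(x, y, num_pes=16):
    """Return (magnitude, angle) of the vector (x, y); assumes x > 0.

    Each loop pass models one PE: it rotates the vector toward the x axis
    and accumulates the applied angle in z.
    """
    z = 0.0
    for i in range(num_pes):
        delta = 1.0 if y < 0 else -1.0   # rotation direction for this stage
        shift = 2.0 ** -i                # a right shift by i bits in hardware
        # Tuple assignment uses the old x and y on the right-hand side.
        x, y = x - delta * y * shift, y + delta * x * shift
        z -= delta * math.atan(shift)    # constant per stage (a lookup value)
    return x / cordic_gain(num_pes), z


if __name__ == "__main__":
    for pes in (8, 16, 24):
        magnitude, angle = cordic_vectoring(3.0, 4.0, num_pes=pes)
        print(pes, magnitude, angle)
```

Running the loop with different num_pes values shows the trade-off named on the slide: each extra PE adds roughly one bit of angle precision at the cost of one more stage of area.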