U of U CS 3710 - HDL Coding Practices to Accelerate Design Performance

HDL Coding Practices to Accelerate Design Performance

• Use of Resets and Performance
• SRLs
• Multipliers and RAMs
• General Logic
• Use Adder Chains Instead of Adder Trees
• Maximize Block RAM Performance
• General Use of Registers
• Inference vs. Instantiation
• Clock Enable and Gated Clocks
• Nested If-Then-Else, Case Statements, and Combinatorial For-Loops
• Hierarchy
• Conclusion
• Additional Resources
• Revision History

WP231 (1.1) January 6, 2006 www.xilinx.com
© 2005–2006 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

One of the most important factors in getting the maximum performance from any FPGA design is proper coding of the design’s RTL description. Certain seemingly minor decisions made while crafting an RTL-level design can mean the difference between a design operating at less than 100 MHz and one operating at more than 400 MHz.

Dependable design performance is the result of careful consideration of many factors during the design process. First, the hardware platform that best suits the design must be selected. Next, the selected device architecture and the settings and features of the implementation tools need to be studied. Lastly, and this is the purpose of this document, HDL code that maps efficiently onto the targeted device must be written. Different resources detailing each of these subjects can be found on the web. This document focuses on the latter by presenting coding styles and tips to accelerate design performance.
Proper FPGA coding practices are reiterated, and lesser-known techniques directly applicable to the latest Xilinx FPGA architectures are presented.

White Paper: Virtex-4, Spartan-3/3L, and Spartan-3E FPGAs
WP231 (1.1) January 6, 2006
HDL Coding Practices to Accelerate Design Performance
By: Philippe Garrault and Brian Philofsky

Use of Resets and Performance

Few system-wide choices have as profound an effect on performance, area, and power as the choice of the reset. Some system architects specify the use of a global asynchronous reset for the sole purpose of initializing the circuit at power-up. This is, however, not necessary for FPGA designs, because Xilinx FPGAs initialize every register to a known state during configuration. With Xilinx FPGA architectures, whether a reset is used, and which type, can have serious implications for design performance. Sub-optimal reset strategies can:
• prevent the use of a device library component, such as the shift register look-up table (SRL)
• prevent the use of synchronous elements of dedicated hardware blocks
• prevent optimizations of the logic inside the fabric
• severely constrain placement and routing, because reset signals often have high fanout

SRLs

All current Xilinx FPGA architectures can configure the look-up table (LUT) element as logic, ROM/RAM, or SRL. Synthesis tools can infer any one of these structures from RTL code; however, to use the performance-optimized SRL, the code must not describe a reset, because the SRL library component does not have one. Using resets in code that infers shift registers requires either several flip-flops or additional logic around the SRL to provide the reset function. As illustrated in Figure 1, code without resets on shift registers generally produces an SRL with a single register on the output, which is optimal for area and performance.
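To make the reset-free SRL structure concrete, here is a minimal cycle-level sketch in Python (a behavioral model, not HDL and not Xilinx library code) of a 16-deep addressable shift register followed by a single output flip-flop. The class names `SRL16Model` and `ShiftRegisterWithOutputFF` are hypothetical; the port names D, CE, address, Q, and Q15 are borrowed from the SRLC16E shown in Figure 1. Note that neither class has a reset input, which is exactly what lets the whole chain map onto one SRL plus one register.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SRL16Model:
    """Cycle-level sketch of a 16-deep addressable shift register (hypothetical).

    Port names loosely follow the SRLC16E in Figure 1: D, CE, address, Q, Q15.
    There is deliberately no reset input, mirroring the SRL primitive.
    """

    bits: List[int] = field(default_factory=lambda: [0] * 16)

    def clock(self, d: int, ce: int = 1) -> None:
        # Rising clock edge: when CE (write enable) is high, shift D in at stage 0.
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def q(self, address: int) -> int:
        # Q is the stage selected by the 4-bit address,
        # i.e. the SRL behaves as an (address + 1)-stage shift register.
        return self.bits[address & 0xF]

    def q15(self) -> int:
        # Q15 is always the last stage (used for cascading SRLs).
        return self.bits[15]


@dataclass
class ShiftRegisterWithOutputFF:
    """A `length`-stage shift register: SRL stages plus one output flip-flop."""

    length: int
    srl: SRL16Model = field(default_factory=SRL16Model)
    ff: int = 0

    def __post_init__(self) -> None:
        # One stage is the output flip-flop; the remaining length - 1 come from the SRL.
        if not 2 <= self.length <= 17:
            raise ValueError("this sketch covers 2..17 stages only")

    def clock(self, d: int) -> int:
        # Both elements share one clock edge, so the flip-flop samples the SRL tap
        # as it was *before* the SRL shifts in the new value.
        self.ff = self.srl.q(self.length - 2)
        self.srl.clock(d)
        return self.ff


sr = ShiftRegisterWithOutputFF(length=4)
print([sr.clock(d) for d in [1, 0, 0, 0, 0, 0]])  # [0, 0, 0, 1, 0, 0]
```

A single 1 fed into the 4-stage chain emerges after the fourth clock edge. Adding a reset to this model would have no counterpart in the SRL itself, which is why reset-bearing code falls back to discrete flip-flops or extra logic around the SRL.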
The effect on area and power is more obvious when using a reset versus not using one, but the effect on performance is a little less clear. When a shift register is built out of flip-flops, its own performance is generally not critical, because the timing path between registers (the clock-to-out of one flip-flop, the associated routing delay, and the setup time of the next flip-flop) is not normally long enough to be the longest path in the design. The added consumption of resources (flip-flops and routing), however, can negatively influence the placement and routing choices for other portions of the design, possibly resulting in longer routing delays for other paths. When additional logic is added around the SRL to emulate a reset function, a portion of this logic appears on the clock-to-out path of the SRL, increasing the time it takes for the data to reach its destination logic and thus reducing performance.

Tips
• Avoid resets on shift registers, because they prevent inference of the area- and performance-optimized SRL library cells.

Figure 1: Performance-Optimized Shift Register (an SRLC16E with D, Address, CE (Write Enable), and CLK inputs and Q/Q15 outputs, followed by a flip-flop (FF) that provides the synchronous output)

Multipliers and RAMs

All current Xilinx FPGA architectures contain dedicated arithmetic resources. These resources can perform multiplication, as in many DSP algorithms, but can also be used in other applications, e.g., barrel shifters. Similarly, almost every FPGA design uses RAM of various sizes, regardless of the application. All current Xilinx FPGAs contain block RAM elements that can be implemented as RAM, ROM, a large LUT, or even general logic. Using both the multiplier and RAM resources can result in more compact, higher-performing designs. The choice of reset type, however, can affect the performance of these blocks.
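Because the rest of this discussion hinges on the difference between synchronous and asynchronous resets, a minimal behavioral sketch in Python may help. It models a single-bit register in each style; the class names `SyncResetReg` and `AsyncResetReg` are hypothetical, and the model captures only *when* the reset takes effect, not how synthesis tools map either style onto hardware.

```python
class SyncResetReg:
    """Single-bit register with a synchronous reset (hypothetical sketch)."""

    def __init__(self) -> None:
        self.q = 0

    def clock(self, d: int, rst: int) -> None:
        # Reset is sampled like any other input: it only matters at the clock edge.
        self.q = 0 if rst else d


class AsyncResetReg:
    """Single-bit register with an asynchronous reset (hypothetical sketch)."""

    def __init__(self) -> None:
        self.q = 0
        self.rst = 0

    def set_reset(self, rst: int) -> None:
        # Asserting reset clears Q immediately; no clock edge is needed.
        self.rst = rst
        if rst:
            self.q = 0

    def clock(self, d: int) -> None:
        # While reset is held, clock edges do not load D.
        if not self.rst:
            self.q = d


sync_reg, async_reg = SyncResetReg(), AsyncResetReg()
sync_reg.clock(1, rst=0)
async_reg.clock(1)
async_reg.set_reset(1)             # async: Q drops to 0 between clock edges
print(sync_reg.q, async_reg.q)     # 1 0
sync_reg.clock(1, rst=1)           # sync: reset takes effect at this edge
print(sync_reg.q)                  # 0
```

In the synchronous version the reset is simply one more input sampled at the clock edge, so it fits registers that offer only a synchronous reset; the asynchronous version must act between clock edges, which such registers cannot do.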
Both multiplier blocks and RAM registers contain only synchronous resets; if an asynchronous reset is coded for these functions, the registers within these blocks cannot be used. This has a severe effect on performance. For example, a fully pipelined multiplier with an asynchronous reset, targeting the fastest Virtex™-4 device, can achieve a performance of around 200 MHz. Reworking the code to use a synchronous reset can more than double the performance, to 500 MHz.

Similar to the multipliers, Virtex-4 block RAMs have optional output registers. When used, these registers can reduce the clock-to-out times of the RAMs and increase overall design speed. These optional registers do not have reset ports; consequently, the output registers cannot be enabled if the code describes a reset behavior.

A secondary issue arises when using the RAMs as a LUT or general logic. At times, it is advantageous for both area and performance reasons to condense several LUTs, configured as ROM or general logic, into a single