High-Performance Carry Chains for FPGAs

Home> Academic Documents> High-Performance Carry Chains for FPGAs

DOC PREVIEW

This preview shows page 1-2-3-4 out of 11 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 11 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Main PageFPGA98Front MatterTable of ContentsSession IndexHigh-Performance Carry Chains for FPGAsScott Hauck, Matthew M. Hosler, Thomas W. Fry{hauck, mhosler, zaphod}@ece.nwu.eduDepartment of Electrical and Computer EngineeringNorthwestern UniversityEvanston, IL 60208-3118 USAAbstractCarry chains are an important consideration for mostcomputations, including FPGAs. Current FPGAs dedicatea portion of their logic to support these demands via asimple ripple carry scheme. In this paper we demonstratehow more advanced carry constructs can be embedded intoFPGAs, providing significantly higher performance carrycomputations. We redesign the standard ripple carry chainto reduce the number of logic levels in each cell. We alsodevelop entirely new carry structures based on highperformance adders such as Carry Select, CarryLookahead, and Brent-Kung. Overall, these optimizationsachieve a speedup in carry performance of 3.8 times overcurrent architectures.IntroductionAlthough originally intended as a way to efficiently handlerandom logic tasks in standard hardware systems, FPGAshave become the driving force behind a new computingparadigm. By mapping algorithms to these FPGAssignificant performance benefits can be achieved.However, in order to achieve these gains the FPGAresources must be able to efficiently support thecomputations required in the target application.The key to achieving high performance hardware is tooptimize the circuit’s critical path. For most datapathcircuits this critical path goes through the carry chain usedin arithmetic and logic operations. In an arithmetic circuitsuch as an adder or subtractor, this chain represents thecarries from bit position to bit position. For logicaloperations such as parity or comparison, the chaincommunicates the cumulative information needed toperform these computations. Optimizing such carry chainsis a significant area of VLSI design, and is a major focus ofhigh-performance arithmetic circuit design.In order to support datapath computations most FPGAsinclude special resources specifically optimized forimplementing carry computations. These resourcessignificantly improve circuit performance with a relativelyinsignificant increase in chip area. However, because theseresources use a relatively simple ripple carry scheme, carrycomputations can still be a major performance bottleneck.In this paper we will discuss methods for significantlyimproving the performance of carry computations inFPGAs.Basic Ripple Carry CellA basic ripple carry cell, similar to that found in the Altera8000 series FPGAs [1], is shown in Figure 1a. Mux 1,combined with the two 2-LUTs feeding into it, creates a 3-LUT. This element can produce any Boolean function ofits three inputs. Two of its inputs (X and Y) form theprimary inputs to the carry chain. The operands to thearithmetic or logic function being computed are sent in onthese inputs, with each cell computing one bit position’sresult. The third input can be either another primary input(Z), or the carry from the neighboring cell, depending onthe programming of mux 2’s control bit. The potential tohave Z replace the carry input is provided so that an initialcarry input can be provided to the overall carry chain(useful for incrementers, combined adder/subtractors, andother functions). Alternatively the logic can be used as astandard 3-LUT for functions that do not need a carrychain. An additional 3-LUT (not shown in the figure) iscontained in each cell, which can be used to compute thesum for addition, or other functions.Before we discuss modifications to this adder to improveperformance, it is important to understand the role of the“Cout1” and “Cout0” signals in the carry chain. Duringcarry computations the Cin input controls mux 1, whichchooses which of these two signals will be the Cin for thenext stage in the carry chain. If Cin is true, Cout = Cout1,while if Cin is false Cout = Cout0. Thus, Cout1 is theoutput whenever Cin = 1, while Cout0 is the outputwhenever Cin = 0. If we consider the possiblecombinations of values Cout1 and Cout0 can assume, thereare four possibilities, three of which correspond toconcepts from standard adders (Table 1). If both Cout0and Cout1 are true, Cout is true no matter what Cin is,which is the same as the “generate” state in a standardadder. Likewise, when both Cout0 and Cout1 are false,Cout is false regardless of the state of Cin, and thiscombination of Cout1 and Cout0 signals is the “kill” statefor this carry chain. If Cout0 and Cout1 are different, theCout output will depend on the Cin input. When Cout0 =0 and Cout1 =1, the Cout output will be identical to the Cininput, which is the normal “propagate” state for this carrychain. The last state, with Cout0 = 1 and Cout1 = 0, is notfound in normal adders. In this state, the output stilldepends on the input, but in this case the Cout output is theinverse of the Cin input. We will call this state “inversepropagate”.For a normal adder, the inverse propagate state is neverencountered. Thus, it might be tempting to disallow thisstate. However, for other computations this state isessential. For example, consider implementing a paritycircuit with this carry chain, where each cell takes the XORof the two inputs, X and Y, and the parity of theneighboring cell. If X and Y are both zero, the Cout of thecell will be identical to the parity of the neighboring cell,which is brought in on the Cin signal. Thus, the cell is innormal propagate mode. However, if X is true and Y isfalse, then the Cout will be the opposite of Cin, since()CinCin =⊕⊕ 01 . Thus, the inverse propagate state isimportant for implementing circuits like parity, and thussupporting this state in the carry chain we increase thetypes of circuits that can be efficiently implemented. Infact, by allowing an Inverse Propagate mode in the carrychain, the chain can be viewed as simply a series of 3-LUTs connected together, allowing any critical path to beimplemented efficiently.Cout0 Cout1 Cout Name0 0 0 Kill0 1 Cin Propagate10CinInverse Propagate1 1 1 GenerateTable 1. Combinations of Cout0 and Cout1values, and the resulting carry output. The finalcolumn lists the name for that combination.One last issue must be considered in this carry chainstructure. In an FPGA, the cells represent resources thatcan be used to compute arbitrary functions. However, thelocation of functions within this structure is completely upto the


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-4 out of 11 pages.

Please select your school