Quantitative Analysis of Floating Point Arithmetic on FPGA Based Custom Computing Machines

Nabeel Shirazi, Al Walters, and Peter Athanas
Virginia Polytechnic Institute and State University
Department of Electrical Engineering
Blacksburg, Virginia

Presented at the IEEE Symposium on FPGAs for Custom Computing Machines, Napa Valley, California, April 1995

Abstract

Many algorithms rely on floating point arithmetic for the dynamic range of representations and require millions of calculations per second. Such computationally intensive algorithms are candidates for acceleration using custom computing machines (CCMs) tailored to the application. Unfortunately, floating point operators require excessive area (or time) in conventional implementations. Instead, custom formats, derived for individual applications, are feasible on CCMs and can be implemented on a fraction of a single FPGA. Using higher-level languages, like VHDL, facilitates the development of custom operators without significantly impacting operator performance or area. Properties, including area consumption and speed, of working arithmetic operator units used in real-time applications are discussed.

1.0 Introduction

Until recently, any meaningful floating point arithmetic has been virtually impossible to implement on FPGA-based systems due to the limited density and speed of older FPGAs. In addition, mapping difficulties occurred due to the inherent complexity of floating point arithmetic. With the introduction of high level languages such as VHDL, rapid prototyping of floating point units has become possible. Elaborate simulation and synthesis tools at a higher design level aid the designer in producing a more controllable and maintainable product. Although low level design specifications were also possible, the strategy used in the work presented here was to specify every aspect of the design in VHDL and rely on automated synthesis to generate the FPGA mapping.

Image and digital signal processing applications typically require high calculation throughput [2,6]. The arithmetic operators presented here were implemented for real-time signal processing applications on the Splash-2 CCM, which include a 2-D fast Fourier transform (FFT) and a systolic array implementation of a FIR filter. Such signal processing techniques necessitate a large dynamic range of numbers. The use of floating point helps to alleviate the underflow and overflow problems often seen in fixed point formats. An advantage of using a CCM for floating point implementation is the ability to customize the format and algorithm data flow to suit the application's needs.

This paper examines the implementations of various arithmetic operators using two floating point formats similar to the IEEE 754 standard [5]. Eighteen and sixteen bit floating point adders/subtracters, multipliers, and dividers have been synthesized for Xilinx 4010 FPGAs [8]. The floating point formats used are discussed in Section 2. Sections 3, 4, and 5 present the algorithms, implementations, and optimizations used for the different operators. Finally, a summary, in terms of size and speed, of the different floating point units is given in Section 6.

2.0 Floating Point Format Representation

The format which was used is similar to the IEEE 754 standard used to store floating point numbers.
For comparison purposes, single precision floating point uses the 32 bit IEEE 754 format shown in Figure 1.

Figure 1: 32 Bit Floating Point Format (s: bit 31, e: bits 30-23, f: bits 22-0).

The floating point value (v) is computed by:

v = (-1)^s x 2^(e-127) x 1.f

In Figure 1, the sign field, s, is bit 31 and is used to specify the sign of the number. Bits 30 down to 23 are the exponent field. This 8 bit quantity is a signed number represented by using a bias of 127. Bits 22 down to 0 are used to store the fraction field, f, of the floating point number. The leading one in the mantissa, 1.f, does not appear in the representation and is therefore implicit. For example, -3.625 (decimal), or -11.101 (binary), is stored in the following way:

v = (-1)^1 x 2^(128-127) x 1.1101, where s = 1, e = 128 (dec) = 80 (hex), and f = 680000 (hex).

Therefore -3.625 is stored as C0680000 (hex).

The 18-bit floating point format was developed, in the same manner, for the 2-D FFT application [6]. The format was chosen to accommodate two specific requirements: (1) the dynamic range of the format needed to be quite large in order to represent very large and very small, positive and negative real numbers accurately, and (2) the data path into one of the Xilinx 4010 processors of Splash-2 is 36 bits wide and two operands needed to be input on every clock cycle. Based on these requirements, the format in Figure 2 was used.

The 18 bit floating point value (v) is computed in the same manner, (-1)^s x 2^(e-bias) x 1.f, with a smaller bias to match the narrower exponent field. The range of real numbers that this format can represent is on the order of 10^19 down to 10^-19.

The second floating point format investigated was a 16-bit representation used by the FIR filter application [7]. As in the FFT application, multiple arithmetic operations needed to be done on a single chip, and we chose a 16-bit format for two reasons: (1) local, 16-bit wide memories were used in pipelined calculations, allowing single read cycles only, and (2) more logic was necessary to implement the FIR taps in addition to the two arithmetic units, which perform complex number operations. The format was designed as a compromise between data width and a large enough dynamic number range. The 16-bit format is shown in Figure 3.

The 16 bit floating point value (v) is computed in the same way, and the range of real numbers that this format can represent is on the order of 10^9 down to 10^-10.

3.0 Floating-Point Addition and Subtraction

The aim in developing a floating point adder/subtracter routine was to pipeline the unit in order to produce a result every clock cycle. Pipelining the adder increased its speed, but it increased the area as well. Different coding structures were tried in the VHDL code used to program the Xilinx chips in order to minimize size.

3.1 Algorithm

The floating-point addition and subtraction algorithm studied here is similar to what is done in most traditional processors; however, the computation is performed in three stages, which are presented in this section. The notation s_i, e_i and f_i is used to represent the sign, exponent and mantissa fields of the floating point number v_i. A block diagram of the
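The unit itself was written in VHDL, which this preview does not reproduce. As a rough illustration only, the behavioral C sketch below models one plausible three-stage decomposition of the algorithm (align the mantissas, add or subtract them, then renormalize). The 18-bit field layout it assumes (1 sign bit, a 7 bit exponent biased by 63, and a 10 bit mantissa with an implicit leading one) is inferred from the roughly 10^19 to 10^-19 range quoted in Section 2 rather than taken from Figure 2, and rounding, zero and denormal encodings, and exponent overflow or underflow are ignored.

/* Behavioral C sketch of a three-stage floating point add on a custom
 * format.  The 18-bit layout assumed here (1 sign bit, 7 exponent bits
 * with a bias of 63, 10 mantissa bits with an implicit leading one) is
 * inferred from the dynamic range quoted in Section 2 and is NOT taken
 * from the paper's figures.  Rounding, zero/denormal encodings, and
 * exponent overflow/underflow are ignored to keep the sketch short. */
#include <stdint.h>
#include <stdio.h>

#define EXP_BITS 7
#define MAN_BITS 10
#define BIAS     63
#define EXP_MASK ((1u << EXP_BITS) - 1u)
#define MAN_MASK ((1u << MAN_BITS) - 1u)

typedef struct { unsigned s; int e; int32_t f; } fp_t;   /* unpacked fields */

static fp_t unpack(uint32_t w)
{
    fp_t x;
    x.s = (w >> (EXP_BITS + MAN_BITS)) & 1u;
    x.e = (int)((w >> MAN_BITS) & EXP_MASK);
    x.f = (int32_t)(w & MAN_MASK);
    return x;
}

static uint32_t pack(unsigned s, int e, int32_t f)
{
    return ((uint32_t)s << (EXP_BITS + MAN_BITS)) |
           (((uint32_t)e & EXP_MASK) << MAN_BITS) |
           ((uint32_t)f & MAN_MASK);
}

static uint32_t fp_add(uint32_t a_bits, uint32_t b_bits)
{
    fp_t a = unpack(a_bits), b = unpack(b_bits);

    /* Stage 1: make the implicit leading one explicit, compare exponents,
     * and align the mantissa of the smaller operand by shifting it right. */
    int32_t ma = a.f | (1 << MAN_BITS);
    int32_t mb = b.f | (1 << MAN_BITS);
    int e = (a.e > b.e) ? a.e : b.e;
    int d = (a.e > b.e) ? a.e - b.e : b.e - a.e;
    if (d > MAN_BITS + 1) d = MAN_BITS + 1;     /* smaller operand effectively vanishes */
    if (a.e > b.e) mb >>= d; else ma >>= d;

    /* Stage 2: add or subtract the aligned mantissas as signed integers
     * (subtraction is handled through the operand signs). */
    int32_t sum = (a.s ? -ma : ma) + (b.s ? -mb : mb);
    unsigned s = 0;
    if (sum == 0) return 0;                     /* exact cancellation */
    if (sum < 0) { s = 1; sum = -sum; }

    /* Stage 3: renormalize so the leading one sits in bit MAN_BITS, adjust
     * the exponent accordingly, drop the implicit one, and repack. */
    while (sum > (int32_t)((2 << MAN_BITS) - 1)) { sum >>= 1; e++; }
    while (sum < (1 << MAN_BITS))                { sum <<= 1; e--; }
    return pack(s, e, sum & (int32_t)MAN_MASK);
}

int main(void)
{
    /* 1.0 and 1.5 in the assumed layout; their sum, 2.5, packs to 0x10100. */
    uint32_t one   = pack(0, BIAS, 0);                    /* 0x0FC00 */
    uint32_t onep5 = pack(0, BIAS, 1 << (MAN_BITS - 1));  /* 0x0FE00 */
    printf("1.0 + 1.5 -> 0x%05X\n", (unsigned)fp_add(one, onep5));
    return 0;
}

Compiling and running this prints 0x10100 for 1.0 + 1.5, which is 2.5 in the assumed layout (sign 0, exponent 64, fraction 0100000000).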


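Separately from the custom formats, the worked 32 bit example in Section 2 (storing -3.625 as C0680000 hex) can be confirmed with a few lines of C on any machine whose float type follows IEEE 754 single precision, which virtually all current compilers provide; the fragment below simply restates that example.

/* Check the Section 2 example: -3.625 stored as a 32 bit IEEE 754 single
 * should give s = 1, e = 128 (80 hex), f = 680000 (hex), i.e. C0680000.
 * Assumes the host's float type is an IEEE 754 single. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float v = -3.625f;
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);        /* reinterpret the float's bit pattern */

    printf("bits     = %08X\n", (unsigned)bits);                   /* C0680000 */
    printf("sign     = %u\n",   (unsigned)(bits >> 31));           /* 1        */
    printf("exponent = %u\n",   (unsigned)((bits >> 23) & 0xFF));  /* 128      */
    printf("mantissa = %06X\n", (unsigned)(bits & 0x7FFFFF));      /* 680000   */
    return 0;
}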