Fast Fourier Transform:Practical aspects and Basic ArchitecturesMultiplication complexity per output pointMultiplies and addsStructural considerationsInverse FFTIn-place computationRegularity, parallelismQuantization noiseParticular casesDFT pruningRelated transformsRelated transforms: DHTRelated transforms: DCTRelationship with FFTImplementation issuesImplementation on general purpose computersDigital Signal ProcessorsVector and multi-processorsASICsArchitecturesThe naive approach1 multiply-add cellCascade FFTFFT networkFFT networkThe perfect-shuffle networkThe Mesh implementationPerformance summaryReadings6.973 Communication System Design – Spring 2006Massachusetts Institute of TechnologyVladimir StojanovićCite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].Fast Fourier Transform:Practical aspects and Basic ArchitecturesLecture 9Multiplication complexity per output point CTFFT and SRFFTCite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].6.973 Communication System Design2CTFFT and SRFFT0.01.02.03.04.05.06.07.08.09.010.03 4 5 6 7 8 9 10n = log2 NM/N- radix 2- radix 4- split-radix- lower boundMrMcFigure by MIT OpenCoursWare.3Multiplies and addsReal multipliesReal adds Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].6.973 Communication System DesignN Radix 2Radix 2Radix 4Radix 4SRFFTSRFFTPFAPFAWinogradWinograd16326412825651210242048163264128256512102420483060120240504100825203060120240504100825202488264712180043601024823560202081392785620686819651612843076717216388136276632157235489492N15240810322504589613566307286861614897654882833614838896423085380122922765261444384888207648121338829548840763848882076501614540346689962810020046011002524580417660Figure by MIT OpenCourseWare.Structural considerations How to compare different FFT algorithms? Many metrics to choose from The ease of obtaining the inverse FFT In-place computation Regularity Computation Interconnect Parallelism and pipelining Quantization noiseCite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].6.973 Communication System Design4Inverse FFT FFTs often used for computing FIR filtering Fast convolution (FFT + pointwise multiply + IFFT) In some applications (like 802.11a) Can reuse FFT block to do the IFFT (half-duplex scheme) Simple trick [Duhamel88] Swap the real and imaginary inputs and outputs If FFT(xR,xI,N) computes the FFT of sequence xR(k)+jxI(k) Then FFT(xI,xR,N) computes the IFFT of jxR(k)+xI(k)Exchange the real and imag partCite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].6.973 Communication System Design5In-place computation Most algorithms allow in-place computation Cooley-Tukey, SRFFT, PFA No auxilary storage of size dependent on N is needed WFTA (Winograd Fourier Transform Algorithm) does not allow in-place computation A drawback for large sequences Cooley-Tukey and SRFFT are most compatible with longer size FFTsCite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].6.973 Communication System Design6Regularity, parallelism Regularity Cooley-Tukey FFT very regular Repeat butterflies of same type Sums and twiddle multiplies SRFFT slightly more involved Different butterfly types in parallel e.g. radix-2 and radix-4 used in parallel on even/odd samples PFA even more involved Repetitive use of more complicated modules (like cyclic convolution, for prime length DFTs) WFTA most involved Repetition of parts of the cyclic conv. modules from PFA Parallelization Fairly easy for C-TFFT and SRFFT Small modules applied on sets of data that are separable and contiguous More difficult for PFA Data required for each module not in contiguous locationsCite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].6.973 Communication System Design76.973 Communication System Design8Quantization noise Roundoff noise generated by finite precision of operations inside FFT (adds, multiplies) CTFFT (lengths 2n) Four error sources per butterfly (variance 2-2B/12) Total variance per butterfly 2-2B/3 Each output node receives signals from a total of N-1 butterflies in the flow graph (N/2 from the first stage, N/4 in the second, …) Total variance for each output ~ N/3*2-2B Assuming input power 1/3N2(|x(n)|<1/N to avoid overflow) Output power is 1/3N Error-to-signal ratio is then N22-2B(needs 1 additional bit per stage to maintain SER) Since a maximum magnitude increases by less than 2x from stage to stage we can prevent the overflow by requiring that |x(n)|<1 and scaling by ½ from stage-to-stage The output will be 1/N of the previous case, but the input magnitude can be Nx larger, improving the SER Error-to-signal ratio is then 4N*2-2B(needs 1/2 additional bit per stage to maintain SER) Radix-4 and SRFFT generate less roundoff noise than radix-2 WFTA Fewer multiplications (hence fewer noise sources) More difficult to include proper rescaling in the algorithm Error-to-signal ratio is higher than in CTFFT or SRFFT Two more bits are necessary to represent data in WFTA for same error order as CTFFTCite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].Particular cases DFT algorithms for real data sequence xk Xkhas Hermitian
View Full Document