32-bit Logarithmic Number System Based ALU
Viswanadham Sanku

Prologue
• Most DSP applications involve arithmetic operations on single or double precision
• Logarithmic Number System (LNS) offers advantage over floating point (FP) arithmetic on smaller precisions
• Multiplication and division in LNS are simplified to fixed-point addition and subtraction
• Proposals for logarithmic microprocessor

Introduction
• Focus on single precision multiplication, division and square-root
• Complex add and subtract operations
• Critical operations are conversion from FP to LNS and LNS to FP
• Involves look-up tables or mathematical computations
• Small precisions use lookup tables
• Large precisions use computational approach

Functional Specification
• Multiplication: $X \cdot Y = 2^{\log_2(X \cdot Y)} = 2^{\log_2 X + \log_2 Y}$
• Division: $X / Y = 2^{\log_2(X / Y)} = 2^{\log_2 X - \log_2 Y}$
• Square root: $\sqrt{X} = 2^{\log_2 \sqrt{X}} = 2^{\frac{1}{2}\log_2 X}$
FP to LNS => $\log_2 X$
LNS to FP => $2^{X}$

FP to LNS
• Floating point representation $= (-1)^{s} \cdot 1.f \cdot 2^{\mathit{exp}}$
  $X = (1 + .x_1 x_2 x_3 \ldots x_{23}) \cdot 2^{\mathit{exp}}$
• Logarithmic representation
  $Y = \log_2 X = \log_2\big((1 + .x_1 x_2 x_3 \ldots x_{23}) \cdot 2^{\mathit{exp}'}\big) = \log_2(1 + .x_1 x_2 x_3 \ldots x_{23}) + \mathit{exp}'$
  $\mathit{exp}' = \mathit{exp} - \mathit{base}$
• Without optimization needs $2 \times 23 \times 2^{23}$ bits = 46MB ROM
[Figure: FP fields (s, exp, f) mapped to LNS fields (s, exp', log2(1+x))]

FP to LNS
Factorization
• $\log_2(1 + .x_1 x_2 x_3 \ldots x_n) = \log_2\big((1 + .x_1 x_2 \ldots x_m)(1 + .00 \ldots 0 c_{m+1} c_{m+2} \ldots c_{2m})\big) = \log_2(1 + .x_1 x_2 \ldots x_m) + \log_2(1 + .00 \ldots 0 c_{m+1} c_{m+2} \ldots c_{2m})$
• $\log_2(1 + .x_1 x_2 \ldots x_m)$ requires an $m \times 2^{m}$ memory
• $\log_2(1 + .00 \ldots 0 c_{m+1} c_{m+2} \ldots c_{2m})$ requires an $m \times 2^{m}$ memory
• For 32-bit FP, two lookup tables of total size $24 \times 2^{12}$ bits, i.e., 12KB ROM
• Computation of $.00 \ldots 0 c_{m+1} c_{m+2} \ldots c_{2m}$ requires additional memory

FP to LNS
• Calculate $.00 \ldots 0 c_{m+1} c_{m+2} \ldots c_{2m} = c \cdot 2^{-m}$
  $a = .x_1 x_2 \ldots x_m$
  $b = .x_{m+1} x_{m+2} \ldots x_{2m}$
  $c = .c_1 c_2 \ldots c_m$
  $1 + .x_1 x_2 \ldots x_n = 1 + .x_1 x_2 \ldots x_m + .00 \ldots 0 x_{m+1} x_{m+2} \ldots x_{2m} = 1 + a + b \cdot 2^{-m}$
  $(1 + .x_1 x_2 \ldots x_m)(1 + .00 \ldots 0 c_{m+1} c_{m+2} \ldots c_{2m}) = (1 + a)(1 + c \cdot 2^{-m})$
  $(1 + a)(1 + c \cdot 2^{-m}) = 1 + a + b \cdot 2^{-m}$
  $c = \frac{1 + b}{1 + a} - \frac{1}{1 + a}$
  $c = 2^{\log_2(1 + b) - \log_2(1 + a)} - 2^{-\log_2(1 + a)}$
• $\log_2(1 + b)$ and $\log_2(1 + a)$ both can use same ROM
• $2^{z}$ can be computed using 'LNS to FP' lookup tables

LNS to FP
• Calculate $2^{Z}$
  $2^{Z} = 2^{0.Z_1 Z_2 Z_3 \ldots Z_n} = 2^{(0.Z_1 Z_2 \ldots Z_m) + (0.00 \ldots 0 Z_{m+1} Z_{m+2} \ldots Z_{2m})} = 2^{0.Z_1 Z_2 \ldots Z_m} \times 2^{0.00 \ldots 0 Z_{m+1} Z_{m+2} \ldots Z_{2m}}$
• $2^{0.Z_1 Z_2 \ldots Z_m}$ requires an $m \times 2^{m}$ memory
• $2^{0.00 \ldots 0 Z_{m+1} Z_{m+2} \ldots Z_{2m}}$ requires an $m \times 2^{m}$ memory
• For 32-bit FP, two lookup tables of total size $24 \times 2^{12}$ bits, i.e., 12KB ROM
• Same ROM can be used in FP to LNS conversion
• Total memory requirement = 12KB + 12KB = 24KB to convert to and from LNS

[Figure slides: LNS ALU, FP 2 LNS, LNS 2 FP, Sign Bit]

Tools
• ECE lab tools
  Aldec Active-HDL 7.1
  Synplicity Synplify Pro 8.x
  Xilinx ISE for FPGA 7.x
  Synopsys
• At home
  Aldec Active-HDL 7.1
• Hardware description language: VHDL
• Platforms:
  Xilinx FPGA (Virtex II XC2V2000)
  Standard-cell ASIC (TSMC 90nm)

Test Vectors
• Used real-java package from sourceforge.net to generate test vectors
• Generated normalized random values for both input and output values
• Generated vectors to verify special cases: Zero, Infinity, NaN
• De-normalized values are not covered
• Test bench reads test vectors stored in file and verifies the output values

Results
• Written design status 80%, doesn't cover de-normalized values
• Verified through functional simulation 70%
• Failed to verify through post-synthesis and timing simulation: data errors
• Out of memory exception while analyzing over Synopsys (ASICs)

Results
• Xilinx FPGA (Virtex II XC2V2000)
  CLB slices: 2043
  ROMs: implemented as look-up tables
  Latency: 96 nsec

Optimizations
• $2^{z}$ function can be evaluated using an approximation $2^{z} \approx 2 - \log_2(2 - z)$
• Design can be pipelined for maximum throughput
• BRAMs can be used for lookup tables to improve performance
• Better factorization algorithms can be used to reduce memory requirements for higher precisions

Problems
• Out of memory error on Synopsys
• Debugging post synthesis
• Denormals are not handled

Conclusions
• LNS performance is comparable to FP multiplication and division
• With a more optimized implementation, LNS can be a better alternative to FP for single and double precisions
• If there is a series of multiplications and divisions, then LNS performs a lot better, as most of the time is consumed in
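
The conversion scheme in the slides (two m-entry tables for log2, two for 2^Z, with multiplication, division and square root reduced to add, subtract and halve in the log domain) can be sketched in software. The following is a minimal Python sketch, not the VHDL design itself: it assumes m = 12, positive normalized operands only (sign, zero, infinity, NaN and denormals are left out), and uses double-precision floats to stand in for ROM contents. All function and table names are hypothetical.

```python
import math

M = 12                 # bits per table index (the slides split 24 bits as 12 + 12)
SIZE = 1 << M
MASK = SIZE - 1

# "ROM" for log2(1 + a), a = i / 2^M; also reused for log2(1 + b)
LOG_HI = [math.log2(1 + i / SIZE) for i in range(SIZE)]
# "ROM" for log2(1 + c * 2^-M), c = i / 2^M
LOG_LO = [math.log2(1 + (i / SIZE) / SIZE) for i in range(SIZE)]
# "ROMs" for 2^(0.Z1..Zm) and 2^(0.00..0 Zm+1..Z2m)
POW_HI = [2.0 ** (i / SIZE) for i in range(SIZE)]
POW_LO = [2.0 ** (i / (SIZE * SIZE)) for i in range(SIZE)]

def log2_mantissa(frac_bits: int) -> float:
    """Approximate log2(1 + f) for a 2M-bit fraction f via the factorization."""
    a_idx = frac_bits >> M
    b_idx = frac_bits & MASK
    # c = (1+b)/(1+a) - 1/(1+a), evaluated through the shared log table
    c = 2.0 ** (LOG_HI[b_idx] - LOG_HI[a_idx]) - 2.0 ** (-LOG_HI[a_idx])
    c_idx = min(int(c * SIZE), MASK)   # truncate c to M bits
    return LOG_HI[a_idx] + LOG_LO[c_idx]

def fp_to_lns(x: float) -> float:
    """log2(x) for positive, normalized x: exp' + log2(1.f)."""
    m, e = math.frexp(x)               # x = m * 2^e with 0.5 <= m < 1
    frac_bits = int((2 * m - 1) * SIZE * SIZE)  # fraction of 1.f, truncated to 2M bits
    return (e - 1) + log2_mantissa(frac_bits)

def lns_to_fp(y: float) -> float:
    """2^y using the two power tables on the fractional part of y."""
    e = math.floor(y)
    z_bits = min(int((y - e) * SIZE * SIZE), SIZE * SIZE - 1)
    return math.ldexp(POW_HI[z_bits >> M] * POW_LO[z_bits & MASK], e)

def lns_mul(x: float, y: float) -> float:
    return lns_to_fp(fp_to_lns(x) + fp_to_lns(y))

def lns_div(x: float, y: float) -> float:
    return lns_to_fp(fp_to_lns(x) - fp_to_lns(y))

def lns_sqrt(x: float) -> float:
    return lns_to_fp(0.5 * fp_to_lns(x))
```

In this sketch, `2.0 ** (...)` inside `log2_mantissa` stands in for the 'LNS to FP' tables that the slides reuse for computing c. For a chain of operations, the conversions would be applied once at the ends, with only additions and subtractions of the log values in between.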