MAP ART
Mapping Architectural Properties to an Algorithm for Redundant Triangulation

Chris Savarese, Yashesh Shroff, Greg Lawrence
April 27, 2000
CS252
Advisor: Dr. Jan Rabaey

Outline
• Introduction
• Background
• Time and Energy Profiling
• Parallel Architectures
• Conclusions: Our Dream Architecture
• Future Work

Introduction
• Goal: Given a basic localization algorithm, explore architectural alternatives for the minimization of energy consumption.
• The concept of localization
• Energy saving techniques
• What we did…

Background

The Localization Algorithm
Figure: anchor nodes N1 (x1, y1, z1), N2 (x2, y2, z2), N3 (x3, y3, z3) and the unknown node U (x, y, z).

\[
\underbrace{\begin{bmatrix}
x_1 - x_n & y_1 - y_n & z_1 - z_n \\
\vdots & \vdots & \vdots \\
x_{n-1} - x_n & y_{n-1} - y_n & z_{n-1} - z_n
\end{bmatrix}}_{A_{m \times 3}}
\underbrace{\begin{bmatrix} x \\ y \\ z \end{bmatrix}}_{U_{3 \times 1}}
=
\underbrace{\begin{bmatrix} b_1 \\ \vdots \\ b_{n-1} \end{bmatrix}}_{B_{(n-1) \times 1}}
\]

QR decomposition, QRdcmp():
\[ A_{m \times 3} = Q_{m \times 3} \cdot R_{3 \times 3} \]

Solve:
\[ U = R^{-1} Q^{T} b \]

The StrongARM Architecture
• Power: 200 mW, 0.25 µm, 1.5 V
• Clock Speed: 200 MHz
• Cache:
  - 16 KB I-cache
  - 8 KB D-cache
  - 32-way set-associative, round-robin replacement
  - 512 B, 2-way Minicache
• 31/16 GPR (32-bit)
• Auto-increment addressing
• No FP processor
• MAC

The Tensilica Xtensa Architecture
• Power: 200 mW, 0.25 µm, 1.5 V
• Clock Speed: 170 MHz
• Cache:
  - 16 KB I-cache
  - 16 KB D-cache
  - Direct mapped
• 32 registers (32-bit)
• Xtensibility: use of TIE instructions (processor configuration)
• No FP processor
• Zero overhead loops

Time and Energy Profiling

Profiling Results
Profiler output:
-----------------------------------------------
_fmul     18.21%   18.21%   0.00%   188000
-----------------------------------------------
lubksb    15.27%   5.17%   10.10%   10000
  _fneq   0.37%   0.00%   14000
  _fdiv   4.23%   0.00%   30000
  _fmul   5.03%   0.00%   52000
  _frsb   0.46%   0.00%   52000

Energy = nom. core power × #cycles × clock period

Floating point:
• StrongARM Processor: 68 µJ
• Xtensa Processor: 144 µJ

Fixed Point Arithmetic
• Floating Point vs. Fixed Point
  - Fixed point: 16 integer bits . 16 fractional bits (0000 . 0000)
  - Floating point: S (1 bit), E (8 bits), Mantissa (23 bits)
• Add / Sub are straightforward
• Multiply / Divide require shifting
• Why can we use it for localization?
  - Low accuracy requirements
  - Limited range in measurements (< 10 m)
  - Small matrices → small error propagation

Fixed Point Profiling Results
Energy = nom. core power × #cycles × clock period

Processor    Floating point    Fixed point
StrongARM    68 µJ             43 µJ (37% less)
Xtensa       144 µJ            69 µJ (52% less)

Parallel Architectures
- Write sequential code in Matlab
- Extract data-dependencies
- Workload analysis
Figure labels: CP1, CP2, CP3, P

Conclusions: Our Dream Architecture
• Cache: direct mapped
• Floating point hardware
• MAC hardware
• Zero overhead loops
• Auto increment
• Register file size

Future Work
• FPGA implementation
• Xtensa customizations
  - TIE instructions
  - Floating Point Coprocessor
• Realistic algorithm for PicoRadio

Many Thanks To…
• Dr. Bart Kienhuis, EECS Post Doc
  - Ptolemy and other tools: Parallel issues
• Fred Burghardt, BWRC Technical Staff
  - PicoRadio Testbed
• Marlene Wan, BWRC Student
  - StrongARM Energy Profiling
• Vandana Prabhu, BWRC Student
  - Tensilica Tools
• The Berkeley Wireless Research
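
Worked sketch: the localization solve
The localization slide states the system A·U = b and the solution U = R^-1 Q^T b after a QR decomposition, but it does not show how b is formed. The sketch below is one plausible reading, not the authors' code. It assumes each b_i comes from the standard linearization of the range equations |U - N_i| = r_i, obtained by subtracting the last anchor's equation. The function names (build_system, qr_decompose, solve_position), the Gram-Schmidt variant of QR, and the example anchors are all hypothetical.

```python
import math

def build_system(anchors, ranges):
    """Rows of A are (x_i - x_n, y_i - y_n, z_i - z_n), as on the slide.
    ASSUMPTION: b_i = 0.5 * (|N_i|^2 - |N_n|^2 - r_i^2 + r_n^2), which follows
    from subtracting the n-th range equation from the i-th."""
    xn, yn, zn = anchors[-1]
    rn = ranges[-1]
    kn = xn * xn + yn * yn + zn * zn
    A, b = [], []
    for (x, y, z), r in zip(anchors[:-1], ranges[:-1]):
        A.append([x - xn, y - yn, z - zn])
        b.append(0.5 * ((x * x + y * y + z * z) - kn - r * r + rn * rn))
    return A, b

def qr_decompose(A):
    """Thin QR by modified Gram-Schmidt: A (m x k) = Q (m x k) * R (k x k).
    Q is returned as a list of k orthonormal columns.
    Fails with a division by zero if the columns of A are linearly dependent
    (for example, when all anchors are coplanar)."""
    m, k = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(k)]
    Q = []
    R = [[0.0] * k for _ in range(k)]
    for j in range(k):
        v = cols[j][:]
        for i in range(j):
            R[i][j] = sum(q * x for q, x in zip(Q[i], v))
            v = [x - R[i][j] * q for x, q in zip(v, Q[i])]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        Q.append([x / R[j][j] for x in v])
    return Q, R

def solve_position(anchors, ranges):
    """Least-squares position U = R^-1 Q^T b, using back-substitution on R."""
    A, b = build_system(anchors, ranges)
    Q, R = qr_decompose(A)
    k = len(R)
    y = [sum(q * bi for q, bi in zip(Q[j], b)) for j in range(k)]  # Q^T b
    u = [0.0] * k
    for i in reversed(range(k)):
        s = y[i] - sum(R[i][j] * u[j] for j in range(i + 1, k))
        u[i] = s / R[i][i]
    return u

if __name__ == "__main__":
    # Hypothetical anchors (metres) and a made-up true position.
    anchors = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10), (10, 10, 10)]
    truth = (1.0, 2.0, 3.0)
    ranges = [math.dist(truth, a) for a in anchors]
    print([round(c, 6) for c in solve_position(anchors, ranges)])
```

With more anchors than unknowns the system is overdetermined, which is where the redundancy in "redundant triangulation" comes from: noisy ranges are averaged out by the least-squares fit rather than trusted individually.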
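
Worked sketch: fixed-point multiply and divide
The fixed-point slide notes that add and subtract are straightforward while multiply and divide require shifting. The sketch below illustrates why, assuming the 16.16 split that the slide's figure appears to show. It is an illustration only, not the authors' implementation: the names (FRAC_BITS, to_fixed, fx_mul, fx_div) are hypothetical, and Python's unbounded integers stand in for the 64-bit intermediate that a 32-bit processor would need.

```python
FRAC_BITS = 16           # ASSUMPTION: 16 fractional bits (Q16.16)
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0

def to_fixed(x):
    """Convert a float to Q16.16, rounding to the nearest step."""
    return int(round(x * ONE))

def to_float(q):
    """Convert a Q16.16 value back to a float."""
    return q / ONE

def fx_add(a, b):
    """Both operands share the same scale, so addition needs no shift."""
    return a + b

def fx_sub(a, b):
    """Subtraction likewise needs no shift."""
    return a - b

def fx_mul(a, b):
    """The raw product carries 32 fractional bits; shift right by 16 to
    return to Q16.16. On a 32-bit machine the product needs a 64-bit
    intermediate, which is where a MAC unit helps."""
    return (a * b) >> FRAC_BITS

def fx_div(a, b):
    """Pre-shift the dividend left by 16 so the quotient keeps 16
    fractional bits. Python's // floors; C division truncates toward
    zero, so negative results can differ by one step."""
    return (a << FRAC_BITS) // b

if __name__ == "__main__":
    a, b = to_fixed(7.5), to_fixed(2.5)
    print(to_float(fx_add(a, b)), to_float(fx_mul(a, b)), to_float(fx_div(a, b)))
```

A signed 32-bit Q16.16 value spans roughly ±32768 with a step of 1/65536, so measurements under 10 m and their squares fit comfortably, consistent with the slide's argument about limited range.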