**Unformatted text preview:**

Lec22.5-13/27/01Nonlinear EstimationProfessor David H. StaelinMassachusetts Institute of TechnologyLec22.5-23/27/01Case I: Nonlinear PhysicsJ1[]12ˆpDD=1d⎡⎤⎢⎥⎣⎦dDataBest Fit,"Linear Regression"Parameter1DSlope2DpOptimumˆEstimator P(d)pProbability Distribution0P{p}Minimum Square Error (MSE) is usually the "optimum" sought.It is important that the "training" data used for regressionis similar to, but independent of, the "test" data used forperformance evaluation.Lec22.5-33/27/01Case II: Non-Gaussian StatisticsJ2AdBest linear estimatorˆNote: Linear estimator can set p<0 (which is non-physical):nonlinear estimator approaches correct asymptotes as dNote: Physics here is linear. Can use polynomials etc. forˆp(d), or recursive linear →±∞estimators updating D.d0pAˆpd{}APpdMaximum a posterioriprobability "MAP"estimatorpAMinimizes MSE given dAˆpdBest Nonlinear Estimatorˆ(avoids p 0).<Lec22.5-43/27/01Linear Estimates for Nonlinear ProblemsJ3In general, the more varied the data d, the smaller the performance gapbetween linear estimators and nonlinear ones, which are a superset.()()12 1212ˆp d , d can be linear in d , d and (noiseless case)perfect, even though p d , d is nonlinear()()()22101 2 20122201 2101 22012012212Say d a a p a p and d b b p b pThen p d b b p /bdaapadbbp/bccpcd(defines plane in p, d , d - space=+ + =+ +=−−=+ + −− =+ +()2101221 112abˆTherefore p p c d c d /c where c a 0b==−+ − = − ≠12Linear estimates are suboptimum for d or d alone.2dp1dpOptimum LinearEstimateTruth (without noise)1d2dOptimum LinearEstimateLec22.5-53/27/01Linear Estimates for Nonlinear ProblemsJ42d()12ˆPlane p d , d is thelinear estimator1dp()12pd d()21pd d()12pd,dp∆()12ˆPlane p d ,d has twoangular degrees of freedomto use for minimizing ( p).∆Lec22.5-63/27/01Generalization to Nth-Order NonlinearitiesJ5Note: For perfect linear estimation the number of differentobservations must equal or exceed n, the maximumorder of the polynomials characterizing the physics.……2n1111 12 1n2n2221 22 2nLet d c a p a p a pdcapap ap=+ + ++=+ + ++Claim: In non-singular cases there exists anˆexact linear estimator p Dd constant.=+……2nnnn1 n2 nn12 ndcapap apwhere d , d , , d are observed noise-freedata related to p by an nth-order polynomial.=+ + ++•••Lec22.5-73/27/01Nonlinear Estimators for Nonlinear ProblemsL1()augˆApproaches:1) p Dd where d isaugmented via polynomials2) Rank reduction first via KLT3) Neural netsˆ4) Iterations with D p=22 3aug121 2121Case (1): d 1, d, d, d, d, dd, d, for example=KLTNonlinearAugmentationaugDdOption A:Dropaugdd′dˆpUseful if d is of High OrderCase (2): Use of KLT to Reduce RankLec22.5-83/27/01“Karhunen-Loeve Transform” (KLT)L2[]1nijijiijid = Kd transforms a noisy redundant data vector d into areduced-rank vector d with most variance in d , leastin d , where E dd , and ∆>′′′′′′⎡⎤=δλ λ ≥λ⎣⎦1ttdd2n . . . .0C = Edd K K0 . . . . ∆λ⎡⎤⎡⎤⎢⎥=λ⎢⎥⎣⎦⎢⎥λ⎣⎦The KLT is essentially the same as Principal ComponentAnalysis (PCA) and Empirical Orthogonal Functions (EOF).KLThe.g.10KMT(h)231Basis FunctionsAre Rows of KLec22.5-93/27/01Nonlinear Estimators Using Rank ReductionL3Dd′Option B:dˆpUseful if d is very nonlinearDropKLTNonlinearaugmentationOption C:augDd′ˆpdropaugd′KLTKLTNonlinearaugmentationdropd′daugdLec22.5-103/27/01Neural Networks (NN)L4Neural nets compute complex polynomialswith great efficiency, simplicity.ˆpNNdˆp3NN1NNd2NNd′d′′NMW"Hidden Layers"NN may be recursive or purely Feed-Forward (FFNN)NN coefficients computedvia relaxation optimizationback propagation.ΣΣ11dNd01W11WN1W1M1d′Md′NMWLec22.5-113/27/01Neural Networks (NN)L5ijWeights W are found by guessing, then perturbing to reduce costˆfunction on p p over "training set" of known d,p . Trainingcan be tedious. Commercial software tools are commonly used.⎡⎤⎡⎤−⎣⎦⎣⎦Degrees of freedom in training set should exceed the number ofweights by a modest factor of ~ 3-10. Risks of "overtraining" arereduced by monitoring NN performance on independent test data.The more nonlinear problems need more layers and moreinternal nodes, as determined empirically. Neural nets canalso perform recognition tasks.ˆpLinearEstimatorNNdSimple neural nets are easier to train correctly,so merge with linear techniques, e.g.Lec22.5-123/27/01Accommodating Nonlinearity via IterationM1A. Library RetrievalsoDdiDkDip∧p∧kDoDpdiiSuccessive D Chosenfrom Library as f p∧⎛⎞⎜⎟⎝⎠Lec22.5-133/27/01Accommodating Nonlinearity via IterationM2B. Physics RecomputationdDWnoii1pp∧∧=+∆∑++Nonlinear Physicsop "First Guess"∧np∧ip∧∆iEssential only that D always yields p inright direction (so errors shrink).∧∆+−i1ˆd+Lec22.5-143/27/01Accommodating Nonlinearity via IterationM3C. Spatial IterationidiAlternate Method(Use if d Threshold∆>+Delayip∧ip∧∆D+Delayid∧∆++−i1d∧−i1p∧−45678SpaceiWorks well if data is smooth.Lec22.5-153/27/01Multi-Dimensional EstimationM4()()()()()()()e.g., Atmospheric Temperature Profilesˆ1) T h,x,y f d x,yˆ2) T h,x,y f d x,y , d x ,y ,...Estimate improved by additional correlationsand degrees of freedom.==±∆±∆Lec22.5-163/27/01Genetic AlgorithmsM51) Characterize algorithm by segmented character string,e.g., a binary number.3) Many algorithms (strings) are tested, and the metricsfor each are evaluated for some ensemble of test cases.2) Significance of string is defined, e.g.,A. As weights in a linear estimator or neural net.B. It characterizes the architecture of a neural net.4) Randomly combine elements of best algorithms to formnew ones; some random changes may be made too.5) Iterate steps (3) and(4) until asymmtotic optimum isreached. Can be used for pattern recognition, signaldetection, parameter estimation, and other purposes.6) Proper choice of "gene" size accelerates convergence(genes swapped as

View Full Document