Chapter 10

Estimation

10.1 Introduction

Estimation, as considered here, involves a probabilistic experiment with two random vectors (rv's) $X$ and $Y$. The experiment is performed, but only the resulting sample value $y$ of the random vector $Y$ is observed. The observer then "estimates" the sample value $x$ of $X$ from the observation $y$. For simplicity, we assume throughout (except where explicitly assumed otherwise) that the combined random vector $(X_1, \ldots, X_n, Y_1, \ldots, Y_m)^{\mathsf T}$ has a finite non-singular covariance matrix and a finite joint probability density. We denote the estimate of $X$, as a function of the observed sample value $y$, by $\hat{x}(y)$. For a given observation $y$, one could, for example, choose $\hat{x}(y)$ to be the conditional mean, the conditional median, or the conditional mode of the rv $X$ conditional on $Y = y$.

Estimation problems occur in an amazing variety of situations, often referred to as measurement problems or recovery problems. For example, in communication systems, the timing and the phase of the transmitted signals must be recovered at the receiver. Often it is necessary to measure the channel, and finally, for analog data, the receiver often must estimate the transmitted waveform at finely spaced sample times. In control systems, the state of the system must be estimated in order to generate appropriate control signals. In statistics, one tries to estimate parameters for some probabilistic system model from trial sample values. In any experimental science, one is always concerned with measuring quantities in the presence of noise and experimental error.

The problem of estimation is very similar to that of detection. With detection, we must decide between a discrete set of alternatives on the basis of some observation, whereas here we make a selection from a vector continuum of choices. Although this does not appear to be a very fundamental difference, it leads to a surprising set of differences in approach. In many typical detection problems, the cost of different kinds of errors is secondary and we are concerned primarily with maximizing the probability of correct choice. In typical estimation problems, with a continuum of alternatives, the probability of selecting the exactly correct value is zero, and we are concerned with making the error small according to some given criterion.

A fundamental approach to estimation is to use a cost function $C(x, \hat{x})$ to quantify the cost associated with an estimate $\hat{x}$ when the actual sample value of $X$ is $x$. This cost function $C(x, \hat{x})$ is analogous to the cost $C_{\ell k}$, defined in Section 8.3, of making decision $k$ when $\ell$ is the correct hypothesis. The minimum-cost criterion, or Bayes criterion, for estimation is to choose $\hat{x}(y)$, for the observed $y$, to minimize the expected cost conditional on $Y = y$. Specifically, given the observation $y$, choose $\hat{x}(y)$ to minimize

$$\int C[x, \hat{x}(y)]\, f_{X|Y}(x \mid y)\, dx \;=\; \mathrm{E}\bigl[C[X, \hat{x}(Y)] \mid Y = y\bigr].$$

That is, for each $y$,

$$\hat{x}(y) \;=\; \arg\min_{\hat{x}} \int C[x, \hat{x}]\, f_{X|Y}(x \mid y)\, dx \;=\; \arg\min_{\hat{x}} \mathrm{E}\bigl[C[X, \hat{x}] \mid Y = y\bigr]. \qquad (10.1)$$

The notation $\arg\min_u f(u)$ means the value of $u$ that minimizes $f(u)$ (if the minimum is not unique, any minimizing $u$ can be chosen). This choice of estimate minimizes the expected cost for each observation $y$. Thus it also minimizes the expected cost when averaged over $Y$, i.e., it minimizes

$$\mathrm{E}\bigl[C[X, \hat{X}(Y)]\bigr] \;=\; \int \mathrm{E}\bigl[C[X, \hat{X}(Y)] \mid Y = y\bigr]\, f_Y(y)\, dy$$

over the estimation rule¹ $\hat{X}(Y)$. To interpret $\hat{X}(Y)$, note that the probability model maps each sample point $\omega \in \Omega$ into a sample value $y = Y(\omega)$. The estimation rule maps that $y$ into $\hat{x}(y)$, which is the sample value of $\hat{X}(Y)$ for that $\omega$ and that estimation rule.

¹ We often refer to this as estimating $X$ from $Y$. By assumption, however, we already know everything about $X$: its distribution, its joint distribution with $Y$, and so on. We are instead estimating the sample value of $X$, via $\hat{x}(y)$, conditional on a sample value $y$ of $Y$.
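As a concrete numerical illustration of (10.1), the following Python sketch tabulates the conditional density of a scalar $X$ given an observation $y$ on a grid and picks the candidate estimate minimizing the conditional expected cost. The Gaussian prior, the additive-noise model, and all names here are our own assumptions for illustration, not part of the text.

```python
import numpy as np

# Assumed example (not from the text): X ~ N(0, 1) and Y = X + N with
# N ~ N(0, sigma2) independent of X.  For an observed y, tabulate
# f_{X|Y}(x|y) on a grid and apply (10.1): choose the candidate xhat
# that minimizes the conditional expected cost.
sigma2 = 0.5
grid = np.linspace(-6.0, 6.0, 2001)   # candidate sample values of X
dx = grid[1] - grid[0]

def posterior(y):
    # f_{X|Y}(x|y) is proportional to f_X(x) f_{Y|X}(y|x); normalize numerically
    w = np.exp(-0.5 * grid**2) * np.exp(-0.5 * (y - grid)**2 / sigma2)
    return w / (w.sum() * dx)

def bayes_estimate(y, cost):
    f = posterior(y)
    expected = [(cost(grid, xh) * f).sum() * dx for xh in grid]
    return grid[int(np.argmin(expected))]   # the arg min of (10.1) over the grid

y_obs = 1.3
print(bayes_estimate(y_obs, lambda x, xh: (x - xh)**2))  # ~ 0.8667
print(y_obs / (1.0 + sigma2))  # closed-form E[X|Y=y] for this Gaussian pair
```

For the squared cost, the grid minimizer matches the closed-form conditional mean of this Gaussian pair, anticipating Theorem 10.1.1 below; swapping in a different cost function changes the estimate, which is the point of the Bayes criterion.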
10.1.1 The squared cost function

In practice, the most important cost function is the squared cost function,

$$C(x, \hat{x}) \;\stackrel{\Delta}{=}\; \sum_{j=1}^{n} (x_j - \hat{x}_j)^2. \qquad (10.2)$$

This cost function is often rewritten in the following two equivalent ways:

$$\sum_j (x_j - \hat{x}_j)^2 \;=\; (x - \hat{x})^{\mathsf T}(x - \hat{x}) \;=\; \lVert x - \hat{x} \rVert^2.$$

The estimate that minimizes $\mathrm{E}[C[X, \hat{X}(Y)]]$ for the squared cost function is called the minimum mean-square-error (MMSE) estimate. It is also often called the Bayes least-squares estimate. In order to minimize $\mathrm{E}[\sum_j (X_j - \hat{x}_j(y))^2 \mid Y = y]$ over $\hat{x}(y)$, it is sufficient to choose $\hat{x}_j(y)$, for each $j$, to minimize $\mathrm{E}[(X_j - \hat{x}_j(y))^2 \mid Y = y]$. This is simply the second moment of $X_j$ around $\hat{x}_j(y)$, conditional on $Y = y$. It is minimized by choosing $\hat{x}_j(y)$ to be the mean of $X_j$ conditional on $Y = y$.

Simple though this result is, it is a central result of estimation theory, and we state it as a theorem.

Theorem 10.1.1. The MMSE estimate $\hat{x}(y)$, as a function of the observation $Y = y$, is given by

$$\hat{x}(y) \;=\; \mathrm{E}[X \mid Y = y] \;=\; \int x\, f_{X|Y}(x \mid y)\, dx. \qquad (10.3)$$

Define the estimation error $\xi$ for a given estimation rule as

$$\xi \;\stackrel{\Delta}{=}\; \hat{X}(Y) - X.$$

For MMSE, $\hat{x}(y)$ is the conditional mean of $X$ conditional on $Y = y$. Thus $\mathrm{E}[\xi \mid Y = y] = 0$ for all $y$, so $\mathrm{E}[\xi] = 0$ also. The covariance matrix of $\xi$ is then given by

$$K_{\xi} \;=\; \mathrm{E}[\xi \xi^{\mathsf T}] \;=\; \mathrm{E}\Bigl[\bigl(\hat{X}(Y) - X\bigr)\bigl(\hat{X}(Y) - X\bigr)^{\mathsf T}\Bigr]. \qquad (10.4)$$

This can be simplified by recalling that $\mathrm{E}[\xi \mid Y = y] = 0$ for all $y$. Thus if $g(y)$ is any vector-valued function of the same dimension as $x$ and $\xi$, then

$$\mathrm{E}[\xi \mid Y = y]\, g^{\mathsf T}(y) \;=\; \mathrm{E}[\xi\, g^{\mathsf T}(Y) \mid Y = y] \;=\; 0.$$

Averaging over $y$, we get the important relation that the MMSE estimation error and any such $g(y)$ satisfy

$$\mathrm{E}[\xi\, g^{\mathsf T}(Y)] \;=\; 0. \qquad (10.5)$$

In Section 10.5, this equation will be interpreted as an orthogonality principle. For now, since $g(y)$ is an arbitrary function (of the right dimension), we can replace it with $\hat{x}(y)$, getting $\mathrm{E}[\xi \hat{X}^{\mathsf T}(Y)] = 0$. Substituting this into (10.4), we finally get a useful simplified expression for the covariance of the estimation error for MMSE estimation,

$$K_{\xi} \;=\; -\,\mathrm{E}[\xi X^{\mathsf T}]. \qquad (10.6)$$

10.1.2 Other cost functions

Subsequent sections of this chapter focus primarily on the squared cost function. First, however, we briefly discuss several other cost functions. One is the absolute-value cost function, where $C(x, \hat{x}) = \sum_n |x_n - \hat{x}_n|$; this expected cost is minimized, for each $y$, by choosing $\hat{x}_n(y)$ to be the conditional median of $f_{X_n|Y}(x_n \mid y)$ (see Exercise 1.10). The absolute-value cost function places less weight on large estimation errors and more on small estimation errors than the squared cost function.
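To make the dependence on the cost function concrete, the following sketch (same caveats as before: the exponential prior and all names are assumptions chosen for illustration) repeats the grid minimization with a skewed posterior, where the conditional mean and conditional median differ. The squared cost recovers the mean, in line with Theorem 10.1.1, and the absolute-value cost recovers the median, in line with the discussion above.

```python
import numpy as np

# Assumed skewed example (not from the text): X ~ Exp(1), Y = X + N with
# N ~ N(0, 1) independent of X.  Skewness separates the two minimizers:
# squared cost -> conditional mean, absolute cost -> conditional median.
xg = np.linspace(0.0, 12.0, 3001)   # X >= 0 under the exponential prior
dx = xg[1] - xg[0]

def posterior(y):
    w = np.exp(-xg) * np.exp(-0.5 * (y - xg)**2)
    return w / (w.sum() * dx)

def grid_bayes(y, cost):
    f = posterior(y)
    expected = [(cost(xg, xh) * f).sum() * dx for xh in xg]
    return xg[int(np.argmin(expected))]

y_obs = 2.0
f = posterior(y_obs)
mean = (xg * f).sum() * dx                                  # conditional mean
median = xg[int(np.searchsorted(np.cumsum(f) * dx, 0.5))]   # conditional median

print(grid_bayes(y_obs, lambda x, xh: (x - xh)**2), mean)       # agree
print(grid_bayes(y_obs, lambda x, xh: np.abs(x - xh)), median)  # agree
```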