EECS 70 Discrete Mathematics and Probability Theory
Spring 2014    Anant Sahai    Lecture 16

Variance

Question: Let us return once again to the question of how many heads appear in a typical sequence of $n$ coin flips. Recall that we used the Galton board to visualize this as follows: consider a token falling through the Galton board, deflected randomly to the right or left at each step. We think of these random deflections as taking place according to whether a fair coin comes up heads or tails. The expected position of the token when it reaches the bottom is right in the middle, but how far from the middle should we typically expect it to land?

Denoting a right-move by $+1$ and a left-move by $-1$, we can describe the probability space here as the set of all words of length $n$ over the alphabet $\{\pm 1\}$, each having equal probability $\frac{1}{2^n}$. Let the r.v. $X$ denote our position (relative to our starting point 0) after $n$ moves. Thus
$$X = X_1 + X_2 + \cdots + X_n, \qquad \text{where } X_i = \begin{cases} +1 & \text{if the $i$th toss is Heads;} \\ -1 & \text{otherwise.} \end{cases}$$

Now obviously we have $E(X) = 0$. The easiest way to see this is to note that $E(X_i) = (\frac{1}{2} \times 1) + (\frac{1}{2} \times (-1)) = 0$, so by linearity of expectation $E(X) = \sum_{i=1}^n E(X_i) = 0$. But of course this is not very informative, and is due to the fact that positive and negative deviations from 0 cancel out.

What the above question is really asking is: what is the expected value of $|X|$, the distance from 0? Rather than consider the r.v. $|X|$, which is a little awkward due to the absolute value operator, we will instead look at the r.v. $X^2$. Notice that this also has the effect of making all deviations from 0 positive, so it should also give a good measure of the distance traveled. However, because it is the squared distance, we will need to take a square root at the end to make the units make sense.

Let's calculate $E(X^2)$:
$$E(X^2) = E((X_1 + X_2 + \cdots + X_n)^2) = E\Big(\sum_{i=1}^n X_i^2 + \sum_{i \ne j} X_i X_j\Big) = \sum_{i=1}^n E(X_i^2) + \sum_{i \ne j} E(X_i X_j).$$
In the last step here, we used linearity of expectation. To proceed, we need to compute $E(X_i^2)$ and $E(X_i X_j)$ (for $i \ne j$).
Let's consider first $X_i^2$. Since $X_i$ can take on only the values $\pm 1$, clearly $X_i^2 = 1$ always, so $E(X_i^2) = 1$.

What about $E(X_i X_j)$? Well, $X_i X_j = +1$ when $X_i = X_j = +1$ or $X_i = X_j = -1$, and otherwise $X_i X_j = -1$. Also,
$$\Pr[(X_i = X_j = +1) \lor (X_i = X_j = -1)] = \Pr[X_i = X_j = +1] + \Pr[X_i = X_j = -1] = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2},$$
so $X_i X_j = 1$ with probability $\frac{1}{2}$. In the above calculation we used the fact that the events $X_i = +1$ and $X_j = +1$ are independent, so $\Pr[X_i = X_j = +1] = \Pr[X_i = +1] \times \Pr[X_j = +1] = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$ (and similarly for $\Pr[X_i = X_j = -1]$). Therefore $X_i X_j = -1$ with probability $\frac{1}{2}$ also. Hence $E(X_i X_j) = 0$. More directly: since $X_i$ and $X_j$ are independent, we saw in the last lecture note that $E(X_i X_j) = E(X_i)E(X_j) = 0$.

Plugging these values into the above equation gives
$$E(X^2) = (n \times 1) + 0 = n.$$

So we see that our expected squared distance from 0 is $n$. One interpretation of this is that we might expect to be a distance of about $\sqrt{n}$ away from 0 after $n$ steps. However, we have to be careful here: we cannot simply argue that $E(|X|) = \sqrt{E(X^2)} = \sqrt{n}$. (Why not?) We will see later in the lecture how to make precise deductions about $|X|$ from knowledge of $E(X^2)$.

For the moment, however, let's agree to view $E(X^2)$ as an intuitive measure of "spread" of the r.v. $X$. In fact, for a more general r.v. with expectation $E(X) = \mu$, what we are really interested in is $E((X - \mu)^2)$, the expected squared distance from the mean. In our random walk example, we had $\mu = 0$, so $E((X - \mu)^2)$ just reduces to $E(X^2)$.

Definition 16.1 (variance): For a r.v. $X$ with expectation $E(X) = \mu$, the variance of $X$ is defined to be
$$\mathrm{Var}(X) = E((X - \mu)^2).$$
The square root $\sigma(X) := \sqrt{\mathrm{Var}(X)}$ is called the standard deviation of $X$.

The point of the standard deviation is merely to "undo" the squaring in the variance. Thus the standard deviation is "on the same scale as" the r.v. itself. Since the variance and standard deviation differ just by a square, it really doesn't matter which one we choose to work with, as we can always compute one from the other immediately.
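The calculation $E(X^2) = n$ can be checked empirically. The following Python sketch (the function name and parameters are my own choices, not part of the notes) simulates the $\pm 1$ random walk many times and estimates $E(X)$ and $E(X^2)$:

```python
import random

def simulate_walk(n, trials=50_000, seed=0):
    """Estimate E(X) and E(X^2) for X = X_1 + ... + X_n, each X_i = +1 or -1 fair."""
    rng = random.Random(seed)
    total = total_sq = 0
    for _ in range(trials):
        # One walk: sum of n independent +/-1 steps.
        x = sum(rng.choice((-1, 1)) for _ in range(n))
        total += x
        total_sq += x * x
    return total / trials, total_sq / trials

n = 100
mean, mean_sq = simulate_walk(n)
print(mean, mean_sq)   # mean near 0, mean-square near n = 100
print(mean_sq ** 0.5)  # root-mean-square distance, near sqrt(n) = 10
```

The sample averages fluctuate around the exact values, so for a given run they will be close to, but not exactly, $0$ and $n$.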
We shall usually use the variance. For the random walk example above, we have that $\mathrm{Var}(X) = n$, and the standard deviation of $X$ is $\sigma(X) = \sqrt{n}$.

The following easy observation gives us a slightly different way to compute the variance that is simpler in many cases.

Theorem 16.1: For a r.v. $X$ with expectation $E(X) = \mu$, we have $\mathrm{Var}(X) = E(X^2) - \mu^2$.

Proof: From the definition of variance, we have
$$\mathrm{Var}(X) = E((X - \mu)^2) = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2.$$
In the third step here, we used linearity of expectation. $\Box$

Examples

Let's see some examples of variance calculations.

1. Fair die. Let $X$ be the score on the roll of a single fair die. Recall from an earlier lecture that $E(X) = \frac{7}{2}$. So we just need to compute $E(X^2)$, which is a routine calculation:
$$E(X^2) = \tfrac{1}{6}\left(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2\right) = \tfrac{91}{6}.$$
Thus from Theorem 16.1,
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \tfrac{91}{6} - \tfrac{49}{4} = \tfrac{35}{12}.$$
More generally, if $X$ is a random variable that takes on values $1, \ldots, n$ with equal probability $1/n$ (i.e., $X$ has a uniform distribution), the mean, variance and standard deviation of $X$ are:
$$E(X) = \frac{n+1}{2}, \qquad \mathrm{Var}(X) = \frac{n^2 - 1}{12}, \qquad \sigma(X) = \sqrt{\frac{n^2 - 1}{12}}.$$
(You should verify these.)

2. Number of fixed points. Let $X$ be the number of fixed points in a random permutation of $n$ items (i.e., the number of students in a class of size $n$ who receive their own homework after shuffling). We saw in an earlier lecture that $E(X) = 1$ (regardless of $n$). To compute $E(X^2)$, write $X = X_1 + X_2 + \cdots + X_n$, where
$$X_i = \begin{cases} 1 & \text{if $i$ is a fixed point;} \\ 0 & \text{otherwise.} \end{cases}$$
Then as usual we have
$$E(X^2) = \sum_{i=1}^n E(X_i^2) + \sum_{i \ne j} E(X_i X_j). \qquad (1)$$
Since $X_i$ is an indicator r.v., we have that $E(X_i^2) = \Pr[X_i = 1] = \frac{1}{n}$. Since both $X_i$ and $X_j$ are indicators, we can compute $E(X_i X_j)$ as follows:
$$E(X_i X_j) = \Pr[X_i = 1 \land X_j = 1] = \Pr[\text{both $i$ and $j$ are fixed points}] = \frac{1}{n(n-1)}.$$
[Check that you understand the last step here.] Plugging this into equation (1) we get
$$E(X^2) = \left(n \times \tfrac{1}{n}\right) + \left(n(n-1) \times \tfrac{1}{n(n-1)}\right) = 1 + 1 = 2.$$
Thus $\mathrm{Var}(X) = E(X^2) - (E(X))^2 = 2 - 1 = 1$.
I.e., the variance and the mean are both equal to 1. Like the mean, the variance is also independent of $n$. Intuitively at least, this means that it is unlikely that there will be more than a small number of fixed points even when the number of items, $n$, is very large.

Variance for sums of independent random variables

One of the most important and useful facts about variance is that if a random variable $X$ is the sum of independent random variables $X = X_1 + \cdots + X_n$, then its variance is the sum of their individual variances: $\mathrm{Var}(X) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)$.
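This additivity can be checked numerically. The sketch below (the setup is my own choice, assuming a sum of independent fair dice) estimates the variance of a sum of $n$ dice and compares it with $n$ times the single-die variance $\frac{35}{12}$ computed in the fair-die example above:

```python
import random

def empirical_variance(samples):
    """Sample variance via the identity Var(X) = E(X^2) - E(X)^2."""
    m = len(samples)
    mean = sum(samples) / m
    return sum(x * x for x in samples) / m - mean * mean

rng = random.Random(1)
n = 10             # sum of 10 independent fair dice
trials = 100_000
sums = [sum(rng.randint(1, 6) for _ in range(n)) for _ in range(trials)]

# By additivity of variance for independent r.v.s, the answer
# should be close to n * 35/12, i.e. about 29.17 for n = 10.
print(empirical_variance(sums))
```

Note that independence is essential here: for dependent summands (such as the fixed-point indicators above) the cross terms $E(X_i X_j) - E(X_i)E(X_j)$ need not vanish, and the variance need not be the sum.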