New version page

# WUSTL ESE 520 - proof of CLT

Pages: 13
Documents in this Course

2 pages

2 pages

2 pages

## This preview shows page 1-2-3-4 out of 13 pages.

View Full Document

End of preview. Want to read all 13 pages?

View Full Document
Unformatted text preview:

Two Proofs of the Central Limit TheoremYuval FilmusJanuary/February 2010In this lecture, we describe two proofs of a central theorem of mathemat-ics, namely the central limit theorem. One will be using cumulants, and theother using moments. Actually, our proofs won’t be entirely formal, but wewill explain how to make them formal.1 Central Limit TheoremWhat it the central limit theorem? The theorem says that under rather gen-eral circumstances, if you sum independent random variables and normalizethem accordingly, then at the limit (when you sum lots of them) you’ll get anormal distribution.For reference, here is the density of the normal distribution N(µ, σ2) withmean µ and variance σ2:1√2πσ2e−(x−µ)22σ2.We now state a very weak form of the central limit theorem. Suppose thatXiare independent, identically distributed random variables with zero meanand variance σ2. ThenX1+ ··· + Xn√n−→ N(0, σ2).Note that if the variables do not have zero mean, we can always normalizethem by subtracting the expectation from them.The meaning of Yn−→ Y is as follows: for each interval [a, b],Pr[a ≤ Yn≤ b] −→ Pr[a ≤ Y ≤ b].1This mode of convergence is called convergence in distribution.The exact form of convergence is not just a technical nicety — the normal-ized sums do not converge uniformly to a normal distribution. This meansthat the tails of the distribution converge more slowly than its center. Es-timates for the speed of convergence are given by the Berry-Ess´een theoremand Chernoff’s bound.The central limit theorem is true under wider conditions. We will beable to prove it for independent variables with bounded moments, and evenmore general versions are available. For example, limited dependency canbe tolerated (we will give a number-theoretic example). Moreover, randomvariables not having moments (i.e. E[Xn] doesn’t converge for all n) aresometimes well-behaved enough to induce convergence. Other problematicalrandom variable will converge, under a different normalization, to an α-stabledistribution (look it up!).2 Normal Distribution and Meaning of CLTThe normal distribution satisfies a nice convolution identity:X1∼ N(µ1, σ21), X2∼ N(µ2, σ22) =⇒ X1+ X2∼ N(µ1+ µ2, σ21+ σ22).Moreover, we can scale a normally distributed variable:X ∼ N(µ, σ2) =⇒ cX ∼ N(cµ, c2σ2).Even more exciting, we can recover the normal distribution from these prop-erties. The equation N(0, 1)+N(0, 1) =√2N(0, 1) in essence defines N(0, 1)(up to scaling), from which the entire ensemble can be recovered.These properties point at why we should expect the normalized sums inthe central limit theorem to converge to a normal variable. Indeed, supposethe convergence is to a hypothetical distribution D. From the equationsX1+ ··· + Xn√n−→ DX1+ ··· + X2n√2n−→ Dwe would expect D + D =√2D, so D must be normal. Therefore the realcontent of the central limit theorem is that convergence does take place. The2exact form of the basin of attraction is deducible beforehand — the onlyquestion is whether summing up lots of independent variables and normal-izing them accordingly would get us closer and closer to the only possiblelimit, a normal distribution with the limiting mean and variance.3 Moment Generating FunctionThe main tool we are going to use is the so-called moment generating func-tion, defined as follows for a random variable X:MX(t) = E[etX].Expanding the Taylor series of etX, we discover the reason it’s called themoment generating function:MX(t) =∞Xn=0E[Xn]n!tn.The moment generating function is thus just the exponential generating func-tion for the moments of X. In particular,M(n)X(0) = E[Xn].So far we’ve assumed that the moment generating function exists, i.e. theimplied integral E[etX] actually converges for some t 6= 0. Later on (onthe section on characteristic functions) we will discuss what can be doneotherwise. For now, we will simply assume that the random variable X hasmoments of all orders, and it follows that MX(t) is well-defined (the diligentreader will prove this using monotonicity of the p-norm k · kp).The moment generating function satisfies the following very useful iden-tities, concerning convolution (sum of independent variables) and scaling(multiplication by a constant):MX+Y(t) = E[et(X+Y )] = E[etXetY] = MX(t)MY(t),McX(t) = E[etcX] = MX(ct).For the first identity, X and Y must be independent of course.34 Example: Bernoulli and PoissonA Bernoulli random variable Ber(p) is 1 with probability p and 0 otherwise.A binomial random variable Bin(n, p) is the sum of n independent Ber(p)variables.Let us find the moment generating functions of Ber(p) and Bin(n, p). Fora Bernoulli random variable, it is very simple:MBer(p)= (1 − p) + pet= 1 + (et− 1)p.A binomial random variable is just the sum of many Bernoulli variables, andsoMBin(n,p)=1 + (et− 1)pn.Now suppose p = λ/n, and consider the asymptotic behavior of Bin(n, p):MBin(n,λ/n)=1 +(et− 1)λnn−→ eλ(et−1).As the reader might know, Bin(n, p) −→ Poisson(λ), where the Poissonrandom variable is defined byPr[Poisson(λ) = n] = e−λλnn!.Let us calculate the moment generating function of Poisson(λ):MPoisson(λ)(t) = e−λ∞Xn=0λnetnn!= e−λeλet= eλ(et−1).This is hardly surprising. In the section about characteristic functions weshow how to transform this calculation into a bona fide proof (we commentthat this result is also easy to prove directly using Stirling’s formula).5 CumulantsWe are now almost ready to present our first proof. We first define thecumulant generating function of a random variable X:KX(t) = log MX(t).4This somewhat strange definition makes more sense once we notice thatMX(t) = 1 + O(t), so that it makes sense to take its logarithm. In fact,using the Taylor series of log(1 + x),log(1 + t) = t −t22+ ···we can expand KX(t) as a power series, which begins as follows:KX(t) =E[X]t +E[X2]2t2+ ···−(E[X]t + ···)22+ ···= E[X]t +E[X2] − E[X]22t2+ ···Hence the first two coefficients of KX(t) (as an exponential generating func-tion, that is disregarding the 1/n! factors) are the expectation and the vari-ance. We call these coefficients cumulants. Formally, we can define the nthcumulant Kn[X] as follows:Kn[X] = K(n)X(0).In particular, we have just shown thatK0[X] = 0, K1[X] = E[X], K2[X] = V[X].In general, using the Taylor series of log(1 + x), we can express Kn[X] asa polynomial in the

View Full Document
Unlocking...