EECS 70 Discrete Mathematics and Probability Theory
Spring 2014   Anant Sahai   Note 19

Deviations: small and large

Often, our goal is to understand how collections of a large number of independent random variables behave. This is what the laws of large numbers reveal. In general, the idea is that the average of a large number of i.i.d. random variables will approach the expectation. Sometimes that is enough, but usually we also need to understand how fast the average converges to the expectation. We saw this already in the context of polling: we needed to understand how many people to poll in order to get a trustworthy high-precision estimate of what the population is like.

So far, our main tools have been the Markov and Chebyshev inequalities:

$$\text{(Markov)}\qquad \text{If } X \ge 0, \text{ then } \Pr(X \ge a) \le \frac{E[X]}{a}.$$

$$\text{(Chebyshev)}\qquad \text{If the } X_i \text{ are i.i.d., then } \Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - E[X_1]\right| \ge \varepsilon\right) \le \frac{\operatorname{Var}(X_1)}{n\varepsilon^2}.$$

Taking the limit as $n$ goes to infinity, Chebyshev's inequality implies that if the $X_i$ are i.i.d., then

$$\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - E[X_1]\right| \ge \varepsilon\right) \xrightarrow[n\to\infty]{} 0,$$

so the average of the $X_i$ converges to the expectation $E[X_1]$. This is called the weak law of large numbers.

The role of precision is played by the tolerance $\varepsilon$ above. The measure of trustworthiness is how small the probability of exceeding the tolerance gets. For the sake of simplicity, let us use the reciprocal of the probability of being outside our desired precision as our measure of confidence. Looking at Chebyshev's inequality, we are tempted to draw two conclusions about what happens as $n$ gets large:

• If we fix the desired confidence (pick a suitably low probability of our estimate being incorrect), then the precision $\varepsilon$ seems to improve like $\frac{\sqrt{\operatorname{Var}(X)}}{\sqrt{n}}$.

• If we fix the precision $\varepsilon$, then the confidence improves linearly in $n$, with a slope that depends on the precision.

It turns out that the first of these is essentially correct, while the second is often overly pessimistic. The first is covered by the central limit theorem and is often referred to as the study of "small deviations." Later in this lecture we will see that the confidence actually improves exponentially in $n$. That is explored using what are called Chernoff bounds, and the general area is referred to as the study of "large deviations." The difference between "small" and "large" is that the small deviations are shrinking with $n$.

The Central Limit Theorem: studying "small" deviations

The Central Limit Theorem has a long and illustrious history. At an intuitive level, it says that the appropriately scaled [1] sum of a bunch of independent random variables behaves like a Gaussian (a.k.a. normal) random variable.

Theorem 19.1 (Central Limit Theorem): Let $X_i$ be i.i.d. random variables with $E[X_i] = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$, and define

$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n\sigma^2}}.$$

Then for all $z$ we have

$$\text{(CLT)}\qquad \lim_{n\to\infty} \Pr(Z_n \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx.$$

We say that $Z_n \to N(0,1)$: $Z_n$ converges in distribution to the standard Gaussian with mean 0 and variance 1.

Notice that the convergence above is in exactly the same sense (the CDF of $Z_n$ converges) as the convergence of the appropriately scaled geometric random variables to a continuous-valued exponential random variable.

The Berry-Esséen inequality gives a more precise quantification of the speed of this convergence:

$$\text{(Berry-Esséen)}\qquad \left| \Pr(Z_n \le z) - \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx \right| \le \frac{0.77\, E\big[|X_1 - E[X_1]|^3\big]}{(\operatorname{Var}(X_1))^{3/2}\,\sqrt{n}}.$$

The convergence can be much faster than that [2], but if all we know is that the third moment $E[|X_1 - E[X_1]|^3]$ exists, then that is pretty much the best we can have.

[1] Another way of interpreting the issue of appropriate scaling is to say that the sum of a bunch of appropriately small independent random variables behaves like a Gaussian. This is used to justify Gaussian distributions for thermal noise, which results from the combined random motion of many molecules or electrons.

[2] You are strongly encouraged to use a computer to plot for yourself how fast the CDF converges for the following example: $X_i$ uniform. The case of Bernoulli random variables is illustrated at the end of this note. Both of these converge quite rapidly.
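Taking up the suggestion in footnote [2], here is a minimal Python sketch (an added illustration, not part of the original note; the trial count, $z$ grid, and seed are arbitrary choices) that estimates the worst-case CDF gap $\max_z |\Pr(Z_n \le z) - \Phi(z)|$ for uniform $X_i$ by Monte Carlo. Berry-Esséen guarantees the gap shrinks at least as fast as $1/\sqrt{n}$; for the uniform distribution you should see it shrink faster still, as the note points out.

```python
import numpy as np
from math import erf, sqrt

def phi(z):
    """CDF of the standard Gaussian N(0,1), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def max_cdf_gap(n, trials=200_000, seed=0):
    """Monte Carlo estimate of max_z |Pr(Z_n <= z) - Phi(z)| where
    Z_n is the CLT scaling of a sum of n i.i.d. Uniform[0,1] variables."""
    rng = np.random.default_rng(seed)
    mu, var = 0.5, 1.0 / 12.0                    # mean and variance of Uniform[0,1]
    sums = rng.uniform(0.0, 1.0, size=(trials, n)).sum(axis=1)
    z_n = (sums - n * mu) / sqrt(n * var)        # zero mean, unit variance by construction
    grid = np.linspace(-3.0, 3.0, 121)
    emp = (z_n[None, :] <= grid[:, None]).mean(axis=1)   # empirical CDF on the grid
    return float(np.max(np.abs(emp - np.array([phi(z) for z in grid]))))

# The gap decays with n (Monte Carlo noise of about 1/sqrt(trials) sets the floor).
for n in (1, 2, 5, 10, 50):
    gap = max_cdf_gap(n)
    print(f"n = {n:3d}   max CDF gap ~ {gap:.4f}   gap * sqrt(n) ~ {gap * sqrt(n):.4f}")
```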
As a rule of thumb, you should be wary about using the CLT in the context of probabilities that are smaller than $O(1/\sqrt{n})$. The CLT construction of $Z_n$ is done specifically so that its mean is always 0 and its variance is always 1. The fact that the CLT works validates the $\sqrt{\frac{\operatorname{Var}(X)}{n}}$ scaling of precision that even Chebyshev's coarser inequality predicted.

Example 1. (From Bertsekas and Tsitsiklis) We have 100 bags, with weight distributed uniformly in $[5,50]$, so that the mean weight is 27.5 lbs and the variance is 168.75 lbs². Can we upper bound the probability that the total weight is larger than 3000 lbs?

For comparison, let's see what happens if we use Chebyshev's inequality:

$$\Pr\left(\left|\frac{1}{100}\sum_{i=1}^{100} X_i - \frac{2750}{100}\right| \ge 2.50\right) \le \frac{168.75}{100 \cdot (2.5)^2} \approx 0.27.$$

That is a valid bound, but intuitively it overestimates the probability by about a factor of two, since it also includes the case of the average being significantly smaller than the mean.

If we use the CLT instead,

$$\Pr\left(\sum_{i=1}^{100} X_i \ge 3000\right) = 1 - \Pr\left(\sum_{i=1}^{100} X_i < 3000\right) = 1 - \Pr\left(\frac{\sum_{i=1}^{100} X_i - 2750}{\sqrt{100 \cdot 168.75}} < \frac{250}{\sqrt{100 \cdot 168.75}}\right) \approx 1 - \Pr\left(Z < \frac{250}{\sqrt{100 \cdot 168.75}}\right) \approx 0.027,$$

where $Z$ is a random variable distributed as $N(0,1)$. So the CLT predicts that the probability of a total weight over 3000 lbs is about 3%, which is much lower than the 27% bound (or 13% if we divide by two) that we got from Chebyshev's inequality! (The first sketch at the end of this note reproduces both numbers.)

It turns out that the normal approximation here is very good, even though the CLT is not a bound and 3% is not an appropriately low number according to the Berry-Esséen inequality. This is because the CLT is very good for sums of bounded random variables like the uniform.

Remark. In order to use the CLT to get easily calculated bounds, the following approximations will often prove useful: for any $z > 0$,

$$\left(1 - \frac{1}{z^2}\right) \frac{e^{-z^2/2}}{z\sqrt{2\pi}} \;\le\; \int_{z}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx \;\le\; \frac{e^{-z^2/2}}{z\sqrt{2\pi}}.$$

This way, you can approximate the tail of a Gaussian even if you don't have a calculator capable of numeric integration handy. It also reveals how the tail scales in closed form. We will use this later. (The second sketch at the end of this note checks this sandwich numerically.)

Example 2. Imagine a polling situation, in which we assume that people have a probability $p$ of supporting Obama. Suppose we poll $n$ people, and we want
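The arithmetic in Example 1 is easy to mis-key, so here is a short Python sketch that reproduces both figures: the Chebyshev bound of about 0.27 and the CLT estimate of about 0.027. It is an added illustration, not part of the original note; the only ingredient is the standard normal CDF, computed with the error function from Python's standard library.

```python
from math import erf, sqrt

n, mean, var = 100, 27.5, 168.75    # bags; per-bag mean (lbs) and variance (lbs^2)
threshold = 3000.0                  # total weight of interest (lbs)

# Chebyshev: Pr(|average - mean| >= eps) <= var / (n * eps^2)
eps = threshold / n - mean          # 2.5 lbs of slack in the average
chebyshev_bound = var / (n * eps**2)

# CLT: Pr(sum >= threshold) ~ 1 - Phi(z), with z = (threshold - n*mean) / sqrt(n*var)
z = (threshold - n * mean) / sqrt(n * var)
clt_estimate = 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(f"Chebyshev bound ~ {chebyshev_bound:.3f}")   # ~ 0.270
print(f"CLT estimate    ~ {clt_estimate:.4f}")      # ~ 0.027 (z ~ 1.92)
```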

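Similarly, a quick check of the tail-bound sandwich from the Remark (again an added illustration; the test points are arbitrary, and the exact tail is computed with the complementary error function rather than numeric integration):

```python
from math import erfc, exp, pi, sqrt

def gaussian_tail(z):
    """Exact tail Pr(N(0,1) >= z), via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2.0))

def tail_sandwich(z):
    """Closed-form bounds from the Remark, valid for z > 0:
    (1 - 1/z^2) e^{-z^2/2} / (z sqrt(2 pi)) <= tail <= e^{-z^2/2} / (z sqrt(2 pi))."""
    upper = exp(-z * z / 2.0) / (z * sqrt(2.0 * pi))
    lower = (1.0 - 1.0 / (z * z)) * upper
    return lower, upper

for z in (1.0, 1.9245, 3.0):        # 1.9245 ~ 250 / sqrt(100 * 168.75) from Example 1
    lo, hi = tail_sandwich(z)
    print(f"z = {z:.4f}   lower = {lo:.5f}   exact = {gaussian_tail(z):.5f}   upper = {hi:.5f}")
```

Note how quickly the bounds tighten as $z$ grows: at $z = 3$ they already pin the tail down to within about 10%.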
