CS174 Lecture 10
John Canny

Chernoff Bounds

Chernoff bounds are another kind of tail bound. Like Markov and Chebyshev, they bound the total amount of probability of some random variable Y that is in the "tail", i.e. far from the mean. Recall that Markov bounds apply to any non-negative random variable Y and have the form:

    Pr[Y ≥ t] ≤ E[Y] / t

Markov bounds don't depend on any knowledge of the distribution of Y.

Chebyshev bounds use knowledge of the standard deviation to give a tighter bound. The Chebyshev bound for a random variable X with standard deviation σ is:

    Pr[|X − E[X]| ≥ tσ] ≤ 1/t²

But we already saw that some random variables (e.g. the number of balls in a bin) fall off exponentially with distance from the mean, so Markov and Chebyshev are very poor bounds for those kinds of random variables.

The Chernoff bound applies to a class of random variables and does give exponential fall-off of probability with distance from the mean. The critical condition needed for a Chernoff bound is that the random variable be a sum of independent indicator random variables. Since that is true for balls in bins, Chernoff bounds apply there.

Bernoulli Trials and the Binomial Distribution

The first kind of random variable that Chernoff bounds work for is a sum of indicator variables that all have the same distribution (Bernoulli trials). That is, each X_i is a random variable with Pr[X_i = 1] = p and Pr[X_i = 0] = 1 − p, and the X_i are all independent. Tossing a coin is a Bernoulli trial. So is the event that a randomly tossed ball falls into a particular one of n bins (p = 1/n). If

    X = ∑_{i=1}^{n} X_i

is a sum of Bernoulli trials, then X has a binomial distribution. We derived this already for coins and balls into bins. It is:

    Pr[X = k] = C(n, k) p^k (1 − p)^(n−k)

where C(n, k) is the binomial coefficient "n choose k". The Chernoff bounds approximate a generalization of the binomial distribution.

Poisson Trials

There is a slightly more general distribution for which we can derive Chernoff bounds. If instead of a fixed probability we allow every X_i to have a different probability, Pr[X_i = 1] = p_i and Pr[X_i = 0] = 1 − p_i, then these events are called Poisson trials. A Poisson trial by itself is really just a Bernoulli trial; it is only a collection of them with different probabilities that gets the name Poisson trials. But it is very important that the X_i still be independent.

Chernoff Bounds (lower tail)

Let X_1, X_2, ..., X_n be independent Poisson trials with Pr[X_i = 1] = p_i. If X is the sum of the X_i and µ = E[X], then for any δ ∈ (0, 1]:

    Pr[X < (1 − δ)µ] < [ e^(−δ) / (1 − δ)^(1−δ) ]^µ

This bound is quite good, but it can be clumsy to compute. We can simplify it to a weaker bound:

    Pr[X < (1 − δ)µ] < exp(−µδ²/2)

The simplified bound makes it clear that the probability decreases exponentially with the distance δ from the mean.

Example. In n tosses of a fair coin, what is the probability of fewer than m heads, for m < n/2? Let X be the number of heads; then µ = n/2 and δ = 1 − 2m/n is the relative distance of m from µ. The simplified bound gives a probability of fewer than m heads of

    Pr[X < m] < exp(−(n/4)(1 − 2m/n)²)

So if we toss the coin 100 times and ask for fewer than 10 heads, the probability is less than exp(−16) ≈ 1.12 × 10⁻⁷.
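This example is easy to check numerically. Here is a short Python sketch (our illustration, not part of the original notes; the function names are ours) comparing the simplified lower-tail bound with the exact binomial tail for the coin example:

    import math

    def chernoff_lower(n, m):
        # Simplified lower-tail bound for a fair coin:
        # mu = n/2, delta = 1 - 2m/n, bound = exp(-mu * delta**2 / 2).
        mu = n / 2
        delta = 1 - 2 * m / n
        return math.exp(-mu * delta ** 2 / 2)

    def exact_lower_tail(n, m):
        # Exact Pr[X < m] for X ~ Binomial(n, 1/2): sum of C(n, k) / 2^n over k < m.
        return sum(math.comb(n, k) for k in range(m)) / 2 ** n

    print(chernoff_lower(100, 10))    # exp(-16), about 1.12e-07
    print(exact_lower_tail(100, 10))  # about 1.7e-18

The exact tail is far smaller than the bound; Chernoff bounds trade tightness for generality and ease of computation.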
Proof of the Chernoff bound (lower tail). First rewrite the event as an inequality between exponentials, for a parameter t > 0:

    Pr[X < (1 − δ)µ] = Pr[exp(−tX) > exp(−t(1 − δ)µ)]

It is not clear yet why we introduced t, but at least you can verify that the equation above is correct for any positive t (multiplying by −t reverses the inequality, and exp is monotone). We will fix t later to give us the tightest possible bound. Now we can apply the Markov inequality to the right-hand side:

    Pr[X < (1 − δ)µ] < E[exp(−tX)] / exp(−t(1 − δ)µ)

Notice that exp(−tX) is a product of independent random variables exp(−tX_i). This is the heart of the Chernoff bound: the expected value of exp(−tX) is the product of the expected values E[exp(−tX_i)]. So we have

    Pr[X < (1 − δ)µ] < ( ∏_{i=1}^{n} E[exp(−tX_i)] ) / exp(−t(1 − δ)µ)

Now E[exp(−tX_i)] is given by

    E[exp(−tX_i)] = p_i e^(−t) + (1 − p_i) = 1 − p_i(1 − e^(−t))

We would like to express this as the exponential of something, so that we can simplify the product expression it appears in. As we did in an earlier lecture, we use the fact that 1 − x < exp(−x); with x = p_i(1 − e^(−t)) we get

    E[exp(−tX_i)] < exp(p_i(e^(−t) − 1))

and from there we can simplify:

    ∏_{i=1}^{n} E[exp(−tX_i)] < ∏_{i=1}^{n} exp(p_i(e^(−t) − 1)) = exp( ∑_{i=1}^{n} p_i(e^(−t) − 1) ) = exp(µ(e^(−t) − 1))

because µ = ∑ p_i and (e^(−t) − 1) is a constant in the sum. Substituting back into the overall bound gives:

    Pr[X < (1 − δ)µ] < exp(µ(e^(−t) − 1)) / exp(−t(1 − δ)µ) = exp(µ(e^(−t) + t − tδ − 1))

Now it is time to choose t to make the bound as tight as possible, which means minimizing the right-hand side with respect to t. Taking the derivative of (e^(−t) + t − tδ − 1) and setting it to zero gives:

    −e^(−t) + 1 − δ = 0

and solving gives t = ln(1/(1 − δ)). Making that substitution (so that e^(−t) = 1 − δ) gives:

    Pr[X < (1 − δ)µ] < exp(µ((1 − δ) + (1 − δ) ln(1/(1 − δ)) − 1))

and after cancelling the 1's and rewriting the exponential, we get:

    Pr[X < (1 − δ)µ] < [ e^(−δ) / (1 − δ)^(1−δ) ]^µ

which is the bound we were looking for.

To get the simpler form of the bound, we need to get rid of the clumsy term (1 − δ)^(1−δ). First take its log, giving (1 − δ) ln(1 − δ). The Taylor expansion of the natural log is

    ln(1 − δ) = −δ − δ²/2 − δ³/3 − δ⁴/4 − ···

Multiplying by (1 − δ) gives:

    (1 − δ) ln(1 − δ) = −δ + δ²/2 + (all positive terms) > −δ + δ²/2

and exponentiating both sides gives:

    (1 − δ)^(1−δ) > exp(−δ + δ²/2)

We can substitute this inequality into the earlier bound to get:

    Pr[X < (1 − δ)µ] < [ e^(−δ) / (1 − δ)^(1−δ) ]^µ < [ exp(−δ) / exp(−δ + δ²/2) ]^µ = exp(−µδ²/2)

Chernoff Bounds (upper tail)

Let X_1, X_2, ..., X_n be independent Poisson trials with Pr[X_i = 1] = p_i. If X is the sum of the X_i and µ = E[X], then for any δ > 0:

    Pr[X > (1 + δ)µ] < [ e^δ / (1 + δ)^(1+δ) ]^µ

Proof. The proof is almost identical to the proof of the lower-tail bound. Start by introducing a parameter t > 0:

    Pr[X > (1 + δ)µ] = Pr[exp(tX) > exp(t(1 + δ)µ)]

then compute the Markov bound, convert the product of expected values to a sum, and solve for t to make the bound as tight as possible. QED

The upper-tail bound can be simplified. Suppose δ > 2e − 1, so that 1 + δ > 2e. Then

    Pr[X > (1 + δ)µ] < [ e^δ / (1 + δ)^(1+δ) ]^µ < [ e^δ / (2e)^(1+δ) ]^µ < [ e^δ / (2e)^δ ]^µ = 2^(−δµ)

which shows once again an exponential drop-off in probability with δ. By a more complicated argument, which we won't give here, you can show that for δ < 2e − 1 the Chernoff bound can be simplified to exp(−µδ²/4).
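To see the upper-tail bound in action, here is a small Python sketch (again our illustration, not from the notes; the probabilities p_i are arbitrary choices) comparing the full bound, its 2^(−δµ) simplification, and a Monte Carlo estimate of the true tail:

    import math
    import random

    def chernoff_upper(mu, delta):
        # Full upper-tail Chernoff bound: [e^delta / (1+delta)^(1+delta)]^mu.
        return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu

    # 100 independent Poisson trials with unequal probabilities p_i.
    p = [0.10, 0.20, 0.05, 0.30, 0.15] * 20
    mu = sum(p)  # mu = E[X] = 16.0

    # Moderate delta: compare the bound against a Monte Carlo estimate.
    delta = 0.5
    runs = 50_000
    hits = sum(
        sum(random.random() < pi for pi in p) > (1 + delta) * mu
        for _ in range(runs)
    )
    print(hits / runs)                # empirical Pr[X > 1.5 mu], roughly 0.01
    print(chernoff_upper(mu, delta))  # about 0.18, safely above the estimate

    # Large delta (> 2e - 1, about 4.44): the 2^(-delta mu) form applies.
    delta = 5.0
    print(chernoff_upper(mu, delta))  # about 1e-40
    print(2 ** (-delta * mu))         # 2^(-80), about 8e-25

As the last two lines show, the simplified form 2^(−δµ) is a much weaker bound than the full expression; its value is that it makes the exponential decay in δ obvious.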