CS 70 Discrete Mathematics for CS, Spring 2008, David Wagner, Note 20

Some Important Distributions

Question: A biased coin with Heads probability p is tossed repeatedly until the first Head appears. What is the expected number of tosses?

As always, our first step in answering the question must be to define the sample space $\Omega$. A moment's thought tells us that
$$\Omega = \{H,\, TH,\, TTH,\, TTTH,\, \ldots\},$$
i.e., $\Omega$ consists of all sequences over the alphabet $\{H, T\}$ that end with $H$ and contain no other $H$'s. This is our first example of an infinite sample space (though it is still discrete).

What is the probability of a sample point, say $\omega = TTH$? Since successive coin tosses are independent (this is implicit in the statement of the problem), we have
$$\Pr[TTH] = (1-p) \times (1-p) \times p = (1-p)^2 p.$$
And generally, for any sequence $\omega \in \Omega$ of length $i$, we have $\Pr[\omega] = (1-p)^{i-1} p$. To be sure everything is consistent, we should check that the probabilities of all the sample points add up to 1. Since there is exactly one sequence of each length $i \ge 1$ in $\Omega$, we have
$$\sum_{\omega \in \Omega} \Pr[\omega] = \sum_{i=1}^{\infty} (1-p)^{i-1} p = p \sum_{i=0}^{\infty} (1-p)^i = p \times \frac{1}{1-(1-p)} = 1,$$
as expected. [In the second-to-last step here, we used the formula for summing a geometric series.]

Now let the random variable X denote the number of tosses in our sequence (i.e., $X(\omega)$ is the length of $\omega$). Our goal is to compute E[X]. Despite the fact that X counts something, there's no obvious way to write it as a sum of simple r.v.'s as we did in many examples in the last lecture note. (Try it!) Instead, let's just dive in and try a direct computation. Note that the distribution of X is quite simple:
$$\Pr[X = i] = (1-p)^{i-1} p \qquad \text{for } i = 1, 2, 3, \ldots$$
So from the definition of expectation we have
$$E[X] = (1 \times p) + (2 \times (1-p)p) + (3 \times (1-p)^2 p) + \cdots = p \sum_{i=1}^{\infty} i (1-p)^{i-1}.$$
This series is a blend of an arithmetic series (the $i$ part) and a geometric series (the $(1-p)^{i-1}$ part). There are several ways to sum it. Here is one way, using an auxiliary trick (given in the following Theorem) that is often very useful.
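Before summing the series analytically, both checks above (the probabilities summing to 1, and the series for E[X]) can be verified numerically. Here is a minimal Python sketch (the function names are my own), truncating each infinite sum once the remaining mass is negligible:

```python
def geometric_pmf(p, i):
    """Pr[X = i] = (1 - p)^(i - 1) * p: i - 1 Tails followed by a Head."""
    return (1 - p) ** (i - 1) * p

def total_probability(p, terms=10_000):
    """Partial sum of Pr[X = i] over i = 1, ..., terms; should approach 1."""
    return sum(geometric_pmf(p, i) for i in range(1, terms + 1))

def expectation_series(p, terms=10_000):
    """Partial sum of the series E[X] = sum_i i * (1 - p)^(i - 1) * p."""
    return sum(i * geometric_pmf(p, i) for i in range(1, terms + 1))
```

For example, `total_probability(0.3)` comes out equal to 1 up to floating-point error, matching the geometric-series calculation above.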
[Ask your TA about other ways.]

Theorem 20.1: Let X be a random variable that takes on only non-negative integer values. Then
$$E[X] = \sum_{i=1}^{\infty} \Pr[X \ge i].$$

Proof: For notational convenience, let's write $p_i = \Pr[X = i]$, for $i = 0, 1, 2, \ldots$. From the definition of expectation, we have
$$\begin{aligned}
E[X] &= (0 \times p_0) + (1 \times p_1) + (2 \times p_2) + (3 \times p_3) + (4 \times p_4) + \cdots \\
     &= p_1 + (p_2 + p_2) + (p_3 + p_3 + p_3) + (p_4 + p_4 + p_4 + p_4) + \cdots \\
     &= (p_1 + p_2 + p_3 + p_4 + \cdots) + (p_2 + p_3 + p_4 + \cdots) + (p_3 + p_4 + \cdots) + (p_4 + \cdots) + \cdots \\
     &= \Pr[X \ge 1] + \Pr[X \ge 2] + \Pr[X \ge 3] + \Pr[X \ge 4] + \cdots.
\end{aligned}$$
In the third line, we have regrouped the terms into convenient infinite sums. You should check that you understand how the fourth line follows from the third. Our "$\ldots$" notation here is a little informal, but the meaning should be clear. Here is a more rigorous, but perhaps less clear proof:
$$\sum_{i=1}^{\infty} \Pr[X \ge i] = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr[X = j] = \sum_{1 \le i \le j < \infty} \Pr[X = j] = \sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr[X = j] = \sum_{j=1}^{\infty} j \times \Pr[X = j] = E[X]. \qquad \Box$$

An alternative statement of Theorem 20.1 is
$$E[X] = \sum_{i=0}^{\infty} \Pr[X > i].$$
This may look familiar from your last homework.

Using Theorem 20.1, it is easy to compute E[X]. The key observation is that, for our coin-tossing r.v. X,
$$\Pr[X \ge i] = (1-p)^{i-1}. \tag{1}$$
Why is this? Well, the event "$X \ge i$" means that at least $i$ tosses are required. This is exactly equivalent to saying that the first $i-1$ tosses are all Tails. And the probability of this event is precisely $(1-p)^{i-1}$. Now, plugging equation (1) into Theorem 20.1, we get
$$E[X] = \sum_{i=1}^{\infty} \Pr[X \ge i] = \sum_{i=1}^{\infty} (1-p)^{i-1} = 1 + (1-p) + (1-p)^2 + (1-p)^3 + \cdots = \frac{1}{1-(1-p)} = \frac{1}{p}.$$
Here we have used the sum of the geometric series, namely, $1 + x + x^2 + x^3 + \cdots = 1/(1-x)$ if $-1 < x < 1$.

So, the expected number of tosses of a biased coin until the first Head appears is $\frac{1}{p}$.
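Both the tail-sum identity of Theorem 20.1 and the answer $E[X] = 1/p$ can also be checked empirically. Here is a small simulation sketch (the helper names are my own):

```python
import random

def tosses_until_first_head(p, rng):
    """Toss a p-biased coin until Heads appears; return the number of tosses."""
    tosses = 1
    while rng.random() >= p:  # Tails happens with probability 1 - p
        tosses += 1
    return tosses

def empirical_mean_tosses(p, trials=100_000, seed=0):
    """Average number of tosses over many independent repetitions."""
    rng = random.Random(seed)
    return sum(tosses_until_first_head(p, rng) for _ in range(trials)) / trials

def tail_sum(p, terms=10_000):
    """sum_i Pr[X >= i], using Pr[X >= i] = (1 - p)^(i - 1) from equation (1)."""
    return sum((1 - p) ** (i - 1) for i in range(1, terms + 1))
```

`tail_sum(p)` agrees with $1/p$ to floating-point precision, and for, say, $p = 0.25$ the empirical mean settles near 4, as the calculation above predicts.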
For a fair coin, the expected number of tosses is 2.

The geometric distribution

The distribution of the random variable X that counts the number of coin tosses until the first Head appears has a special name: it is called the geometric distribution with parameter p (where p is the probability that the coin comes up Heads on each toss).

Definition 20.1 (geometric distribution): A random variable X for which
$$\Pr[X = i] = (1-p)^{i-1} p \qquad \text{for } i = 1, 2, 3, \ldots$$
is said to have the geometric distribution with parameter p.

If we plot the distribution of X (i.e., the values $\Pr[X = i]$ against $i$) we get a curve that decreases monotonically by a factor of $1-p$ at each step. For posterity, let's record two important facts we've learned about the geometric distribution:

Theorem 20.2: For a random variable X having the geometric distribution with parameter p,
1. $E[X] = \frac{1}{p}$; and
2. $\Pr[X \ge i] = (1-p)^{i-1}$ for $i = 1, 2, \ldots$.

The geometric distribution occurs very often in applications because frequently we are interested in how long we have to wait before a certain event happens: how many runs before the system fails, how many shots before one is on target, how many poll samples before we find a Democrat, etc. The next section discusses a rather more involved application, which is important in its own right.

The Coupon Collector's Problem

Question: We are trying to collect a set of n different baseball cards. We get the cards by buying boxes of cereal: each box contains exactly one card, and it is equally likely to be any of the n cards. How many boxes do we need to buy until we have collected at least one copy of every card?

The sample space here is similar in flavor to that for our previous coin-tossing example, though rather more complicated. It consists of all sequences $\omega$ over the alphabet $\{1, 2, \ldots, n\}$, such that
1. $\omega$ contains each symbol $1, 2, \ldots, n$ at least once; and
2. the final symbol in $\omega$ occurs only once.

[Check that you understand this!]
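The two conditions can be made concrete with a brute-force enumeration, feasible only for very small n. This is a sketch (the helper names are my own):

```python
from itertools import product

def is_sample_point(seq, n):
    """Check the two conditions defining the coupon collector's sample space:
    every symbol 1..n appears, and the final symbol appears exactly once."""
    if not seq:
        return False
    return set(seq) == set(range(1, n + 1)) and seq.count(seq[-1]) == 1

def count_sample_points(n, length):
    """Count the valid sequences of a given length by exhaustive enumeration."""
    return sum(1 for seq in product(range(1, n + 1), repeat=length)
               if is_sample_point(seq, n))
```

For n = 2 there are exactly 2 valid sequences of every length $i \ge 2$ (a run of one symbol followed by the other), and the corresponding probabilities $2/2^i$ indeed sum toward 1.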
For any suchω, the probability is just Pr[ω] =1ni, where i is the lengthofω(why?). However, it is very hard to figure out how many sample pointsωare of length i (try it for thecase n = 3). So we will have a hard time figuring out the distribution of the random variable X, which is thelength of the sequence (i.e., the number of boxes bought).Fortunately, we can compute the expectation E[X] very easily, using (guess what?) linearity of expectation,plus the fact we have just learned about the expectation of the geometric distribution. As usual, we wouldlike to writeX = X1+ X2+ ... +