Berkeley STAT 157 - Coincidences, near misses and one-in-a-million chances

Chapter 18

Coincidences, near misses and one-in-a-million chances

18.1 The birthday problem and its relatives

The birthday problem (W) – often called the birthday paradox – is described in almost every textbook and popular science account of probability. My students know the conclusion:

    with 23 people in a room, there is roughly a 50% chance that some two will have the same birthday.

Rather than repeat the usual “exact” calculation I will show how to do some back-of-an-envelope calculations, in section 18.2 below. Starting from this result there are many directions we could go, so let me point out five of these.

It really is a good example of a quantitative prediction that one could bet money on. In class, and in a popular talk, I show the active roster of a baseball team¹, which conveniently has 25 players and their birth dates. The predicted chance of a birthday coincidence is about 57%. With 30 MLB teams one expects around 17 teams to have the coincidence; and one can readily check this prediction in class in a minute or so (print out the 30 pages and distribute among students).

It’s fun to ask students to suggest circumstances where the prediction might not be accurate. That is, if you actually see a group of strangers in a room and know roughly why they are there – people rarely go into rooms “at random” – what would make you unsure of the validity of the standard calculation? Two common suggestions are
(i) if you see identical twins
(ii) that the calculation in general may be inaccurate because of non-uniformity of population birth dates over the year.
Point (i) is clear and point (ii) is discussed in the next section (plausible levels of non-uniformity turn out to have negligible effect). Other circumstances involve very creative imagination or arcane knowledge (a party of Canadian professional ice hockey players²). As mentioned above, it is a rare example of a mathematically simple yet reliable model!

It illustrates the theme “coincidences are more likely than you think”. This is an important theme as regards people’s intuitive perception of chance. But the birthday problem and other “small universe” settings, where one can specify in advance all the possible coincidences and their probabilities, are very remote from our notion of weird coincidences in everyday life. A typical blurb for popular science books is “. . . explains how coincidences are not surprising” while the author merely does the birthday problem. This is surely not convincing to non-mathematicians. I will repeat this critique more forcefully in section 18.6. My own (unsuccessful) attempt to do better is recounted in section 18.3.

One can invent and solve a huge number of analogous math probability problems and I show a glimpse of such problems in section 18.2. These can be engaging as recreational math and for illustrating mathematical techniques – but I find it almost impossible to produce novel interesting data to complement such theory.

There is an opposite problem with sports data on “hot hands” for individual players, or winning/losing streaks for teams. Here there is plenty of data, but coming up with an accurate chance model is difficult; saying that we see streaks longer than predicted in an oversimplified chance model is not telling us anything concrete about the world of sports.

¹ e.g. atlanta.braves.mlb.com/team/roster_active.jsp?c_id=atl; each MLB team has a page in the same format.

² who have substantial non-uniformity of birthdays. A 1985 paper Birthdate and success in minor hockey by Roger Barnsley and A. H. Thompson and subsequent work, popularized in Gladwell’s Outliers, attributes this to the annual age cutoff for starting minor hockey.
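The figures quoted in section 18.1 (roughly 50% for 23 people, about 57% for a 25-player roster, around 17 of 30 teams) can be checked with the usual exact calculation. The following is a minimal sketch, assuming 365 equally likely birthdays and independence between people; the function name `p_shared_birthday` is illustrative and does not come from the text.

```python
def p_shared_birthday(k: int, n_days: int = 365) -> float:
    """Exact chance that at least two of k people share a birthday,
    assuming each of n_days days is equally likely (an assumption)."""
    p_all_distinct = 1.0
    for i in range(k):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (n_days - i) / n_days
    return 1.0 - p_all_distinct


# A 25-player roster, as in the baseball example.
p_roster = p_shared_birthday(25)

# Expected number of teams (out of 30) showing a coincidence.
expected_teams = 30 * p_roster

# Smallest group size at which the chance reaches 1/2.
smallest_k = next(k for k in range(1, 366) if p_shared_birthday(k) >= 0.5)

print(f"P(coincidence, k=25) = {p_roster:.3f}")
print(f"expected teams out of 30 = {expected_teams:.1f}")
print(f"smallest k with chance >= 1/2: {smallest_k}")
```

The expected count is just 30 times the single-roster probability, so it does not require the rosters to be independent of one another.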
18.2 Using the Poisson approximation in simple models

In this section I want to make the point

    mathematicians know how to do calculations in “small universe” settings, where one can specify in advance all the possible coincidences and their probabilities.

In fact while mathematicians have put great ingenuity into finding exact formulas, it is simpler and more informative to use approximate ones, based on the informal Poisson approximation. If events $A_1, A_2, \ldots$ are roughly independent, and each has small probability, then the random number of events that occur has mean (exactly) $\mu = \sum_i P(A_i)$ and distribution (approximately) Poisson($\mu$), so

$$P(\text{none of the events occur}) \approx \exp\Big(-\sum_i P(A_i)\Big). \qquad (18.1)$$

Consider the birthday problem with $k$ people and non-uniform distribution

$$p_i = P(\text{born on day } i \text{ of the year}).$$

For each pair of people, the chance they have the same birthday is $\sum_i p_i^2$, and there are $\binom{k}{2}$ pairs, so from (18.1)

$$P(\text{no birthday coincidence}) \approx \exp\Big(-\binom{k}{2} \sum_i p_i^2\Big).$$

Write median-$k$ for the value of $k$ that makes this probability close to 1/2 (and therefore makes the chance there is a coincidence close to 1/2). Setting the exponent equal to $\log 2$ and using $\sqrt{2 \log 2} \approx 1.18$, we calculate

$$\text{median-}k \approx \tfrac{1}{2} + 1.18 \Big/ \sqrt{\sum_i p_i^2}.$$

For the uniform distribution over $N$ categories this becomes

$$\text{median-}k \approx \tfrac{1}{2} + 1.18 \sqrt{N}$$

which for $N = 365$ gives the familiar answer 23.

To illustrate robustness to non-uniformity, imagine hypothetically that half the categories were twice as likely as the other half, so $p_i = \frac{4}{3N}$ or $\frac{2}{3N}$. The approximation becomes $\tfrac{1}{2} + 1.12 \sqrt{N}$, which for $N = 365$ becomes 22. The smallness of the change might be considered another “paradox”, and is in fact atypical of combinatorial problems in general. In the coupon collector’s problem, for instance, the change would be much more noticeable.

Let me quickly mention two variants. If we ask for the coincidence of three people having the same birthday, then we can repeat the argument above to get

$$P(\text{no three-person birthday coincidence}) \approx \exp\Big(-\binom{k}{3} \sum_i p_i^3\Big)$$

and then in the uniform case,

$$\text{median-}k \approx 1 + 1.61 N^{2/3}$$

which for $N = 365$ gives the less familiar answer 83.

If instead of calendar days we have $k$ events at independent uniform times during a year, and regard a coincidence as seeing two of these events within 24 hours (not necessarily the same calendar day), then the chance that a particular two events are within 24 hours is $2/N$ for $N = 365$, and we can repeat the calculation for the birthday problem to get

$$\text{median-}k \approx \tfrac{1}{2} + 1.18 \sqrt{N/2} \approx 16.$$

Finding real-world instances where such theoretical predictions are applicable seems quite hard, in th…
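The four numerical answers in section 18.2 (23, 22, 83 and 16) all follow from setting the exponent in (18.1) equal to log 2. The sketch below reproduces them under the same approximations used in the text, namely k(k-1) ≈ (k - 1/2)² for pairs and k(k-1)(k-2) ≈ (k - 1)³ for triples. The function names are illustrative only, and the value 10/(9N) for the skewed case is the sum of p_i² when exactly half the categories have probability 4/(3N) and half have 2/(3N).

```python
import math

LOG2 = math.log(2)


def p_no_pair_coincidence(k: int, sum_p_sq: float) -> float:
    """Poisson approximation (18.1): exp(-C(k,2) * sum_i p_i^2)."""
    return math.exp(-math.comb(k, 2) * sum_p_sq)


def median_k_pairs(pair_match_prob: float) -> float:
    """Solve C(k,2) * pair_match_prob = log 2, using k(k-1) ~ (k - 1/2)^2."""
    return 0.5 + math.sqrt(2 * LOG2 / pair_match_prob)


def median_k_triples_uniform(n: int) -> float:
    """Solve C(k,3) / n^2 = log 2, using k(k-1)(k-2) ~ (k - 1)^3."""
    return 1.0 + (6 * LOG2) ** (1 / 3) * n ** (2 / 3)


N = 365

k_uniform = median_k_pairs(1 / N)          # uniform birthdays: sum p_i^2 = 1/N
k_skewed = median_k_pairs(10 / (9 * N))    # half the days twice as likely
k_triple = median_k_triples_uniform(N)     # three people sharing a birthday
k_24h = median_k_pairs(2 / N)              # two events within 24 hours

print(f"uniform pairs:   {k_uniform:.2f}")
print(f"skewed pairs:    {k_skewed:.2f}")
print(f"uniform triples: {k_triple:.2f}")
print(f"within 24 hours: {k_24h:.2f}")
print(f"P(no coincidence, k=23, uniform) = {p_no_pair_coincidence(23, 1 / N):.4f}")
```

Passing the pairwise match probability as a single argument keeps the uniform, skewed and 24-hour cases in one function, which mirrors how the text reuses the same calculation.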

