Duke STA 101 - CentralLimitTheorem - D906221

School name Duke University

Course Sta 101- Data Analy/stat Infer


13.0 Central Limit Theorem

• Discuss Midterm/Answer Questions
• Box Models
• Expected Value and Standard Error
• Central Limit Theorem

13.1 Box Models

A Box Model describes a process in terms of making repeated draws, with replacement, from a box containing numbers.

Since draws are made with replacement, the outcomes in a series of draws are independent. The value on the first draw does not affect the value on the second.

Box models describe: repeated rolls of a die, repeated tosses of a coin (fair or unfair), drawing a random sample of people and getting their heights or whom they plan to vote for.

For a box model, the expected value is the average of the numbers in the box.

For categorical data, such as H or T in coin-tossing, or voting for Bush or Kerry, one averages zeroes and ones. When averaging zeroes and ones, the result is just a proportion: for example, the total number of people voting for Kerry divided by the total number of people whom you ask.

The Law of Averages says that if one makes many draws from the box and averages the results, that average will converge to the expected value of the box.

But the Law of Averages says nothing about the outcome on the next draw.

13.2 Expected Value and Standard Error

Suppose that a box contains B numbers, \(X_1, \ldots, X_B\). Then the expected value of the box is

\[ EV = \bar{X} = \frac{1}{B} \sum_{i=1}^{B} X_i \]

and the standard deviation of the box is

\[ sd = \sqrt{\frac{1}{B}\left[(X_1 - EV)^2 + \cdots + (X_B - EV)^2\right]} = \sqrt{\left(\frac{1}{B} \sum_{i=1}^{B} X_i^2\right) - EV^2}. \]

This is just the same as calculating the mean and standard deviation for a list of numbers.

The standard error for the average of n draws from a box (with replacement) is:

\[ se = \frac{sd}{\sqrt{n}}. \]

The standard error is the likely size of the difference between the average of n draws from the box and the expected value of the box.

Note that as n → ∞, the standard error se goes to zero. This is a formal statement of the Law of Averages.
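
The definitions above can be tried out numerically. The following is a minimal Python sketch, not part of the original notes: the function names (`box_ev`, `box_sd`, `se_average`, `simulate_average`) and the choice of a fair six-sided die as the box are assumptions made for illustration. It computes the expected value and sd of a box, then compares the standard error sd/√n with the gap between one simulated average and EV for several values of n.

```python
import math
import random


def box_ev(box):
    """Expected value of the box: the average of the numbers in it."""
    return sum(box) / len(box)


def box_sd(box):
    """Standard deviation of the box, dividing by the number of tickets B."""
    ev = box_ev(box)
    return math.sqrt(sum((x - ev) ** 2 for x in box) / len(box))


def se_average(box, n):
    """Standard error for the average of n draws made with replacement."""
    return box_sd(box) / math.sqrt(n)


def simulate_average(box, n, rng):
    """Average of n independent draws (with replacement) from the box."""
    return sum(rng.choice(box) for _ in range(n)) / n


# Hypothetical example box: one ticket per face of a fair die.
die_box = [1, 2, 3, 4, 5, 6]
rng = random.Random(0)  # fixed seed so that repeated runs match

for n in (10, 100, 1000, 10000):
    observed_gap = abs(simulate_average(die_box, n, rng) - box_ev(die_box))
    print(f"n={n:6d}  se={se_average(die_box, n):.4f}  "
          f"|average - EV|={observed_gap:.4f}")
```

A single simulated gap can be larger or smaller than the standard error; the standard error describes its likely size, not a bound.
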

Because the standard error shrinks as n grows, the sample average is a good estimate of the average in the box, and the accuracy of the estimate improves as you take more and more draws from the box.

The standard error for the sum of n draws from a box (with replacement) is:

\[ se = \sqrt{n} \times sd. \]

Analogously, this standard error is the likely size of the difference between the sum of n draws from the box and n times the expected value of the box.

Note that as n → ∞, this standard error does not go to zero. This means that as the number of draws increases, the likely difference between the sum of the draws and n × EV gets larger, rather than smaller.

This concern about the sum, rather than the average, arises in contexts such as investment or gambling, where the total return from multiple trials is important, not the average return.

13.3 The Central Limit Theorem

The Central Limit Theorem is one of the high-water marks of mathematical thinking. It was worked upon by James Bernoulli, Abraham de Moivre, and Alan Turing. Over the centuries, the theory improved from special cases to a very general rule.

Essentially, the Central Limit Theorem allows one to describe how accurately the Law of Averages works. Most people have a good intuitive understanding of the Law of Averages, but in many cases it is important to determine whether a particular size of deviation between the sample mean and the (usually unknown) expected value is probable or improbable.
That is, what is the chance that the sample average is more than d away from the true EV?

Formally, the Central Limit Theorem for averages says:

\[ \frac{\bar{X} - EV}{sd/\sqrt{n}} \;\dot{\sim}\; N(0, 1) \]

where \(\bar{X}\) is the average of n draws, EV is the expected value of the box, and sd is the standard deviation of the box.

This means that the left-hand side is a random number that is approximately normal with mean zero and standard deviation one. The approximation gets better as n gets larger.

Modifications of this formula hold for many other situations, e.g., when there is a little dependence, or when the box changes from draw to draw.

A version of the Central Limit Theorem holds for sums:

\[ \frac{n\bar{X} - n\,EV}{\sqrt{n}\; sd} \;\dot{\sim}\; N(0, 1). \]

Note that \(n\bar{X}\) is just the sum of the draws from the box. (This should be obvious to everyone.)

This formula is useful when calculating the chance of winning a given amount of money when gambling, or getting more than a specific score on a test.

With these two central limit formulas, one can answer all sorts of practical questions.

Problem 1: You want to estimate the average income of people in Durham. You take a random sample of 100 households, and find that \(\bar{X}\) is $42,000 and the sample sd is $5,000. What is the (approximate) probability that the true mean household income in Durham is more than $42,500?

• What is the box model for this problem?
• What is the expected value?
• What is the standard deviation?

Note that in order to solve this, we have to assume that the standard deviation of the sample is equal to the sd of the box. In practice, there is a very easy way to handle this, but we will not talk about that until later in the course.

\begin{align*}
P[EV > 42{,}500] &= P[-EV < -42{,}500] \\
&= P[\bar{X} - EV < \bar{X} - 42{,}500] \\
&= P\left[\frac{\bar{X} - EV}{sd/\sqrt{n}} < \frac{\bar{X} - 42{,}500}{sd/\sqrt{n}}\right] \\
&\doteq P\left[Z < \frac{\bar{X} - 42{,}500}{sd/\sqrt{n}}\right] \\
&= P[Z < (42{,}000 - 42{,}500)/(5000/10)] \\
&= P[Z < -1]
\end{align*}

From the standard normal table, we know this has chance (1/2)(100 - 68.27)% = 15.865%, so the probability of the estimate being too low by more than $500 is just .15865.

Problem 2: You are playing Red and Black in roulette.
(A roulette wheel has 38 pockets; 18 are red, 18 are black, and 2 are green. The house takes all the money on green.) You pick either red or black; if the ball lands in the color you pick, you win a dollar. Otherwise you lose a dollar.

Suppose you make 100 plays. What is the chance that you lose $10 or more?

What is the box model?

There are 38 tickets; 18 are labelled 1 and the other 20 are labelled -1. So the expected value of the box is

\[ EV = \frac{1}{38}\left[1 + 1 + \cdots + 1 + (-1) + (-1) + \cdots + (-1)\right] = \frac{1}{38}[-2] = -1/19. \]

The standard deviation of the box is

\[ sd = \sqrt{\left(\frac{1}{38} \sum_{i=1}^{38} X_i^2\right) - EV^2} = \sqrt{1 - (-1/19)^2} = .998614. \]

The probability of losing $10 or more in 100 plays is

\begin{align*}
P[\text{sum} < -10] &= P[\text{sum} - n\,EV < -10 - n\,EV] \\
&= P\left[\frac{\text{sum} - n\,EV}{\sqrt{n}\; sd} < \frac{-10 - n\,EV}{\sqrt{n}\; sd}\right] \\
&\doteq P\left[Z < \frac{-10 - n\,EV}{\sqrt{n}\; sd}\right] \\
&= P[Z < [-10 - (100)(-1/19)]/(10 \times .998614)] \\
&= P[Z < -.47434].
\end{align*}

From the standard normal table, the chance of this is about 1/2(100 - 34.73)%, so the probability is
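
The z-scores in Problems 1 and 2 can be checked numerically. The sketch below is an illustration rather than part of the original notes: the helper names (`normal_cdf`, `z_for_average`, `z_for_sum`) are hypothetical, and the normal probability is computed from the error function `math.erf` instead of a printed table, so the result may differ slightly from a table lookup.

```python
import math


def normal_cdf(z):
    """P[Z < z] for a standard normal Z, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def box_ev(box):
    """Expected value of the box: the average of its tickets."""
    return sum(box) / len(box)


def box_sd(box):
    """sd of the box via the shortcut sqrt(mean of squares - EV^2)."""
    mean_of_squares = sum(x * x for x in box) / len(box)
    return math.sqrt(mean_of_squares - box_ev(box) ** 2)


def z_for_average(sample_mean, hypothesized_ev, sd, n):
    """Standardize an average: (mean - EV) / (sd / sqrt(n))."""
    return (sample_mean - hypothesized_ev) / (sd / math.sqrt(n))


def z_for_sum(threshold, box, n):
    """Standardize a sum of n draws: (threshold - n*EV) / (sqrt(n) * sd)."""
    return (threshold - n * box_ev(box)) / (math.sqrt(n) * box_sd(box))


# Problem 1: sample mean 42,000, cutoff 42,500, sd 5,000, n = 100.
z1 = z_for_average(42_000, 42_500, 5_000, 100)
p1 = normal_cdf(z1)

# Problem 2: 18 tickets labelled +1, 20 labelled -1, 100 plays, lose $10 or more.
roulette_box = [1] * 18 + [-1] * 20
z2 = z_for_sum(-10, roulette_box, 100)
p2 = normal_cdf(z2)

print(f"Problem 1: z = {z1:.5f}, P[Z < z] = {p1:.5f}")
print(f"Problem 2: EV = {box_ev(roulette_box):.6f}, sd = {box_sd(roulette_box):.6f}")
print(f"Problem 2: z = {z2:.5f}, P[Z < z] = {p2:.5f}")
```

Both results are normal approximations; with 100 plays the exact chance for the roulette sum could instead be obtained from the binomial distribution.
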
