Duke STA 101 - Central Limit Theorem

FPP 16-18: Expected values, standard errors, Central Limit Theorem (10/15/09)

Statistical inference
- Up to this point we have focused primarily on exploratory statistical analysis.
- We now dive into the realm of statistical inference.
- The ideas associated with sampling distributions, p-values, and confidence intervals are more abstract and are therefore slightly harder.
- These concepts are also very powerful:
  - for good if used correctly,
  - for bad if used incorrectly.

Statistics vs. probability modeling
- Probability: we know the truth and want to estimate the chances that particular data occur.
- Statistics: we know the data that occurred and want to infer something about the truth.
- [Diagram: a sample, summarized by the statistic x̄, is drawn from the population, described by the parameter μ; inference runs from the statistic back to the parameter.]

Law of averages
- What does the law of averages say? Toss a coin.
- As the number of tosses increases, |#heads − 0.5 × (#tosses)| tends to grow, while |%heads − 50%| tends to shrink.

Chance processes
- When tossing a coin, the actual #heads ≠ the expected #heads.
- What is the likely size of the difference?
- Strategy: find an analogy between the process being studied and drawing numbers at random from a box (a box model).

Box models
- A so-called box model is a good starting point for statistical inference.
- The purpose of these very simple models is to analyze chance variability.
- They are a construction for learning about characteristics of populations.

Motivating example
- Population: 119,106 graduates of Duke.
- Variable: donation amount in dollars to the Duke Annual Fund in 2001.
- Box model: make a ticket for every alumnus containing his/her donation amount, and put all these tickets in a hypothetical box.

Box models: typical questions
- Pick 100 tickets at random from the box, with replacement.
1. Before collecting the data, what do you expect the sum of these 100 alumni donations to equal?
2. What do you think is a typical deviation from this expected value?
3. Before collecting the data, how many of the 100 alumni do you expect to be donators?
4. What do you think is a typical deviation from this expected value?

Characteristics of alumni donations
- For the 119,106 alumni:
  - Average of all donations = $735
  - SD of donations = $23,827
  - 42,938 donated (36%)
  - 76,168 did not donate (64%)

Learning about the sample sum
- When we sample randomly, the sum of the 100 tickets will differ from sample to sample.
- What is the expected value (EV) of the sample sum?
  E(sample sum) = n × (average of box)
- What is a typical deviation of a sample sum from this expected value?
  SE(sample sum) = √n × (SD of box)

Sample sum of donations for 100 alumni
- The sum of the 100 alumni donations should be
  E(sample sum) = 100 × $735 = $73,500, give or take the SE, where
  SE(sample sum) = √100 × $23,827 = $238,270.
- How sure are we about the sum of donations using a sample of 100?
- Key idea: if we take independent samples of 100 alumni over and over again, recording each sample sum, then
  - the average of the sample sums should be around $73,500, and
  - the SD of the sample sums should be around $238,270.
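A minimal Python sketch of the box-model calculation above (assuming NumPy is available; the box contents are invented for illustration and are not the Duke donation data, and only the average and SD of the box enter the formulas). It computes E(sample sum) = n × (average of box) and SE(sample sum) = √n × (SD of box), then checks both against a simulation of repeated draws with replacement:

    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical box of tickets (made-up values, not the real 119,106 donations).
    box = np.array([0, 0, 0, 0, 0, 0, 0, 50, 200, 5000], dtype=float)
    n = 100                              # tickets drawn with replacement

    box_avg = box.mean()                 # average of the box
    box_sd = box.std()                   # SD of the box (population SD)

    ev_sum = n * box_avg                 # E(sample sum) = n * (average of box)
    se_sum = np.sqrt(n) * box_sd         # SE(sample sum) = sqrt(n) * (SD of box)

    # Simulation check: draw 100 tickets, many times, and summarize the sample sums.
    sums = rng.choice(box, size=(10_000, n), replace=True).sum(axis=1)

    print(f"EV of sum: {ev_sum:.1f}   simulated average of sums: {sums.mean():.1f}")
    print(f"SE of sum: {se_sum:.1f}   simulated SD of sums:      {sums.std():.1f}")

With the Duke box summaries (average $735, SD $23,827) the same two formulas give 100 × $735 = $73,500 and √100 × $23,827 = $238,270, as on the slide.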
Box model for binary (dichotomous) outcomes
- 42,938 alumni donated and 76,168 did not.
- Make a box with tickets comprised of 42,938 ones and 76,168 zeros.
- Average of box = proportion of ones = 0.36 = p
- SD of box = 0.48
- Shortcut for the SD, for binary box models only: SD of box = √(p(1 − p)).
- Sample 100 tickets out of the box with replacement. What does this process remind you of?

Sample number of donators out of 100 alumni
- The number of donators in the sample equals the sample sum of the 0-1 tickets.
- Thus, the expected number of donators is
  EV of sample sum = n × (average of box) = 100 × 0.36 = 36.
- The typical deviation of the sample sum from this expected value is
  SE(sample sum) = √n × (SD of box) = 10 × 0.48 = 4.8.
- Hence, the number of alumni who donated out of a random sample of 100 should be about 36, give or take around 5 people (SE = 4.8).
- Compared to the average donation per alumnus, how "confident" are we that any given sample of 100 will produce 36 donors?
- Key idea: if we take independent samples of 100 alumni over and over again, recording the number of donators in each sample, then
  - the average of the sample counts should be around 36, and
  - the SD of the sample counts should be around 4.8.

A problem from the text
- 100 draws are made with replacement from a box containing the seven numbers 101, 102, 103, 104, 105, 106, 107.
- Suppose you were betting: the closer your guess is to the sample sum, the more money you win. What number would you guess?
- How much would you expect the sample sum to be off from the expected value of the sum? (A worked sketch appears at the end of these notes.)

Difference between SD and SE
- The SD is the typical deviation from the average in a box. The SD is a property of the box; it does not depend on random sampling.
- The SE is the typical deviation from the expected value in a random sample. The SE results from random sampling.
- The SE gives an idea of how large the chance error is.
- The sum of the draws is likely to be around its expected value, but off by a chance error similar in size to its SE:
  sum of draws = EV ± chance error.

EV and SE of the sample average or percent
- Since sample average (or percent) = sample sum / n, we get:
1. Just like sample sums, sample averages and sample percentages are subject to chance variation.
2. EV of sample average (or %) = EV of sample sum / n = average of box.
3. SE of sample average (or %) = SE of sample sum / n = (SD of box) / √n.

Common theme for the SE of the sample average and sample percentage
- For a binary variable, the population SD = √(p(1 − p)).
- So both the sample average and the sample percentage have a standard error of the form
  SE = (population SD) / √n.

Sample averages and percentages
- In a random sample of 100 alumni, we expect the sample ...
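The 0-1 box and the SE formulas for the count and for the sample percentage can be checked the same way. This is a sketch under the same assumptions (NumPy available; the number of simulated samples is an arbitrary choice), using the donation proportion p = 0.36 from the notes:

    import numpy as np

    rng = np.random.default_rng(7)

    p = 0.36                             # fraction of ones in the box (alumni who donated)
    n = 100                              # tickets drawn with replacement

    box_sd = np.sqrt(p * (1 - p))        # binary-box shortcut: SD = sqrt(p(1 - p)), about 0.48

    ev_count = n * p                     # expected number of donators: 36
    se_count = np.sqrt(n) * box_sd       # SE of the count: 10 * 0.48 = 4.8

    se_pct = box_sd / np.sqrt(n) * 100   # SE of the sample percentage: 4.8 percentage points

    # Simulation check: repeatedly draw 100 tickets from a box of 0s and 1s.
    counts = rng.binomial(1, p, size=(10_000, n)).sum(axis=1)

    print(f"EV of count: {ev_count:.1f}   simulated average: {counts.mean():.2f}")
    print(f"SE of count: {se_count:.2f}   simulated SD:      {counts.std():.2f}")
    print(f"SE of sample percentage: {se_pct:.1f} percentage points")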


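For the betting problem above (100 draws with replacement from the box 101, ..., 107), the same recipe applies: the box average is 104 and its SD is 2, so the best guess for the sample sum is 100 × 104 = 10,400, and it should be off by a chance error of about √100 × 2 = 20. A short sketch that works this out and checks it by simulation (same NumPy assumption as the earlier sketches):

    import numpy as np

    rng = np.random.default_rng(1)

    box = np.array([101, 102, 103, 104, 105, 106, 107], dtype=float)
    n = 100

    ev_sum = n * box.mean()              # best guess for the sum: 100 * 104 = 10,400
    se_sum = np.sqrt(n) * box.std()      # likely size of the chance error: 10 * 2 = 20

    sums = rng.choice(box, size=(10_000, n), replace=True).sum(axis=1)

    print(f"Guess (EV of sum): {ev_sum:.0f}")
    print(f"SE of sum:         {se_sum:.0f}")
    print(f"Simulated: average {sums.mean():.0f}, SD {sums.std():.1f}")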