Expected Values, Standard Errors, Central Limit Theorem
FPP 16-18

Statistical inference
- Up to this point we have focused primarily on exploratory statistical analyses (with a little probability thrown in).
- We will now dive into the realm of statistical inference.
- The ideas associated with sampling distributions, p-values, and confidence intervals are more abstract and are therefore slightly harder.
- These concepts are also very powerful:
  - for good if used correctly
  - for bad if used incorrectly

Statistics vs probability modeling
- Probability: know the truth, want to estimate the chances that data occur
- Statistics: know the data that occur, want to infer about the truth

Coin toss
- Suppose we tossed a coin 50 times. We are interested to know if this coin is fair.
- If the coin is fair, then a straightforward model that mimics reality is:
    # heads = 0.5 × (# of tosses)
- It should be fairly obvious that the number of heads won’t be exactly 25. How far away from 25 would convince us that the coin isn’t fair?
- Statistical model:
    # heads = 0.5 × (# of tosses) + chance error
- This chance error will help us answer the question: how many heads is too many for the coin to be fair?
- We will study this chance error quite rigorously.

Study of chance error
- Plan of attack for the study of chance error:
  - Law of averages
  - Sampling distributions
  - Central limit theorem
- Our main tool will be so-called “box models”

Law of averages
- What does the law of averages say? Toss a coin.
- As the # of tosses increases:
  - |#heads – 0.5(#tosses)| gets bigger
  - |%heads – 50%| gets smaller
- In words, as the number of tosses goes up:
  - the difference between the number of heads and half the number of tosses gets bigger
  - the difference between the percentage of heads and 50% gets smaller (if the coin is fair)

Law of averages
- A die is thrown some number of times, and the object is to guess the total number of spots. There is a one-dollar penalty for each spot that the guess is off. For instance, if you guess 200 and the total is 215, you lose $15. Which do you prefer: 50 throws, or 100?

Chance processes
- When tossing a coin: actual #heads ≠ expected #heads
- What is the likely size of the difference?
- Strategy: find an analogy between the process being studied and drawing numbers at random from a box (box model)

Box models
- A so-called box model is a good starting point for statistical inference
- The purpose of these very simple models is to analyze chance variability
- They are a construction for learning about characteristics of populations
- They help us incorporate the probability techniques we learned in studying chance error.

Box Model
- A die is thrown some number of times, and the object is to guess the total number of spots. What is the “typical” total number of spots after 50 throws? After 100 throws?
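
The die question above (what is a typical total after 50 throws, and after 100?) can be explored by simulation before any box model is written down. The following is a minimal sketch, assuming Python's standard `random` module as a stand-in for a physical die; the function name `simulate_totals`, the number of repetitions, and the seed are illustrative choices, not part of the slides.

```python
import random
import statistics


def simulate_totals(n_throws, n_repeats=5000, seed=0):
    """Repeat the experiment 'throw a die n_throws times' n_repeats times.

    Returns the total number of spots from each repetition.
    n_repeats and seed are arbitrary illustrative values.
    """
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(n_throws))
        for _ in range(n_repeats)
    ]


# For each number of throws, record:
#   (average total, SD of the totals, SD of the totals per throw)
results = {}
for n in (50, 100):
    totals = simulate_totals(n)
    center = statistics.mean(totals)
    spread = statistics.pstdev(totals)
    results[n] = (center, spread, spread / n)
    print(
        f"{n} throws: typical total about {center:.1f}, "
        f"give or take about {spread:.1f} spots "
        f"({spread / n:.3f} spots per throw)"
    )
```

In runs of this sketch the give-or-take figure for the total is larger for 100 throws than for 50, while the give-or-take figure per throw is smaller. That is the law of averages at work, and it bears directly on the 50-versus-100 penalty question.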
- Create a box model for this die-throwing problem.

Constructing Box models
- A quiz has 25 multiple-choice questions. Each question has 5 possible answers, one of which is correct. A correct answer is worth 4 points, but a point is taken off for each incorrect answer. A student answers all of the questions by guessing randomly.
  - What is the box model for this scenario?
  - What is the “expected” score on the quiz?
  - What is the range of scores?
  - What is the SD of scores?

Duke donor example
- Population: 119,106 graduates of Duke
- Variable: donation amount in dollars to the Duke Annual Fund in 2001
- Box model:
  - Make a ticket for every alumnus containing his/her donation amount
  - Put all these tickets in a hypothetical box.

Box models: typical questions
- Pick 100 tickets at random from the box, with replacement
  1. Before collecting the data, what do you expect the sum of these 100 alumni donations to equal?
  2. What do you think is a typical deviation from this expected value?
     - We can answer these questions with a box model
  3. Before collecting the data, how many of the 100 alumni do you expect to be donors?
  4. What do you think is a typical deviation from this expected value?
     - To answer these questions, we need another box model

Characteristics of alumni donations
- For the 119,106 alumni:
  - Average of all donations = $735
  - SD of donations = $23,827
  - 42,938 donated (36%)
  - 76,168 did not donate (64%)

Learning about the sample sum
- When we sample randomly, the sum of the 100 tickets will differ for different samples
- What is the expected value (EV) of the sample sum?
  - \( E(\text{sample sum}) = n \times (\text{average of box}) = n\mu \)
- What is a typical deviation of a sample sum from this expected value?
  - \( \text{SE}(\text{sample sum}) = \sqrt{n} \times (\text{SD of box}) = \sqrt{n}\,\sigma \)

Sample sum of donations for 100 alumni
- So the sum of the 100 alumni donations should be \( E(\text{sample sum}) = 100 \times \$735 = \$73{,}500 \), give or take the SE
- \( \text{SE} = \sqrt{100} \times \$23{,}827 = \$238{,}270 \)
- How sure are we about the sum of donations using a sample of 100?
- Key idea: if we take independent samples of 100 alumni over and over again, recording the sum of each sample, then
  - the average of the sample sums should be around $73,500
  - the SD of the sample sums should be around $238,270

Box model for binary (dichotomous) outcomes
- 42,938 donated and 76,168 did not
- Make a box with tickets comprising 42,938 ones and 76,168 zeros.
- Average of box = % of ones = 0.36 = p
- SD of box = 0.48
- Short cut for SD for binary box models (and only for binary box models)
Sample
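
The expected value and standard error quoted above for the sum of 100 donations, and the 0.36 and 0.48 figures for the 0-1 box, can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python; the constant and function names are illustrative. The closing comparison with sqrt(p(1 - p)) uses the standard formula for the SD of a 0-1 box, which is an assumption about what the truncated shortcut slide goes on to state.

```python
import math

# Figures taken from the alumni-donation slides.
N_ALUMNI = 119_106      # tickets in the box
BOX_AVERAGE = 735.0     # average donation in dollars (mu)
BOX_SD = 23_827.0       # SD of donations in dollars (sigma)
N_DONORS = 42_938       # tickets marked 1 in the binary box
N_DRAWS = 100           # tickets drawn at random, with replacement


def ev_of_sum(n, box_average):
    """Expected value of the sum of n draws: n * (average of box)."""
    return n * box_average


def se_of_sum(n, box_sd):
    """Standard error of the sum of n draws: sqrt(n) * (SD of box)."""
    return math.sqrt(n) * box_sd


ev = ev_of_sum(N_DRAWS, BOX_AVERAGE)
se = se_of_sum(N_DRAWS, BOX_SD)
print(f"EV of sample sum: ${ev:,.0f}")
print(f"SE of sample sum: ${se:,.0f}")

# Binary box: N_DONORS ones and (N_ALUMNI - N_DONORS) zeros.
n_zeros = N_ALUMNI - N_DONORS
p = N_DONORS / N_ALUMNI  # average of the box = fraction of ones

# SD from the definition: root mean squared deviation from the average.
sd_direct = math.sqrt(
    (N_DONORS * (1 - p) ** 2 + n_zeros * (0 - p) ** 2) / N_ALUMNI
)

# Assumed shortcut (standard result for a 0-1 box, not confirmed by the slides).
sd_shortcut = math.sqrt(p * (1 - p))

print(f"average of binary box: {p:.2f}")
print(f"SD of binary box (definition): {sd_direct:.2f}")
print(f"SD of binary box (shortcut):   {sd_shortcut:.2f}")
```

Note how large the SE is relative to the EV here: with a box SD of $23,827, a sample of 100 says very little about the sum of donations, which is the point of the "How sure are we" question.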