Duke STA 101 - Expected Values, Standard - D1766436


School name Duke University

Course Sta 101- Data Analy/stat Infer

Pages 47

This preview shows page 1-2-3-22-23-24-45-46-47 out of 47 pages.


Slide titles:
1. Expected Values, Standard Errors, Central Limit Theorem
2. Statistical inference
3. Statistics vs probability modeling
4. Coin toss
5. Study of chance error
6. Law of averages
7. Slide 7
8. Chance processes
9. Box models
10. Box Model
11. Constructing Box models
12. Duke donor example
13. Box models: typical questions
14. Characteristics of alumni donations
15. Learning about the sample sum
16. Sample sum of donations for 100 alumni
17. Box model for binary (dichotomous) outcomes
18. Sample number of donators out of 100 alumni
19. Slide 19
20. Chance error / Standard Error
21. A problem from the text
22. Difference between SD and SE
23. EV and SE of the sample average or percent
24. Common theme for SE of sample average and sample percentage
25. Sample averages and percentages
26. Slide 26
27. Shape of chance process
28. Parameters vs statistics
29. Sampling distributions
30. Sampling distribution construction
31. Silly example
32. Approximating sampling distributions
33. Cool applet
34. Central Limit Theorem
35. The central limit theorem
36. The Central Limit Theorem
37. Central Limit Theorem
38. Does CLT apply
39. Central Limit Theorem M&Ms
40. Size of sample
41. CLT and M&Ms
42. CLT household example
43. Slide 43
44. Alumni donations example
45. Slide 45
46. CLT under three conditions
47. Slide 47

FPP 16-18
Expected Values, Standard Errors, Central Limit Theorem

Statistical inference
- Up to this point we have focused primarily on exploratory type statistical analyses (with a little probability thrown in).
- We will now dive into the realm of statistical inference.
- The ideas associated with sampling distributions, p-values, and confidence intervals are more abstract and are therefore slightly harder.
- These concepts are also very powerful:
  - For good if used correctly
  - For bad if used incorrectly

Statistics vs probability modeling
- Probability: know the truth, want to estimate the chances that data occur
- Statistics: know the data that occur, want to infer about the truth

Coin toss
- Suppose we tossed a coin 50 times. We are interested to know if this coin is fair.
- If the coin is fair, then a straightforward model that mimics reality is:
    # heads = 0.5(# of tosses)
- It should be fairly obvious that the number of heads won’t be exactly 25. How far away from 25 would convince us that the coin isn’t fair?
- Statistical model:
    # heads = 0.5(# of tosses) + chance error
- This chance error will help us answer the question: how many heads is too many for the coin to be considered fair?
- We will study this chance error quite rigorously.

Study of chance error
- Plan of attack for study of chance error:
  - Law of averages
  - Sampling distributions
  - Central limit theorem
- Our main tool will be so-called “box models”

Law of averages
- What does the law of averages say? Toss a coin.
- As the # of tosses increases:
  - |#heads – 0.5(#tosses)| tends to increase
  - |%heads – 50%| tends to decrease
- In words, as the number of tosses goes up:
  - The difference between the number of heads and half the number of tosses gets bigger
  - The difference between the percentage of heads and 50% gets smaller (if coin is fair)

Law of averages
- A die is thrown some number of times, and the object is to guess the total number of spots. There is a one-dollar penalty for each spot that the guess is off. For instance, if you guess 200 and the total is 215, you lose $15. Which do you prefer: 50 throws, or 100?

Chance processes
- When tossing a coin: Actual #heads ≠ Expected #heads
- What is the likely size of the difference?
- Strategy: Find an analogy between the process being studied and drawing numbers at random from a box (box model)

Box models
- A so-called box model is a good starting point into statistical inference
- The purpose of these very simple models is to analyze chance variability
- They are a construction for learning about characteristics of populations
- They help us incorporate the probability techniques we learned in studying chance error.

Box Model
- A die is thrown some number of times, and the object is to guess the total number of spots. What is the “typical” total number of spots after 50 throws? After 100 throws?
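One way to get a feel for the “typical” total before setting up the box model formally is to simulate the die throws. The sketch below is illustrative only: the function name `simulate_totals`, the number of repeats, and the seed are assumptions made for this example and do not come from the slides.

```python
import random
import statistics

def simulate_totals(n_throws, n_repeats=10_000, seed=0):
    """Repeat the experiment 'throw a fair die n_throws times and add
    up the spots' n_repeats times; return the list of totals."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    faces = [1, 2, 3, 4, 5, 6]         # one ticket per face of the die
    return [sum(rng.choices(faces, k=n_throws)) for _ in range(n_repeats)]

for n in (50, 100):
    totals = simulate_totals(n)
    # Average total and the typical size of the deviation from it
    print(n, round(statistics.mean(totals), 1), round(statistics.pstdev(totals), 1))
```

The averages should land near 3.5 times the number of throws, and the spread of the totals should be larger for 100 throws than for 50. That is the law of averages at work on a sum, and it bears on the one-dollar-per-spot guessing game above.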
- Create a box model for this die-throwing scenario.

Constructing Box models
- A quiz has 25 multiple choice questions. Each question has 5 possible answers, one of which is correct. A correct answer is worth 4 points, but a point is taken off for each incorrect answer. A student answers all of the questions by guessing randomly.
- What is the box model for this scenario?
- What is the “expected” score on the quiz?
- What is the range of scores?
- What is the SD of scores?

Duke donor example
- Population: 119,106 graduates of Duke
- Variable: donation amount ($) to the Duke Annual Fund in 2001
- Box model:
  - Make a ticket for every alumnus containing his/her donation amount
  - Put all these tickets in a hypothetical box.

Box models: typical questions
- Pick 100 tickets at random from the box, with replacement
1. Before collecting the data, what do you expect the sum of these 100 alumni donations to equal?
2. What do you think is a typical deviation from this expected value?
   - We can answer these questions with a box model
3. Before collecting the data, how many of the 100 alumni do you expect to be donors?
4. What do you think is a typical deviation from this expected value?
   - To answer these questions we need another box model

Characteristics of alumni donations
- For the 119,106 alumni:
  - Average of all donations = $735
  - SD of donations = $23,827
  - 42,938 donated (36%)
  - 76,168 did not donate (64%)

Learning about the sample sum
- When we sample randomly, the sum of the 100 tickets will differ for different samples
- What is the expected value (EV) of the sample sum?
    E(sample sum) = n*(average of box) = n*(μ)
- What is a typical deviation of a sample sum from this expected value?
    Standard error (SE) of sum = √n*(SD of box) = √n*(σ)

Sample sum of donations for 100 alumni
- So the sum of the 100 alumni donations should be:
    E(sample sum) = 100*($735) = $73,500, give or take the SE
    SE = √100*($23,827) = $238,270
- How sure are we about the sum of donations using a sample of 100?
- Key idea: if we take independent samples of 100 alumni over and over again, recording the sum of each sample, then
  - The average of the sample sums should be around $73,500
  - The SD of the sample sums should be around √100*($23,827) = $238,270

Box model for binary (dichotomous) outcomes
- 42,938 donated and 76,168 did not
- Make a box with tickets consisting of 42,938 ones and 76,168 zeros.
- Average of box = % of ones = 0.36 = p
- SD of box = 0.48
- Short cut for SD for binary box models (and only for binary box models)
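The arithmetic on the last few slides can be checked with a short script. This is a minimal sketch, not part of the original slides: the function and constant names are made up for illustration. The preview cuts off before stating the binary shortcut, so the formula used here, SD = √(p(1 − p)) for a 0/1 box, is the standard identity and only an assumption about what the slide goes on to say.

```python
import math

# Figures quoted on the slides
N_ALUMNI = 119_106
N_DONATED = 42_938
BOX_AVERAGE = 735.0    # average donation in dollars
BOX_SD = 23_827.0      # SD of donations in dollars

def ev_of_sum(n, box_average):
    """Expected value of the sum of n draws: n * (average of box)."""
    return n * box_average

def se_of_sum(n, box_sd):
    """Standard error of the sum of n draws: sqrt(n) * (SD of box)."""
    return math.sqrt(n) * box_sd

def binary_box_sd(p):
    """SD of a box of 0/1 tickets with a fraction p of ones.
    Assumed shortcut (standard identity): sqrt(p * (1 - p))."""
    return math.sqrt(p * (1 - p))

n = 100
p = N_DONATED / N_ALUMNI

print(ev_of_sum(n, BOX_AVERAGE))                 # 73500.0
print(se_of_sum(n, BOX_SD))                      # 238270.0
print(round(p, 2), round(binary_box_sd(p), 2))   # 0.36 0.48
```

The SE is more than three times the expected sum. That follows from the very large SD of individual donations and suggests that a sample of 100 says little about the total.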
