Duke STA 101 - Bayesian Statistics

22.0 Bayesian Statistics

• Answer Questions
• Review Midterm
• Bayesian Inference

22.1 Bayesian Inference

Recall Bayes' Theorem:

    P(A1 | B) = P(B | A1) P(A1) / Σ_{i=1}^{k} P(B | Ai) P(Ai)

where A1, ..., Ak are mutually exclusive and P(A1 or A2 or ... or Ak) = 1.

This is a formalism for how we learn. P(A1) is the prior probability of A1, given the information we have before event B. We then combine our prior probability with the new information about event B, through P(B | A1), to get our updated opinion about the probability of A1, written P(A1 | B).

P(A1) is our prior opinion, and P(A1 | B) is our posterior opinion.

To review the use of the formula, remember the breathalyzer example. Suppose 20% of the people on the road after 2 a.m. are legally drunk, and a police officer stops someone at random and administers a breathalyzer test. The test has probability .95 of identifying a person who is legally drunk, and probability .1 of misidentifying a person who is not legally drunk.

When the officer makes the stop, his prior probability of drunkenness is .2, the proportion of drivers who are drunk. How should that opinion change if the breathalyzer is positive?

Here A1 is the event that the person is drunk, and B is the event that the test is positive. A2 is the event that the person is not drunk; this is mutually exclusive of A1, and the probability of A1 or A2 is 1.

We apply Bayes' Theorem. Clearly

    P(A1 | B) = P( Drunk | positive test )
              = P(B | A1) P(A1) / Σ_{i=1}^{k} P(B | Ai) P(Ai)
              = P( pos | Drunk ) P( Drunk ) / [ P( pos | D ) P( D ) + P( pos | not D ) P( not D ) ]
              = (.95)(.2) / [ (.95)(.2) + (.1)(.8) ]
              = .7037

So after you fail the test (a positive result), the police officer should believe you have about a 70% chance of being drunk.

The frequentist paradigm:

• sees science as objective
• defines probability as a long-run frequency in independent, identical trials
• views parameters (e.g., the true mean of the population, the true probability of heads) as fixed quantities

This paradigm leads one to specify the null and alternative hypotheses, collect the data, calculate the significance probability under the assumption that the null is true, and draw conclusions from these significance probabilities, using the size of the observed effects to guide decisions.

The Bayesian paradigm:

• sees science as subjective
• defines probability as a subjective belief (which must be consistent with all of one's other beliefs)
• views parameters (e.g., the true mean of the population, the true probability of heads) as random quantities, because we can never know them with certainty

This paradigm leads one to specify plausible models, assign a prior probability to each model, collect data, calculate the probability of the data under each model, use Bayes' theorem to calculate the posterior probability of each model, and make inferences based on these posterior probabilities. The posterior probabilities enable one to make predictions about future observations, and one uses one's loss function to make decisions that minimize the probable loss.

22.2 RU486 Example

The "morning after" contraceptive RU486 was tested in a clinical trial in Scotland. This discussion slightly simplifies the design.

Assume 800 women report to a clinic; each has had sex within the last 72 hours. Half are randomly assigned to take RU486; half are randomly given the conventional therapy (high doses of estrogen and synthetic progesterone).

Among the RU486 group, none became pregnant.
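The breathalyzer calculation from 22.1 is easy to check by machine. A minimal sketch in Python, assuming nothing beyond the probabilities stated in that example:

```python
# Bayes' theorem for the breathalyzer example in 22.1:
# prior .2, sensitivity .95, false-positive rate .1 (all from the notes).
prior_drunk = 0.2
p_pos_given_drunk = 0.95   # P(positive test | drunk)
p_pos_given_sober = 0.10   # P(positive test | not drunk)

# Denominator: total probability of a positive test.
p_pos = p_pos_given_drunk * prior_drunk + p_pos_given_sober * (1 - prior_drunk)

# Posterior: P(drunk | positive test).
posterior_drunk = p_pos_given_drunk * prior_drunk / p_pos
print(round(posterior_drunk, 4))  # 0.7037
```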
Among the conventional therapy group, there were 4 pregnancies. Does this information show that RU486 is more effective than conventional treatment? We shall compare the frequentist and Bayesian approaches.

If the two therapies (R and C, for RU486 and conventional) are equally effective, then the probability that an observed pregnancy came from the R group is the proportion of women in the R group, or .5; let

    p = P[ an observed pregnancy came from group R ].

A frequentist wants to make a hypothesis test. Specifically,

    H0: p ≥ .5   vs.   HA: p < .5

If the evidence supports the alternative, then RU486 is more effective than conventional treatment.

The data are 4 observations from a binomial, where p is the probability that a pregnancy is from group R. How do we calculate the significance probability?

The significance probability is the chance of observing a result as or more supportive of the alternative than the one in the sample, when the null hypothesis is true. Our sample had no children from the RU486 group, which is as supportive as we could have. So

    P-value = P[ 0 successes in 4 tries | H0 ] = (1 − .5)^4 = .0625.

Most frequentists would fail to reject, since .0625 > .05.

Suppose we had observed 1 pregnancy in the R group. What would the P-value be then?

In the Bayesian analysis, we begin by listing the models we consider plausible. For example, suppose we had no information a priori about the probability that a child came from the R group. In that case all values of p between 0 and 1 would be equally likely. Without calculus we cannot handle that case, so let us approximate it by assuming that each of the following values for p

    .1, .2, .3, .4, .5, .6, .7, .8, .9

is equally likely. So we consider 9 models, one for each value of the parameter p.

If we picked one of the models, say the one with p = .1, that would mean the probability of a sample pregnancy coming from the R group is .1, and .9 that it comes from the C group.
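Both binomial calculations in this example (the frequentist P-value under p = .5, and the probability of the data under a single model such as p = .1) are the same arithmetic. A minimal sketch, using only the counts given above:

```python
from math import comb

# Binomial probability of exactly k successes in n trials with success prob p.
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Frequentist P-value: chance that 0 of the 4 pregnancies fall in group R
# when H0 holds at the boundary p = .5.
p_value = binom_pmf(0, 4, 0.5)
print(p_value)  # 0.0625

# Probability of the same data (k = 0) under the single model p = .1,
# as in the paragraph above: (1 - .1)^4.
print(round(binom_pmf(0, 4, 0.1), 4))  # 0.6561
```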
But we are not sure about the model.

    Model   Prior      P(data | model)   Product   Posterior
    p       P[model]   P[k = 0 | p]                P[model | data]
    .1      1/9        .656              .0729     .427
    .2      1/9        .410              .0455     .267
    .3      1/9        .240              .0266     .156
    .4      1/9        .130              .0144     .084
    .5      1/9        .063              .0070     .041
    .6      1/9        .026              .0029     .017
    .7      1/9        .008              .0009     .005
    .8      1/9        .002              .0002     .001
    .9      1/9        .000              .0000     .000
            1                            .1704     1

So the most probable of the nine models has p = .1, and the probability that p < .5 is .427 + .267 + .156 + .084 = .934.

Note that in performing the Bayes calculation:

• We were able to find the probability that p < .5, which we could not do in the frequentist framework.
• In calculating this, we used only the data that were observed. Data more extreme than what we observed play no role in the calculation or the logic.
• Also note that the prior probability of p = .5 dropped from 1/9 = .111 to .041. This illustrates how our prior belief changes after seeing the data.

Suppose a new person analyzes the same data, but their prior does not put equal weight on the 9 models; they put weight .52 on p = .5 and equal weight on the
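The table itself is a short computation. A minimal sketch, assuming only the uniform prior and the likelihood (1 − p)^4 used above; printed values can differ from the table in the last digit, since the table rounds its entries:

```python
# Posterior over the nine models: prior 1/9 on each p, likelihood
# P(k = 0 | p) = (1 - p)^4 for four pregnancies with none from group R.
ps = [i / 10 for i in range(1, 10)]           # the nine candidate values of p
products = [(1 / 9) * (1 - p)**4 for p in ps]
total = sum(products)                         # normalizing constant, about .1704
posterior = [prod / total for prod in products]

for p, post in zip(ps, posterior):
    print(f"p = {p:.1f}   posterior = {post:.3f}")

# Posterior probability that p < .5 (models .1 through .4).
p_less = sum(post for p, post in zip(ps, posterior) if p < 0.5)
print(round(p_less, 3))  # 0.936 (matches the .934 above up to table rounding)
```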