22S:138
Bayes factors
Lecture 19, Nov. 14, 2008
Kate Cowles
374 SH, [email protected]

Bayes factors for model comparison and hypothesis testing

• simplest case: null and alternative hypotheses both simple
• equivalently: comparing two models that differ according to point values of one parameter

Bayes' rule applied to the example (from lecture 1)

You take the blood test and the result is positive (+). This is the data or observation.

Model                 Prior   Likelihood for +   Product   Posterior
Have disease          .001    .95                .00095    .019
Don't have disease    .999    .05                .04995    .981
                                                 .05090    1

Hypothesis-testing view:
H0: p = 0.05
HA: p = 0.95
"simple hypotheses" regarding the probability of a positive test

Model   Description          Prior probability
M0      don't have disease   .999
M1      have disease         .001

Equivalently:

Hypothesis   Description   Prior probability
H0           p = .05       .999
H1           p = .95       .001

Prior odds in favor of Model 1 vs. Model 0:
Pr(M1) / Pr(M0) = .001 / .999 = 1/999

Bayes factor in favor of Model 1 vs. Model 0:
BF10 = Pr(data | M1) / Pr(data | M0) = .95 / .05 = 19
where "data" is the positive test

Bayes factor in the simple/simple case

• BF10 is the weight of evidence contained in the data in favor of M1 vs. M0
• usually reported on the log10 scale
• interpretation (Kass and Raftery, JASA, 1995):

log10(BF10)   BF10        Evidence against H0 (or M0)
0 to 1/2      1 to 3.2    Not worth more than a bare mention
1/2 to 1      3.2 to 10   Substantial
1 to 2        10 to 100   Strong
> 2           > 100       Decisive

Posterior probabilities and posterior odds

• posterior odds in the example:
  Pr(M1 | data) / Pr(M0 | data) = .019 / .981 = .0194
• relationship among the Bayes factor, posterior odds, and prior odds in the simple/simple case:
  BF10 = [Pr(M1 | data) / Pr(M0 | data)] / [Pr(M1) / Pr(M0)]
  i.e., the Bayes factor is the ratio of posterior odds to prior odds

Before considering the more general case, recall Bayes' rule:
p(θ | y) = p(θ) p(y | θ) / ∫ p(θ) p(y | θ) dθ
The denominator is
∫ p(θ) p(y | θ) dθ = ∫ p(θ, y) dθ = p(y)
• the "marginal likelihood" of the data
• depends on
  – the data
  – the model (form of likelihood and prior)

More general case

To compare two competing models, M1 and M0:
• compute the marginal likelihood of the data under each model
  – let θ1 = parameters under M1
  – let θ0 = parameters under M0
p(y | M1) = ∫ p(θ1) p(y | θ1) dθ1
p(y | M0) = ∫ p(θ0) p(y | θ0) dθ0
BF10 = p(y | M1) / p(y | M0)

More general case

H0: θ ∈ Θ0
H1: θ ∈ Θ1
A Bayesian hypothesis test involves calculating the posterior probabilities
P(Θ0 | y) and P(Θ1 | y)

Example

• A child is given an intelligence test, with resulting score Y.
• Y ∼ N(θ, 100)
  – where θ represents the child's own true IQ
  – 100 is the variance if the same child takes repeated IQ tests of the same kind
• in the population as a whole, IQ scores are distributed as θ ∼ N(100, 225)
• if the child scores y = 115, then the posterior distribution of θ is
  (θ | y) ∼ N(110.4, 69.2)

Example continued

H0: θ ≤ 100
H1: θ > 100

Prior probabilities and prior odds:
p(θ) = N(100, 225), so Pr(θ ≤ 100) = .5 and Pr(θ > 100) = .5
Prior odds = 0.5 / 0.5 = 1

Posterior probabilities and odds:
Pr(θ ≤ 100 | y) = .106
Pr(θ > 100 | y) = .894
Posterior odds = 0.106 / 0.894 = 0.119

The Bayes factor in favor of H0 vs. H1 is
BF01 = (0.106 / 0.894) / (0.5 / 0.5) = 0.119
The Bayes factor in favor of H1 vs. H0 is
BF10 = 1 / BF01 ≈ 8.44

Bayesian hypothesis testing and frequentist p-values

• In one-sided testing situations like this, the frequentist p-value will sometimes have a Bayesian justification.
• Example:
  – normal likelihood, variance known: Y ∼ N(θ, σ²)
  – noninformative prior: p(θ) ∝ 1
  – posterior: p(θ | y) ∼ N(y, σ²)
• Hypotheses:
  H0: θ ≤ θ0
  H1: θ > θ0
• posterior probability of H0:
  Pr(θ ≤ θ0 | y) = Φ((θ0 − y) / σ)
• classical p-value:
  p-value = Pr(Y ≥ y | θ = θ0) = 1 − Φ((y − θ0) / σ)
• by the symmetry of the normal distribution, Pr(θ ≤ θ0 | y) equals the p-value against H0
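To check these one-sided calculations numerically, here is a minimal sketch (Python with scipy assumed available; the helper name is illustrative, not from the lecture). It recomputes the conjugate posterior for the IQ example, the posterior probabilities of the two hypotheses, the implied Bayes factor, and the flat-prior correspondence between the posterior probability of H0 and the one-sided p-value.

```python
# Minimal sketch: normal-normal conjugate update for the IQ example,
# one-sided posterior probabilities, and the flat-prior p-value correspondence.
# Assumes scipy is installed; the helper name below is illustrative only.
from scipy.stats import norm

def normal_posterior(prior_mean, prior_var, y, sampling_var):
    """Posterior mean and variance: precision-weighted average of prior mean and data."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / sampling_var)
    post_mean = post_var * (prior_mean / prior_var + y / sampling_var)
    return post_mean, post_var

# IQ example: Y ~ N(theta, 100), theta ~ N(100, 225), observed y = 115
m, v = normal_posterior(100.0, 225.0, 115.0, 100.0)
print(m, v)                                   # about 110.4 and 69.2, as above

# One-sided test H0: theta <= 100 vs. H1: theta > 100
p0 = norm.cdf(100.0, loc=m, scale=v ** 0.5)   # Pr(theta <= 100 | y), about 0.106
p1 = 1.0 - p0                                 # about 0.894
prior_odds = 0.5 / 0.5
print(p1 / p0 / prior_odds)                   # BF10, about 8.4

# Flat-prior case: Pr(theta <= theta0 | y) equals the one-sided p-value
sigma, theta0, y = 10.0, 100.0, 115.0
post_prob_H0 = norm.cdf((theta0 - y) / sigma)   # Phi((theta0 - y) / sigma)
p_value = 1.0 - norm.cdf((y - theta0) / sigma)  # Pr(Y >= y | theta = theta0)
print(post_prob_H0, p_value)                    # identical by symmetry
```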
Testing a point null hypothesis

• common in frequentist practice:
  H0: θ = θ0
  H1: θ ≠ θ0
  where θ could have any value on a continuum
• Bayesian answers may differ radically from frequentist answers
• almost never do we seriously consider that θ = θ0 exactly
• more reasonable:
  H0: θ ∈ (θ0 − b, θ0 + b) for some small b
  (a "region of indifference")

Consider a Bayesian test of the point null so as to compare with the frequentist test

H0: θ = θ0
H1: θ ≠ θ0
• cannot use a continuous prior on θ. Why?
• reasonable approach to constructing a prior:
  – put positive prior probability on θ0: Pr(θ = θ0) = π0 > 0
  – give {θ : θ ≠ θ0} the prior (1 − π0) g1(θ), where g1 is a proper density

Bayesian analysis

• let f(y | θ) denote the sampling density of y
• then the marginal likelihood is
  m(y) = f(y | θ0) π0 + m1(y) (1 − π0)
  where
  m1(y) = ∫_{θ ≠ θ0} f(y | θ) g1(θ) dθ
• so the posterior probability of H0 is
  Pr(θ = θ0 | y) = f(y | θ0) π0 / [f(y | θ0) π0 + m1(y) (1 − π0)]
• the posterior odds in favor of H0 vs. H1 are
  [π0 / (1 − π0)] × [f(y | θ0) / m1(y)]
• and the Bayes factor in favor of H0 vs. H1 is
  BF01 = f(y | θ0) / m1(y)

Example: child's intelligence

• sampling distribution of the data: f(y | θ) = N(θ, σ² = 100)
• hypotheses to be tested:
  H0: θ = 100
  H1: θ ≠ 100
• priors:
  Pr(θ = 100) = π0 = 0.5
  g1(θ) = N(µ0, σ0²) = N(100, 100)
  – note: the prior mean µ0 = θ0 (the value from H0)
  – the prior variance σ0² equals the variance of the sampling distribution

Example continued

• What statistical test would a frequentist use when the sampling distribution is assumed to be normal with known variance?
• results for the frequentist test with different possible data values y:

y        z       frequentist p-value   Pr(H0 | y)
116.45   1.645   0.1                   0.42
119.60   1.960   0.05                  0.35
125.76   2.576   0.01                  0.21
132.91   3.291   0.001                 0.086

Similar table for different sample sizes

• Table 4.2, p. 151 from Berger, JO (1985) Statistical Decision Theory and Bayesian Analysis, 2nd ed., New York: Springer-Verlag
• applies when σ² is assumed known, µ0 = θ0, π0 = 0.5, σ0² =
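For the point-null example, the Pr(H0 | y) column in the table above can be reproduced directly, because with a normal likelihood and a normal g1 the marginal under H1 has the closed form m1(y) = N(y; µ0, σ² + σ0²). The sketch below assumes Python with scipy; the function name and default arguments simply restate the values given above.

```python
# Minimal sketch: posterior probability of the point null H0: theta = theta0
# under the prior "point mass pi0 at theta0 plus (1 - pi0) * g1(theta)".
# With y | theta ~ N(theta, sigma2) and g1 = N(mu0, tau2), the marginal under
# H1 is m1(y) = N(y; mu0, sigma2 + tau2).  Assumes scipy is installed.
from scipy.stats import norm

def post_prob_point_null(y, theta0=100.0, sigma2=100.0, mu0=100.0, tau2=100.0, pi0=0.5):
    f0 = norm.pdf(y, loc=theta0, scale=sigma2 ** 0.5)        # f(y | theta0)
    m1 = norm.pdf(y, loc=mu0, scale=(sigma2 + tau2) ** 0.5)  # m1(y)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * m1)

for y in (116.45, 119.60, 125.76, 132.91):
    print(y, round(post_prob_point_null(y), 3))
# roughly 0.42, 0.35, 0.21, 0.086 -- the Pr(H0 | y) column in the table above
```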
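The last slide points to Berger's table for different sample sizes, and the value of σ0² is not given above. Purely as an illustration of how such a table can be computed, the sketch below holds the z-statistic fixed while n grows, taking σ0² = σ² as an assumed value (my choice, not taken from the slides) and using σ²/n as the sampling variance of the mean.

```python
# Illustrative sketch only: behavior of Pr(H0 | ybar) as n grows with z fixed.
# Assumes sigma0^2 = sigma^2 (an assumption -- the value is not given above),
# mu0 = theta0, pi0 = 0.5, and ybar | theta ~ N(theta, sigma^2 / n).
from math import sqrt
from scipy.stats import norm

def post_prob_H0_vs_n(z, n, sigma2=100.0, theta0=100.0, pi0=0.5):
    samp_var = sigma2 / n                     # sampling variance of ybar
    tau2 = sigma2                             # assumed prior variance under H1
    ybar = theta0 + z * sqrt(samp_var)        # data value that gives this z
    f0 = norm.pdf(ybar, loc=theta0, scale=sqrt(samp_var))
    m1 = norm.pdf(ybar, loc=theta0, scale=sqrt(samp_var + tau2))
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * m1)

for n in (1, 10, 100, 1000):
    print(n, round(post_prob_H0_vs_n(1.96, n), 3))
# Pr(H0 | ybar) climbs toward 1 as n grows, even though the p-value stays at 0.05
```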