ISyE8843A, Brani Vidakovic, Handout 21

The Likelihood Principle

The likelihood principle concerns the foundations of statistical inference, and it is often invoked in arguments about correct statistical reasoning.

Let $f(x|\theta)$ be a conditional distribution for $X$ given the unknown parameter $\theta$. For the observed data $X = x$, the function $\ell(\theta) = f(x|\theta)$, considered as a function of $\theta$, is called the likelihood function. The name "likelihood" implies that, given $x$, the value $\theta$ is more likely to be the true parameter than $\theta'$ if $f(x|\theta) > f(x|\theta')$.

Likelihood Principle. In the inference about $\theta$, after $x$ is observed, all relevant experimental information is contained in the likelihood function for the observed $x$. Furthermore, two likelihood functions contain the same information about $\theta$ if they are proportional to each other.

Remark. Maximum-likelihood estimation satisfies the likelihood principle.

[Figure 1: Leonard Jimmie Savage. Born: November 20, 1917, Detroit, Michigan. Died: November 1, 1971, New Haven, Connecticut.]

The following example, quoted by Lindley and Phillips (1976), is an argument of Leonard Savage discussed at the Purdue Symposium in 1962. It shows that inference can critically depend on the likelihood principle.

Example 1: Testing fairness. Suppose we are interested in testing $\theta$, the unknown probability of heads for a possibly biased coin:

$$H_0: \theta = 1/2 \quad \text{vs.} \quad H_1: \theta > 1/2.$$

An experiment is conducted, and 9 heads and 3 tails are observed. This information is not sufficient to fully specify the model $f(x|\theta)$. A Rashomonian analysis follows:

- Scenario 1: The number of flips, $n = 12$, is predetermined. Then the number of heads $X$ is binomial, $B(n, \theta)$, with probability mass function

$$P_\theta(X = x) = f(x|\theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x} = \binom{12}{9} \theta^9 (1-\theta)^3 = 220 \cdot \theta^9 (1-\theta)^3.$$

For a frequentist, the p-value of the test is

$$P(X \geq 9 \,|\, H_0) = \sum_{x=9}^{12} \binom{12}{x} (1/2)^x (1 - 1/2)^{12-x} = \frac{1 + 12 + 66 + 220}{2^{12}} = 0.073,$$

and, recalling classical testing, $H_0$ is not rejected at level $\alpha = 0.05$.

- Scenario 2: The number of tails (successes), $r = 3$, is predetermined; that is, the flipping is continued until 3 tails are observed. Then $X$, the number of heads (failures) until 3 tails appear, is negative binomial,¹ $NB(3, 1-\theta)$:

$$f(x|\theta) = \binom{r + x - 1}{r - 1} (1-\theta)^r [1 - (1-\theta)]^x = \binom{3 + 9 - 1}{3 - 1} (1-\theta)^3 \theta^9 = 55 \cdot \theta^9 (1-\theta)^3.$$

For a frequentist, large values of $X$ are critical, and the p-value of the test is

$$P(X \geq 9 \,|\, H_0) = \sum_{x=9}^{\infty} \binom{3 + x - 1}{2} (1/2)^x (1/2)^3 = 0.0327,$$

since $\sum_{x=k}^{\infty} \binom{2+x}{2} (1/2)^x = \frac{8 + 5k + k^2}{2^k}$.

The hypothesis $H_0$ is now rejected, yet this change in decision is not caused by the observations.

According to the likelihood principle, all relevant information is in the likelihood $\ell(\theta) \propto \theta^9 (1-\theta)^3$, and Bayesians could not agree more!

Edwards, Lindman, and Savage (1963, p. 193) note: "The likelihood principle emphasized in Bayesian statistics implies, among other things, that the rules governing when data collection stops are irrelevant to data interpretation. It is entirely appropriate to collect data until a point has been proven or disproven, or until the data collector runs out of time, money, or patience."

¹ Let $p$ be the probability of success in a trial. The number of failures in a sequence of trials until the $r$th success is observed is negative binomial, $NB(r, p)$, with probability mass function

$$P(X = x) = \binom{r + x - 1}{r - 1} p^r (1-p)^x, \quad x = 0, 1, 2, \ldots$$

For $r = 1$ the negative binomial distribution becomes the geometric distribution, $NB(1, p) \equiv G(p)$.
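The two scenarios invite a quick numerical check. The sketch below (an illustration added here, not part of the handout) reproduces both p-values using only the Python standard library; note that the two likelihoods differ only in the constants 220 and 55, i.e., they are proportional, which is exactly what the likelihood principle seizes on.

```python
# Numerical check of Example 1 (standard library only).
from math import comb

n, heads, tails = 12, 9, 3

# Scenario 1: n = 12 flips predetermined, X ~ B(12, theta).
# p-value = P(X >= 9 | theta = 1/2).
p_binomial = sum(comb(n, x) for x in range(heads, n + 1)) / 2**n

# Scenario 2: r = 3 tails predetermined, X ~ NB(3, 1 - theta).
# p-value = P(X >= 9 | theta = 1/2), via the closed-form tail sum
# sum_{x >= k} C(2 + x, 2) (1/2)^x = (8 + 5k + k^2) / 2^k, with k = 9.
k, r = heads, tails
p_negbinomial = (8 + 5 * k + k**2) / 2**k * (1 / 2) ** r

print(f"Scenario 1 (binomial) p-value:          {p_binomial:.4f}")     # 0.0730
print(f"Scenario 2 (negative binomial) p-value: {p_negbinomial:.4f}")  # 0.0327
```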
2 Sufficiency

The sufficiency principle is noncontroversial, and frequentists and Bayesians are in agreement here. If the inference involving the family of distributions and the parameter of interest allows for a sufficient statistic, then the sufficient statistic should be used. This agreement is not philosophical; it is rather a consequence of mathematics (measure-theoretic considerations).

Suppose that the distribution of a random variable $X$ depends on the unknown parameter $\theta$. A statistic $T(X)$ is sufficient if the conditional distribution of $X$ given $T(X) = t$ is free of $\theta$. The Fisher–Neyman factorization lemma states that the likelihood can be represented as

$$\ell(\theta) = f(x|\theta) = f(x) \, g(T(x), \theta).$$

Example. Let $X_1, \ldots, X_n$ be a sample from the uniform $U(0, \theta)$ distribution with density $f(x|\theta) = \frac{1}{\theta} \mathbf{1}(0 \leq x \leq \theta)$. Then

$$\ell(\theta) = \prod_{i=1}^{n} f(X_i|\theta) = \frac{1}{\theta^n} \, \mathbf{1}\left(0 \leq \min_i X_i\right) \, \mathbf{1}\left(\max_i X_i \leq \theta\right).$$

The statistic $T = \max_i X_i$ is sufficient. Here, $f(x) = \mathbf{1}(0 \leq \min_i x_i)$ and $g(T, \theta) = \frac{1}{\theta^n} \mathbf{1}(T \leq \theta)$.

If the likelihood principle is adopted, all inference about $\theta$ should depend on sufficient statistics, since $\ell(\theta) \propto g(T(x), \theta)$.

Sufficiency Principle. Let two different observations $x$ and $y$ have the same value, $T(x) = T(y)$, of a statistic sufficient for the family $f(\cdot|\theta)$. Then the inferences about $\theta$ based on $x$ and $y$ should be the same.
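To make the factorization concrete, here is a minimal Python sketch (an added illustration, not from the handout): for a $U(0, \theta)$ sample, the likelihood depends on the data only through the sample size and $T = \max_i X_i$, so two different samples sharing those values have identical likelihood functions.

```python
# Illustration: the U(0, theta) likelihood l(theta) = theta^{-n} * 1(T <= theta)
# depends on the data only through n and T = max(x_i).
def likelihood(xs, theta):
    n, T = len(xs), max(xs)
    return theta**-n if T <= theta else 0.0

x = [0.2, 0.5, 0.9]  # T = 0.9
y = [0.7, 0.1, 0.9]  # a different sample with the same T = 0.9
for theta in (0.8, 1.0, 1.5, 2.0):
    print(theta, likelihood(x, theta), likelihood(y, theta))  # equal columns
```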
3 Conditionality Perspective

The conditional perspective concerns reporting data-specific measures of accuracy. In contrast to the frequentist approach, the performance of statistical procedures is judged in light of the observed data. The difference between the approaches is illustrated in the following example.

Example 2. Consider estimating $\theta$ in the model

$$P_\theta(X = \theta - 1) = P_\theta(X = \theta + 1) = \frac{1}{2}, \quad \theta \in \mathbb{R},$$

on the basis of two observations, $X_1$ and $X_2$. The suggested procedure is

$$\delta(X) = \begin{cases} \frac{X_1 + X_2}{2}, & \text{if } X_1 \neq X_2 \\ X_1 - 1, & \text{if } X_1 = X_2. \end{cases}$$

To a frequentist, this procedure has confidence 75% for all $\theta$, i.e., $P(\delta(X) = \theta) = 0.75$. The conditionalist would report a confidence of 100% if the observed data in hand are different (easy to check!) or 50% if the observations coincide (see the simulation sketch at the end of this section). Does it make sense to report the pre-experimental accuracy, which is known to be misleading after observing the data?

Conditionality Principle. If an experiment concerning the inference about $\theta$ is chosen from a collection of possible experiments, independently of $\theta$, then any experiment not chosen is irrelevant to the inference.

Example [from Berger (1985), a variant of the Cox (1958) example]. Suppose that a substance to be analyzed is to be sent to one of two labs, one in California and one in New York. The two labs seem equally equipped and qualified, and a coin is flipped to decide which one will be chosen. The coin comes up tails, so the California lab is chosen. After the results are returned and the report is to be written, should the report take into account the fact that the coin did not land heads and that the New York laboratory could have been chosen? Common sense and the conditional viewpoint say NO, but the frequentist approach calls for averaging over all ...
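The contrast in Example 2 is easy to verify by simulation. The following Python sketch (an added illustration, not part of the handout) estimates the unconditional coverage of $\delta$ together with the coverage conditional on whether the two observations coincide.

```python
# Simulation of Example 2: unconditional coverage of delta is 75%,
# but conditionally it is 100% (X1 != X2) or 50% (X1 == X2).
import random

random.seed(1)
theta = 5.0
hits = {True: 0, False: 0}    # keyed by the event X1 != X2
counts = {True: 0, False: 0}

for _ in range(100_000):
    x1 = theta + random.choice((-1, 1))
    x2 = theta + random.choice((-1, 1))
    delta = (x1 + x2) / 2 if x1 != x2 else x1 - 1
    differ = x1 != x2
    counts[differ] += 1
    hits[differ] += (delta == theta)

print("unconditional coverage:", sum(hits.values()) / sum(counts.values()))  # ~0.75
print("coverage | X1 != X2:   ", hits[True] / counts[True])                  # 1.0
print("coverage | X1 == X2:   ", hits[False] / counts[False])                # ~0.5
```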
