ISyE8843A, Brani Vidakovic                                              Handout 2

1 The Likelihood Principle

The likelihood principle concerns the foundations of statistical inference and is often invoked in arguments about correct statistical reasoning. Let $f(x|\theta)$ be a conditional distribution for $X$ given the unknown parameter $\theta$. For the observed data $X = x$, the function $\ell(\theta) = f(x|\theta)$, considered as a function of $\theta$, is called the likelihood function. The name "likelihood" implies that, given $x$, the value $\theta$ is more likely to be the true parameter than $\theta'$ if $f(x|\theta) > f(x|\theta')$.

Likelihood Principle. In the inference about $\theta$, after $x$ is observed, all relevant experimental information is contained in the likelihood function for the observed $x$. Furthermore, two likelihood functions contain the same information about $\theta$ if they are proportional to each other.

Remark. Maximum likelihood estimation does satisfy the likelihood principle.

[Figure 1: Leonard Jimmie Savage. Born November 20, 1917, Detroit, Michigan; died November 1, 1971, New Haven, Connecticut.]

The following example, quoted by Lindley and Phillips (1976), is an argument of Leonard Savage discussed at the Purdue Symposium in 1962. It shows that inference can critically depend on the likelihood principle.

Example 1: Testing fairness. Suppose we are interested in testing $\theta$, the unknown probability of heads for a possibly biased coin. Suppose $H_0: \theta = 1/2$ versus $H_1: \theta > 1/2$. An experiment is conducted, and 9 heads and 3 tails are observed. This information is not sufficient to fully specify the model $f(x|\theta)$. A rashomonian analysis follows.

Scenario 1. The number of flips, $n = 12$, is predetermined. Then the number of heads $X$ is binomial, $\mathcal{B}(n, \theta)$, with probability mass function

$$P(X = x|\theta) = f(x|\theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x}, \qquad f(9|\theta) = \binom{12}{9}\theta^9(1-\theta)^3 = 220\,\theta^9(1-\theta)^3.$$

For a frequentist, the p-value of the test is

$$P(X \ge 9 \mid H_0: \theta = 1/2) = \sum_{x=9}^{12}\binom{12}{x}\Big(\frac12\Big)^x\Big(\frac12\Big)^{12-x} = \frac{220 + 66 + 12 + 1}{2^{12}} = \frac{299}{4096} \approx 0.073,$$

and, if you recall classical testing, $H_0$ is not rejected at the level $\alpha = 0.05$.

Scenario 2. The number of tails (successes), 3, is predetermined; i.e., the flipping is continued until 3 tails are observed. Then $X$, the number of heads (failures) until 3 tails appear, is Negative Binomial, $\mathcal{NB}(3, 1-\theta)$, with

$$f(x|\theta) = \binom{3+x-1}{x}(1-\theta)^3\theta^x, \qquad f(9|\theta) = \binom{11}{9}(1-\theta)^3\theta^9 = 55\,\theta^9(1-\theta)^3.$$

(In general, let $p$ be the probability of success in a trial. The number of failures in a sequence of trials until the $r$-th success is observed is Negative Binomial, $\mathcal{NB}(r, p)$, with probability mass function $P(X = x) = \binom{r+x-1}{x}p^r(1-p)^x$, $x = 0, 1, 2, \dots$. For $r = 1$ the Negative Binomial distribution becomes the Geometric distribution, $\mathcal{NB}(1, p) \equiv \mathcal{G}(p)$.)

For a frequentist, large values of $X$ are critical, and the p-value of the test is

$$P(X \ge 9 \mid H_0) = \sum_{x=9}^{\infty}\binom{x+2}{2}\Big(\frac12\Big)^x\Big(\frac12\Big)^3 = \frac{9^2 + 5\cdot 9 + 8}{2^{12}} = \frac{134}{4096} \approx 0.0327,$$

since $\sum_{x \ge k}\binom{x+2}{2}2^{-x} = (k^2 + 5k + 8)\,2^{-k}$. The hypothesis $H_0$ is now rejected, and this change in decision is not caused by the observations.

According to the likelihood principle, all relevant information is in the likelihood $\ell(\theta) \propto \theta^9(1-\theta)^3$, and Bayesians could not agree more. Edwards, Lindman, and Savage (1963, p. 193) note: "The likelihood principle emphasized in Bayesian statistics implies, among other things, that the rules governing when data collection stops are irrelevant to data interpretation. It is entirely appropriate to collect data until a point has been proven or disproven, or until the data collector runs out of time, money, or patience."
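The two p-values in Example 1 are easy to verify numerically. Below is a minimal sketch, assuming SciPy is available (the variable names are illustrative); it also checks that the two likelihoods are proportional, so under the likelihood principle the two scenarios carry the same information about $\theta$.

```python
# Numerical check of Example 1: same data, two stopping rules,
# two different p-values (assumes SciPy is installed).
from scipy.stats import binom, nbinom

theta0 = 0.5  # H0: theta = 1/2

# Scenario 1: n = 12 flips fixed; X = number of heads ~ B(12, theta).
p1 = binom.sf(8, 12, theta0)       # P(X >= 9 | H0) = 299/4096 ~ 0.073

# Scenario 2: flip until 3 tails; X = number of heads ~ NB(3, 1 - theta).
# SciPy's nbinom counts failures (heads) before the r-th success (tail).
p2 = nbinom.sf(8, 3, 1 - theta0)   # P(X >= 9 | H0) = 134/4096 ~ 0.0327

print(f"binomial p-value:          {p1:.4f}")
print(f"negative binomial p-value: {p2:.4f}")

# The two likelihoods are proportional, with constant ratio 220/55 = 4,
# so they contain the same information about theta.
for theta in (0.3, 0.5, 0.8):
    ratio = binom.pmf(9, 12, theta) / nbinom.pmf(9, 3, 1 - theta)
    print(f"theta = {theta}: likelihood ratio = {ratio:.1f}")
```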
2 Sufficiency

The sufficiency principle is noncontroversial, and frequentists and Bayesians are in agreement: if the inference involving the family of distributions and the parameter of interest allows for a sufficient statistic, then the sufficient statistic should be used. This agreement is not philosophical; it is rather a consequence of mathematics (measure-theoretic considerations).

Suppose that the distribution of a random variable $X$ depends on the unknown parameter $\theta$. A statistic $T(X)$ is sufficient if the conditional distribution of $X$ given $T(X) = t$ is free of $\theta$. The Fisher-Neyman factorization lemma states that the likelihood can then be represented as

$$f(x|\theta) = f(x)\,g(T(x)|\theta).$$

Example. Let $X_1, \dots, X_n$ be a sample from the uniform $\mathcal{U}(0, \theta)$ distribution with density $f(x|\theta) = \frac{1}{\theta}\mathbf{1}(0 \le x \le \theta)$. Then

$$\prod_{i=1}^{n} f(X_i|\theta) = \frac{1}{\theta^n}\,\mathbf{1}\Big(0 \le \min_i X_i\Big)\,\mathbf{1}\Big(\max_i X_i \le \theta\Big).$$

The statistic $T = \max_i X_i$ is sufficient. Here $f(x) = \mathbf{1}(0 \le \min_i x_i)$ and $g(T|\theta) = \theta^{-n}\mathbf{1}(T \le \theta)$. If the likelihood principle is adopted, all inference about $\theta$ should depend on sufficient statistics, since $\ell(\theta) \propto g(T(x)|\theta)$.

Sufficiency Principle. Let two different observations $x$ and $y$ have the same value, $T(x) = T(y)$, of a statistic sufficient for the family $f(\cdot|\theta)$. Then the inferences about $\theta$ based on $x$ and $y$ should be the same.

3 Conditionality Perspective

The conditional perspective concerns reporting data-specific measures of accuracy: in contrast to the frequentist approach, the performance of statistical procedures is judged by looking at the observed data. The difference in approach is illustrated in the following example (a simulation sketch appears at the end of this handout).

Example 2. Consider estimating $\theta$ in the model

$$P_\theta(X = \theta - 1) = P_\theta(X = \theta + 1) = \frac12, \qquad \theta \in \mathbb{R},$$

on the basis of two observations, $X_1$ and $X_2$. The procedure suggested is

$$\delta(X) = \begin{cases} \dfrac{X_1 + X_2}{2}, & \text{if } X_1 \ne X_2,\\[4pt] X_1 - 1, & \text{if } X_1 = X_2. \end{cases}$$

To a frequentist, this procedure has confidence of 75% for all $\theta$, i.e., $P_\theta(\delta(X) = \theta) = 0.75$. The conditionalist would report a confidence of 100% if the observed data in hand are different (easy to check!) or 50% if the observations coincide. Does it make sense to report the preexperimental accuracy, which is known to be misleading after observing the data?

Conditionality Principle. If an experiment concerning the inference about $\theta$ is chosen from a collection of possible experiments, independently of $\theta$, then any experiment not chosen is irrelevant to the inference.

Example. From Berger (1985), a variant of Cox's (1958) example. Suppose that a substance to be analyzed is to be sent to one of two labs, one in California and one in New York. The two labs seem equally equipped and qualified, and a coin is flipped to decide which one will be chosen. The coin comes up tails, denoting that the California lab is to be chosen. After the results are returned and the report is to be written, should the report take into account the fact that the coin did not land heads and that the New York laboratory could have been chosen? Common sense and the conditional viewpoint say NO, but the frequentist approach calls for averaging over all possible data, even the possible New York data.

The conditionality principle makes clear the implication of the likelihood principle that any inference should depend only on the outcome observed, and not on any other outcome we might have observed; it thus sharply contrasts with likelihood inference in the Neyman-Pearson, or more generally the frequentist, approach. In particular, questions of unbiasedness, minimum variance, and risk consistency, the whole apparatus of confidence intervals, significance levels, power of tests, etc., violate the conditionality ...
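The coverage figures in Example 2 can be reproduced empirically. Below is a minimal simulation sketch, assuming NumPy is available; the choice $\theta = 0$ is an arbitrary illustration, since the coverage does not depend on $\theta$.

```python
# Simulation of Example 2: unconditional vs. conditional coverage of
# delta(X) (assumes NumPy; theta = 0 is an arbitrary illustrative value).
import numpy as np

rng = np.random.default_rng(0)
theta, n = 0.0, 200_000

# Each observation equals theta - 1 or theta + 1 with probability 1/2.
x1 = theta + rng.choice([-1.0, 1.0], size=n)
x2 = theta + rng.choice([-1.0, 1.0], size=n)

# The suggested procedure: average if the observations differ, X1 - 1 otherwise.
delta = np.where(x1 != x2, (x1 + x2) / 2, x1 - 1)
hit = delta == theta

print(f"overall (preexperimental) coverage: {hit.mean():.3f}")           # ~0.75
print(f"coverage given X1 != X2:            {hit[x1 != x2].mean():.3f}") # 1.000
print(f"coverage given X1 == X2:            {hit[x1 == x2].mean():.3f}") # ~0.50
```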

