"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 NM Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic Models and Likelihood A model is a mathematical formula which gives you the probability of obtaining a certain result. For example imagine a coin; the model is that the coin has two sides and each side has an equal probability of showing up on any toss. Therefore the probability of tossing heads is 0.50. Models often have parameters, these are numerical variables that can take different values. Let's imagine that our coin does not have a 50% chance of turning up heads, but instead that the coin has probability α of turning up heads; α is now the only parameter in our coin flipping model. To calculate the probability of multiple independent outcomes of our model we multiply the probability of each outcome together. For example the probability of getting heads on the first flip and tails on the second flip would be α(1-α). In fact we can write a general formula for any combination of heads and tails: THTHP )1()|,(ααα−∝ Often in the literature the distinction between a model and it's parameters is not entirely clear. Sometimes “model” refers to just the formula without the parameter values plugged in and some times “model” refers to the formula with a specific value for all the parameters. As a practical matter, the formula is usually held constant, and what we actually consider are different possible values for the parameters. Thus when we talk about the likelihoods of different models we are actually talking abut the likelihoods of different sets of parameter values. The cases where we actually consider different formulas are called Model Tests and we will deal with that at the end of the lecture. Generally speaking I will use the word “model” to mean both the model and the parameters; until we start talking about hypothesis testing, then “model” will refer to just the formula. The likelihood of a model is the probability of the data given the model. Likelihood of Model and Parameters = P(Data | Model (Parameters) ) Up until now we have talked about the probabilities of outcomes given a model. However, what we are really interested in is picking the correct model. Calculating the probability of the model is not a straight forward business, but calculating the likelihood is relatively easy. It is important to distinguish between probabilities and likelihoods. The probabilities of all the different possible outcomes of a model must add up to 1. On the other hand the likelihoods of all the different possible models to explain a set of data do not have to add up to one. In fact the sum of the likelihoods will often have dimensions, a bad property for a probability.Frequentist vs Bayesian Perspectives on Inference The probability of a model given the data is called the posterior probability, and there is a close relationship between the posterior probability of a model and its likelihood that flows from some basic probability math: P(A&B)=P(A|B)P(B) P(A&B)=P(B|A)P(A) P(Model|Data)P(Data)=P(Data|Model)P(Model) )()|()()|()|()|(221121MPMDataPMPMDataPDataMDataMP=P Maximum Likelihood relies on this relationship to conclude that if one model has a higher likelihood, then it should also have a higher posterior probability. Therefor the model with the highest likelihood should also have the highest posterior probability. 
Many common statistics, such as the mean as the estimate of the peak of a normal distribution, are really Maximum Likelihood conclusions. Bayesian statistics, on the other hand, maintains that you can in fact calculate the posterior probability of each model using the formula below:

P(M0 | Data) = P(Data | M0) P(M0) / P(Data) = P(Data | M0) P(M0) / Σ_i P(Data | Mi) P(Mi)

There are two things to note about this formula. First, the denominator is calculated as a sum over all the models. This can be very cumbersome to calculate, especially if there are many possible models; I will explain how this is dealt with shortly. The more controversial part of this formula is P(M), the prior probability of the model. The idea is that this represents your belief about the probability of a model before you consider the data. Determining what this value should be can be very controversial. One option is to choose an uninformative or flat prior, for which every model has equal probability; this is not always as straightforward as it appears. Another option is to choose an informative or strong prior, which contains information about the world that you have before investigating your data. To return to our coin example, we have a very strong prior belief that α = 0.50. We have a lot of experience with coin flipping, and coins seem to turn up heads about half the time. Furthermore there are good a priori reasons, based on our intuitive understanding of physics, to believe that each side has an equal chance of turning up.

Below I show plots of likelihoods and posterior probabilities with both a flat and a strong prior. The data for these plots are derived from a "coin" with α = 0.7. There are several important things to note about these plots:

1) Although the posterior probabilities and the likelihoods have different scales, the shape of the likelihood plot and the posterior probability plot with flat priors are identical.

2) With lots of data, all the methods do a good job of estimating α, and they all do a pretty poor job with only a little data.

3) With intermediate amounts of data the maximum likelihood estimate of α is much closer to the actual value of α than the Bayesian estimate with strong priors. This can be seen in two ways. On face value it argues for the ML estimate. However, the Bayesian would argue that you really do have a good reason to believe that α is closer to 0.5, and a hundred coin flips should hardly affect your opinion that much.

But wasn't our prior a little arbitrary? Sure, we can agree that most coins have a 50% chance of turning up heads; but why a normal distribution, and how did you pick the variance? If you went out in the world and sampled millions of coins you could estimate what the prior distribution really is, but who's gonna do that? We could go on like this forever, but there is an even more fundamental difference between what is called the Frequentist and ...
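The grid-based version of this calculation is easy to sketch. The rough Python illustration below (my own addition, not from the notes; the prior's shape and width, the grid, and the sample sizes are all arbitrary assumptions) discretizes α so that the denominator of the formula above becomes a simple sum, and then compares the posterior mode under a flat prior with the posterior mode under a strong prior concentrated near 0.5, using data simulated from a "coin" with α = 0.7:

    import numpy as np

    rng = np.random.default_rng(0)
    true_alpha = 0.7

    # Discretize alpha so P(Data) is just a sum over the candidate models
    alphas = np.linspace(0.001, 0.999, 999)

    # Flat prior: every candidate value of alpha is equally probable
    flat_prior = np.full_like(alphas, 1.0 / alphas.size)

    # Strong prior: a normal-shaped bump centered on 0.5 (the width is an arbitrary choice)
    strong_prior = np.exp(-0.5 * ((alphas - 0.5) / 0.05) ** 2)
    strong_prior /= strong_prior.sum()

    for n_flips in (10, 100, 1000):                  # a little, an intermediate amount, and lots of data
        flips = rng.random(n_flips) < true_alpha     # simulated coin flips
        H = int(flips.sum())
        T = n_flips - H
        likelihood = alphas**H * (1 - alphas)**T     # P(Data | alpha)
        for name, prior in (("flat", flat_prior), ("strong", strong_prior)):
            # Bayes' formula: posterior = likelihood * prior / sum_i(likelihood_i * prior_i)
            posterior = likelihood * prior
            posterior /= posterior.sum()
            estimate = alphas[np.argmax(posterior)]
            print(f"{n_flips:4d} flips, {name:6s} prior -> posterior mode {estimate:.2f}")

With only a few flips the estimates are unreliable; with an intermediate amount of data the strong prior pulls the estimate back toward 0.5; and with enough data both priors give estimates close to the true value, mirroring the observations about the plots above.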

