Probability
Bret Larget
Department of Statistics, University of Wisconsin-Madison
September 9, 2004
Statistics 371, Fall 2004

Probability and Biology

Probability comes up in everyday life:
- predicting the weather;
- lotteries or sports betting;
- strategies for card games;
- understanding risks of passing genetic diseases to children;
- assessing your own risks of diseases associated in part with genetic causes.

Why should we know something about probability?
- Some biological processes seem to be directly affected by chance outcomes. Examples include formation of gametes and occurrence of genetic mutations.
- Formal statistical analysis of biological data assumes that variation not explained by measured variables is caused by chance.
- Chance might be used in the design of an experiment, such as the random allocation of treatments or random sampling of individuals.
- Probability is the language with which we express and interpret assessment of uncertainty in a formal statistical analysis.
- Formal statistical analysis depends on modeling observed data as the realization of a random process.

Random Sampling

Most of the formal methods of statistical inference we will use in this class are based on the assumption that the individual units in the sample are sampled at random from the population of interest. Ignore for the present that, in practice, individuals are almost never sampled at random, in a very formal sense, from the population of interest.

Taking a simple random sample of size n is equivalent to the process of:
1. representing every individual from a population with a single ticket;
2. putting the tickets into a large box;
3. mixing the tickets thoroughly;
4. drawing out n tickets without replacement.

Stratified random sampling and cluster sampling are examples of random sampling processes that are not simple. Data analysis for these types of sampling strategies goes beyond the scope of this course.

Simple Random Sampling

The defining characteristic of the process of simple random sampling is that every possible sample of size n has the same chance of being selected. In particular, this means that (a) every individual has the same chance of being included in the sample, and (b) members of the sample are chosen independently of each other.

Note that point (a) above is insufficient to define a simple random sample. As an example, consider sampling one couple at random from a set of ten couples. Each person would have a one in ten chance of being in the sample, but the sampling is not independent. Possible samples of two people from the population who are not in a couple have no chance of being sampled, while each couple has a one in ten chance of being sampled.

Using R to Take a Random Sample

Suppose that you have a numbered set of individuals, numbered from 1 to 98, and that you want to sample ten of these. Here is some R code that will do just that.

    sample(1:98, 10)
     [1] 19 74  3 51 70 75 14 31 76 86

In the sample function, the first argument is the set from which to sample, in this case the integers from 1 to 98, and the second argument is the sample size. In the output, the [1] is R's way of saying that that row of output begins with the first element. The same code executed again results in a different random sample.

Inference from Samples to Populations

Statistical inference involves making statements about populations on the basis of analysis of sampled data. The simple random sampling model is useful because it allows precise mathematical description of the random distribution of the discrepancy between statistical estimates and population parameters. This discrepancy is known as chance error due to random sampling.

When using the random sampling model, it is important to ask: what is the population to which the results will be generalized? The use of statistical methods that assume random sampling on data that are not collected as a random sample is prone to sampling bias, in which individuals do not have the same chance of being sampled. Sampling bias can lead to incorrect statistical inferences because the sample is unrepresentative of the population in important ways.

Probability

Probability is a numerical measure of the likelihood of an event. Probabilities are always between 0 and 1, inclusive.

Notation: The probability of an event E is written Pr{E}.

Examples:
- If a fair coin is tossed, the probability of a head is Pr{Heads} = 0.5.
- If a bucket contains 34 white balls and 66 red balls and a ball is drawn at random, the probability that the drawn ball is white is Pr{white} = 34/100 = 0.34.

Interpretations of Probability

The frequency interpretation of probability defines the probability of an event E as the relative frequency with which event E would occur in an indefinitely long sequence of independent

Examples of Interpretations of Probability

Coin tossing: it is reasonable to consider tossing a coin many times, where each coin toss can be thought of as a repetition of the same basic chance operation. The probability of heads can be thought of as the long-run relative frequency of heads.

Packer football: the outcome of the next Packer game is uncertain, but it is less reasonable to think about the outcome (Packers win, lose, or tie) as something that could be repeated indefinitely. The long-run relative frequency interpretation of probability does not allow for an interpretation of the probability of an event that will occur only once.

Evolution: the statement "molluscs form a monophyletic group" means that all living individuals classified as molluscs have a common ancestor that is not an ancestor of any nonmolluscs. It is uncertain whether or not this statement is true.

Comparing Bayesian and Frequentist Approaches

A Bayesian approach to statistical inference allows one to quantify uncertainty in a statement with a probability and describes how to update the probability in light of new data.

A frequency approach to statistical inference does not allow direct quantification of uncertainty with probabilities for events that happen only once. A frequentist approach would ask instead: if I assume that the hypothesis is true, how likely is an observed outcome? If the probability of the observed outcome is low enough relative to some alternative, this would be seen as evidence against the hypothesis.
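The long-run relative frequency idea described for coin tossing can be illustrated with a small simulation. The following is only a sketch, not part of the original slides: the seed, the number of tosses, and the variable names are assumptions chosen for illustration. It uses the same sample function, this time with replacement, and tracks the proportion of heads as the number of tosses grows.

```r
# Sketch: running relative frequency of heads for a simulated fair coin.
# The seed and the number of tosses are illustrative assumptions.
set.seed(371)
n_tosses <- 10000

# Each toss is an independent draw from {H, T} with equal chance.
tosses <- sample(c("H", "T"), size = n_tosses, replace = TRUE)

# Proportion of heads among the first 1, 2, ..., n_tosses tosses.
running_freq <- cumsum(tosses == "H") / seq_len(n_tosses)

# Inspect the proportion after 10, 100, 1000, and 10000 tosses.
print(running_freq[c(10, 100, 1000, 10000)])
```

With more tosses the printed proportions should tend to settle near 0.5, although any particular run will differ slightly.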
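The couples example under Simple Random Sampling can be checked by listing every possible sample of two people. This is a hedged sketch rather than anything from the original slides: the labeling of people 1 to 20 (with persons 2k-1 and 2k forming couple k) and all variable names are assumptions made for illustration. It compares a simple random sample of size two with the scheme that picks one couple at random.

```r
# Sketch: equal inclusion chances do not imply a simple random sample.
# Assumed labeling: persons 1..20, where persons 2k-1 and 2k form couple k.
people <- 1:20
couple_of <- rep(1:10, each = 2)

# Every possible sample of two people; each column is one sample.
pairs <- combn(people, 2)
n_pairs <- ncol(pairs)

# TRUE for the samples whose two members belong to the same couple.
is_couple <- couple_of[pairs[1, ]] == couple_of[pairs[2, ]]

# Scheme A: simple random sample of size 2, every pair equally likely.
p_srs <- rep(1 / n_pairs, n_pairs)

# Scheme B: choose one of the ten couples at random.
p_couple <- ifelse(is_couple, 1 / 10, 0)

# Chance that person 1 is included in the sample under each scheme.
has_person1 <- pairs[1, ] == 1 | pairs[2, ] == 1
incl_srs <- sum(p_srs[has_person1])
incl_couple <- sum(p_couple[has_person1])
```

There are 190 possible pairs. Person 1 has a one in ten chance of inclusion under both schemes, yet under the couple scheme 180 of the 190 pairs have no chance of being selected, so it is not a simple random sample.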