Probability
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin–Madison
Statistics 371
15th September 2005

Probability and Biology
Question: Why should biologists know about probability?
Answer 1: Some biological processes seem to be directly affected by chance outcomes. Examples:
- formation of gametes
- recombination events
- occurrence of genetic mutations
Answer 2: Formal statistical analysis of biological data models unexplained variation as caused by chance.
Answer 3: In many designed experiments, probability is used for
- the random allocation of treatments, or
- the random sampling of individuals.
Answer 4: Probability is the language with which we express and interpret assessments of uncertainty in a formal statistical analysis.
Answer 5: Probability comes up in everyday life:
- predicting the weather
- gambling
- strategies for games
- understanding the risks of passing genetic diseases to children
- assessing your own risk of disease associated in part with genetic causes

Random Sampling
- The formal methods of statistical inference taught in this course assume random sampling from the population of interest.
- Ignore for the present that, in practice, individuals are almost never sampled at random, in a very formal sense, from the population of interest.

Simple Random Samples
The process of taking a simple random sample of size n is equivalent to
1. representing every individual from a population with a single ticket;
2. putting the tickets into a large box;
3. mixing the tickets thoroughly;
4. drawing out n tickets without replacement.

Simple Random Sampling
Definition: A simple random sample of size n is a random sample taken so that every possible sample of size n has the same chance of being selected.
In a simple random sample:
- every individual has the same chance of being included in the sample;
- every pair of individuals has the same chance of being included in the sample;
- in fact, every set of k individuals has the same chance of being included in the sample.

Insufficient Criterion for SRS
- The condition that every individual has the same chance of being included in the sample is insufficient to imply a simple random sample.
- For example, consider sampling one couple at random from a set of ten couples:
  1. Each person would have a one in ten chance of being in the sample.
  2. However, each possible set of two people does not have the same chance of being sampled:
  3. pairs of people from the population who are not coupled have no chance of being sampled,
  4. while each pair of people in a couple has a one in ten chance of being sampled.

Other Random Sampling Strategies
- Stratified random sampling and cluster sampling are examples of random sampling processes that are not simple.
- Data analysis for these types of sampling strategies goes beyond the scope of this course.

Using R to Take a Random Sample
Suppose that you have a set of individuals numbered from 1 to 104 and that you want to sample ten of them. Here is some R code that will do just that:

```r
> sample(1:104, 10)
 [1]   9  11  55 100  67  62  68  25  19  54
```

- The first argument is the set from which to sample, in this case the integers from 1 to 104.
- The second argument is the sample size.
- The [1] is R's way of saying that that row of output begins with the first element.
- Executing the same R code again results in a different random sample.

Inference from Samples to Populations
- Statistical inference involves making statements about populations on the basis of analysis of sampled data.
- The simple random sampling model is useful because it allows a precise mathematical description of the random distribution of the discrepancy between statistical estimates and population parameters.
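The discrepancy between an estimate and a population parameter can also be seen by simulation. The R sketch below is illustrative only: the population of 104 measurements is hypothetical (generated with `rnorm()`, not real data), the seed, the mean of 50, the standard deviation of 10, and the 10,000 repetitions are arbitrary choices, and the sample mean stands in as the example estimate. Each repetition draws a simple random sample of size 10 with `sample()`, which draws without replacement by default, like tickets from the box.

```r
# Sketch only: repeated simple random samples from a HYPOTHETICAL population,
# recording how far each sample mean lands from the population mean.
set.seed(371)                       # arbitrary seed so the run is reproducible

N <- 104                            # population size (matches the sample(1:104, 10) example)
n <- 10                             # sample size
population <- rnorm(N, mean = 50, sd = 10)  # hypothetical measurements, not real data
mu <- mean(population)              # population parameter: fixed once the population exists

n_reps <- 10000                     # number of repeated samples (arbitrary)
estimates <- replicate(n_reps, {
  ids <- sample(1:N, n)             # SRS: n tickets drawn without replacement
  mean(population[ids])             # the estimate computed from this one sample
})

discrepancy <- estimates - mu       # estimate minus parameter, one value per sample
summary(discrepancy)                # centred near 0, but individual values vary

# Standard result for SRS without replacement: the SD of the sample mean is
# sqrt(sigma2 / n * (N - n) / (N - 1)), with sigma2 the population variance (divisor N).
sigma2 <- mean((population - mu)^2)
se_theory <- sqrt(sigma2 / n * (N - n) / (N - 1))
c(simulated = sd(discrepancy), theoretical = se_theory)
```

The simulated discrepancies centre on zero, yet any single sample mean can miss the population mean by a noticeable amount. The last lines compare their spread with the finite-population formula for the standard deviation of a sample mean; with many repetitions the two values should agree closely. Removing `set.seed()` gives different samples on every run, just as re-running `sample(1:104, 10)` does.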
The discrepancy between a statistical estimate and the population parameter it estimates is known as chance error due to random sampling. When using the random sampling model, it is important to ask: what is the population to which the results will be generalized?

Sampling Bias
- Using methods based on random sampling on data not collected as a random sample is prone to sampling bias, in which individuals do not have the same chance of being sampled.
- Sampling bias can lead to incorrect statistical inferences because the sample is unrepresentative of the population in important ways.

Random Experiments
Definition: A random experiment is a process with outcomes that are uncertain.
Example: Rolling a single six-sided die once. The outcome (which number lands on top) is uncertain before the die roll.

Outcome Space
Definition: The outcome space is the set of possible simple outcomes from a random experiment.
Example: In a single die roll, the set of possible outcomes is {1, 2, 3, 4, 5, 6}.

Events
Definition: An event is a set of possible outcomes.
Example: In a single die roll, possible events include:
- A = the die roll is even
- B = the die roll is a 6
- C = the die roll is 4 or less

Probability
Definition: The probability of an event E, denoted P(E), is a numerical measure between 0 and 1 that represents the likelihood of the event E in some probability model. Probabilities assigned to events must follow a number of rules.
Example: The probability P(the die roll is a 6) equals 1/6 under a probability model that gives equal probability to each possible result, but it could be different under a different model.

Examples
- If a fair coin is tossed, the probability of a head is P(Heads) = 0.5.
- If a bucket contains 34 white balls and 66 red balls and a ball is drawn uniformly at random, the probability that the drawn ball is white is P(white) = 34/100 = 0.34.

Frequentist Interpretation of Probability
The frequentist interpretation of probability defines the probability of an event E as the relative frequency with which event E would occur in an indefinitely long sequence of independent repetitions of a chance operation.

Subjective Interpretation of Probability
A subjective interpretation of probability defines probability as an individual's degree of belief in the likelihood of an outcome. This school of thought allows the use of probability to discuss events that are not hypothetically repeatable.

Frequentist Statistics
- The textbook follows a frequency interpretation of probability.
- Frequentist methods treat population parameters as fixed but unknown.

Bayesian Statistics
- Statistical methods based on