Lecture 15: Categorical data and chi-square tests

• Continuous variable: height, weight, gene expression level, lethal dosage of an anticancer compound, etc. (ordinal: the values have a natural order)
• Categorical variable: sex, profession, political party, blood type, eye color, phenotype, genotype
• Questions: Does smoking cause lung cancer? Do smokers have a higher lung cancer rate?
• Do the four nucleotides, A, T, G, C, occur with equal probability?

• Sample space: the set of possible basic outcomes
• To study categorical variables, the first step is to know what the categories are.
• Face of a coin: head, tail
• Face of a die: 1, 2, 3, 4, 5, 6
• Nucleotide: A, T, G, C
• Sex: male, female
• Blood type: A, B, O, AB

The set of possible outcomes of a categorical variable forms a sample space. When two categorical variables are involved, the sample space is the set of all possible combinations of their outcomes.

Subjective probability and assumption of independence
• Symmetry: if two outcomes are deemed symmetrical, they should be assigned equal probabilities.
• The probabilities of all outcomes sum to 1.
• If two variables are independent, the probability of a combination of outcomes is the product of the individual probabilities.
• Statistical questions: Can symmetry be assumed? Can independence be assumed?
• Solution: collect data and conduct a chi-square test.

Examples
• A random sample of 100 nucleotides is obtained: 24 A, 21 T, 30 G, 25 C.
• Are the data compatible with the assumption of equal occurrence?
• Suppose G and C are mixed up by error, so we have 24 A, 21 T, 55 G/C. What is the answer then?
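The symmetry and independence rules above can be illustrated with the coin and die sample spaces from the lecture. The following is a minimal sketch, not part of the original slides: the helper names `symmetric_probs` and `joint_probs_if_independent` are hypothetical. It assigns equal probabilities to symmetric outcomes, builds the joint sample space of all (coin, die) combinations, and multiplies the marginal probabilities, which is valid only under the assumption that the two variables are independent.

```python
from fractions import Fraction
from itertools import product

# Sample spaces taken from the lecture examples.
coin = ["head", "tail"]
die = [1, 2, 3, 4, 5, 6]


def symmetric_probs(outcomes):
    """Symmetry: give every outcome the same probability, so they sum to 1."""
    return {o: Fraction(1, len(outcomes)) for o in outcomes}


def joint_probs_if_independent(p_x, p_y):
    """Assuming independence: P(x, y) = P(x) * P(y) for every combination."""
    return {
        (x, y): px * py
        for (x, px), (y, py) in product(p_x.items(), p_y.items())
    }


joint = joint_probs_if_independent(symmetric_probs(coin), symmetric_probs(die))
print(len(joint))           # 12 combinations in the joint sample space
print(joint[("head", 3)])   # 1/12
print(sum(joint.values()))  # 1
```

Exact `Fraction` arithmetic is used so that the "probabilities sum to 1" rule can be checked without floating-point rounding.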
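For the nucleotide examples, one common way to answer "are the data compatible with equal occurrence?" is Pearson's chi-square goodness-of-fit test: compute the sum over categories of (O − E)²/E and compare it with a chi-square distribution with k − 1 degrees of freedom (k categories, no parameters estimated from the data). The sketch below assumes that setup. The helper names are hypothetical, and the upper-tail probability is computed with a small stdlib-only recurrence for integer degrees of freedom. With SciPy available, `scipy.stats.chisquare(f_obs, f_exp)` returns the same statistic and p-value. In the mixed-up case, equal occurrence of the four nucleotides implies probabilities 1/4, 1/4, 1/2 for A, T, G/C.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df.

    Starts from Q(x; 0) = 0 (even df) or Q(x; 1) = erfc(sqrt(x/2)) (odd df) and
    applies Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df % 2 == 0:
        q, k = 0.0, 0
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def goodness_of_fit(observed, probs):
    """Chi-square test of observed counts against hypothesized probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = chi_square_statistic(observed, expected)
    df = len(observed) - 1  # no parameters estimated from the data
    return stat, df, chi2_sf(stat, df)


# 100 nucleotides, A, T, G, C; equal occurrence means 1/4 each.
stat1, df1, p1 = goodness_of_fit([24, 21, 30, 25], [0.25] * 4)
# G and C mixed up: A, T, G/C with probabilities 1/4, 1/4, 1/2.
stat2, df2, p2 = goodness_of_fit([24, 21, 55], [0.25, 0.25, 0.5])

print(f"{stat1:.2f} {df1} {p1:.2f}")  # 1.68 3 0.64
print(f"{stat2:.2f} {df2} {p2:.2f}")  # 1.18 2 0.55
```

Both p-values are well above 0.05, so under these assumptions neither version of the data gives evidence at the 5% level against equal occurrence. Merging G and C changes the statistic and the degrees of freedom (from 3 to 2), because the test now sees only three categories.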