MIT 18.443 - Goodness-of-fit for composite hypotheses

Section 11. Goodness-of-fit for composite hypotheses.

Example. Let us consider a Matlab example. Let us generate 50 observations from N(1, 2) (normrnd takes the standard deviation as its second argument, so the standard deviation here is 2):

    X = normrnd(1,2,50,1);

Then running a chi-squared goodness-of-fit test 'chi2gof'

    [H,P,STATS] = chi2gof(X)

outputs

    H = 0, P = 0.8793
    STATS = chi2stat: 0.6742
            df: 3
            edges: [-3.7292 -0.9249 0.0099 0.9447 1.8795 2.8142 5.6186]
            O: [8 7 8 8 9 10]
            E: [8.7743 7.0639 8.7464 8.8284 7.2645 9.3226]

The test accepts the hypothesis that the data is normal. Notice, however, that something is different. Matlab grouped the data into 6 intervals, so the chi-squared test from the previous lecture should have $r - 1 = 6 - 1 = 5$ degrees of freedom, but we have df = 3. The difference is that now our hypothesis is not that the data comes from a particular given distribution but that the data comes from a family of distributions, which is called a composite hypothesis. Running

    [H,P,STATS] = chi2gof(X,'cdf',@(z)normcdf(z,mean(X),std(X,1)))

would test a simple hypothesis that the data comes from a particular normal distribution $N(\hat\mu, \hat\sigma^2)$, and the output

    H = 0, P = 0.9838
    STATS = chi2stat: 0.6842
            df: 5
            edges: [-3.7292 -0.9249 0.0099 0.9447 1.8795 2.8142 5.6186]
            O: [8 7 8 8 9 10]
            E: [8.6525 7.0995 8.8282 8.9127 7.3053 9.2017]

has df = 5. However, we cannot use this test, because we estimate the parameters $\hat\mu$ and $\hat\sigma^2$ of this distribution using the data. This is therefore not a particular given distribution; in fact, it is the distribution that fits the data the best, so the statistic $T$ in Pearson's theorem will behave differently.

Let us start with a discrete case, when a random variable takes a finite number of values $B_1, \ldots, B_r$ with probabilities

$$p_1 = P(X = B_1), \ldots, p_r = P(X = B_r).$$

We would like to test the hypothesis that this distribution comes from a family of distributions $\{P_\theta : \theta \in \Theta\}$. In other words, if we denote

$$p_j(\theta) = P_\theta(X = B_j),$$

we want to test

$$H_0 : p_j = p_j(\theta) \text{ for all } j \le r \text{ for some } \theta \in \Theta, \qquad H_1 : \text{otherwise}.$$

If we wanted to test $H_0$ for one particular fixed $\theta$, we could use the statistic

$$T = \sum_{j=1}^{r} \frac{(\nu_j - n p_j(\theta))^2}{n p_j(\theta)},$$

where $\nu_j$ is the number of observations equal to $B_j$, and use a simple chi-squared goodness-of-fit test. The situation now is more complicated, because we want to test whether $p_j = p_j(\theta)$, $j \le r$, at least for some $\theta \in \Theta$,
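which means that we have many candidates for $\theta$. One way to approach this problem is as follows.

(Step 1) Assuming that hypothesis $H_0$ holds, i.e. $P = P_\theta$ for some $\theta \in \Theta$, we can find an estimate $\theta^*$ of this unknown $\theta$, and then

(Step 2) try to test whether the distribution $P$ is indeed equal to $P_{\theta^*}$ by using the statistic

$$T = \sum_{j=1}^{r} \frac{(\nu_j - n p_j(\theta^*))^2}{n p_j(\theta^*)}$$

in the chi-squared goodness-of-fit test.

This approach looks natural; the only question is what estimate $\theta^*$ to use and how the fact that $\theta^*$ also depends on the data will affect the convergence of $T$. It turns out that if we let $\theta^*$ be the maximum likelihood estimate, i.e. the $\theta$ that maximizes the likelihood function

$$\varphi(\theta) = p_1(\theta)^{\nu_1} \cdots p_r(\theta)^{\nu_r},$$

then the statistic

$$T = \sum_{j=1}^{r} \frac{(\nu_j - n p_j(\theta^*))^2}{n p_j(\theta^*)} \xrightarrow{d} \chi^2_{r-s-1} \qquad (11.0.1)$$

converges to the $\chi^2_{r-s-1}$ distribution with $r - s - 1$ degrees of freedom, where $s$ is the dimension of the parameter set $\Theta$. Of course, here we assume that $s \le r - 2$, so that we have at least one degree of freedom. Very informally, by dimension we understand the number of free parameters that describe the set

$$\{(p_1(\theta), \ldots, p_r(\theta)) : \theta \in \Theta\}.$$

Then the decision rule will be

$$\delta = \begin{cases} H_0 : & T \le c \\ H_1 : & T > c \end{cases}$$

where the threshold $c$ is determined from the condition

$$P(\delta \ne H_0 \mid H_0) = P(T > c \mid H_0) \approx \chi^2_{r-s-1}\big((c, \infty)\big) = \alpha,$$

where $\alpha \in [0, 1]$ is the level of significance.

Example 1. Suppose that a gene has two possible alleles $A_1$ and $A_2$, and the combinations of these alleles define three genotypes $A_1A_1$, $A_1A_2$ and $A_2A_2$. We want to test a theory that

    Probability to pass A_1 to a child = $\theta$,
    Probability to pass A_2 to a child = $1 - \theta$,

and that the probabilities of genotypes are given by

$$p_1(\theta) = P(A_1A_1) = \theta^2, \quad p_2(\theta) = P(A_1A_2) = 2\theta(1-\theta), \quad p_3(\theta) = P(A_2A_2) = (1-\theta)^2. \qquad (11.0.2)$$

Suppose that, given a random sample $X_1, \ldots, X_n$ from the population, the counts of each genotype are $\nu_1$, $\nu_2$ and $\nu_3$. To test the theory we want to test the hypothesis

$$H_0 : p_1 = p_1(\theta),\ p_2 = p_2(\theta),\ p_3 = p_3(\theta) \text{ for some } \theta \in [0, 1], \qquad H_1 : \text{otherwise}.$$

First of all, the dimension of the parameter set is $s = 1$, since the distributions are determined by one parameter $\theta$. To find the MLE $\theta^*$ we have to maximize the likelihood function

$$p_1(\theta)^{\nu_1} p_2(\theta)^{\nu_2} p_3(\theta)^{\nu_3}$$

or, equivalently, maximize the log-likelihood

$$\log p_1(\theta)^{\nu_1} p_2(\theta)^{\nu_2} p_3(\theta)^{\nu_3} = \nu_1 \log p_1(\theta) + \nu_2 \log p_2(\theta) + \nu_3 \log p_3(\theta) = \nu_1 \log \theta^2 + \nu_2 \log 2\theta(1-\theta) + \nu_3 \log (1-\theta)^2.$$

If we compute the critical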
point by setting the derivative equal to 0, we get

$$\theta^* = \frac{2\nu_1 + \nu_2}{2n}.$$

Therefore, under the null hypothesis $H_0$ the statistic

$$T = \frac{(\nu_1 - n p_1(\theta^*))^2}{n p_1(\theta^*)} + \frac{(\nu_2 - n p_2(\theta^*))^2}{n p_2(\theta^*)} + \frac{(\nu_3 - n p_3(\theta^*))^2}{n p_3(\theta^*)} \xrightarrow{d} \chi^2_{r-s-1} = \chi^2_{3-1-1} = \chi^2_1$$

converges to the $\chi^2_1$ distribution with one degree of freedom. Therefore, in the decision rule

$$\delta = \begin{cases} H_0 : & T \le c \\ H_1 : & T > c \end{cases}$$

the threshold $c$ is determined by the condition

$$P(\delta \ne H_0 \mid H_0) \approx \chi^2_1(T > c) = \alpha.$$

For example, if $\alpha = 0.05$ then $c = 3.841$.

Example 2. A blood type (O, A, B, AB) is determined by a combination of two alleles out of A, B, O, and allele O is dominated by A and B. Suppose that $p$, $q$ and $r = 1 - p - q$ are the population frequencies of alleles A, B and O, correspondingly. If alleles are passed randomly from the parents, then the probabilities of blood types will be

    Blood type   Allele combinations   Probabilities    Counts
    O            OO                    $r^2$            $\nu_1 = 121$
    A            AA, AO                $p^2 + 2pr$      $\nu_2 = 120$
    B            BB, BO                $q^2 + 2qr$      $\nu_3 = 79$
    AB           AB                    $2pq$            $\nu_4 = 33$

We would like to test this theory based on the counts of each blood type in a random sample of 353 people. We have four groups and two free parameters $p$ and $q$, so the chi-squared statistic $T$ under the null hypothesis will have a $\chi^2_{4-2-1} = \chi^2_1$ distribution with one degree of freedom. First, we have to find the MLE of the parameters $p$ and $q$. The log-likelihood is

$$\nu_1 \log r^2 + \nu_2 \log(p^2 + 2pr) + \nu_3 \log(q^2 + 2qr) + \nu_4 \log(2pq)$$
$$= 2\nu_1 \log(1 - p - q) + \nu_2 \log(2p - p^2 - 2pq) + \nu_3 \log(2q - q^2 - 2pq) + \nu_4 \log(2pq).$$

Unfortunately, if we set the derivatives with respect to $p$ and $q$ equal to zero, we get a system of two equations that is hard to solve explicitly. So instead we …
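
One way to maximize the log-likelihood of Example 2 numerically, not necessarily the method these notes go on to describe, is the standard gene-counting (EM) iteration: split the type A and type B counts into their expected homozygous and heterozygous parts, then re-estimate the allele frequencies by counting alleles. The Python sketch below is an illustration under that assumption. The function names, the starting point and the iteration count are hypothetical choices; only the counts, the probability formulas and the critical value 3.841 come from the text.

```python
import math

COUNTS = (121, 120, 79, 33)  # O, A, B, AB counts from the table (n = 353)

def type_probs(p, q):
    """Blood type probabilities (O, A, B, AB) for allele frequencies p, q, r = 1 - p - q."""
    r = 1.0 - p - q
    return (r * r, p * p + 2 * p * r, q * q + 2 * q * r, 2 * p * q)

def log_likelihood(p, q, counts=COUNTS):
    """Log-likelihood of the counts under the random-mating model."""
    return sum(nu * math.log(pr) for nu, pr in zip(counts, type_probs(p, q)))

def fit_allele_frequencies(counts=COUNTS, iterations=500):
    """Gene-counting (EM) iteration; an assumed method, not taken from the notes."""
    n_o, n_a, n_b, n_ab = counts
    n = sum(counts)
    p = q = 1.0 / 3.0  # arbitrary interior starting point (assumption)
    for _ in range(iterations):
        r = 1.0 - p - q
        # E-step: expected number of AA among type A, since p^2/(p^2+2pr) = p/(p+2r)
        n_aa = n_a * p / (p + 2.0 * r)
        n_bb = n_b * q / (q + 2.0 * r)
        # M-step: count alleles among the 2n carried by the sample
        # (AA carries two A alleles; AO and AB carry one each)
        p = (n_a + n_aa + n_ab) / (2.0 * n)
        q = (n_b + n_bb + n_ab) / (2.0 * n)
    return p, q

def pearson_statistic(p, q, counts=COUNTS):
    """Chi-squared statistic T with expected counts n * p_j(p, q)."""
    n = sum(counts)
    return sum((nu - n * pr) ** 2 / (n * pr) for nu, pr in zip(counts, type_probs(p, q)))

p_hat, q_hat = fit_allele_frequencies()
T = pearson_statistic(p_hat, q_hat)
reject = T > 3.841  # chi-squared(1) threshold at level 0.05, as in Example 1
print(p_hat, q_hat, T, reject)
```

The resulting $T$ is then compared with the $\chi^2_1$ threshold, exactly as in the decision rule of Example 1.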
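
For Example 1, the closed-form MLE $\theta^* = (2\nu_1 + \nu_2)/(2n)$ makes the whole test a few lines of arithmetic. The following Python sketch shows the steps. The notes give no numerical genotype counts, so the counts used here are hypothetical and chosen only for illustration, and the function name is likewise an assumption. The sketch also assumes $0 < \theta^* < 1$ so that no expected count is zero.

```python
def genotype_test(counts, critical_value=3.841):
    """Composite chi-squared test for genotype counts (nu1, nu2, nu3).

    critical_value defaults to 3.841, the chi-squared(1) threshold
    quoted in the text for alpha = 0.05.
    """
    nu1, nu2, nu3 = counts
    n = nu1 + nu2 + nu3
    # MLE of theta from the text: (2*nu1 + nu2) / (2n)
    theta = (2 * nu1 + nu2) / (2 * n)
    # Genotype probabilities (11.0.2) evaluated at the MLE
    probs = (theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2)
    # Pearson statistic with estimated parameter; df = r - s - 1 = 3 - 1 - 1 = 1
    T = sum((nu - n * p) ** 2 / (n * p) for nu, p in zip(counts, probs))
    return theta, T, T > critical_value

# Hypothetical counts, not taken from the notes.
theta_a, T_a, reject_a = genotype_test((30, 50, 20))
theta_b, T_b, reject_b = genotype_test((50, 10, 40))
print(theta_a, T_a, reject_a)
print(theta_b, T_b, reject_b)
```

Both hypothetical samples give the same $\theta^*$, since $2\nu_1 + \nu_2$ is the same, but the second has far too few heterozygotes for the theory, which shows up as a large $T$.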
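
Returning to the Matlab example at the start of the section: the two reported p-values can be checked against the degrees of freedom, $6 - 2 - 1 = 3$ when the two normal parameters are estimated and $6 - 1 = 5$ when they are treated as given. The Python sketch below uses only the standard library and evaluates the chi-squared upper tail through the closed-form series that holds for integer degrees of freedom. The helper name `chi2_sf` is an assumption, and Python stands in for Matlab here only so that the check is self-contained.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(chi2_df > x) for a positive integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
        m = df // 2
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(m))
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{m} (x/2)^(k - 1/2) / Gamma(k + 1/2)
    m = (df - 1) // 2
    series = sum(half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series

# Statistics and degrees of freedom as reported in the two chi2gof outputs.
p_composite = chi2_sf(0.6742, 3)  # parameters estimated from the data
p_simple = chi2_sf(0.6842, 5)     # parameters treated as given
# Tail mass beyond the threshold c = 3.841 quoted for alpha = 0.05, df = 1.
alpha_check = chi2_sf(3.841, 1)
print(p_composite, p_simple, alpha_check)
```

The same statistic gives a noticeably larger p-value with 5 degrees of freedom than with 3, which is why using the simple-hypothesis test with estimated parameters would make the test too lenient.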

