Chapter 3

HYPOTHESIS TESTING

The purpose of pattern recognition is to determine to which category, or class, a given sample belongs. Through an observation or measurement process, we obtain a set of numbers which make up the observation vector. The observation vector serves as the input to a decision rule, by which we assign the sample to one of the given classes. Let us assume that the observation vector is a random vector whose conditional density function depends on its class. If the conditional density function for each class is known, then the pattern recognition problem becomes a problem in statistical hypothesis testing.

3.1 Hypothesis Tests for Two Classes

In this section, we discuss two-class problems, which arise because each sample belongs to one of two classes, $\omega_1$ or $\omega_2$. The conditional density functions and the a priori probabilities are assumed to be known.

The Bayes Decision Rule for Minimum Error

Bayes test: Let $X$ be an observation vector, and let it be our purpose to determine whether $X$ belongs to $\omega_1$ or $\omega_2$. A decision rule based simply on probabilities may be written as

$$q_1(X) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} q_2(X) , \tag{3.1}$$

where $q_i(X)$ is the a posteriori probability of $\omega_i$ given $X$. Equation (3.1) indicates that, if the probability of $\omega_1$ given $X$ is larger than the probability of $\omega_2$, $X$ is classified to $\omega_1$, and vice versa. The a posteriori probability $q_i(X)$ may be calculated from the a priori probability $P_i$ and the conditional density function $p_i(X)$, using Bayes theorem, as

$$q_i(X) = \frac{P_i\, p_i(X)}{p(X)} , \tag{3.2}$$

where $p(X)$ is the mixture density function. Since $p(X)$ is positive and common to both sides of the inequality, the decision rule of (3.1) can be expressed as

$$P_1\, p_1(X) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} P_2\, p_2(X) \tag{3.3}$$

or

$$\ell(X) = \frac{p_1(X)}{p_2(X)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{P_2}{P_1} . \tag{3.4}$$

The term $\ell(X)$ is called the likelihood ratio and is the basic quantity in hypothesis testing. We call $P_2/P_1$ the threshold value of the likelihood ratio for the decision. Sometimes it is more convenient to write the minus-log likelihood ratio rather than the likelihood ratio itself. In that case, the decision rule of (3.4) becomes

$$h(X) = -\ln \ell(X) = -\ln p_1(X) + \ln p_2(X) \underset{\omega_1}{\overset{\omega_2}{\gtrless}} \ln \frac{P_1}{P_2} . \tag{3.5}$$

The direction of the inequality is reversed because we have used the negative logarithm. The term $h(X)$ is called the discriminant function. Throughout this book, we assume $P_1 = P_2$ and set the threshold $\ln (P_1/P_2) = 0$ for simplicity, unless otherwise stated. Equation (3.1), (3.4), or (3.5) is called the Bayes test for minimum error.

Bayes error: In general, the decision rule of (3.3), or any other decision rule, does not lead to perfect classification. In order to evaluate the performance of a decision rule, we must calculate the probability of error, that is, the probability that a sample is assigned to the wrong class. The conditional error given $X$, $r(X)$, due to the decision rule of (3.1), is either $q_1(X)$ or $q_2(X)$, whichever is smaller. That is,

$$r(X) = \min [\, q_1(X),\ q_2(X) \,] . \tag{3.6}$$

The total error, which is called the Bayes error, is computed by $\varepsilon = E\{r(X)\}$:

$$\begin{aligned}
\varepsilon &= E\{r(X)\} = \int r(X)\, p(X)\, dX \\
&= \int \min [\, P_1 p_1(X),\ P_2 p_2(X) \,]\, dX \\
&= P_1 \int_{L_2} p_1(X)\, dX + P_2 \int_{L_1} p_2(X)\, dX
\end{aligned} \tag{3.7}$$

$$= P_1 \varepsilon_1 + P_2 \varepsilon_2 . \tag{3.8}$$

Equation (3.7) shows several ways to express the Bayes error $\varepsilon$. The first line is the definition of $\varepsilon$. The second line is obtained by inserting (3.6) into the first line and applying the Bayes theorem of (3.2). The integral regions $L_1$ and $L_2$ of the third line are the regions where $X$ is classified to $\omega_1$ and $\omega_2$ by this decision rule, and they are called the $\omega_1$- and $\omega_2$-regions. In $L_1$, $P_1 p_1(X) > P_2 p_2(X)$, and therefore $r(X) = P_2 p_2(X)/p(X)$. Likewise, $r(X) = P_1 p_1(X)/p(X)$ in $L_2$, because $P_1 p_1(X) < P_2 p_2(X)$ in $L_2$. In (3.8), we distinguish two types of errors: one results from misclassifying samples from $\omega_1$, and the other results from misclassifying samples from $\omega_2$. The total error is a weighted sum of these errors, with $\varepsilon_1 = \int_{L_2} p_1(X)\, dX$ and $\varepsilon_2 = \int_{L_1} p_2(X)\, dX$.
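To make the test concrete, here is a minimal sketch in Python, assuming two one-dimensional Gaussian class-conditional densities; the means, standard deviations, priors, and test point are illustrative choices, not values from the text. It applies the likelihood-ratio test of (3.4) and the discriminant function of (3.5), and estimates the Bayes error by numerically integrating $\min[P_1 p_1(x), P_2 p_2(x)]$ as in (3.7).

```python
# A minimal sketch of the Bayes test for minimum error, assuming two
# one-dimensional Gaussian class-conditional densities.  All parameter
# values below are illustrative, not taken from the text.
import numpy as np

P1, P2 = 0.5, 0.5                      # a priori probabilities P1, P2

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2); stands in for a known p_i(x)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def p1(x):                             # p1(x): density for class omega_1
    return gaussian_pdf(x, 0.0, 1.0)

def p2(x):                             # p2(x): density for class omega_2
    return gaussian_pdf(x, 2.0, 1.0)

def bayes_decide(x):
    """Likelihood-ratio test of (3.4): decide omega_1 if l(x) > P2/P1."""
    return 1 if p1(x) / p2(x) > P2 / P1 else 2

def h(x):
    """Minus-log-likelihood ratio of (3.5); omega_1 when h(x) < ln(P1/P2)."""
    return -np.log(p1(x)) + np.log(p2(x))

# Bayes error from (3.7): integrate min[P1 p1(x), P2 p2(x)] numerically
# over a grid wide enough that the tails are negligible.
x = np.linspace(-8.0, 10.0, 200001)
dx = x[1] - x[0]
eps = np.sum(np.minimum(P1 * p1(x), P2 * p2(x))) * dx
print(f"decision for x = 0.7: omega_{bayes_decide(0.7)}")
print(f"Bayes error ~ {eps:.4f}")      # ~0.1587 for these illustrative densities
```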
Figure 3.1 shows an example of this decision rule for a simple one-dimensional case. The decision boundary is set at $x = t$, where $P_1 p_1(x) = P_2 p_2(x)$, and $x \le t$ and $x > t$ are designated as $L_1$ and $L_2$, respectively. The resulting errors are $P_1 \varepsilon_1 = B + C$, $P_2 \varepsilon_2 = A$, and $\varepsilon = A + B + C$, where $A$, $B$, and $C$ indicate the areas; for example, $B = \int_t^{t'} P_1 p_1(x)\, dx$.

[Fig. 3.1: Bayes decision rule for minimum error.]

This decision rule gives the smallest probability of error. This may be demonstrated easily from the one-dimensional example of Fig. 3.1. Suppose that the boundary is moved from $t$ to $t'$, setting up the new $\omega_1$- and $\omega_2$-regions as $L_1'$ and $L_2'$. Then the resulting errors are $P_1 \varepsilon_1' = C$, $P_2 \varepsilon_2' = A + B + D$, and $\varepsilon' = A + B + C + D$, which is larger than $\varepsilon$ by $D$. The same is true when the boundary is shifted to the left. This argument can be extended to a general $n$-dimensional case.

The computation of the Bayes error is a very complex problem, except in some special cases. This is due to the fact that $\varepsilon$ is obtained by integrating high-dimensional density functions in complex regions, as seen in (3.8). Therefore, it is sometimes more convenient to integrate the density function of $h = h(X)$ of (3.5), which is one-dimensional:

$$\varepsilon_1 = \int_{\ln (P_1/P_2)}^{+\infty} p_h(h \mid \omega_1)\, dh , \tag{3.9}$$

$$\varepsilon_2 = \int_{-\infty}^{\ln (P_1/P_2)} p_h(h \mid \omega_2)\, dh , \tag{3.10}$$

where $p_h(h \mid \omega_i)$ is the conditional density of $h$ for $\omega_i$. However, in general, the density function of $h$ is not available and is very difficult to compute.

Example 1: When the $p_i(X)$'s are normal with expected vectors $M_i$ and covariance matrices $\Sigma_i$, the decision rule of (3.5) becomes

$$h(X) = -\ln \ell(X) = \frac{1}{2}(X - M_1)^T \Sigma_1^{-1} (X - M_1) - \frac{1}{2}(X - M_2)^T \Sigma_2^{-1} (X - M_2) + \frac{1}{2} \ln \frac{|\Sigma_1|}{|\Sigma_2|} \underset{\omega_1}{\overset{\omega_2}{\gtrless}} \ln \frac{P_1}{P_2} . \tag{3.11}$$

Equation (3.11) shows that the decision boundary is given by a quadratic form in $X$. When $\Sigma_1 = \Sigma_2 = \Sigma$, the boundary becomes a linear function of $X$, as

$$h(X) = (M_2 - M_1)^T \Sigma^{-1} X + \frac{1}{2} \left( M_1^T \Sigma^{-1} M_1 - M_2^T \Sigma^{-1} M_2 \right) \underset{\omega_1}{\overset{\omega_2}{\gtrless}} \ln \frac{P_1}{P_2} . \tag{3.12}$$

Figure 3.2 shows two-dimensional examples for $\Sigma_1 \ne \Sigma_2$ and $\Sigma_1 = \Sigma_2$.

[Fig. 3.2: Decision boundaries for normal distributions. (a) $\Sigma_1 \ne \Sigma_2$; (b) $\Sigma_1 = \Sigma_2$.]

Example 2: Let us study a special case of (3.11) in which $M_i = 0$ and

$$\Sigma_i = \begin{bmatrix} 1 & \rho_i & \cdots & \rho_i^{\,n-1} \\ \rho_i & 1 & \cdots & \rho_i^{\,n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_i^{\,n-1} & \rho_i^{\,n-2} & \cdots & 1 \end{bmatrix} . \tag{3.13}$$

This type of covariance matrix is often seen, for example, when stationary random processes are time-sampled to form random vectors. The explicit expressions for $\Sigma_i^{-1}$ and $|\Sigma_i|$ are known for this covariance matrix as

$$\Sigma_i^{-1} = \frac{1}{1 - \rho_i^2} \begin{bmatrix} 1 & -\rho_i & 0 & \cdots & 0 \\ -\rho_i & 1 + \rho_i^2 & -\rho_i & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & \cdots & -\rho_i & 1 + \rho_i^2 & -\rho_i \\ 0 & \cdots & 0 & -\rho_i & 1 \end{bmatrix} , \tag{3.14}$$

$$|\Sigma_i| = (1 - \rho_i^2)^{\,n-1} . \tag{3.15}$$

Therefore, the quadratic equation of (3.11) becomes

$$h(X) = \frac{1}{2} \left[ \frac{1}{1 - \rho_1^2} \left( \sum_{i=1}^{n} x_i^2 + \rho_1^2 \sum_{i=2}^{n-1} x_i^2 - 2 \rho_1 \sum_{i=1}^{n-1} x_i x_{i+1} \right) - \frac{1}{1 - \rho_2^2} \left( \sum_{i=1}^{n} x_i^2 + \rho_2^2 \sum_{i=2}^{n-1} x_i^2 - 2 \rho_2 \sum_{i=1}^{n-1} x_i x_{i+1} \right) \right] + \frac{n-1}{2} \ln \frac{1 - \rho_1^2}{1 - \rho_2^2} , \tag{3.16}$$

where the second term …
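As a numerical companion to Examples 1 and 2, the following sketch evaluates the quadratic discriminant $h(X)$ of (3.11) and checks the closed-form inverse and determinant of (3.14) and (3.15) for the covariance matrix of (3.13). The helper names, the dimensionality $n$, the correlations $\rho_i$, and the test vector are all illustrative assumptions, not part of the text.

```python
# A short numerical companion to Examples 1 and 2; n, rho, and the test
# vector X are illustrative assumptions, not values from the text.
import numpy as np

def h_quadratic(X, M1, S1, M2, S2):
    """Discriminant h(X) of (3.11); decide omega_1 when h(X) < ln(P1/P2)."""
    d1, d2 = X - M1, X - M2
    return (0.5 * d1 @ np.linalg.solve(S1, d1)
            - 0.5 * d2 @ np.linalg.solve(S2, d2)
            + 0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2)))

def toeplitz_cov(rho, n):
    """Sigma_i of (3.13): entry (j, k) equals rho**|j - k|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

n, rho = 5, 0.6
S = toeplitz_cov(rho, n)

# |Sigma_i| = (1 - rho^2)^(n-1), as in (3.15).
assert np.isclose(np.linalg.det(S), (1.0 - rho**2) ** (n - 1))

# Sigma_i^{-1} is the tridiagonal matrix of (3.14): corner entries 1,
# interior diagonal 1 + rho^2, off-diagonals -rho, scaled by 1/(1 - rho^2).
T = np.diag([1.0] + [1.0 + rho**2] * (n - 2) + [1.0])
T += np.diag([-rho] * (n - 1), k=1) + np.diag([-rho] * (n - 1), k=-1)
assert np.allclose(np.linalg.inv(S), T / (1.0 - rho**2))

# Evaluate h(X) for M1 = M2 = 0 and two different correlations, as in (3.16).
X = np.ones(n)
print(h_quadratic(X, np.zeros(n), toeplitz_cov(0.6, n),
                  np.zeros(n), toeplitz_cov(0.2, n)))
```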