CMU CS 10701 - Hypothesis testing

Chapter 3
HYPOTHESIS TESTING

The purpose of pattern recognition is to determine to which category or class a given sample belongs. Through an observation or measurement process, we obtain a set of numbers which make up the observation vector. The observation vector serves as the input to a decision rule by which we assign the sample to one of the given classes. Let us assume that the observation vector is a random vector whose conditional density function depends on its class. If the conditional density function for each class is known, then the pattern recognition problem becomes a problem in statistical hypothesis testing.

3.1 Hypothesis Tests for Two Classes

In this section, we discuss two-class problems, which arise because each sample belongs to one of two classes, ω1 or ω2. The conditional density functions and the a priori probabilities are assumed to be known.

The Bayes Decision Rule for Minimum Error

Bayes test: Let X be an observation vector, and let it be our purpose to determine whether X belongs to ω1 or ω2. A decision rule based simply on probabilities may be written as follows:

    q_1(X) \gtrless_{\omega_2}^{\omega_1} q_2(X) ,    (3.1)

where q_i(X) is the a posteriori probability of ω_i given X. Equation (3.1) indicates that, if the probability of ω1 given X is larger than the probability of ω2, X is classified to ω1, and vice versa. The a posteriori probability q_i(X) may be calculated from the a priori probability P_i and the conditional density function p_i(X), using the Bayes theorem, as

    q_i(X) = \frac{P_i \, p_i(X)}{p(X)} ,    (3.2)

where p(X) is the mixture density function. Since p(X) is positive and common to both sides of the inequality, the decision rule of (3.1) can be expressed as

    P_1 p_1(X) \gtrless_{\omega_2}^{\omega_1} P_2 p_2(X) ,    (3.3)

or

    \ell(X) = \frac{p_1(X)}{p_2(X)} \gtrless_{\omega_2}^{\omega_1} \frac{P_2}{P_1} .    (3.4)

The term ℓ(X) is called the likelihood ratio and is the basic quantity in hypothesis testing. We call P_2/P_1 the threshold value of the likelihood ratio for the decision. Sometimes it is more convenient to write the minus-log likelihood ratio rather than writing the likelihood ratio itself. In that case, the decision rule of (3.4) becomes

    h(X) = -\ln \ell(X) = -\ln p_1(X) + \ln p_2(X) \gtrless_{\omega_1}^{\omega_2} \ln \frac{P_1}{P_2} .    (3.5)

The direction of the inequality is reversed because we have used the negative logarithm. The term h(X) is called the discriminant function. Throughout this book, we assume P_1 = P_2 and set the threshold ln P_1/P_2 = 0 for simplicity, unless otherwise stated. Equation (3.1), (3.4), or (3.5) is called the Bayes test for minimum error.

Bayes error: In general, the decision rule of (3.3), or any other decision rule, does not lead to perfect classification. In order to evaluate the performance of a decision rule, we must calculate the probability of error, that is, the probability that a sample is assigned to the wrong class. The conditional error given X, r(X), due to the decision rule of (3.1) is either q_1(X) or q_2(X), whichever is smaller. That is,

    r(X) = \min[\, q_1(X), q_2(X) \,] .    (3.6)

The total error, which is called the Bayes error, is computed by E{r(X)}:

    \varepsilon = E\{r(X)\} = \int r(X)\, p(X)\, dX
                = \int \min[\, P_1 p_1(X), P_2 p_2(X) \,]\, dX
                = P_1 \int_{L_2} p_1(X)\, dX + P_2 \int_{L_1} p_2(X)\, dX ,    (3.7)

or, in terms of the two conditional errors,

    \varepsilon = P_1 \varepsilon_1 + P_2 \varepsilon_2 ,
    \qquad \varepsilon_1 = \int_{L_2} p_1(X)\, dX ,
    \qquad \varepsilon_2 = \int_{L_1} p_2(X)\, dX .    (3.8)

Equation (3.7) shows several ways to express the Bayes error ε. The first line is the definition of ε. The second line is obtained by inserting (3.6) into the first line and applying the Bayes theorem of (3.2). The integral regions L_1 and L_2 of the third line are the regions where X is classified to ω1 and ω2 by this decision rule, and they are called the ω1- and ω2-regions. In L_1, P_1 p_1(X) > P_2 p_2(X), and therefore r(X) = P_2 p_2(X)/p(X). Likewise, r(X) = P_1 p_1(X)/p(X) in L_2, because P_1 p_1(X) < P_2 p_2(X) in L_2.
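
The rule of (3.1)-(3.5) and the error expression of (3.7)-(3.8) are easy to check numerically in one dimension. The Python sketch below is not part of the text: the two class densities (p1 = N(0,1), p2 = N(2,1)), the equal priors, and the helper name bayes_test are hypothetical choices. It applies the likelihood-ratio test of (3.4) and approximates the Bayes error of (3.7) with a Riemann sum on a grid.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical one-dimensional two-class problem (not from the text):
# p1(x) = N(0, 1), p2(x) = N(2, 1), equal priors P1 = P2 = 0.5.
P1, P2 = 0.5, 0.5
p1 = norm(loc=0.0, scale=1.0)
p2 = norm(loc=2.0, scale=1.0)

def bayes_test(x):
    """Likelihood-ratio test of (3.4): decide omega_1 when
    l(x) = p1(x)/p2(x) exceeds the threshold P2/P1, otherwise omega_2."""
    lr = p1.pdf(x) / p2.pdf(x)
    return np.where(lr > P2 / P1, 1, 2)

# Bayes error of (3.7): integrate min[P1*p1(x), P2*p2(x)] over x,
# approximated by a Riemann sum on a fine grid.
x = np.linspace(-10.0, 12.0, 200_001)
dx = x[1] - x[0]
eps = np.sum(np.minimum(P1 * p1.pdf(x), P2 * p2.pdf(x))) * dx

# For these densities the boundary of (3.3) is at x = 1, so the error
# splits into the two tail areas of (3.8): eps = P1*eps1 + P2*eps2.
eps1 = p1.sf(1.0)    # mass of p1 falling in L2 (x > 1)
eps2 = p2.cdf(1.0)   # mass of p2 falling in L1 (x < 1)
print(bayes_test(np.array([-1.0, 0.5, 3.0])))   # -> [1 1 2]
print(eps, P1 * eps1 + P2 * eps2)               # both approximately 0.1587
```

For this particular pair of densities the grid estimate and the tail-area form of (3.8) should both come out near 0.159, which illustrates the agreement between the three expressions in (3.7).
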
In (3.8), we distinguish two types of errors: one results from misclassifying samples from ω1 and the other results from misclassifying samples from ω2. The total error is a weighted sum of these errors.

Figure 3-1 shows an example of this decision rule for a simple one-dimensional case. The decision boundary is set at x = t, where P_1 p_1(x) = P_2 p_2(x), and x < t and x > t are designated as L_1 and L_2, respectively. The resulting errors are P_1 ε_1 = B + C, P_2 ε_2 = A, and ε = A + B + C, where A, B, and C indicate areas; for example, B = \int_t^{t'} P_1 p_1(x)\, dx.

[Fig. 3-1  Bayes decision rule for minimum error.]

This decision rule gives the smallest probability of error. This may be demonstrated easily from the one-dimensional example of Fig. 3-1. Suppose that the boundary is moved from t to t', setting up the new ω1- and ω2-regions as L_1' and L_2'. Then, the resulting errors are P_1 ε_1' = C, P_2 ε_2' = A + B + D, and ε' = A + B + C + D, which is larger than ε by D. The same is true when the boundary is shifted to the left. This argument can be extended to a general n-dimensional case.

The computation of the Bayes error is a very complex problem except in some special cases. This is due to the fact that ε is obtained by integrating high-dimensional density functions in complex regions, as seen in (3.8). Therefore, it is sometimes more convenient to integrate the density function of h = h(X) of (3.5), which is one-dimensional:

    \varepsilon_1 = \int_{\ln(P_1/P_2)}^{+\infty} p_h(h \mid \omega_1)\, dh ,    (3.9)

    \varepsilon_2 = \int_{-\infty}^{\ln(P_1/P_2)} p_h(h \mid \omega_2)\, dh ,    (3.10)

where p_h(h | ω_i) is the conditional density of h for ω_i. However, in general, the density function of h is not available and is very difficult to compute.

Example 1: When the p_i(X)'s are normal with expected vectors M_i and covariance matrices Σ_i, the decision rule of (3.5) becomes

    h(X) = -\ln \ell(X)
         = \frac{1}{2}(X - M_1)^T \Sigma_1^{-1} (X - M_1)
           - \frac{1}{2}(X - M_2)^T \Sigma_2^{-1} (X - M_2)
           + \frac{1}{2} \ln \frac{|\Sigma_1|}{|\Sigma_2|}
           \gtrless_{\omega_1}^{\omega_2} \ln \frac{P_1}{P_2} .    (3.11)

Equation (3.11) shows that the decision boundary is given by a quadratic form in X. When Σ_1 = Σ_2 = Σ, the boundary becomes a linear function of X as

    (M_2 - M_1)^T \Sigma^{-1} X
    + \frac{1}{2}\left( M_1^T \Sigma^{-1} M_1 - M_2^T \Sigma^{-1} M_2 \right)
    \gtrless_{\omega_1}^{\omega_2} \ln \frac{P_1}{P_2} .    (3.12)

Figure 3-2 shows two-dimensional examples for Σ_1 ≠ Σ_2 and Σ_1 = Σ_2.

[Fig. 3-2  Decision boundaries for normal distributions: (a) Σ1 ≠ Σ2; (b) Σ1 = Σ2.]

Example 2: Let us study a special case of (3.11) where M_i = 0 and

    \Sigma_i =
    \begin{bmatrix}
      1              & \rho_i   & \cdots & \rho_i^{\,n-1} \\
      \rho_i         & 1        &        & \vdots         \\
      \vdots         &          & \ddots & \rho_i         \\
      \rho_i^{\,n-1} & \cdots   & \rho_i & 1
    \end{bmatrix} .    (3.13)

This type of covariance matrix is often seen, for example, when stationary random processes are time-sampled to form random vectors. The explicit expressions for Σ_i^{-1} and |Σ_i| are known for this covariance matrix as

    \Sigma_i^{-1} = \frac{1}{1-\rho_i^2}
    \begin{bmatrix}
      1       & -\rho_i      &         &              & 0       \\
      -\rho_i & 1+\rho_i^2   & -\rho_i &              &         \\
              & \ddots       & \ddots  & \ddots       &         \\
              &              & -\rho_i & 1+\rho_i^2   & -\rho_i \\
      0       &              &         & -\rho_i      & 1
    \end{bmatrix} ,    (3.14)

    |\Sigma_i| = (1 - \rho_i^2)^{\,n-1} .    (3.15)

Therefore, the quadratic equation of (3.11) becomes

    \frac{1}{2}\left( \frac{1+\rho_1^2}{1-\rho_1^2} - \frac{1+\rho_2^2}{1-\rho_2^2} \right) \sum_{i=1}^{n} x_i^2
    - \frac{1}{2}\left( \frac{\rho_1^2}{1-\rho_1^2} - \frac{\rho_2^2}{1-\rho_2^2} \right) (x_1^2 + x_n^2)
    - \left( \frac{\rho_1}{1-\rho_1^2} - \frac{\rho_2}{1-\rho_2^2} \right) \sum_{i=1}^{n-1} x_i x_{i+1}
    + \frac{n-1}{2} \ln \frac{1-\rho_1^2}{1-\rho_2^2}
    \gtrless_{\omega_1}^{\omega_2} \ln \frac{P_1}{P_2} ,    (3.16)

where the second term shows the edge effect of terminating the observation of random processes within a finite length, and this effect …
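
As a numerical companion to Examples 1 and 2 (again not part of the text: the dimension n, the correlations ρ1 and ρ2, the zero mean vectors, and the helper names h, cov_313, and inv_314 are hypothetical), the sketch below evaluates the quadratic discriminant of (3.11) with NumPy and verifies the closed-form inverse and determinant of (3.14)-(3.15) against direct linear algebra for the covariance of (3.13).

```python
import numpy as np

def h(X, M1, S1, M2, S2):
    """Quadratic discriminant of (3.11); decide omega_1 when
    h(X) < ln(P1/P2) (= 0 for equal priors), otherwise omega_2."""
    d1, d2 = X - M1, X - M2
    q1 = d1 @ np.linalg.solve(S1, d1)
    q2 = d2 @ np.linalg.solve(S2, d2)
    logdet = np.linalg.slogdet(S1)[1] - np.linalg.slogdet(S2)[1]
    return 0.5 * (q1 - q2) + 0.5 * logdet

def cov_313(n, rho):
    """Covariance of (3.13): entry (j, k) equals rho**|j - k|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def inv_314(n, rho):
    """Closed-form inverse of (3.14): tridiagonal, scaled by 1/(1 - rho^2)."""
    T = np.zeros((n, n))
    np.fill_diagonal(T, 1.0 + rho ** 2)
    T[0, 0] = T[-1, -1] = 1.0
    j = np.arange(n - 1)
    T[j, j + 1] = T[j + 1, j] = -rho
    return T / (1.0 - rho ** 2)

n, rho1, rho2 = 5, 0.5, 0.8                    # hypothetical size and correlations
S1, S2 = cov_313(n, rho1), cov_313(n, rho2)

# (3.14) and (3.15) agree with direct linear algebra.
assert np.allclose(inv_314(n, rho1), np.linalg.inv(S1))
assert np.isclose(np.linalg.det(S1), (1.0 - rho1 ** 2) ** (n - 1))

# Example 2 setting: M1 = M2 = 0; classify one sample drawn from class omega_1,
# using the zero threshold that corresponds to equal priors.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(n), S1)
label = 1 if h(X, np.zeros(n), S1, np.zeros(n), S2) < 0.0 else 2
print("assigned to omega_%d" % label)
```

Using np.linalg.solve and slogdet rather than explicit inverses and determinants keeps the sketch numerically stable; the closed forms of (3.14)-(3.15) are checked separately against them.
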

