
# UCSD CSE 190 - Lecture 8


## Announcements

- HW1 is assigned, due Thursday.
- Most of last lecture was on the blackboard: nearest neighbor and linear discriminant functions (perceptrons).

*Biometrics, CSE 190a, Fall '06, Lecture 8. Slides follow* Pattern Classification, *Ch. 4 Part 1.*

## Non-Parametric Density Estimation

Given a collection of $n$ samples, estimate the probability density. The main ideas:

1. As the number of samples $n$ approaches infinity, the estimated density should approach the true density.
2. The approximated density should be reasonable for finite $n$.

Three necessary conditions should apply if we want $p_n(x)$ to converge to $p(x)$:

1. $\lim_{n \to \infty} V_n = 0$
2. $\lim_{n \to \infty} k_n = \infty$
3. $\lim_{n \to \infty} k_n / n = 0$

There are two different ways of obtaining sequences of regions that satisfy these conditions:

1. Shrink an initial region, with $V_n = 1/\sqrt{n}$, and show that $p_n(x) \to p(x)$. This is called the **Parzen window** estimation method.
2. Specify $k_n$ as some function of $n$, such as $k_n = \sqrt{n}$; the volume $V_n$ is grown until it encloses $k_n$ neighbors of $x$. This is called the **$k_n$-nearest-neighbor** estimation method.

## Parzen Windows

The Parzen-window approach to estimating densities assumes that the region $R_n$ is a $d$-dimensional hypercube of volume $V_n = h_n^d$, where $h_n$ is the length of an edge. Let $\varphi$ be the following window function:

$$\varphi(u) = \begin{cases} 1 & |u_j| \le \tfrac{1}{2}, \quad j = 1, \dots, d \\ 0 & \text{otherwise} \end{cases}$$

Then $\varphi\big((x - x_i)/h_n\big)$ is equal to unity if $x_i$ falls within the hypercube of volume $V_n$ centered at $x$, and equal to zero otherwise. The number of samples in this hypercube is

$$k_n = \sum_{i=1}^{n} \varphi\left(\frac{x - x_i}{h_n}\right)$$

Substituting this $k_n$ into the basic estimate $p_n(x) = k_n / (n V_n)$, we obtain

$$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n}\, \varphi\left(\frac{x - x_i}{h_n}\right)$$

### Parzen Window Example

Draw samples from a normal distribution $N(0, 1)$, and let

$$\varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}, \qquad h_n = \frac{h_1}{\sqrt{n}}$$

Thus

$$p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n}\, \varphi\left(\frac{x - x_i}{h_n}\right)$$

is an average of normal densities centered at the samples $x_i$. More generally, $p_n(x)$ estimates $p(x)$ as an average of functions of $x$ and the samples $x_i$, $i = 1, \dots, n$; these window functions can be quite general.
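As an illustration of the Gaussian Parzen-window estimate above, here is a minimal NumPy sketch; the function name, the sample size, and the choice $h_1 = 1$ are assumptions for the example, not from the lecture:

```python
import numpy as np

def parzen_estimate(x, samples, h1=1.0):
    """Parzen-window estimate p_n(x) = (1/n) * sum_i (1/h_n) * phi((x - x_i)/h_n),
    with a Gaussian window phi and width h_n = h1 / sqrt(n)."""
    n = len(samples)
    hn = h1 / np.sqrt(n)
    u = (x - samples) / hn                          # scaled offsets to each sample
    phi = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)  # Gaussian window function
    return phi.sum() / (n * hn)

# Draw n samples from N(0, 1) and evaluate the estimate near the mode.
rng = np.random.default_rng(0)
samples = rng.standard_normal(1000)
print(parzen_estimate(0.0, samples))  # roughly 1/sqrt(2*pi) ≈ 0.4 for large n
```

With $h_n = h_1/\sqrt{n}$ the kernel narrows as $n$ grows, so the estimate is noisy for any one $x$ but converges to $p(x)$ on average, matching the three conditions above.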
### Example: Mixture Density

Case where the unknown density $p(x)$ is a mixture of a uniform density $U(a, b)$ and a triangle density $T(c, d)$.

Note: a probabilistic neural network contains a Parzen window.

## $k_n$-Nearest-Neighbor Estimation

Goal: a solution to the problem of the unknown "best" window function. Let the cell volume be a function of the training data: center a cell about $x$ and let it grow until it captures $k_n$ samples, where $k_n = f(n)$. These samples are called the $k_n$ nearest neighbors of $x$. Two possibilities can occur:

- If the density is high near $x$, the cell will be small, which provides good resolution.
- If the density is low, the cell will grow large, stopping only when higher-density regions are reached.

We can obtain a family of estimates by setting $k_n = k_1 \sqrt{n}$.

### Illustration

For $n = 1$ and $k_n = \sqrt{n} = 1$, the estimate becomes

$$p_n(x) = \frac{k_n}{n V_n} = \frac{1}{V_1} = \frac{1}{2\,|x - x_1|}$$

Yikes! Not so good, since the probability goes to infinity at $x_1$, but at least we do not have holes in the density. Things get better as $n$ gets bigger, and we still don't have holes in the density, even in higher dimensions.

## The Nearest-Neighbor Rule

Let $D_n = \{x_1, x_2, \dots, x_n\}$ be a set of $n$ labeled prototypes, and let $x' \in D_n$ be the closest prototype to a test point $x$. The nearest-neighbor rule for classifying $x$ is to assign it the label associated with $x'$.

The nearest-neighbor rule leads to an error rate greater than the minimum possible, the Bayes rate. If the number of prototypes is large (unlimited), however, the error rate of the nearest-neighbor classifier is never worse than twice the Bayes rate (this can be demonstrated). As $n \to \infty$, it is always possible to find an $x'$ sufficiently close to $x$ that

$$P(\omega_i \mid x') \approx P(\omega_i \mid x)$$

## The $k$-Nearest-Neighbor Rule

Goal: classify $x$ by assigning it the label most frequently represented among the $k$ nearest samples, using a voting scheme.
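The voting rule above can be sketched in a few lines of NumPy; the toy prototypes and labels here are hypothetical, and $k = 1$ recovers the plain nearest-neighbor rule:

```python
import numpy as np
from collections import Counter

def knn_classify(x, prototypes, labels, k=1):
    """k-nearest-neighbor rule: assign x the label most frequently
    represented among the k nearest prototypes (k=1 is the NN rule)."""
    dists = np.linalg.norm(prototypes - x, axis=1)  # distance to each prototype
    nearest = np.argsort(dists)[:k]                 # indices of the k closest
    votes = Counter(labels[i] for i in nearest)     # voting scheme
    return votes.most_common(1)[0][0]

# Hypothetical toy set: two 2-D classes.
prototypes = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
labels = ["a", "a", "b", "b"]
print(knn_classify(np.array([0.2, 0.1]), prototypes, labels, k=3))  # "a"
```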
## Whitening Transform

See blackboard.

## Linear Discriminant Functions and Decision Surfaces

(*Pattern Classification*, Sections 5.1 and 5.2)

**Definition.** A linear discriminant function is a linear combination of the components of $x$:

$$g(x) = w^t x + w_0 \qquad (1)$$

where $w$ is the weight vector and $w_0$ is the bias. A two-category classifier with a discriminant function of the form (1) uses the following rule: decide $\omega_1$ if $g(x) > 0$ and $\omega_2$ if $g(x) < 0$; equivalently, decide $\omega_1$ if $w^t x > -w_0$ and $\omega_2$ otherwise. If $g(x) = 0$, $x$ may be assigned to either class.

The equation $g(x) = 0$ defines the decision surface that separates points assigned to category $\omega_1$ from points assigned to category $\omega_2$. When $g(x)$ is linear, this decision surface is a hyperplane $H$.

There is an algebraic measure of the distance from $x$ to the hyperplane (an interesting result). Write

$$x = x_p + r\, \frac{w}{\|w\|}$$

where $x_p$ is the normal projection of $x$ onto $H$, since $w$ is collinear with $x - x_p$. Since $g(x_p) = 0$ and $w^t w = \|w\|^2$, it follows that $g(x) = r\|w\|$, and therefore

$$r = \frac{g(x)}{\|w\|}$$

In particular, the distance from the origin to $H$ is $d(0, H) = w_0 / \|w\|$.

In conclusion, a linear discriminant function divides the feature space by a hyperplane decision surface. The orientation of the surface is determined by the normal vector $w$, and its location is determined by the bias $w_0$.

## The Multi-Category Case

We define $c$ linear discriminant functions

$$g_i(x) = w_i^t x + w_{i0}, \qquad i = 1, \dots, c$$

and assign $x$ to $\omega_i$ if $g_i(x) > g_j(x)$ for all $j \ne i$; in case of ties, the classification is undefined. The classifier is then a *linear machine*. A linear machine divides the feature space into $c$ decision regions, with $g_i(x)$ being the largest discriminant when $x$ is in region $R_i$. For two contiguous regions $R_i$ and $R_j$, the boundary that separates them is a portion of the hyperplane $H_{ij}$ defined by

$$g_i(x) = g_j(x), \quad \text{i.e.,} \quad (w_i - w_j)^t x + (w_{i0} - w_{j0}) = 0$$
The vector $w_i - w_j$ is normal to $H_{ij}$, and the distance from $x$ to $H_{ij}$ is

$$d(x, H_{ij}) = \frac{g_i(x) - g_j(x)}{\|w_i - w_j\|}$$
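The two-category rule, the distance $r = g(x)/\|w\|$, and the multi-category linear machine can be sketched together in NumPy; the weight vectors below are illustrative, not from the lecture:

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^t x + w0."""
    return w @ x + w0

def signed_distance(x, w, w0):
    """Signed distance r = g(x) / ||w|| from x to the hyperplane g(x) = 0."""
    return g(x, w, w0) / np.linalg.norm(w)

def linear_machine(x, W, w0):
    """Multi-category case: assign x to the class with the largest g_i(x)."""
    return int(np.argmax(W @ x + w0))

def dist_to_boundary(x, W, w0, i, j):
    """d(x, H_ij) = (g_i(x) - g_j(x)) / ||w_i - w_j||."""
    return (g(x, W[i], w0[i]) - g(x, W[j], w0[j])) / np.linalg.norm(W[i] - W[j])

# Two-category example: hyperplane x1 + x2 - 1 = 0 (illustrative weights).
w, b = np.array([1.0, 1.0]), -1.0
x = np.array([1.0, 1.0])
print(g(x, w, b))                # 1.0 > 0, so decide class 1
print(signed_distance(x, w, b))  # 1/sqrt(2) ≈ 0.707

# Three-category linear machine in 2-D.
W = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
b3 = np.zeros(3)
print(linear_machine(np.array([2.0, 1.0]), W, b3))  # class 0
```

Note that `signed_distance` is positive on the $\omega_1$ side of the hyperplane and negative on the $\omega_2$ side, which is exactly the sign test the two-category rule uses.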
