UCSD CSE 190 - Non-Parametric Density Estimation Techniques

CSE 190a, Fall 06
Non-Parametric Density Estimation Techniques
Biometrics, CSE 190a, Lecture 7

Announcements
• HW1 assigned.
• Most of this lecture was given on the blackboard; these slides cover the same material as presented in DHS.

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 4 (Part 1): Non-Parametric Classification (Sections 4.1-4.3)
• Introduction
• Density Estimation
• Parzen Windows

Introduction
• All parametric densities are unimodal (they have a single local maximum), whereas many practical problems involve multi-modal densities.
• Nonparametric procedures can be used with arbitrary distributions and without the assumption that the forms of the underlying densities are known.
• There are two types of nonparametric methods:
  • estimating the class-conditional densities p(x | ω_j);
  • bypassing the densities and estimating the a-posteriori probabilities directly.

Density Estimation
• Basic idea: the probability that a vector x will fall in a region R is

      P = ∫_R p(x') dx'                                    (1)
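The probability P = ∫_R p(x') dx' in equation (1) can be checked with a quick simulation. A minimal sketch, assuming a hypothetical one-dimensional case where p is the standard normal density and R is the interval [-1, 1] (neither choice comes from the slides):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# Hypothetical 1-D example: p(x) is the standard normal density and
# the region R is the interval [-1, 1].
n = 100_000
samples = rng.standard_normal(n)

# Empirical probability that a sample falls in R: the fraction k/n.
in_region = (samples >= -1.0) & (samples <= 1.0)
P_hat = in_region.mean()

# Closed-form value of P = integral of p(x') over R for the standard
# normal, via the error function.
P_true = erf(1.0 / sqrt(2.0))
print(P_hat, P_true)
```

The empirical fraction of samples landing in R approximates the integral of p over R, which is the smoothed probability P the slides go on to estimate by k/n.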
• P is a smoothed (or averaged) version of the density function p(x). If we have a sample of size n, the probability that exactly k of the n points fall in R follows the binomial law

      P_k = (n choose k) P^k (1 - P)^(n-k)               (2)

  and the expected value of k is

      E(k) = nP                                          (3)

• The maximum-likelihood estimate of P = θ, i.e. the value of θ that maximizes P(k | θ), is

      θ̂ = k/n ≅ P

  Therefore the ratio k/n is a good estimate of the probability P, and hence of the density function p.
• If p(x) is continuous and the region R is so small that p does not vary significantly within it, we can write

      ∫_R p(x') dx' ≅ p(x) V                             (4)

  where x is a point within R and V is the volume enclosed by R.
• Combining equations (1), (3), and (4) yields

      p(x) ≅ (k/n) / V

Density Estimation (cont.)
• Justification of equation (4): we assume that p(x) is continuous and that the region R is so small that p does not vary significantly within R.
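The estimate p(x) ≅ (k/n)/V can be illustrated directly. A minimal sketch, again assuming a standard normal density and a small interval of half-width h around the query point (all numbers here are illustrative choices, not from the slides):

```python
import numpy as np
from math import exp, pi, sqrt

rng = np.random.default_rng(1)

# Hypothetical setup: n standard-normal samples; the region R is a
# small interval of half-width h centered at x, so its "volume" is
# V = 2h.
n = 200_000
samples = rng.standard_normal(n)

x, h = 0.0, 0.05
V = 2.0 * h

# k = number of samples falling in R; the estimate is (k/n)/V.
k = int(np.sum(np.abs(samples - x) <= h))
p_hat = (k / n) / V

# True density at x for the standard normal, 1/sqrt(2*pi) at x = 0.
p_true = exp(-x * x / 2.0) / sqrt(2.0 * pi)
print(p_hat, p_true)
```

With n large and h small, the two printed values should agree closely, since the region is small enough for p to be nearly constant inside it (the assumption behind equation (4)).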
  Since p(x) ≅ p(x') = constant over R, it is not part of the integrand and can be factored out:

      ∫_R p(x') dx' = p(x') ∫_R 1 dx' = p(x') μ(R)

  where μ(R) is:
  • an area (surface) in the Euclidean space R²,
  • a volume in the Euclidean space R³,
  • a hypervolume in the Euclidean space Rⁿ.

  Hence, writing V = μ(R):

      ∫_R p(x') dx' ≅ p(x) · V   and   p(x) ≅ k/(nV)

• Condition for convergence
  The fraction k/(nV) is a space-averaged value of p(x); the true p(x) is obtained only if V approaches zero.

      lim_{V→0, k=0} p(x) = 0   (if n is fixed)

  This is the case where no samples are included in R: an uninteresting case!

      lim_{V→0, k≠0} p(x) = ∞

  In this case, the estimate diverges: also an uninteresting case!
• The volume V needs to approach 0 anyway if we want to use this estimation.
• Practically, V cannot be allowed to become too small, since the number of samples is always limited.
• One will then have to accept a certain amount of variance in the ratio k/n.
• Theoretically, if an unlimited number of samples is available, we can circumvent this difficulty. To estimate the density at x, we form a sequence of regions
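The practical point that V cannot shrink too far on a limited sample shows up numerically. A hedged illustration, assuming a standard normal density, n = 500 samples, and a few arbitrary interval half-widths (none of these numbers are from the slides):

```python
import numpy as np
from math import pi, sqrt

rng = np.random.default_rng(2)

# Illustrative numbers: a limited sample of n = 500 standard-normal
# points, density estimated at x = 0.
n = 500
samples = rng.standard_normal(n)
p_true = 1.0 / sqrt(2.0 * pi)   # true standard-normal density at 0

def estimate(h):
    """Estimate p(0) as (k/n)/V using the interval R = [-h, h], V = 2h."""
    k = int(np.sum(np.abs(samples) <= h))
    return (k / n) / (2.0 * h)

for h in (0.5, 0.1, 0.01, 0.001):
    print(f"h = {h:>6}: estimate = {estimate(h):.4f} (true {p_true:.4f})")
```

For moderate h the estimate sits near the true value; for very small h the region usually contains zero or one sample, so the estimate tends to collapse to 0 or blow up, illustrating the variance one must accept when n is fixed.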
R_1, R_2, … containing x: the first region contains one sample, the second two samples, and so on.

Let V_n be the volume of R_n, k_n the number of samples falling in R_n, and p_n(x) the n-th estimate of p(x):

      p_n(x) = (k_n / n) / V_n                           (7)

Three necessary conditions should apply if we want p_n(x) to converge to p(x). There are two different …
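The sequence p_n(x) = (k_n/n)/V_n can be sketched with a shrinking window. The schedule V_n = V_1/√n used below is one standard Parzen-style choice; the slides' outline mentions Parzen windows, but this particular schedule and the standard-normal test density are assumptions of the sketch:

```python
import numpy as np
from math import pi, sqrt

rng = np.random.default_rng(3)

p_true = 1.0 / sqrt(2.0 * pi)   # standard-normal density at x = 0

def p_n(n, V1=1.0):
    """n-th estimate of p(0): window volume V_n = V1 / sqrt(n) (an
    assumed shrinking schedule), region R_n = [-V_n/2, V_n/2], with
    k_n samples falling inside."""
    Vn = V1 / sqrt(n)
    samples = rng.standard_normal(n)
    kn = int(np.sum(np.abs(samples) <= Vn / 2.0))
    return (kn / n) / Vn

for n in (100, 10_000, 1_000_000):
    print(n, p_n(n))
```

Because V_n shrinks slowly enough that the expected count in R_n still grows with n, the successive estimates settle toward the true density rather than collapsing to 0 or diverging.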


