Chapter 6

NONPARAMETRIC DENSITY ESTIMATION

So far we have been discussing the estimation of parameters. Thus, if we can assume that a density function is characterized by a set of parameters, we can design a classifier using estimates of those parameters. Unfortunately, we often cannot assume a parametric form for the density function, and in order to apply the likelihood ratio test we must somehow estimate the density functions by an unstructured approach. This type of approach is called nonparametric estimation, while the former is called parametric estimation. Since in nonparametric approaches the density function is estimated locally from a small number of neighboring samples, the estimate is far less reliable, with larger bias and variance, than its parametric counterpart.

There are two kinds of nonparametric estimation techniques available: one is called the Parzen density estimate and the other the k-nearest neighbor density estimate. They are fundamentally very similar, but exhibit some different statistical properties. Both are discussed in this chapter.

It is extremely difficult to obtain an accurate density estimate nonparametrically, particularly in high-dimensional spaces. However, our goal here is not to get an accurate estimate. Our goal is, by using these estimates, to design a classifier and evaluate its performance. For this reason, the accuracy of the estimate is not necessarily a crucial issue. Classification and performance evaluation will be discussed in Chapter 7. The intention of this chapter is to make the reader familiar with the fundamental mathematical properties related to nonparametric density estimation, in preparation for the material presented in Chapter 7.

6.1 Parzen Density Estimate

In order to estimate the value of a density function at a point $X$, we may set up a small local region around $X$, $L(X)$. Then the probability coverage (or probability mass) of $L(X)$ may be approximated by $p(X)\,v$, where $v$ is the volume of $L(X)$. This probability may be estimated by drawing a large number of samples, $N$, from $p(X)$, counting the number of samples, $k$, falling in $L(X)$, and computing $k/N$. Equating these two probabilities, we may obtain an estimate of the density function as

$$\hat p(X)\,v \cong \frac{k(X)}{N}\,, \quad\text{or}\quad \hat p(X) = \frac{k(X)}{N\,v}\,. \tag{6.1}$$

Note that, with $v$ fixed, $k$ is a random variable and is dependent on $X$. A fixed $v$ does not imply the same $v$ throughout the entire space; $v$ could still vary with $X$. However, $v$ is a preset value and is not a random variable.

Kernel expression

The estimate of (6.1) has another interpretation. Suppose that 3 samples, $X_3$, $X_4$, and $X_5$, are found in $L(X)$, as shown in Fig. 6-1. With $v$ and $N$ given, $\hat p(X)$ becomes $3/Nv$. On the other hand, if we set up a uniform kernel function, $\kappa(\cdot)$, with volume $v$ and height $1/v$ around each of the existing samples, the average of the values of these kernel functions at $X$ is also $3/Nv$. That is,

$$\hat p(X) = \frac{1}{N}\sum_{i=1}^{N} \kappa(X - X_i)\,. \tag{6.2}$$

[Fig. 6-1 Parzen kernel density estimate.]

As seen in Fig. 6-1, only the kernel functions around the 3 samples $X_3$, $X_4$, and $X_5$ contribute to the summation of (6.2).

Once (6.2) is adopted, the shape of the kernel function may be selected more freely, under the condition $\int \kappa(X)\,dX = 1$. For one-dimensional cases, we may seek optimality and select a complex shape. However, in a high-dimensional space, because of its complexity, the practical selection of the kernel function is limited to either a normal or a uniform kernel.
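To make (6.1) and (6.2) concrete, here is a minimal sketch in Python (not part of the original text; the sample data and parameter values are assumptions for illustration). It estimates a one-dimensional density both ways: by counting the $k$ samples falling in a region of volume (here, length) $v$ around $X$, and by averaging uniform kernels of height $1/v$ centered on every sample. As argued above, the two evaluations coincide at any $X$.

```python
import numpy as np

def parzen_count(X, samples, v):
    """Eq. (6.1): count the k samples falling in the local region
    L(X) of length v around X, and return k/(N*v)."""
    N = len(samples)
    k = np.sum(np.abs(samples - X) <= v / 2.0)   # samples inside L(X)
    return k / (N * v)

def parzen_kernel(X, samples, v):
    """Eq. (6.2): average of uniform kernels of volume v and height
    1/v centered on every sample, evaluated at the point X."""
    heights = np.where(np.abs(X - samples) <= v / 2.0, 1.0 / v, 0.0)
    return heights.mean()

rng = np.random.default_rng(0)
samples = rng.standard_normal(1000)          # N samples from p(X) = N(0,1)
for X in (0.0, 1.0, 2.0):
    assert np.isclose(parzen_count(X, samples, 0.5),
                      parzen_kernel(X, samples, 0.5))
    print(X, parzen_kernel(X, samples, 0.5))
```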
In this book we will use the kernel of (6.3), which includes both normal and uniform kernels as special cases; there, $\Gamma(\cdot)$ is the gamma function and $m$ is a parameter determining the shape of the kernel. It may be verified that, for any value of $m$, the covariance matrix of the kernel density (6.3) is $r^2 A$.

The parameter $m$ determines the rate at which the kernel function drops off. For $m = 1$, (6.3) reduces to a simple normal kernel. As $m$ becomes large, (6.3) approaches a uniform hyperelliptical kernel, always with a smooth roll-off. The matrix $A$ determines the shape of the hyperellipsoid, and $r$ controls the size or volume of the kernel. The other coefficients are selected to satisfy the two conditions mentioned previously,

$$\int \kappa(X)\,dX = 1 \quad\text{and}\quad \Sigma_\kappa = r^2 A\,,$$

where $\Sigma_\kappa$ is the covariance matrix of $\kappa(X)$.

Convolution expression

Equation (6.2) can be rewritten in convolution form as

$$\hat p(X) = p_s(X) * \kappa(X)\,, \tag{6.4}$$

where $p_s(X)$ is an impulsive density function with impulses at the locations of the $N$ existing samples; that is,

$$p_s(X) = \frac{1}{N}\sum_{i=1}^{N}\delta(X - X_i)\,. \tag{6.5}$$

Thus the estimated density $\hat p(X)$ is obtained by feeding $p_s(X)$ through a linear, noncausal filter whose impulse response is given by $\kappa(X)$. Therefore, $\hat p(X)$ is a smoothed version of $p_s(X)$.

Moments of $\hat p(X)$

The first and second order moments of (6.4) can be easily computed. First, let us compute the expected value of $p_s(X)$ as

$$E\{p_s(X)\} = \frac{1}{N}\sum_{i=1}^{N}\int \delta(X - Z)\,p(Z)\,dZ = p(X)\,. \tag{6.6}$$

That is, $p_s(X)$ is an unbiased estimate of $p(X)$. Then the expected value of $\hat p(X)$ of (6.4) may be computed as

$$E\{\hat p(X)\} = E\{p_s(X)\} * \kappa(X) = p(X)*\kappa(X)\,. \tag{6.7}$$

Also,

$$E\{\hat p^2(X)\} = \frac{1}{N}\,p(X)*\kappa^2(X) + \Big(1 - \frac{1}{N}\Big)\big(p(X)*\kappa(X)\big)^2\,. \tag{6.8}$$

Therefore, the variance of $\hat p(X)$ is

$$\operatorname{Var}\{\hat p(X)\} = \frac{1}{N}\Big[\,p(X)*\kappa^2(X) - \big(p(X)*\kappa(X)\big)^2\Big]\,. \tag{6.9}$$

Approximations of moments

In order to approximate the moments of $\hat p(X)$, let us expand $p(Y)$ around $X$ by a Taylor series up to the second order terms as

$$p(Y) \cong p(X) + \nabla p^T(X)\,(Y - X) + \tfrac{1}{2}\,(Y - X)^T\,\nabla^2 p(X)\,(Y - X)\,. \tag{6.10}$$

Then $p(X)*\kappa(X)$ may be approximated by

$$p(X)*\kappa(X) = \int p(Y)\,\kappa(Y - X)\,dY \cong p(X)\!\int\! \kappa(Y - X)\,dY + \tfrac{1}{2}\,\mathrm{tr}\Big\{\nabla^2 p(X)\!\int\! (Y - X)(Y - X)^T\,\kappa(Y - X)\,dY\Big\}\,, \tag{6.11}$$

where the first order term disappears because $\kappa$ is a symmetric function. Since $\int \kappa(Y - X)\,dY = 1$ and $\int (Y - X)(Y - X)^T\,\kappa(Y - X)\,dY = r^2 A$ for the $\kappa$ of (6.3), (6.11) can be expressed as

$$p(X)*\kappa(X) \cong p(X)\Big[1 + \frac{r^2}{2}\,\alpha(X)\Big]\,, \tag{6.12}$$

where

$$\alpha(X) = \frac{\mathrm{tr}\{A\,\nabla^2 p(X)\}}{p(X)}\,. \tag{6.13}$$

Similarly,

$$p(X)*\kappa^2(X) = \int p(Y)\,\kappa^2(Y - X)\,dY\,. \tag{6.14}$$

Although $\kappa$ is a density function, $\kappa^2$ is not, since $\int \kappa^2(Y)\,dY$ has a value not equal to 1. Therefore, let

$$w(Y) = \frac{\kappa^2(Y)}{\int \kappa^2(Y)\,dY}\,. \tag{6.15}$$

Then $w$ becomes a density function, and (6.14) becomes

$$p(X)*\kappa^2(X) \cong \Big(\int \kappa^2(Y)\,dY\Big)\,p(X)\Big[1 + \frac{r^2}{2}\,\beta(X)\Big]\,, \tag{6.16}$$

where

$$\beta(X) = \frac{\mathrm{tr}\{B\,\nabla^2 p(X)\}}{p(X)}\,, \tag{6.17}$$

and $r^2 B$ is the covariance matrix of $w$. Substituting (6.12) and (6.16) into (6.7) and (6.9), the moments of $\hat p(X)$ are approximated by

$$E\{\hat p(X)\} \cong \begin{cases} p(X)\Big[1 + \dfrac{r^2}{2}\,\alpha(X)\Big] & \text{(2nd order approximation)}\\[2mm] p(X) & \text{(1st order approximation)} \end{cases} \tag{6.18}$$
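As a numerical illustration of the second-order approximation (6.12), the following sketch (Python; the standard normal density and the kernel size are assumptions for illustration, not taken from the text) compares the exact value of $p(X)*\kappa(X)$, for $p = N(0,1)$ and a normal kernel $\kappa = N(0, r^2)$ (so that $A = 1$ in one dimension), against $p(X)[1 + (r^2/2)\,\alpha(X)]$, where $\alpha(X) = p''(X)/p(X) = X^2 - 1$ for the standard normal.

```python
import numpy as np

def npdf(x, var):
    """Zero-mean normal density with variance var."""
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

r = 0.3                                   # kernel size parameter (assumed)
for X in (0.0, 0.5, 1.0, 2.0):
    # Exact: convolving N(0,1) with N(0, r^2) gives N(0, 1 + r^2).
    exact = npdf(X, 1 + r**2)
    # Eq. (6.12): p(X)[1 + (r^2/2) alpha(X)], with alpha(X) = X^2 - 1 here.
    approx = npdf(X, 1.0) * (1 + 0.5 * r**2 * (X**2 - 1))
    print(f"X={X:.1f}  exact={exact:.6f}  2nd-order={approx:.6f}")
```

For small $r$ the two values agree closely, showing that the bias of the Parzen estimate grows with the kernel size $r$ through the curvature term $\alpha(X)$.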
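Similarly, the variance expression (6.9) can be checked by simulation. The sketch below (Python; again a one-dimensional setting with assumed parameters, not from the text) forms the Parzen estimate $\hat p(X)$ at a fixed point many times from fresh samples of $N(0,1)$ with a normal kernel $N(0, r^2)$, and compares the empirical variance across trials with $(1/N)[\,p*\kappa^2(X) - (p*\kappa(X))^2]$, using the closed forms $p*\kappa = N(0, 1+r^2)$ and $p*\kappa^2 = (2\sqrt{\pi}\,r)^{-1} N(0, 1+r^2/2)$.

```python
import numpy as np

def npdf(x, var):
    """Zero-mean normal density with variance var."""
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

rng = np.random.default_rng(0)
N, r, X, trials = 200, 0.4, 0.5, 20000    # assumed illustrative settings

# Empirical variance of p_hat(X) = (1/N) sum_i kappa(X - X_i) over trials.
est = np.array([npdf(X - rng.standard_normal(N), r**2).mean()
                for _ in range(trials)])

# Eq. (6.9) with the closed-form convolutions for this p and kappa.
pk  = npdf(X, 1 + r**2)                                  # p * kappa
pk2 = npdf(X, 1 + r**2 / 2) / (2 * np.sqrt(np.pi) * r)   # p * kappa^2
print(f"empirical: {est.var():.3e}   eq. (6.9): {(pk2 - pk**2) / N:.3e}")
```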