
Pattern Recognition
Approximating class densities, Bayesian classifier
(slides: http://www.cubs.buffalo.edu)

Bayesian classification
• Suppose we have 2 classes and we know the probability density functions of their feature vectors. How should a new pattern be classified?
• Bayes classification rule: classify x to the class $w_i$ which has the largest posterior probability $P(w_i|x)$:

    $P(w_1|x) > P(w_2|x)$ ?  then $w_1$, else $w_2$

• Using the Bayes formula, we can rewrite the classification rule in terms of likelihoods and priors:

    $p(x|w_1)P(w_1) > p(x|w_2)P(w_2)$ ?  then $w_1$, else $w_2$

  (posterior ∝ likelihood × prior)

Estimating the probability density function
• Parametric pdf estimation: model the unknown probability density function $p(x|w_i)$ of class $w_i$ by some parametric function $p(x;\theta_i)$ and determine its parameters from the training samples. Example, a Gaussian:

    $p(x;\mu) = \frac{1}{(2\pi)^{l/2}}\, e^{-\frac{1}{2}(x-\mu)^2}$

• Non-parametric pdf estimation:
  1. Histogram
  2. K nearest neighbors
  3. Kernel methods (Parzen kernels or windows)
  4. Other methods (estimating the cumulative distribution function first, SVM density estimation, etc.)

    $\hat p(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h}\,\varphi\!\left(\frac{x - x_i}{h}\right)$,  where N is the number of training samples

Estimating the kernel width
• Fixed kernels: one width h shared by all kernels:

    $\hat p(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h}\,\varphi\!\left(\frac{x - x_i}{h}\right)$

• Adaptive kernels: an individual width $h_i$ for each kernel:

    $\hat p(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h_i}\,\varphi\!\left(\frac{x - x_i}{h_i}\right)$

Recall that we used the maximum likelihood method for parametric pdf estimation:

    $\max_\theta \hat p(X;\theta) = \max_\theta \hat p(x_1, x_2, \ldots, x_N \mid \theta) = \max_\theta \prod_{k=1}^{N} \hat p(x_k; \theta)$

Can we use the same method for estimating the kernel width of $\hat p(x)$? No, the maximum is not achievable:

    $\max_h \prod_{k=1}^{N} \hat p(x_k; h) = \max_h \prod_{k=1}^{N} \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h}\,\varphi\!\left(\frac{x_k - x_i}{h}\right) \to \infty \quad \text{as } h \to 0$

(the i = k term places a kernel exactly at $x_k$, and its height $\frac{1}{h}\varphi(0)$ grows without bound as $h \to 0$).

Solution: separate the model data (kernel centers) from the testing data - the cross-validation (leave-one-out) technique:

    $\max_h \prod_{k=1}^{N} \frac{1}{N} \sum_{i \ne k} \frac{1}{h}\,\varphi\!\left(\frac{x_k - x_i}{h}\right)$

Tried maximum likelihood cross-validation and $\hat p(x)$ still diverges?

    $\max_h \prod_{k=1}^{N} \frac{1}{N} \sum_{i \ne k} \frac{1}{h}\,\varphi\!\left(\frac{x_k - x_i}{h}\right) \to \infty$

This might happen if the data is somewhat discrete, so that $x_i = x_k$ for some $i \ne k$. Solution - truly separate the model data from the testing data by excluding all duplicates of $x_k$ from the sum:

    $\max_h \prod_{k=1}^{N} \frac{1}{N} \sum_{i:\, x_i \ne x_k} \frac{1}{h}\,\varphi\!\left(\frac{x_k - x_i}{h}\right)$

Examples of pdf estimation
[Figures: Parzen-window (kernel) estimates of a univariate normal density, of a bimodal density, and of a bivariate normal density, using different window widths and numbers of samples.]

Error in pdf estimation
Discrepancy between the true density $p(x)$ and its estimate $\hat p(x)$:

    $MSE_x(\hat p) = E\{(\hat p(x) - p(x))^2\}$   - Mean Square Error
    $MISE(\hat p) = \int E\{(\hat p(x) - p(x))^2\}\,dx$   - Mean Integrated Square Error

    $MSE_x(\hat p) = E\{(\hat p - p)^2\} = E\{\hat p^2 - 2\hat p p + p^2\} = E\{\hat p^2\} - 2pE\{\hat p\} + p^2$
    $= E\{\hat p^2\} - E\{\hat p\}^2 + E\{\hat p\}^2 - 2pE\{\hat p\} + p^2 = \left[E\{\hat p\} - p\right]^2 + E\{(\hat p - E\{\hat p\})^2\}$

Bias and variance of the estimation error

    $MSE_x(\hat p) = \underbrace{\left[E\{\hat p\} - p\right]^2}_{\text{Bias}^2} + \underbrace{E\{(\hat p - E\{\hat p\})^2\}}_{\text{Variance}}$

• $E\hat p$ is the average approximation: $E\hat p = \int \frac{1}{h} K\!\left(\frac{x-y}{h}\right) p(y)\,dy$
• Bias is the difference between the true density and the average approximation.
• Variance measures the spread of the individual approximations around the average approximation.
• A smaller kernel width reduces the bias but increases the variance.

Silverman (Parzen): if some assumptions on the true density are made (e.g. $\int (p''(x))^2\,dx < \infty$), then it is possible to analytically find the kernel width which gives the smallest $MISE(\hat p)$:

    $h_{opt} = \left\{\int t^2 \varphi(t)\,dt\right\}^{-2/5} \left\{\int \varphi^2(t)\,dt\right\}^{1/5} \left\{\int (p''(x))^2\,dx\right\}^{-1/5} n^{-1/5}$

The optimal kernel width gets smaller as the number of training samples n increases. For the optimal kernel width, $MISE(\hat p)$ also decreases:

    $MISE \sim C(\varphi) \left\{\int (p''(x))^2\,dx\right\}^{1/5} n^{-4/5}$

Note that $p(x)$ is unknown.
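The fixed-width Parzen estimate and the leave-one-out likelihood selection of h can be sketched as follows. This is a minimal sketch, not from the slides: it assumes a Gaussian kernel for φ, and the function names and the synthetic data are illustrative. It normalizes the leave-one-out sum by N-1 rather than the slides' 1/N; the constant factor does not change which h maximizes the likelihood.

```python
import numpy as np

def parzen_estimate(x, samples, h):
    """Fixed-width Parzen (kernel) density estimate with a Gaussian kernel phi."""
    u = (x - samples[:, None]) / h                    # (N, len(x)) scaled distances
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel values
    return phi.mean(axis=0) / h                       # (1/N) sum_i (1/h) phi((x - x_i)/h)

def loo_log_likelihood(samples, h):
    """Leave-one-out cross-validated log-likelihood of the kernel width h."""
    u = (samples[:, None] - samples[None, :]) / h     # pairwise (x_k - x_i)/h
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(phi, 0.0)                        # exclude i == k: the LOO step
    dens = phi.sum(axis=1) / ((len(samples) - 1) * h)
    return np.log(dens).sum()

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=200)                 # synthetic training samples
widths = np.linspace(0.05, 2.0, 40)
best_h = max(widths, key=lambda h: loo_log_likelihood(data, h))
```

Without the `fill_diagonal` line, the log-likelihood grows without bound as h shrinks, exactly as the divergence argument above predicts.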
The above formulas are useful for theory, but not for practical applications, since $p''(x)$ of the unknown density cannot be evaluated.

For multivariate pdf approximation, $MISE \sim n^{-4/(d+4)}$: the performance decreases exponentially as the number of dimensions d increases.

Bayesian classification
• Bayes classification rule: classify x to the class $w_i$ which has the largest posterior probability $P(w_i|x)$:

    $P(w_1|x) > P(w_2|x)$ ?  then $w_1$, else $w_2$

• The Bayes classification rule minimizes the total probability of misclassification.

Cost of errors
• Errors happen when samples of class 1 are incorrectly classified as belonging to class 2, and samples of class 2 are classified as belonging to class 1.
• The costs of making these two kinds of errors can be different:
    $\lambda_1$ - the cost of misclassifying samples of class 1
    $\lambda_2$ - the cost of misclassifying samples of class 2

Total cost (or risk) of classification
The classification algorithm splits the feature space into two decision regions:
    $R_1$ - samples in this region are classified as class 1
    $R_2$ - samples in this region are classified as class 2

    $\int_{R_2} p(x|w_1)\,dx$ - the proportion of class-1 samples classified as class 2
    $\int_{R_1} p(x|w_2)\,dx$ - the proportion of class-2 samples classified as class 1
    $P(w_1)\int_{R_2} p(x|w_1)\,dx$ - the proportion of all input samples that are in class 1 but classified as class 2
    $P(w_2)\int_{R_1} p(x|w_2)\,dx$ - the proportion of all input samples that are in class 2 but classified as class 1

Total cost:

    $Cost = \lambda_1 P(w_1) \int_{R_2} p(x|w_1)\,dx + \lambda_2 P(w_2) \int_{R_1} p(x|w_2)\,dx$

Minimizing the total cost of classification
Since $R_1$ and $R_2$ cover the whole feature space,

    $\int_{R_1} p(x|w_1)\,dx + \int_{R_2} p(x|w_1)\,dx = 1$

Thus

    $Cost = \lambda_1 P(w_1)\left\{1 - \int_{R_1} p(x|w_1)\,dx\right\} + \lambda_2 P(w_2) \int_{R_1} p(x|w_2)\,dx$
    $= \lambda_1 P(w_1) + \int_{R_1} \left(\lambda_2 P(w_2)\, p(x|w_2) - \lambda_1 P(w_1)\, p(x|w_1)\right) dx$

The cost is minimized if $R_1$ includes only the points where

    $\lambda_2 P(w_2)\, p(x|w_2) - \lambda_1 P(w_1)\, p(x|w_1) < 0$

Bayesian classification
The Bayesian classifier is an optimal classifier minimizing the total classification cost.
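The minimization argument can be checked numerically. The following is a sketch, not from the slides: the Gaussian class-conditional densities, priors, and costs are hypothetical values chosen for illustration, and the feature axis is discretized so the integrals become sums.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    # Gaussian density, standing in for the (assumed known) class-conditional pdfs
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# hypothetical two-class problem
P1, P2 = 0.6, 0.4            # priors P(w1), P(w2)
lam1, lam2 = 1.0, 5.0        # costs of misclassifying class 1 / class 2

xs = np.linspace(-8.0, 10.0, 20001)   # discretized feature axis
dx = xs[1] - xs[0]
p1 = norm_pdf(xs, 0.0, 1.0)           # p(x|w1)
p2 = norm_pdf(xs, 2.0, 1.0)           # p(x|w2)

def total_cost(in_R1):
    # Cost = lam1 P(w1) int_{R2} p(x|w1) dx + lam2 P(w2) int_{R1} p(x|w2) dx
    return (lam1 * P1 * p1[~in_R1].sum() + lam2 * P2 * p2[in_R1].sum()) * dx

# minimum-cost region: points where lam2 P(w2) p(x|w2) - lam1 P(w1) p(x|w1) < 0
bayes_R1 = lam2 * P2 * p2 - lam1 * P1 * p1 < 0
bayes_cost = total_cost(bayes_R1)

# compare against every simple threshold rule "classify as class 1 iff x < t"
threshold_costs = [total_cost(xs < t) for t in np.linspace(-2.0, 4.0, 121)]
```

Because `bayes_R1` keeps exactly the grid points where the integrand is negative, its cost is (up to discretization) no larger than that of any other decision region, including every threshold rule above.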
Such a classifier is possible only if we have full knowledge of the class distributions. If $\lambda_1 P(w_1)\, p(x|w_1) > \lambda_2 P(w_2)\, p(x|w_2)$, then classify x as class 1.
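The decision rule itself is a one-liner. In this sketch the Gaussian class-conditional densities and all parameter values are illustrative assumptions; in practice the densities would come from the parametric or kernel estimates above.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def bayes_classify(x, p1, p2, P1, P2, lam1=1.0, lam2=1.0):
    # class 1 where lam1 P(w1) p(x|w1) > lam2 P(w2) p(x|w2), else class 2
    return np.where(lam1 * P1 * p1(x) > lam2 * P2 * p2(x), 1, 2)

# assumed class-conditional densities: N(0,1) for class 1, N(2,1) for class 2
p1 = lambda x: norm_pdf(x, 0.0, 1.0)
p2 = lambda x: norm_pdf(x, 2.0, 1.0)

# with equal priors and costs the decision boundary falls midway, at x = 1
labels = bayes_classify(np.array([-1.0, 0.5, 3.0]), p1, p2, 0.5, 0.5)
```

Raising lam2 (making class-2 errors more expensive) enlarges the region decided as class 2: for these equal-variance Gaussians the boundary moves to $1 + \frac{1}{2}\ln\frac{\lambda_1 P(w_1)}{\lambda_2 P(w_2)}$.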


UB CSE 666 - Approximating class densities, Bayesian classifier
