Combining Classifiers
Sargur Srihari
[email protected]

Synonyms for the Topic
• "Mixture of Experts"
• "Ensemble Classifiers"
• "Modular Classifiers"
• "Pooled Classifiers"
• "Combining Models"

Typical Scenario
• Application in biometrics: several modalities, each handled by its own classifier

Two Approaches
1. Component classifiers with discriminant functions: use a principled statistical approach
2. Component classifiers without discriminant functions: several heuristics are available

Mixture Model of Combination
• Each classifier is tuned to its own sub-distribution of the data
• Each classifier outputs discriminant values
• The discriminant values are combined using weights for the components

Classifier Ensemble
• Each component classifier is trained in a different region of feature space
• Each component classifier provides probability estimates

Architecture for Combining Classifiers
[Figure: component classifiers whose discriminant outputs are pooled by a gating subsystem]

Mixture Model
• k classifiers correspond to k component densities
• The classifier/component density (a process indexed by r) is chosen with probability $P(r \mid x, \theta_0^0)$
• where $\theta_0^0$ is a parameter vector that describes the state of the process

Probability of Producing a Category Label y
The overall probability of producing y (the category label) is the sum over all the processes, i.e., a mixture distribution:

  $P(y \mid x, \Theta^0) = \sum_{r=1}^{k} P(r \mid x, \theta_0^0)\, P(y \mid x, \theta_r^0)$

where $\Theta^0 = [\theta_0^0, \theta_1^0, \ldots, \theta_k^0]$ is the vector of all relevant parameters.

Discriminant Functions of Component Classifier r
The discriminant values of component classifier r are normalized over the c categories:

  $\sum_{j=1}^{c} g_{rj} = 1$ for all r.

Conditional Mean of $P(y \mid x, \Theta^0)$
A gating subsystem with weight parameters $w_r = P(r \mid x, \theta_0^0)$ multiplies all discriminant values of component classifier r by the scalar $w_r$ (a code sketch of this pooling rule appears at the end of these notes):

  $\mu = E[y \mid x, \Theta^0] = \sum_{r=1}^{k} w_r \mu_r$

Parameter Estimation
Find the parameters that maximize the log-likelihood function over the n training patterns $x_1, \ldots, x_n$ in D:

  $l(D, \Theta^0) = \sum_{i=1}^{n} \ln\left( \sum_{r=1}^{k} P(r \mid x_i, \theta_0^0)\, P(y_i \mid x_i, \theta_r^0) \right)$

Use gradient descent on the parameters. For the conditional mean $\mu_r$ of component r:

  $\dfrac{\partial l(D, \Theta^0)}{\partial \mu_r} = \sum_{i=1}^{n} P(r \mid y_i, x_i)\, \dfrac{\partial \ln P(y_i \mid x_i, \theta_r)}{\partial \mu_r}$, for $r = 1, \ldots, k$

Gradient descent moves the prior probabilities toward the posterior probabilities: writing $u_r$ for the gating subsystem's activation for component r, the gradient is the gap between the posterior $P(r \mid y_i, x_i)$ and the prior $w_{ir} = P(r \mid x_i, \theta_0^0)$:

  $\dfrac{\partial l(D, \Theta^0)}{\partial u_r} = \sum_{i=1}^{n} \left[ P(r \mid y_i, x_i) - w_{ir} \right]$

Winner Take All
• Use the decision of the single component classifier that is most confident, i.e., the one with the largest discriminant value $g_{rj}$ (sketched in code at the end of these notes)
• Sub-optimal, but easy to implement
• Can work well if the classifiers are experts in separate regions of the input space

Component Classifiers Without Discriminants
• We wish to create an ensemble classifier from highly trained component classifiers, some of which may not compute discriminant functions
• Their output values may be:
  – Analog values, e.g., a neural network
  – Rank order, e.g., k-nearest neighbor
  – One-of-c, e.g., a rule-based system

Conversion to Discriminant Values
All three conversions are sketched in code at the end of these notes.
• Analog values: convert the analog outputs $\tilde g_i$ of a component classifier to discriminant values using the softmax transformation:

  $g_i = \dfrac{e^{\tilde g_i}}{\sum_{j=1}^{c} e^{\tilde g_j}}$

• Rank order: assume the discriminant function is linearly proportional to a category's rank order on the list; the resulting $g_i$ should then be normalized so that they sum to one
• One-of-c: if the output is a one-of-c representation, in which a single category j is identified, let $g_j = 1.0$ and all other discriminant values be 0.0
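To make the mixture-model combination rule concrete, here is a minimal NumPy sketch of the pooling $\mu_j = \sum_r w_r g_{rj}$ from the "Conditional Mean" slide. It assumes each component classifier already outputs normalized discriminant values and that the gating weights come from a trained gating subsystem; the function name and example numbers are illustrative, not from the original slides.

```python
import numpy as np

def combine_mixture(g, w):
    """Pool component discriminants with gating weights.

    g -- shape (k, c): row r holds the discriminant values g_rj of
         component classifier r (each row assumed to sum to one).
    w -- shape (k,):   gating weights w_r = P(r | x, theta_0).
    Returns the pooled discriminant vector mu (length c) and the
    index of the winning category.
    """
    g = np.asarray(g, dtype=float)
    w = np.asarray(w, dtype=float)
    mu = w @ g                         # mu_j = sum_r w_r * g_rj
    return mu, int(np.argmax(mu))

# Illustrative use: k = 2 component classifiers, c = 3 categories.
g = [[0.7, 0.2, 0.1],                  # expert 0's discriminants
     [0.1, 0.3, 0.6]]                  # expert 1's discriminants
w = [0.8, 0.2]                         # gating favors expert 0
mu, label = combine_mixture(g, w)      # mu = [0.58, 0.22, 0.20], label = 0
```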

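The winner-take-all heuristic from the slides above admits an equally short sketch, under the same illustrative assumptions: rather than pooling, simply trust the component whose single discriminant value is largest.

```python
import numpy as np

def winner_take_all(g):
    """Return (category, component) chosen by the single component
    classifier with the largest discriminant value g_rj.
    g -- shape (k, c) array of component discriminant values."""
    g = np.asarray(g, dtype=float)
    r, j = np.unravel_index(np.argmax(g), g.shape)
    return int(j), int(r)

label, expert = winner_take_all([[0.7, 0.2, 0.1],
                                 [0.1, 0.3, 0.6]])  # label 0, from expert 0
```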

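Finally, a sketch of the three conversions to discriminant values from the last slide. The softmax and one-of-c cases follow the slide directly; the rank-order mapping is one common reading of "linearly proportional to rank order" (rank 1 gets the largest score), and all function names are illustrative.

```python
import numpy as np

def from_analog(g_tilde):
    """Softmax transformation: g_i = exp(g~_i) / sum_j exp(g~_j)."""
    z = np.asarray(g_tilde, dtype=float)
    z = np.exp(z - z.max())             # shift for numerical stability
    return z / z.sum()

def from_rank_order(ranks):
    """Map ranks (1 = top of the list) to discriminants that are
    linearly proportional to reversed rank, normalized to sum to one."""
    ranks = np.asarray(ranks, dtype=float)
    scores = ranks.max() + 1.0 - ranks  # rank 1 -> largest score
    return scores / scores.sum()

def from_one_of_c(j, c):
    """One-of-c output: g_j = 1.0 for the identified category j,
    0.0 for every other category."""
    g = np.zeros(c)
    g[j] = 1.0
    return g

print(from_analog([2.0, 1.0, 0.1]))     # approx [0.66, 0.24, 0.10]
print(from_rank_order([2, 1, 3]))       # [1/3, 1/2, 1/6]
print(from_one_of_c(1, 3))              # [0. 1. 0.]
```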