Review
Rong Jin

Contents: Comparison of Different Classification Models; K Nearest Neighbor (kNN) Approach; Weighted K Nearest Neighbor; Gaussian Generative Model; Logistic Regression Model; Non-linear Logistic Regression Model; Localized Logistic Regression Model; Conditional Exponential Model; Maximum Entropy Model; Support Vector Machine; Logistic Regression Model vs. Support Vector Machine; Kernel Tricks; Fisher Kernel; Kernel Methods in Generative Model; Multi-class SVM; Error Correcting Output Code (ECOC); Ordinal Regression; Decision Tree; Generalized Decision Tree

Comparison of Different Classification Models

The goal of all classifiers:
- Predict the class label y for an input x
- Estimate p(y|x)

K Nearest Neighbor (kNN) Approach

[Figures: nearest-neighbor classification examples with k = 1 and k = 4.]

Probability interpretation: estimate p(y|x) as

$p(y|x) = \frac{\left|\{(x_i, y_i) \mid y_i = y,\ x_i \in N(x)\}\right|}{|N(x)|}$

where $N(x)$ is the neighborhood around $x$. (A code sketch follows this section.)

K Nearest Neighbor Approach (kNN)

- What is the appropriate size for the neighborhood N(x)? A leave-one-out approach.
- Weighted K nearest neighbor: the neighborhood is defined through a weight function

  $w(x_i) = \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)$

  and p(y|x) is estimated as

  $p(y|x) = \frac{\sum_i w(x_i)\,\delta(y, y_i)}{\sum_i w(x_i)}$

- How to estimate the appropriate value for $\sigma^2$?

Weighted K Nearest Neighbor

Leave-one-out + maximum likelihood:
- Estimate the leave-one-out probability. Writing $w_i(x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$, so that $w_i(x_i) = 1$:

  $\tilde{p}(y_i|x_i) = \frac{\sum_{j \neq i} w_i(x_j)\,\delta(y_j, y_i)}{\sum_{j \neq i} w_i(x_j)} = \frac{-1 + \sum_j w_i(x_j)\,\delta(y_j, y_i)}{-1 + \sum_j w_i(x_j)}$

- Leave-one-out likelihood of the training data:

  $l_{\mathrm{LOO}} = \sum_{i=1}^n \log \tilde{p}(y_i|x_i) = \sum_{i=1}^n \log \frac{-1 + \sum_j w_i(x_j)\,\delta(y_j, y_i)}{-1 + \sum_j w_i(x_j)}$

- Search for the optimal $\sigma^2$ by maximizing the leave-one-out likelihood (see the sketch after this section).
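A minimal sketch of the basic kNN estimate of p(y|x) above (assuming NumPy, Euclidean distance, and integer class labels; the function name is illustrative):

```python
import numpy as np

def knn_posterior(X_train, y_train, x, k, n_classes):
    """Estimate p(y|x) as the fraction of the k nearest neighbors of x with label y."""
    dists = np.linalg.norm(X_train - x, axis=1)    # distance to every training point
    neighbors = y_train[np.argsort(dists)[:k]]     # labels of the points in N(x), |N(x)| = k
    return np.bincount(neighbors, minlength=n_classes) / k
```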
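And a sketch of choosing the bandwidth by maximizing the leave-one-out log-likelihood $l_{\mathrm{LOO}}$ (again assuming NumPy; the toy two-class dataset and the candidate grid of sigma values are illustrative, not from the slides):

```python
import numpy as np

def loo_log_likelihood(X, y, sigma, eps=1e-12):
    """l_LOO for the weighted kNN model with w_i(x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))       # W[i, j] = w_i(x_j), so W[i, i] = 1
    same = (y[:, None] == y[None, :]).astype(float)  # delta(y_j, y_i)
    num = (W * same).sum(axis=1) - 1.0               # subtract the i = j self term
    den = W.sum(axis=1) - 1.0
    return np.sum(np.log(np.maximum(num, eps)) - np.log(np.maximum(den, eps)))

# Illustrative usage: grid-search sigma on a toy two-class dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) + np.repeat([[0.0, 0.0], [2.0, 2.0]], 50, axis=0)
y = np.repeat([0, 1], 50)
sigmas = np.logspace(-1, 1, 20)
best_sigma = max(sigmas, key=lambda s: loo_log_likelihood(X, y, s))
```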
Gaussian Generative Model

- $p(y|x) \propto p(x|y)\,p(y)$: posterior = likelihood × prior
- Estimate p(x|y) and p(y)
- Allocate a separate set of parameters for each class $\{\theta_1, \theta_2, \ldots, \theta_c\}$: $p(x|y;\theta) \to p(x;\theta_y)$
- Maximum likelihood estimation (a code sketch follows this section):

  $p(x|y) = \frac{1}{\sqrt{2\pi}\,\sigma_y}\exp\left(-\frac{(x-\mu_y)^2}{2\sigma_y^2}\right)$

  $l = \sum_{i=1}^N \log p(x_i|y_i)\,p_{y_i} = \sum_{i=1}^N \left(-\frac{(x_i-\mu_{y_i})^2}{2\sigma_{y_i}^2} - \frac{1}{2}\log\left(2\pi\sigma_{y_i}^2\right) + \log p_{y_i}\right)$

Gaussian Generative Model

- Difficult to estimate p(x|y) if x is of high dimensionality
- Naïve Bayes: $p(\vec{x}|y;\theta) \approx p(x^1|y;\theta)\,p(x^2|y;\theta)\cdots p(x^d|y;\theta)$, essentially a linear model
- How to make a Gaussian generative model discriminative? The $(\mu, \sigma)$ of each class are estimated only from the data belonging to that class, hence the lack of discriminative power.

Gaussian Generative Model

Maximum likelihood estimation of the conditional model:

$p(y|x) = \frac{p(x|y)\,p(y)}{\sum_{y'=1}^c p(x|y')\,p(y')} = \frac{p_y \frac{1}{\sqrt{2\pi}\,\sigma_y}\exp\left(-\frac{(x-\mu_y)^2}{2\sigma_y^2}\right)}{\sum_{y'=1}^c p_{y'}\frac{1}{\sqrt{2\pi}\,\sigma_{y'}}\exp\left(-\frac{(x-\mu_{y'})^2}{2\sigma_{y'}^2}\right)}$

$l = \sum_{i=1}^N \log p(y_i|x_i) = \sum_{i=1}^N \left[\log p_{y_i} - \frac{(x_i-\mu_{y_i})^2}{2\sigma_{y_i}^2} - \frac{1}{2}\log\left(2\pi\sigma_{y_i}^2\right) - \log \sum_{y'=1}^c \frac{p_{y'}}{\sqrt{2\pi}\,\sigma_{y'}}\exp\left(-\frac{(x_i-\mu_{y'})^2}{2\sigma_{y'}^2}\right)\right]$

How to optimize this objective function?

Gaussian Generative Model

Bound optimization algorithm. Let
- $\theta = \{p_1, \mu_1, \sigma_1, \ldots, p_c, \mu_c, \sigma_c\}$ be the parameters of the current iteration,
- $\theta' = \{p'_1, \mu'_1, \sigma'_1, \ldots, p'_c, \mu'_c, \sigma'_c\}$ be the parameters of the last iteration.

Then

$l(\theta) - l(\theta') = \sum_{i=1}^N \left\{ \log\frac{p_{y_i}}{p'_{y_i}} + \frac{1}{2}\log\frac{\sigma'^2_{y_i}}{\sigma^2_{y_i}} + \frac{(x_i-\mu'_{y_i})^2}{2\sigma'^2_{y_i}} - \frac{(x_i-\mu_{y_i})^2}{2\sigma^2_{y_i}} - \log\frac{\sum_{y'=1}^c \frac{p_{y'}}{\sqrt{2\pi}\,\sigma_{y'}}\exp\left(-\frac{(x_i-\mu_{y'})^2}{2\sigma^2_{y'}}\right)}{\sum_{y'=1}^c \frac{p'_{y'}}{\sqrt{2\pi}\,\sigma'_{y'}}\exp\left(-\frac{(x_i-\mu'_{y'})^2}{2\sigma'^2_{y'}}\right)} \right\}$

Using $\log x \leq x - 1$:

$l(\theta) - l(\theta') \geq \sum_{i=1}^N \left\{ \log\frac{p_{y_i}}{p'_{y_i}} + \frac{1}{2}\log\frac{\sigma'^2_{y_i}}{\sigma^2_{y_i}} + \frac{(x_i-\mu'_{y_i})^2}{2\sigma'^2_{y_i}} - \frac{(x_i-\mu_{y_i})^2}{2\sigma^2_{y_i}} + 1 - \frac{\sum_{y'=1}^c \frac{p_{y'}}{\sqrt{2\pi}\,\sigma_{y'}}\exp\left(-\frac{(x_i-\mu_{y'})^2}{2\sigma^2_{y'}}\right)}{\sum_{y'=1}^c \frac{p'_{y'}}{\sqrt{2\pi}\,\sigma'_{y'}}\exp\left(-\frac{(x_i-\mu'_{y'})^2}{2\sigma'^2_{y'}}\right)} \right\}$

We have decomposed the interaction of parameters between different classes. Question: how to handle x with multiple features? (See the Naïve Bayes sketch after this section.)
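As a concrete illustration of the maximum likelihood fit above, here is a minimal sketch for one-dimensional x (assuming NumPy and integer class labels; function names are illustrative, not from the slides):

```python
import numpy as np

def fit_gaussian_generative(x, y, n_classes):
    """ML estimates of class priors p_y, means mu_y, and std devs sigma_y,
    each computed only from the data belonging to that class."""
    p = np.array([np.mean(y == c) for c in range(n_classes)])
    mu = np.array([x[y == c].mean() for c in range(n_classes)])
    sigma = np.array([x[y == c].std() for c in range(n_classes)])  # ddof=0 is the MLE
    return p, mu, sigma

def gaussian_posterior(x0, p, mu, sigma):
    """p(y|x) = p_y N(x; mu_y, sigma_y^2) / sum_{y'} p_{y'} N(x; mu_{y'}, sigma_{y'}^2)."""
    lik = p / (np.sqrt(2 * np.pi) * sigma) * np.exp(-(x0 - mu) ** 2 / (2 * sigma ** 2))
    return lik / lik.sum()
```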
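For x with multiple features, the Naïve Bayes factorization above can be sketched as follows (again assuming NumPy; computing in log space is a standard numerical-stability choice, not something the slides prescribe):

```python
import numpy as np

def fit_naive_bayes(X, y, n_classes):
    """Class priors plus one Gaussian per class and per feature:
    p(x|y) ~ prod_d N(x^d; mu_{y,d}, var_{y,d})."""
    p = np.array([np.mean(y == c) for c in range(n_classes)])
    mu = np.array([X[y == c].mean(axis=0) for c in range(n_classes)])
    var = np.array([X[y == c].var(axis=0) for c in range(n_classes)])
    return p, mu, var

def nb_posterior(x, p, mu, var):
    """p(y|x) via Bayes' rule, evaluated in log space for stability."""
    log_post = np.log(p) - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=1)
    log_post -= log_post.max()      # shift before exponentiating to avoid underflow
    post = np.exp(log_post)
    return post / post.sum()
```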
Logistic Regression Model

- A linear decision boundary: $\vec{w}\cdot\vec{x} + b$

  $\vec{w}\cdot\vec{x} + b > 0 \Rightarrow \text{positive}, \qquad \vec{w}\cdot\vec{x} + b < 0 \Rightarrow \text{negative}$

- A probabilistic model p(y|x):

  $p(y = \pm 1\,|\,\vec{x}) = \frac{1}{1 + \exp\left(-y\,(\vec{w}\cdot\vec{x} + b)\right)}$

- Maximum likelihood approach for estimating the weights $\vec{w}$ and threshold $b$ (a gradient-ascent sketch follows):

  $l(D_{\mathrm{train}}) = \sum_{i=1}^{N_+} \log p(+|\vec{x}_i^{\,+}) + \sum_{i=1}^{N_-} \log p(-|\vec{x}_i^{\,-}) = \sum_{i=1}^{N_+} \log\frac{1}{1+\exp\left(-\vec{w}\cdot\vec{x}_i^{\,+} - b\right)} + \sum_{i=1}^{N_-} \log\frac{1}{1+\exp\left(\vec{w}\cdot\vec{x}_i^{\,-} + b\right)}$
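A minimal gradient-ascent sketch of this maximum likelihood estimation (assuming NumPy and labels $y_i \in \{-1, +1\}$; the learning rate and iteration count are arbitrary illustrative choices, and the slides do not prescribe a particular optimizer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Maximize sum_i log p(y_i|x_i) with p(y|x) = sigmoid(y (w.x + b))
    by plain gradient ascent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        margin = y * (X @ w + b)
        # Gradient of the log-likelihood: sum_i y_i x_i (1 - p(y_i|x_i)).
        coeff = y * (1.0 - sigmoid(margin))
        w += lr * (X.T @ coeff) / n
        b += lr * coeff.mean()
    return w, b
```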