MSU CSE 847 - Recap
Course: CSE 847

Contents:
- Review
- Comparison of Different Classification Models
- K Nearest Neighbor (kNN) Approach
- Weighted K Nearest Neighbor
- Gaussian Generative Model
- Logistic Regression Model
- Non-linear Logistic Regression Model
- Localized Logistic Regression Model
- Conditional Exponential Model
- Maximum Entropy Model
- Support Vector Machine
- Logistic Regression Model vs. Support Vector Machine
- Kernel Tricks
- Fisher Kernel
- Kernel Methods in Generative Model
- Multi-class SVM
- Error Correcting Output Code (ECOC)
- Ordinal Regression
- Decision Tree
- Generalized Decision Tree

Review
Rong Jin

Comparison of Different Classification Models
- The goal of all classifiers: predict the class label y for an input x.
- Equivalently: estimate p(y|x).

K Nearest Neighbor (kNN) Approach
- Illustrated in the slides for k = 1 and k = 4.
- Probability interpretation: estimate p(y|x) as
  $p(y \mid x) = \frac{\left|\{(x_i, y_i) \mid y_i = y,\ x_i \in N(x)\}\right|}{|N(x)|}$
  where N(x) is the neighborhood around x.
- What is the appropriate size for the neighborhood N(x)? Use a leave-one-out approach.

Weighted K Nearest Neighbor
- A neighbor is defined through a weight function:
  $w_i(x) = \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)$
- Estimate p(y|x) as the weight-normalized vote
  $p(y \mid x) = \frac{\sum_i w_i(x)\,\delta(y, y_i)}{\sum_i w_i(x)}$
- How do we estimate an appropriate value for σ²? Leave one out + maximum likelihood:
  - Estimate the leave-one-out probability of each training point (using $w_i(x_i) = 1$ and $\delta(y_i, y_i) = 1$):
    $\tilde{p}(y_i \mid x_i) = \frac{\sum_{j \ne i} w_j(x_i)\,\delta(y_j, y_i)}{\sum_{j \ne i} w_j(x_i)} = \frac{\sum_j w_j(x_i)\,\delta(y_j, y_i) - 1}{\sum_j w_j(x_i) - 1}$
  - Form the leave-one-out log-likelihood of the training data:
    $l_{\mathrm{LOO}} = \sum_{i=1}^{n} \log \tilde{p}(y_i \mid x_i) = \sum_{i=1}^{n} \log \frac{\sum_j w_j(x_i)\,\delta(y_j, y_i) - 1}{\sum_j w_j(x_i) - 1}$
  - Search for the optimal σ² by maximizing the leave-one-out log-likelihood (see the sketch below).
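A minimal Python sketch of the weighted kNN estimator and the leave-one-out selection of σ described above. The function names and the grid of candidate σ values are illustrative choices, not from the slides:

```python
import numpy as np

def gaussian_weights(x, X, sigma):
    """w_i(x) = exp(-||x - x_i||^2 / (2 sigma^2)) for every training point x_i."""
    d2 = np.sum((X - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def weighted_knn_posterior(x, X, y, sigma, labels):
    """Estimate p(y|x) = sum_i w_i(x) delta(y, y_i) / sum_i w_i(x)."""
    w = gaussian_weights(x, X, sigma)
    return {c: w[y == c].sum() / w.sum() for c in labels}

def loo_log_likelihood(X, y, sigma):
    """Leave-one-out log-likelihood l_LOO(sigma) of the training data."""
    ll = 0.0
    for i in range(len(X)):
        w = gaussian_weights(X[i], X, sigma)
        # Leave x_i out: w_i(x_i) = exp(0) = 1 and delta(y_i, y_i) = 1,
        # so subtract 1 from both the numerator and the denominator sums.
        num = w[y == y[i]].sum() - 1.0
        den = w.sum() - 1.0
        ll += np.log(max(num, 1e-12) / max(den, 1e-12))
    return ll

# Pick sigma by maximizing the LOO log-likelihood over a coarse grid.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2)); y = (X[:, 0] > 0).astype(int)
best = max((loo_log_likelihood(X, y, s), s) for s in [0.1, 0.3, 1.0, 3.0])
print("best sigma:", best[1])
print("p(y|x=0):", weighted_knn_posterior(np.zeros(2), X, y, best[1], labels=[0, 1]))
```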
Gaussian Generative Model
- p(y|x) ∝ p(x|y) p(y): posterior ∝ likelihood × prior.
- Estimate p(x|y) and p(y) from the training data.
- Allocate a separate set of parameters for each class, θ = {θ_1, θ_2, ..., θ_c}, so that p(x|y; θ) = p(x; θ_y).
- Maximum likelihood estimation with a Gaussian class-conditional density:
  $p(x \mid y) = \frac{1}{\sqrt{2\pi}\,\sigma_y} \exp\left(-\frac{(x - \mu_y)^2}{2\sigma_y^2}\right)$
  $l = \sum_{i=1}^{N} \log\left(p(x_i \mid y_i)\,p_{y_i}\right) = \sum_{i=1}^{N} \left(-\frac{(x_i - \mu_{y_i})^2}{2\sigma_{y_i}^2} - \log\left(\sqrt{2\pi}\,\sigma_{y_i}\right) + \log p_{y_i}\right)$
- It is difficult to estimate p(x|y) when x is of high dimensionality. Naïve Bayes treats the features as conditionally independent given the class:
  $p(\vec{x} \mid y; \theta) \approx p(x_1 \mid y; \theta)\,p(x_2 \mid y; \theta) \cdots p(x_d \mid y; \theta)$
  This is essentially a linear model. (A fitting sketch appears at the end of this recap.)
- How can a Gaussian generative model be made discriminative? The (μ_y, σ_y) of each class are estimated only from the data belonging to that class, hence the lack of discriminative power.
- Discriminative training: maximize the likelihood of the posterior instead.
  $p(y \mid x) = \frac{p(x \mid y)\,p(y)}{\sum_{y'=1}^{c} p(x \mid y')\,p(y')} = \frac{\frac{p_y}{\sqrt{2\pi}\,\sigma_y} \exp\left(-\frac{(x - \mu_y)^2}{2\sigma_y^2}\right)}{\sum_{y'=1}^{c} \frac{p_{y'}}{\sqrt{2\pi}\,\sigma_{y'}} \exp\left(-\frac{(x - \mu_{y'})^2}{2\sigma_{y'}^2}\right)}$
  $l = \sum_{i=1}^{N} \log p(y_i \mid x_i) = \sum_{i=1}^{N} \left[-\frac{(x_i - \mu_{y_i})^2}{2\sigma_{y_i}^2} + \log \frac{p_{y_i}}{\sqrt{2\pi}\,\sigma_{y_i}} - \log \sum_{y'=1}^{c} \frac{p_{y'}}{\sqrt{2\pi}\,\sigma_{y'}} \exp\left(-\frac{(x_i - \mu_{y'})^2}{2\sigma_{y'}^2}\right)\right]$
- How do we optimize this objective function? Bound optimization algorithm:
  - θ = {p_1, μ_1, σ_1, ..., p_c, μ_c, σ_c}: parameters of the current iteration; θ' = {p'_1, μ'_1, σ'_1, ..., p'_c, μ'_c, σ'_c}: parameters of the last iteration.
  - Lower-bound the improvement l(θ) − l(θ') using $\log x \le x - 1$; the resulting bound decomposes the interaction of parameters between different classes, so each class can be updated separately. (A reconstruction of this step appears at the end of this recap.)
  - Question: how do we handle x with multiple features?

Logistic Regression Model
- A linear decision boundary w·x + b:
  $\vec{w} \cdot \vec{x} + b > 0 \;\Rightarrow\; \text{positive}, \qquad \vec{w} \cdot \vec{x} + b < 0 \;\Rightarrow\; \text{negative}$
- A probabilistic model for p(y|x):
  $p(y = \pm 1 \mid x) = \frac{1}{1 + \exp\left(-y\,(\vec{w} \cdot \vec{x} + b)\right)}$
- Maximum likelihood approach for estimating the weights w and the threshold b (see the sketch below):
  $l(D_{\mathrm{train}}) = \sum_{i=1}^{N_+} \log p(+ \mid x_i^+) + \sum_{i=1}^{N_-} \log p(- \mid x_i^-) = \sum_{i=1}^{N_+} \log \frac{1}{1 + \exp\left(-(\vec{w} \cdot \vec{x}_i^+ + b)\right)} + \sum_{i=1}^{N_-} \log \frac{1}{1 + \exp\left(\vec{w} \cdot \vec{x}_i^- + b\right)}$
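A minimal gradient-ascent sketch of this maximum likelihood estimation, assuming labels y_i ∈ {−1, +1}; the learning rate and iteration count are arbitrary illustrative choices, not from the slides:

```python
import numpy as np

def log_likelihood(w, b, X, y):
    """l(D) = sum_i log p(y_i|x_i), with p(y|x) = 1 / (1 + exp(-y (w.x + b)))."""
    margins = y * (X @ w + b)
    return -np.sum(np.log1p(np.exp(-margins)))

def fit_logistic(X, y, lr=0.1, iters=500):
    """Maximize l(D) over (w, b) by plain gradient ascent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        # dl/dw = sum_i y_i x_i (1 - p(y_i|x_i)); dl/db = sum_i y_i (1 - p(y_i|x_i)).
        p = 1.0 / (1.0 + np.exp(-y * (X @ w + b)))
        g = y * (1.0 - p)
        w += lr * X.T @ g / len(y)
        b += lr * g.mean()
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)
w, b = fit_logistic(X, y)
print("w =", w, "b =", b, "log-likelihood:", log_likelihood(w, b, X, y))
```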

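Returning to the Gaussian generative model above, the sketch below shows the standard per-class maximum likelihood fit with the Naïve Bayes factorization over features. The function names and the small synthetic data set are illustrative assumptions:

```python
import numpy as np

def fit_gaussian_generative(X, y):
    """Per-class MLE: prior p_y, mean mu_y, std sigma_y for each feature (Naive Bayes)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),       # prior p_y
                     Xc.mean(axis=0),        # mu_y, one entry per feature
                     Xc.std(axis=0) + 1e-9)  # sigma_y, one entry per feature
    return params

def posterior(x, params):
    """p(y|x) ∝ p_y * prod_d N(x_d; mu_{y,d}, sigma_{y,d}^2), normalized over classes."""
    scores = {}
    for c, (p, mu, sigma) in params.items():
        log_lik = np.sum(-0.5 * ((x - mu) / sigma) ** 2
                         - np.log(np.sqrt(2 * np.pi) * sigma))
        scores[c] = np.log(p) + log_lik
    m = max(scores.values())  # log-sum-exp trick for numerical stability
    z = sum(np.exp(s - m) for s in scores.values())
    return {c: np.exp(s - m) / z for c, s in scores.items()}

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
params = fit_gaussian_generative(X, y)
print(posterior(np.array([0.5, 0.5]), params))
```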

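The bound-optimization step for the discriminative Gaussian objective can be reconstructed as follows; the shorthand $f_y(x;\theta) = \frac{p_y}{\sqrt{2\pi}\,\sigma_y} \exp\left(-\frac{(x - \mu_y)^2}{2\sigma_y^2}\right)$ is my notation, not the slides':

```latex
\begin{align*}
l(\theta) - l(\theta')
  &= \sum_{i=1}^{N}\left[\log\frac{f_{y_i}(x_i;\theta)}{f_{y_i}(x_i;\theta')}
     - \log\frac{\sum_{y'=1}^{c} f_{y'}(x_i;\theta)}{\sum_{y'=1}^{c} f_{y'}(x_i;\theta')}\right] \\
  &\ge \sum_{i=1}^{N}\left[\log\frac{f_{y_i}(x_i;\theta)}{f_{y_i}(x_i;\theta')}
     + 1 - \frac{\sum_{y'=1}^{c} f_{y'}(x_i;\theta)}{\sum_{y'=1}^{c} f_{y'}(x_i;\theta')}\right]
     && (\text{using } \log x \le x - 1)
\end{align*}
```

In the last line the denominator sums are constants from the last iteration, so each class's parameters (p_y, μ_y, σ_y) appear only in their own additive terms and the bound can be maximized one class at a time; this is the decomposition the slides point to.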