STEVENS CS 559 - Machine Learning Fundamentals and Applications 3rd Set of Notes

CS 559: Machine Learning Fundamentals and Applications
3rd Set of Notes

Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: Philippos.Mordohai@stevens.edu
Office: Lieb 215

Overview
• DHS Ch. 2 (cont.): Discriminant Functions for the Normal Density
• DHS Ch. 3: Maximum-Likelihood & Bayesian Parameter Estimation

DHS Chapter 2: Bayesian Decision Theory (Sections 2-6, 2-9)
• Discriminant Functions for the Normal Density
• Bayes Decision Theory – Discrete Features

Discriminant Functions for the Normal Density

• We saw that minimum error-rate classification can be achieved by the discriminant function

    g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)

• Case of the multivariate normal:

    g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1}(x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

• Case Σ_i = σ²I (I stands for the identity matrix):

    g_i(x) = w_i^t x + w_{i0}    (linear discriminant function)

  where

    w_i = \frac{\mu_i}{\sigma^2}; \qquad w_{i0} = -\frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln P(\omega_i)

  (w_{i0} is called the threshold for the i-th category). Prove it!

Linear Machines
• A classifier that uses linear discriminant functions is called "a linear machine".
• The decision surfaces for a linear machine are pieces of hyperplanes defined by g_i(x) = g_j(x).
• For the case Σ_i = σ²I, the hyperplane separating R_i and R_j (the decision boundary g_i(x) = g_j(x)) can be written as w^t(x - x_0) = 0, where

    w = \mu_i - \mu_j
    x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln\frac{P(\omega_i)}{P(\omega_j)}\,(\mu_i - \mu_j)

  This hyperplane is always orthogonal to the line linking the means; if P(ω_i) = P(ω_j), then x_0 = (μ_i + μ_j)/2.

• Case Σ_i = Σ (covariances of all classes are identical but arbitrary!)
  The discriminant is again linear, with w_i = Σ^{-1}μ_i, and the hyperplane separating R_i and R_j passes through

    x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)}\,(\mu_i - \mu_j)

  (The hyperplane separating R_i and R_j is generally not orthogonal to the line between the means.)

• Case Σ_i = arbitrary
  The covariance matrices are different for each category:

    g_i(x) = x^t W_i x + w_i^t x + w_{i0}

  where

    W_i = -\frac{1}{2}\Sigma_i^{-1}; \qquad w_i = \Sigma_i^{-1}\mu_i; \qquad w_{i0} = -\frac{1}{2}\mu_i^t \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

  The decision surfaces are hyperquadrics: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids.

Bayes Decision Theory – Discrete Features
• Components of x are binary or integer valued; x can take only one of m discrete values v_1, v_2, …, v_m.
• Case of independent binary features in a 2-category problem: let x = [x_1, x_2, …, x_d]^t, where each x_i is either 0 or 1, with probabilities

    p_i = P(x_i = 1 \mid \omega_1)
    q_i = P(x_i = 1 \mid \omega_2)

  Then

    P(x \mid \omega_1) = \prod_{i=1}^{d} p_i^{x_i}(1 - p_i)^{1 - x_i}
    P(x \mid \omega_2) = \prod_{i=1}^{d} q_i^{x_i}(1 - q_i)^{1 - x_i}

  Likelihood ratio:

    \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} = \prod_{i=1}^{d} \left(\frac{p_i}{q_i}\right)^{x_i} \left(\frac{1 - p_i}{1 - q_i}\right)^{1 - x_i}

  Discriminant function:

    g(x) = \sum_{i=1}^{d} \left[x_i \ln\frac{p_i}{q_i} + (1 - x_i)\ln\frac{1 - p_i}{1 - q_i}\right] + \ln\frac{P(\omega_1)}{P(\omega_2)}

  which is linear in x:

    g(x) = \sum_{i=1}^{d} w_i x_i + w_0

  where

    w_i = \ln\frac{p_i(1 - q_i)}{q_i(1 - p_i)}, \quad i = 1, \ldots, d
    w_0 = \sum_{i=1}^{d} \ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}

  Decide ω_1 if g(x) > 0 and ω_2 if g(x) ≤ 0.
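To make the Gaussian cases above concrete, here is a minimal NumPy sketch (not part of the original slides; the function names are illustrative, and the class parameters μ_i, Σ_i, P(ω_i) are assumed known rather than estimated). It implements the general quadratic discriminant, which covers the Σ_i = σ²I and Σ_i = Σ cases as special instances:

```python
import numpy as np

def gaussian_discriminant(x, mu, sigma, prior):
    """General-case Gaussian discriminant for one class:
    g_i(x) = -1/2 (x - mu_i)^t Sigma_i^{-1} (x - mu_i) - 1/2 ln|Sigma_i| + ln P(omega_i).
    The -(d/2) ln(2 pi) term is dropped since it is common to all classes."""
    diff = x - mu
    sign, logdet = np.linalg.slogdet(sigma)  # numerically safer than log(det(sigma))
    return -0.5 * diff @ np.linalg.inv(sigma) @ diff - 0.5 * logdet + np.log(prior)

def classify(x, mus, sigmas, priors):
    """Decide the class whose discriminant g_i(x) is largest."""
    return int(np.argmax([gaussian_discriminant(x, m, S, P)
                          for m, S, P in zip(mus, sigmas, priors)]))
```

With equal spherical covariances (Σ_i = σ²I), the boundaries induced by `classify` are exactly the hyperplanes orthogonal to μ_i − μ_j described above. In the same spirit, a sketch of the independent-binary-feature discriminant just derived; `p` and `q` are the vectors of per-feature probabilities p_i and q_i (assumed given, with 0 < p_i, q_i < 1):

```python
import numpy as np

def binary_discriminant(x, p, q, prior1, prior2):
    """g(x) = sum_i w_i x_i + w_0 with
    w_i = ln[p_i (1 - q_i) / (q_i (1 - p_i))],
    w_0 = sum_i ln[(1 - p_i) / (1 - q_i)] + ln[P(omega_1) / P(omega_2)].
    Decide omega_1 if g(x) > 0, else omega_2."""
    w = np.log(p * (1 - q) / (q * (1 - p)))
    w0 = np.log((1 - p) / (1 - q)).sum() + np.log(prior1 / prior2)
    return float(x @ w + w0)
```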
Exercise: DHS Problem 2.12

Let ω_max(x) be the state of nature for which P(ω_max | x) ≥ P(ω_i | x) for all i = 1, …, c.
• Show that P(ω_max | x) ≥ 1/c.
• Show that for the minimum-error-rate decision rule the average probability of error is given by

    P(\text{error}) = 1 - \int P(\omega_{\max} \mid x)\, p(x)\, dx

• Use these two results to show that P(error) ≤ (c − 1)/c.
• Describe a situation for which P(error) = (c − 1)/c.

Discriminant Functions for the Normal Density (recap from slide #5)
• Case Σ_i = σ²I:

    g_i(x) = w_i^t x + w_{i0}, \quad \text{where } w_i = \frac{\mu_i}{\sigma^2}; \quad w_{i0} = -\frac{\mu_i^t \mu_i}{2\sigma^2} + \ln P(\omega_i)

Discriminant Function Examples
[Four worked example slides courtesy of O. Veksler; the figures are not recoverable from this preview.]

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Introduction
• Data availability in a Bayesian framework
  – We could design an optimal classifier if we knew:
    • P(ω_i) (priors)
    • p(x | ω_i) (class-conditional densities)
  – Unfortunately, we rarely have this complete information!
• Designing a classifier from a training sample
  – No problem with prior estimation
  – Samples are often too small for class-conditional estimation (large dimension of feature space)

Parameter Estimation
• Use a priori information about the problem, e.g. normality of p(x | ω_i):

    p(x \mid \omega_i) \sim N(\mu_i, \Sigma_i)

• This simplifies the problem from estimating an unknown distribution function to estimating its parameters.

Estimation Techniques
• Maximum-Likelihood (ML) and Bayesian estimation
• Results are often identical, but the approaches are fundamentally different.
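Although the preview cuts off here, the direction of Chapter 3 can be illustrated with one standard result: under the normality assumption p(x | ω_i) ~ N(μ_i, Σ_i), the maximum-likelihood estimates of the parameters are the sample mean and the (biased, 1/n) sample covariance. A minimal sketch, assuming `samples` is an (n, d) array of i.i.d. draws from one class (the function name is mine, not from the notes):

```python
import numpy as np

def ml_gaussian(samples):
    """Maximum-likelihood estimates for a multivariate Gaussian:
    mu_hat    = (1/n) sum_k x_k
    Sigma_hat = (1/n) sum_k (x_k - mu_hat)(x_k - mu_hat)^t
    Note the 1/n (not 1/(n-1)) factor: the ML covariance estimate is biased."""
    n = samples.shape[0]
    mu_hat = samples.mean(axis=0)
    centered = samples - mu_hat
    return mu_hat, centered.T @ centered / n
```

These plug-in estimates are what feed the discriminant functions of Chapter 2 when the true class parameters are unknown.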

