STEVENS CS 559 - CS 559 4th Set of Notes

CS 559: Machine Learning Fundamentals and Applications
4th Set of Notes

Instructor: Philippos Mordohai
Webpage: www.cs.stevens.edu/~mordohai
E-mail: Philippos.Mordohai@stevens.edu
Office: Lieb 215

Pop-up Quiz 3

Overview
• DHS Ch. 3: Maximum-Likelihood and Bayesian Parameter Estimation (recap)
• Expectation Maximization (EM)
  – Covered in DHS, but will be presented differently here

Parameter Estimation
• Bayesian decision theory: we could design an optimal classifier if we knew
  – the priors $p(\omega_i)$
  – the class-conditional densities $p(x \mid \omega_i)$
• Simplify the problem: instead of estimating an unknown distribution function, estimate its parameters
  – Assume a specific form for the distributions (e.g., Gaussian)

Parameter Estimation (cont.)
• In ML estimation, the parameters are fixed but unknown; the best parameters are those that maximize the probability of obtaining the samples we observed
• Bayesian methods view the parameters as random variables having some known prior distribution
• In either approach, we use $p(\omega_i \mid x)$ for our classification rule

Likelihood
• Use a set of independent samples to estimate $p(D \mid \theta)$
• Let $D = \{x_1, x_2, \dots, x_n\}$; by independence,
  $p(x_1, \dots, x_n \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)$, with $|D| = n$
• Our goal is to determine $\hat{\theta}$, the value of $\theta$ that best agrees with the observed training data
• Note: if $D$ is fixed, $p(D \mid \theta)$ is not a density as a function of $\theta$

Likelihood (cont.)
• Problem: find $\hat{\theta}$ such that
  $\max_{\theta} p(D \mid \theta) = \max_{\theta} p(x_1, \dots, x_n \mid \theta) = \max_{\theta} \prod_{k=1}^{n} p(x_k \mid \theta)$
• Use the information provided by the training samples to estimate $\theta = (\theta_1, \theta_2, \dots, \theta_c)$, where each $\theta_i$ ($i = 1, 2, \dots, c$) is associated with one category
• Suppose $D$ contains $n$ samples $x_1, x_2, \dots, x_n$; then
  $p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)$
• $p(D \mid \theta)$ is called the likelihood of $\theta$ w.r.t. the set of samples
• The ML estimate of $\theta$ is, by definition, the value $\hat{\theta}$ that maximizes $p(D \mid \theta)$: "It is the value of $\theta$ that best agrees with the actually observed training sample."

Optimal estimation
• Let $\theta = (\theta_1, \theta_2, \dots, \theta_p)^t$ and let $\nabla_{\theta} = \left[ \partial/\partial\theta_1, \dots, \partial/\partial\theta_p \right]^t$ be the gradient operator
• We define $l(\theta)$ as the log-likelihood function: $l(\theta) = \ln p(D \mid \theta)$
• New problem statement: determine the $\theta$ that maximizes the log-likelihood,
  $\hat{\theta} = \arg\max_{\theta} l(\theta)$

Example of ML estimation: unknown μ
• $p(x_k \mid \mu) \sim N(\mu, \Sigma)$ (samples are drawn from a multivariate normal population):
  $\ln p(x_k \mid \mu) = -\tfrac{1}{2} \ln\left[(2\pi)^d |\Sigma|\right] - \tfrac{1}{2}(x_k - \mu)^t \Sigma^{-1} (x_k - \mu)$
  $\nabla_{\mu} \ln p(x_k \mid \mu) = \Sigma^{-1}(x_k - \mu)$
• Here $\theta = \mu$, so the ML estimate for $\mu$ must satisfy
  $\sum_{k=1}^{n} \Sigma^{-1}(x_k - \hat{\mu}) = 0$
• Multiplying by $\Sigma$ and rearranging, we obtain
  $\hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k$
  i.e., just the arithmetic average of the training samples!
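To make the result concrete, here is a minimal runnable sketch (not from the notes; NumPy and all constants below are my own illustrative choices). It draws samples from a multivariate normal with known $\Sigma$, takes the sample mean as $\hat{\mu}$, and checks numerically that perturbing $\hat{\mu}$ only lowers the log-likelihood $l(\mu)$:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])                 # known covariance (illustrative)
mu_true = np.array([1.0, -2.0])
X = rng.multivariate_normal(mu_true, Sigma, size=500)  # D = {x_1, ..., x_n}

def log_likelihood(mu, X, Sigma):
    """l(mu) = sum_k ln p(x_k | mu) for x_k ~ N(mu, Sigma)."""
    d = X.shape[1]
    Sinv = np.linalg.inv(Sigma)
    diffs = X - mu
    quad = np.einsum('ki,ij,kj->k', diffs, Sinv, diffs)  # (x_k-mu)^t Sinv (x_k-mu)
    return -0.5 * np.sum(d * np.log(2 * np.pi)
                         + np.log(np.linalg.det(Sigma)) + quad)

mu_hat = X.mean(axis=0)   # the ML estimate: arithmetic average of the samples
# Any perturbation of mu_hat should lower l(mu):
for delta in (0.1, -0.1):
    assert log_likelihood(mu_hat, X, Sigma) > log_likelihood(mu_hat + delta, X, Sigma)
print("mu_hat =", mu_hat, "vs. mu_true =", mu_true)
```

The assertion holds because $l(\mu)$ is concave in $\mu$, with its unique maximum at the sample mean.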
Bayesian Estimation
• Suppose we have some idea of the range where the parameters $\theta$ should be
  – Shouldn't we formalize this prior knowledge, in the hope that it will lead to better parameter estimation?
• Let $\theta$ be a random variable with prior distribution $P(\theta)$
  – This is the key difference between ML and Bayesian parameter estimation
  – This key assumption allows us to fully exploit the information provided by the data

Motivation
• As in MLE, suppose $p(x \mid \theta)$ is completely specified if $\theta$ is given
• But now $\theta$ is a random variable with prior $p(\theta)$
  – Unlike the MLE case, $p(x \mid \theta)$ is a conditional density
• After we observe the data $D$, we can use Bayes' rule to compute the posterior $p(\theta \mid D)$

Motivation (cont.)
• Recall that for the MAP classifier we find the class $\omega_i$ that maximizes the posterior $p(\omega_i \mid D)$
• By analogy, a reasonable estimate of $\theta$ is the one that maximizes the posterior $p(\theta \mid D)$
• But $\theta$ is not our final goal; our final goal is the unknown $p(x)$
• Therefore a better thing to do is to maximize $p(x \mid D)$; this is as close as we can come to the unknown $p(x)$!

Estimation of p(x|D)
• From the definition of the joint distribution:
  $p(x \mid D) = \int p(x, \theta \mid D)\, d\theta$
• Using the definition of conditional probability:
  $p(x, \theta \mid D) = p(x \mid \theta, D)\, p(\theta \mid D)$
• But $p(x \mid \theta, D) = p(x \mid \theta)$, since $p(x \mid \theta)$ is completely specified by $\theta$; therefore
  $p(x \mid D) = \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta$

Estimation of p(x|D) (cont.)
• Using Bayes' formula:
  $p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta)\, p(\theta)\, d\theta}$
• So $p(x \mid D)$ can be computed

Bayesian Estimation vs. MLE
• In practice, it may be hard to do the integration analytically, and we may have to resort to numerical methods
• Contrast this with the MLE solution, which requires differentiation of the likelihood to get $\hat{\theta}$
  – Differentiation is easy and can always be done analytically

BE vs. MLE
• BE: $p(x \mid D)$ can be thought of as a weighted average of the proposed model over all possible values of $\theta$
• Contrast this with the MLE solution, which always gives us a single model: $p(x \mid \hat{\theta})$
• When we have many possible solutions, taking their sum weighted by their probabilities seems better than spitting out a single solution

Bayesian Parameter Estimation: Gaussian Case
• Goal: estimate $\mu$ using …
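A minimal sketch of the standard closed-form result for this case, under the usual assumptions (scalar samples $x \sim N(\mu, \sigma^2)$ with $\sigma^2$ known, and a Gaussian prior $\mu \sim N(\mu_0, \sigma_0^2)$); this specific setup and the toy numbers are my assumptions for illustration, not taken from the notes:

```python
import numpy as np

def posterior_mu(D, sigma, mu0, sigma0):
    """Posterior p(mu|D) = N(mu_n, sigma_n^2) for the conjugate Gaussian case
    (x ~ N(mu, sigma^2) with sigma known; prior mu ~ N(mu0, sigma0^2))."""
    n = len(D)
    mu_ml = float(np.mean(D))        # the ML estimate (sample mean) from above
    sigma_n2 = (sigma0**2 * sigma**2) / (n * sigma0**2 + sigma**2)
    # Posterior mean: precision-weighted blend of sample mean and prior mean;
    # as n grows, mu_n -> mu_ml and the prior washes out.
    mu_n = (n * sigma0**2 * mu_ml + sigma**2 * mu0) / (n * sigma0**2 + sigma**2)
    return mu_n, np.sqrt(sigma_n2)

D = np.array([0.5, 1.2, 0.9, 1.1, 0.7])   # toy sample (illustrative)
mu_n, sigma_n = posterior_mu(D, sigma=1.0, mu0=0.0, sigma0=2.0)
print(f"p(mu|D) = N({mu_n:.3f}, {sigma_n:.3f}^2)")
```

Note how the posterior mean $\mu_n$ blends the prior mean and the sample mean according to their precisions; with more data the sample mean dominates.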

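To make the BE vs. MLE contrast concrete, a second hedged sketch computes the predictive density $p(x \mid D) = \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta$ by brute-force integration over a grid of $\theta$ values and compares it with the single MLE plug-in model $p(x \mid \hat{\theta})$. The prior and the toy data are again illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0                          # known standard deviation of x given mu
mu0, sigma0 = 0.0, 2.0               # assumed prior: mu ~ N(mu0, sigma0^2)
D = rng.normal(0.8, sigma, size=5)   # small sample, so the prior matters

def gauss(x, m, s):
    """Density of N(m, s^2) evaluated at x (vectorized)."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# Discretize theta (here theta = mu) and form the posterior
# p(theta|D) ~ p(D|theta) p(theta), normalized by Bayes' denominator.
theta = np.linspace(-5.0, 5.0, 2001)
dtheta = theta[1] - theta[0]
likelihood = np.prod([gauss(xk, theta, sigma) for xk in D], axis=0)
post = likelihood * gauss(theta, mu0, sigma0)
post /= (post * dtheta).sum()

# Predictive density p(x|D) = integral of p(x|theta) p(theta|D) dtheta:
# a weighted average over ALL theta, vs. the single MLE model p(x|theta_hat).
x = np.linspace(-4.0, 4.0, 401)
p_bayes = (gauss(x[:, None], theta[None, :], sigma)
           * post[None, :] * dtheta).sum(axis=1)
mu_mle = D.mean()                    # MLE plug-in model
p_mle = gauss(x, mu_mle, sigma)
print("MLE mean:", mu_mle, "| posterior mean:", (theta * post * dtheta).sum())
```

With only five samples, $p(x \mid D)$ comes out slightly wider than the plug-in density: the Bayesian average accounts for the remaining uncertainty about $\theta$, which the single MLE model ignores.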
