Rutgers University CS 536 - Introduction to Machine Learning

INTRODUCTIONTOMachineLearningETHEM ALPAYDIN© The MIT Press, [email protected]://www.cmpe.boun.edu.tr/~ethem/i2mlLecture Slides forCHAPTER4:ParametricMethodsLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)3Parametric Estimation X = { xt }t where xt ~ p (x) Parametric estimation: Assume a form for p (x | θ) and estimate θ,its sufficient statistics, using Xe.g., N ( µ, σ2) where θ = { µ, σ2}Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)4Maximum Likelihood Estimation Likelihood of θ given the sample Xl (θ|X) = p (X |θ) = ∏tp (xt|θ) Log likelihoodL(θ|X) = log l (θ|X) = ∑tlog p (xt|θ) Maximum likelihood estimator (MLE)θ*= argmaxθL(θ|X)Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)5Examples: Bernoulli/Multinomial Bernoulli: Two states, failure/success, x in {0,1} P (x) = pox(1 – po )(1 – x)L (po|X) = log ∏tpoxt(1 – po )(1 – xt) MLE: po = ∑txt / N Multinomial: K>2 states, xiin {0,1}P (x1,x2,...,xK) = ∏ipixiL(p1,p2,...,pK|X) = log ∏t ∏ipixitMLE: pi = ∑txit / NLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)6Gaussian (Normal) Distribution p(x) = N ( µ, σ2) MLE for µ and σ2:µσLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)7Bias and VarianceUnknown parameter θEstimator di= d (Xi) on sample XiBias: bθ(d) = E [d] – θVariance: E [(d–E [d])2]Mean square error: r (d,θ) = E [(d–θ)2]= (E [d] – θ)2+ E [(d–E [d])2]= Bias2+ VarianceLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)8Bayes’ Estimator Treat θ as a random var with prior p (θ) Bayes’ rule: p (θ|X) = p(X|θ) p(θ) / p(X)  Full: p(x|X) = ∫ p(x|θ) p(θ|X) dθ Maximum a Posteriori (MAP): θMAP= argmaxθp(θ|X) Maximum Likelihood (ML): θML= argmaxθp(X|θ) Bayes’: θBayes’= E[θ|X] = ∫ θ p(θ|X) dθLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)9Bayes’ Estimator: Example xt ~ N (θ, σo2) and θ ~ N ( µ, σ2) θML= m θMAP= θBayes’=Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)10Parametric ClassificationLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)11 Given the sample ML estimates are Discriminant becomesLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)12Equal variancesSingle boundary athalfway between meansLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)13Variances are differentTwo boundariesLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)14RegressionLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)15Regression: From LogL to ErrorLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)16Linear RegressionLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)17Polynomial RegressionLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)18Other Error Measures Square Error:  Relative Square Error: Absolute Error: E (θ|X) = ∑t|rt-– g(xt|θ)| ε-sensitive Error: E (θ|X) = ∑t1(|rt-– g(xt|θ)|>ε) (|rt – g(xt|θ)| – ε)Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)19Bias and VarianceLecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)20Estimating Bias 
Slide 20 - Estimating Bias and Variance
- M samples X_i are used to fit g_i(x), i = 1, ..., M
- Average fit: ḡ(x) = (1/M) ∑_i g_i(x)
- Estimated bias²: (1/N) ∑_t [ḡ(x^t) − f(x^t)]²
- Estimated variance: (1/(N·M)) ∑_t ∑_i [g_i(x^t) − ḡ(x^t)]²

Slide 21 - Bias/Variance Dilemma
- Example: a constant fit has no variance but high bias; fitting the sample average has lower bias, with variance.
- As we increase complexity, bias decreases (a better fit to the data) and variance increases (the fit varies more with the data).
- Bias/variance dilemma (Geman et al., 1992).

Slide 22
- [Figure: bias and variance of the individual fits g_i around their average ḡ, relative to the true function f.]

Slide 23 - Polynomial Regression: best fit at "min error"

Slide 24 - Best fit at the "elbow"

Slide 25 - Model Selection
- Cross-validation: measure generalization accuracy by testing on data unused during training.
- Regularization: penalize complex models; Akaike's information criterion (AIC), Bayesian information criterion (BIC).
- Minimum description length (MDL): Kolmogorov complexity, the shortest description of the data.
- Structural risk minimization (SRM).

Slide 26 - Bayesian Model Selection
- Prior on models, p(model).
- Regularization, when the prior favors simpler models.
- Bayes: MAP of the posterior, p(model | data).
- Average over a number of models with high posterior (voting, ensembles: Chapter ...).
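A small simulation makes the "Estimating Bias and Variance" procedure and the bias/variance dilemma concrete. This is a hedged sketch under assumed settings (a sine target, 25 noisy points per sample, M = 100 samples, polynomial fits of degree 1, 3, and 9), not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                        # hypothetical "true" function to be learned
    return np.sin(2 * np.pi * x)

def sample(n=25, noise=0.3):     # draw one noisy training set X_i
    x = rng.uniform(0, 1, n)
    r = f(x) + rng.normal(0, noise, n)
    return x, r

M = 100                          # number of independent samples used to fit g_i
x_grid = np.linspace(0, 1, 200)

for degree in (1, 3, 9):
    preds = np.empty((M, x_grid.size))
    for i in range(M):
        x, r = sample()
        coefs = np.polyfit(x, r, degree)        # least-squares polynomial fit g_i
        preds[i] = np.polyval(coefs, x_grid)
    g_bar = preds.mean(axis=0)                  # average fit over the M samples
    bias2 = np.mean((g_bar - f(x_grid)) ** 2)   # (g_bar - f)^2 averaged over x
    var = np.mean(preds.var(axis=0))            # (g_i - g_bar)^2 averaged over i and x
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {var:.4f}")
```

Low-degree fits show high bias and low variance; high-degree fits show the reverse, which is exactly the dilemma stated above and the reason model selection (cross-validation, regularization, MDL, SRM) is needed.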

