INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml

Lecture Slides for CHAPTER 4: Parametric Methods

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

Parametric Estimation
- X = { x^t }_t where x^t ~ p(x)
- Parametric estimation: assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X
  e.g., N(μ, σ²) where θ = { μ, σ² }

Maximum Likelihood Estimation
- Likelihood of θ given the sample X: l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)
- Log likelihood: L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)
- Maximum likelihood estimator (MLE): θ* = argmax_θ L(θ|X)

Examples: Bernoulli/Multinomial
- Bernoulli: two states, failure/success, x ∈ {0, 1}
  P(x) = p₀^x (1 − p₀)^(1 − x)
  L(p₀|X) = log ∏_t p₀^{x^t} (1 − p₀)^{1 − x^t}
  MLE: p₀ = ∑_t x^t / N
- Multinomial: K > 2 states, x_i ∈ {0, 1}
  P(x₁, x₂, ..., x_K) = ∏_i p_i^{x_i}
  L(p₁, p₂, ..., p_K|X) = log ∏_t ∏_i p_i^{x_i^t}
  MLE: p_i = ∑_t x_i^t / N

Gaussian (Normal) Distribution
- p(x) = N(μ, σ²):  p(x) = (1 / (√(2π) σ)) exp[ −(x − μ)² / (2σ²) ]
- MLE for μ and σ²:  m = ∑_t x^t / N,  s² = ∑_t (x^t − m)² / N

Bias and Variance
- Unknown parameter θ; estimator d_i = d(X_i) on sample X_i
- Bias: b_θ(d) = E[d] − θ
- Variance: E[(d − E[d])²]
- Mean square error: r(d, θ) = E[(d − θ)²]
  = (E[d] − θ)² + E[(d − E[d])²]
  = Bias² + Variance

Bayes' Estimator
- Treat θ as a random variable with prior p(θ)
- Bayes' rule: p(θ|X) = p(X|θ) p(θ) / p(X)
- Full Bayes: p(x|X) = ∫ p(x|θ) p(θ|X) dθ
- Maximum a posteriori (MAP): θ_MAP = argmax_θ p(θ|X)
- Maximum likelihood (ML): θ_ML = argmax_θ p(X|θ)
- Bayes' estimator: θ_Bayes = E[θ|X] = ∫ θ p(θ|X) dθ
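The MLE formulas above can be checked with a short pure-Python sketch (not part of the original slides; the sample data is made up for illustration):

```python
import math

def bernoulli_mle(xs):
    """MLE of p0 for Bernoulli samples x^t in {0, 1}: p0 = sum_t x^t / N."""
    return sum(xs) / len(xs)

def gaussian_mle(xs):
    """MLE of (mu, sigma^2) for a Gaussian sample:
    m = sum_t x^t / N,  s^2 = sum_t (x^t - m)^2 / N."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return m, s2

def bernoulli_loglik(p0, xs):
    """Log likelihood L(p0|X) = sum_t [x^t log p0 + (1 - x^t) log(1 - p0)]."""
    return sum(x * math.log(p0) + (1 - x) * math.log(1 - p0) for x in xs)

# The MLE maximizes the log likelihood: any other p0 scores no higher.
xs = [1, 0, 1, 1, 0, 1]
p_hat = bernoulli_mle(xs)  # 4/6
assert all(bernoulli_loglik(p_hat, xs) >= bernoulli_loglik(p, xs)
           for p in (0.1, 0.3, 0.5, 0.9))
```

The closing assertion illustrates the definition of the MLE directly: the estimate ∑_t x^t / N achieves at least the log likelihood of every alternative value tried.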
Bayes' Estimator: Example
- x^t ~ N(θ, σ₀²) and prior θ ~ N(μ, σ²)
- θ_ML = m
- θ_MAP = θ_Bayes = [ (N/σ₀²) / (N/σ₀² + 1/σ²) ] m + [ (1/σ²) / (N/σ₀² + 1/σ²) ] μ

Parametric Classification
- Discriminant: g_i(x) = p(x|C_i) P(C_i), or equivalently g_i(x) = log p(x|C_i) + log P(C_i)
- With p(x|C_i) = N(μ_i, σ_i²):
  g_i(x) = −(1/2) log 2π − log σ_i − (x − μ_i)² / (2σ_i²) + log P(C_i)

- Given the sample X = { x^t, r^t }_t, ML estimates are
  P̂(C_i) = ∑_t r_i^t / N,  m_i = ∑_t x^t r_i^t / ∑_t r_i^t,  s_i² = ∑_t (x^t − m_i)² r_i^t / ∑_t r_i^t
- Discriminant becomes
  g_i(x) = −log s_i − (x − m_i)² / (2 s_i²) + log P̂(C_i)

- Equal variances: single boundary at halfway between the means (figure omitted)
- Variances are different: two boundaries (figure omitted)

Regression
- r = f(x) + ε with ε ~ N(0, σ²), estimator g(x|θ), so p(r|x) = N(g(x|θ), σ²)

Regression: From LogL to Error
- Maximizing the log likelihood L(θ|X) = log ∏_t p(r^t|x^t) is equivalent to minimizing the squared error
  E(θ|X) = (1/2) ∑_t [ r^t − g(x^t|θ) ]²

Linear Regression
- g(x^t|w₁, w₀) = w₁ x^t + w₀
- Least squares: solve ∑_t r^t = N w₀ + w₁ ∑_t x^t and ∑_t r^t x^t = w₀ ∑_t x^t + w₁ ∑_t (x^t)² for w₀, w₁

Polynomial Regression
- g(x^t|w_k, ..., w₁, w₀) = w_k (x^t)^k + ... + w₁ x^t + w₀

Other Error Measures
- Square error: E(θ|X) = ∑_t [ r^t − g(x^t|θ) ]²
- Relative square error: E(θ|X) = ∑_t [ r^t − g(x^t|θ) ]² / ∑_t [ r^t − r̄ ]²
- Absolute error: E(θ|X) = ∑_t | r^t − g(x^t|θ) |
- ε-sensitive error: E(θ|X) = ∑_t 1( |r^t − g(x^t|θ)| > ε ) ( |r^t − g(x^t|θ)| − ε )

Bias and Variance
- E[(r − g(x))² | x] = E[(r − E[r|x])² | x] + (E[r|x] − g(x))²   (noise + squared error)
- E_X[(E[r|x] − g(x))² | x] = (E[r|x] − E_X[g(x)])² + E_X[(g(x) − E_X[g(x)])²]   (bias² + variance)

Estimating Bias and Variance
- M samples X_i, i = 1, ..., M, are used to fit g_i(x):
  ḡ(x) = (1/M) ∑_i g_i(x)
  Bias²(g) = (1/N) ∑_t [ ḡ(x^t) − f(x^t) ]²
  Variance(g) = (1/(N M)) ∑_t ∑_i [ g_i(x^t) − ḡ(x^t) ]²

Bias/Variance Dilemma
- Example: g_i(x) = 2 has no variance and high bias; g_i(x) = ∑_t r_i^t / N has lower bias with variance
- As we increase complexity, bias decreases (a better fit to data) and variance increases (fit varies more with data)
- Bias/Variance dilemma (Geman et al., 1992)
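The least-squares solution for linear regression can be sketched in pure Python (illustrative, not from the slides; the data values are made up and chosen to lie exactly on a line):

```python
def fit_linear(xs, rs):
    """Least-squares fit of g(x) = w1*x + w0 — the ML solution under
    Gaussian noise, via the closed form w1 = cov(x, r) / var(x)."""
    n = len(xs)
    xbar = sum(xs) / n
    rbar = sum(rs) / n
    w1 = (sum((x - xbar) * (r - rbar) for x, r in zip(xs, rs))
          / sum((x - xbar) ** 2 for x in xs))
    w0 = rbar - w1 * xbar
    return w1, w0

def sq_error(w1, w0, xs, rs):
    """Square error E(theta|X) = sum_t (r^t - g(x^t|theta))^2."""
    return sum((r - (w1 * x + w0)) ** 2 for x, r in zip(xs, rs))

xs = [0.0, 1.0, 2.0, 3.0]
rs = [1.0, 3.0, 5.0, 7.0]    # exactly r = 2x + 1, so the fit is perfect
w1, w0 = fit_linear(xs, rs)  # -> (2.0, 1.0), zero squared error
```

On noisy data the same closed form applies; the residual squared error is then nonzero and is what the log-likelihood argument above says the ML estimator minimizes.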
(Figure omitted: bias and variance of the estimators g_i around their average ḡ, relative to the true f)

Polynomial Regression: Model Complexity
- Best fit, "min error": the most complex model minimizes training error (figure omitted)
- Best fit, "elbow": choose the complexity at the elbow of the validation-error curve (figure omitted)

Model Selection
- Cross-validation: measure generalization accuracy by testing on data unused during training
- Regularization: penalize complex models; Akaike's information criterion (AIC), Bayesian information criterion (BIC)
- Minimum description length (MDL): Kolmogorov complexity, shortest description of the data
- Structural risk minimization (SRM)

Bayesian Model Selection
- Prior on models, p(model)
- Regularization, when the prior favors simpler models
- Bayes: MAP of the posterior, p(model|data)
- Average over a number of models with high posterior (voting, ensembles: Chapter
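The bias/variance estimation procedure and the dilemma example above can be simulated directly; everything here (the true parameter f = 2.5, σ = 1, and the choices of M and N) is an assumed toy setup, not from the slides:

```python
import random
import statistics

random.seed(0)

# Assumed toy setup: true parameter f and noise level sigma.
f_true, sigma = 2.5, 1.0
M, N = 200, 25  # M samples X_i of size N each, as in the estimation recipe

samples = [[random.gauss(f_true, sigma) for _ in range(N)] for _ in range(M)]

# The two estimators from the dilemma example: the constant g_i = 2
# (zero variance, high bias) and the sample mean (low bias, some variance).
const_ests = [2.0 for _ in samples]
mean_ests = [statistics.fmean(s) for s in samples]

def bias2_and_var(ests, target):
    """Empirical Bias^2 = (g_bar - f)^2 and Variance = avg_i (g_i - g_bar)^2."""
    g_bar = sum(ests) / len(ests)
    bias2 = (g_bar - target) ** 2
    var = sum((g - g_bar) ** 2 for g in ests) / len(ests)
    return bias2, var

b_const, v_const = bias2_and_var(const_ests, f_true)  # exactly (0.25, 0.0)
b_mean, v_mean = bias2_and_var(mean_ests, f_true)     # bias^2 ~ 0, var ~ sigma^2/N
```

The constant estimator's variance is exactly zero and its bias² is (2 − 2.5)² = 0.25, while the sample mean trades a near-zero bias for a variance close to σ²/N, which is the dilemma in miniature.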
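The cross-validation idea from the model-selection slide can be sketched with a simple hold-out split; the data, split sizes, and the two candidate models are all illustrative assumptions, and a real K-fold scheme would average validation error over several such splits:

```python
import random

random.seed(1)

# Assumed true model r = 0.5*x + Gaussian noise; purely illustrative data.
data = [(0.5 * i, 0.5 * (0.5 * i) + random.gauss(0.0, 0.2)) for i in range(20)]
random.shuffle(data)
train, valid = data[:14], data[14:]  # hold out data unused during training

def fit_const(pts):
    """Degree-0 model: g(x) = mean of the training targets."""
    rbar = sum(r for _, r in pts) / len(pts)
    return lambda x: rbar

def fit_linear(pts):
    """Degree-1 least-squares model g(x) = w1*x + w0."""
    xbar = sum(x for x, _ in pts) / len(pts)
    rbar = sum(r for _, r in pts) / len(pts)
    w1 = (sum((x - xbar) * (r - rbar) for x, r in pts)
          / sum((x - xbar) ** 2 for x, _ in pts))
    w0 = rbar - w1 * xbar
    return lambda x: w1 * x + w0

def validation_error(g, pts):
    """Mean squared error on the held-out set, the generalization proxy."""
    return sum((r - g(x)) ** 2 for x, r in pts) / len(pts)

errors = {name: validation_error(fit(train), valid)
          for name, fit in [("constant", fit_const), ("linear", fit_linear)]}
best = min(errors, key=errors.get)  # the model with lowest validation error
```

Because the data really does have a linear trend, the degree-1 model wins on the held-out set here; on data with no trend, the same procedure would prefer the constant model, which is exactly the complexity control the slide describes.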