INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml

Lecture Slides for CHAPTER 4: Parametric Methods

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

Parametric Estimation
- X = { x^t }_t where x^t ~ p(x)
- Parametric estimation: assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X
  e.g., N(μ, σ²) where θ = { μ, σ² }

Maximum Likelihood Estimation
- Likelihood of θ given the sample X: l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)
- Log likelihood: L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)
- Maximum likelihood estimator (MLE): θ* = argmax_θ L(θ|X)

Examples: Bernoulli/Multinomial
- Bernoulli: two states, failure/success, x ∈ {0, 1}
  P(x) = p₀^x (1 − p₀)^(1 − x)
  L(p₀|X) = log ∏_t p₀^{x^t} (1 − p₀)^{1 − x^t}
  MLE: p₀ = ∑_t x^t / N
- Multinomial: K > 2 states, x_i ∈ {0, 1}
  P(x₁, x₂, ..., x_K) = ∏_i p_i^{x_i}
  L(p₁, p₂, ..., p_K|X) = log ∏_t ∏_i p_i^{x_i^t}
  MLE: p_i = ∑_t x_i^t / N

Gaussian (Normal) Distribution
- p(x) = N(μ, σ²):  p(x) = (1 / (√(2π) σ)) exp[ −(x − μ)² / (2σ²) ]
- MLE for μ and σ²:  m = ∑_t x^t / N,  s² = ∑_t (x^t − m)² / N

Bias and Variance
- Unknown parameter θ; estimator d_i = d(X_i) on sample X_i
- Bias: b_θ(d) = E[d] − θ
- Variance: E[(d − E[d])²]
- Mean square error: r(d, θ) = E[(d − θ)²]
  = (E[d] − θ)² + E[(d − E[d])²]
  = Bias² + Variance

Bayes' Estimator
- Treat θ as a random variable with prior p(θ)
- Bayes' rule: p(θ|X) = p(X|θ) p(θ) / p(X)
- Full Bayes: p(x|X) = ∫ p(x|θ) p(θ|X) dθ
- Maximum a posteriori (MAP): θ_MAP = argmax_θ p(θ|X)
- Maximum likelihood (ML): θ_ML = argmax_θ p(X|θ)
- Bayes' estimator: θ_Bayes = E[θ|X] = ∫ θ p(θ|X) dθ
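The MLE formulas above can be checked with a short pure-Python sketch (not part of the original slides; the sample data is made up for illustration):

```python
import math

def bernoulli_mle(xs):
    """MLE of p0 for Bernoulli samples x^t in {0, 1}: p0 = sum_t x^t / N."""
    return sum(xs) / len(xs)

def gaussian_mle(xs):
    """MLE of (mu, sigma^2) for a Gaussian sample:
    m = sum_t x^t / N,  s^2 = sum_t (x^t - m)^2 / N."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    return m, s2

def bernoulli_loglik(p0, xs):
    """Log likelihood L(p0|X) = sum_t [x^t log p0 + (1 - x^t) log(1 - p0)]."""
    return sum(x * math.log(p0) + (1 - x) * math.log(1 - p0) for x in xs)

# The MLE maximizes the log likelihood: any other p0 scores no higher.
xs = [1, 0, 1, 1, 0, 1]
p_hat = bernoulli_mle(xs)  # 4/6
assert all(bernoulli_loglik(p_hat, xs) >= bernoulli_loglik(p, xs)
           for p in (0.1, 0.3, 0.5, 0.9))
```

The closing assertion illustrates the definition of the MLE directly: the estimate ∑_t x^t / N achieves at least the log likelihood of every alternative value tried.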
Bayes' Estimator: Example
- x^t ~ N(θ, σ₀²) and prior θ ~ N(μ, σ²)
- θ_ML = m
- θ_MAP = θ_Bayes = [ (N/σ₀²) / (N/σ₀² + 1/σ²) ] m + [ (1/σ²) / (N/σ₀² + 1/σ²) ] μ

Parametric Classification
- Discriminant: g_i(x) = p(x|C_i) P(C_i), or equivalently g_i(x) = log p(x|C_i) + log P(C_i)
- With p(x|C_i) = N(μ_i, σ_i²):
  g_i(x) = −(1/2) log 2π − log σ_i − (x − μ_i)² / (2σ_i²) + log P(C_i)

- Given the sample X = { x^t, r^t }_t, ML estimates are
  P̂(C_i) = ∑_t r_i^t / N,  m_i = ∑_t x^t r_i^t / ∑_t r_i^t,  s_i² = ∑_t (x^t − m_i)² r_i^t / ∑_t r_i^t
- Discriminant becomes
  g_i(x) = −log s_i − (x − m_i)² / (2 s_i²) + log P̂(C_i)

- Equal variances: single boundary at halfway between the means (figure omitted)
- Variances are different: two boundaries (figure omitted)

Regression
- r = f(x) + ε with ε ~ N(0, σ²), estimator g(x|θ), so p(r|x) = N(g(x|θ), σ²)

Regression: From LogL to Error
- Maximizing the log likelihood L(θ|X) = log ∏_t p(r^t|x^t) is equivalent to minimizing the squared error
  E(θ|X) = (1/2) ∑_t [ r^t − g(x^t|θ) ]²

Linear Regression
- g(x^t|w₁, w₀) = w₁ x^t + w₀
- Least squares: solve ∑_t r^t = N w₀ + w₁ ∑_t x^t and ∑_t r^t x^t = w₀ ∑_t x^t + w₁ ∑_t (x^t)² for w₀, w₁

Polynomial Regression
- g(x^t|w_k, ..., w₁, w₀) = w_k (x^t)^k + ... + w₁ x^t + w₀

Other Error Measures
- Square error: E(θ|X) = ∑_t [ r^t − g(x^t|θ) ]²
- Relative square error: E(θ|X) = ∑_t [ r^t − g(x^t|θ) ]² / ∑_t [ r^t − r̄ ]²
- Absolute error: E(θ|X) = ∑_t | r^t − g(x^t|θ) |
- ε-sensitive error: E(θ|X) = ∑_t 1( |r^t − g(x^t|θ)| > ε ) ( |r^t − g(x^t|θ)| − ε )

Bias and Variance
- E[(r − g(x))² | x] = E[(r − E[r|x])² | x] + (E[r|x] − g(x))²   (noise + squared error)
- E_X[(E[r|x] − g(x))² | x] = (E[r|x] − E_X[g(x)])² + E_X[(g(x) − E_X[g(x)])²]   (bias² + variance)

Estimating Bias and Variance
- M samples X_i, i = 1, ..., M, are used to fit g_i(x):
  ḡ(x) = (1/M) ∑_i g_i(x)
  Bias²(g) = (1/N) ∑_t [ ḡ(x^t) − f(x^t) ]²
  Variance(g) = (1/(N M)) ∑_t ∑_i [ g_i(x^t) − ḡ(x^t) ]²

Bias/Variance Dilemma
- Example: g_i(x) = 2 has no variance and high bias; g_i(x) = ∑_t r_i^t / N has lower bias with variance
- As we increase complexity, bias decreases (a better fit to data) and variance increases (fit varies more with data)
- Bias/Variance dilemma (Geman et al., 1992)
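The least-squares solution for linear regression can be sketched in pure Python (illustrative, not from the slides; the data values are made up and chosen to lie exactly on a line):

```python
def fit_linear(xs, rs):
    """Least-squares fit of g(x) = w1*x + w0 — the ML solution under
    Gaussian noise, via the closed form w1 = cov(x, r) / var(x)."""
    n = len(xs)
    xbar = sum(xs) / n
    rbar = sum(rs) / n
    w1 = (sum((x - xbar) * (r - rbar) for x, r in zip(xs, rs))
          / sum((x - xbar) ** 2 for x in xs))
    w0 = rbar - w1 * xbar
    return w1, w0

def sq_error(w1, w0, xs, rs):
    """Square error E(theta|X) = sum_t (r^t - g(x^t|theta))^2."""
    return sum((r - (w1 * x + w0)) ** 2 for x, r in zip(xs, rs))

xs = [0.0, 1.0, 2.0, 3.0]
rs = [1.0, 3.0, 5.0, 7.0]    # exactly r = 2x + 1, so the fit is perfect
w1, w0 = fit_linear(xs, rs)  # -> (2.0, 1.0), zero squared error
```

On noisy data the same closed form applies; the residual squared error is then nonzero and is what the log-likelihood argument above says the ML estimator minimizes.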
(Figure omitted: bias and variance of the estimators g_i around their average ḡ, relative to the true f)

Polynomial Regression: Model Complexity
- Best fit, "min error": the most complex model minimizes training error (figure omitted)
- Best fit, "elbow": choose the complexity at the elbow of the validation-error curve (figure omitted)

Model Selection
- Cross-validation: measure generalization accuracy by testing on data unused during training
- Regularization: penalize complex models; Akaike's information criterion (AIC), Bayesian information criterion (BIC)
- Minimum description length (MDL): Kolmogorov complexity, shortest description of the data
- Structural risk minimization (SRM)

Bayesian Model Selection
- Prior on models, p(model)
- Regularization, when the prior favors simpler models
- Bayes: MAP of the posterior, p(model|data)
- Average over a number of models with high posterior (voting, ensembles: Chapter
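The bias/variance estimation procedure and the dilemma example above can be simulated directly; everything here (the true parameter f = 2.5, σ = 1, and the choices of M and N) is an assumed toy setup, not from the slides:

```python
import random
import statistics

random.seed(0)

# Assumed toy setup: true parameter f and noise level sigma.
f_true, sigma = 2.5, 1.0
M, N = 200, 25  # M samples X_i of size N each, as in the estimation recipe

samples = [[random.gauss(f_true, sigma) for _ in range(N)] for _ in range(M)]

# The two estimators from the dilemma example: the constant g_i = 2
# (zero variance, high bias) and the sample mean (low bias, some variance).
const_ests = [2.0 for _ in samples]
mean_ests = [statistics.fmean(s) for s in samples]

def bias2_and_var(ests, target):
    """Empirical Bias^2 = (g_bar - f)^2 and Variance = avg_i (g_i - g_bar)^2."""
    g_bar = sum(ests) / len(ests)
    bias2 = (g_bar - target) ** 2
    var = sum((g - g_bar) ** 2 for g in ests) / len(ests)
    return bias2, var

b_const, v_const = bias2_and_var(const_ests, f_true)  # exactly (0.25, 0.0)
b_mean, v_mean = bias2_and_var(mean_ests, f_true)     # bias^2 ~ 0, var ~ sigma^2/N
```

The constant estimator's variance is exactly zero and its bias² is (2 − 2.5)² = 0.25, while the sample mean trades a near-zero bias for a variance close to σ²/N, which is the dilemma in miniature.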
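The cross-validation idea from the model-selection slide can be sketched with a simple hold-out split; the data, split sizes, and the two candidate models are all illustrative assumptions, and a real K-fold scheme would average validation error over several such splits:

```python
import random

random.seed(1)

# Assumed true model r = 0.5*x + Gaussian noise; purely illustrative data.
data = [(0.5 * i, 0.5 * (0.5 * i) + random.gauss(0.0, 0.2)) for i in range(20)]
random.shuffle(data)
train, valid = data[:14], data[14:]  # hold out data unused during training

def fit_const(pts):
    """Degree-0 model: g(x) = mean of the training targets."""
    rbar = sum(r for _, r in pts) / len(pts)
    return lambda x: rbar

def fit_linear(pts):
    """Degree-1 least-squares model g(x) = w1*x + w0."""
    xbar = sum(x for x, _ in pts) / len(pts)
    rbar = sum(r for _, r in pts) / len(pts)
    w1 = (sum((x - xbar) * (r - rbar) for x, r in pts)
          / sum((x - xbar) ** 2 for x, _ in pts))
    w0 = rbar - w1 * xbar
    return lambda x: w1 * x + w0

def validation_error(g, pts):
    """Mean squared error on the held-out set, the generalization proxy."""
    return sum((r - g(x)) ** 2 for x, r in pts) / len(pts)

errors = {name: validation_error(fit(train), valid)
          for name, fit in [("constant", fit_const), ("linear", fit_linear)]}
best = min(errors, key=errors.get)  # the model with lowest validation error
```

Because the data really does have a linear trend, the degree-1 model wins on the held-out set here; on data with no trend, the same procedure would prefer the constant model, which is exactly the complexity control the slide describes.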