Midterm Review
Machine Learning 10-701
Tom M. Mitchell
Machine Learning Department, Carnegie Mellon University
March 1, 2010

See practice exams at:
http://www.cs.cmu.edu/~tom/10601_sp09/601-sp09-midterm-solutions.pdf
http://select.cs.cmu.edu/class/10701-F09/exams.html

The midterm is open book, open notes, NO computers.
It covers all material presented up through today's class.

Some Topics We've Covered
– Decision trees: entropy, overfitting
– Probability basics: random variables, manipulating probabilities, Bayes rule, MLE, MAP, conditional independence
– Instance-based learning: nearest neighbor, density estimation, Bayes optimal classifier
– Naive Bayes: conditional independence, number of parameters to estimate
– Logistic regression: form of P(Y|X) implied by Naive Bayes, generative vs. discriminative
– Linear regression: minimizing sum of squared errors ~ MLE, regularization ~ MAP, non-linear features
– Neural networks: gradient descent, learning hidden representations
– Model selection: overfitting, bias-variance
– Clustering: k-means, mixture of Gaussians, EM
– Hidden Markov models: time series model, forward-backward
– Bayesian networks: factored representation of the joint distribution, encoding conditional independence assumptions

For each of Naive Bayes, Logistic Regression, Linear Regression, Neural Nets, Decision Trees, Gaussian Mixture Models, HMMs, Bayes Nets, and kNN, be able to answer:
– What is the representation of P(Y|X)?
– What is the optimization objective?
– Is there a convergence guarantee?
– What other assumptions are made?
– What is the decision surface?

Four Fundamentals for ML
1. Learning is an optimization problem
2. Learning is a parameter estimation problem
3. Error arises from three sources
4. Practical learning requires modeling assumptions, such as ...

1. Learning is an optimization problem
– Many algorithms are best understood as optimization algorithms.
– What objective do they optimize, and how?
– Naive Bayes? Logistic regression? Linear regression?

2. Learning is parameter estimation
– The more training data, the more accurate the estimates.
– To measure the accuracy of a learned model, we must use test (not training) data.
– Cross validation

3. Error arises from three sources
– Bayes optimal error, bias, and variance

Bias and Variance
Given some estimator Y for some parameter θ, note that Y is a random variable (why?).
– The bias of estimator Y: E[Y] − θ
– The variance of estimator Y: E[(Y − E[Y])^2]
Consider the case where
– θ is the probability of "heads" for my coin
– Y = proportion of heads observed from 3 flips
(A simulation sketch of this example appears at the end of this review.)

4. Practical learning requires making assumptions
– Why?
– The form of the f: X → Y, or P(Y|X), or P(...) to be learned
– Priors on parameters: MAP, regularization
– Conditional independence: Naive Bayes, Bayes nets

Four Fundamentals for ML (summary)
1. Learning is an optimization problem
   – Many algorithms are best understood as optimization algorithms.
   – What objective do they optimize, and how?
2. Learning is a parameter estimation problem
   – The more training data, the more accurate the estimates.
   – MLE, MAP, M(Conditional)LE, ...
   – To measure the accuracy of a learned model, we must use test (not training) data.
3. Error arises from three sources
   – Bayes optimal error, bias, variance
4. Practical learning requires modeling assumptions
   – Why?
   – The form of the f: X → Y, or P(Y|X) to be learned
   – Priors on parameters: MAP, regularization
   – Conditional independence: Naive Bayes, Bayes nets
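The Bias and Variance slide above asks you to think about the estimator Y = proportion of heads in 3 flips for θ = P(heads). The following sketch (not part of the original slides) estimates its bias and variance by Monte Carlo simulation; the true θ, the random seed, and the use of numpy are illustrative assumptions.

```python
# Sketch: empirical bias and variance of Y = (proportion of heads in 3 flips)
# as an estimator of theta = P(heads). Assumes numpy; theta=0.7 is illustrative.
import numpy as np

rng = np.random.default_rng(0)

theta = 0.7          # assumed true probability of heads
n_flips = 3          # each estimate uses 3 flips, as in the slide example
n_repeats = 100_000  # number of simulated 3-flip experiments

# Each row is one experiment of 3 flips; Y is the proportion of heads per row.
flips = rng.random((n_repeats, n_flips)) < theta
Y = flips.mean(axis=1)

bias = Y.mean() - theta                   # estimates E[Y] - theta (~0: Y is unbiased)
variance = ((Y - Y.mean()) ** 2).mean()   # estimates E[(Y - E[Y])^2]

print(f"empirical bias     : {bias:+.4f}")
print(f"empirical variance : {variance:.4f}")
print(f"theoretical variance theta(1-theta)/n = {theta * (1 - theta) / n_flips:.4f}")
```

With only 3 flips the bias is (approximately) zero but the variance is large, which is the point of the slide: a small sample gives an unbiased yet high-variance estimate.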
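The "Learning is parameter estimation" fundamental mentions MLE, MAP, and priors as regularization. The sketch below (again, not from the slides) contrasts the two estimates of θ for the same coin example; the Beta(3, 3) prior and the 3-heads-in-3-flips data are illustrative assumptions.

```python
# Sketch: MLE vs. MAP estimates of theta = P(heads), with a Beta(a, b) prior
# for the MAP estimate. Hyperparameters a = b = 3 are an illustrative choice.

def mle_theta(heads: int, flips: int) -> float:
    """Maximum likelihood estimate: the observed fraction of heads."""
    return heads / flips

def map_theta(heads: int, flips: int, a: float = 3.0, b: float = 3.0) -> float:
    """MAP estimate: mode of the Beta(heads + a, tails + b) posterior."""
    return (heads + a - 1) / (flips + a + b - 2)

# With only 3 flips the prior pulls the MAP estimate toward 0.5,
# while the MLE can swing to an extreme (3/3 = 1.0).
heads, flips = 3, 3
print("MLE:", mle_theta(heads, flips))   # 1.0
print("MAP:", map_theta(heads, flips))   # 5/7 ~= 0.714
```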