Linear Regression
Aarti Singh
Machine Learning 10-701/15-781
Sept 27, 2010

Discrete to Continuous Labels
Classification:
- X = document, Y = topic (Sports / Science / News)
- X = cell image, Y = diagnosis (anemic cell vs. healthy cell)
Regression:
- Stock market prediction: Y = stock value, X = date (e.g., Feb 01)

Regression Tasks
- Weather prediction: Y = temperature, X = time (e.g., 7 pm)
- Estimating contamination: X = new location, Y = sensor reading

Supervised Learning Goal
- Classification: minimize the probability of error
- Regression: minimize the mean squared error E[(f(X) - Y)^2]

Regression: Optimal Predictor
The optimal predictor is the conditional mean: f*(x) = E[Y | X = x].
Intuition: signal plus zero-mean noise model, Y = f*(X) + e with E[e] = 0.
Proof strategy (dropping subscripts for notational convenience): for any predictor f,
E[(f(X) - Y)^2] = E[(f(X) - E[Y|X])^2] + E[(E[Y|X] - Y)^2],
because the cross term E[(f(X) - E[Y|X])(E[Y|X] - Y)] = 0 (condition on X and use E[Y - E[Y|X] | X] = 0). The first term is minimized by choosing f(x) = E[Y|X = x].
Note: the optimal predictor depends on the unknown distribution, so it must be learned from data.

Regression Algorithms
- Linear regression
- Lasso, ridge regression (regularized linear regression)
- Nonlinear regression: kernel regression, regression trees, splines, wavelet estimators

Empirical Risk Minimization (ERM)
- Optimal predictor: f* = arg min_f E[(f(X) - Y)^2]
- Empirical risk minimizer: f^ = arg min_{f in F} (1/n) sum_i (f(X_i) - Y_i)^2, where F is a class of predictors
- The empirical mean approximates the true risk by the Law of Large Numbers (more on this later)

ERM — You Saw It Before: Learning Distributions
Max likelihood = min negative log-likelihood, and the negative log-likelihood is an empirical risk.
What is the class F? A class of parametric distributions, e.g. Bernoulli(q) or Gaussian(mu, sigma^2).

Linear Regression: Least Squares Estimator
Class of linear functions.
- Univariate case: f(x) = b1 + b2 x, where b1 is the intercept and b2 is the slope
- Multivariate case: f(X_i) = X_i b, where X_i = [1, X_i^(1), ..., X_i^(p)] (leading 1 for the intercept)

Least Squares Estimator
b^ = arg min_b sum_i (Y_i - X_i b)^2 = arg min_b ||Y - Xb||_2^2

Normal Equations
Setting the gradient to zero gives the normal equations:
(X'X) b = X'Y     (X'X is p x p, X'Y is p x 1)
If X'X is invertible, then b^ = (X'X)^{-1} X'Y.
When is X'X invertible? Recall: full-rank matrices are invertible. What is the rank of X'X? rank(X'X) = rank(X), so X must have full column rank.
What if X'X is not invertible? Regularization (later).

Geometric Interpretation
At the minimizer, X'(Y - Xb^) = 0: the difference between Y and the prediction Xb^ on the training set is orthogonal to the columns of X. Equivalently, Xb^ is the orthogonal projection of Y onto the linear subspace spanned by the columns of X.

Revisiting Gradient Descent
Even when X'X is invertible, solving the normal equations might be computationally expensive if the matrix is huge. Gradient descent applies, since J(b) = ||Y - Xb||^2 is convex:
- Initialize b^0
- Update: b^{t+1} = b^t - a grad J(b^t), where grad J(b) = -2 X'(Y - Xb); the gradient is 0 at the minimum
- Stop when some criterion is met, e.g. a fixed number of iterations, or when the change in b (or the gradient norm) falls below a threshold

Effect of Step Size a
- Large a: fast convergence, but larger residual error; oscillations are also possible
- Small a: slow convergence, but small residual error

Least Squares and MLE
Intuition: signal plus zero-mean Gaussian noise model, Y_i = X_i b + e_i with e_i ~ N(0, sigma^2). Maximizing the log-likelihood of the data is then equivalent to minimizing sum_i (Y_i - X_i b)^2: the Least Squares Estimate is the same as the Maximum Likelihood Estimate under a Gaussian noise model.

Regularized Least Squares and MAP
What if X'X is not invertible? Maximize log-likelihood + log prior.
I. Gaussian prior: b ~ N(0, tau^2 I) gives Ridge Regression:
b^_MAP = arg min_b ||Y - Xb||_2^2 + lambda ||b||_2^2, with closed form (HW) b^ = (X'X + lambda I)^{-1} X'Y.
The prior belief that b is Gaussian with zero mean biases the solution toward small ||b||_2.
II. Laplace prior: b_j ~ Laplace(0, t) gives the Lasso:
b^_MAP = arg min_b ||Y - Xb||_2^2 + lambda ||b||_1.
The prior belief that b is Laplace with zero mean biases the solution toward small ||b||_1.

Ridge Regression vs. Lasso
Ideally we would use an l0 penalty (the number of nonzero coordinates), but the optimization becomes non-convex. Compare the level sets of J with the constraint sets of constant l2, l1, and l0 norm: the l1 ball has corners on the coordinate axes, so the Lasso's l1 penalty results in sparse solutions (vectors with more zero coordinates). This is good for high-dimensional problems: we don't have to store all coordinates.

Beyond Linear Regression
- Polynomial regression
- Regression with nonlinear features (basis functions)
- Kernel regression
- Local/weighted regression
- Regression trees: spatially adaptive regression

Polynomial Regression
Univariate (1-d) case: f(x) = b1 + b2 x + b3 x^2 + ... + bm x^{m-1}, where bk is the weight of each nonlinear feature x^{k-1}.
Demo: http://mste.illinois.edu/users/exner/java.f/leastsquares/

Nonlinear Regression: Basis Functions
f(x) = sum_k bk phi_k(x), with basis coefficients bk and nonlinear features (basis functions) phi_k.
- Fourier basis: good representation for oscillatory functions
- Wavelet basis: good representation for functions localized at multiple scales

Local Regression
Globally supported basis functions (polynomial, Fourier) will not yield a good representation.
What You Should Know
- Linear regression: Least Squares Estimator, Normal Equations, Gradient Descent
- Geometric and probabilistic interpretation (connection to MLE)
- Regularized linear regression (connection to MAP): Ridge Regression, Lasso
- Polynomial regression; basis (Fourier, wavelet) estimators
Next time: Kernel Regression, Localized Regression Trees
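As a concrete illustration of the estimators above, here is a minimal numpy sketch (data and variable names are my own, not from the lecture) that fits b both by the normal equations and by gradient descent on J(b) = ||Y - Xb||^2, and checks that the two agree:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: Y = X b_true + zero-mean noise, with an intercept column of ones.
n, p = 200, 3
X = np.c_[np.ones(n), rng.normal(size=(n, p - 1))]
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: solve (X'X) b = X'Y instead of forming the inverse explicitly.
beta_ne = np.linalg.solve(X.T @ X, X.T @ Y)

# Gradient descent on J(b) = ||Y - Xb||^2, with grad J(b) = -2 X'(Y - Xb).
beta_gd = np.zeros(p)
alpha = 1e-3                       # step size (small enough for this X)
for _ in range(20000):             # stopping criterion: fixed number of iterations
    grad = -2 * X.T @ (Y - X @ beta_gd)
    beta_gd -= alpha * grad

print(beta_ne)                                 # close to beta_true
print(np.max(np.abs(beta_ne - beta_gd)))       # the two estimates agree
```

Solving the linear system directly (rather than computing (X'X)^{-1}) is the standard numerically stable choice; gradient descent becomes the practical option when p is too large for a direct solve.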
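To see the ridge-vs-lasso sparsity contrast numerically, here is a small numpy sketch of my own construction (the lecture does not specify an algorithm for the Lasso; I use proximal gradient descent, i.e. ISTA, as one standard choice). Ridge shrinks every coordinate but leaves them nonzero, while the l1 penalty drives irrelevant coordinates exactly to zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sparse ground truth: only 2 of 10 coefficients are nonzero.
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[0], beta_true[1] = 3.0, -2.0
Y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = 10.0

# Ridge: closed form b = (X'X + lam I)^{-1} X'Y, via a linear solve.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Lasso: proximal gradient (ISTA) on ||Y - Xb||^2 + lam * ||b||_1.
alpha = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)  # step from the Lipschitz constant of the gradient
beta_lasso = np.zeros(p)
for _ in range(5000):
    grad = -2 * X.T @ (Y - X @ beta_lasso)
    z = beta_lasso - alpha * grad
    # Soft-thresholding: the prox operator of the l1 penalty; sets small coords exactly to 0.
    beta_lasso = np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)

print("exact zeros, ridge:", np.sum(np.abs(beta_ridge) < 1e-8))  # typically none
print("exact zeros, lasso:", np.sum(np.abs(beta_lasso) < 1e-8))  # most irrelevant coords
```

The soft-thresholding step is where the sparsity comes from: it is the closed-form minimizer of the l1 term plus a quadratic, mirroring the "corners of the l1 ball" picture from the Ridge-vs-Lasso slide.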