CMU CS 10701 - Lecture6

Linear Regression
Aarti Singh
Machine Learning 10-701/15-781
Sept 27, 2010

Discrete to Continuous Labels
Classification: X = document, Y = topic (Sports, Science, News); X = cell image, Y = diagnosis (anemic cell, healthy cell).
Regression: stock market prediction, where Y is to be predicted at X = Feb01.

Regression Tasks
Weather prediction: X = 7 pm, Y = temperature.
Estimating contamination: X = new location, Y = sensor reading.

Supervised Learning
Goal: construct a predictor of Y from X that has small risk.
Performance measure: probability of error for classification, mean squared error for regression.

Regression: Optimal Predictor
The optimal predictor under mean squared error is the conditional mean, $f^*(X) = E[Y \mid X]$.
Intuition: signal plus (zero-mean) noise model, $Y = f^*(X) + \epsilon$.
Proof strategy: add and subtract $E[Y \mid X]$ inside the squared error and expand; the cross term equals 0. Subscripts are dropped for notational convenience.
The optimal predictor depends on the unknown distribution of (X, Y), so it cannot be computed directly.

Regression Algorithms
A learning algorithm produces a predictor from training data. Examples:
Linear regression
Lasso, ridge regression (regularized linear regression)
Nonlinear regression
Kernel regression
Regression trees, splines, wavelet estimators

Empirical Risk Minimization (ERM)
Optimal predictor: $f^* = \arg\min_f E[(f(X) - Y)^2]$.
Empirical risk minimizer: $\hat{f}_n = \arg\min_{f \in F} \frac{1}{n} \sum_{i=1}^{n} (f(X_i) - Y_i)^2$, where F is the class of predictors.
The empirical mean approximates the expectation by the law of large numbers (more later).

ERM: You Saw It Before
Learning distributions: maximum likelihood is the same as minimizing the negative log-likelihood, which is an empirical risk.
What is the class F? The class of parametric distributions, e.g. Bernoulli($\theta$), Gaussian($\mu$, $\sigma^2$).

Linear Regression
Least squares estimator over the class of linear functions.
Univariate case: $f(X) = \beta_1 + \beta_2 X$, where $\beta_1$ is the intercept and $\beta_2$ is the slope.
Multivariate case: $f(X) = \beta_1 X^{(1)} + \dots + \beta_p X^{(p)} = X\beta$, where $X = [X^{(1)} \dots X^{(p)}]$ and $\beta = [\beta_1 \dots \beta_p]^T$.

Least Squares Estimator
$\hat{\beta} = \arg\min_\beta J(\beta)$, with $J(\beta) = (A\beta - Y)^T (A\beta - Y)$, where A is the n × p matrix whose rows are the training inputs and Y is the n × 1 vector of training labels.

Normal Equations
$(A^T A)\hat{\beta} = A^T Y$, where $A^T A$ is p × p and $\hat{\beta}$ and $A^T Y$ are p × 1.
If $A^T A$ is invertible, $\hat{\beta} = (A^T A)^{-1} A^T Y$.
When is $A^T A$ invertible? Recall: full-rank matrices are invertible. What is the rank of $A^T A$?
What if $A^T A$ is not invertible? Regularization (later).

Geometric Interpretation
The difference in prediction on the training set, $A\hat{\beta} - Y$, satisfies $A^T (A\hat{\beta} - Y) = 0$.
$A\hat{\beta}$ is the orthogonal projection of Y onto the linear subspace spanned by the columns of A.

Revisiting Gradient Descent
Even when $A^T A$ is invertible, inverting it might be computationally expensive if A is huge.
Gradient descent applies since $J(\beta)$ is convex.
Initialize: $\beta^0$.
Update: $\beta^{t+1} = \beta^t - \alpha \nabla J(\beta^t)$, with $\nabla J(\beta) = 2 A^T (A\beta - Y)$, which is 0 if $\beta^t = \hat{\beta}$.
Stop when some criterion is met, e.g. a fixed number of iterations, or the gradient falling below a tolerance.

Effect of Step Size
Large step size: fast convergence but larger residual error; oscillations are also possible.
Small step size: slow convergence but small residual error.

Least Squares and MLE
Intuition: signal plus (zero-mean) noise model with Gaussian noise. Maximizing the log-likelihood is then the same as minimizing the squared error.
The least squares estimate is the same as the maximum likelihood estimate under a Gaussian model.

Regularized Least Squares and MAP
What if $A^T A$ is not invertible? Maximize log-likelihood + log prior.
I) Gaussian prior with zero mean: ridge regression, $\hat{\beta}_{MAP} = \arg\min_\beta (A\beta - Y)^T (A\beta - Y) + \lambda \|\beta\|_2^2$. Closed form: HW.
The prior belief that $\beta$ is Gaussian with zero mean biases the solution toward small $\beta$.
II) Laplace prior with zero mean: Lasso, $\hat{\beta}_{MAP} = \arg\min_\beta (A\beta - Y)^T (A\beta - Y) + \lambda \|\beta\|_1$.
The prior belief that $\beta$ is Laplace with zero mean biases the solution toward small $\beta$.

Ridge Regression vs Lasso
Ridge regression uses an $\ell_2$ penalty; Lasso (HOT) uses an $\ell_1$ penalty.
Ideally one would use an $\ell_0$ penalty, but the optimization becomes non-convex.
[Figure: in the ($\beta_1$, $\beta_2$) plane, level sets of $J(\beta)$ (the $\beta$s with constant $J(\beta)$) shown against the $\beta$s with constant $\ell_2$ norm, constant $\ell_1$ norm, and constant $\ell_0$ norm.]
Lasso ($\ell_1$ penalty) results in sparse solutions: a vector with more zero coordinates.
Good for high-dimensional problems: you don't have to store all coordinates.

Beyond Linear Regression
Polynomial regression
Regression with nonlinear features/basis functions
Kernel regression (local/weighted regression)
Regression trees (spatially adaptive regression)

Polynomial Regression
Univariate (1-d) case: $f(X) = \beta_0 + \beta_1 X + \beta_2 X^2 + \dots + \beta_m X^m = X\beta$, where $X = [1 \; X \; X^2 \dots X^m]$ are the nonlinear features and $\beta = [\beta_0 \dots \beta_m]^T$ holds the weight of each feature.
Demo: http://mste.illinois.edu/users/exner/java.f/leastsquares/

Nonlinear Regression
$f(X) = \sum_j \beta_j \phi_j(X)$, with basis coefficients $\beta_j$ and nonlinear features/basis functions $\phi_j$.
Fourier basis: good representation for oscillatory functions.
Wavelet basis: good representation for functions localized at multiple scales.

Local Regression
Same form, with basis coefficients and nonlinear features/basis functions.
Globally supported basis functions (polynomial, Fourier) will not yield a good representation.

What You Should Know
Linear regression: least squares estimator, normal equations, gradient descent, geometric and probabilistic interpretation (connection to MLE).
Regularized linear regression (connection to MAP): ridge regression, Lasso.
Polynomial regression, basis (Fourier, wavelet) estimators.
Next time: kernel regression (localized), regression trees.
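The proof strategy for the optimal predictor is only named in the slides. A standard version of the argument is sketched below, writing $f$ for an arbitrary predictor; this is one common way to carry out the step, not necessarily the exact derivation shown in class.

```latex
\begin{aligned}
E\big[(f(X) - Y)^2\big]
  &= E\big[(f(X) - E[Y \mid X] + E[Y \mid X] - Y)^2\big] \\
  &= E\big[(f(X) - E[Y \mid X])^2\big]
   + 2\,E\big[(f(X) - E[Y \mid X])(E[Y \mid X] - Y)\big]
   + E\big[(E[Y \mid X] - Y)^2\big].
\end{aligned}

% Cross term: condition on X first, so f(X) - E[Y | X] is a constant.
E\big[(f(X) - E[Y \mid X])(E[Y \mid X] - Y) \,\big|\, X\big]
  = (f(X) - E[Y \mid X])\,(E[Y \mid X] - E[Y \mid X]) = 0.

% Hence
E\big[(f(X) - Y)^2\big] \;\ge\; E\big[(E[Y \mid X] - Y)^2\big],
\quad \text{with equality when } f(X) = E[Y \mid X].
```

The first term is the only one that depends on $f$, and it is minimized (at zero) by the conditional mean.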
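The normal equations and the ridge variant can be tried on a tiny example. The sketch below is pure Python and assumes a small, dense design matrix; the function names (`least_squares`, `solve`) and the toy data are illustrative, not from the lecture. With `lam > 0` it solves $(A^T A + \lambda I)\beta = A^T Y$, the standard closed form for ridge regression, and for simplicity it penalizes the intercept as well.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (A^T A is not invertible)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def least_squares(A, Y, lam=0.0):
    """Solve (A^T A + lam * I) beta = A^T Y; lam = 0 gives plain least squares."""
    At = transpose(A)
    p = len(At)
    AtA = [[sum(a * b for a, b in zip(At[i], At[j])) for j in range(p)]
           for i in range(p)]
    for j in range(p):
        AtA[j][j] += lam
    AtY = [sum(a * y for a, y in zip(row, Y)) for row in At]
    return solve(AtA, AtY)

# Toy data generated from Y = 1 + 2 x; the first column of A is the constant 1.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 3.0, 5.0, 7.0]
beta_hat = least_squares(A, Y)              # close to [1.0, 2.0]
beta_ridge = least_squares(A, Y, lam=1.0)   # shrunk toward zero

# Duplicated columns make A^T A singular; the ridge term restores invertibility.
A_singular = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
beta_reg = least_squares(A_singular, [1.0, 2.0, 3.0], lam=1.0)
```

Calling `least_squares(A_singular, [1.0, 2.0, 3.0])` without the penalty raises `ValueError`, which mirrors the "what if $A^T A$ is not invertible" question in the slides.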
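The gradient descent slides describe an initialize/update/stop loop and the effect of the step size. A minimal sketch follows, assuming $J(\beta) = \|A\beta - Y\|^2$, a constant step size, and a gradient-norm stopping rule; the names `gradient_descent`, `step`, and `tol` and the specific step values are assumptions chosen for this toy problem.

```python
def residuals(A, Y, beta):
    return [sum(a * b for a, b in zip(row, beta)) - y for row, y in zip(A, Y)]

def J(A, Y, beta):
    """Squared error J(beta) = ||A beta - Y||^2."""
    return sum(r * r for r in residuals(A, Y, beta))

def gradient(A, Y, beta):
    """Gradient of J: 2 A^T (A beta - Y)."""
    r = residuals(A, Y, beta)
    return [2.0 * sum(row[j] * ri for row, ri in zip(A, r))
            for j in range(len(beta))]

def gradient_descent(A, Y, step, max_iters=10000, tol=1e-8):
    """Return (beta, iterations used). Stops early once the gradient norm < tol."""
    beta = [0.0] * len(A[0])  # initialize at zero
    for t in range(max_iters):
        g = gradient(A, Y, beta)
        if sum(x * x for x in g) ** 0.5 < tol:
            return beta, t
        beta = [b - step * x for b, x in zip(beta, g)]
    return beta, max_iters

# Same toy data as Y = 1 + 2 x, with a constant column for the intercept.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 3.0, 5.0, 7.0]

beta_small, iters_small = gradient_descent(A, Y, step=0.01)  # slow but steady
beta_large, iters_large = gradient_descent(A, Y, step=0.05)  # fewer iterations
beta_bad, _ = gradient_descent(A, Y, step=0.07, max_iters=50)  # too large
```

On this data the larger of the two stable steps needs fewer iterations, while a step of 0.07 overshoots on every update and the error grows instead of shrinking, which is the oscillation risk the slides mention.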
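The claim that the $\ell_1$ penalty yields sparse solutions can be seen on a two-feature example. The lecture does not say how the Lasso is solved; the sketch below swaps in coordinate descent with soft-thresholding, a common choice, for the objective $\|A\beta - Y\|^2 + \lambda \|\beta\|_1$. The function names and the data are illustrative assumptions.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t; values within [-t, t] become exactly 0."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso_coordinate_descent(A, Y, lam, sweeps=100):
    """Minimize ||A beta - Y||^2 + lam * ||beta||_1 one coordinate at a time."""
    n, p = len(A), len(A[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the residual that leaves out coordinate j.
            rho = sum(
                A[i][j] * (Y[i] - sum(A[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            z = sum(A[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta

# Y = 2 * (first column) + 0.1 * (second column): the second feature barely matters.
A = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
Y = [2.1, 3.9, 6.1, 7.9]

beta_ols = lasso_coordinate_descent(A, Y, lam=0.0)    # close to [2.0, 0.1]
beta_lasso = lasso_coordinate_descent(A, Y, lam=2.0)  # second coordinate is exactly 0
```

With `lam=0.0` the procedure reduces to ordinary least squares and keeps both coefficients; with `lam=2.0` the weak coefficient is set exactly to zero rather than merely shrunk, which is the sparsity effect described for the Lasso.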
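Polynomial regression is linear regression on the nonlinear features $[1, X, X^2, \dots, X^m]$. The sketch below builds those features and solves the normal equations by Gauss-Jordan elimination; `poly_features`, `fit_least_squares`, and the data are hypothetical names and values for illustration, and the solver assumes $A^T A$ is invertible.

```python
def poly_features(x, degree):
    """Map a scalar x to the feature vector [1, x, x^2, ..., x^degree]."""
    return [x ** k for k in range(degree + 1)]

def fit_least_squares(A, Y):
    """Solve the normal equations (A^T A) beta = A^T Y by Gauss-Jordan elimination."""
    p = len(A[0])
    # Augmented matrix [A^T A | A^T Y].
    M = [[sum(row[i] * row[j] for row in A) for j in range(p)]
         + [sum(row[i] * y for row, y in zip(A, Y))]
         for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]  # assumes a nonzero pivot
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][p] for i in range(p)]

def predict(beta, x):
    return sum(b * f for b, f in zip(beta, poly_features(x, len(beta) - 1)))

def training_error(beta, xs, ys):
    return sum((predict(beta, x) - y) ** 2 for x, y in zip(xs, ys))

# Data generated from y = 1 - x + 0.5 x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [5.0, 2.5, 1.0, 0.5, 1.0]

beta_quad = fit_least_squares([poly_features(x, 2) for x in xs], ys)  # ~[1, -1, 0.5]
beta_line = fit_least_squares([poly_features(x, 1) for x in xs], ys)
```

The quadratic fit recovers the generating coefficients, while the degree-1 fit leaves a larger training error because a straight line cannot represent the curvature.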

