CMU CS 10701 - Lecture 6

Linear Regression
Aarti Singh
Machine Learning 10-701/15-781
Sept 27, 2010

Discrete to Continuous Labels
- Classification: X = Document, Y = Topic (Sports / Science / News); X = Cell Image, Y = Diagnosis (anemic cell / healthy cell).
- Regression: Stock Market Prediction, e.g. X = Feb 01, Y = ? (a continuous value).

Regression Tasks
- Weather Prediction: X = 7 pm, Y = Temp.
- Estimating Contamination: X = new location, Y = sensor reading.

Supervised Learning
- Classification: goal is to minimize the probability of error, P(f(X) ≠ Y).
- Regression: goal is to minimize the Mean Squared Error, E[(f(X) - Y)^2].
- Optimal predictor: f* = arg min_f E[(f(X) - Y)^2].

Regression: Optimal Predictor
- The optimal predictor under squared loss is the conditional mean, f*(x) = E[Y | X = x].
- Intuition: signal plus (zero-mean) noise model, Y = f*(X) + ε with E[ε | X] = 0.

Proof Strategy
- Dropping subscripts for notational convenience, for any predictor f:
  E[(f(X) - Y)^2] = E[(f(X) - E[Y|X])^2] + E[(E[Y|X] - Y)^2],
  since the cross term vanishes after conditioning on X. The first term is ≥ 0 and equals zero when f(X) = E[Y|X], so the conditional mean is the optimal predictor.

Regression
- The optimal predictor E[Y | X = x] depends on the unknown distribution of (X, Y), so it cannot be computed directly; it must be estimated from data.
- Intuition: signal plus (zero-mean) noise model (conditional mean).

Regression Algorithms
A learning algorithm maps training data to a predictor. Examples:
- Linear Regression
- Lasso, Ridge Regression (regularized linear regression)
- Nonlinear Regression
- Kernel Regression
- Regression Trees, Splines, Wavelet estimators, ...

Empirical Risk Minimization (ERM)
- Optimal predictor: f* = arg min_f E[(f(X) - Y)^2].
- Empirical Risk Minimizer: f̂ = arg min_{f in F} (1/n) Σ_{i=1}^n (f(X_i) - Y_i)^2, where F is a class of predictors and the empirical mean replaces the expectation.
- By the Law of Large Numbers, the empirical risk converges to the true risk. More later...

ERM – you saw it before!
- Learning distributions: maximum likelihood = minimizing the negative log likelihood, which is itself an empirical risk.
- What is the class F? A class of parametric distributions, e.g. Bernoulli(θ) or Gaussian(μ, σ^2).

Linear Regression
- Class of linear functions.
- Univariate case: f_β(X) = β_1 + β_2 X, where β_1 is the intercept and β_2 is the slope.
- Multivariate case: f_β(X) = Xβ, where the first feature is the constant 1 so that β_1 absorbs the intercept.

Least Squares Estimator
- β̂ = arg min_β Σ_{i=1}^n (Y_i - X_i β)^2 = arg min_β (Y - Xβ)^T (Y - Xβ), where X is the n × p input matrix and Y the n × 1 output vector.
- Setting the gradient with respect to β to zero gives X^T (Y - Xβ) = 0.

Normal Equations
- X^T X β = X^T Y   (dimensions: p × p, p × 1, p × 1).
- If X^T X is invertible, β̂ = (X^T X)^{-1} X^T Y.
- When is X^T X invertible? Recall: full rank matrices are invertible. What is the rank of X^T X?
- What if X^T X is not invertible? Regularization (later).

Geometric Interpretation
- Difference in prediction on the training set: Y - Xβ̂.
- Xβ̂ is the orthogonal projection of Y onto the linear subspace spanned by the columns of X, so X^T (Y - Xβ̂) = 0.

Revisiting Gradient Descent
- Even when X^T X is invertible, computing (X^T X)^{-1} might be computationally expensive if the matrix is huge.
- Initialize β^0; update β^{t+1} = β^t - α ∇J(β^t), where ∇J(β) = -2 X^T (Y - Xβ); the gradient is 0 if β^t equals the least squares solution.
- Stop when some criterion is met, e.g. a fixed number of iterations, or ||∇J(β^t)|| < ε.
- Gradient descent converges to the global minimum since J(β) is convex.

Effect of step-size α
- Large α: fast convergence but larger residual error; oscillations are also possible.
- Small α: slow convergence but small residual error.

Least Squares and MLE
- Intuition: signal plus (zero-mean Gaussian) noise model, Y = Xβ + ε with ε ~ N(0, σ^2 I).
- The log likelihood of β is then, up to constants, -(1/2σ^2) ||Y - Xβ||^2, so the Least Squares Estimate is the same as the Maximum Likelihood Estimate under a Gaussian model!

Regularized Least Squares and MAP
- What if X^T X is not invertible?
- MAP estimate: maximize log likelihood + log prior.
- I) Gaussian prior: a prior belief that β is Gaussian with zero mean biases the solution toward "small" β. This gives Ridge Regression:
  β̂_ridge = arg min_β ||Y - Xβ||^2 + λ ||β||_2^2.
  Closed form: HW.
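The slides defer the derivations and the ridge closed form to the homework. Purely as an illustrative sketch, not the course's reference code (the function names, step size, and synthetic data below are my own), the normal-equation solution, the gradient descent loop, and the ridge closed form from the slides above can be written in a few lines of NumPy:

import numpy as np

def least_squares_normal_eq(X, Y):
    # Solve the normal equations X^T X beta = X^T Y (assumes X^T X is invertible).
    return np.linalg.solve(X.T @ X, X.T @ Y)

def ridge_closed_form(X, Y, lam):
    # Ridge regression: solve (X^T X + lam * I) beta = X^T Y.
    # The regularized matrix is invertible for any lam > 0, even if X^T X is not.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

def least_squares_gd(X, Y, alpha=1e-3, n_iters=5000, eps=1e-8):
    # Gradient descent on J(beta) = ||Y - X beta||^2, gradient -2 X^T (Y - X beta).
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = -2.0 * X.T @ (Y - X @ beta)
        if np.linalg.norm(grad) < eps:   # stopping criterion from the slides
            break
        beta = beta - alpha * grad
    return beta

# Synthetic "signal plus zero-mean noise" data (made up for illustration).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # first column = intercept
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + 0.1 * rng.normal(size=n)

print(least_squares_normal_eq(X, Y))   # all three estimates should be close to beta_true
print(least_squares_gd(X, Y))
print(ridge_closed_form(X, Y, lam=1.0))

The ridge variant makes exactly the point from the slide: adding λI keeps the linear system solvable even when X^T X is rank-deficient, and with small λ on a well-conditioned example all three estimates nearly coincide.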
Regularized Least Squares and MAP (continued)
- What if X^T X is not invertible?
- MAP estimate: maximize log likelihood + log prior.
- II) Laplace prior: a prior belief that β is Laplace with zero mean also biases the solution toward "small" β. This gives the Lasso:
  β̂_lasso = arg min_β ||Y - Xβ||^2 + λ ||β||_1.

Ridge Regression vs Lasso
- Ridge Regression: ||Y - Xβ||^2 + λ ||β||_2^2.  Lasso: ||Y - Xβ||^2 + λ ||β||_1.
- Lasso (l1 penalty) results in sparse solutions, i.e. vectors with more zero coordinates. Good for high-dimensional problems: you don't have to store all the coordinates!
- Picture (in the (β_1, β_2) plane): the level sets of J(β), i.e. βs with constant J(β), meet the set of βs with constant l1 norm (a diamond) at its corners, unlike the set of βs with constant l2 norm (a circle); the corners are what produce sparse solutions.
- Ideally one would use an l0 penalty (βs with constant l0 norm), but the optimization becomes non-convex. Lasso is a HOT topic!

Beyond Linear Regression
- Polynomial regression
- Regression with nonlinear features / basis functions
- Kernel regression: local / weighted regression
- Regression trees: spatially adaptive regression

Polynomial Regression
- Univariate (1-d) case: f_β(X) = β_1 + β_2 X + β_3 X^2 + ... + β_m X^{m-1}, i.e. linear regression on the nonlinear features 1, X, X^2, ..., X^{m-1}, with β giving the weight of each feature. (A short sketch of this reduction appears at the end of these notes.)
- Interactive least squares demo: http://mste.illinois.edu/users/exner/java.f/leastsquares/

Nonlinear Regression
- Nonlinear features / basis functions with corresponding basis coefficients.
- Fourier basis: a good representation for oscillatory functions.
- Wavelet basis: a good representation for functions localized at multiple scales.

Local Regression
- Nonlinear features / basis functions with corresponding basis coefficients.
- Globally supported basis functions (polynomial, Fourier) will not yield a good representation of local behavior; this motivates local regression.

What you should know
- Linear Regression: Least Squares Estimator, Normal Equations, Gradient Descent, geometric and probabilistic interpretation (connection to MLE).
- Regularized Linear Regression (connection to MAP): Ridge Regression, Lasso.
- Polynomial Regression, basis (Fourier, wavelet) estimators.

Next time
- Kernel Regression (localized regression)
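To close, here is the small sketch promised under Polynomial Regression above: a minimal NumPy illustration (the function names and synthetic data are mine, not from the lecture) that polynomial regression is just least squares on the expanded nonlinear features 1, X, X^2, ..., X^d:

import numpy as np

def poly_features(x, degree):
    # Map a 1-d input x to the nonlinear features [1, x, x^2, ..., x^degree].
    return np.column_stack([x ** k for k in range(degree + 1)])

def fit_polynomial(x, y, degree):
    # Ordinary least squares on the expanded design matrix.
    # lstsq returns a minimum-norm solution even if the matrix is rank-deficient.
    Phi = poly_features(x, degree)
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return beta

# Synthetic data: a smooth signal plus zero-mean noise (made up for illustration).
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 100)
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=x.size)

beta = fit_polynomial(x, y, degree=5)
y_hat = poly_features(x, 5) @ beta
print("basis coefficients:", np.round(beta, 3))
print("training MSE:", np.mean((y - y_hat) ** 2))

Swapping poly_features for a Fourier or wavelet feature map would change only the basis, not the fitting step, which is the point of the Nonlinear Regression slides.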

