ICS 278: Data Mining
Lectures 7, 8: Regression Algorithms
Padhraic Smyth
Department of Information and Computer Science
University of California, Irvine

Outline: Notation; Example: Multivariate Linear Regression; Comments on Multivariate Linear Regression; Limitations of Linear Regression; Finding the k best variables; Search Problem; Empirical Learning; Complexity versus Goodness of Fit; Complexity and Generalization; Defining what "best" means; Using Validation Data; 2 different (but related) issues here; Non-linear models, linear in parameters; Non-linear (both model and parameters); Optimization of a non-linear score function; Other non-linear models; Time-series prediction as regression; Probabilistic Interpretation of Regression; Other aspects of regression; Classification as a special case of Regression; Predicting an output between 0 and 1; Logistic Regression; Generalized Linear Models (GLMs) (McCullagh and Nelder, 1989); Tree-Structured Regression; More on regression trees; Model Averaging; Components of Data Mining Algorithms; Software; Suggested Reading in Text; Useful References

Notation
• Variables X, Y, …
with values x, y (lower case)
• Vectors indicated by boldface X
• Components of X indicated by Xj, with values xj
• "Matrix" data set D with n rows and p columns
  – jth column contains values for variable Xj
  – ith row contains a vector of measurements on object i, indicated by x(i)
  – The jth measurement value for the ith object is xj(i)
• Unknown parameter for a model = θ
  – Can also use other Greek letters, like α
  – Vector of parameters = θ (boldface)

Example: Multivariate Linear Regression
• Task: predict real-valued Y, given real-valued vector X
• Score function, e.g., S(θ) = Σi [y(i) – f(x(i); θ)]²
• Model structure: f(x; θ) = θ0 + Σj θj xj
• Model parameters: θ = {θ0, θ1, …, θp}

In matrix form, with parameter vector a and residual vector e = y – Xa:
  S = Σ e² = e'e = (y – Xa)'(y – Xa)
    = y'y – a'X'y – y'Xa + a'X'Xa
    = y'y – 2a'X'y + a'X'Xa
Taking the derivative of S with respect to the components of a gives
  dS/da = –2X'y + 2X'Xa
Set this to 0 to find the extremum (minimum) of S as a function of a:
  –2X'y + 2X'Xa = 0  =>  X'Xa = X'y
Letting X'X = C and X'y = b, we have Ca = b, i.e., a set of linear equations.
We could solve this directly by matrix inversion, i.e., a = C⁻¹b = (X'X)⁻¹X'y …
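As a concrete sketch of this solve, the normal equations Ca = b can be handled with NumPy; `np.linalg.solve` avoids forming the explicit inverse, and `np.linalg.lstsq` works on X directly. The data here is synthetic and the variable names are illustrative, not from the slides:

```python
import numpy as np

# Illustrative data: n objects, p predictor variables, plus an intercept column
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
true_a = np.array([2.0, 1.0, -0.5, 3.0])
y = X @ true_a + 0.1 * rng.normal(size=n)

# Normal equations: C a = b, with C = X'X and b = X'y
C = X.T @ X
b = X.T @ y
a_hat = np.linalg.solve(C, b)          # preferable to np.linalg.inv(C) @ b

# Equivalent, and more stable still: a least-squares solver applied to X itself
a_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(a_hat, a_lstsq))     # the two solutions agree
```

Solving Ca = b directly costs O(p³), consistent with the complexity comment below; the `lstsq` route factorizes X instead of forming C, which matters when C is ill-conditioned.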
but there are more numerically stable ways to do this (e.g., LU decomposition).

Comments on Multivariate Linear Regression
• Prediction is a linear function of the parameters
• Score function: quadratic in predictions and parameters
  – Derivative of score is linear in the parameters
  – Leads to a linear algebra optimization problem, i.e., Ca = b
• Model structure is simple
  – (p–1)-dimensional hyperplane in p dimensions
  – Linear weights => interpretability
• Useful as a baseline model to compare more complex models to

Limitations of Linear Regression
• True relationship of X and Y might be non-linear
  – Suggests generalizations to non-linear models
• Complexity: O(p³), which could be a problem for large p
• Correlation/collinearity among the X variables
  – Can cause numerical instability (C may be ill-conditioned)
  – Problems in interpretability (identifiability)
• Includes all p variables in the model
  – But what if p = 100 and only 3 variables are related to Y?

Finding the k best variables
• Find the subset of k variables that predicts best
  – This is a generic problem when p is large (it arises with all types of models, not just linear regression)
• Now we have models with different complexity
  – E.g., p models with a single variable
  – p(p–1)/2 models with 2 variables, etc.
  – 2^p possible models in total
  – Note that when we add or delete a variable, the optimal weights on the other variables will change in general
• The best subset of k variables is not the same as the best k individual variables
• What does "best" mean here?
  – Return to this later

Search Problem
• How can we search over all 2^p possible models?
  – Exhaustive search is clearly infeasible
  – Heuristic search is used to search over model space:
    • Forward search (greedy)
    • Backward search (greedy)
    • Generalizations (add or
delete)
      – Think of these as operators in a search space
    • Branch and bound techniques
• This type of variable selection problem is common to many data mining algorithms
  – Outer loop that searches over variable combinations
  – Inner loop that evaluates each combination

Empirical Learning
• Squared error score (as an example; we could use other scores):
  S(θ) = Σi [y(i) – f(x(i); θ)]²
  where S(θ) is defined on the training data D
• We are really interested in finding the f(x; θ) that best predicts y on future data, i.e., minimizing
  E[S] = E[y – f(x; θ)]²
• Empirical learning
  – Minimize S(θ) on the training data Dtrain
  – If Dtrain is large and the model is simple, we are assuming that the best f on the training data is also the best predictor f on future test data Dtest

Complexity versus Goodness of Fit
[Figures: scatter plots of training data (x vs. y) with fitted curves of increasing complexity; the first fit is labeled "Too simple?", the next "Too …]
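The outer-loop/inner-loop structure above can be sketched as greedy forward search, scoring each candidate variable subset by its squared error on held-out data (the slides return to validation data later; the data set and helper names here are illustrative assumptions):

```python
import numpy as np

def fit_lr(X, y):
    """Least-squares fit with an intercept; returns the weight vector."""
    Xa = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return w

def sse(w, X, y):
    """Squared-error score S = sum_i [y(i) - f(x(i))]^2."""
    Xa = np.column_stack([np.ones(len(X)), X])
    return float(np.sum((y - Xa @ w) ** 2))

def forward_select(Xtr, ytr, Xval, yval, k):
    """Greedy forward search: outer loop adds one variable at a time;
    the inner loop refits and scores every candidate addition."""
    chosen, remaining = [], list(range(Xtr.shape[1]))
    for _ in range(k):
        scores = {j: sse(fit_lr(Xtr[:, chosen + [j]], ytr),
                         Xval[:, chosen + [j]], yval)
                  for j in remaining}
        best = min(scores, key=scores.get)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Illustrative data: p = 10 variables, only 3 of which are related to y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] - 3 * X[:, 4] + X[:, 7] + 0.1 * rng.normal(size=200)
Xtr, Xval, ytr, yval = X[:150], X[150:], y[:150], y[150:]

print(sorted(forward_select(Xtr, ytr, Xval, yval, k=3)))
```

Note that each inner-loop step refits all the weights, reflecting the point above that optimal weights change when a variable is added; the sketch explores only O(kp) of the 2^p subsets, so greedy search can miss the globally best subset.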

