BGSU STAT 4440 - Multiple Linear Regression - 1 slide per page (1)


Multiple Linear Regression
Dr. İbrahim Çapar, Assistant Professor
DATA MINING

Learning Objectives
- Multiple linear regression
- Explanatory vs. predictive modeling with regression
- Assessing predictive accuracy
- Selecting a subset of predictors

Explanatory Modeling
- Goal: explain the relationship between the predictors (explanatory variables) and the target.
- This is the familiar use of regression in data analysis.
- Model goal: fit the data well and understand the contribution of the explanatory variables to the model.
- “Goodness of fit” is judged by $R^2$, residual analysis, and p-values.

Predictive Modeling
- Goal: predict target values in other data where we have predictor values but not target values.
- This is the classic data mining context.
- Model goal: optimize predictive accuracy.
  - Train the model on the training data.
  - Assess its performance on the validation (hold-out) data.
- Explaining the role of the predictors is not the primary purpose (but it is useful).

An Example: Toyota Corolla
- Price: sale price in euros
- Age_08_04: age in months as of 8/04
- KM: odometer reading in kilometers
- Fuel_Type: diesel, petrol, or compressed natural gas (CNG)
- HP: horsepower
- Met_color: metallic color (1 = yes, 0 = no)
- Automatic: automatic transmission (1 = yes, 0 = no)
- CC: cylinder volume
- Doors: number of doors
- Quarterly_Tax: road tax amount
- Weight: weight of the car in kg

Dummy (Indicator) Variables
- A dummy (indicator) variable indicates whether a characteristic is present or not:
  - D = 1 if the observation has the attribute;
  - D = 0 if the observation does not have it.
- In general, a categorical variable with m levels is represented by m - 1 dummy (indicator) variables.

Example: Toyota Corolla Fuel_Type (Diesel, Petrol, or CNG)

Fuel_Type   Fuel_Type_Diesel   Fuel_Type_Petrol
Diesel      1                  0
Petrol      0                  1
CNG         0                  0

CNG is the reference level: it is identified by both dummies being 0.

Result: Toyota Corolla
Explanatory modeling result:
- $R^2$ = 88.3%
- Adjusted $R^2$ = 88.1%
- Selected coefficients: Age_08_04: -120.98; Fuel_Type_Diesel: 2700.58

Predictive modeling result:

Metric    Training     Validation
MSE       1,612,793    5,567,692
$R^2$     86.72%       58.84%

Selecting Subsets of Predictors
- Goal: find a parsimonious model (the simplest model that performs sufficiently well), which is:
  - more robust;
  - higher in predictive accuracy.
- Approaches:
  - exhaustive search;
  - partial search algorithms: forward selection, backward elimination, and stepwise.

Exhaustive Search
- All possible subsets of predictors are assessed (single predictors, pairs, triplets, etc.).
- Computationally intensive: p predictors yield $2^p - 1$ nonempty subsets.
- Subsets are judged by adjusted $R^2$:

  $$R^2_{adj} = 1 - \frac{n-1}{n-p-1}\left(1 - R^2\right)$$

  where n is the number of records and p is the number of predictors.

Forward Selection
- Start with no predictors.
- Add predictors one by one, each time adding the one with the largest contribution.
- Stop when the next addition is not statistically significant.

Backward Elimination
- Start with all predictors.
- Successively eliminate the least useful predictor, one at a time.
- Stop when all remaining predictors make statistically significant contributions.

Stepwise
- Like forward selection, except that at each step it also considers dropping predictors that are not statistically significant.

Python Implementation
- The term ‘feature’ is commonly used instead of ‘predictor’ or ‘independent variable’.
- Scikit-learn (one of the most widely used machine learning packages) does not support the stepwise selection method.
- Instead of ‘training and validation datasets’, scikit-learn uses ‘training and test datasets’. Note that, from our point of view, the test dataset is identical to the validation dataset.
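To make the m - 1 rule concrete, here is a minimal sketch in plain Python. It assumes the categorical values arrive as a list of strings and encodes the three Fuel_Type levels from the Toyota Corolla example with two dummy columns, treating CNG as the reference level. The helper name `dummy_code` is hypothetical and not part of any library.

```python
def dummy_code(values, levels):
    """Encode a categorical variable with m levels as m - 1 dummy (0/1) columns.

    The LAST entry of `levels` is treated as the reference level: it gets a 0
    in every dummy column, so it needs no column of its own.
    """
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"unexpected levels: {sorted(unknown)}")
    dummy_levels = levels[:-1]  # m - 1 columns
    return [[1 if v == level else 0 for level in dummy_levels] for v in values]


# Fuel_Type with CNG as the reference (hypothetical helper, illustrative input).
fuel_levels = ["Diesel", "Petrol", "CNG"]
rows = dummy_code(["Diesel", "Petrol", "CNG"], fuel_levels)
for level, row in zip(fuel_levels, rows):
    print(level, row)
# Diesel [1, 0]
# Petrol [0, 1]
# CNG [0, 0]
```

With a pandas DataFrame, `pd.get_dummies(df, columns=["Fuel_Type"], drop_first=True)` should produce the same two columns here (named `Fuel_Type_Diesel` and `Fuel_Type_Petrol`). This happens because pandas sorts the levels alphabetically and `drop_first` drops the first one, CNG.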
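The explanatory use of regression fits the model to the data and then inspects the coefficients and the goodness of fit. The NumPy sketch below does this with ordinary least squares and computes $R^2$ and adjusted $R^2$ using the formula from the Exhaustive Search slide. The data are synthetic (made up for illustration, not the Toyota Corolla data), and the function names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares for y = b0 + X @ b; returns (b0, b)."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[0], coef[1:]


def r2_and_adjusted(X, y, b0, b):
    """Return R^2 and adjusted R^2 = 1 - (n - 1)/(n - p - 1) * (1 - R^2)."""
    n, p = X.shape
    residuals = y - (b0 + X @ b)
    sse = residuals @ residuals            # sum of squared errors
    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
    r2 = 1 - sse / sst
    r2_adj = 1 - (n - 1) / (n - p - 1) * (1 - r2)
    return r2, r2_adj


# Synthetic data: 3 predictors, the third has no effect on y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 5 + X @ np.array([2.0, -1.0, 0.0]) + rng.normal(scale=0.5, size=200)

b0, b = fit_ols(X, y)
r2, r2_adj = r2_and_adjusted(X, y, b0, b)
print("intercept:", round(b0, 2), "coefficients:", np.round(b, 2))
print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```

This sketch does not compute p-values. For a full explanatory summary, statsmodels' `sm.OLS(y, sm.add_constant(X)).fit()` reports coefficients, $R^2$, adjusted $R^2$, and p-values.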
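The predictive use instead fits the model on training data only and judges it on hold-out data, as in the Training/Validation MSE table. The sketch below assumes a random 60/40 split and synthetic data; both choices are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Synthetic data (hypothetical), standing in for a real dataset.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 4))
y = 10 + X @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=1.0, size=n)

# Assumed 60/40 random partition into training and validation (hold-out) rows.
idx = rng.permutation(n)
n_train = (6 * n) // 10
train, valid = idx[:n_train], idx[n_train:]


def with_intercept(A):
    """Prepend a column of ones so the fit includes an intercept."""
    return np.column_stack([np.ones(len(A)), A])


# Fit on the training rows only.
coef, *_ = np.linalg.lstsq(with_intercept(X[train]), y[train], rcond=None)


def mse(rows):
    """Mean squared error of the training-fitted model on the given rows."""
    residuals = y[rows] - with_intercept(X[rows]) @ coef
    return float(np.mean(residuals ** 2))


train_mse, valid_mse = mse(train), mse(valid)
print(f"training MSE = {train_mse:.3f}, validation MSE = {valid_mse:.3f}")
```

Only the validation MSE measures performance on records the model has not seen. The training MSE tends to be optimistic.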
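Exhaustive search can be sketched by scoring every nonempty subset of predictors by adjusted $R^2$. With p predictors there are $2^p - 1$ subsets, which is why the method becomes computationally intensive. The data, column names (`x1` to `x5`), and helper names below are hypothetical.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(X, y):
    """Fit OLS with an intercept and return adjusted R^2."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    sse = np.sum((y - X1 @ coef) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - (n - 1) / (n - p - 1) * (sse / sst)  # sse/sst = 1 - R^2


def exhaustive_search(X, y, names):
    """Score every nonempty subset of columns; return (score, names) best-first."""
    results = []
    for k in range(1, X.shape[1] + 1):
        for subset in combinations(range(X.shape[1]), k):
            score = adjusted_r2(X[:, list(subset)], y)
            results.append((score, [names[j] for j in subset]))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Synthetic data: only x1 and x2 affect y.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 5))
y = 1 + 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=150)
names = ["x1", "x2", "x3", "x4", "x5"]

results = exhaustive_search(X, y, names)
print("subsets evaluated:", len(results))  # 2**5 - 1 = 31
for score, subset in results[:3]:
    print(f"{score:.4f}  {subset}")
```

Adjusted $R^2$ can rise slightly when an irrelevant predictor is added by chance, so the top subset may include an extra noise column. Near-ties among the top few subsets are worth inspecting.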
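Forward selection can be sketched as follows. “Statistically significant” is operationalized here with a partial F statistic compared against a fixed F-to-enter threshold of 4.0. This threshold is an assumption: it is roughly a 5% test for large n, since the critical value of F(1, ∞) is about 3.84. The helpers `sse`, `partial_f`, and `forward_selection` and the synthetic data are hypothetical.

```python
import numpy as np


def sse(X, y, cols):
    """SSE of an OLS fit with an intercept, using the predictor columns in `cols`."""
    X1 = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return float(np.sum((y - X1 @ coef) ** 2))


def partial_f(X, y, cols, j):
    """Partial F statistic for predictor j, given the other predictors in `cols`."""
    sse_full = sse(X, y, cols)
    sse_reduced = sse(X, y, [k for k in cols if k != j])
    df_resid = len(y) - len(cols) - 1  # n minus (slopes + intercept)
    return (sse_reduced - sse_full) / (sse_full / df_resid)


def forward_selection(X, y, f_to_enter=4.0):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        # The candidate with the largest contribution has the largest partial F.
        best = max(remaining, key=lambda j: partial_f(X, y, selected + [j], j))
        if partial_f(X, y, selected + [best], best) < f_to_enter:
            break  # the best addition is not significant: stop
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data: only x0, x3 and x5 affect y.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + X[:, 5] + rng.normal(scale=1.0, size=200)

chosen = forward_selection(X, y)
print("selected (in order of entry):", chosen)
```

For a single added predictor, the partial F equals the square of its t statistic, so this rule matches a t-test on the newly added coefficient.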
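Backward elimination runs in the opposite direction. It starts from all predictors and repeatedly drops the one with the smallest partial F, stopping once every remaining predictor clears an F-to-remove threshold. As in the forward sketch, the 4.0 threshold, the helper names, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def sse(X, y, cols):
    """SSE of an OLS fit with an intercept, using the predictor columns in `cols`."""
    X1 = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return float(np.sum((y - X1 @ coef) ** 2))


def partial_f(X, y, cols, j):
    """Partial F statistic for predictor j, given the other predictors in `cols`."""
    sse_full = sse(X, y, cols)
    sse_reduced = sse(X, y, [k for k in cols if k != j])
    df_resid = len(y) - len(cols) - 1
    return (sse_reduced - sse_full) / (sse_full / df_resid)


def backward_elimination(X, y, f_to_remove=4.0):
    selected = list(range(X.shape[1]))  # start with all predictors
    while selected:
        # The least useful predictor has the smallest partial F.
        weakest = min(selected, key=lambda j: partial_f(X, y, selected, j))
        if partial_f(X, y, selected, weakest) >= f_to_remove:
            break  # every remaining predictor is significant: stop
        selected.remove(weakest)
    return selected


# Synthetic data: only x0, x3 and x5 affect y.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + X[:, 5] + rng.normal(scale=1.0, size=200)

kept = backward_elimination(X, y)
print("remaining predictors:", kept)
```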
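Stepwise selection can be sketched as forward selection with an extra check after each step: any predictor whose partial F has fallen below the F-to-remove threshold is dropped. Setting F-to-remove (3.9) slightly below F-to-enter (4.0) is a common design choice that keeps a predictor from being added and removed in an endless cycle. The `max_steps` guard serves the same purpose. Thresholds, helper names, and data are assumptions.

```python
import numpy as np


def sse(X, y, cols):
    """SSE of an OLS fit with an intercept, using the predictor columns in `cols`."""
    X1 = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return float(np.sum((y - X1 @ coef) ** 2))


def partial_f(X, y, cols, j):
    """Partial F statistic for predictor j, given the other predictors in `cols`."""
    sse_full = sse(X, y, cols)
    sse_reduced = sse(X, y, [k for k in cols if k != j])
    df_resid = len(y) - len(cols) - 1
    return (sse_reduced - sse_full) / (sse_full / df_resid)


def stepwise(X, y, f_to_enter=4.0, f_to_remove=3.9, max_steps=50):
    selected = []
    for _ in range(max_steps):
        changed = False
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        if remaining:  # forward step: add the best candidate if significant
            best = max(remaining, key=lambda j: partial_f(X, y, selected + [j], j))
            if partial_f(X, y, selected + [best], best) >= f_to_enter:
                selected.append(best)
                changed = True
        if selected:  # backward step: drop a predictor that is no longer significant
            weakest = min(selected, key=lambda j: partial_f(X, y, selected, j))
            if partial_f(X, y, selected, weakest) < f_to_remove:
                selected.remove(weakest)
                changed = True
        if not changed:
            break
    return selected


# Synthetic data: only x0, x3 and x5 affect y.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + X[:, 5] + rng.normal(scale=1.0, size=200)

chosen = stepwise(X, y)
print("selected:", chosen)
```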
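In scikit-learn, the closest built-in tool is `sklearn.feature_selection.SequentialFeatureSelector`, which supports `direction="forward"` and `direction="backward"` but not stepwise selection. Its stopping rule differs from the slides: it either selects a fixed number of features or uses cross-validated score improvement, not statistical significance. The sketch below assumes synthetic data, a 40% hold-out, and three features to select; `train_test_split` calls the hold-out part the “test” set, which plays the validation role here.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: only features 0, 3 and 5 affect y.
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + X[:, 5] + rng.normal(scale=1.0, size=300)

# scikit-learn's "test" split is used here as the validation (hold-out) set.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# Forward selection of 3 features, scored by 5-fold cross-validated R^2 on the training set.
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward", cv=5
)
sfs.fit(X_train, y_train)
chosen = np.flatnonzero(sfs.get_support())
print("selected feature indices:", chosen)

# Refit on the selected features and evaluate on the hold-out data.
model = LinearRegression().fit(X_train[:, chosen], y_train)
valid_mse = mean_squared_error(y_valid, model.predict(X_valid[:, chosen]))
print(f"validation MSE = {valid_mse:.3f}")
```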

