Multiple Linear Regression
Dr. İbrahim Çapar, Assistant Professor
DATA MINING

Learning objectives
- Multiple linear regression
- Explanatory vs. predictive modeling with regression
- Assessing predictive accuracy
- Selecting a subset of predictors

Explanatory Modeling
- Goal: explain the relationship between the predictors (explanatory variables) and the target
- The familiar use of regression in data analysis
- Model goal: fit the data well and understand the contribution of the explanatory variables to the model
- "Goodness of fit": R², residual analysis, p-values

Predictive Modeling
- Goal: predict target values in other data where we have predictor values but not target values
- The classic data mining context
- Model goal: optimize predictive accuracy
- Train the model on training data
- Assess performance on validation (hold-out) data
- Explaining the role of the predictors is not the primary purpose (but it is still useful)

An Example: Toyota Corolla
- Price: sales price in euros
- Age_08_04: age in months as of 8/04
- KM: odometer reading in kilometers
- Fuel_Type: diesel, petrol, or compressed natural gas (CNG)
- HP: horsepower
- Met_color: metallic color (1 = yes, 0 = no)
- Automatic: automatic transmission (1 = yes, 0 = no)
- CC: cylinder volume
- Doors: number of doors
- Quarterly_Tax: road tax amount
- Weight: weight of the car in kg

Dummy (Indicator) Variables
- A dummy (indicator) variable indicates whether a characteristic is present or not:
  D = 1 if the observation has the attribute, D = 0 if it does not.
- In general, if a categorical variable has m levels, use m - 1 dummy (indicator) variables.
- Example: Toyota Corolla, Fuel_Type with levels Diesel, Petrol, and CNG:

  Fuel_Type   Fuel_Type_Diesel   Fuel_Type_Petrol
  Diesel      1                  0
  Petrol      0                  1
  CNG         0                  0

Result: Toyota Corolla
- Explanatory modeling result:
  R² = 88.3%, adjusted R² = 88.1%
  Selected coefficients: Age_08_04 = -120.98, Fuel_Type_Diesel = 2700.58
- Predictive modeling result, by dataset:

  Metric   Training     Validation
  MSE      1,612,793    5,567,692
  R²       86.72%       58.84%

Selecting Subsets of Predictors
- Goal: find a parsimonious model (the simplest model that performs sufficiently well)
  - More robust
  - Higher predictive accuracy
- Approaches: exhaustive search; partial search algorithms (forward, backward, stepwise)

Exhaustive Search
- All possible subsets of predictors are assessed (singles, pairs, triplets, etc.)
- Computationally intensive
- Judge subsets by adjusted R²:
  R²_adj = 1 - ((n - 1) / (n - p - 1)) * (1 - R²),
  where n is the number of observations and p is the number of predictors.

Forward Selection
- Start with no predictors
- Add predictors one by one (at each step, add the one with the largest contribution)
- Stop when the next addition is not statistically significant

Backward Elimination
- Start with all predictors
- Successively eliminate the least useful predictor, one at a time
- Stop when all remaining predictors have statistically significant contributions

Stepwise
- Like forward selection, except at each step also consider dropping predictors that are no longer significant

Python Implementation
- The term "feature" is commonly used instead of predictor or independent variable.
- Scikit-learn (one of the most widely used machine learning packages) does not support the stepwise selection method.
- Instead of "train and validation" datasets, scikit-learn uses the terms "train and test" datasets. From our point of view, the test dataset is identical to the validation dataset.
- Illustrative code sketches for these steps follow below.
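The sketches below are illustrative only; the data frames, file names, and parameter values are assumptions based on the variable list above, not code from the lecture. First, a minimal sketch of producing m - 1 dummy variables for Fuel_Type with pandas, where drop_first makes CNG the reference level, matching the table above:

```python
import pandas as pd

# Hypothetical toy data containing the three Fuel_Type levels from the example
df = pd.DataFrame({"Fuel_Type": ["Diesel", "Petrol", "CNG", "Petrol"]})

# drop_first=True keeps m - 1 = 2 dummies; the dropped level (CNG) is the reference
dummies = pd.get_dummies(df["Fuel_Type"], prefix="Fuel_Type", drop_first=True)
print(dummies)  # columns: Fuel_Type_Diesel, Fuel_Type_Petrol
```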
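Next, a sketch of the predictive-modeling workflow: split the data into training and validation sets, fit a linear regression, and report MSE and R² on both. The file name ToyotaCorolla.csv, the split proportion, and the random seed are assumptions, so the exact figures in the table above will generally not be reproduced:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical file; column names follow the slide and may differ in the actual file
df = pd.read_csv("ToyotaCorolla.csv")
predictors = ["Age_08_04", "KM", "Fuel_Type", "HP", "Met_color",
              "Automatic", "CC", "Doors", "Quarterly_Tax", "Weight"]
X = pd.get_dummies(df[predictors], drop_first=True)  # m - 1 dummies for Fuel_Type
y = df["Price"]

# Scikit-learn calls the hold-out set "test"; here it plays the role of validation
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.4, random_state=1)

model = LinearRegression().fit(X_train, y_train)

for label, X_part, y_part in [("Training", X_train, y_train),
                              ("Validation", X_valid, y_valid)]:
    pred = model.predict(X_part)
    print(f"{label}: MSE = {mean_squared_error(y_part, pred):,.0f}, "
          f"R2 = {r2_score(y_part, pred):.4f}")
```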
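The adjusted R² formula used to judge subsets in exhaustive search can be written as a small helper; the example values passed in below are hypothetical:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared: 1 - (n - 1) / (n - p - 1) * (1 - R^2),
    with n observations and p predictors."""
    return 1 - (n - 1) / (n - p - 1) * (1 - r2)

# Hypothetical example: R^2 of 0.88 with 1,000 observations and 11 predictors
print(adjusted_r2(0.88, n=1000, p=11))
```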
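Finally, although scikit-learn has no stepwise procedure, its SequentialFeatureSelector provides forward and backward search driven by a cross-validated score rather than the p-value stopping rules described above, so it is a related but not identical procedure. A sketch, continuing with X_train and y_train from the split above and an assumed target of 5 features:

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Greedy forward search: at each step, add the feature that most improves the CV score;
# direction="backward" would instead start from all features and remove them one by one
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=5,
                                direction="forward", cv=5)
sfs.fit(X_train, y_train)
print("Selected features:", list(X_train.columns[sfs.get_support()]))
```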