BGSU STAT 4440 - Multiple Linear Regression - 1 slide per page (1) (22 pages)

Pages:
22
School:
Bowling Green State University - Main Campus
Course:
Stat 4440 - Data Mining in Business Analytics


DATA MINING
Multiple Linear Regression
Dr. İbrahim Çapar, Assistant Professor

Learning objective: Multiple Linear Regression
- Explanatory vs. predictive modeling with regression
- Assessing predictive accuracy
- Selecting a subset of predictors

Explanatory Modeling
- Goal: explain the relationship between the predictors (explanatory variables) and the target
- Familiar use of regression in data analysis
- Model goal: fit the data well and understand the contribution of the explanatory variables to the model
- Goodness of fit, residual analysis, p-values

Predictive Modeling
- Goal: predict target values in other data where we have predictor values but not target values
- Classic data mining context
- Model goal: optimize predictive accuracy
- Train the model on training data
- Assess performance on validation (hold-out) data
- Explaining the role of the predictors is not the primary purpose, but it is useful

An Example: Toyota Corolla

Variable        Description
Price           Sales price in Euros
Age_08_04       Age in months as of 8/04
KM              Odometer reading in kilometers
Fuel_Type       Diesel, petrol, or compressed natural gas (CNG)
HP              Horsepower
Met_Color       Metallic color? (1 = yes, 0 = no)
Automatic       Automatic transmission? (1 = yes, 0 = no)
CC              Cylinder volume
Doors           Number of doors
Quarterly_Tax   Road tax amount
Weight          Weight of the car in kg

Dummy (Indicator) Variables
- A dummy (indicator) variable indicates whether a characteristic is present or not
- D = 1 if the observation has the attribute
- D = 0 if the observation does not have it
- In general, if there are m levels, use m - 1 dummy (indicator) variables

An Example: Toyota Corolla
Fuel_Type: Diesel, Petrol, or CNG

Fuel_Type   D1   D2
Diesel      1    0
Petrol      0    1
CNG         0    0

Result: Toyota Corolla

Explanatory Modeling Result
88 3   88 1
Age_08_04          120 98
Fuel_Type_Diesel   2700 58

Predictive Modeling Result
Metric: MSE; datasets: Training, Validation
1 612 793 5   567 692 86   72 58 84

Selecting Subsets of Predictors
- Goal: find a parsimonious model (the simplest model that performs sufficiently well)
- More robust
- Higher predictive accuracy
- Exhaustive search
- Partial search algorithms: forward, backward, stepwise

Exhaustive Search
- All possible subsets of predictors are assessed (single, pairs, triplets, etc.)
- Computationally intensive
- Judge by adjusted R^2: $R^2_{adj} = 1 - \frac{n-1}{n-p-1}\,(1 - R^2)$, where n is the number of observations and p the number of predictors

Forward Selection
- Start with no predictors
- Add them one by one (add the one with the largest contribution)
- Stop when the addition is not statistically significant

Backward Elimination
- Start with all predictors
- Successively eliminate the least useful predictors one by one
- Stop when all remaining predictors have a statistically significant contribution

Stepwise
- Like forward selection
- Except that at each step, also consider dropping nonsignificant predictors

Python Implementation
- The term "feature" is commonly used instead of predictors or independent variables
- Scikit-learn, one of the most widely used machine learning packages, does not support the stepwise selection method
- Instead of train and validation
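The m - 1 rule for dummy variables can be illustrated with a small pure-Python sketch. The helper name `dummy_code`, the `prefix` argument, and the choice of CNG as the baseline level are assumptions made for this illustration; they are not taken from the slides.

```python
def dummy_code(values, baseline, prefix="Fuel_Type"):
    """Encode a categorical column with m levels as m - 1 dummy variables.

    The baseline level gets no column of its own: it is represented by
    all dummies being 0. (Hypothetical helper, for illustration only.)
    """
    levels = sorted(set(values) - {baseline})
    return [
        {f"{prefix}_{level}": int(value == level) for level in levels}
        for value in values
    ]


# Three fuel types (m = 3) give two dummy columns (m - 1 = 2)
rows = dummy_code(["Diesel", "Petrol", "CNG"], baseline="CNG")
for fuel, row in zip(["Diesel", "Petrol", "CNG"], rows):
    print(fuel, row)
```

Dropping one level avoids perfect collinearity with the intercept: if all m dummies were included, they would always sum to 1.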

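The predictive workflow described in the slides (train on training data, assess on validation data, compare MSE) might look roughly like the following. This is a minimal sketch using NumPy least squares on synthetic data; the data, the 300/200 split, and all names are made up for illustration and are unrelated to the Toyota Corolla results.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (made up): two predictors, known coefficients, unit-variance noise
n = 500
X = rng.normal(size=(n, 2))
y = 10.0 + 4.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Random partition into training (300 rows) and validation (200 rows)
idx = rng.permutation(n)
train, valid = idx[:300], idx[300:]


def add_intercept(X):
    """Prepend a column of ones so the model has an intercept term."""
    return np.column_stack([np.ones(len(X)), X])


# Fit ordinary least squares on the training partition only
beta, *_ = np.linalg.lstsq(add_intercept(X[train]), y[train], rcond=None)


def mse(X, y, beta):
    """Mean squared error of the fitted model on the given rows."""
    residuals = y - add_intercept(X) @ beta
    return float(np.mean(residuals ** 2))


mse_train = mse(X[train], y[train], beta)
mse_valid = mse(X[valid], y[valid], beta)
print(f"training MSE: {mse_train:.3f}  validation MSE: {mse_valid:.3f}")
```

The validation MSE is the figure of interest for predictive modeling, because the model never saw those rows during fitting.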

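Exhaustive search judged by adjusted R^2 can be sketched as follows. The code assumes ordinary least squares with an intercept, fitted via NumPy; the function names and the synthetic data are illustrative assumptions, not part of the original slides.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)


def r2_of_fit(X, y):
    """R^2 of an ordinary least-squares fit that includes an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def exhaustive_search(X, y, names):
    """Score every non-empty subset of predictors, best adjusted R^2 first."""
    n = len(y)
    results = []
    for k in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), k):
            r2 = r2_of_fit(X[:, list(subset)], y)
            results.append((adjusted_r2(r2, n, k), [names[j] for j in subset]))
    return sorted(results, key=lambda item: item[0], reverse=True)


# Synthetic data (made up): y depends on x0 and x1; x2 is pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

ranking = exhaustive_search(X, y, ["x0", "x1", "x2"])
best_score, best_subset = ranking[0]
print(best_subset, round(best_score, 4))
```

With p candidate predictors there are 2^p - 1 non-empty subsets, which is why the slides call this approach computationally intensive.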

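Forward selection can be sketched in a few lines. Note one deliberate substitution: the slides stop when an addition is not statistically significant, whereas this sketch stops when the gain in adjusted R^2 falls below a threshold, which avoids computing p-values. The threshold `min_gain`, the function names, and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def adj_r2(X, y):
    """Adjusted R^2 of an OLS fit (with intercept) on the columns of X."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)


def forward_selection(X, y, min_gain=0.01):
    """Greedily add the predictor that most improves adjusted R^2.

    Stops when the best available addition improves adjusted R^2 by less
    than min_gain (a stand-in for the significance test in the slides).
    """
    selected = []
    remaining = list(range(X.shape[1]))
    best = 0.0  # adjusted R^2 of the intercept-only model
    while remaining:
        scores = {j: adj_r2(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best


# Synthetic data (made up): only columns 0 and 2 influence y
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

selected, score = forward_selection(X, y)
print("selected columns:", selected, "adjusted R^2:", round(score, 4))
```

Backward elimination follows the same pattern in reverse: start from all columns and repeatedly drop the one whose removal costs the least, stopping when every remaining column still contributes.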