UT SW 388R7 - Logistic Regression – Complete Problems


Outline:

- Logistic Regression – Complete Problems
- Outliers and Influential Cases
- Standardized residuals
- Influential cases
- Strategy for Outliers and Influential Cases
- Split-sample Validation
- 80-20 Cross-validation
- Problem 1
- Dissecting problem 1 - 1
- Dissecting problem 1 - 2
- Dissecting problem 1 - 3
- Dissecting problem 1 - 4
- LEVEL OF MEASUREMENT - 1
- LEVEL OF MEASUREMENT - 2
- Request simultaneous logistic regression
- Selecting the dependent variable
- Selecting the independent variables
- Specifying the method for including variables
- Requesting statistics needed for identifying outliers and influential cases
- Saving statistics needed for identifying outliers and influential cases
- Completing the logistic regression request
- Number of cases including outliers and influential cases
- Classification accuracy for all cases
- The variables for identifying outliers and influential cases
- Omitting the outliers and influential cases
- Specifying the condition to omit outliers
- The formula for omitting outliers
- Completing the request for the selection
- An omitted outlier and influential case
- Running the logistic regression omitting outliers
- Opening the save options dialog
- Clearing the request to save diagnostic data
- Requesting the output
- Classification accuracy after omitting outliers
- SELECTION OF MODEL FOR INTERPRETATION
- Restoring all cases to the data set
- Selecting all cases
- Running the logistic regression again with all cases included
- Completing the request for logistic regression
- Sample size – ratio of cases to variables
- OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES
- NUMERICAL PROBLEMS
- RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1
- RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 2
- RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 3
- CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: by chance accuracy rate
- CLASSIFICATION USING THE LOGISTIC REGRESSION MODEL: criteria for classification accuracy
- Validation analysis: set the random number seed
- Set the random number seed
- Validation analysis: compute the split variable
- The formula for the split variable
- Repeat the regression with validation sample
- Activating the command for subsets of cases
- Using "split" as the selection variable
- Setting the value of split to select cases
- Completing the value selection
- Requesting output for the validation sample
- SPLIT-SAMPLE VALIDATION - 1
- SPLIT-SAMPLE VALIDATION - 2
- SPLIT-SAMPLE VALIDATION - 3
- SPLIT-SAMPLE VALIDATION - 4
- SPLIT-SAMPLE VALIDATION - 5
- Answering the question in problem 1 - 1
- Answering the question in problem 1 - 2
- Steps in binary logistic regression: level of measurement and initial sample size
- Slide 66
- Steps in binary logistic regression: picking model for interpretation
- Steps in logistic regression: overall relationship and numerical problems
- Steps in logistic regression: relationships between IV's and DV
- Steps in logistic regression: classification accuracy and validation
- Steps in logistic regression: validation supports generalizability
- Steps in logistic regression: adding cautions

SW388R7 Data Analysis & Computers II

Slide 1: Logistic Regression – Complete Problems

- Outliers and Influential Cases
- Split-sample Validation
- Sample Problems

Slide 2: Outliers and Influential Cases

Logistic regression models the relationship between a set of independent variables and the probability that a case is a member of one of the categories of the dependent variable. (In SPSS, the modeled category is the one with the higher numeric code.) If the probability is greater than 0.50, the case is classified in the modeled category. If the probability is less than 0.50, the case is classified in the other category.

The actual probability of the modeled event for any case is either 1.0 or 0.0, i.e. a case is in the modeled category or it is not.

The residual is the difference between the actual probability and the predicted probability for a case.
If the predicted probability for a case that actually belonged to the modeled category was 0.80, the residual would be 1.00 – 0.80 = 0.20.

Slide 3: Standardized residuals

The residual can be standardized by dividing it by an estimate of its standard deviation. Since the dependent variable is dichotomous (binary), the standard deviation for proportions is used.

If a case has a standardized residual larger than 3.0 or smaller than -3.0, it is considered an outlier and a candidate for exclusion from the analysis.

Slide 4: Influential cases

Cook's distance is computed by SPSS as a measure of the influence that a case has on the solution. This is the same statistic used as a measure of influence in multiple regression.

However, the criterion for determining that a case is influential in logistic regression differs from the criterion in multiple regression.

In logistic regression, a case is identified as influential if its Cook's distance is greater than 1.0. This is based on a statement in Hosmer and Lemeshow, Applied Logistic Regression, page 180: "In our experience the influence diagnostic must be larger than 1.0 for an individual covariate pattern to have an effect on the estimated coefficients."

Slide 5: Strategy for Outliers and Influential Cases

Our strategy for evaluating the impact of outliers and influential cases on our logistic regression model will parallel what we have done for multiple regression and discriminant analysis:

- First, we run a baseline model including all cases.
- Second, we run a model excluding outliers (whose standardized residual is greater than 3.0 or less than -3.0) and influential cases (whose Cook's distance is greater than 1.0).

If the model excluding outliers and influential cases has a classification accuracy rate that is at least 2% better than that of the baseline model, we will interpret the revised model.
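The two exclusion criteria described above can be illustrated with a minimal sketch. It assumes that "the standard deviation for proportions" means sqrt(p(1 - p)) for a predicted probability p, and it takes Cook's distance as a value already saved by the statistics package rather than computing it. The function names and the three sample cases are hypothetical.

```python
import math

def standardized_residual(actual, predicted):
    """Residual divided by the standard deviation for a proportion.

    Assumption: the standard deviation is sqrt(p * (1 - p)), where p is
    the predicted probability of the modeled category.
    """
    return (actual - predicted) / math.sqrt(predicted * (1.0 - predicted))

def flag_case(actual, predicted, cooks_distance,
              resid_cutoff=3.0, cook_cutoff=1.0):
    """Return (is_outlier, is_influential) using the cutoffs from the slides."""
    z = standardized_residual(actual, predicted)
    return abs(z) > resid_cutoff, cooks_distance > cook_cutoff

# Hypothetical cases: (actual membership, predicted probability, Cook's distance)
cases = [(1, 0.80, 0.02), (1, 0.05, 0.40), (0, 0.30, 1.25)]

# Keep only the cases that are neither outliers nor influential.
kept = [case for case in cases if not any(flag_case(*case))]

for case in cases:
    print(case, round(standardized_residual(case[0], case[1]), 3), flag_case(*case))
```

Under this assumption, a case that belongs to the modeled category is flagged as an outlier only when its predicted probability is below 0.10, since sqrt((1 - p) / p) exceeds 3.0 exactly when p is less than 0.10.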
If the revised model without outliers and influential cases is less than 2% more accurate than the baseline model, we will interpret the baseline model.

Slide 6: Split-sample Validation

SPSS does not calculate a leave-one-out cross-validation, since this would require repeating the entire logistic regression computation for each case in the sample. Moreover, when I computed the split-half validation for all of the logistic regression problems in the homework, everyone failed the validation analysis, primarily for the statistical tests of significance for
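As a rough illustration of the split-sample approach, the sketch below builds a seeded split variable and compares classification accuracy in the two resulting groups. The 0.80 training share follows the "80-20 Cross-validation" slide title; the seed value, function names, and sample data are hypothetical, and the model-fitting step is omitted (the probabilities stand in for output from a model fitted to the training cases only).

```python
import random

def make_split(n_cases, train_share=0.80, seed=12345):
    """Return a 0/1 split variable: 1 = training sample, 0 = validation sample.

    Assumption: each case is assigned independently with probability
    train_share, so the realized split is only approximately 80-20.
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    return [1 if rng.random() <= train_share else 0 for _ in range(n_cases)]

def accuracy(actual, predicted_prob, cutoff=0.5):
    """Share of cases classified correctly, where a probability above the
    cutoff means membership in the modeled category."""
    hits = sum((p > cutoff) == (y == 1) for y, p in zip(actual, predicted_prob))
    return hits / len(actual)

# Hypothetical data: actual membership and predicted probabilities.
actual = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.7, 0.6, 0.8, 0.1, 0.3, 0.55, 0.45]
split = make_split(len(actual))

for label, value in (("training", 1), ("validation", 0)):
    idx = [i for i, s in enumerate(split) if s == value]
    if idx:  # a small sample can leave one group empty
        acc = accuracy([actual[i] for i in idx], [probs[i] for i in idx])
        print(f"{label}: n={len(idx)}, accuracy={acc:.2f}")
```

Fitting the model once on the training cases and scoring the held-out cases avoids the cost noted above for leave-one-out validation, which would refit the model once per case.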

