UI STAT 4510 - Applied Linear Regression Models


SENIC (from Applied Linear Regression Models by John Neter, Michael H. Kutner, William Wasserman, and Christopher J. Nachtsheim)

Data: The data consist of a random sample of 113 hospitals selected from the original 338 hospitals surveyed. Each line of the data set has an identification number and provides information on 11 other variables for a single hospital. The data presented here are for the 1975-76 study period. The 12 variables are:

No. Variable name: Description
 1  Identification number: 1-113
 2  Length of stay: Average length of stay of all patients in hospital (in days)
 3  Age: Average age of patients (in years)
 4  Infection risk: Average estimated probability of acquiring infection in hospital (in percent)
 5  Routine culturing ratio: Ratio of number of cultures performed to number of patients without signs or symptoms of hospital-acquired infection, times 100
 6  Routine chest X-ray ratio: Ratio of number of X-rays performed to number of patients without signs or symptoms of pneumonia, times 100
 7  Number of beds: Average number of beds in hospital during study period
 8  Medical school affiliation: 1 = Yes, 2 = No
 9  Region: Geographic region, where 1 = NE, 2 = NC, 3 = S, 4 = W
10  Average daily census: Average number of patients in hospital per day during study period
11  Number of nurses: Average number of full-time equivalent registered and licensed practical nurses during study period (number full time plus one half the number part time)
12  Available facilities and services: Percent of 35 potential facilities and services that are provided by the hospital

The data are in http://www.stat.uiowa.edu/~stramer/S150/SENIC.MTW. Do not open the file; save it first.

1. Two models have been proposed for predicting the average length of patient stay in a hospital (Y). Model I uses age, infection risk, and available facilities and services as predictor variables. Model II uses number of beds, infection risk, and available facilities and services as predictor variables.
   a. For each of the two proposed models, fit a first-order regression model with three predictor variables.
   b. Calculate R² for each model. Is one model clearly preferable in terms of this measure?
   c. For each model, obtain the residuals. In terms of the residuals, is one model clearly more appropriate than the other?

2. For each geographic region, regress infection risk against the predictor variables age, routine culturing ratio, average daily census, and available facilities and services.
   a. Use a first-order regression model with four predictor variables. State the estimated regression functions.
   b. Are the estimated regression functions similar for the four regressions? Discuss.
   c. Calculate MSE and R² for each region. Are these measures similar for the four regions? Discuss.
   d. Obtain the residuals for each fitted model. State your findings.

3. For predicting the average length of patient stay in a hospital (Y), it has been decided to include age and infection risk as predictor variables.
   a. For each of the following variables, calculate the coefficient of partial determination given that age and infection risk are included in the model: routine culturing ratio, average daily census, number of nurses, and available facilities and services.
   b. Using the F test statistic, test whether or not the variable determined to be best in part (a) is helpful in the regression model when age and infection risk are included in the model; use α = 0.05.

4. Length of stay is to be predicted, and the pool of potential predictor variables includes all other variables in the data set except medical school affiliation and region. It is believed that a model with log Y as the response variable and the predictor variables entered as first-order terms, with no interaction terms, will be appropriate. Consider cases 57-113 to constitute the model-building data set to be used for the following analysis.
   a. Obtain the correlation matrix of the X variables. Is there evidence of strong linear pairwise association among the predictor variables here?
   b. Find the best subset according to the adjusted R² criterion.

Consumer Expenditure and Money Stock

The following table gives quarterly data from 1952 to 1956 on consumer expenditure (Y) and the stock of money (X), both measured in billions of current dollars, for the United States.

Year  Quarter  Expenditure (Y)  Money stock (X)
1952  1        214.6            159.3
1952  2        217.7            161.2
1952  3        219.6            162.8
1952  4        227.2            164.6
1953  1        230.9            165.9
1953  2        233.3            167.9
1953  3        234.1            168.3
1953  4        232.3            169.7
1954  1        233.7            170.5
1954  2        236.5            171.6
1954  3        238.7            173.9
1954  4        243.2            176.1
1955  1        249.4            178.0
1955  2        254.3            179.1
1955  3        260.9            180.2
1955  4        263.3            181.2
1956  1        265.6            181.6
1956  2        268.2            182.5
1956  3        270.4            183.3
1956  4        275.6            184.3

   a. Regress Y on X and summarize the results.
   b. Compute the Durbin-Watson statistic D. What conclusion regarding the presence of autocorrelation would you draw from D?
   c. Apply the first iteration of the Cochrane-Orcutt procedure to the data. What is your conclusion?
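For the SENIC problem comparing Models I and II, here is a minimal sketch in Python/NumPy. It is not the course's prescribed method, since the data file is a Minitab worksheet. The sketch assumes the SENIC columns have already been exported to numeric arrays, and the function name `fit_first_order` is hypothetical. It fits a first-order model by ordinary least squares and returns the coefficients, the residuals, and R² = 1 - SSE/SSTO.

```python
import numpy as np

def fit_first_order(X, y):
    """Least-squares fit of y = b0 + b1*X1 + ... + b(p-1)*X(p-1).

    X: (n, p-1) array of predictor columns; y: length-n response.
    Returns (coefficients b0..b(p-1), residuals, R^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    sse = float(resid @ resid)                  # error sum of squares
    ssto = float(((y - y.mean()) ** 2).sum())   # total sum of squares
    return b, resid, 1.0 - sse / ssto
```

As an illustration, take a pandas DataFrame `senic` whose column names are hypothetical. Model I would then be fitted with `fit_first_order(senic[["Age", "Infection risk", "Facilities"]], senic["Length of stay"])`. For part (c), one common way to compare the two models is to plot each model's residuals against its fitted values.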
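For the per-region SENIC problem (infection risk regressed on four predictors within each region), the sketch below fits one model per group label under the same assumptions. It assumes the data are already in NumPy arrays and that region codes 1-4 are passed as `groups`; the helper name is hypothetical. It reports MSE = SSE/(n - p), where p counts the intercept plus the predictors, so p = 5 here.

```python
import numpy as np

def fit_by_group(X, y, groups):
    """Fit a separate first-order OLS model for each distinct group label.

    Returns {label: (coefficients, MSE, R^2, residuals)},
    with MSE = SSE / (n_g - p) computed within each group.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        Xg, yg = X[mask], y[mask]
        design = np.column_stack([np.ones(mask.sum()), Xg])  # intercept + predictors
        n, p = design.shape
        b, *_ = np.linalg.lstsq(design, yg, rcond=None)
        resid = yg - design @ b
        sse = float(resid @ resid)
        ssto = float(((yg - yg.mean()) ** 2).sum())
        results[g.item()] = (b, sse / (n - p), 1.0 - sse / ssto, resid)
    return results
```

Printing `results[region][0]` for each region gives the four estimated regression functions to compare in part (b). The MSE and R² entries answer part (c).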
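For the SENIC problem on adding a third predictor to age and infection risk, the following is a sketch of the standard extra-sum-of-squares quantities. The coefficient of partial determination is R²(Y new | base) = [SSE(base) - SSE(base, new)] / SSE(base). The partial F statistic is F* = [SSE(base) - SSE(base, new)] / 1, divided by SSE(base, new)/(n - p), where p = 4 parameters in the larger model. The function names are hypothetical, and the inputs are assumed to be NumPy arrays.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares for a first-order OLS fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    return float(resid @ resid)

def partial_r2_and_f(X_base, x_new, y):
    """Partial R^2 and partial F* for adding x_new to a model containing X_base.

    Returns (partial R^2, F*, (numerator df, denominator df)).
    """
    X_base = np.asarray(X_base, dtype=float)
    x_new = np.asarray(x_new, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float)
    X_full = np.column_stack([X_base, x_new])
    sse_r = sse(X_base, y)            # reduced model: base predictors only
    sse_f = sse(X_full, y)            # full model: base predictors + x_new
    n, p = len(y), X_full.shape[1] + 1  # p includes the intercept
    r2_partial = (sse_r - sse_f) / sse_r
    f_star = (sse_r - sse_f) / 1 / (sse_f / (n - p))
    return r2_partial, f_star, (1, n - p)
```

One would call this once for each of the four candidate variables and keep the one with the largest partial R². Its F* is then compared with F(0.95; 1, n - 4), taken from an F table or, if SciPy is available, from `scipy.stats.f.ppf(0.95, 1, n - 4)`.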
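For the SENIC problem on log length of stay, the sketch below covers both parts. It computes the predictor correlation matrix for part (a). For part (b), it searches every nonempty predictor subset for the largest adjusted R², using R²_a = 1 - [(n - 1)/(n - p)] SSE/SSTO. It assumes the rows are sorted by identification number, so that cases 57-113 are `X[56:113]` in zero-based indexing, and that the response has already been log-transformed. The log base does not change R²_a, because it only rescales the response. The helper names are hypothetical.

```python
import itertools
import numpy as np

def predictor_correlations(X):
    """Pairwise Pearson correlations among the predictor columns (part a)."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)

def adjusted_r2(X, y):
    """R^2_a = 1 - (n - 1)/(n - p) * SSE/SSTO for a first-order OLS fit."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]  # parameters, including the intercept
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    sse = float(resid @ resid)
    ssto = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (n - 1) / (n - p) * sse / ssto

def best_subset_adj_r2(X, y, names):
    """Exhaustive search over all nonempty subsets; return (best R^2_a, names)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    best = (-np.inf, ())
    for k in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), k):
            score = adjusted_r2(X[:, list(cols)], y)
            if score > best[0]:
                best = (score, tuple(names[c] for c in cols))
    return best
```

With the eight eligible SENIC predictors there are only 2^8 - 1 = 255 subsets, so exhaustive search is cheap. The call would look like `best_subset_adj_r2(X[56:113], np.log10(y[56:113]), names)`.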
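For the consumer expenditure problem, the sketch below types in the 20 quarterly observations from the problem statement and carries out all three parts. It fits the simple regression of Y on X and computes D = Σ(e_t - e_{t-1})² / Σ e_t². It then performs one Cochrane-Orcutt iteration, assuming the textbook form. First, estimate r = Σ e_{t-1} e_t / Σ e_{t-1}² (sums over t = 2, ..., n). Next, regress Y'_t = Y_t - r Y_{t-1} on X'_t = X_t - r X_{t-1}. Finally, back-transform with b0 = b0'/(1 - r) and b1 = b1'. Function names are illustrative, and no numeric results are asserted here.

```python
import numpy as np

# Quarterly data, 1952-1956, billions of current dollars (from the problem statement).
Y = np.array([214.6, 217.7, 219.6, 227.2, 230.9, 233.3, 234.1, 232.3, 233.7, 236.5,
              238.7, 243.2, 249.4, 254.3, 260.9, 263.3, 265.6, 268.2, 270.4, 275.6])
X = np.array([159.3, 161.2, 162.8, 164.6, 165.9, 167.9, 168.3, 169.7, 170.5, 171.6,
              173.9, 176.1, 178.0, 179.1, 180.2, 181.2, 181.6, 182.5, 183.3, 184.3])

def simple_ols(x, y):
    """Return (b0, b1, residuals) for the least-squares line y = b0 + b1*x."""
    xbar, ybar = x.mean(), y.mean()
    b1 = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
    b0 = ybar - b1 * xbar
    return b0, b1, y - (b0 + b1 * x)

def durbin_watson(e):
    """D = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    return float((np.diff(e) ** 2).sum() / (e ** 2).sum())

def cochrane_orcutt_step(x, y):
    """One Cochrane-Orcutt iteration; returns (r, b0, b1, D of transformed fit)."""
    _, _, e = simple_ols(x, y)
    # Estimate rho by regressing e_t on e_{t-1} through the origin.
    r = float((e[:-1] * e[1:]).sum() / (e[:-1] ** 2).sum())
    # Transformed variables for t = 2..n (one observation is lost).
    y_t = y[1:] - r * y[:-1]
    x_t = x[1:] - r * x[:-1]
    b0p, b1p, e_t = simple_ols(x_t, y_t)
    # Back-transform the intercept; the slope carries over unchanged.
    return r, b0p / (1 - r), b1p, durbin_watson(e_t)

if __name__ == "__main__":
    b0, b1, e = simple_ols(X, Y)
    print(f"OLS fit: Y-hat = {b0:.3f} + {b1:.4f} X")
    print(f"Durbin-Watson D = {durbin_watson(e):.3f}")
    r, b0_co, b1_co, d_co = cochrane_orcutt_step(X, Y)
    print(f"r = {r:.3f}; Y-hat = {b0_co:.3f} + {b1_co:.4f} X; D (transformed) = {d_co:.3f}")
```

Each D is then compared with the lower and upper bounds d_L and d_U from a Durbin-Watson table for one predictor. Use n = 20 for the original fit and n = 19 for the transformed fit.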

