UCLA STATS 101A - stats101a hw4 turn in

This preview shows page 1-2 out of 6 pages.


Brittany Oliva
UID 003933164
Stats 101a HW4

Questions 1-5: see attached work (written by hand and then scanned).

Midterm problems cont.

Question 6 (problem 1 from book section 3.4)

a) Conclusion: The regression coefficient of the predictor variable, Distance, is highly statistically significant and the model explains 99.4% of the variability in the Y-variable, Fare. Thus model (1) is a highly effective model for both understanding the effects of Distance on Fare and for predicting future values of Fare given the value of the predictor variable, Distance.

My detailed critique: Looking only at the significance of the regression coefficient and at R^2 is not enough to decide whether the model is "a highly effective model" for understanding the effects of Distance on Fare, and especially not for predicting future values. The analyst needs to take more of the output into account, such as the plot of standardized residuals vs Distance, which shows a hoop shape. Looking further, the QQ plot drifts off the line and even contains an outlier. Even if more of the output made the model look "highly effective," a strong fit would not establish a cause-and-effect relationship, and it would not by itself guarantee reliable predictions of future values of Fare given Distance. Overall, the assumptions of the linear model are violated here, so I disagree with the analyst and cannot conclude that this is the best model to use.

b) The ordinary straight-line regression model seems to fit the data well when looking at the Distance vs Fare graph (with some minor concerns), but the standardized residuals vs Distance graph shows a hoop-like shape, which means there may be some bias in this model. A transformation could correct this and in turn improve the model. One transformation that might help here is taking the log of the data.
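The point of the critique in Question 6, that a very high R^2 can coexist with a clearly curved residual plot, can be illustrated with a minimal sketch. The data below are simulated stand-ins, not the textbook airfare data, and the coefficients used to generate them are arbitrary assumptions chosen only to produce mild curvature.

```r
# Simulated stand-in data (NOT the textbook airfare data): a nearly linear
# relationship with mild curvature and small noise. All numbers are assumptions.
set.seed(1)
Distance <- seq(100, 2000, length.out = 17)
Fare <- 50 + 0.2 * Distance + 0.00003 * (Distance - 1000)^2 + rnorm(17, sd = 2)

# Straight-line fit, analogous in form to model (1) in the question.
fit <- lm(Fare ~ Distance)
r_squared <- summary(fit)$r.squared
std_res <- rstandard(fit)

# A curved (hoop-like) pattern here signals a missing nonlinear term,
# even when R^2 is close to 1.
plot(Distance, std_res, ylab = "Standardized residuals")
abline(h = 0, lty = 2)

# Normal QQ plot of the standardized residuals.
qqnorm(std_res)
qqline(std_res)

# Numerical counterpart of the curved pattern: test an added quadratic term.
fit_quad <- lm(Fare ~ Distance + I(Distance^2))
quad_p <- summary(fit_quad)$coefficients["I(Distance^2)", "Pr(>|t|)"]
```

In this sketch r_squared is very high while quad_p is small, which is the same situation the critique describes: the summary statistics look excellent, but the residual plot shows that the straight line is systematically biased.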
Question 7 (problem 3 from book section 3.4)

PART A

a) Using the transformation where I take the log of both X and Y, I get the model:

log(AdRevenue) = 4.67 + 0.528 * log(Circulation)

Justification for the choice of model: I tried out different transformations, and the log of both X and Y gave me a very good R^2 of 88.1% compared to the original 89.45% (less than the original but still good), as well as simplifying my data. Looking at the graphs, I saw that this transformation left no clear pattern in the residuals vs fitted plot. The scatter is actually better than in the original data, which had a clump of points. My QQ plot had the points closer to the line, with less serious outliers than in the original data.

b) 95% prediction intervals for the advertising revenue per page for magazines with the following circulations (fits and limits are on the log scale of the model above):
I. 0.5 million = (3.947, 4.668), fit = 4.308
II. 20 million = (5.885, 6.631), fit = 6.258

c) Weaknesses in my model: one weakness is the possible outliers, which could have a negative impact on the model. Also, the QQ plot shows some points off the line at the bottom and top only, which means the model may not predict well at high or low values of x.

PART B

a) AdRevenue = 59.17 + 51.235x - 2.505x^2 + 0.0522x^3

Justification: The R^2 value is higher here than in the original model (the SLR model), which means it has a lower RSS. All the predictors are significant, and the variance seems more constant here than in the previous model. Considering all of this, I find this to be the stronger model.

b) 95% prediction intervals for the advertising revenue per page for magazines with the following circulations:
I. 0.5 million = (14.92, 153.41), fit = 84.17
II. 20 million = (418.18, 580.89), fit = 499.53

c) Weaknesses in my model: again the QQ plot shows the points away from the line at the top and bottom, suggesting the model may not predict well at low and high values of x.
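The reported fits in Part A b) and Part B b) can be checked directly from the rounded coefficients, and the Part A limits can be put on the original AdRevenue scale by exponentiating. The sketch below assumes the Part A model uses natural logs; that is an inference from the reported fits (which the formula reproduces to rounding), not something stated in the homework. The function names log_fit and cubic_fit are hypothetical helpers.

```r
# Part A model with the rounded coefficients reported above.
# Assumption: natural logs of both AdRevenue and Circulation.
log_fit <- function(circ) 4.67 + 0.528 * log(circ)

# Part B cubic model with the rounded coefficients reported above.
cubic_fit <- function(x) 59.17 + 51.235 * x - 2.505 * x^2 + 0.0522 * x^3

circ <- c(0.5, 20)        # circulation in millions
part_a_fit <- log_fit(circ)    # compare with the reported 4.308 and 6.258
part_b_fit <- cubic_fit(circ)  # compare with the reported 84.17 and 499.53

# Reported Part A prediction intervals (lower, fit, upper) on the log scale.
log_intervals <- rbind(
  "0.5 million" = c(lower = 3.947, fit = 4.308, upper = 4.668),
  "20 million"  = c(lower = 5.885, fit = 6.258, upper = 6.631)
)

# exp() is monotone, so exponentiating the limits gives a 95% prediction
# interval on the original AdRevenue scale, comparable with Part B.
orig_scale <- exp(log_intervals)
print(round(orig_scale, 1))
```

Small discrepancies between the recomputed and reported fits come from using rounded coefficients. Note that exp(fit) estimates the median rather than the mean of AdRevenue on the original scale, so it is best read alongside the interval rather than as a direct substitute for the Part B fit.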
There also seem to be outliers and leverage points in the plot(fit3) graphs that can have a negative effect on the model. There may also be a hint of non-constant variance in the residual plot, which looks to me like it has some pattern. As this case shows, a transformation will not always fix non-constant variance, but in many cases it does improve matters.

PART C

a) The Part A model is a SLR predicting log(Y) from log(X), and the Part B model is a polynomial regression predicting Y from the first three powers of X. The Part A model has R^2 = 88.1% and the Part B model has R^2 = 93.3%. Both have pretty good graphs overall, with few outliers in each. No single difference makes one model dramatically better than the other, but model B does seem to adhere better to the assumptions of linear regression, so I conclude that Part B is the better model.

b) The Part A limits are on the log scale and have to be exponentiated before they can be compared with the Part B limits. Compared on the original scale at 20 million, the model B interval has a lower fitted value and is narrower than the interval from model A. Even though the estimates from model B are lower, it is the better choice because its interval is more precise (narrower) and comes from a model that is more valid (higher R^2 and closer adherence to the assumptions).

Question 8

Part A

R output for SuggestedRetailPrice (Y) ~ DealerCost (X). From the output below: y = 1.089X - 61.9

    > carsdata <- lm(SuggestedRetailPrice ~ DealerCost, data = cars04)
    > summary(carsdata)

    Call:
    lm(formula = SuggestedRetailPrice ~ DealerCost, data = cars04)

    Residuals:
         Min       1Q   Median       3Q      Max
    -1743.52  -262.59    74.92   265.98  2912.72

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -61.904248  81.801381  -0.757     0.45
    DealerCost    1.088841   0.002638 412.768   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 587 on 232 degrees of freedom
    Multiple R-squared:  0.9986,    Adjusted R-squared:  0.9986
    F-statistic: 1.704e+05 on 1 and 232 DF,  p-value: < 2.2e-16

Plots to check for linearity, normality, and constancy of variance (respectively)

Shortcomings: The QQ plot deviates from the line at the low and high quantiles, which means the model might do a poor job of predicting at low and high values of x. There are also outliers in the QQ plot, but they are not major, so we do not need to remove them, and no transformation is needed on their account. The residuals vs fitted plot seems to somewhat show a funnel
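The coefficient table in the summary() output for Question 8 can be sanity-checked by hand: each t value is the estimate divided by its standard error, and in a simple linear regression the F statistic equals the square of the slope's t value. This is a minimal sketch using the rounded numbers printed in the output, so the results agree with the printed values only up to rounding; the variable names are hypothetical.

```r
# Values copied from the printed summary(carsdata) output (rounded as shown).
slope_est <- 1.088841
slope_se  <- 0.002638
int_est   <- -61.904248
int_se    <- 81.801381
df_resid  <- 232

# t value = Estimate / Std. Error
t_slope <- slope_est / slope_se   # close to the printed 412.768
t_int   <- int_est / int_se       # close to the printed -0.757

# Two-sided p-value for the intercept from the t distribution.
p_int <- 2 * pt(-abs(t_int), df = df_resid)   # close to the printed 0.45

# With a single predictor, the F statistic is the squared slope t value.
f_stat <- t_slope^2               # close to the printed 1.704e+05
```

The intercept's large p-value says only that the intercept is not distinguishable from zero; it does not bear on the residual-plot concerns listed under Shortcomings, which have to be judged from the diagnostic plots.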

