Stanford STATS 203 - Introduction to Regression and Analysis of Variance

School name Stanford University

Course Stats 203- Introduction to Regression Models and Analysis of Variance

Pages 3

Statistics 203
Introduction to Regression and Analysis of Variance
Assignment #3
Due Tuesday, March 1
Prof. J. Taylor

Use R for all calculations. Provide copies of your code in the assignment.

Q. 1) (MP, 9.26) Consider the model

$$y_i = \theta_1 - \theta_2 e^{-\theta_3 x_i} + \varepsilon_i, \qquad 1 \le i \le n.$$

This is called the Mitcherlich equation, and it is often used in chemical engineering. For example, $y_i$ may be yield and $x_i$ may be reaction time.

(a) Is this a nonlinear regression model?

(b) Graph the expectation function for the parameter values $\theta_1 = 0.5$, $\theta_2 = -0.1$ and $\theta_3 = 0.1$. Discuss the shape of the function.

(c) Graph the expectation function for the parameter values $\theta_1 = 0.5$, $\theta_2 = 0.1$ and $\theta_3 = 0.1$. Discuss the shape of the function. Compare the shape with the shape in part (b).

(d) The file
http://www-stat.stanford.edu/~jtaylo/courses/stats203/data/chlorine.table
contains the fraction of active chlorine in a chemical product as a function of time after manufacturing. Plot the data and fit the Mitcherlich law to the data, including the fitted curve on the original plot.

(e) Provide approximate confidence intervals for the parameters.

Q. 2) Consider the one-sample problem: $Y_i \sim N(\mu, \sigma^2)$, $1 \le i \le n$, with the $Y_i$'s i.i.d. The MLE is of course

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} Y_i.$$

If we constrain $|\mu|^2 \le C$ and transform the problem to a penalized minimization problem, we have to solve

$$\hat{\mu}_\lambda = \operatorname*{argmin}_{\mu} \sum_{i=1}^{n} (Y_i - \mu)^2 + \lambda \mu^2.$$

(a) Find a design matrix $X$ such that

$$\hat{\mu} = (X^t X)^{-1} X^t Y.$$

(b) Show that

$$\hat{\mu}_\lambda = \frac{\hat{\mu}}{1 + \lambda/n}.$$

(c) Find a design matrix $X(\lambda)$ and a data vector $Y(\lambda)$ such that

$$\hat{\mu}_\lambda = (X(\lambda)^t X(\lambda))^{-1} X(\lambda)^t Y(\lambda).$$

(Hint: $Y(\lambda)$ will generally have to be of length $n + 1$ or greater, i.e. you need to add an observation to the original $Y$, as well as an entry to the original $X$.)

(d) Generalize this to the constrained regression problem for a vector of non-negative constraints $\lambda = (\lambda_0, \dots, \lambda_{p-1})$:

$$\hat{\beta}_\lambda = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} \Bigl( Y_i - \beta_0 - \sum_{j=1}^{p-1} \beta_j X_{ij} \Bigr)^2 + \sum_{j=0}^{p-1} \lambda_j \beta_j^2.$$

(e) Write a function in R that takes two arguments, one the output of lm, the other a vector $\lambda$ of length $p$ as above, and returns $\hat{\beta}_\lambda$. (Hint: the function model.matrix will likely be useful.)

Q. 3) Consider the viral load data from Assignment #2 found at
http://www-stat/~jtaylo/courses/stats203/data/vl.table

(a) As it seems that the variance of viral load depends on their GSS, use weighted least squares (WLS) with appropriately chosen weights to refit the model. Has this improved the diagnostic plots for this model? (Hint: you will have to use the Pearson residuals $r_i = (Y_i - \hat{Y}_i) \sqrt{w_i}$ for the diagnostic plots. The diagnostic plots can also help you choose appropriate weights.)

(b) Does this significantly affect the results compared to ordinary least squares (OLS), in terms of confidence intervals or p-values? Report both the "weighted" and "unweighted" confidence intervals. Which do you feel are more accurate?

Q. 4) (NKNW, 14.10) A marketing research firm was engaged by an automobile manufacturer to conduct a pilot study to examine the feasibility of using logistic regression for ascertaining the likelihood that a family will purchase a new car during the next year. A random sample of 33 suburban families was selected. Data on annual family income and the current age of the oldest family automobile were obtained. A follow-up interview conducted 12 months later was used to determine whether the family actually purchased a new car or did not purchase a new car. The data can be found at
http://www-stat.stanford.edu/~jtaylo/courses/stats203/data/car.table

(a) Using a logistic regression model, find the MLEs of the parameters $\beta_0$, $\beta_{\text{income}}$, $\beta_{\text{age}}$.

(b) State the response function.

(c) Find $\exp(\hat{\beta}_{\text{income}})$ and $\exp(\hat{\beta}_{\text{age}})$ and interpret them.

(d) What is the estimated probability that a family with annual income of \$50,000 and an oldest car of 3 years will purchase a new car next year?

(e) Plot the standard diagnostic plots. Are there any outliers, or anything unusual?

(f) Use a partial deviance test to test whether the age of oldest family automobile can be dropped from the regression model; use $\alpha = 0.15$. What is the approximate p-value?

(g) Test whether the two-factor interaction effect between annual family income and age of oldest automobile should be added to the regression model containing family income and age of oldest automobile as first-order terms; use $\alpha = 0.05$. What is the approximate p-value?

(h) Repeat the previous test using Pearson's $X^2$ instead of partial
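
As a numerical illustration of the hint in Q. 2(c), the following sketch checks on simulated data that the penalized one-sample estimate can be obtained from an ordinary least squares fit to augmented data. It is only a sketch under stated assumptions: the sample size, the value of `lambda`, the seed and all variable names are hypothetical and not part of the assignment, and the augmentation used here (one pseudo-observation 0 paired with the design entry `sqrt(lambda)`) is one possible choice rather than the required answer.

```r
# Hypothetical simulated data; none of these values come from the assignment.
set.seed(203)
n <- 20
lambda <- 5
y <- rnorm(n, mean = 2)

# Closed form stated in part (b): shrink the sample mean by 1 + lambda/n.
mu_closed <- mean(y) / (1 + lambda / n)

# Augmented least squares: append the pseudo-observation 0 to Y and the
# entry sqrt(lambda) to the column of ones, then fit without an intercept.
y_aug <- c(y, 0)
x_aug <- c(rep(1, n), sqrt(lambda))
mu_aug <- unname(coef(lm(y_aug ~ x_aug - 1)))

# Direct numerical minimisation of the penalised criterion, as a check.
penalised <- function(m) sum((y - m)^2) + lambda * m^2
mu_opt <- optimize(penalised, interval = c(-10, 10))$minimum

print(c(closed = mu_closed, augmented = mu_aug, optimize = mu_opt))
```

The augmented residual sum of squares equals the original sum of squares plus $(0 - \sqrt{\lambda}\,\mu)^2 = \lambda\mu^2$, which is why the three values should agree up to numerical tolerance.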

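The Pearson residual formula in the hint to Q. 3(a) can be checked against R's built-in residual extractor. The sketch below uses simulated data, not the viral load data. The variance structure, the weights `w` and all variable names are assumptions made purely for illustration.

```r
# Hypothetical simulated data with error standard deviation proportional to x.
set.seed(1)
n <- 50
x <- runif(n, 1, 5)
y <- 2 + 3 * x + rnorm(n, sd = x)

# Weights inversely proportional to the assumed variance x^2.
w <- 1 / x^2
fit <- lm(y ~ x, weights = w)

# Pearson residuals computed from the formula r_i = (Y_i - Yhat_i) * sqrt(w_i).
r_manual <- (y - fitted(fit)) * sqrt(w)

# The same quantity as returned by residuals() for a weighted lm fit.
r_builtin <- residuals(fit, type = "pearson")

print(all.equal(unname(r_manual), unname(r_builtin)))
```

Plotting `r_manual` against `fitted(fit)` would then give a residual plot on the weighted scale, which is the scale on which constant variance is expected if the weights are appropriate.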