UCLA STATS 101A - HW5_Fall_2016_lec2

HW 5 Fall 2016
Dr. Akram Almohalwas
October 31, 2016

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

The North Carolina Data:

Background:

In 2009, the state of North Carolina released to the public a large data set containing information on births recorded in this state. This data set has been of interest to medical researchers who are interested in studying the relation between habits and practices of expectant mothers and the birth of their children.

The unit of observation is each birth. There are 23 variables recorded in the data set.
The following table indicates whether each is categorical or numerical.

Numerical Variables:

Birthweight, Weeks, Apgar1, Fage, Mage, Feduc, Meduc, TotPreg, Visits, Gained

Categorical Variables:

Gender, Premie, LowBirthWeight, Marital, Racemom, Racedad, Hispmom, Hispdad, Habit, MomPriorCond, BirthDef, DelivComp, BirthComp

| Variable | Description |
|---|---|
| Bmonth | Birth month |
| Bday | Birth day |
| DOW | Day of week |
| Gender | Gender of baby |
| Apgar5 | Apgar score at 5 minutes (low scores indicate a need for medical attention) |
| Premie | Premature |
| LowBirthWeight | Low birth weight |
| BirthWeight | Weight of baby at birth (grams) |
| Gestation | Length of gestation (weeks) |
| Fage | Father's age (years) |
| Mage | Mom's age (years) |
| Feduc | Father's education (years) |
| Meduc | Mother's education (years) |
| TotPreg | Total number of pregnancies (including the current one) |
| Visits | Pre-delivery doctor visits |
| Marital | Marital status |
| Racemom | Race of mom |
| Racedad | Race of dad |
| Hispmom | Hispanic mom |
| Hispdad | Hispanic dad |
| Gained | Weight gained by mom (kilograms) |
| Smokes | Mom's smoking habits |

    ncbirths <- read.delim("~/STAT 101A/Data Sets/births.txt")
    head(ncbirths)

    ##      Gender Premie LowBirthWeight Birthweight Weeks Apgar1 Fage Mage Feduc
    ## 1459 Female     No        Not Low         129    41      9   41   32    15
    ## 694  Female     No        Not Low         102    40      9   NA   17    NA
    ## 97     Male     No        Not Low         117    40      9   22   19    12
    ## 1372   Male     No        Not Low          96    37      8   40   41    17
    ## 1769   Male     No        Not Low         134    39      9   28   29    12
    ## 1559 Female     No        Not Low         110    39      9   44   36    16
    ##      Meduc TotPreg Visits   Marital Racemom Racedad   Hispmom Hispdad
    ## 1459    17       2     19   Married   Black   Black   NotHisp NotHisp
    ## 694     12       1     NA Unmarried   Black Unknown   NotHisp Unknown
    ## 97      14       1     19 Unmarried   Black   Black   NotHisp NotHisp
    ## 1372    17       5     14   Married   White   White OtherHisp NotHisp
    ## 1769    16       2     13   Married   White   White   NotHisp NotHisp
    ## 1559    16       5     13   Married   Black   Black   NotHisp NotHisp
    ##      Gained     Habit MomPriorCond BirthDef    DelivComp BirthComp
    ## 1459     42 NonSmoker         None     None         None      None
    ## 694      40 NonSmoker         None     None At Least One      None
    ## 97       29 NonSmoker         None     None         None      None
    ## 1372     36 NonSmoker         None     None         None      None
    ## 1769     42 NonSmoker         None     None         None      None
    ## 1559     30 NonSmoker         None     None         None      None

    #dim(ncbirths)

We have 22 predictors and one response variable in this data set; some are numerical and some are qualitative.

a) Using the North Carolina data set posted on week six of CCLE.

Create the table of correlation among all the quantitative variables. (Remember that you cannot compute Pearson's coefficient of correlation among qualitative variables.) The following command will allow you to exclude the qualitative variables.

We can ask R to create a correlation matrix for the numerical variables only:

1) We can round it to show 3 digits only.
2) The round function rounds to the nth digit.
3) The cor function finds the correlation coefficients.
4) lapply applies is.numeric to each column of the data frame, so that cor is computed on the numeric columns only. "ncbirths" in this case is the name of my data. You may name your correlation matrix cormat and then use the library corrplot. If you don't have the corrplot package, please install it.
5) Use the scatterplot.matrix function to study the densities of the numerical predictors.

    # install.packages("corrplot")
    cormat <- round(cor(ncbirths[, unlist(lapply(ncbirths, is.numeric))],
                        use = "pairwise.complete.obs"), 3)
    library(corrplot)

    ## Warning: package 'corrplot' was built under R version 3.2.5

    # do it!!

i) Use the corrplot function to create a visual representation of the correlation matrix.
ii) What did you notice?

b) Use Birthweight as your response variable.

1) As the first block of predictors, pick three numerical predictors based on the table of correlation; keep the problem of multicollinearity in mind (those predictors might be highly correlated with each other).
2) Test the relevant assumptions, and create the multiple linear model.
3) Create another MLR model based on standardized partial coefficients (see the week six lecture notes for information on partial standardized residuals, the relevant library that you need to install, and the relevant commands).

c) As the second block, pick three other predictors:

1) Add the other three predictors to the MLR model in part (b)(1). Model two should now have six predictors (three belonging to block one and three belonging to block two).
2) Again, test the relevant assumptions. Make the linear model for model two based on standardized coefficients.

d) Present the summary of results for model one and model two (the models that you reported in parts b and c above).

Is it worth having six predictors instead of three? How much does your R^2 increase? Is the change significant? Interpret the results within context.

e) Pick a categorical predictor and then add it to your MLR in part (c).

Does that categorical predictor increase your R^2?
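
For part (a), the numeric-only correlation step can be tried without the full data file. The following is a minimal sketch, not the assignment's solution: the data frame `toy` is a hypothetical stand-in that copies a few columns from the six rows printed by head(ncbirths), and it reuses the same round / cor / lapply / is.numeric idiom as the command above.

```r
# `toy` is a hypothetical stand-in for ncbirths: a few columns from the
# six rows shown by head(ncbirths), typed in by hand for illustration.
toy <- data.frame(
  Birthweight = c(129, 102, 117, 96, 134, 110),
  Weeks       = c(41, 40, 40, 37, 39, 39),
  Fage        = c(41, NA, 22, 40, 28, 44),
  Gender      = c("Female", "Female", "Male", "Male", "Male", "Female"),
  Marital     = c("Married", "Unmarried", "Unmarried",
                  "Married", "Married", "Married")
)

# TRUE for numeric columns, FALSE for the qualitative ones
is_num <- unlist(lapply(toy, is.numeric))

# Pearson correlations on the numeric columns only; missing values are
# handled pair by pair, and the result is rounded to 3 digits
cormat <- round(cor(toy[, is_num], use = "pairwise.complete.obs"), 3)
print(cormat)

# With the corrplot package installed, the matrix could then be drawn with:
# library(corrplot)
# corrplot(cormat)
```

With only six rows the coefficients themselves mean little; the point is that Gender and Marital are dropped before cor is called.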
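
Parts (b) through (d) describe a block-wise comparison of two nested models. The sketch below illustrates one way to do it in base R, under stated assumptions: it uses the built-in mtcars data rather than ncbirths, the predictor choices are arbitrary, and standardized coefficients are obtained by scaling the variables with scale() instead of the library named in the lecture notes.

```r
# Illustration on the built-in mtcars data (an assumption; not ncbirths).
data(mtcars)

# Model one: first block of three numerical predictors
model1 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Model two: the same three plus a second block of three
model2 <- lm(mpg ~ wt + hp + qsec + drat + disp + carb, data = mtcars)

# How much does R^2 increase when the second block is added?
r2_1 <- summary(model1)$r.squared
r2_2 <- summary(model2)$r.squared
r2_change <- r2_2 - r2_1
print(c(model1 = r2_1, model2 = r2_2, change = r2_change))

# Partial F-test for the nested models: is the change significant?
block_test <- anova(model1, model2)
print(block_test)

# Standardized coefficients for model one: refit on z-scored variables
z <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp", "qsec")]))
std_model1 <- lm(mpg ~ wt + hp + qsec, data = z)
print(round(coef(std_model1), 3))
```

The p-value in the second row of the anova table tests whether the three added predictors jointly improve the fit, which is the question asked in part (d).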
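
For part (e), adding a categorical predictor works the same way, since lm creates the dummy variables for a factor automatically. This is a hedged sketch on the built-in iris data (an assumption; the assignment itself uses ncbirths), with Species playing the role of the categorical predictor.

```r
# Illustration on the built-in iris data (an assumption; not ncbirths).
data(iris)

# Model with numerical predictors only
base_model <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                 data = iris)

# Add the categorical predictor; Species is a factor with three levels,
# so lm adds two dummy variables
cat_model <- update(base_model, . ~ . + Species)

# Change in R^2 and the partial F-test for the added factor
r2_gain <- summary(cat_model)$r.squared - summary(base_model)$r.squared
factor_test <- anova(base_model, cat_model)
print(r2_gain)
print(factor_test)
```

Because R^2 never decreases when a predictor is added, adjusted R^2 (summary(cat_model)$adj.r.squared) and the F-test are usually more informative than the raw increase.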

