
STATA Assignment 1
PH223
Catherine Callahan

1. How many observational units (cases) are in this dataset? Make sure to include a Stata screenshot showing how you got this answer. [Written statements and STATA output needed] (10 points)

There are 1155 observational units (cases) in this dataset.

2. How many variables are in this dataset? Identify which variables in the dataset are categorical and which are numerical. Make sure to include a Stata screenshot showing how you got this answer. [Written statements and STATA output needed] (10 points)

There are 21 variables in this dataset.
The categorical variables are: smk, gendr, palevel, health, educ, bmicat, race, diet, sodiumhi, milk, and kidney.
The numerical variables are: seqn, ageyr, alco, totpa, htcm, bmi, pctfat, totchol, ldl, and sodium.

3. For PCTFAT:
a. Provide a thorough set of descriptive statistics describing the center and spread of the data, including all the statistics we introduced in class. Below, write the mean, median, variance, standard deviation, and IQR. Make sure to also include STATA output. [Written statements and STATA output needed] (15 points)

PCTFAT
Mean: 33.42096
Median: 33.3
Variance: 79.86476
Standard deviation: 8.767255
IQR: 13.8 (40.6 - 26.8)

4. b. Create a histogram to describe these data. Make sure to include a title. Copy and paste the STATA figure. [STATA output only] (10 points)

5. c. Based on the above analyses (3a, 3b), write 2-3 sentences describing the data in terms of center (mean, median), spread (standard deviation, IQR), and shape (modality, skew) of the distribution. Which measures of center and spread would be most appropriate? [Written statements only] (10 points)

In terms of the central tendency of the percent body fat distribution, the mean is 33.4 and the median is approximately 33.3. The two differ by only about 0.1, so the distribution is approximately symmetric; because the mean is slightly larger than the median, any skew indicated by that difference is slightly to the right. Because the distribution is close to symmetric, the mean is preferred over the median as the measure of central tendency, and the standard deviation is the matching measure of spread. The smallest value in the data set is 12.5 and the largest is 53.9, so the range of the data is 41.4. The spread of the middle 50% of the data is given by the difference between the first and third quartiles (the IQR), which is 13.8. The standard deviation is around 8.9. Overall, the data are fairly dispersed: values are spread well away from the mean, which is reflected in the relatively large variance and standard deviation.

6. Provide descriptive statistics for BMICAT. Copy and paste STATA output. [STATA output only] (10 points)

7. To assess how percent body fat varies by BMI category, summarize PCTFAT by each stratum of BMICAT using a complete set of descriptive statistics and histograms. Copy and paste STATA output and figures. [STATA output only] (20 points)

8. Based on your analyses in 5, do there appear to be any differences in the percent body fat between the BMI categories? Write 2-3 sentences to summarize your findings. [Written statements only] (15 points)

The percent body fat distribution differs between the three BMI categories, and the shape and skewness of each distribution reflect these differences. Category 1 (normal/underweight) is approximately normal with no real skew, as the median and mean differ by only 0.35. Category 2 (overweight) is bimodal, with two distinct peaks in the histogram. Category 3 (obese) is left-skewed, with the median (reported as 20.5) greater than the mean (38.85). The normal/underweight category has the smallest mean of the three, the overweight category lies in the middle, and the obese category has the largest. Overall, the histograms and summary statistics suggest a positive association between BMI category and percent body fat.
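
The derived figures quoted in the PCTFAT answers (the IQR, the range, and the gap between mean and median) follow from simple arithmetic on the reported summary statistics. The short Python sketch below rechecks them; it is only an illustration, not part of the Stata workflow the assignment asks for, and the variable names are my own. It assumes the quartiles, extremes, mean, median, and variance are exactly the values quoted in the write-up.

```python
import math

# Values copied from the PCTFAT summary statistics quoted in the write-up.
q1, q3 = 26.8, 40.6            # first and third quartiles
minimum, maximum = 12.5, 53.9  # smallest and largest observed values
mean, median = 33.42096, 33.3
variance = 79.86476

# Derived quantities.
iqr = q3 - q1                           # interquartile range
value_range = maximum - minimum         # full range of the data
mean_minus_median = mean - median       # positive: mean lies above the median
sd_from_variance = math.sqrt(variance)  # SD implied by the reported variance

print(f"IQR               = {iqr:.1f}")
print(f"Range             = {value_range:.1f}")
print(f"Mean - median     = {mean_minus_median:.2f}")
print(f"sqrt(variance)    = {sd_from_variance:.4f}")
```

The square root of the reported variance comes out at roughly 8.94, which agrees with the "around 8.9" in the written summary but not with the 8.767255 listed under 3a, so one of those two figures may have been mis-transcribed and is worth rechecking against the original Stata output.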
