UIUC STAT 420 - Computer Language Exam - Fall 2014

Computer Language Exam
November 21, 2014

Problem 1: Use only SAS to complete this problem.

Flight delays are an unfortunate reality when traveling by air. The Bureau of Transportation Statistics maintains data on all aspects of air travel, including flight delays at departure and arrival. LaGuardia Airport (LGA) is one of the three major airports that serve the New York City metropolitan area. United Airlines and American Airlines are two of the major airlines that schedule service at LGA. The files UAFlightInformation.txt and AAFlightInformation.txt contain the scheduled flight information for all 544 and 3,485 departures of United Airlines and American Airlines, respectively, from LGA during May and June of 2009. The file FlightDelays.txt contains the delay times for all United Airlines and American Airlines flights from LGA during this same period. A description of the variables in the data sets can be found in the file FlightInformationAndDelayDataDescription.txt.

Part A: Read in the United Airlines and American Airlines scheduled flight information data and stack them into one data set. Then read in the flight delay times for all of these flights and merge the delay times onto the scheduled flight information data. Next, format the departure time code according to the key in the variable description file (FlightInformationAndDelayDataDescription.txt) so that when the data set is printed, the departure time range is shown instead of the dummy values. Save the resulting data set as an external SAS data set called Problem1A.sas7bdat.

Part B: Calculate the following descriptive statistics for delay time by flight destination: mean, standard deviation, median, IQR, and maximum. The table should have one row for each flight destination containing the corresponding descriptive statistics.
Output the resulting descriptive statistics table to a PDF file named Problem1B.pdf.

Part C: Starting with the data set created in Part A, create a new data set containing an indicator variable that equals 1 if the flight is delayed by 30 minutes or more and 0 otherwise. Using this updated data set, run a logistic regression with this indicator as the dependent variable and with flight length, destination, and departure time as the predictor variables. Flight length should be used as a continuous variable, while destination and departure time should be used as categorical variables. Save the regression output to a PDF file named Problem1C.pdf.

Problem 2: Use only R to complete this problem.

Black spruce (Picea mariana) is a species of slow-growing coniferous tree found across the northern part of North America. It is commonly found on wet organic soils. In a study conducted in the 1990s, a biologist interested in factors affecting the growth of black spruce planted its seedlings on sites in northern Manitoba, Canada. The data for this problem are a portion of the data from this study. Seventy-two black spruce seedlings were planted in three plots under varying conditions (fertilizer or no fertilizer, competition or no competition), and their heights and diameters were measured at planting and again after 5 years. The file Spruce.csv contains the conditions for each seedling, the height and diameter measurements for the seedlings at planting, and most of the height and diameter measurements for the seedlings after 5 years. Because of identification and weather issues, some of the seedlings could not be measured on the 5-year measurement day. Thus, a researcher had to return the following week for the missing measurements, which were recorded in the file SpruceSecondMeasurement.csv.
A description of the variables in these data sets can be found in the file SpruceDataDescription.txt.

Part A: Read in the spruce and spruce second measurement data. Next, use a FOR loop to replace the missing 5-year height and diameter measurements in the spruce data with the values found during the second measurement week. Then calculate the change in height and diameter for each tree. Create a data frame with the following complete variables (no missing values): tree ID, plot number, group code, competition, fertilizer, height at planting, height after 5 years, diameter at planting, diameter after 5 years, change in height, and change in diameter. Once created, write this data frame to a text file named Problem2A.txt.

Part B: Create a table (or an object with labels) that includes the mean, standard deviation, median, IQR, minimum, and maximum of the change in height for each group code. This table should have four rows (one per group code) and six columns (one per descriptive statistic). Next, create a second table of these same descriptive statistics for the change in diameter. Save these tables as a permanent R file named Problem2B.RData.

Part C: Run a linear regression for the change in tree height using competition, fertilizer, their interaction, and the initial height as the independent variables. Create an object that contains the estimated coefficients, their standard errors, their t-values, and their p-values. Save this object containing the regression output as a permanent R file named Problem2C.RData.

Submission Instructions:

Prepare the items listed below and put all of the items for both problems into a single compressed (.zip) file. Name the .zip file with either your name or your computing ID and upload it to the assignment on the F14 STAT CLE UVaCollab page. For example, my submission would be CLE Gretchen Martinet.zip or CLE gaf9f.zip.

General Item to Submit:
1. An electronic file containing the typed University Honor Code and your electronic signature.

Problem 1 Items to Submit:
1. A copy of your SAS Program Editor (either copied and pasted into a program like Word or saved as a .sas file)
2. A clean copy of your SAS Log
3. The SAS data set saved in Part A – Problem1A.sas7bdat
4. The PDF created in Part B – Problem1B.pdf
5. The PDF created in Part C – Problem1C.pdf

Problem 2 Items to Submit:
1. A copy of your R script (either copied and pasted into a program like Word or saved as a .R file)
2. A clean copy of your R Console
3. The .txt data file saved in Part A – Problem2A.txt
4. The permanent R file created in Part B – Problem2B.RData
5. The permanent R file created in Part C –
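Problem 2, Part A asks for a FOR loop that copies the late measurements from the second file into the main spruce data. The following is a minimal sketch of that step under stated assumptions, not an official solution. The column names (TreeID, Ht0, Ht5, Di0, Di5) are hypothetical stand-ins for the names given in SpruceDataDescription.txt. The two small data frames hold made-up values rather than the study's data, and they omit the plot, group, competition, and fertilizer columns.

```r
# Hypothetical stand-ins for Spruce.csv and SpruceSecondMeasurement.csv.
# Column names and values are made up for illustration only.
spruce <- data.frame(
  TreeID = 1:4,
  Ht0 = c(15.0, 9.0, 12.0, 13.7),  # height at planting
  Ht5 = c(60.0, NA, 45.5, NA),     # height after 5 years (some missing)
  Di0 = c(1.98, 1.80, 1.71, 1.89), # diameter at planting
  Di5 = c(7.0, NA, 6.1, NA)        # diameter after 5 years (some missing)
)
second <- data.frame(
  TreeID = c(2, 4),
  Ht5 = c(40.2, 51.3),
  Di5 = c(5.4, 6.8)
)

# FOR loop: for each late measurement, find the matching tree's row
# in the main data and overwrite its missing 5-year values.
for (i in seq_len(nrow(second))) {
  row <- which(spruce$TreeID == second$TreeID[i])
  spruce$Ht5[row] <- second$Ht5[i]
  spruce$Di5[row] <- second$Di5[i]
}

# Change in height and diameter over the five years.
spruce$HtChange <- spruce$Ht5 - spruce$Ht0
spruce$DiChange <- spruce$Di5 - spruce$Di0

# The task requires complete variables, so confirm nothing is missing.
stopifnot(!any(is.na(spruce)))
print(spruce)
```

With the real files, the two data frames would come from `read.csv("Spruce.csv")` and `read.csv("SpruceSecondMeasurement.csv")`, and `write.table(spruce, "Problem2A.txt")` would save the result. Matching on the ID column inside the loop, rather than assuming the rows line up, keeps the fill-in correct even if the second file lists trees in a different order.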