
90-776 Manipulation of Large Data Sets
Homework 3 Solutions

1) Program

/* u:\class\907776\class\HW3P1.SAS does the tasks in Homework 3, problem 1 */
/* In this program, I merge the LAB1 and WEIGHTS data sets from lab 3 */
/* Rob Greenbaum */
/* 4/1/1999 */
options pageno=1;

/* Create library references for my SAS data sets */
libname landat 'u:\class\90776\lan\data\';
libname mydat 'u:\class\90776\data\';
libname class 'l:\academic\90776\data\';

/* 1.1 */
/* Before I can merge the data sets LAB1 and WEIGHTS, I first need to sort both by NAME */
PROC SORT data=landat.lab1;
  BY name;
run;

PROC SORT data=mydat.weights;
  BY name;
run;

/* Now create the merged data set. */
/* I use a KEEP= option when I bring in the LAB1 data to keep only the NAME and SEX variables. */
/* I use an IN= option to keep only the observations that are in WEIGHTS. */
DATA combine;
  MERGE landat.lab1 (keep=name sex)
        mydat.weights (in=a);
  BY name;
  if a;
run;

/* Now print out the merged data set */
PROC PRINT data=combine;
run;

2) Program

/* u:\class\907776\class\HW3P2.SAS does the tasks in Homework 3, problem 2 */
/* This program uses employment and establishment data from the EMP data sets */
/* Rob Greenbaum */
/* 4/1/1999 */
options pageno=1;

/* Create library references for my SAS data sets */
libname class 'l:\academic\90776\data';
libname mydat 'u:\class\90776\data\';

/* Create file references for my ASCII data sets */
filename d2 'l:\academic\90776\data\text\emp292je.txt';
filename d3 'l:\academic\90776\data\text\emp293je.txt';
filename d4 'l:\academic\90776\data\text\emp294je.txt';

/* 2.a. Read in each ASCII data set and create year variables */
DATA y1992;
  INFILE d2;                  /* tell SAS to read 'l:\academic\90776\data\text\emp292je.txt' */
  INPUT zip sic2 estbtot emp; /* tell SAS what variables are in the data set */
  year=1992;                  /* create the year variable */
run;

/* 2.b.
   Describe the data with the CONTENTS and MEANS procedures */
PROC CONTENTS; run;
PROC MEANS; run;

/* Now do the same for 1993 */
DATA y1993;
  INFILE d3;
  INPUT zip sic2 estbtot emp;
  year=1993;
run;
PROC CONTENTS; run;
PROC MEANS; run;

/* Now do the same for 1994 */
DATA y1994;
  INFILE d4;
  INPUT zip sic2 estbtot emp;
  year=1994;
run;
PROC CONTENTS; run;
PROC MEANS; run;

/* 2.c. Put all 3 data sets together */
DATA allthree;
  SET y1992 y1993 y1994;
run;

/* 2.d. Create a new data set that has the mean number of establishments and employees. */
/* I use PROC SUMMARY to avoid printing out all of the observations. I use NWAY so that */
/* the output keeps only the full ZIP x SIC2 x YEAR combinations, with no extra summary */
/* observations. I use CLASS to avoid having to sort the data. */
PROC SUMMARY data=allthree nway;
  var estbtot emp;
  class zip sic2 year;
  output out=mdat mean=m_estb m_emp;
run;

/* 2.e. FORGET IT! */

/* 2.f. Describe the data with the CONTENTS and MEANS procedures */
PROC CONTENTS data=mdat;
run;
PROC MEANS data=mdat;
run;

/* 2.g. Create a subset of the data for the 15213 ZIP code */
DATA oakland;
  set mdat;
  if zip = 15213; /* this keeps only the observations with zip=15213 */
run;

/* 2.h. Print means by industry and year */
PROC MEANS mean data=oakland maxdec=3;
  where sic2 >= 70 and sic2 <= 89;
  class sic2 year;
  var m_estb m_emp;
run;

/* I don't see much of a pattern across time. Looking at the means by year will help identify any patterns.
*/
PROC MEANS mean data=oakland maxdec=3;
  where sic2 >= 70 and sic2 <= 89;
  class year;
  var m_estb m_emp;
run;

1) Output

The SAS System                                  08:31 Thursday, April 1, 1999   1

OBS    NAME      SEX    WEIGHT    HEIGHT
  1    Alex       M       130       5.8
  2    Alicia     F       119       5.1
  3    Amir       M       187       6.0
  4    Becky      F       155       5.8
  5    Lester             220       5.3
  6    Trixi              150       5.6

2) Output

The SAS System                                  08:31 Thursday, April 1, 1999   1

CONTENTS PROCEDURE

Data Set Name:  WORK.Y1992                      Observations:          44975
Member Type:    DATA                            Variables:             5
Engine:         V612                            Indexes:               0
Created:        9:44 Thursday, April 1, 1999    Observation Length:    40
Last Modified:  9:44 Thursday, April 1, 1999    Deleted Observations:  0
Protection:                                     Compressed:            NO
Data Set Type:                                  Sorted:                NO
Label:

-----Engine/Host Dependent Information-----

Data Set Page Size:        8192
Number of Data Set Pages:  222
File Format:               607
First Data Page:           1
Max Obs per Page:          203
Obs in First Data Page:    180

-----Alphabetic List of Variables and Attributes-----

 #    Variable    Type    Len    Pos
 -----------------------------------
 4    EMP         Num       8     24
 3    ESTBTOT     Num       8     16
 2    SIC2        Num       8      8
 5    YEAR        Num       8     32
 1    ZIP         Num       8      0

The SAS System                                  08:31 Thursday, April 1, 1999   2

Variable      N          Mean        Std Dev       Minimum       Maximum
------------------------------------------------------------------------
ZIP       44975      17394.33        1455.89      15000.00      19698.00
SIC2      44975    51.8301056     23.2422174             0    89.0000000
ESTBTOT   44975     6.2569205     12.6794026     1.0000000   370.0000000
EMP       44975    93.7893646    296.8611500             0      14589.74
YEAR      44975       1992.00              0       1992.00       1992.00
------------------------------------------------------------------------

The SAS System                                  08:31 Thursday, April 1, 1999   3

CONTENTS PROCEDURE

Data Set Name:  WORK.Y1993                      Observations:          44997
Member Type:    DATA                            Variables:             5
Engine:         V612                            Indexes:
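For problem 1, the single IN= flag on WEIGHTS keeps every WEIGHTS observation, but it does not say which ones found a partner in LAB1: a blank SEX (as for Lester and Trixi in the printed output) could mean either no match in LAB1 or a missing value in LAB1. A minimal sketch of one way to check, using small made-up data sets (not the real LAB1 and WEIGHTS files) and hypothetical flag names INLAB and INWT, puts an IN= flag on each input and routes observations to three output data sets.

```sas
/* Hypothetical toy data: NOT the real LAB1 and WEIGHTS data sets */
DATA lab1;
  input name $ sex $ age;
  datalines;
Alex M 20
Becky F 22
Carl M 25
;
run;

DATA weights;
  input name $ weight height;
  datalines;
Alex 130 5.8
Becky 155 5.8
Lester 220 5.3
;
run;

/* MERGE with a BY statement needs both inputs sorted by NAME */
PROC SORT data=lab1;    BY name; run;
PROC SORT data=weights; BY name; run;

/* One IN= flag per input records which input(s) contributed each observation */
DATA matched only_lab1 only_weights;
  MERGE lab1 (keep=name sex in=inlab)
        weights (in=inwt);
  BY name;
  if inlab and inwt then output matched;
  else if inlab then output only_lab1;
  else output only_weights;   /* at least one flag is always true in a MERGE */
run;

/* Names in WEIGHTS with no partner in LAB1 (with this toy data: Lester) */
PROC PRINT data=only_weights;
run;
```

Writing `if inlab and inwt;` instead would give an inner join; the version in the program (`if a;`) behaves like a left join on WEIGHTS.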

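For 2.d, the effect of NWAY is easier to see side by side. Without NWAY, PROC SUMMARY also writes observations for every lower-order combination of the CLASS variables (including one overall observation), identified by the automatic _TYPE_ variable; NWAY keeps only the observations for the full combination of all CLASS variables. The sketch below is an illustration on a hypothetical four-observation data set (TOY) and uses only two CLASS variables, ZIP and SIC2, to keep the output short.

```sas
/* Hypothetical toy data shaped like ALLTHREE */
DATA toy;
  input zip sic2 year estbtot emp;
  datalines;
15213 70 1992 4 40
15213 70 1993 6 50
15213 80 1992 2 10
15217 70 1992 8 90
;
run;

/* Without NWAY: observations for every _TYPE_                      */
/* (0 = overall, 1 = SIC2 alone, 2 = ZIP alone, 3 = ZIP x SIC2),    */
/* each with a _FREQ_ count of the input observations it summarizes */
PROC SUMMARY data=toy;
  class zip sic2;
  var estbtot emp;
  output out=alltypes mean=m_estb m_emp;
run;

/* With NWAY: only the _TYPE_=3 (ZIP x SIC2) observations are written */
PROC SUMMARY data=toy nway;
  class zip sic2;
  var estbtot emp;
  output out=fullonly mean=m_estb m_emp;
run;

PROC PRINT data=alltypes; run;
PROC PRINT data=fullonly; run;
```

With this toy data, ALLTYPES has 8 observations (1 overall + 2 SIC2 + 2 ZIP + 3 ZIP x SIC2) and FULLONLY has 3, which is why the program uses NWAY to avoid the extra observations.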

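For 2.g, the subsetting IF works, but a WHERE condition (here written as the WHERE= data set option) selects observations before they are read into the program data vector, which can save work on a large data set. A minimal sketch on a hypothetical stand-in for MDAT (MDAT_TOY, not the real output of 2.d) compares the two approaches.

```sas
/* Hypothetical stand-in for MDAT (not the real output of 2.d) */
DATA mdat_toy;
  input zip sic2 year m_estb m_emp;
  datalines;
15213 70 1992 4 40
15213 80 1993 2 10
15217 70 1992 8 90
;
run;

/* Subsetting IF, as in the program: every observation is read, then kept or dropped */
DATA oakland_if;
  set mdat_toy;
  if zip = 15213;
run;

/* WHERE= data set option: only ZIP 15213 observations are read at all */
DATA oakland_where;
  set mdat_toy (where=(zip = 15213));
run;
```

Both data sets end up with the same two observations here. One design point: a WHERE condition can only refer to variables already in the input data set, while a subsetting IF can also use variables created earlier in the same DATA step.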