90-776 Manipulation of Large Data Sets
Homework 3 Solutions

1) Program

/* u:\class\907776\class\HW3P1.SAS does the tasks in Homework 3, problem 1 */
/* in this program, I merge the LAB1 and WEIGHTS data sets from lab 3 */
/* Rob Greenbaum */
/* 4/1/1999 */

options pageno=1;

/* create library references for my SAS data sets */
libname landat 'u:\class\90776\lan\data\';
libname mydat 'u:\class\90776\data\';
libname class 'l:\academic\90776\data\';

/* 1.1 */
/* Before I can merge the data sets LAB1 and WEIGHTS, I first need to sort them by the merge key */
PROC SORT data=landat.lab1;
  BY name;
run;

PROC SORT data=mydat.weights;
  BY name;
run;

/* now create the merged data set */
/* I use a KEEP= option when I bring in the lab1 data to keep only the name and sex variables */
/* I use an IN= option to keep only the observations that are in WEIGHTS */
DATA combine;
  MERGE landat.lab1 (keep=name sex)
        mydat.weights (in=a);
  BY name;
  if a;
run;

/* now print out the merged data set */
PROC print data=combine;
run;

2) Program

/* u:\class\907776\class\HW3P2.SAS does the tasks in Homework 3, problem 2 */
/* This program uses employment and establishment data from EMP data sets */
/* Rob Greenbaum */
/* 4/1/1999 */

options pageno=1;

/* create library references for my SAS data sets */
libname class 'l:\academic\90776\data';
libname mydat 'u:\class\90776\data\';

/* create file references for my ASCII data sets */
filename d2 'l:\academic\90776\data\text\emp292je.txt';
filename d3 'l:\academic\90776\data\text\emp293je.txt';
filename d4 'l:\academic\90776\data\text\emp294je.txt';

/* 2.a. read in each ASCII data set and create year variables */
DATA y1992;
  INFILE d2;                    /* tell SAS to grab 'l:\academic\90776\data\text\emp292je.txt' */
  INPUT zip sic2 estbtot emp;   /* tell SAS what variables are in the data set */
  year=1992;                    /* create the year variable */
run;

/* 2.b. describe the data with contents and means procedures */
PROC contents;
run;
PROC means;
run;

/* now do the same for 1993 */
DATA y1993;
  INFILE d3;
  INPUT zip sic2 estbtot emp;
  year=1993;
run;
PROC contents;
run;
PROC means;
run;

/* now do the same for 1994 */
DATA y1994;
  INFILE d4;
  INPUT zip sic2 estbtot emp;
  year=1994;
run;
PROC contents;
run;
PROC means;
run;

/* 2.c. Put all 3 data sets together */
DATA allthree;
  SET y1992 y1993 y1994;
run;

/* 2.d. create a new data set that has the mean number of establishments and employees */
/* I use PROC SUMMARY to avoid printing out all of the observations.
   I use NWAY to keep only the full zip-by-sic2-by-year combinations and avoid the extra summary observations.
   I use CLASS to avoid having to sort the data */
PROC summary data=allthree nway;
  var estbtot emp;
  class zip sic2 year;
  output out=mdat mean=m_estb m_emp;
run;

/* 2.e. FORGET IT! */

/* 2.f. Describe the data with contents and means procedures */
PROC contents data=mdat;
run;
PROC means data=mdat;
run;

/* 2.g. create a subset of the data for the 15213 ZIP code */
DATA OAKLAND;
  set mdat;
  if zip = 15213;   /* this keeps only the observations with zip = 15213 */
run;

/* 2.h. Print means by industry and year */
PROC means mean data=oakland maxdec=3;
  where sic2 >= 70 and sic2 <= 89;
  class sic2 year;
  var m_estb m_emp;
run;

/* I don't see much of a pattern across time. Looking at the means by year will help identify any patterns. */
PROC means mean data=oakland maxdec=3;
  where sic2 >= 70 and sic2 <= 89;
  class year;
  var m_estb m_emp;
run;

1) Output

The SAS System                              08:31 Thursday, April 1, 1999   1

OBS    NAME      SEX    WEIGHT    HEIGHT
  1    Alex       M       130       5.8
  2    Alicia     F       119       5.1
  3    Amir       M       187       6.0
  4    Becky      F       155       5.8
  5    Lester             220       5.3
  6    Trixi              150       5.6

2) Output

The SAS System                              08:31 Thursday, April 1, 1999   1

CONTENTS PROCEDURE

Data Set Name: WORK.Y1992                      Observations:         44975
Member Type:   DATA                            Variables:            5
Engine:        V612                            Indexes:              0
Created:       9:44 Thursday, April 1, 1999    Observation Length:   40
Last Modified: 9:44 Thursday, April 1, 1999    Deleted Observations: 0
Protection:                                    Compressed:           NO
Data Set Type:                                 Sorted:               NO
Label:

-----Engine/Host Dependent Information-----

Data Set Page Size:         8192
Number of Data Set Pages:   222
File Format:                607
First Data Page:            1
Max Obs per Page:           203
Obs in First Data Page:     180

-----Alphabetic List of Variables and Attributes-----

#    Variable    Type    Len    Pos
-----------------------------------
4    EMP         Num       8     24
3    ESTBTOT     Num       8     16
2    SIC2        Num       8      8
5    YEAR        Num       8     32
1    ZIP         Num       8      0

The SAS System                              08:31 Thursday, April 1, 1999   2

Variable       N          Mean       Std Dev       Minimum       Maximum
-----------------------------------------------------------------------
ZIP        44975      17394.33       1455.89      15000.00      19698.00
SIC2       44975    51.8301056    23.2422174             0    89.0000000
ESTBTOT    44975     6.2569205    12.6794026     1.0000000   370.0000000
EMP        44975    93.7893646   296.8611500             0      14589.74
YEAR       44975       1992.00             0       1992.00       1992.00
-----------------------------------------------------------------------

The SAS System                              08:31 Thursday, April 1, 1999   3

CONTENTS PROCEDURE

Data Set Name: WORK.Y1993                      Observations:         44997
Member Type:   DATA                            Variables:            5
Engine:        V612                            Indexes: