DOC PREVIEW
CMU PPP 90776 - Homework 4 Solutions

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Homework 4 Solutions90-776 Manipulation of Large Data SetsHomework 4 Solutions1) Program/* u:\class\907776\class\HW4P1.SAS does the tasks in Homework 4, problem 1*//* In lab 4, I took the data set l:\academic\90776\data\text\country.txt, created an improved comma delimited ASCII file, then brought it into SASand saved the file as u:\class\90776\data\lab4.sd2. This program cleans upthat data set *//* Rob Greenbaum*//* 4/6/1999*/options pageno=1 ps=150 ls=150;/* create library reference for my SAS data sets */libname class 'l:\academic\90776\data';libname mydat 'u:\class\90776\data\';/* 1a. First, find the means (with proc means and univariat) of the numeric variables */PROC MEANS DATA = MYDAT.LAB4;/* I can see which observations are the outliers by including an ID subcommand in my univariate procedure*/PROC UNIVARIATE DATA=MYDAT.LAB4; VAR YR_IND; ID COUNTRY;/*1b. Let’s set negative yr_ind observations to missing */DATA FIXYR; SET MYDAT.LAB4; IF YR_IND <0 THEN YR_IND=.;/* Let’s see how the mean has changed */PROC MEANS DATA=C4;RUN;/*1c. Change the missing values for the character variables to blanks */DATA FIXCHAR; SET FIXYR;ARRAY ROB[*] _CHARACTER_; DO I=1 TO DIM(ROB); IF ROB[I]="?" OR ROB[I]="-" THEN ROB[I] = " ";END;DROP I;RUN;/* 1d. Print out the data set */PROC PRINT DATA=FIXCHAR;RUN;/* 1e. Check religion */PROC FREQ DATA=FIXCHAR; TABLES RELIGION;RUN;/* For one of the observations, Lutheran was misspelled as Lutherian. That is easy to fix */DATA FIXSPELL;SET FIXCHAR; IF RELIGION = “Lutherian” THEN RELIGION = “Lutheran”;/* MAKE SURE WE FIXED THE PROBLEM */PROC FREQ DATA=FIXSPELL; TABLES RELIGION;RUN;/* 1f. let’s create an ID variable */DATA NEWVAR; SET FIXSPELL;LENGTH CSMALL $2; /* need to reset the length of the new variable*/CSMALL = SUBSTR(COUNTRY,1,2); /* CSMALL is the first 2 chars of country name*/ENDYR = MOD(YR_UN,100); *Get the last 2 digits of the year they entered the UN;YR = PUT(ENDYR, 2.); /* convert ENDYR to CHARACTER*/IF YR = “ .” THEN YR=”00”; /*convert missings to 00*/ID = CSMALL||YR; /* put csmall and yr together*/RUN;/*1g. print out the data */PROC PRINT DATA=NEWVAR ; VAR COUNTRY ID;RUN;1) OutputThe SAS System 16:05 Tuesday, April 6, 1999 1 Variable N Mean Std Dev Minimum Maximum -------------------------------------------------------------------- TOTALPOP 93 55462086.65 151888397 27816.00 1149667000 CAPPOP 91 2196361.60 2379700.09 134393.00 10726000.00 YR_IND 92 1796.02 395.9593435 -1000.00 1980.00 YR_UN 89 1954.08 10.6602754 1945.00 1991.00 --------------------------------------------------------------------1)a. There is at least one observation with a negative value for YR_IND. The SAS System 16:05 Tuesday, April 6, 1999 2 Univariate ProcedureVariable=YR_IND Moments Quantiles(Def=5) Extremes N 92 Sum Wgts 92 100% Max 1980 99% 1980 Lowest ID Highest ID Mean 1796.022 Sum 165234 75% Q3 1960 95% 1964 -1000(Ethiopia) 1964(Zambia ) Std Dev 395.9593 Variance 156783.8 50% Med 1922.5 90% 1962 800(Denmark ) 1971(Banglade) Skewness -4.69388 Kurtosis 27.93015 25% Q1 1821.5 10% 1499 836(Sweden ) 1975(Angola ) USS 3.1103E8 CSS 14267326 0% Min -1000 5% 900 843(France ) 1975(Mozambiq) CV 22.04647 Std Mean 41.28162 1% -1000 900(USSR ) 1980(Zimbabwe) T:Mean=0 43.50658 Pr>|T| 0.0001 Range 2980 Num ^= 0 92 Num > 0 91 Q3-Q1 138.5 M(Sign) 45 Pr>=|M| 0.0001 Mode 1960 Sgn Rank 2134 Pr>=|S| 0.0001 Missing Value . Count 1 % Count/Nobs 1.08 The SAS System 16:05 Tuesday, April 6, 1999 3 Variable N Mean Std Dev Minimum Maximum -------------------------------------------------------------------- TOTALPOP 93 55462086.65 151888397 27816.00 1149667000 CAPPOP 91 2196361.60 2379700.09 134393.00 10726000.00 YR_IND 91 1826.75 265.9089064 800.0000000 1980.00 YR_UN 89 1954.08 10.6602754 1945.00 1991.00 --------------------------------------------------------------------1)b. The mean YR_IND has now increased to 1827 – before it was 1796. Removing the one outlier made a large difference. The SAS System 16:05 Tuesday, April 6, 1999 4 OBS COUNTRY TOTALPOP CAPITAL CAPPOP REGION YR_IND YR_UN RELIGION LANG1 LANG2 LANG3 1 Afghanistan 16922000 Kabul 1424400 Asia 1919 1946 Islam Pashto Persian2 Algeria 25888000 Algiers 1507241 Africa 1962 1962 Islam Arabic 3 Angola 10284000 Luanda 1134000 Africa 1975 1976 Portuguese 4 Argentina 32470000 Buenos A 2922829 Latin America 1816 1945 Catholicism Spanish 5 Australia 17337000 Canberra 310000 Oceania 1901 1945 English 6 Austria 7815000 Vienna 1487577 Europe 1918 1955 German 7 Bangladesh 108760000 Dhaka 5731000 Asia 1971 1974 Islam Bengali 8 Belgium 9978000 Brussels 137966 Europe 1830 1945 Dutch French German 9 Bolivia 7528000 La Paz 669400 Latin America 1825 1945 Catholicism Spanish Aymara Quechua 10 Brazil 153322000 Brasilia 1567709 Latin America 1822 1945 Portuguese 11 Bulgaria


View Full Document

CMU PPP 90776 - Homework 4 Solutions

Documents in this Course
Load more
Download Homework 4 Solutions
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Homework 4 Solutions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Homework 4 Solutions 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?