122S:166Computing in StatisticsProc tabulateIntro to relational database conceptsLecture 19Nov. 11. 2005Kate Cowles374 SH, [email protected] tabulate• displays descriptive statistics in tabular for-mat• can create variety of tables ranging from sim-ple to complex and highly customized• computes many of same statistics reportedfrom proc means and proc freq• flexibility in classifying values of variables andestablishing a hierarchical relationship betweenvariables• mechanism for labeling and formatting vari-ables and procedure- generated statistics3Example 1fromhttp://ftp.sas.com/techsup/download/sample/base/tabulate/tabformat_classvar.htmlPROC TABULATE Sample--------------------USAGE: User would like to format the CLASS variables and ANALYSISvariables.METHOD: Use a FORMAT statement to format the CLASS variables. Usethe format modifier on the TABLE statement to format theanalysis variables.DATE CREATED: 2-19-974SAMPLE CODE:data sales;input name $ region $ product $ sales;cards;SMITH A CANDY 22000.SMITH A CHIPS 10000.JONES A CANDY 25000.JONES A CHIPS 5000.JOHNSON B CANDY 12000.JOHNSON B CHIPS 15000.ADAMS B CANDY 10000.ADAMS B CHIPS 8000.;proc format; /* Create user-defined format */value $fmtx ’A’=’CARY’’B’=’RALEIGH’;proc tabulate data=sales;/*-----------------------------------------------------*//* Use FORMAT stmt. to assign format to CLASS variable *//* Use *F= to assign a format to an ANALYSIS variable *//*-----------------------------------------------------*/format region $fmtx.;class name region;var sales;table region*name, sales*(sum n)*f=comma8.;run;5SAMPLE OUTPUT:-------------------------------------| | SALES || |-----------------|| | SUM | N ||-----------------+--------+--------||REGION |NAME | | ||--------+--------| | ||CARY |JONES | 30,000| 2|| |--------+--------+--------|| |SMITH | 32,000| 2||--------+--------+--------+--------||RALEIGH |ADAMS | 18,000| 2|| |--------+--------+--------|| |JOHNSON | 27,000| 2|-------------------------------------6Example 2PROC TABULATE Sample--------------------USAGE: User has data derived from a multiple choice questionarie. Theywould like to get frequency counts of the response for eachquestion.METHOD: Manipulate the data so that TABULATE receives one CLASS variablefor reponses instead of four. Also, create a new answer variable.Place both variables on the CLASS statement.DATE CREATED: 2-19-977SAMPLE CODE:data old;input q1 $ q2 $ q3 $ q4 $;cards;A B C DE F A EC B B AB A D EE F A BA A A CF E A E;data new;set old;q=’Question 1’; ans=q1; output;q=’Question 2’; ans=q2; output;q=’Question 3’; ans=q3; output;q=’Question 4’; ans=q4; output;drop q1-q4;run;proc tabulate data=new format=1.0;class q ans;table q=’ ’, ans=’CHOICES’*n=’ ’ / misstext=’0’;run;8SAMPLE OUTPUT:-------------------------------------------| | CHOICES || |-----------|| |A|B|C|D|E|F||-----------------------------+-+-+-+-+-+-||Question 1 |2|1|1|0|2|1||-----------------------------+-+-+-+-+-+-||Question 2 |2|2|0|0|1|2||-----------------------------+-+-+-+-+-+-||Question 3 |4|1|1|1|0|0||-----------------------------+-+-+-+-+-+-||Question 4 |1|1|1|1|3|0|-------------------------------------------9Example 3options ls=72;data timerec;input employee $ week $ phase $ hours;cards;Chen 11SEP89 Analysis 8Chen 11SEP89 Analysis 7Chen 11SEP89 Coding 2.5Chen 11SEP89 Testing 8Chen 11SEP89 Coding 8.5Chen 11SEP89 Testing 6Chen 11SEP89 Coding 4Stewart 11SEP89 Coding 8Stewart 11SEP89 Testing 4.5Stewart 11SEP89 Coding 4.5Stewart 11SEP89 Coding 10.5Stewart 11SEP89 Testing 10;run;proc tabulate data=timerec format=8.1;class employee week phase;var hours;table week, employee all, sum*hours=’ ’*(phase all);table week, employee all, pctsum*hours=’ ’*(phase all);keylabel sum=’Total Hours’pctsum=’Percentage of Hours’;title ’Summary of Project Hours’;run;10Summary of Project Hours 1week 11SEP89------------------------------------------------------| | Total Hours || |-----------------------------------|| | phase | || |--------------------------| || |Analysis| Coding |Testing | All ||----------------+--------+--------+--------+--------||employee | | | | ||----------------| | | | ||Chen | 15.0| 15.0| 14.0| 44.0||----------------+--------+--------+--------+--------||Stewart | .| 23.0| 14.5| 37.5||----------------+--------+--------+--------+--------||All | 15.0| 38.0| 28.5| 81.5|------------------------------------------------------11week 11SEP89------------------------------------------------------| | Percentage of Hours || |-----------------------------------|| | phase | || |--------------------------| || |Analysis| Coding |Testing | All ||----------------+--------+--------+--------+--------||employee | | | | ||----------------| | | | ||Chen | 18.4| 18.4| 17.2| 54.0||----------------+--------+--------+--------+--------||Stewart | .| 28.2| 17.8| 46.0||----------------+--------+--------+--------+--------||All | 18.4| 46.6| 35.0| 100.0|------------------------------------------------------12Introduction to relational database con-cepts• database: a system for storing data• relational database model has become thede-facto standard for the design of databasesboth large and small• storage of data for use in statistical analysisideally should follow this model• today’s lecture deals with two related topics– efficient storage of data (applies to settingup datafiles for use by SAS or any otheranalysis system)– some aspects of relational database soft-ware (such as Microsoft Access)Material drawn in part fromwww.citilink.com/~jgarrick/vbasic/database/rdbms.html and http://www.citilink.com/~jgarrick/vbasic/database/fundamentals.html13What is a relational database?• relational database stores all its data in “ta-bles”• table is a set of rows and columns– set has no predefined sort order for its el-ements– “record” is database terminology for a rowor observation– “field” or “attribute” is database termi-nology for a column or variable14Basic concepts• Primary and Foreign Keys• Queries• Referential Integrity• Normalization15Flat files (how not to store complexdata)• simplest model for a database• a single table which includes fields for eachelement you need to store• you have probably worked with flat file databases,at least in the form of spreadsheets• waste storage space and are problematic tomaintain16Example: customer order entry system• You’re managing the data
View Full Document