DOC PREVIEW
UI STAT 5400 - Computing in Statistic

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

122S:166Computing in StatisticsProc tabulateIntro to relational database conceptsLecture 19Nov. 11. 2005Kate Cowles374 SH, [email protected] tabulate• displays descriptive statistics in tabular for-mat• can create variety of tables ranging from sim-ple to complex and highly customized• computes many of same statistics reportedfrom proc means and proc freq• flexibility in classifying values of variables andestablishing a hierarchical relationship betweenvariables• mechanism for labeling and formatting vari-ables and procedure- generated statistics3Example 1fromhttp://ftp.sas.com/techsup/download/sample/base/tabulate/tabformat_classvar.htmlPROC TABULATE Sample--------------------USAGE: User would like to format the CLASS variables and ANALYSISvariables.METHOD: Use a FORMAT statement to format the CLASS variables. Usethe format modifier on the TABLE statement to format theanalysis variables.DATE CREATED: 2-19-974SAMPLE CODE:data sales;input name $ region $ product $ sales;cards;SMITH A CANDY 22000.SMITH A CHIPS 10000.JONES A CANDY 25000.JONES A CHIPS 5000.JOHNSON B CANDY 12000.JOHNSON B CHIPS 15000.ADAMS B CANDY 10000.ADAMS B CHIPS 8000.;proc format; /* Create user-defined format */value $fmtx ’A’=’CARY’’B’=’RALEIGH’;proc tabulate data=sales;/*-----------------------------------------------------*//* Use FORMAT stmt. to assign format to CLASS variable *//* Use *F= to assign a format to an ANALYSIS variable *//*-----------------------------------------------------*/format region $fmtx.;class name region;var sales;table region*name, sales*(sum n)*f=comma8.;run;5SAMPLE OUTPUT:-------------------------------------| | SALES || |-----------------|| | SUM | N ||-----------------+--------+--------||REGION |NAME | | ||--------+--------| | ||CARY |JONES | 30,000| 2|| |--------+--------+--------|| |SMITH | 32,000| 2||--------+--------+--------+--------||RALEIGH |ADAMS | 18,000| 2|| |--------+--------+--------|| |JOHNSON | 27,000| 2|-------------------------------------6Example 2PROC TABULATE Sample--------------------USAGE: User has data derived from a multiple choice questionarie. Theywould like to get frequency counts of the response for eachquestion.METHOD: Manipulate the data so that TABULATE receives one CLASS variablefor reponses instead of four. Also, create a new answer variable.Place both variables on the CLASS statement.DATE CREATED: 2-19-977SAMPLE CODE:data old;input q1 $ q2 $ q3 $ q4 $;cards;A B C DE F A EC B B AB A D EE F A BA A A CF E A E;data new;set old;q=’Question 1’; ans=q1; output;q=’Question 2’; ans=q2; output;q=’Question 3’; ans=q3; output;q=’Question 4’; ans=q4; output;drop q1-q4;run;proc tabulate data=new format=1.0;class q ans;table q=’ ’, ans=’CHOICES’*n=’ ’ / misstext=’0’;run;8SAMPLE OUTPUT:-------------------------------------------| | CHOICES || |-----------|| |A|B|C|D|E|F||-----------------------------+-+-+-+-+-+-||Question 1 |2|1|1|0|2|1||-----------------------------+-+-+-+-+-+-||Question 2 |2|2|0|0|1|2||-----------------------------+-+-+-+-+-+-||Question 3 |4|1|1|1|0|0||-----------------------------+-+-+-+-+-+-||Question 4 |1|1|1|1|3|0|-------------------------------------------9Example 3options ls=72;data timerec;input employee $ week $ phase $ hours;cards;Chen 11SEP89 Analysis 8Chen 11SEP89 Analysis 7Chen 11SEP89 Coding 2.5Chen 11SEP89 Testing 8Chen 11SEP89 Coding 8.5Chen 11SEP89 Testing 6Chen 11SEP89 Coding 4Stewart 11SEP89 Coding 8Stewart 11SEP89 Testing 4.5Stewart 11SEP89 Coding 4.5Stewart 11SEP89 Coding 10.5Stewart 11SEP89 Testing 10;run;proc tabulate data=timerec format=8.1;class employee week phase;var hours;table week, employee all, sum*hours=’ ’*(phase all);table week, employee all, pctsum*hours=’ ’*(phase all);keylabel sum=’Total Hours’pctsum=’Percentage of Hours’;title ’Summary of Project Hours’;run;10Summary of Project Hours 1week 11SEP89------------------------------------------------------| | Total Hours || |-----------------------------------|| | phase | || |--------------------------| || |Analysis| Coding |Testing | All ||----------------+--------+--------+--------+--------||employee | | | | ||----------------| | | | ||Chen | 15.0| 15.0| 14.0| 44.0||----------------+--------+--------+--------+--------||Stewart | .| 23.0| 14.5| 37.5||----------------+--------+--------+--------+--------||All | 15.0| 38.0| 28.5| 81.5|------------------------------------------------------11week 11SEP89------------------------------------------------------| | Percentage of Hours || |-----------------------------------|| | phase | || |--------------------------| || |Analysis| Coding |Testing | All ||----------------+--------+--------+--------+--------||employee | | | | ||----------------| | | | ||Chen | 18.4| 18.4| 17.2| 54.0||----------------+--------+--------+--------+--------||Stewart | .| 28.2| 17.8| 46.0||----------------+--------+--------+--------+--------||All | 18.4| 46.6| 35.0| 100.0|------------------------------------------------------12Introduction to relational database con-cepts• database: a system for storing data• relational database model has become thede-facto standard for the design of databasesboth large and small• storage of data for use in statistical analysisideally should follow this model• today’s lecture deals with two related topics– efficient storage of data (applies to settingup datafiles for use by SAS or any otheranalysis system)– some aspects of relational database soft-ware (such as Microsoft Access)Material drawn in part fromwww.citilink.com/~jgarrick/vbasic/database/rdbms.html and http://www.citilink.com/~jgarrick/vbasic/database/fundamentals.html13What is a relational database?• relational database stores all its data in “ta-bles”• table is a set of rows and columns– set has no predefined sort order for its el-ements– “record” is database terminology for a rowor observation– “field” or “attribute” is database termi-nology for a column or variable14Basic concepts• Primary and Foreign Keys• Queries• Referential Integrity• Normalization15Flat files (how not to store complexdata)• simplest model for a database• a single table which includes fields for eachelement you need to store• you have probably worked with flat file databases,at least in the form of spreadsheets• waste storage space and are problematic tomaintain16Example: customer order entry system• You’re managing the data


View Full Document

UI STAT 5400 - Computing in Statistic

Documents in this Course
Load more
Download Computing in Statistic
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Computing in Statistic and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Computing in Statistic 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?