UI STAT 5400 - Computing in Statistics


Computing in Statistics, 22S:166
Fall 2008, Lab 4
Getting more meaningful output from SAS; Working with Nominal Data
Nov. 14, 2008

1 Controlling print width and character formatting

Put these lines at the beginning of every SAS program if you want output to print correctly on 8-1/2 by 11 inch paper and to have lines print correctly in tables and graphs:

    options linesize = 75
            formchar = "|----|+|---+=|-/\<>*" ;

The character string for formchar is included on the course web page under “Datasets” in the file called “formchar.” You may copy it from there into your program.

The linesize option tells SAS how many characters to print on each line of text. For normal-size text, a maximum of 80 characters can be printed per line. The formchar option tells SAS what characters to use to print the lines dividing cells in certain kinds of tables. If we let SAS use its default setting for formchar, these tables will not print correctly.

2 Dataset to download

Please download the dataset gulanick.dat from the “Datasets” section of the course web page after reading its associated .info files.

Gulanick (Heart and Lung, 1991) studied patients who were recovering from heart surgery. She was interested in whether different combinations of supervised exercise or teaching would affect patients’ self-efficacy (or confidence) to perform physical activity.

Patients were randomly assigned to one of three groups. Group 1 received teaching, treadmill exercise testing, and exercise training three times per week. Group 2 received only teaching and exercise testing. Group 3 received only routine care without supervised exercise or teaching. After 4 weeks, each patient was scored on self-efficacy.

Self-efficacy was measured on a continuous scale, and scores were assumed to be normally distributed in each of the populations of interest.

The variables in the dataset are:

• score
• group (coded 1, 2, 3)

The data are taken from Daniel, WW (1999) Biostatistics: A Foundation for Analysis in the Health Sciences. Wiley.

3 Using formats to get SAS to print something other than the values a variable actually contains
Using labels to get SAS to print more descriptive variable names

The values of the group variable in the dataset are the numbers 1, 2, and 3. If we want SAS to print descriptive words instead of the numeric codes, so that tables and graphs are more understandable, we need to run a “proc format” before the data step. The data step must then refer to the formats defined in the format procedure.

    proc format ;
    value grpfmt 1 = 'Teaching and Training'
                 2 = 'Teaching'
                 3 = 'Neither' ;
    run ;

Note the format statement in the data step below. It tells SAS to apply the format you have defined here to a particular variable. When you use the format statement in a data step, you must put a period at the end of the format name.

The label statement in a data step causes most of the subsequent procedures to display the variable labels instead of the variable names.

    data gulan ;
    *infile '/group/ftp/pub/kcowles/datasets/gulanick.dat' ;
    infile 'c:\temp\gulanick.dat' ;
    input score group ;
    format group grpfmt. ;
    label group = 'Treatment Group'
          score = 'Self-Efficacy Score' ;
    run ;

Now enter and run the following code to see how the formats and labels affect the output of the “print” and “freq” procedures.

    proc print data = gulan (obs = 20) ;
    run ;

    proc print label data = gulan (obs = 20) ;
    run ;

    proc freq data = gulan ;
    tables group ;
    run ;

4 Using proc tabulate to summarize the distributions of quantitative variables in different groups

    proc tabulate data = gulan ;
    class group ;    * class statement identifies qualitative variables ;
    var score ;      * var statement identifies quantitative variables ;
    tables group , score * (n mean std) ;
    run ;

5 Formats for numeric variables

Formats can also be used to group numeric data. Suppose we want to identify the values of the score variable as either below the median score, or equal to or above the median. First we need to find out the median score.

    proc means data = gulan median ;
    var score ;
    run ;

Now add a line to your format procedure and change one line in the data step as follows:

    proc format ;
    value grpfmt 1 = 'Teaching and Training'
                 2 = 'Teaching'
                 3 = 'Neither' ;
    value scorefmt low-<117 = 'Below median'
                   117-high = 'At or above median' ;
    run ;

    data gulan ;
    *infile '/group/ftp/pub/kcowles/datasets/gulanick.dat' ;
    infile 'c:\temp\gulanick.dat' ;
    input score group ;
    format group grpfmt. score scorefmt. ;
    label group = 'Treatment Group'
          score = 'Self-Efficacy Score' ;
    run ;

Now we can treat the score variable like another categorical variable (in this case, binary). For example, we can use proc freq to test the null hypothesis that the population proportion of patients who score below the median in self-efficacy is the same in the three populations defined by the three types of treatment.

H0: p1 = p2 = p3

    proc freq data = gulan ;
    tables group * score / chisq ;
    run ;

    The FREQ Procedure

    Table of group by score

    group(Treatment Group)     score(Self-Efficacy Score)

    Frequency        |
    Percent          |
    Row Pct          |
    Col Pct          |Below me|Median a|  Total
                     |dian    |nd above|
    -----------------+--------+--------+
    Teaching and Tra |      5 |      6 |     11
    ining            |  13.89 |  16.67 |  30.56
                     |  45.45 |  54.55 |
                     |  29.41 |  31.58 |
    -----------------+--------+--------+
    Teaching         |      3 |      9 |     12
                     |   8.33 |  25.00 |  33.33
                     |  25.00 |  75.00 |
                     |  17.65 |  47.37 |
    -----------------+--------+--------+
    Neither          |      9 |      4 |     13
                     |  25.00 |  11.11 |  36.11
                     |  69.23 |  30.77 |
                     |  52.94 |  21.05 |
    -----------------+--------+--------+
    Total                  17       19       36
                        47.22    52.78   100.00

    Statistics for Table of group by score

    Statistic                     DF       Value      Prob
    ------------------------------------------------------
    Chi-Square                     2      4.9181    0.0855
    Likelihood Ratio Chi-Square    2      5.0929    0.0784
    Mantel-Haenszel Chi-Square     1      1.5246    0.2169
    Phi Coefficient                       0.3696
    Contingency Coefficient               0.3467
    Cramer's V                            0.3696

The Chi-Square test gives a test statistic (4.9181) and a p-value (0.0855) for the null hypothesis stated above.

6 Dataset to download

Please download the datasets 175rna_fup.dat and 175status.dat from the course web page after reading the 175.info files.

I purchased the data from the virology substudy of an AIDS clinical trial (ACTG 175) from the National Technical Information Service (NTIS). The data were provided as 9 separate files on a floppy disk. I was interested in relating the longitudinal trajectories of patients’ RNA concentrations to their clinical disease status.

Today we will use the data file that contains only the follow-up values

