UI STAT 5400 - One approach to handling missing data

22S:166 Computing in Statistics

One approach to handling missing data
Arrays and looping in SAS

Lecture 28
Nov. 16, 2009

Kate Cowles
374 [email protected]

Example dataset

ACTG 320 (Hammer, Squires, et al., 1997) was a randomized, double-blind, placebo-controlled trial comparing a three-drug regimen (indinavir, lamivudine, and either zidovudine or stavudine) with a two-drug regimen (zidovudine and lamivudine) in HIV-infected adults with CD4 counts ≤ 200 cells/mm3 and at least 3 months of prior zidovudine therapy. The 1156 randomized patients were stratified according to their CD4 count at study entry (≤ 50 cells/mm3 or 50-200 cells/mm3). The primary endpoint was occurrence of an AIDS-defining event (according to the CDC definition) or death. In addition, blood specimens were collected at baseline and at weeks 4, 8, 24, and 40 during follow-up for analysis of CD4 counts and viral load. The ACTG 320 dataset available for purchase from the National Technical Information Service includes clinical endpoints and CD4 data for all patients, but viral load data for only the 198 patients who were randomly selected for a virology substudy.

Example dataset, continued

• includes the 198 patients who have RNA data
• variables are
    trt          -- 1/0 treatment group indicator
    strat        -- 1/0 stratification group indicator
    rna1         -- week 0 RNA
    rna2         -- week 4 RNA
    rna3         -- week 8 RNA
    rna4         -- week 24 RNA
    rna5         -- week 40 RNA
    cd41 - cd45  -- the corresponding 5 CD4 values
    obst         -- time at which the clinical endpoint occurred or, for a patient with no clinical endpoint, the last time at which the patient was observed
    fail         -- 1: clinical endpoint; 0: no clinical endpoint
• Note: no patient identifier

What we would like to do

• impute values for missing RNA and CD4 data
• calculate patient-specific rates of change of RNA by week and of CD4 by week
  – how will the data file have to be laid out to do this?

Last-value-carried-forward

• one (not terribly good) method of imputing missing values of longitudinal data
• may make sense if values are "missing at random"
  – that is, if the probability that a value is missing doesn't depend on the value that would have been observed
  – not likely to be the case for this type of data

    options linesize = 72 ;

    data actg320 ;
       infile '/group/ftp/pub/kcowles/datasets/combo1.dat' firstobs = 2 ;
       input trt strat rna1 rna2 rna3 rna4 rna5 cd41 cd42 cd43 cd44
             cd45 obst fail ;
       pid = _N_ ;  * copy observation number into permanent variable ;
    run ;

    proc print data = actg320 (obs = 12) ;
       title 'no arrays used' ;
    run ;

    no arrays used    1    18:26 Sunday, June 29, 2003

    Obs  trt  strat      rna1      rna2      rna3      rna4      rna5
      1   0     1     4.24790   3.27323   4.05660   3.98290         .
      2   0     0     5.56951   5.10036   4.96781   5.41695   5.01041
      3   0     0     4.96314   5.48520   4.60326   5.29003   5.96755
      4   0     0     3.91666   3.53046   3.96881   2.03342         .
      5   0     1     3.36286   2.29667   2.44560   3.55835   4.20850
      6   1     1     4.65297   3.03342   2.71517         .         .
      7   1     1     5.09670   2.87157   3.89856   2.14613         .
      8   0     0     5.10824   5.14251   5.04452   2.37291   2.26007
      9   1     0     5.52962   3.28825   2.88081   5.36319         .
     10   1     1     4.97237   2.48001   2.13672   2.23045   2.30320
     11   1     1     5.71936   3.45894   2.58995   2.25285         .
     12   1     0     5.85708   3.18949   2.90309   2.02531   2.69984

    Obs    cd41   cd42   cd43   cd44   cd45      obst   fail   pid
      1   188.5    152    178    148      .   32.8571      0     1
      2    19.0     43     35      7      .   43.8571      0     2
      3    15.0     15     19     15      .   38.8571      0     3
      4    30.0     30     40     70      .   49.7143      0     4
      5   190.5    205    255    301    243   46.0000      0     5
      6   128.5    166    867      .      .   24.8571      0     6
      7    33.0     96    100    159      .   43.8571      0     7
      8    10.0     10     11     72     67   13.0000      1     8
      9     8.0     33     48      7      .   13.0000      1     9
     10   139.0    178    119    305    305   48.7143      0    10
     11    90.0    243    154    266      .   42.5714      0    11
     12    20.0    155    172    160    142   49.0000      0    12

Arrays in SAS data steps

• enable referencing a group of SAS variables by a single name and subscripts
• defined in array statements
    array arrayname[number of items] list of names  or  (list of values) ;
• exist only during execution of the data step

    data actg320 ;
       set actg320 ;
       array arna[5] rna1 rna2 rna3 rna4 rna5 ;  * define an array whose elements refer to existing variables in the dataset ;
       array acd4[5] cd41 cd42 cd43 cd44 cd45 ;
    run ;

    proc print data = actg320 (obs = 12) ;
       title 'first way of doing arrays' ;
    run ;

    first way of doing arrays    2    18:26 Sunday, June 29, 2003

    (Same 12 observations, variables, and values as the 'no arrays used' listing.)

    data actg320 ;
       set actg320 ;
       array arna[5] rna1 - rna5 ;
       array acd4[5] cd41 - cd45 ;
    run ;

    proc print data = actg320 (obs = 12) ;
       title 'second way of doing arrays' ;
    run ;

    second way of doing arrays    3    18:26 Sunday, June 29, 2003

    (Same 12 observations, variables, and values as the 'no arrays used' listing.)

Do loops in SAS data steps

• enable coding a task once and having SAS execute it repeatedly
• framework
    do <...> ;
       ...
    end ;
• in its simplest form, the do statement includes a "loop counter", such as
    do i = 1 to 5 ;

    data actg320lvcf ;
       set actg320 ;
       array arna[5] rna1 - rna5 ;
       array acd4[5] cd41 -
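Below is a minimal sketch of one way to write the last-value-carried-forward step with these arrays and a DO loop. It is an illustration, not the lecture's own code. It assumes that the five elements of each array are in visit order (weeks 0, 4, 8, 24, 40). The loop starts at the second visit and copies the previous element into any missing one. Because it runs forward, a value is carried across several consecutive missing visits, and a missing baseline value stays missing. The dataset names demo and demo_lvcf are hypothetical. The two records are observations 1 and 6 from the listing above, read with datalines so the sketch runs without the course data file.

```sas
/* Sketch only, not the lecture's code.
   Hypothetical demo data: observations 1 and 6 from the listing above */
data demo ;
   input pid rna1-rna5 cd41-cd45 ;
   datalines ;
1 4.24790 3.27323 4.05660 3.98290 . 188.5 152 178 148 .
6 4.65297 3.03342 2.71517 . . 128.5 166 867 . .
;
run ;

data demo_lvcf ;
   set demo ;
   array arna[5] rna1 - rna5 ;
   array acd4[5] cd41 - cd45 ;
   /* Assumes elements are in visit order. Start at visit 2 and look back
      one visit; a value carried into visit i is carried on again at i+1 */
   do i = 2 to dim(arna) ;
      if missing(arna[i]) then arna[i] = arna[i-1] ;
      if missing(acd4[i]) then acd4[i] = acd4[i-1] ;
   end ;
   drop i ;   /* the loop counter is not needed in the output dataset */
run ;

proc print data = demo_lvcf ;
   title 'LVCF sketch' ;
run ;
```

For observation 6, the sketch fills rna4 and rna5 with 2.71517 and fills cd44 and cd45 with 867. This also shows a weakness of the method: the unusually large week-8 CD4 value is repeated for the two remaining visits.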

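The second goal is patient-specific rates of change. One common layout for this (an assumption here, not necessarily the one the lecture goes on to use) is one record per patient per visit, with a variable holding the visit week. Arrays, a DO loop, and an OUTPUT statement can produce that layout from the one-record-per-patient file. Several parts of this sketch are illustrative choices:

- the dataset names demo2, demo_long, and rna_slopes
- the variable names visit, week, rna, and cd4
- the use of PROC REG with a BY statement to get per-patient slopes

The week values 0, 4, 8, 24, 40 come from the study description. The two records are observations 5 and 8 from the listing above.

```sas
/* Sketch only. Hypothetical demo data: observations 5 and 8 from the listing */
data demo2 ;
   input pid rna1-rna5 cd41-cd45 ;
   datalines ;
5 3.36286 2.29667 2.44560 3.55835 4.20850 190.5 205 255 301 243
8 5.10824 5.14251 5.04452 2.37291 2.26007 10.0 10 11 72 67
;
run ;

data demo_long ;
   set demo2 ;
   array arna[5] rna1 - rna5 ;
   array acd4[5] cd41 - cd45 ;
   /* temporary array of visit weeks; its elements are not written out */
   array aweek[5] _temporary_ (0 4 8 24 40) ;
   do visit = 1 to 5 ;
      week = aweek[visit] ;
      rna  = arna[visit] ;
      cd4  = acd4[visit] ;
      output ;   /* one output record per patient per visit */
   end ;
   keep pid visit week rna cd4 ;
run ;

/* Slope of RNA on week for each patient; demo_long is already
   sorted by pid, which the BY statement requires */
proc reg data = demo_long outest = rna_slopes noprint ;
   by pid ;
   model rna = week ;
run ;
quit ;

proc print data = rna_slopes ;
   var pid week ;   /* in the OUTEST= dataset, the slope is stored under the regressor's name */
run ;
```

Changing the model statement to model cd4 = week ; gives the corresponding CD4 slopes.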
