22S:166
Computing in Statistics
One approach to handling missing data
Arrays and looping in SAS
Lecture 28
Nov. 16, 2009
Kate Cowles
374 [email protected]

Example dataset

ACTG 320 (Hammer, Squires, et al., 1997) was a randomized, double-blind, placebo-controlled trial comparing a three-drug regimen (indinavir, lamivudine, and either zidovudine or stavudine) with a two-drug regimen (zidovudine and lamivudine) in HIV-infected adults with CD4 counts ≤ 200 and at least 3 months of prior zidovudine therapy. The 1156 randomized patients were stratified according to their CD4 count (≤ 50 cells/mm³ or 50-200 cells/mm³) at study entry. The primary endpoint was occurrence of an AIDS-defining event (according to the CDC definition) or death. In addition, blood specimens were collected at baseline and at weeks 4, 8, 24, and 40 during follow-up for analysis of CD4 counts and viral load. The ACTG 320 dataset available for purchase from the National Technical Information Service includes clinical endpoints and CD4 data for all patients, but viral load data on only 198 patients who were randomly selected for a virology substudy.

Example dataset, continued

• includes the 198 patients who have RNA data
• variables are

    trt   -- 1/0 treatment group indicator
    strat -- 1/0 stratification group indicator
    rna1  -- week 0 RNA
    rna2  -- week 4 RNA
    rna3  -- week 8 RNA
    rna4  -- week 24 RNA
    rna5  -- week 40 RNA
    cd41  -- similar 5 cd4 values
    cd42
    cd43
    cd44
    cd45
    obst  -- time at which clinical endpoint occurred, or last time
             at which patient was observed with no clinical endpoint
    fail  -- 1: clinical endpoint; 0: no clinical endpoint

• Note: no patient identifier

What we would like to do

• impute values for missing RNA and CD4 data
• calculate patient-specific rates of change of RNA by week and of CD4 by week
  – how will the data file have to be laid out to do this?

Last-value-carried forward

• one (not terribly good) method of imputing missing values of longitudinal data
• may make sense if values are "missing at random"
  – that is, if the probability that a value is missing doesn't depend on the value that would have been observed
  – not likely to be the case for this type of data

    options linesize = 72 ;

    data actg320 ;
    infile '/group/ftp/pub/kcowles/datasets/combo1.dat' firstobs = 2 ;
    input trt strat rna1 rna2 rna3 rna4 rna5 cd41 cd42 cd43 cd44
          cd45 obst fail ;
    pid = _N_ ;    * copy observation number into permanent variable ;
    run ;

    proc print data = actg320 (obs = 12) ;
    title 'no arrays used' ;
    run ;

                             no arrays used                            1
                                             18:26 Sunday, June 29, 2003

    Obs  trt  strat     rna1     rna2     rna3     rna4     rna5
      1    0      1  4.24790  3.27323  4.05660  3.98290        .
      2    0      0  5.56951  5.10036  4.96781  5.41695  5.01041
      3    0      0  4.96314  5.48520  4.60326  5.29003  5.96755
      4    0      0  3.91666  3.53046  3.96881  2.03342        .
      5    0      1  3.36286  2.29667  2.44560  3.55835  4.20850
      6    1      1  4.65297  3.03342  2.71517        .        .
      7    1      1  5.09670  2.87157  3.89856  2.14613        .
      8    0      0  5.10824  5.14251  5.04452  2.37291  2.26007
      9    1      0  5.52962  3.28825  2.88081  5.36319        .
     10    1      1  4.97237  2.48001  2.13672  2.23045  2.30320
     11    1      1  5.71936  3.45894  2.58995  2.25285        .
     12    1      0  5.85708  3.18949  2.90309  2.02531  2.69984

    Obs   cd41  cd42  cd43  cd44  cd45     obst  fail  pid
      1  188.5   152   178   148     .  32.8571     0    1
      2   19.0    43    35     7     .  43.8571     0    2
      3   15.0    15    19    15     .  38.8571     0    3
      4   30.0    30    40    70     .  49.7143     0    4
      5  190.5   205   255   301   243  46.0000     0    5
      6  128.5   166   867     .     .  24.8571     0    6
      7   33.0    96   100   159     .  43.8571     0    7
      8   10.0    10    11    72    67  13.0000     1    8
      9    8.0    33    48     7     .  13.0000     1    9
     10  139.0   178   119   305   305  48.7143     0   10
     11   90.0   243   154   266     .  42.5714     0   11
     12   20.0   155   172   160   142  49.0000     0   12

Arrays in SAS data steps

• enable referencing a group of SAS variables by a single name and subscripts
• defined in array statements

    array arrayname[number of items] list of names or (list of values) ;

• exist only during execution of the data step

    data actg320 ;
    set actg320 ;
    array arna[5] rna1 rna2 rna3 rna4 rna5 ;  * define array whose elements
                                                refer to existing variables
                                                in dataset ;
    array acd4[5] cd41 cd42 cd43 cd44 cd45 ;
    run ;

    proc print data = actg320 (obs = 12) ;
    title 'first way of doing arrays' ;
    run ;

                        first way of doing arrays                      2
                                             18:26 Sunday, June 29, 2003

    Obs  trt  strat     rna1     rna2     rna3     rna4     rna5
      1    0      1  4.24790  3.27323  4.05660  3.98290        .
      2    0      0  5.56951  5.10036  4.96781  5.41695  5.01041
      3    0      0  4.96314  5.48520  4.60326  5.29003  5.96755
      4    0      0  3.91666  3.53046  3.96881  2.03342        .
      5    0      1  3.36286  2.29667  2.44560  3.55835  4.20850
      6    1      1  4.65297  3.03342  2.71517        .        .
      7    1      1  5.09670  2.87157  3.89856  2.14613        .
      8    0      0  5.10824  5.14251  5.04452  2.37291  2.26007
      9    1      0  5.52962  3.28825  2.88081  5.36319        .
     10    1      1  4.97237  2.48001  2.13672  2.23045  2.30320
     11    1      1  5.71936  3.45894  2.58995  2.25285        .
     12    1      0  5.85708  3.18949  2.90309  2.02531  2.69984

    Obs   cd41  cd42  cd43  cd44  cd45     obst  fail  pid
      1  188.5   152   178   148     .  32.8571     0    1
      2   19.0    43    35     7     .  43.8571     0    2
      3   15.0    15    19    15     .  38.8571     0    3
      4   30.0    30    40    70     .  49.7143     0    4
      5  190.5   205   255   301   243  46.0000     0    5
      6  128.5   166   867     .     .  24.8571     0    6
      7   33.0    96   100   159     .  43.8571     0    7
      8   10.0    10    11    72    67  13.0000     1    8
      9    8.0    33    48     7     .  13.0000     1    9
     10  139.0   178   119   305   305  48.7143     0   10
     11   90.0   243   154   266     .  42.5714     0   11
     12   20.0   155   172   160   142  49.0000     0   12

    data actg320 ;
    set actg320 ;
    array arna[5] rna1 - rna5 ;
    array acd4[5] cd41 - cd45 ;
    run ;

    proc print data = actg320 (obs = 12) ;
    title 'second way of doing arrays' ;
    run ;

                        second way of doing arrays                     3
                                             18:26 Sunday, June 29, 2003

    Obs  trt  strat     rna1     rna2     rna3     rna4     rna5
      1    0      1  4.24790  3.27323  4.05660  3.98290        .
      2    0      0  5.56951  5.10036  4.96781  5.41695  5.01041
      3    0      0  4.96314  5.48520  4.60326  5.29003  5.96755
      4    0      0  3.91666  3.53046  3.96881  2.03342        .
      5    0      1  3.36286  2.29667  2.44560  3.55835  4.20850
      6    1      1  4.65297  3.03342  2.71517        .        .
      7    1      1  5.09670  2.87157  3.89856  2.14613        .
      8    0      0  5.10824  5.14251  5.04452  2.37291  2.26007
      9    1      0  5.52962  3.28825  2.88081  5.36319        .
     10    1      1  4.97237  2.48001  2.13672  2.23045  2.30320
     11    1      1  5.71936  3.45894  2.58995  2.25285        .
     12    1      0  5.85708  3.18949  2.90309  2.02531  2.69984

    Obs   cd41  cd42  cd43  cd44  cd45     obst  fail  pid
      1  188.5   152   178   148     .  32.8571     0    1
      2   19.0    43    35     7     .  43.8571     0    2
      3   15.0    15    19    15     .  38.8571     0    3
      4   30.0    30    40    70     .  49.7143     0    4
      5  190.5   205   255   301   243  46.0000     0    5
      6  128.5   166   867     .     .  24.8571     0    6
      7   33.0    96   100   159     .  43.8571     0    7
      8   10.0    10    11    72    67  13.0000     1    8
      9    8.0    33    48     7     .  13.0000     1    9
     10  139.0   178   119   305   305  48.7143     0   10
     11   90.0   243   154   266     .  42.5714     0   11
     12   20.0   155   172   160   142  49.0000     0   12

Do loops in SAS data steps

• enable coding a task once and having SAS execute it repeatedly
• framework

    do <...> ;
    ...
    end ;

• in simplest form, do statement includes a "loop-counter" such as

    do i = 1 to 5 ;

    data actg320lvcf ;
    set actg320 ;
    array arna[5] rna1 - rna5 ;
    array acd4[5] cd41 -
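
The data step above is cut off in the source, so the lecture's own version is not shown here. As a rough illustration only, the sketch below shows one way arrays and a do loop might be combined to carry the last observed value forward. The output dataset name lvcf_sketch and the title are assumptions made for this example, and the loop is a plausible approach rather than the code from the lecture.

```sas
* Hypothetical sketch of last-value-carried-forward ;
* Assumes dataset actg320 with rna1-rna5 and cd41-cd45 as read in earlier ;
data lvcf_sketch ;                  * dataset name is an assumption ;
    set actg320 ;
    array arna[5] rna1 - rna5 ;
    array acd4[5] cd41 - cd45 ;
    do i = 2 to 5 ;                 * week 0 has no earlier value to carry ;
        if arna[i] = . then arna[i] = arna[i-1] ;
        if acd4[i] = . then acd4[i] = acd4[i-1] ;
    end ;
    drop i ;                        * loop counter is not needed in output ;
run ;

proc print data = lvcf_sketch (obs = 12) ;
    title 'LVCF sketch' ;
run ;
```

Because the loop runs in increasing order of i, a run of consecutive missing values is filled with the most recent observed value. Under this sketch, observation 1 would get rna5 = 3.98290 and cd45 = 148, and observation 6 would get 2.71517 for both rna4 and rna5. A missing baseline value would stay missing, since there is nothing earlier to carry forward.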