RECURSIVE ALGORITHM FOR SUMS OF SUBSETS Computational complexity of D conditional logistic likelihood Finite population baseline odds increment estimate Solution Recursive algorithm 1 COMPUTATIONALLY CHALLENGING SUMS D Conditional logistic likelihood l D D P D s R s D s Finite population estimator of the baseline odds P n D 1 s R s D 1 s b P n D s R s D s Other sums in the variance expressions 2 THE COMPUTATIONAL PROBLEM The expressions involve P d R d m If the number of cases in the risk set is largish then these sums can have too many terms for direct summing If there are 15 cases in a risk set of size 100 the number of terms is 100 15 2 53 1017 3 SUMS THAT NEED TO BE COMPUTED Let xi be a vector of values associated with unit i and xd X xi i d B0 m r B1 m r x X d r d m X d xd d d r d m X B2 m r x x 2 d d d r d m 4 HISTORY Cox 1972 JRSS paper describing Cox regression Discussion includes Breslow estimator of baseline hazard Howard has discussion point on computation with ties and describes the recursive algorithms but seems to have been forgotten Gail Lubin Rubenstein 1981 rediscover and implement Cox DR JRSS B 34 187 220 1972 Gail et al Biometrika68 703 707 1981 5 SPECIAL CASE BINOMIAL COEFFICIENT When the i 1 0 B0 m r r m 6 THE RECURSIVE ALGORITHM AN EXAMPLE Consider the contribution of a particular element 5 to the sum B0 3 1 2 3 4 5 B0 3 1 2 3 4 5 123 124 134 234 125 135 145 235 245 345 123 124 134 234 5 12 13 14 23 24 34 B0 3 1 2 3 4 5 B0 2 1 2 3 4 Suggests in general that B0 m r j B0 m r j B0 m 1 r 7 THE RECURSIVE ALGORITHM FOR SUMS OF PRODUCTS OF SETS OF SIZE 4 FROM 8 B0 1 r1 B0 1 r2 B0 2 r2 B0 1 r3 B0 2 r3 B0 3 r3 B0 1 r4 B0 2 r4 B0 3 r4 B0 4 r4 B0 1 r5 B0 2 r5 B0 3 r5 B0 4 r5 B0 1 r6 B0 2 r6 B0 3 r6 B0 4 r6 B0 1 r7 B0 2 r7 B0 3 r7 B0 4 r7 B0 1 r8 B0 2 r8 B0 3 r8 B0 4 r8 B0 m r j B0 m r j B0 m 1 r 8 THE RECURSIONS FOR EACH TYPE OF SUM Theorem Let X B0 m r d d r d m X X d r d m i d X X d r d m i d B1 m r xi d 2 B2 m r xi d Then the following recursion formulas hold B0 m r j B0 m r j B0 m 1 r B1 m r j B1 m r j B1 m 1 r xj j B0 m 1 r B2 m r j B2 m r j B2 m 1 r 2 j xj B1 m 1 r T B1 m 1 r xT j j xj B0 m 1 r 9 PROOF OF B1 RECURSION FORMULA B1 m r j B1 m r j B1 m 1 r xj j B0 m 1 r Proof Splitting the outer sum into sets that do not contain j and those that do contain j X X d r j d m i d B1 m r j xi d terms without j terms with j X X d r d m i d xi X X d r d m 1 i d d xi xj d j X X d r d m 1 i d B1 m r j xi X d j xj d r d m 1 B1 m r j B1 m 1 r j xj B0 m 1 r Provides recursive method 10 d COMPUTATIONAL EFFICIENCY OF THE RECURSIVE ALGORITHM The algorithm requires the order of R R D operations compared to the D operations required for direct summing So for a case control set with 15 cases and 15 controls the recursive algorithm requires on the order of 30 15 450 operations to compute the 150 million terms in the sum SAS LOGIST STRATA and PHREG TIES DISCRETE use the recursive algorithm to fit conditional logistic likelihood 11 EXAMPLE COMPUTE BASELINE ODDS FROM RISK SET 6 libname rs book miners coh rs read in data and risk set and covariate information into a single record data riskset6 keep setno ncases ntot z1 z1626 set rs gr miners rs 5yr by setno length ncases ntot 5 array z 1626 z1 z1626 retain i 0 z1 z1626 ntot 1626 ncases 60 if setno ne 6 then delete put setno 6 subject data into the array i i 1 z i cr500 if last setno then output run Output data set has one record with cr500 for all 1626 members 12 COMPUTE BASELINE ODDS PRELIMINARIES data baselineodds keep setno ncases ntot bhat b0 bfd b0 bfd 1 baseline set riskset6 retain bhat 1 45 input arrays z covariate values will expand to multiple covariates later array zt 1 1626 z1 z1626 temporary arrays phi odds ratio calculated from beta and z array phi 1626 temporary recursion arrays set size to at least the max number of cases array b0a 1626 temporary array b0b 1 1626 temporary compute phi vector do i 1 to ntot phi i exp zt 1 i bhat end 13 DO THE RECURSION initialize values b0b 1 0 b0b 0 1 initialize m 1 level b0a 1 phi 1 do i 2 to ncases b0a i 0 end do the recursion do k 2 to ntot b0b 1 b0a 1 phi k do l 2 to min k ncases b0b l b0a l b0a l 1 phi k end re initialize the m 1 step array do l 1 to min k ncases b0a l b0b l end end recursion section 14 Compute baseline log odds b0 bfd b0b ncases b0 bfd 1 b0b ncases 1 baselineodds b0b ncases 1 b0b ncases ntot ncases 1 ntot ncases run 15 Baseline odds estimate from CPUM 5 year 55 60 risk set ncases ntot bhat b0 bfd b0 bfd 1 baselineodds 60 1626 1 45 1 9532E134 2 9979E132 0 015358 16
View Full Document
Unlocking...