For a very short introduction on survival data, please refer to the vignette on univariate analysis. Multivariate analysis, using the technique of Cox regression, is applied when there are multiple, potentially interacting covariates. While the log-rank test and Kaplan-Meier plots require categorical variables, Cox regression works with continuous variables. (Of course, you can use it with categorical variables as well, but this has implications which are described below.)
For the purpose of this vignette, we use the lung
dataset from the survival
package:
inst | time | status | age | sex | ph.ecog | ph.karno | pat.karno | meal.cal | wt.loss |
---|---|---|---|---|---|---|---|---|---|
3 | 306 | 2 | 74 | 1 | 1 | 90 | 100 | 1175 | NA |
3 | 455 | 2 | 68 | 1 | 0 | 90 | 90 | 1225 | 15 |
3 | 1010 | 1 | 56 | 1 | 0 | 90 | 90 | NA | 15 |
5 | 210 | 2 | 57 | 1 | 1 | 90 | 60 | 1150 | 11 |
1 | 883 | 2 | 60 | 1 | 0 | 100 | 90 | NA | 0 |
12 | 1022 | 1 | 74 | 1 | 1 | 50 | 80 | 513 | 0 |
7 | 310 | 2 | 68 | 2 | 2 | 70 | 60 | 384 | 10 |
11 | 361 | 2 | 71 | 2 | 2 | 60 | 80 | 538 | 1 |
1 | 218 | 2 | 53 | 1 | 1 | 70 | 80 | 825 | 16 |
7 | 166 | 2 | 61 | 1 | 2 | 70 | 70 | 271 | 34 |
We load the tidyverse, for some tools the tidytidbits package and finally and the survivalAnalysis package:
library(tidyverse)
library(tidytidbits)
library(survivalAnalysis)
Possibly interesting covariates are patient age, sex, ECOG status and the amount of weight loss. Sex is encoded as a numerical vector, but is in fact categorical. We need to make it a factor. ECOG status is at least ordinally scaled, so we leave it numerical for now. Following the two-step philosophy of survivalAnalysis
, we first perform the analysis with analyse_multivariate
and store the result object. We provide readable labels for the covariates to allow easy interpretation. There is a print()
implementation which prints essential information for our result.
<- c(age="Age at Dx",
covariate_names sex="Sex",
ph.ecog="ECOG Status",
wt.loss="Weight Loss (6 mo.)",
`sex:female`="Female",
`ph.ecog:0`="ECOG 0",
`ph.ecog:1`="ECOG 1",
`ph.ecog:2`="ECOG 2",
`ph.ecog:3`="ECOG 3")
::lung %>%
survivalmutate(sex=rename_factor(sex, `1` = "male", `2` = "female")) %>%
analyse_multivariate(vars(time, status),
covariates = vars(age, sex, ph.ecog, wt.loss),
covariate_name_dict = covariate_names) ->
resultprint(result)
#> $colname_unmangle_dict
#> time status age sex ph.ecog wt.loss
#> "time" "status" "age" "sex" "ph.ecog" "wt.loss"
#>
#> $data
#> time status age sex ph.ecog wt.loss
#> 1 306 2 74 male 1 NA
#> 2 455 2 68 male 0 15
#> 3 1010 1 56 male 0 15
#> 4 210 2 57 male 1 11
#> 5 883 2 60 male 0 0
#> 6 1022 1 74 male 1 0
#> 7 310 2 68 female 2 10
#> 8 361 2 71 female 2 1
#> 9 218 2 53 male 1 16
#> 10 166 2 61 male 2 34
#> 11 170 2 57 male 1 27
#> 12 654 2 68 female 2 23
#> 13 728 2 68 female 1 5
#> 14 71 2 60 male NA 32
#> 15 567 2 57 male 1 60
#> 16 144 2 67 male 1 15
#> 17 613 2 70 male 1 -5
#> 18 707 2 63 male 2 22
#> 19 61 2 56 female 2 10
#> 20 88 2 57 male 1 NA
#> 21 301 2 67 male 1 17
#> 22 81 2 49 female 0 -8
#> 23 624 2 50 male 1 16
#> 24 371 2 58 male 0 13
#> 25 394 2 72 male 0 0
#> 26 520 2 70 female 1 6
#> 27 574 2 60 male 0 -13
#> 28 118 2 70 male 3 20
#> 29 390 2 53 male 1 -7
#> 30 12 2 74 male 2 20
#> 31 473 2 69 female 1 -1
#> 32 26 2 73 male 2 20
#> 33 533 2 48 male 2 -11
#> 34 107 2 60 female 2 -15
#> 35 53 2 61 male 2 10
#> 36 122 2 62 female 2 NA
#> 37 814 2 65 male 2 28
#> 38 965 1 66 female 1 4
#> 39 93 2 74 male 2 24
#> 40 731 2 64 female 1 15
#> 41 460 2 70 male 1 10
#> 42 153 2 73 female 2 11
#> 43 433 2 59 female 0 27
#> 44 145 2 60 female 2 NA
#> 45 583 2 68 male 1 7
#> 46 95 2 76 female 2 -24
#> 47 303 2 74 male 0 30
#> 48 519 2 63 male 1 10
#> 49 643 2 74 male 0 2
#> 50 765 2 50 female 1 4
#> 51 735 2 72 female 1 9
#> 52 189 2 63 male 0 0
#> 53 53 2 68 male 0 0
#> 54 246 2 58 male 0 7
#> 55 689 2 59 male 1 15
#> 56 65 2 62 male 0 NA
#> 57 5 2 65 female 0 5
#> 58 132 2 57 male 2 18
#> 59 687 2 58 female 1 10
#> 60 345 2 64 female 1 -3
#> 61 444 2 75 female 2 8
#> 62 223 2 48 male 1 68
#> 63 175 2 73 male 1 NA
#> 64 60 2 65 female 1 0
#> 65 163 2 69 male 1 0
#> 66 65 2 68 male 2 8
#> 67 208 2 67 female 2 2
#> 68 821 1 64 female 0 3
#> 69 428 2 68 male 0 0
#> 70 230 2 67 male 1 23
#> 71 840 1 63 male 0 -1
#> 72 305 2 48 female 1 29
#> 73 11 2 74 male 2 0
#> 74 132 2 40 male 1 3
#> 75 226 2 53 female 1 3
#> 76 426 2 71 female 1 19
#> 77 705 2 51 female 0 0
#> 78 363 2 56 female 1 -2
#> 79 11 2 81 male 0 15
#> 80 176 2 73 male 0 30
#> 81 791 2 59 male 0 5
#> 82 95 2 55 male 1 15
#> 83 196 1 42 male 1 8
#> 84 167 2 44 female 1 -1
#> 85 806 1 44 male 1 1
#> 86 284 2 71 male 1 14
#> 87 641 2 62 female 1 1
#> 88 147 2 61 male 0 4
#> 89 740 1 44 female 1 39
#> 90 163 2 72 male 2 2
#> 91 655 2 63 male 0 -1
#> 92 239 2 70 male 1 23
#> 93 88 2 66 male 1 8
#> 94 245 2 57 female 1 14
#> 95 588 1 69 female 0 13
#> 96 30 2 72 male 2 7
#> 97 179 2 69 male 1 25
#> 98 310 2 71 male 1 0
#> 99 477 2 64 male 1 0
#> 100 166 2 70 female 0 10
#> 101 559 1 58 female 0 15
#> 102 450 2 69 female 1 3
#> 103 364 2 56 male 1 4
#> 104 107 2 63 male 1 0
#> 105 177 2 59 male 2 32
#> 106 156 2 66 male 1 14
#> 107 529 1 54 female 1 -3
#> 108 11 2 67 male 1 NA
#> 109 429 2 55 male 1 5
#> 110 351 2 75 female 2 11
#> 111 15 2 69 male 0 10
#> 112 181 2 44 male 1 5
#> 113 283 2 80 male 1 6
#> 114 201 2 75 female 0 1
#> 115 524 2 54 female 1 15
#> 116 13 2 76 male 2 20
#> 117 212 2 49 male 2 20
#> 118 524 2 68 male 2 30
#> 119 288 2 66 male 2 24
#> 120 363 2 80 male 1 11
#> 121 442 2 75 male 0 0
#> 122 199 2 60 female 2 10
#> 123 550 2 69 female 1 0
#> 124 54 2 72 male 2 -3
#> 125 558 2 70 male 0 17
#> 126 207 2 66 male 1 20
#> 127 92 2 50 male 1 13
#> 128 60 2 64 male 1 0
#> 129 551 1 77 female 2 28
#> 130 543 1 48 female 0 4
#> 131 293 2 59 female 1 52
#> 132 202 2 53 male 1 20
#> 133 353 2 47 male 0 5
#> 134 511 1 55 female 1 49
#> 135 267 2 67 male 0 6
#> 136 511 1 74 female 2 37
#> 137 371 2 58 female 1 0
#> 138 387 2 56 male 2 NA
#> 139 457 2 54 male 1 -5
#> 140 337 2 56 male 0 15
#> 141 201 2 73 female 2 -16
#> 142 404 1 74 male 1 38
#> 143 222 2 76 male 2 8
#> 144 62 2 65 female 1 0
#> 145 458 1 57 male 1 30
#> 146 356 1 53 female 1 2
#> 147 353 2 71 male 0 2
#> 148 163 2 54 male 1 13
#> 149 31 2 82 male 0 27
#> 150 340 2 59 female 0 0
#> 151 229 2 70 male 1 -2
#> 152 444 1 60 male 0 7
#> 153 315 1 62 female 0 0
#> 154 182 2 53 female 1 4
#> 155 156 2 55 male 2 10
#> 156 329 2 69 male 2 20
#> 157 364 1 68 female 1 7
#> 158 291 2 62 male 2 27
#> 159 179 2 63 male 1 -2
#> 160 376 1 56 female 1 17
#> 161 384 1 62 female 0 8
#> 162 268 2 44 female 1 2
#> 163 292 1 69 male 2 36
#> 164 142 2 63 male 1 2
#> 165 413 1 64 male 1 16
#> 166 266 1 57 female 0 3
#> 167 194 2 60 female 1 33
#> 168 320 2 46 male 0 4
#> 169 181 2 61 male 1 0
#> 170 285 2 65 male 0 0
#> 171 301 1 61 male 1 2
#> 172 348 2 58 female 0 10
#> 173 197 2 56 male 1 37
#> 174 382 1 43 female 0 6
#> 175 303 1 53 male 1 12
#> 176 296 1 59 female 1 0
#> 177 180 2 56 male 2 -2
#> 178 186 2 55 female 1 NA
#> 179 145 2 53 female 1 13
#> 180 269 1 74 female 0 0
#> 181 300 1 60 male 0 5
#> 182 284 1 39 male 0 -5
#> 183 350 2 66 female 0 NA
#> 184 272 1 65 female 1 -1
#> 185 292 1 51 female 0 0
#> 186 332 1 45 female 0 5
#> 187 285 2 72 female 2 20
#> 188 259 1 58 male 0 8
#> 189 110 2 64 male 1 12
#> 190 286 2 53 male 0 8
#> 191 270 2 72 male 1 14
#> 192 81 2 52 male 2 NA
#> 193 131 2 50 male 1 NA
#> 194 225 1 64 male 1 33
#> 195 269 2 71 male 1 -2
#> 196 225 1 70 male 0 6
#> 197 243 1 63 female 1 0
#> 198 279 1 64 male 1 4
#> 199 276 1 52 female 0 0
#> 200 135 2 60 male 1 0
#> 201 79 2 64 female 1 37
#> 202 59 2 73 male 1 5
#> 203 240 1 63 female 0 0
#> 204 202 1 50 female 0 1
#> 205 235 1 63 female 0 0
#> 206 105 2 62 male 2 NA
#> 207 224 1 55 female 0 23
#> 208 239 2 50 female 2 -3
#> 209 237 1 69 male 1 NA
#> 210 173 1 59 female 1 10
#> 211 252 1 60 female 0 -2
#> 212 221 1 67 male 1 23
#> 213 185 1 69 male 1 0
#> 214 92 1 64 female 2 31
#> 215 13 2 65 male 1 10
#> 216 222 1 65 male 1 18
#> 217 192 1 41 female 1 -10
#> 218 183 2 76 male 2 7
#> 219 211 1 70 female 2 3
#> 220 175 1 57 female 0 11
#> 221 197 1 67 male 1 2
#> 222 203 1 71 female 1 0
#> 223 116 2 76 male 1 0
#> 224 188 1 77 male 1 3
#> 225 191 1 39 male 0 -5
#> 226 105 1 75 female 2 5
#> 227 174 1 66 male 1 1
#> 228 177 1 58 female 1 0
#>
#> $data_attributes
#> $data_attributes$survival_ids
#> time status
#> "time" "status"
#>
#> $data_attributes$factor_ids
#> age sex ph.ecog wt.loss
#> "age" "sex" "ph.ecog" "wt.loss"
#>
#> $data_attributes$strata_ids
#> character(0)
#>
#>
#> $coxph
#> Call:
#> coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + wt.loss,
#> data = data)
#>
#> coef exp(coef) se(coef) z p
#> age 0.013369 1.013459 0.009628 1.389 0.164951
#> sexfemale -0.590775 0.553898 0.175339 -3.369 0.000754
#> ph.ecog 0.515111 1.673824 0.125988 4.089 4.34e-05
#> wt.loss -0.009006 0.991034 0.006658 -1.353 0.176135
#>
#> Likelihood ratio test=31.02 on 4 df, p=3.032e-06
#> n= 213, number of events= 151
#> (15 observations deleted due to missingness)
#>
#> $summary
#> Call:
#> coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + wt.loss,
#> data = data)
#>
#> n= 213, number of events= 151
#> (15 observations deleted due to missingness)
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> age 0.013369 1.013459 0.009628 1.389 0.164951
#> sexfemale -0.590775 0.553898 0.175339 -3.369 0.000754 ***
#> ph.ecog 0.515111 1.673824 0.125988 4.089 4.34e-05 ***
#> wt.loss -0.009006 0.991034 0.006658 -1.353 0.176135
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> exp(coef) exp(-coef) lower .95 upper .95
#> age 1.0135 0.9867 0.9945 1.0328
#> sexfemale 0.5539 1.8054 0.3928 0.7811
#> ph.ecog 1.6738 0.5974 1.3076 2.1427
#> wt.loss 0.9910 1.0090 0.9782 1.0041
#>
#> Concordance= 0.647 (se = 0.026 )
#> Likelihood ratio test= 31.02 on 4 df, p=3e-06
#> Wald test = 29.94 on 4 df, p=5e-06
#> Score (logrank) test = 30.65 on 4 df, p=4e-06
#>
#>
#> $summaryAsFrame
#> factor.id factor.name factor.value HR Lower_CI Upper_CI
#> 1 sex:female Sex female 0.5538981 0.3928084 0.7810504
#> 2 wt.loss Weight Loss (6 mo.) <continuous> 0.9910344 0.9781867 1.0040508
#> 3 age Age at Dx <continuous> 1.0134589 0.9945143 1.0327643
#> 4 ph.ecog ECOG Status <continuous> 1.6738244 1.3075807 2.1426504
#> Inv_HR Inv_Lower_CI Inv_Upper_CI p
#> 1 1.8053862 1.2803272 2.5457706 7.535384e-04
#> 2 1.0090467 0.9959656 1.0222997 1.761353e-01
#> 3 0.9867199 0.9682752 1.0055160 1.649509e-01
#> 4 0.5974342 0.4667117 0.7647712 4.340524e-05
#>
#> $overall
#> $overall$n
#> [1] 213
#>
#> $overall$covariates
#> [1] "age" "sex" "ph.ecog" "wt.loss"
#>
#> $overall$`Likelihood ratio test p`
#> [1] 3.031988e-06
#>
#> $overall$`Wald test p`
#> [1] 5.029969e-06
#>
#> $overall$`Score (logrank) test p`
#> [1] 3.61563e-06
#>
#>
#> $formula
#> [1] "Surv(time, status) ~ age + sex + ph.ecog + wt.loss"
#>
#> attr(,"class")
#> [1] "SurvivalAnalysisResult" "SurvivalAnalysisMultivariateResult"
#> [3] "list"
In the example above, you may have noted that the hazard ratio is given for women compared to men (women have a better outcome, HR 0.55). What is the hazard ratio for men compared to women? In the case of a binary variable, this is simply the inverted HR, which is also always provided (1.81). Things get more complicated with three or more categories. The rule is: You must choose one level of the factor as the reference level with defined hazard ratio 1.0. Then, for k levels, there will be k-1 pseudo variables created which represent the hazard ratio of this level compared to subjects in the reference level. (For the comparison of one level vs. all remaining subjects see the paragraph on one-hot analysis further down.)
As an example, we consider the two covariates which were significant above, sex and ECOG, and regard the ECOG status as a categorical variable with four levels. As reference level, we choose ECOG=0 with the parameter reference_level_dict
.
::lung %>%
survivalmutate(sex=rename_factor(sex, `1` = "male", `2` = "female"),
ph.ecog = as.factor(ph.ecog)) %>%
analyse_multivariate(vars(time, status),
covariates = vars(sex, ph.ecog),
covariate_name_dict=covariate_names,
reference_level_dict=c(ph.ecog="0"))
#> $colname_unmangle_dict
#> time status sex ph.ecog
#> "time" "status" "sex" "ph.ecog"
#>
#> $data
#> time status sex ph.ecog
#> 1 306 2 male 1
#> 2 455 2 male 0
#> 3 1010 1 male 0
#> 4 210 2 male 1
#> 5 883 2 male 0
#> 6 1022 1 male 1
#> 7 310 2 female 2
#> 8 361 2 female 2
#> 9 218 2 male 1
#> 10 166 2 male 2
#> 11 170 2 male 1
#> 12 654 2 female 2
#> 13 728 2 female 1
#> 14 71 2 male <NA>
#> 15 567 2 male 1
#> 16 144 2 male 1
#> 17 613 2 male 1
#> 18 707 2 male 2
#> 19 61 2 female 2
#> 20 88 2 male 1
#> 21 301 2 male 1
#> 22 81 2 female 0
#> 23 624 2 male 1
#> 24 371 2 male 0
#> 25 394 2 male 0
#> 26 520 2 female 1
#> 27 574 2 male 0
#> 28 118 2 male 3
#> 29 390 2 male 1
#> 30 12 2 male 2
#> 31 473 2 female 1
#> 32 26 2 male 2
#> 33 533 2 male 2
#> 34 107 2 female 2
#> 35 53 2 male 2
#> 36 122 2 female 2
#> 37 814 2 male 2
#> 38 965 1 female 1
#> 39 93 2 male 2
#> 40 731 2 female 1
#> 41 460 2 male 1
#> 42 153 2 female 2
#> 43 433 2 female 0
#> 44 145 2 female 2
#> 45 583 2 male 1
#> 46 95 2 female 2
#> 47 303 2 male 0
#> 48 519 2 male 1
#> 49 643 2 male 0
#> 50 765 2 female 1
#> 51 735 2 female 1
#> 52 189 2 male 0
#> 53 53 2 male 0
#> 54 246 2 male 0
#> 55 689 2 male 1
#> 56 65 2 male 0
#> 57 5 2 female 0
#> 58 132 2 male 2
#> 59 687 2 female 1
#> 60 345 2 female 1
#> 61 444 2 female 2
#> 62 223 2 male 1
#> 63 175 2 male 1
#> 64 60 2 female 1
#> 65 163 2 male 1
#> 66 65 2 male 2
#> 67 208 2 female 2
#> 68 821 1 female 0
#> 69 428 2 male 0
#> 70 230 2 male 1
#> 71 840 1 male 0
#> 72 305 2 female 1
#> 73 11 2 male 2
#> 74 132 2 male 1
#> 75 226 2 female 1
#> 76 426 2 female 1
#> 77 705 2 female 0
#> 78 363 2 female 1
#> 79 11 2 male 0
#> 80 176 2 male 0
#> 81 791 2 male 0
#> 82 95 2 male 1
#> 83 196 1 male 1
#> 84 167 2 female 1
#> 85 806 1 male 1
#> 86 284 2 male 1
#> 87 641 2 female 1
#> 88 147 2 male 0
#> 89 740 1 female 1
#> 90 163 2 male 2
#> 91 655 2 male 0
#> 92 239 2 male 1
#> 93 88 2 male 1
#> 94 245 2 female 1
#> 95 588 1 female 0
#> 96 30 2 male 2
#> 97 179 2 male 1
#> 98 310 2 male 1
#> 99 477 2 male 1
#> 100 166 2 female 0
#> 101 559 1 female 0
#> 102 450 2 female 1
#> 103 364 2 male 1
#> 104 107 2 male 1
#> 105 177 2 male 2
#> 106 156 2 male 1
#> 107 529 1 female 1
#> 108 11 2 male 1
#> 109 429 2 male 1
#> 110 351 2 female 2
#> 111 15 2 male 0
#> 112 181 2 male 1
#> 113 283 2 male 1
#> 114 201 2 female 0
#> 115 524 2 female 1
#> 116 13 2 male 2
#> 117 212 2 male 2
#> 118 524 2 male 2
#> 119 288 2 male 2
#> 120 363 2 male 1
#> 121 442 2 male 0
#> 122 199 2 female 2
#> 123 550 2 female 1
#> 124 54 2 male 2
#> 125 558 2 male 0
#> 126 207 2 male 1
#> 127 92 2 male 1
#> 128 60 2 male 1
#> 129 551 1 female 2
#> 130 543 1 female 0
#> 131 293 2 female 1
#> 132 202 2 male 1
#> 133 353 2 male 0
#> 134 511 1 female 1
#> 135 267 2 male 0
#> 136 511 1 female 2
#> 137 371 2 female 1
#> 138 387 2 male 2
#> 139 457 2 male 1
#> 140 337 2 male 0
#> 141 201 2 female 2
#> 142 404 1 male 1
#> 143 222 2 male 2
#> 144 62 2 female 1
#> 145 458 1 male 1
#> 146 356 1 female 1
#> 147 353 2 male 0
#> 148 163 2 male 1
#> 149 31 2 male 0
#> 150 340 2 female 0
#> 151 229 2 male 1
#> 152 444 1 male 0
#> 153 315 1 female 0
#> 154 182 2 female 1
#> 155 156 2 male 2
#> 156 329 2 male 2
#> 157 364 1 female 1
#> 158 291 2 male 2
#> 159 179 2 male 1
#> 160 376 1 female 1
#> 161 384 1 female 0
#> 162 268 2 female 1
#> 163 292 1 male 2
#> 164 142 2 male 1
#> 165 413 1 male 1
#> 166 266 1 female 0
#> 167 194 2 female 1
#> 168 320 2 male 0
#> 169 181 2 male 1
#> 170 285 2 male 0
#> 171 301 1 male 1
#> 172 348 2 female 0
#> 173 197 2 male 1
#> 174 382 1 female 0
#> 175 303 1 male 1
#> 176 296 1 female 1
#> 177 180 2 male 2
#> 178 186 2 female 1
#> 179 145 2 female 1
#> 180 269 1 female 0
#> 181 300 1 male 0
#> 182 284 1 male 0
#> 183 350 2 female 0
#> 184 272 1 female 1
#> 185 292 1 female 0
#> 186 332 1 female 0
#> 187 285 2 female 2
#> 188 259 1 male 0
#> 189 110 2 male 1
#> 190 286 2 male 0
#> 191 270 2 male 1
#> 192 81 2 male 2
#> 193 131 2 male 1
#> 194 225 1 male 1
#> 195 269 2 male 1
#> 196 225 1 male 0
#> 197 243 1 female 1
#> 198 279 1 male 1
#> 199 276 1 female 0
#> 200 135 2 male 1
#> 201 79 2 female 1
#> 202 59 2 male 1
#> 203 240 1 female 0
#> 204 202 1 female 0
#> 205 235 1 female 0
#> 206 105 2 male 2
#> 207 224 1 female 0
#> 208 239 2 female 2
#> 209 237 1 male 1
#> 210 173 1 female 1
#> 211 252 1 female 0
#> 212 221 1 male 1
#> 213 185 1 male 1
#> 214 92 1 female 2
#> 215 13 2 male 1
#> 216 222 1 male 1
#> 217 192 1 female 1
#> 218 183 2 male 2
#> 219 211 1 female 2
#> 220 175 1 female 0
#> 221 197 1 male 1
#> 222 203 1 female 1
#> 223 116 2 male 1
#> 224 188 1 male 1
#> 225 191 1 male 0
#> 226 105 1 female 2
#> 227 174 1 male 1
#> 228 177 1 female 1
#>
#> $data_attributes
#> $data_attributes$survival_ids
#> time status
#> "time" "status"
#>
#> $data_attributes$factor_ids
#> sex ph.ecog
#> "sex" "ph.ecog"
#>
#> $data_attributes$strata_ids
#> character(0)
#>
#>
#> $coxph
#> Call:
#> coxph(formula = Surv(time, status) ~ sex + ph.ecog, data = data)
#>
#> coef exp(coef) se(coef) z p
#> sexfemale -0.5449 0.5799 0.1681 -3.241 0.00119
#> ph.ecog1 0.4182 1.5192 0.1994 2.097 0.03602
#> ph.ecog2 0.9475 2.5792 0.2248 4.216 2.49e-05
#> ph.ecog3 2.0485 7.7565 1.0269 1.995 0.04605
#>
#> Likelihood ratio test=29.5 on 4 df, p=6.172e-06
#> n= 227, number of events= 164
#> (1 observation deleted due to missingness)
#>
#> $summary
#> Call:
#> coxph(formula = Surv(time, status) ~ sex + ph.ecog, data = data)
#>
#> n= 227, number of events= 164
#> (1 observation deleted due to missingness)
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> sexfemale -0.5449 0.5799 0.1681 -3.241 0.00119 **
#> ph.ecog1 0.4182 1.5192 0.1994 2.097 0.03602 *
#> ph.ecog2 0.9475 2.5792 0.2248 4.216 2.49e-05 ***
#> ph.ecog3 2.0485 7.7565 1.0269 1.995 0.04605 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> exp(coef) exp(-coef) lower .95 upper .95
#> sexfemale 0.5799 1.7245 0.4171 0.8062
#> ph.ecog1 1.5192 0.6582 1.0277 2.2459
#> ph.ecog2 2.5792 0.3877 1.6602 4.0067
#> ph.ecog3 7.7565 0.1289 1.0366 58.0390
#>
#> Concordance= 0.642 (se = 0.025 )
#> Likelihood ratio test= 29.5 on 4 df, p=6e-06
#> Wald test = 30.7 on 4 df, p=4e-06
#> Score (logrank) test = 32.71 on 4 df, p=1e-06
#>
#>
#> $summaryAsFrame
#> factor.id factor.name factor.value HR Lower_CI Upper_CI Inv_HR
#> 1 sex:female Sex female 0.5798863 0.4170846 0.8062348 1.7244761
#> 2 ph.ecog:1 ECOG Status 1 1.5192159 1.0276575 2.2459010 0.6582343
#> 3 ph.ecog:2 ECOG Status 2 2.5791741 1.6602467 4.0067172 0.3877210
#> 4 ph.ecog:3 ECOG Status 3 7.7564573 1.0365901 58.0389756 0.1289248
#> Inv_Lower_CI Inv_Upper_CI p
#> 1 1.2403335 2.3975953 1.191350e-03
#> 2 0.4452556 0.9730869 3.601567e-02
#> 3 0.2495809 0.6023201 2.490622e-05
#> 4 0.0172298 0.9647014 4.604714e-02
#>
#> $overall
#> $overall$n
#> [1] 227
#>
#> $overall$covariates
#> [1] "sex" "ph.ecog"
#>
#> $overall$`Likelihood ratio test p`
#> [1] 6.172497e-06
#>
#> $overall$`Wald test p`
#> [1] 3.528987e-06
#>
#> $overall$`Score (logrank) test p`
#> [1] 1.372107e-06
#>
#>
#> $formula
#> [1] "Surv(time, status) ~ sex + ph.ecog"
#>
#> attr(,"class")
#> [1] "SurvivalAnalysisResult" "SurvivalAnalysisMultivariateResult"
#> [3] "list"
Sidenote: We are very much used to see hazard ratios for a binary covariate. Please note the different interpretation for continous variables: The hazard ratio is to be interpreted with regard to a change of size 1. With a binary variable, there is only one step from 0 to 1. With age, the range is different! Assume we detect a hazard ratio of 1.04 for age in some study. What is the calculated HR of a 75y old patient compared to a 45y old?
exp((75-45)*log(1.04))
#> [1] 3.243398
The usual method to display results of multivariate analyses is the forest plot. The survivalAnalysis
package provides an implementation which generates ready-to-publish plots and allows extensive customization.
forest_plot(result)
#> Warning: `lgl_along()` is deprecated as of rlang 0.2.0.
#> This warning is displayed once per session.
Ok, this one is not ready to publish. We need to tweak some parameters:
tidytidbits
’ save_pdf
, which reads this attribute. Further adjustment is trial&error. Please note that this is plotting and not text processing - if you specify a too small size, text will disappear or overlap.forest_plot(result,
factor_labeller = covariate_names,
endpoint_labeller = c(time="OS"),
orderer = ~order(HR),
labels_displayed = c("endpoint", "factor", "n"),
ggtheme = ggplot2::theme_bw(base_size = 10),
relative_widths = c(1, 1.5, 1),
HR_x_breaks = c(0.25, 0.5, 0.75, 1, 1.5, 2))
The forest_plot
function actually accepts any number of results and will display them all in the same plot. For example, you may want to display both OS and PFS in the same plot, but of course ordered and with a line separating the two. To do that,
forest_plot
orderer = ~order(endpoint, factor.name)
categorizer = ~!sequential_duplicates(endpoint, ordering = order(endpoint, factor.name))
, where the ordering
clause is identical to your orderer code.Finally, if you want separate plots but display them in a grid, use forest_plot_grid
to do the grid layout and forest_plot
’s title argument to add the A, B, C… labels.
It is common practice to perform univariate analyses of all covariates first and take only those into the multivariate analysis which were significant to some level in the univariate analysis. (I see some pros and strong cons with this procedure, but am open to learn more on this). The univariate part can easily be achieved using purrr
’s map
function. A forest plot, as already said, will happily plot multiple results, even if they come as a list.
<- survival::lung %>% mutate(sex=rename_factor(sex, `1` = "male", `2` = "female"))
df
map(vars(age, sex, ph.ecog, wt.loss), function(by)
{analyse_multivariate(df,
vars(time, status),
covariates = list(by), # covariates expects a list
covariate_name_dict = covariate_names)
%>%
}) forest_plot(factor_labeller = covariate_names,
endpoint_labeller = c(time="OS"),
orderer = ~order(HR),
labels_displayed = c("endpoint", "factor", "n"),
ggtheme = ggplot2::theme_bw(base_size = 10))
Note how the values only slighlty differ. The number of cases is larger each time because the multivariate analysis will only include the subset of subjects for which all covariates are known. Age is significant in UV but not in MV - there is probably interaction between ECOG and age.
We are moving to the grounds of exploratory analysis. For a somewhat interesting example, we add the KRAS mutational status to the data set (by random sampling, for the sake of this tutorial). No, there is a categorical variable with five levels, but none of these comes natural as reference level. One may argue that wild type should be the reference level, but we may want to know if wild type is better than any KRAS mutation. If we omit wild-type and compare only among mutated tumors, there is definitely no suitable reference level.
The one-hot parameter triggers a mode where for each factor level, the hazard ratio vs. the remaining cohort is plotted. This means that no level is omitted. Please be aware of the statistical caveats. And please note that this has nothing to do any more with multivariate analysis. In fact, now you need the result of the univariate, categorically-minded analyse_survival
.
::lung %>%
survivalmutate(kras=sample(c("WT", "G12C", "G12V", "G12D", "G12A"),
nrow(.),
replace = TRUE,
prob = c(0.6, 0.24, 0.16, 0.06, 0.04))
%>%
) analyse_survival(vars(time, status), by=kras) %>%
forest_plot(use_one_hot=TRUE,
endpoint_labeller = c(time="OS"),
orderer = ~order(HR),
labels_displayed = c("endpoint", "factor", "n"),
ggtheme = ggplot2::theme_bw(base_size = 10))
We randomly assigned the RAS status, so results will differ at each generation of this vignette. If one of the subgroups should differ significantly, by the law of small numbers, it will probably be one of the rarer variants.