This vignette shows the results of a simulation-based calibration
(SBC) checking study to validate the implementation of the models in
brms.mmrm
. SBC checking tests the ability of a Bayesian
model to recapture the parameters used to simulate prior predictive
data. For details on SBC checking, please read Modrák et al. (2024) and the SBC
R package
(Kim et al. 2022). This particular SBC
checking study uses the targets
pipeline in the viggnettes/sbc/
subdirectory of the brms.mmrm
package source code.
From the results below, the SBC rank statistics are approximately
uniformly distributed. In other words, the posterior distribution from
the brms
/Stan MMRM modeling code matches the prior from
which the datasets were simulated. This is evidence that both the
subgroup and non-subgroup models in brms.mmrm
are
implemented correctly.
To show the SBC checking results in this vignette, we first load code from the SBC checking study, and we use custom functions to read and plot rank statistics:
library(dplyr)
library(ggplot2)
library(tibble)
library(tidyr)
source("sbc/R/prior.R")
source("sbc/R/response.R")
source("sbc/R/scenarios.R")
read_ranks <- function(path) {
fst::read_fst(path) |>
tibble::as_tibble() |>
pivot_longer(
cols = everything(),
names_to = "parameter",
values_to = "rank"
)
}
plot_ranks <- function(ranks) {
ggplot(ranks) +
geom_histogram(
aes(x = rank),
breaks = seq(from = 0, to = max(ranks$rank), length.out = 10)
) +
facet_wrap(~parameter)
}
Each section below is its own SBC checking study based on a given modeling scenario. Each scenario shows results from 1000 independent simulations from the prior.
The subgroup scenario distinguishes itself from the others by the presence of a subgroup factor. Assumptions:
Model formula:
#> response ~ group + group:subgroup + group:subgroup:time + group:time + subgroup + subgroup:time + time + continuous1 + continuous2 + balanced + unbalanced + unstr(time = time, gr = patient)
#> sigma ~ 0 + time
The prior was randomly generated and used for both simulation and analysis:
setup_prior(subgroup) |>
select(prior, class, coef, dpar) |>
as.data.frame()
#> prior class coef
#> 1 normal(0.0267, 0.9478) Intercept
#> 3 normal(0.2253, 2.218) b balancedlevel2
#> 4 normal(-0.2149, 1.1265) b balancedlevel3
#> 5 normal(0.2277, 2.1154) b continuous1
#> 6 normal(0.2009, 2.0754) b continuous2
#> 7 normal(0.2442, 0.694) b groupgroup_2
#> 8 normal(0.1478, 2.1664) b groupgroup_2:subgroupsubgroup_2
#> 9 normal(-0.0688, 0.3147) b groupgroup_2:subgroupsubgroup_2:timetime_2
#> 10 normal(0.0493, 1.656) b groupgroup_2:subgroupsubgroup_2:timetime_3
#> 11 normal(-0.1995, 1.0938) b groupgroup_2:timetime_2
#> 12 normal(-0.2115, 0.5998) b groupgroup_2:timetime_3
#> 13 normal(0.1978, 2.6558) b subgroupsubgroup_2
#> 14 normal(0.0661, 0.6617) b subgroupsubgroup_2:timetime_2
#> 15 normal(0.1861, 2.2234) b subgroupsubgroup_2:timetime_3
#> 16 normal(-0.08, 0.6091) b timetime_2
#> 17 normal(0.1083, 1.7101) b timetime_3
#> 18 normal(0.0165, 2.5845) b unbalancedlevel2
#> 19 normal(0.0535, 0.4968) b unbalancedlevel3
#> 20 lkj(1.12) cortime
#> 22 normal(0.08, 1.3733) b timetime_1
#> 23 normal(0.1067, 1.4465) b timetime_2
#> 24 normal(-0.0887, 0.8012) b timetime_3
#> dpar
#> 1
#> 3
#> 4
#> 5
#> 6
#> 7
#> 8
#> 9
#> 10
#> 11
#> 12
#> 13
#> 14
#> 15
#> 16
#> 17
#> 18
#> 19
#> 20
#> 22 sigma
#> 23 sigma
#> 24 sigma
The following histograms show the SBC rank statistics which compare the prior parameter draws draws to the posterior draws. If the data simulation code and modeling code are both correct and consistent, then the rank statistics should be approximately uniformly distributed.
Fixed effect parameter ranks:
ranks_subgroup |>
filter(grepl("^b_", parameter)) |>
filter(!grepl("^b_sigma", parameter)) |>
plot_ranks()
Standard deviation parameter ranks:
Correlation parameter ranks:
This scenario uses unstructured correlation and does not use a subgroup variable. Assumptions:
Model formula:
#> response ~ 0 + group + time + unstr(time = time, gr = patient)
#> sigma ~ 0 + time
The prior was randomly generated and used for both simulation and analysis:
setup_prior(unstructured) |>
select(prior, class, coef, dpar) |>
as.data.frame()
#> prior class coef dpar
#> 2 normal(0.1433, 0.403) b groupgroup_1
#> 3 normal(0.2197, 2.4527) b groupgroup_2
#> 4 normal(-0.0344, 2.3087) b groupgroup_3
#> 5 normal(0.0434, 0.5924) b timetime_2
#> 6 normal(-0.1839, 2.639) b timetime_3
#> 7 normal(0.2186, 0.9006) b timetime_4
#> 8 lkj(1.4196) cortime
#> 10 normal(-0.1644, 2.893) b timetime_1 sigma
#> 11 normal(-0.2093, 2.7022) b timetime_2 sigma
#> 12 normal(-0.0947, 0.4162) b timetime_3 sigma
#> 13 normal(0.0528, 1.1022) b timetime_4 sigma
SBC checking rank statistics:
Fixed effect parameter ranks:
ranks_unstructured |>
filter(grepl("^b_", parameter)) |>
filter(!grepl("^b_sigma", parameter)) |>
plot_ranks()
Log-scale standard deviation parameter ranks:
Correlation parameter ranks:
This scenario uses an autoregressive moving average (ARMA) model with autoregressive order 1 and moving average order 1. Assumptions:
Model formula:
#> response ~ 0 + group + time + arma(time = time, gr = patient, p = 1L, q = 1L, cov = FALSE)
#> sigma ~ 0 + time
The prior was randomly generated and used for both simulation and analysis:
setup_prior(autoregressive_moving_average) |>
select(prior, class, coef, dpar) |>
as.data.frame()
#> prior class coef dpar
#> 1 uniform(0.1, 0.9) ar
#> 3 normal(-0.0789, 0.6718) b groupgroup_1
#> 4 normal(0.078, 2.5557) b groupgroup_2
#> 5 normal(0.046, 0.7607) b timetime_2
#> 6 normal(0.0105, 1.5174) b timetime_3
#> 7 uniform(0.1, 0.9) ma
#> 9 normal(0.1917, 0.647) b timetime_1 sigma
#> 10 normal(-0.1232, 1.2625) b timetime_2 sigma
#> 11 normal(0.1052, 1.2691) b timetime_3 sigma
SBC checking rank statistics:
ranks_autoregressive_moving_average <- read_ranks(
"sbc/results/autoregressive_moving_average.fst"
)
Fixed effect parameter ranks:
ranks_autoregressive_moving_average |>
filter(grepl("^b_", parameter)) |>
filter(!grepl("^b_sigma", parameter)) |>
plot_ranks()
Log-scale standard deviation parameter ranks:
Correlation parameter ranks:
This scenario is the same as above, but the correlation structure is autoregressive with order 2. Model formula:
#> response ~ 0 + group + time + ar(time = time, gr = patient, p = 2L, cov = FALSE)
#> sigma ~ 0 + time
The prior was randomly generated and used for both simulation and analysis:
setup_prior(autoregressive) |>
select(prior, class, coef, dpar) |>
as.data.frame()
#> prior class coef dpar
#> 1 uniform(0.1, 0.9) ar
#> 3 normal(-0.1299, 2.7448) b groupgroup_1
#> 4 normal(-0.062, 2.4358) b groupgroup_2
#> 5 normal(0.1115, 2.184) b timetime_2
#> 6 normal(0.0616, 0.9486) b timetime_3
#> 8 normal(0.0484, 1.2511) b timetime_1 sigma
#> 9 normal(0.1327, 1.5393) b timetime_2 sigma
#> 10 normal(0.1074, 1.795) b timetime_3 sigma
SBC checking rank statistics:
Fixed effect parameter ranks:
ranks_autoregressive |>
filter(grepl("^b_", parameter)) |>
filter(!grepl("^b_sigma", parameter)) |>
plot_ranks()
Log-scale standard deviation parameter ranks:
Correlation parameter ranks:
This scenario is the same as above, but it uses a moving average correlation structure with order 2. Model formula:
#> response ~ 0 + group + time + ma(time = time, gr = patient, q = 2L, cov = FALSE)
#> sigma ~ 0 + time
The prior was randomly generated and used for both simulation and analysis:
setup_prior(moving_average) |>
select(prior, class, coef, dpar) |>
as.data.frame()
#> prior class coef dpar
#> 2 normal(0.2346, 0.4199) b groupgroup_1
#> 3 normal(0.2109, 1.8085) b groupgroup_2
#> 4 normal(0.1955, 1.3087) b timetime_2
#> 5 normal(-0.1645, 2.5088) b timetime_3
#> 6 uniform(0.1, 0.9) ma
#> 8 normal(0.1624, 2.1076) b timetime_1 sigma
#> 9 normal(0.1165, 0.6724) b timetime_2 sigma
#> 10 normal(-0.0014, 1.0286) b timetime_3 sigma
SBC checking rank statistics:
Fixed effect parameter ranks:
ranks_moving_average |>
filter(grepl("^b_", parameter)) |>
filter(!grepl("^b_sigma", parameter)) |>
plot_ranks()
Log-scale standard deviation parameter ranks:
Correlation parameter ranks:
This scenario is the same as above, but it uses a compound symmetry correlation structure. Model formula:
#> response ~ 0 + group + time + cosy(time = time, gr = patient)
#> sigma ~ 0 + time
The prior was randomly generated and used for both simulation and analysis:
setup_prior(compound_symmetry) |>
select(prior, class, coef, dpar) |>
as.data.frame()
#> prior class coef dpar
#> 2 normal(-0.0199, 2.3874) b groupgroup_1
#> 3 normal(-0.1953, 2.3133) b groupgroup_2
#> 4 normal(0.1091, 1.0249) b timetime_2
#> 5 normal(0.0072, 2.6298) b timetime_3
#> 6 uniform(0.1, 0.9) cosy
#> 8 normal(-0.1885, 0.251) b timetime_1 sigma
#> 9 normal(0.2114, 2.6677) b timetime_2 sigma
#> 10 normal(0.0093, 1.0933) b timetime_3 sigma
SBC checking rank statistics:
Fixed effect parameter ranks:
ranks_compound_symmetry |>
filter(grepl("^b_", parameter)) |>
filter(!grepl("^b_sigma", parameter)) |>
plot_ranks()
Log-scale standard deviation parameter ranks:
Correlation parameter ranks:
This scenario is the same as above, but it uses a diagonal correlation structure (independent time points within patients). Model formula:
#> response ~ 0 + group + time
#> sigma ~ group + group:time + time
The prior was randomly generated and used for both simulation and analysis:
setup_prior(diagonal) |>
select(prior, class, coef, dpar) |>
as.data.frame()
#> prior class coef dpar
#> 2 normal(0.1858, 1.6609) b groupgroup_1
#> 3 normal(0.1542, 1.0235) b groupgroup_2
#> 4 normal(0.1078, 2.43) b timetime_2
#> 5 normal(0.1545, 1.4261) b timetime_3
#> 6 normal(0.2037, 1.2606) Intercept sigma
#> 8 normal(-0.2353, 2.852) b groupgroup_2 sigma
#> 9 normal(0.2399, 1.0988) b groupgroup_2:timetime_2 sigma
#> 10 normal(-0.2134, 0.5962) b groupgroup_2:timetime_3 sigma
#> 11 normal(0.2072, 0.7107) b timetime_2 sigma
#> 12 normal(0.1294, 1.1781) b timetime_3 sigma
SBC checking rank statistics:
Fixed effect parameter ranks:
ranks_diagonal |>
filter(grepl("^b_", parameter)) |>
filter(!grepl("^b_sigma", parameter)) |>
plot_ranks()
Log-scale standard deviation parameter ranks: