library(GLMMselect)
The GLMMselect
package provides a novel Bayesian model
selection approach for the analysis of Poisson and binary data.
GLMMselect
contains functions to simultaneously select
fixed effects and random effects for non-Gaussian data. In the
GLMMselect
package, we use pseudo likelihood method to
solve the problem that the random effects cannot be analytically
integrated out of GLMMs. In addition, we develop a fractional Bayes
factor (FBF) approach to obtain posterior probabilities of the competing
models. Full details on the methods implemented in
GLMMselect
can be found in the manuscript (Xu et
al. 202X).
This vignette contains a estimated example for count data and a case
study presented in the manuscript (Xu et al. 202X) to illustrate how
GLMMselect
works. For the simulated example, the count data
are simulated from Poisson GLMM with four candidate fixed effects and
two types of random effects (spatial random effects and overdispersion
random effects).
The function GLMMselect()
implemented in the
GLMMselect
package is described below:
GLMMselect()
performs the ARM and HCM (Xu et
al. 202X) which performs model selection for generalized linear mixed
models, given a numeric vector of binary or count data Y
, a
matrix of covariates X
, a list of covariance matrices for
random effects Sigma
, a list of design matrices for random
effects Z
, the description of the error distribution
family
, and the prior for variance component of random
effects prior
. Arguments pip_fixed
and
pip_random
are the thresholds that if the posterior
inclusion probability of fixed effects or random effects is larger than
pip_fixed
or pip_random
, the corresponding
fixed effects and random effects will be included in the best model. In
addition, the argument NumofModel
can adjust the number of
models with the largest posterior probabilities which are printed out in
the output. The function GLMMselect()
returns a list of the
indices of fixed effects and random effects that are identified in the
best model, a table of models’ posterior probabilities (MPP) which are
sorted in decreasing order, a table for fixed effects’ inference, and a
table for random effects’ inference which includes point estimates and
standard deviations for coefficients, and posterior inclusion
probabilities (PIP) for each effect.The generalized linear mixed model used in the
GLMMselect
package is \[
g(\pmb{Y})=X\pmb{\beta}+\sum_{q=1}^Q Z_q \pmb{\alpha}_q,\]
where
Currently, the GLMMselect
package can analyze binary
responses (family = "bernoulli"
) and Poisson responses
(family = "poisson"
). The manuscript that develops the
methods in GLMMselect (Xu et al. 202X) provides details on the priors
for \(\pmb{\beta}\) and \(\tau_q\), as well as details about the FBF
approach used by GLMMselect.
The GLMMselect()
function requires a vector of
observations (either Bernoulli or assumed Poisson distributed), a matrix
of covariates, a list of design matrices for random effects, and a list
of covariance matrices for random effects.
The vector of observations must be a numeric \(n \times 1\) vector. In the GLMMselect package, there is an example simulated from the Poisson generalized linear mixed model with four candidate covariates, a vector of spatial random effects, and a vector of overdispersion random effects. Here are the first five elements of the Poisson simulated vector of observations:
data("Y")
1:5]
Y[#> [1] 33 47 187 26 35
The covariate matrix passed to the function contains all candidate covariates. Each column corresponds to a candidate covariate. Each row corresponds to an observation. In this example, the covariate matrix contains four candidate covariates. The covariates in the first two columns are in the true model. Here are the first five rows of the covariate matrix:
data("X")
1:5,]
X[#> [,1] [,2] [,3] [,4]
#> [1,] 0.3586051 -1.39887035 -1.2441304 -0.97847040
#> [2,] -0.1766957 0.09031447 0.5870933 -1.19242832
#> [3,] 0.7548008 0.54639426 -0.2518881 0.02433473
#> [4,] 0.0854506 -0.97412286 0.1203769 0.04743402
#> [5,] 0.5478787 -1.19069820 -1.3385666 -0.28752443
The design matrices for candidate random effects are passed to the GLMMselect function as a list. In this example, this is a list of two design matrices for two types of random effects. The first component is for spatial random effects. The second component is for overdispersion random effects. Here are the first five rows and the first five columns for each of these design matrices:
data("Z")
1]][1:5,1:5]
Z[[#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
2]][1:5,1:5]
Z[[#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
The covariance matrices for the candidate random effects are also passed to the GLMMselect function as a list. In this example, this is a list of two covariance matrices for two types of random effects. The first component is for spatial random effects. The second componet is for overdispersion random effects. Here are the first five rows and the first five columns for each of these covariance matrices:
data("Sigma")
1]][1:5,1:5]
Sigma[[#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1.2861009 0.7911009 0.4988302 0.3047474 0.1670184
#> [2,] 0.7911009 0.9938302 0.5970181 0.3611012 0.2041574
#> [3,] 0.4988302 0.5970181 0.8561012 0.4964282 0.2877719
#> [4,] 0.3047474 0.3611012 0.4964282 0.7827719 0.4448714
#> [5,] 0.1670184 0.2041574 0.2877719 0.4448714 0.7498331
2]][1:5,1:5]
Sigma[[#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 0 0 0 0
#> [2,] 0 1 0 0 0
#> [3,] 0 0 1 0 0
#> [4,] 0 0 0 1 0
#> [5,] 0 0 0 0 1
Data Y
,X
, Sigma
, and
Z
above are attached in the GLMMselect
package.
In this example, we use GLMMselect
to analyze Poisson
count data with an approximate reference (AR) prior for the variance
components of random effects.
<- GLMMselect(Y=Y, X=X, Sigma=Sigma, Z=Z,
Model_selection_output family="poisson", prior="AR", offset=NULL)
$BestModel
Model_selection_output#> $covariate_inclusion
#> [1] 1 2
#>
#> $random_effect_inclusion
#> [1] 1
$PosteriorProb
Model_selection_output#> x1 x2 x3 x4 r1 r2 MPP
#> 1 1 1 0 0 1 0 0.624
#> 2 1 1 1 0 1 0 0.137
#> 3 1 1 0 1 1 0 0.114
#> 4 1 1 1 1 1 0 0.055
#> 5 1 1 0 0 0 1 0.032
#> 6 1 1 0 0 1 1 0.022
#> 7 1 1 1 0 0 1 0.006
#> 8 1 1 0 1 0 1 0.004
#> 9 1 1 1 0 1 1 0.002
#> 10 1 1 0 1 1 1 0.002
$FixedEffect
Model_selection_output#> Est SD PIP
#> x1 1.028 0.028 1.000
#> x2 0.997 0.027 1.000
#> x3 -0.012 0.020 0.202
#> x4 0.000 0.023 0.177
$RandomEffect
Model_selection_output#> Est SD PIP
#> r1 0.042 0.016 0.956
#> r2 0.012 0.006 0.070
GLMMselect()
outputs the indices of the covariates and
the indices of the random effects which are in the best model. The true
model contains the first two covariates and spatial random effects.
GLMMselect
correctly identifies these two covariates and
the spatial random effects.
Here is a case study to help illustrate how to use
GLMMselect
. This dataset provides the number of male lip
cancer cases in the 56 counties of Scotland during the period 1975-1980,
as well as the percentage of the work force employed in agriculture,
fishing, or forestry (AFF) as a covariate. The model we consider is
\[
y_i|\mu_i \stackrel{ind}{\sim} \mbox{Poisson}(\mu_i), \ \ \ i=1\dots 56,
\\
\log(\mu_i) = \log(n_i)+\pmb{x}_i^T\pmb{\beta}+\alpha_{1i}+\alpha_{2i},
\\
\pmb{\alpha}_1 \sim N(\pmb{0},\tau_1 \Sigma_1), \\
\pmb{\alpha}_2 \sim N(\pmb{0}, \tau_2 \Sigma_2).
\]
\(n_i\) is the expected number of lip cancer cases in the \(i^{th}\) county, which are assumed to be known constants. \(\pmb{\alpha}_1\) is a vector of spatial random effects. \(\pmb{\alpha}_2\) is a vector of overdispersion random effects.
In this example, we use GLMMselect
to analyze lip cancer
data with a half Cauchy (HC) prior for variance components of random
effects. Data lipcancer_Y
,lipcancer_X
,
lipcancer_Sigma
, and lipcancer_Z
are attached
in the GLMMselect
package.
<- GLMMselect(Y=lipcancer_Y, X=lipcancer_X, Sigma=lipcancer_Sigma, Z=lipcancer_Z,
lip_cancer_output family="poisson", prior="HC", offset=lipcancer_offset)
$BestModel
lip_cancer_output#> $covariate_inclusion
#> [1] 1
#>
#> $random_effect_inclusion
#> [1] 1
$PosteriorProb
lip_cancer_output#> x1 r1 r2 MPP
#> 1 1 1 0 0.905
#> 2 0 1 0 0.084
#> 3 1 1 1 0.009
#> 4 0 1 1 0.002
#> 5 1 0 1 0.000
#> 6 0 0 1 0.000
#> 7 1 0 0 0.000
#> 8 0 0 0 0.000
$FixedEffect
lip_cancer_output#> Est SD PIP
#> x1 0.428 0.128 0.914
$RandomEffect
lip_cancer_output#> Est SD PIP
#> r1 0.486 0.157 1.000
#> r2 0.000 0.052 0.011
ref.ICAR
provides an objective Bayesian approach for
modeling spatially correlated areal data using an intrinsic conditional
autoregressive prior on a vector of spatial random effects.
Xu, Shuangshuang, Ferreira, M. A. R., Porter, E. M., and Franck, C. T. (202X). Bayesian Model Selection for Generalized Linear Mixed Models, Biometrics, under review.