Welcome to nimbleEcology
. This package provides
distributions that can be used in NIMBLE models for common ecological
model components. These include:
NIMBLE is a system for writing hierarchical statistical models and algorithms. It is distributed as an R package nimble. NIMBLE stands for “Numerical Inference for statistical Models using Bayesian and Likelihood Estimation”. NIMBLE includes:
A dialect of the BUGS model language that is extensible. NIMBLE uses almost the same model code as WinBUGS, OpenBUGS, and JAGS. Being “extensible” means that it is possible to write new functions and distributions and use them in your models.
An algorithm library including Markov chain Monte Carlo (MCMC) and other methods.
A compiler that generates C++ for each model and algorithm, compiles the C++, and lets you use it from R. You don’t need to know anything about C++ to use nimble.
More information about NIMBLE can be found at https://r-nimble.org.
The paper that describes NIMBLE is here.
The best way to seek user support is the nimble-users list. Information on how to join can be found at https://r-nimble.org.
The distributions provided in nimbleEcology
let you
simplify model code and the algorithms that use it, such as MCMC. For
the ecological models in nimbleEcology
, the simplification
comes from removing some discrete latent states from the model and
instead doing the corresponding probability (or likelihood) calculations
in a specialized distribution that marginalizes over the latent
states.
For each of the ecological model components provided by
nimbleEcology
, here are the discrete latent states that are
replaced by use of a marginalized distribution:
Before going further, let’s illustrate how nimbleEcology
can be used for a basic occupancy model.
Occupancy models are used for data from repeated visits to a set of
sites, where the detection (1) or non-detection (0) of a species of
interest is recorded on each visit. Define y[i, j]
as the
observation from site i
on visit j
.
y[i, j]
is 1 if the species was seen and 0 if not.
Typical code for for an occupancy model would be as follows.
Naturally, this is written for nimble
, but the same code
should work for JAGS or BUGS (WinBUGS or OpenBUGS) when used as needed
for those packages.
library(nimble)
#> nimble version 1.2.0 is loaded.
#> For more information on NIMBLE and a User Manual,
#> please visit https://R-nimble.org.
#>
#> Note for advanced users who have written their own MCMC samplers:
#> As of version 0.13.0, NIMBLE's protocol for handling posterior
#> predictive nodes has changed in a way that could affect user-defined
#> samplers in some situations. Please see Section 15.5.1 of the User Manual.
#>
#> Attaching package: 'nimble'
#> The following object is masked from 'package:stats':
#>
#> simulate
#> The following object is masked from 'package:base':
#>
#> declare
library(nimbleEcology)
#> Loading nimbleEcology. Registering multiple variants of the following distributions:
#> dOcc, dDynOcc, dCJS, dHMM, dDHMM, dNmixture.
occupancy_code <- nimbleCode({
psi ~ dunif(0,1)
p ~ dunif (0,1)
for(i in 1:nSites) {
z[i] ~ dbern(psi)
for(j in 1:nVisits) {
y[i, j] ~ dbern(z[i] * p)
}
}
})
In this code:
psi
is occupancy probability;p
is detection probability;z[i]
is the latent state of whether a site is really
occupied (z[i]
= 1) or not (z[i]
= 0);nSites
is the number of sites; andnVisits
is the number of sampling visits to each
site.The new version of this model using nimbleEcology
’s
specialized occupancy distribution will only work in nimble
(not JAGS or BUGS). It is:
occupancy_code_new <- nimbleCode({
psi ~ dunif(0,1)
p ~ dunif (0,1)
for(i in 1:nSites) {
y[i, 1:nVisits] ~ dOcc_s(probOcc = psi, probDetect = p, len = nVisits)
}
})
In the new code, the vector of data from all visits to site
i
, namely y[i, 1:nVisits]
, has its likelihood
contribution calculated in one step, dOcc_s
. This occupancy
distribution calculates the total probability of the data by summing
over the cases that the site is occupied or unoccupied. That means that
z[i]
is not needed in the model, and MCMC will not need to
sample over z[i]
. Details of all calculations, and
discussion of the pros and cons of changing models in this way, are
given later this vignette.
The _s
part of dOcc_s
means that
p
is a scalar, i.e. it does not vary with visit. If it
should vary with visit, a condition sometimes described as being
time-dependent, it would need to be provided as a vector, and the
distribution function should be dOcc_v
.
We can run an MCMC for this model in the following steps:
The function nimbleMCMC
does all of these steps for you.
The function runMCMC
does steps 5-6 for you, with
convenient management of options such as discarding burn-in samples. The
full set of steps allows greater control over how you use a model and
configure and use an MCMC. We will go through the steps 1-4 and then use
runMCMC
for steps 5-6.
In this example, we also need simulated data. We can use the same model to create simulated data, rather than writing separate R code for that purpose.
occupancy_model <- nimbleModel(occupancy_code,
constants = list(nSites = 50, nVisits = 5))
#> Defining model
#> Building model
#> Running calculate on model
#> [Note] Any error reports that follow may simply reflect missing values in model variables.
#> Checking model sizes and dimensions
#> [Note] This model is not fully initialized. This is not an error.
#> To see which variables are not initialized, use model$initializeInfo().
#> For more information on model initialization, see help(modelInitialization).
occupancy_model$psi <- 0.7
occupancy_model$p <- 0.15
simNodes <- occupancy_model$getDependencies(c("psi", "p"), self = FALSE)
occupancy_model$simulate(simNodes)
occupancy_model$z
#> [1] 0 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1 0 0 1 0 0 1 0 0 1 1 1 1 1 0 1 0 0 0 0 1
#> [39] 1 1 1 1 1 1 1 0 1 0 1 0
Next we show all of the same steps, except for simulating data, using the new version of the model.
occupancy_model_new <- nimbleModel(occupancy_code_new,
constants = list(nSites = 50, nVisits = 5),
data = list(y = occupancy_model$y),
inits = list(psi = 0.7, p = 0.15))
#> Defining model
#> Building model
#> Setting data and initial values
#> Running calculate on model
#> [Note] Any error reports that follow may simply reflect missing values in model variables.
#> Checking model sizes and dimensions
MCMC_new <- buildMCMC(occupancy_model_new) ## This will use default call to configureMCMC.
Coccupancy_model_new <- compileNimble(occupancy_model_new)
#> Compiling
#> [Note] This may take a minute.
#> [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
CMCMC_new <- compileNimble(MCMC_new, project = occupancy_model_new)
#> Compiling
#> [Note] This may take a minute.
#> [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
The results of the two versions match closely.
The posterior density plots show that the familiar, conventional
version of the model yields the same posterior distribution as the new
version, which uses dOcc_s
.
It is useful that the new way to write the model does not have
discrete latent states. Since this example also does not have other
latent states or random effects, we can use it simply as a likelihood or
posterior density calculator. More about how to do so can be found in
the nimble User Manual. Here we illustrate making a compiled
nimbleFunction
for likelihood calculations for parameters
psi
and p
and maximizing the likelihood using
R’s optim
.
CalcLogLik <- nimbleFunction(
setup = function(model, nodes)
calcNodes <- model$getDependencies(nodes, self = FALSE),
run = function(v = double(1)) {
values(model, nodes) <<- v
return(model$calculate(calcNodes))
returnType(double(0))
}
)
OccLogLik <- CalcLogLik(occupancy_model_new, c("psi", "p"))
COccLogLik <- compileNimble(OccLogLik, project = occupancy_model_new)
#> Compiling
#> [Note] This may take a minute.
#> [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
nimbleEcology
.As of nimble
version 1.0.0, there is a system for
automatic (or algorithmic) differentiation, known as AD. This is used by
algorithms such as Hamiltonian Monte Carlo (see package nimbleHMC
)
and Laplace approximation (buildLaplace
in
nimble
).
The distributions provided in nimbleEcology
support AD
as much as possible. There are three main points to keep in mind:
nimbleEcology
distributions
are all discrete values. It is possible to take derivatives
with respect to continuous parameters of the distributions. If the
“data” are marked as data
in the model (and hence will not
be sampled by MCMC, for example), there is no problem.nimbleEcology
distributions,
the values “baked in” sizes of variables. In some cases (such ash dHMM
and dDHMM) they also include the data values. See the help page for each
distribution for more details (e.g. help(dOcc)
). If the
data are scientific data that do not need to be changed after creating
the model and algorithm, there is no problem.dNmixture
portion of a
distribution name below should be replaced with
dNmixtureAD
.nimbleEcology
In this section, we introduce each of the nimbleEcology
distributions in more detail. We will summarize the calculations using
mathematical notation and then describe how to use the distributions in
a nimble
model.
Some distribution names are followed by a suffix indicating the type
of some parameters, for example the _s
in
dOcc_s
. NIMBLE uses a static typing system, meaning that a
function must know in advance if a certain argument will be a scalar,
vector, or matrix. There may be notation such as sv
or
svm
if there are two or three parameters that can be
time-independent (s
) or time-dependent (v
or
m
) in one or more dimensions. In general, s
corresponds to a scalar argument, v
to a vector argument,
and m
to a matrix argument. The order of these type
indicators will correspond to the order of the relevant parameters, but
always check the documentation when using a new distribution with the R
function ?
(e.g., both ?dOcc
and
?dOcc_s
work).
The static typing requirement may be relaxed somewhat in the future.
Cormack-Jolly-Seber models give the probability of a capture history
for each of many individuals, conditional on first capture, based on
parameters for survival and detection probability.
nimbleEcology
provides a distribution for individual
capture histories, with variants for time-independent and time-dependent
survival and/or detection probability. Of course, the survival and
detection parameters for the CJS probability may themselves depend on
other parameters and/or random effects. The rest of this summary will
focus on one individual’s capture history.
Define \(\phi_t\) as survival from time \(t\) to \(t+1\) and \(\mathbf{\phi} = (\phi_1, \ldots, \phi_{T-1})\), where \(T\) is the length of the series. We use “time” and “sampling occasion” as synonyms in this context, so \(T\) is the number of sampling occasions. (Be careful with time indexing. Sometimes you might see \(\phi_t\) defined as survival from time \(t-1\) to \(t\).) Define \(p_t\) as detection probability at time \(t\) and \(\mathbf{p} = (p_1, \ldots, p_T)\). Define the capture history as \(\mathbf{y} = y_{1:T} = (y_1, \ldots, y_T)\), where each \(y_t\) is 0 or 1. The notation \(y_{i:j}\) means the sequence of observations from time \(i\) to time \(j\). The first observation of the capture history should always be 1: \(y_1 = 1\). The CJS probability calculations condition on this first capture.
There are multiple ways to write the CJS probability. We will do so in a state-space format because that leads to the more general DHMM case next. The probability of observations given parameters, \(P(\mathbf{y} | \mathbf{\phi}, \mathbf{p})\), is factored as: \[ P(\mathbf{y} | \mathbf{\phi}, \mathbf{p}) = \prod_{t = 1}^{T-1} P(y_{t+1} | y_{1:t}, \mathbf{\phi}, \mathbf{p}) \]
Each factor \(P(y_{t+1} | y_{1:t}, \mathbf{\phi}, \mathbf{p})\) is calculated as: \[ P(y_{t+1} | y_{1:t}, \mathbf{\phi}, \mathbf{p}) = I(y_{t+1} = 1) (A_{t+1} p_{t+1}) + I(y_{t+1} = 0) (A_{t+1} (1-p_{t+1}) + (1-A_{t+1})) \] The indicator function \(I(y_t = 1)\) is 1 if it \(y_t\) is 1, 0 otherwise, and vice versa for \(I(y_t = 0)\). Here \(A_{t+1}\) is the probability that the individual is alive at time \(t+1\) given \(y_{1:t}\), the data up to the previous time. This is calculated as: \[ A_{t+1} = G_{t} \phi_{t} \] where \(G_{t}\) is the probability that the individual is alive at time \(t\) given \(y_{1:t}\), the data up to the current time. This is calculated as: \[ G_{t} = I(y_t = 1) 1 + I(y_t = 0) \frac{A_t (1-p_t)}{A_t (1-p_t) + (1-A_t)} \] The sequential calculation is initialized with \(G_1 = 1\). For time step \(t+1\), we calculate \(A_{t+1}\), then \(P(y_{t+1} | y_{1:t}, \mathbf{\phi}, \mathbf{p})\), then \(G_{t+1}\), leaving us ready for time step \(t+2\). This is a simple case of a hidden Markov model where the latent state, alive or dead, is not written explicitly.
In the cases with time-independent survival or capture probability, we simply drop the time indexing for the corresponding parameter.
nimbleEcology
CJS models are available in four distributions in
nimbleEcology
. These differ only in whether survival
probability and/or capture probability are time-dependent or
time-independent, yielding four combinations:
dCJS_ss
: Both are time-independent (scalar).dCJS_sv
: Survival is time-independent (scalar). Capture
probability is time-dependent (vector).dCJS_vs
: Survival is time-dependent (vector). Capture
probability is time-independent (scalar).dCJS_vv
: Both are time-dependent (vector).The usage for each is similar. An example for dCJS_vs
is:
y[i, 1:T] ~ dCJS_sv(probSurvive = phi, probCapture = p[i, 1:T], len = T)
Note the following points:
y[i, 1:T]
is a vector of the capture history. It is
written as if i
indexes individual, but it could be any
vector in any variable in the model.dCJS_sv
are named. As in R, this is
optional but helpful. Without names, the order matters.probSurvive
is provided as a scalar value, assuming
there is a variable called phi
.probSurvive
is a vector
(dOcc_vs
and dOcc_vv
), the \(t^{\mbox{th}}\) element of
probSurvive
is \(\phi_t\)
above, namely the probability of survival from occasion \(t\) to \(t+1\).probCapture
is provided as a vector value, assuming
there is a matrix variable called p
. The value of
probCapture
could be any vector from any variable in the
model.probCapture[t]
(i.e., the \(t^{\mbox{th}}\) element of
probCapture
, which is p[i, t]
in this example)
is \(p_t\) above, namely the
probability of capture, if alive, at time \(t\).len
is in some cases redundant
(the information could be determined by the length of other inputs), but
nevertheless it is required.nimbleEcology
HMMs and DHMMs are available in four distributions in
nimbleEcology
. These differ only in whether transition
and/or observation probabilities are time-dependent or time-independent,
yielding four combinations:
dHMM
: Both are time-independent.dDHMM
: State transitions are time-dependent (dynamic).
Observation probabilities are time-independent.dHMMo
: Observation probabilities are time-dependent.
State transitions are time-independent (not dynamic).dDHMMo
: Both are time-dependent.In this notation, the leading D
is for “dynamic”
(time-dependent state transitions), while the trailing “o” is for
“observations” being time-dependent.
The usage for each is similar. An example for dDHMM
is:
y[i, 1:T] ~ dDHMM(init = initial_probs[i, 1:T], obsProb = p[1:nStates, 1:nCat], transProb = Trans[i, 1:nStates, 1:nStates, 1:(T-1)], len = T)
Note the following points:
i
indexes individuals
in the model, but this is arbitrary as an example.nStates
is \(S\)
above.nCat
is \(K\)
above.init[i]
is the initial probability of being in state
i
, namely \(A_{i, 1}\)
above.obsProb[i, j]
(i.e., p[i, j]
in this
example) is probability of observing an individual who is truly in state
i
as being in observation state j
. This is
\(p_{i, j}\) above if indexing by \(t\) is not needed. If observation
probabilities are time-dependent (in dHMMo
and
dDHMMo
), then obsProb[i, j, t]
is \(p_{i, j, t}\) above.transProb[i, j, t]
(i.e., Trans[i, j, t]
in this example) is the probability that an individual who is truly in
state i
changes to state j
during the
transition from time step t
to t+1
. This is
\(\psi_{i,j,t}\) above.len
is the length of the observation record,
T
in this example.len
must match the length of the data,
y[i, 1:T]
in this example.obsProb
must be \(K \times S\) in the time-independent case
(dHMM
or dDHMM
) or \(K \times S \times T\) in the time-dependent
case (dHMMo
or dDHMMo
).transProb
must be \(S \times S\) in the time-independent case
(dHMM
or dHMMo
) or \(S \times S \times (T-1)\) in the
time-dependent case (dDHMM
or dDHMMo
). The
last dimension is one less than \(T\)
because no transition to time \(T+1\)
is needed.An occupancy model gives the probability of a series of
detection/non-detection records for a species during multiple visits to
a site. The occupancy distributions in nimbleEcology
give
the probability of the detection history for one site, so this summary
focuses on data from one site.
Define \(y_t\) to be the observation at time \(t\), with \(y_t = 1\) for a detection and \(y_t = 0\) for a non-detection. Again, we use “time” as a synonym for “sampling occasion”. Again, define the vector of observations as \(\mathbf{y} = (y_1, \ldots, y_T)\), where \(T\) is the number of sampling occasions.
Define \(\psi\) as the probability that a site is occupied. Define \(p_t\) as the probability of a detection on sampling occasion \(t\) if the site is occupied, and \(\mathbf{p} = (p_1, \ldots, p_T)\). Then the probability of the data given the parameters is: \[ P(\mathbf{y} | \psi, \mathbf{p}) = \psi \prod_{t = 1}^T p_t^{y_t} (1-p_t)^{1-y_t} + (1-\psi) I\left(\sum_{t=1}^T y_t= 0 \right) \] The indicator function usage in the last term, \(I(\cdot)\), is 1 if the given summation is 0, i.e. if no detections were made. Otherwise it is 0.
nimbleEcology
Occupancy models are available in two distributions in
nimbleEcology
. These differ only in whether detection
probability depends on time or not:
dOcc_s
: Detection probability is time-independent
(scalar).dOcc_v
: Detection probability is time-dependent
(vector).An example for dOcc_v
is:
y[i, 1:T] ~ dOcc_v(probOcc = psi, probDetect = p[i, 1:T], len = T)
Note the following points:
i
indexes site, but the variables
could be arranged in other ways.y[i, 1:T]
is the detection record.probOcc
is the probability of occupancy, \(\psi\) above.probDetect
is the vector of detection probabilities,
\(\mathbf{p}\) above. In the case of
dOcc_s
, probDetect
would be a scalar.len
is the length of the detection record.Dynamic occupancy models give the probability of detection records
from multiple seasons (primary periods) in each of which there were
multiple sampling occasions (secondary periods) at each of multiple
sites. The dynamic occupancy distribution in nimbleEcology
provides probability calculations for data from one site at a time.
We will use “year” for primary periods and “time” or “sampling occasion” as above for secondary periods. Define \(y_{r, t}\) as the observation (1 or 0) on sampling occasion \(t\) of year \(r\). Define \(\mathbf{y}_r\) as the detection history in year \(r\), i.e. \(\mathbf{y}_r = (y_{r, 1}, \ldots, y_{r, T})\) . Define \(\phi_t\) as the probability of being occupied at time \(t+1\) given the site was occupied at time \(t\), called “persistence”. Define \(\gamma_t\) as the probability of being occupied at time \(t+1\) given the site was unoccupied at time \(t\), called “colonization”. Define \(p_{r, t}\) as the detection probability on sampling occasion \(t\) of year \(r\) given the site is occupied.
The probability of all the data given parameters is: \[ P(\mathbf{y} | \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p}) = \prod_{r = 1}^R P(\mathbf{y}_{r} | \mathbf{y}_{1:r-1}, \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p}) \] Each factor \(P(\mathbf{y}_{r} | \mathbf{y}_{1:r-1}, \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p})\) is calculated as: \[ P(\mathbf{y}_{r} | \mathbf{y}_{1:r-1}, \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p}) = A_{r} \prod_{t = 1}^T p_{r,t}^{y_{r,t}} (1-p_{r,t})^{1-y_{r,t}} + (1-A_{r}) I\left(\sum_{t=1}^T y_{r,t} = 0 \right) \] Here \(A_r\) is the probability that the site is occupied in year \(r\) given observations up to the previous year \(\mathbf{y}_{1:r-1}\). Otherwise, this equation is just like the occupancy model above, except there are indices for year \(r\) in many places. \(A_r\) is calculated as: \[ A_r = G_{r-1} \phi_{r-1} + (1-G_{r-1}) \gamma_{r-1} \] Here \(G_r\) is the probability that the site is occupied given the data up to time \(r\), \(\mathbf{y}_{1:r}\). This is calculated as \[ G_r = \frac{A_{r} \prod_{t = 1}^T p_{r,t}^{y_{r,t}} (1-p_{r,t})^{1-y_{r,t}}}{P(\mathbf{y}_{r} | \mathbf{y}_{1:r-1}, \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p})} \]
The sequential calculation is initiated with \(A_1\), which is natural to think of as “\(\psi_1\)”, probability of occupancy in the first year. Then for year \(r\), starting with \(r = 1\), we calculate \(P(\mathbf{y}_{r} | \mathbf{y}_{1:r-1}, \mathbf{\phi}, \mathbf{\gamma}, \mathbf{p})\). If \(r < R\), we calculate \(G_r\) and then \(A_{r+1}\), leaving us ready to increment \(r\) and iterate.
nimbleEcology
Dynamic occupancy models are available in twelve parameterizations in
nimbleEcology
. These differ in whether persistence,
colonization, and/or detection probabilities are time-dependent, with a
“s” (time-independent) and “v” (time-dependent) notation similar to the
distributions above. Detection probabilities can be the same for all
seasons and sampling events (“s”), constant within each season but
different season to season (“v”), or time-dependent by sampling event
within season (“m”), in which case a matrix argument is required. The
distributions are named by dDynOcc_
followed by three
letters. Each letter indicates the typing (or dimension) of the
persistence, colonization, and detection probabilities,
respectively:
dDynOcc_s**
functions take time-independent (scalar)
persistence probabilities, while dDynOcc_v**
functions take
time-dependent (vector) persistence probabilitiesdDynOcc_*s*
functions take time-independent (scalar)
colonization probabilities, while dDynOcc_*v*
functions take
time-dependent (vector) colonization probabilitiesdDynOcc_**s
functions take time-independent (scalar)
observation probabilities, while dDynOcc_**v
functions take
observation probabilities dependent on time step (vector) and
dDynOcc_**m
functions take observation probabilities
dependent on both time step and observation event (matrix)Expanding these typing possibilities gives \(2 \times 2 \times 3 = 12\) total functions:
dDynOcc_sss
dDynOcc_svs
dDynOcc_vss
dDynOcc_vvs
dDynOcc_ssv
dDynOcc_svv
dDynOcc_vsv
dDynOcc_vvv
dDynOcc_ssm
dDynOcc_svm
dDynOcc_vsm
dDynOcc_vvm
An example for dDynOcc_svs
is:
y[i, 1:T] ~ dDynOcc_svs(init = psi1[i], probPersist = phi[i], probColonize = gamma[i, 1:T], p = p, len = T)
Note the following points:
i
indexes the individual site, but the variables could be arranged in
other ways.y[i, 1:T]
is the detection record.probPersist
is the probability of persistence, \(\phi\) above.probColonize
is the vector of detection probabilities,
\(\mathbf{\gamma}\) above. In the case
of dDynOcc_*s*
, probColonize
would be a
scalar.len
is the length of the detection record.p
here is a single constant value of observation
probability for all samples. If p
changed with season or
season and observation event, we would need to use a different function
(dDynOcc_**v
or dDynOcc_**m
).An N-mixture model gives the probability of a set of counts from
repeated visits to each of multiple sites. The N-mixture distribution in
nimbleEcology
gives probability calculations for data from
one site.
Define \(y_t\) as the number of individuals counted at the site on sampling occasion (time) \(t\). Define \(\mathbf{y} = (y_1, \ldots, y_t)\). Define \(\lambda\) as the average density of individuals, such that the true number of individuals, \(N\), follows a Poisson distribution with mean \(\lambda\). Define \(p_t\) to be the detection probability for a single individual at time \(t\), and \(\mathbf{p} = (p_1, \ldots, p_t)\).
The probability of the data given the parameters is: \[ P(\mathbf{y} | \lambda, \mathbf{p}) = \sum_{N = 1}^\infty \left[ P(N | \lambda) \prod_{t = 1}^T P(y_t | N) \right] \] where \(P(N | \lambda)\) is a Poisson probability and \(P(y_t | N)\) is a binomial probability. That is, \(y_t \sim \mbox{Binomial}(N, p_t)\), and the \(y_t\)s are independent.
In practice, the summation over \(N\) can start at a value greater than 0 and must be truncated at some value less than infinity. Two options are provided for the range of summation:
If we consider a single \(y_t\), then \(N - y_t | y_t \sim \mbox{Poisson}(\lambda (1-p_t))\) (See opening example of Royle and Dorazio, 2008). Thus, a natural upper end for the summation range of \(N\) would be \(y_t\) plus a very high quantile of The \(\mbox{Poisson}(\lambda (1-p_t))\) distribution. For a set of observations, a natural choice would be the maximum of such values across the observation times. We use the 0.99999 quantile to be conservative.
Correspondingly, the summation can begin at smallest of the 0.00001 quantiles of \(N | y_t\). If \(p_t\) is small, this can be considerably larger than the maximum value of \(y_t\), allowing more efficient computation.
nimbleEcology
Standard (binomial-Poisson) N-mixture models are available in two
distributions in nimbleEcology
. They differ in whether
probability of detection is visit-dependent (vector case, corresponding
to dNmixture_v
) or visit-independent (scalar,
dNmixture_s
).
An example is:
y[i, 1:T] ~ dNmixture_v(lambda = lambda, p = p[1:T], Nmin = Nmin, Nmax = Nmax, len = T)
i
indexes the individual site, but the variables could be arranged in
other ways.lambda
is \(\lambda\)
above.p[1:T]
is \(\mathbf{p}\) above. If \(p\) were constant across visits, we would
use dNmixture_s
and a scalar value of p
.len
is \(T\).Nmin
and Nmax
provide the lower and upper
bounds for the sum over Ns (option 1 above). If both are set to
-1
, bounds are chosen dynamically using quantiles of the
Poisson distribution (option 2 above).Three variations of the N-mixture model are also available, in which
the Poisson distribution is replaced by negative binomial, the binomial
is replaced by beta binomial, or both. These are called
dNmixture_BNB_*
, dNmixture_BBP_*
, and
dNmixture_BBNB_*
, respectively. Each has three suffixes:
_v
and _s
correspond to the cases provided
above, and _oneObs
distributions are provided for the case
where the data are scalar (i.e., only one observation at the site). No
_oneObs
observation is provided for the default
dNmixture
because
dNmixture(x[1:1], lambda, prob[1:1])
is equivalent to
dpois(x[1:1], lambda * prob[1:1])
.
These combinations lead to the following set of 11 N-mixture distributions:
dNmixture_v
dNmixture_s
dNmixture_BNB_v
dNmixture_BNB_s
dNmixture_BNB_oneObs
dNmixture_BBP_v
dNmixture_BBP_s
dNmixture_BBP_oneObs
dNmixture_BBNB_v
dNmixture_BBNB_s
dNmixture_BBNB_oneObs
If an N-mixture distribution needs to be used with AD (e.g. for HMC
or Laplace approximation), replace dNmixture
with
dNmixtureAD
. In that case, one must provide
Nmin
and Nmax
values manually; the second
(heuristic) option described above is not available.
Further details on all the distributions in
nimbleEcology
can be found on the help pages within R,
e.g. help(dNmixture)
.