MakeADFun
is the high level version. It
actually creates several internal tapes gathered in a ‘model object’
often referred to as ‘obj’.MakeTape
is the low level version. It creates
a single tape.There are things you can do with the tape that you can’t do with ‘obj’:
To understand these concepts let’s look at some examples.
A simple R function such as
is turned into a ‘tape’ using
## Object of class='Tape'
## : R^2 -> R^1
A number of methods are available for the tape.
## [1] "jacobian" "simplify" "print" "jacfun" "atomic"
## [6] "laplace" "newton" "graph" "data.frame" "node"
## [11] "par"
These methods are part of the tape object and documented under
?Tape
. We can print the tape using the print
method:
## OpName: Node: Value: Deriv: Index: Inputs:
## InvOp 0 0 NA 0
## InvOp 1 0 NA 1
## ConstOp 2 1.23 NA 2
## MulOp 3 0 NA 3 1 2
## AddOp 4 0 NA 4 0 3
## ExpOp 5 1 NA 5 4
## DepOp 6 1 NA 6 5
This is the internal tape representation of the function. At first sight it looks a bit like a data frame. The first column (‘OpName’) holds the tape operators and the second column (‘Node’) the operator index. Each operator calculates an output variable which sits in the ‘value’ column. Each value has a derivative (‘deriv’ column) which is not yet allocated (‘NA’). In addition, each value has an ‘index’. The final column holds the inputs to each node. For example, node number 3 (multiplication) has inputs ‘1’ and ‘2’ referring to the ‘index’ / ‘value’ column.
The tape can be evaluated as a normal R function. This sets the independent values (‘InvOp’) to the inputs and triggers a forward operator loop effectively updating the entire value array:
## [1] 2751.771
To see how the internal structures have been updated, we print the tape again:
## OpName: Node: Value: Deriv: Index: Inputs:
## InvOp 0 3 NA 0
## InvOp 1 4 NA 1
## ConstOp 2 1.23 NA 2
## MulOp 3 4.92 NA 3 1 2
## AddOp 4 7.92 NA 4 0 3
## ExpOp 5 2751.77 NA 5 4
## DepOp 6 2751.77 NA 6 5
Note how the function output or dependent variable (‘DepOp’) is located at the end of the value array.
Derivatives are calculated using the jacobian()
method.
This method seeds the derivative of the dependent variable to one, and
triggers a reverse operator loop effectively updating the
entire ‘deriv’ array:
## [,1] [,2]
## [1,] 2751.771 3384.678
Again, we print the tape to see the effect on the internal representation:
## OpName: Node: Value: Deriv: Index: Inputs:
## InvOp 0 3 2751.77 0
## InvOp 1 4 3384.68 1
## ConstOp 2 1.23 0 2
## MulOp 3 4.92 2751.77 3 1 2
## AddOp 4 7.92 2751.77 4 0 3
## ExpOp 5 2751.77 1 5 4
## DepOp 6 2751.77 1 6 5
Note how the derivatives are located at the beginning of the ‘deriv’ array.
The operator graph is defined by connecting operators \(i\) and \(j\) ( \(i
\rightarrow j\) ) if some output of operator \(i\) is input to operator \(j\). Its adjacency matrix is obtained using
the graph()
method:
## 7 x 7 sparse Matrix of class "ngCMatrix"
## Inv Inv Const Mul Add Exp Dep
## Inv . . . . | . .
## Inv . . . | . . .
## Const . . . | . . .
## Mul . . . . | . .
## Add . . . . . | .
## Exp . . . . . . |
## Dep . . . . . . .
It is conveniently visualized using the igraph package:
The tape \(F\) can be reused to create new tapes. Consider for example a new tape \(G\) which evaluates \(F\) twice:
Compare this with the graph of \(F\): Clearly every node (except inputs) have been doubled.
When applying the same function many times it is sometimes beneficial
to collapse its nodes into a single ‘super node’ rather than replaying
all individual operators over and over. In our example we could turn
\(F\) into such a ‘super node’ using
the atomic()
method:
Now, construct \(G\) again and see what it looks like:
Internally, these atomic operators are shared pointers to a single instance of the actual tape of \(F\). Therefore, auto-generated atomic functions generally reduce memory. But there are disadvantages, so they should normally only be used in case of memory issues.
It often happens that a tape has redundant operations. For example, consider the tape
F <- MakeTape(function(x) {
y1 <- sin(x[1] + x[2])
y2 <- sin(x[1] + x[2])
y3 <- cos(x[1] + x[2])
y1+y2
}, numeric(2))
This tape representation is not optimal because
y3
which doesn’t
affect the output.y1
and y2
are
identical, so y2
could have just copied
y1
.We visualize the graph of the inefficient tape and note that it
contains two sin
operators and one cos
operator:
Two methods are available to simplify the tape
F$simplify("eliminate")
: Eliminates operations that do
not affect the output.F$simplify("optimize")
: In addition to
eliminate
also remaps expressions that have already been
computed.By running
the tape is modified in-place, and no longer contains the
cos
operator:
The full optimization
also removes the redundant sin
expression which has
already been calculated.:
In contrast to MakeTape
, we note that
MakeADFun
automatically calls
F$simplify("optimize")
for all tapes.
We’ve seen how the jacobian()
method evaluates the
jacobian of a tape. An alternative jacfun()
exists to
return the jacobian as a new tape.
For example, consider a tape of the diff
operator:
## Object of class='Tape'
## : R^5 -> R^4
The jacobian tape is obtained by
## Object of class='Tape'
## : R^5 -> R^20
And can be evaluated as
## [,1] [,2] [,3] [,4] [,5]
## [1,] -1 1 0 0 0
## [2,] 0 -1 1 0 0
## [3,] 0 0 -1 1 0
## [4,] 0 0 0 -1 1
To get the sparse jacobian do instead:
## Object of class='Tape'
## : R^5 -> R^8
Compared to J
, the tape Js
only calculates
the non-zero elements stored as a sparse matrix.
## 4 x 5 sparse Matrix of class "dgCMatrix"
##
## [1,] -1 1 . . .
## [2,] . -1 1 . .
## [3,] . . -1 1 .
## [4,] . . . -1 1
Note that both J
and Js
can be applied to
construct new tapes.
Consider the following test function that multiplies a matrix by it self:
\[ f(x)= \begin{pmatrix} x_{1} & x_{3} \\ x_{2} & x_{4} \\ \end{pmatrix} ^2 = \begin{pmatrix} x_{1}x_{1} + x_{3}x_{2} & x_{1}x_{3} + x_{3}x_{4} \\ x_{2}x_{1} + x_{4}x_{2} & x_{2}x_{3} + x_{4}x_{4} \\ \end{pmatrix} \]
The corresponding function coded in R is
and we will examine how it is represented as a tape. First, we disable atomic matrix multiply. This gives a representation corresponding exactly to the above formula:
This is only possible to show for very small matrices. The representation grows very fast (cubic) in the dimension of the input. By default the ‘atomic’ flag is set to true.
PROS/CONS: Atomic matrix multiply is faster and takes up much less space. However, sparsity of the jacobian is lost for the atomic version (all outputs depend and all inputs). There are elements of the jacobian that will always be zero. This can only be seen from the plain representation.
TODO:
In the folowing example we construct a tape \(F\). While \(F\) is being constructed, a new tape \(G\) is created which accesses a temporary variable \(a\) in the scope of \(F\).
F <- MakeTape(function(x) {
a <- sin(x)
G <- MakeTape(function(y) {
a * y
}, numeric(1))
DG <- G$jacfun()
DG(x * x)
}, numeric(1))
Implicitly, \(a\) becomes a variable within \(G\) without us having to pass it explicitly as a function argument. This greatly simplifies things when all we care about is the derivatives of \(G\) with respect to \(y\) (not \(a\)).
One of the tape methods is called ‘laplace’. It is used to transform the tape to a new tape for which a set of variables has been integrated out by the Laplace approximation. The first argument to the method are indices determining this set of variables.
If, for example, we want to integrate \(x_2\) out of \(\exp(-x_1^2-x_2^2)\), and return the negative log of the result, we construct the tape
## Object of class='Tape'
## : R^2 -> R^1
and integrate the 2nd variable out using
## Object of class='Tape'
## : R^1 -> R^1
We finally check that the result makes sense:
## [1] 8.427635
## [1] 8.427639
An intermediate calulation of laplace
is to minimize the
negative log integrand. A tape of this intermediate calulation can be
obtained by
## Object of class='Tape'
## : R^1 -> R^1
The solution is as expected
## [1] 0
We’ll now discuss the SPA
argument to
laplace()
.
Consider a sample of 100 replicates from the double Poisson distribution with parameters \(\mu_X\) and \(\mu_N\).
rdblpois <- function(n, muX=5, muN=10) {
replicate(n, sum(rpois(rpois(1, muN), muX)))
}
set.seed(1)
x <- rdblpois(100)
We want to estimate the two unknown parameters
(muX=5; muN=10
), but the density is not available on closed
form. However, we do have access to the cumulant generating
function (CGF).
The CGF for the Poisson distribution is
and using the convolution property we can get the moment generating function of the double poisson distribution:
To approximate the (negative log) density we
laplace
with SPA=TRUE
.nldens <- function(obs, muX, muN) {
## 1
F <- MakeTape(function(s) {
K <- Kpois2(s, muX, muN) ## CGF
K <- K - s * obs ## SPA adjustment
sum(K)
}, rep(1, length(obs)))
## 2
L <- F$laplace(1:length(obs), SPA=TRUE)
## 3
L(numeric(0))
}
Note how the inner tape refers to parameters in an outer context as demonstated in a previous section. We can now use this density to estimate the parameters:
obj <- MakeADFun(function(p) nldens(x, p$muX, p$muN), list(muX=1, muN=1), silent=TRUE)
opt <- nlminb(obj$par, obj$fn, obj$gr)
sdreport(obj)
## sdreport(.) result
## Estimate Std. Error
## muX 4.703096 0.7973392
## muN 10.556870 1.8128315
## Maximum gradient component: 8.964947e-05
We note that this approach only works for non-zero observations.
However, it should be easy to modify nldens
to handle zero
observations as a special case.
RTMB tapes cannot be directly evaluated or constructed using complex
numbers. However, it is possible to use intermediate calculations with
complex numbers. The convertion function as.complex
is used
for that. The supported complex number operations are documented under
?ADcomplex
.
An example could be an R function using the complex exponential. For it to be applicable with RTMB it must take normal real input and output:
f <- function(x) {
## Get real/imag part
xreal <- x[1]
ximag <- x[2]
## Construct AD complex number
z <- as.complex(xreal) + 1i * as.complex(ximag)
## Return real numbers
Mod(exp(z))
}
We can tape it
## Object of class='Tape'
## : R^2 -> R^1
and check that it gives the correct result:
## [1] 2.718282
## [1] 2.718282
We can verify that the complex operations have been translated into real operations:
## OpName: Node: Value: Deriv: Index: Inputs:
## InvOp 0 1 NA 0
## InvOp 1 2 NA 1
## ExpOp 2 2.71828 NA 2 0
## CosOp 3 -0.416147 NA 3 1
## MulOp 4 -1.1312 NA 4 2 3
## SinOp 5 0.909297 NA 5 1
## MulOp 6 2.47173 NA 6 2 5
## MulOp 7 1.27962 NA 7 4 4
## MulOp 8 6.10943 NA 8 6 6
## AddOp 9 7.38906 NA 9 7 8
## SqrtOp 10 2.71828 NA 10 9
## DepOp 11 2.71828 NA 11 10