This subsection outlines the steps for constructing, running
simulations, and estimating a univariate Hawkes model. To begin, create
an hspec
object, which defines the Hawkes model. The S4
class hspec
contains slots for the model parameters:
mu
, alpha
, beta
,
dimens
, rmark
, and impact
.
In a univariate model, the basic parameters of the
model—mu
, alpha
, and beta
—can be
given as numeric values. If numeric values are provided, they will be
converted to matrices. Below is an example of a univariate Hawkes model
without a mark.
mu1 <- 0.3; alpha1 <- 1.2; beta1 <- 1.5
hspec1 <- new("hspec", mu = mu1, alpha = alpha1, beta = beta1)
show(hspec1)
#> An object of class "hspec" of 1-dimensional Hawkes process
#>
#> Slot mu:
#> [,1]
#> [1,] 0.3
#>
#> Slot alpha:
#> [,1]
#> [1,] 1.2
#>
#> Slot beta:
#> [,1]
#> [1,] 1.5
The function hsim
implements simulation where the input
arguments are hspec
, size
, and the initial
values of the intensity component process,
lambda_component0
, and the initial values of the Hawkes
processes, N0
. More precisely, the intensity process of the
basic univariate Hawkes model is represented by
\[ \lambda(t) = \mu + \int_{-\infty}^t \alpha e^{-\beta (t-s)} d N(s) = \mu + \lambda_c(0) e^{-\beta t} + \int_0^t \alpha e^{-\beta (t-s)} d N(s) \]
where the lambda_component0
denotes
\[ \lambda_c(0) = \int_{-\infty}^0 \alpha e^{\beta s} d N(s). \]
If lambda_component0
is not provided, the internally
determined initial values for the intensity process are used. If
size
is sufficiently large, the exact value of
lambda_component0
may not be important. The default initial
value of the counting process, N0
, is zero.
set.seed(1107)
res1 <- hsim(hspec1, size = 1000)
summary(res1)
#> -------------------------------------------------------
#> Simulation result of exponential (marked) Hawkes model.
#> Realized path :
#> arrival N1 mark lambda1
#> [1,] 0.00000 0 0 0.90000
#> [2,] 0.97794 1 1 0.43838
#> [3,] 1.09001 2 1 1.43128
#> [4,] 1.28999 3 1 2.02711
#> [5,] 1.53225 4 1 2.33527
#> [6,] 1.65001 5 1 3.01139
#> [7,] 2.51807 6 1 1.36377
#> [8,] 2.81710 7 1 1.74553
#> [9,] 2.87547 8 1 2.72378
#> [10,] 3.16415 9 1 2.65016
#> [11,] 3.51378 10 1 2.40131
#> [12,] 4.22355 11 1 1.43843
#> [13,] 16.96752 12 1 0.30000
#> [14,] 17.71654 13 1 0.69015
#> [15,] 19.10293 14 1 0.49874
#> [16,] 24.06354 15 1 0.30082
#> [17,] 24.09256 16 1 1.44967
#> [18,] 28.40173 17 1 0.30366
#> [19,] 28.53743 18 1 1.28198
#> [20,] 28.56702 19 1 2.38725
#> ... with 980 more rows
#> -------------------------------------------------------
The results of hsim
is an S3 class hreal
,
which consists of hspec
, inter_arrival
,
arrival
, type
, mark
,
N
, Nc
, lambda
,
lambda_component
, rambda
,
rambda_component
.
hspec
is the model specification.
inter_arrival
is the inter-arrival time of every
event.
arrival
is the cumulative sum of
inter_arrival
.
type
is the type of events, i.e., \(i\) for \(N_i\), and is used for a multivariate
model.
mark
is a numeric vector that represents additional
information for the event.
lambda
represents \(\lambda\), which is the left continuous and
right limit version.
The right continuous version of intensity is
rambda
.
lambda_component
represents \(\lambda_{ij}\), and
rambda_component
is the right continuous version.
inter_arrival
, type
, mark
,
N
, and Nc
start at zero. Using the
summary()
function, one can print the first 20 elements of
arrival
, N
, and lambda
. The
print()
function can also be used.
By definition, we have
lambda == mu + lambda_component
.
# first and third columns are the same
cbind(res1$lambda[1:5], res1$lambda_component[1:5], mu1 + res1$lambda_component[1:5])
#> [,1] [,2] [,3]
#> [1,] 0.900000 0.600000 0.900000
#> [2,] 0.438383 0.138383 0.438383
#> [3,] 1.431282 1.131282 1.431282
#> [4,] 2.027111 1.727111 2.027111
#> [5,] 2.335269 2.035269 2.335269
For all rows except the first, rambda
equals
lambda + alpha
in this model.
# second and third columns are the same
cbind(res1$lambda[1:5], res1$rambda[1:5], res1$lambda[1:5] + alpha1)
#> [,1] [,2] [,3]
#> [1,] 0.900000 0.900000 2.100000
#> [2,] 0.438383 1.638383 1.638383
#> [3,] 1.431282 2.631282 2.631282
#> [4,] 2.027111 3.227111 3.227111
#> [5,] 2.335269 3.535269 3.535269
Additionally, verify that the exponential decay is accurately represented in the model.
# By definition, the following two are equal:
res1$lambda[2:6]
#> [1] 0.438383 1.431282 2.027111 2.335269 3.011391
mu1 + (res1$rambda[1:5] - mu1) * exp(-beta1 * res1$inter_arrival[2:6])
#> [1] 0.438383 1.431282 2.027111 2.335269 3.011391
The log-likelihood function is calculated using the
logLik
method. In this context, the inter-arrival times and
hspec
are provided as inputs to the function.
logLik(hspec1, inter_arrival = res1$inter_arrival)
#> The initial values for intensity processes are not provided. Internally determined initial values are used.
#> loglikelihood
#> -214.2385
The likelihood estimation is performed using the hfit
function. The specification of the initial parameter values,
hspec0
, is required. Note that only
inter_arrival
is needed for this univariate model. For more
accurate simulation, it is recommended to specify lambda0
,
the initial value of the lambda component. If lambda0
is
not provided, the function uses internally determined initial values. By
default, the BFGS method is employed for numerical optimization.
# initial value for numerical optimization
mu0 <- 0.5; alpha0 <- 1.0; beta0 <- 1.8
hspec0 <- new("hspec", mu = mu0, alpha = alpha0, beta = beta0)
# the intial values are provided through hspec
mle <- hfit(hspec0, inter_arrival = res1$inter_arrival)
summary(mle)
#> --------------------------------------------
#> Maximum Likelihood estimation
#> BFGS maximization, 24 iterations
#> Return code 0: successful convergence
#> Log-Likelihood: -213.4658
#> 3 free parameters
#> Estimates:
#> Estimate Std. error t value Pr(> t)
#> mu1 0.33641 0.03475 9.682 <2e-16 ***
#> alpha1 1.16654 0.09608 12.141 <2e-16 ***
#> beta1 1.52270 0.12468 12.213 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------
The intensity process of a basic bivariate Hawkes model is defined by
\[ \lambda_1(t) = \mu_1 + \int_{-\infty}^t \alpha_{11} e^{-\beta_{11}(t-s)} d N_1(s) + \int_{-\infty}^t \alpha_{12} e^{-\beta_{12}(t-s)} d N_2(s), \]
\[ \lambda_2(t) = \mu_2 + \int_{-\infty}^t \alpha_{21} e^{-\beta_{21}(t-s)} d N_1(s) + \int_{-\infty}^t \alpha_{22} e^{-\beta_{22}(t-s)} d N_2(s). \]
In a bivariate model, the parameters within the slots of
hspec
are matrices. Specifically, mu
is a
2-by-1 matrix, while alpha
and beta
are 2-by-2
matrices.
\[ \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \alpha = \begin{bmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{bmatrix} \]
rmark
is a random number generating function for marks
and is not used in non-mark models. lambda_component0
is a
2-by-2 matrix that represents the initial values of
lambda_component
, which includes the set of values
lambda11
, lambda12
, lambda21
, and
lambda22
. The intensity processes are represented by
\[ \lambda_1(t) = \mu_1 + \lambda_{11}(t) + \lambda_{12}(t), \]
\[ \lambda_2(t) = \mu_2 + \lambda_{21}(t) + \lambda_{22}(t). \]
The terms \(\lambda_{ij}\) are
referred to as lambda components, and lambda0
represents
$_{ij}(0). The parameter
lambda_component0` can be omitted
in this model, in which case internally determined initial values will
be used.
mu2 <- matrix(c(0.2), nrow = 2)
alpha2 <- matrix(c(0.5, 0.9, 0.9, 0.5), nrow = 2, byrow = TRUE)
beta2 <- matrix(c(2.25, 2.25, 2.25, 2.25), nrow = 2, byrow = TRUE)
hspec2 <- new("hspec", mu=mu2, alpha=alpha2, beta=beta2)
print(hspec2)
#> An object of class "hspec" of 2-dimensional Hawkes process
#>
#> Slot mu:
#> [,1]
#> [1,] 0.2
#> [2,] 0.2
#>
#> Slot alpha:
#> [,1] [,2]
#> [1,] 0.5 0.9
#> [2,] 0.9 0.5
#>
#> Slot beta:
#> [,1] [,2]
#> [1,] 2.25 2.25
#> [2,] 2.25 2.25
To perform a simulation, use the hsim
function.
set.seed(1107)
res2 <- hsim(hspec2, size=1000)
summary(res2)
#> -------------------------------------------------------
#> Simulation result of exponential (marked) Hawkes model.
#> Realized path :
#> arrival N1 N2 mark lambda1 lambda2
#> [1,] 0.00000 0 0 0 0.52941 0.52941
#> [2,] 0.57028 1 0 1 0.29130 0.29130
#> [3,] 1.66175 1 1 1 0.25073 0.28505
#> [4,] 2.17979 1 2 1 0.49638 0.38238
#> [5,] 2.47685 1 3 1 0.81319 0.54975
#> [6,] 2.64001 2 3 1 1.24825 0.78866
#> [7,] 2.70249 3 3 1 1.54519 1.49341
#> [8,] 2.94547 4 3 1 1.26810 1.46968
#> [9,] 3.39313 4 4 1 0.77271 0.99242
#> [10,] 3.52533 4 5 1 1.29379 1.15989
#> [11,] 3.56971 5 5 1 2.00432 1.52115
#> [12,] 3.70761 5 6 1 1.88965 1.82866
#> [13,] 4.30122 5 7 1 0.88106 0.75983
#> [14,] 4.34337 6 7 1 1.63800 1.16393
#> [15,] 4.40222 7 7 1 1.89764 1.83275
#> [16,] 4.58943 8 7 1 1.64219 1.86211
#> [17,] 5.14665 9 7 1 0.75437 0.93131
#> [18,] 5.18186 9 8 1 1.17407 1.70707
#> [19,] 5.36167 9 9 1 1.45050 1.53925
#> [20,] 5.89118 10 9 1 0.85331 0.75875
#> ... with 980 more rows
#> -------------------------------------------------------
In multivariate models, type
is crucial as it represents
the type of event.
In multivariate models, the column names of N
are
N1
, N2
, N3
, and so on.
Similarly, the column names of lambda
are
lambda1
, lambda2
, lambda3
, and so
on.
res2$lambda[1:3, ]
#> lambda1 lambda2
#> [1,] 0.5294118 0.5294118
#> [2,] 0.2913028 0.2913028
#> [3,] 0.2507301 0.2850475
The column names of lambda_component
are
lambda_component11
, lambda_component12
,
lambda_component13
, and so on.
res2$lambda_component[1:3, ]
#> lambda11 lambda12 lambda21 lambda22
#> [1,] 0.11764706 0.211764706 0.21176471 0.117647059
#> [2,] 0.03260813 0.058694641 0.05869464 0.032608134
#> [3,] 0.04569443 0.005035631 0.08224997 0.002797573
By definition, the following two expressions are equivalent:
mu2[1] + rowSums(res2$lambda_component[1:5, c("lambda11", "lambda12")])
#> [1] 0.5294118 0.2913028 0.2507301 0.4963769 0.8131889
res2$lambda[1:5, "lambda1"]
#> [1] 0.5294118 0.2913028 0.2507301 0.4963769 0.8131889
From the results, we obtain vectors of realized
inter_arrival
and type
. A bivariate model
requires both inter_arrival
and type
for
estimation.
The log-likelihood is computed using the logLik
function.
logLik(hspec2, inter_arrival = inter_arrival2, type = type2)
#> The initial values for intensity processes are not provided. Internally determined initial values are used.
#> loglikelihood
#> -974.2809
Maximum log-likelihood estimation is performed using the
hfit
function. In this process, the parameter values in
hspec0
, such as mu
, alpha
, and
beta
, serve as starting points for the numerical
optimization. For illustration purposes, we set
hspec0 <- hspec2
. Since the true parameter values are
unknown in practical applications, these initial guesses are used. The
realized inter_arrival
and type
data are
utilized for estimation.
hspec0 <- hspec2
mle <- hfit(hspec0, inter_arrival = inter_arrival2, type = type2)
summary(mle)
#> --------------------------------------------
#> Maximum Likelihood estimation
#> BFGS maximization, 36 iterations
#> Return code 0: successful convergence
#> Log-Likelihood: -970.1408
#> 4 free parameters
#> Estimates:
#> Estimate Std. error t value Pr(> t)
#> mu1 0.19095 0.01636 11.671 < 2e-16 ***
#> alpha1.1 0.48217 0.07405 6.511 7.45e-11 ***
#> alpha1.2 0.98625 0.09495 10.387 < 2e-16 ***
#> beta1.1 2.07987 0.16952 12.269 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------
miscTools::stdEr(mle)
#> mu1 alpha1.1 alpha1.2 beta1.1
#> 0.01636127 0.07405118 0.09495461 0.16952428
Also see the extended vignette on GitHub.