The greybox package contains functions for model building,
which is currently done via model selection and combination based
on information criteria. The resulting model can then be used in
analysis and forecasting.
The functions in the package fall into several groups.
Regression model functions
alm - Augmented Linear (regression) Model that implements likelihood
estimation of parameters for Normal, Laplace, Asymmetric Laplace,
Logistic, Student’s t, S, Generalised Normal, Folded Normal, Log Normal,
Box-Cox Normal, Logit Normal, Inverse Gaussian, Gamma, Poisson, Negative
Binomial, Cumulative Logistic and Cumulative Normal distributions. In a
sense this is similar to the glm() function, but with a
different set of distributions and with a focus on forecasting.
sm - Scale Model, which constructs a regression for the scale parameter
of a distribution (e.g. for the variance in the normal distribution). Works
as a method applied to an already existing model (lm / alm).
stepwise - function implements stepwise selection for the location
model, based on information criteria (IC) and partial correlations.
lmCombine - function combines the regression models from the
provided data, based on IC weights and returns the combined alm
object.
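As a rough illustration of how the regression functions above fit together, the sketch below uses the built-in mtcars data. The dataset, the chosen variables and the Laplace distribution are assumptions made for the example only, not recommendations from the package.

```r
library(greybox)

# Location model with Laplace-distributed errors (illustrative choice)
model <- alm(mpg ~ wt + hp, data = mtcars, distribution = "dlaplace")
summary(model)

# stepwise() treats the first column of the data as the response variable
selected <- stepwise(mtcars, ic = "AICc")

# Combine regressions on a small subset of variables using AICc weights
combined <- lmCombine(mtcars[, c("mpg", "wt", "hp", "qsec")], ic = "AICc")
summary(combined)
```

The subset of columns passed to lmCombine() is kept small here only to keep the example quick to run.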
Exogenous variables transformation functions
xregExpander - function produces lags and leads of the provided
data.
xregTransformer - function produces non-linear transformations of
the provided data (logs, inverse etc).
xregMultiplier - function produces cross-products of the variables
in the matrix. Could be useful when exploring interaction effects of
dummy variables.
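A minimal sketch of the transformation functions above, using simulated data. The series and the object names (x, xMatrix and so on) are hypothetical and exist only for this example.

```r
library(greybox)

set.seed(42)
# Hypothetical explanatory variable
x <- rnorm(40, mean = 100, sd = 10)

# Negative values request lags, positive values request leads
xExpanded <- xregExpander(x, lags = c(-2, -1, 1))

# Hypothetical matrix of two strictly positive variables
xMatrix <- cbind(x1 = rnorm(40, 100, 1), x2 = rnorm(40, 50, 3))

# Log and square-root transformations of each column
xTransformed <- xregTransformer(xMatrix, functions = c("log", "sqrt"))

# Cross-products of the columns
xMultiplied <- xregMultiplier(xMatrix)
```

The expanded, transformed or multiplied matrices can then be passed as data to the regression functions.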
The data analysis functions
cramer - calculates Cramer’s V for two categorical variables and
tests the significance of the association.
mcor - function returns the coefficients of multiple correlation
between the variables. This is useful when measuring association between
categorical and numerical variables.
association (aka ‘assoc()’) - function returns matrix of measures of
association, choosing between cramer(), mcor() and cor() depending on
the types of variables.
determination (and the method ‘determ()’) - function returns the
vector of coefficients of determination (R^2) for the provided data.
This is useful for the diagnostics of multicollinearity.
tableplot - plots the graph for two categorical variables.
spread - plots the matrix of scatter / boxplot / tableplot diagrams,
depending on the type of the provided variables.
graphmaker - plots the original series, the fitted values and the
forecasts.
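The association measures listed above might be used as in the following sketch. It relies on the built-in mtcars data, and treating am and vs as categorical variables is an assumption made for illustration.

```r
library(greybox)

# Cramer's V between two binary variables (transmission and engine shape)
cramerResult <- cramer(mtcars$am, mtcars$vs)
cramerResult

# Multiple correlation between a binary indicator and a numerical variable
mcor(mtcars$am, mtcars$mpg)

# Matrix of association measures, chosen according to the variable types
assoc(mtcars[, c("mpg", "wt", "am")])

# R^2 of each variable regressed on the others (multicollinearity check)
determValues <- determ(mtcars[, c("wt", "hp", "disp")])
determValues
```

Values of determ() close to one suggest that a variable is largely explained by the others in the set.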
Models evaluation functions
ro - rolling origin evaluation (see the vignette).
rmcb - Regression for Multiple Comparison with the Best. This is a
simplified version of the Nemenyi / MCB test, relying on a regression on
the ranks of methods.
measures - the error measures for the provided forecasts. Includes
MPE, MAPE, MASE, sMAE, sMSE, RelMAE, RelRMSE, MIS, sMIS, RelMIS, pinball
and others.
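As a small sketch of the error measures above, the example below evaluates a deliberately simple forecast (the in-sample mean) on simulated data. The series, the split into 90 and 10 observations and the object names are assumptions for illustration.

```r
library(greybox)

set.seed(42)
# Hypothetical series split into in-sample and holdout parts
y <- rnorm(100, mean = 100, sd = 10)
insample <- y[1:90]
holdout <- y[91:100]

# Naive benchmark: forecast every holdout value with the in-sample mean
forecastValues <- rep(mean(insample), length(holdout))

# Full set of error measures; the in-sample data is used for scaling
errorMeasures <- measures(holdout, forecastValues, insample)
errorMeasures

# Individual measures are also available as separate functions
MAE(holdout, forecastValues)
```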
Distribution functions:
qlaplace, dlaplace, rlaplace, plaplace - functions for Laplace
distribution.
qalaplace, dalaplace, ralaplace, palaplace - functions for
Asymmetric Laplace distribution.
qs, ds, rs, ps - functions for S distribution.
qgnorm, dgnorm, rgnorm, pgnorm - functions for Generalised normal
distribution.
qfnorm, dfnorm, rfnorm, pfnorm - functions for folded normal
distribution.
qtplnorm, dtplnorm, rtplnorm, ptplnorm - functions for three
parameter log normal distribution.
qbcnorm, dbcnorm, rbcnorm, pbcnorm - functions for Box-Cox normal
distribution (discussed in Box & Cox, 1964).
qlogitnorm, dlogitnorm, rlogitnorm, plogitnorm - functions for
Logit-normal distribution.
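The distribution functions follow the usual d / p / q / r naming convention of base R. The sketch below uses the Laplace family as an example; the parameter values are arbitrary.

```r
library(greybox)

set.seed(42)
# Random sample from the Laplace distribution with location 0 and scale 1
x <- rlaplace(1000, mu = 0, scale = 1)

# Density at the location parameter
densityAtMode <- dlaplace(0, mu = 0, scale = 1)

# Cumulative probability at the location parameter (the median)
probAtMedian <- plaplace(0, mu = 0, scale = 1)

# Quantile function inverts the distribution function
medianValue <- qlaplace(0.5, mu = 0, scale = 1)
```

The other families listed above are called the same way, each with its own set of parameters.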
Methods for the introduced and some existing classes:
temporaldummy - the method that creates a matrix of dummy variables
for an object based on the selected frequency. For example, it can create
week-of-year dummies from the provided zoo object.
outlierdummy - the method that creates a matrix of dummy variables
based on the residuals of an object, selected confidence level and type
of residuals.
pointLik - point likelihood method for the time series models.
pAIC, pAICc, pBIC, pBICc - respective point values for the
information criteria, based on pointLik.
coefbootstrap - the method that returns bootstrapped coefficients of
the model. Useful for the calculation of covariance matrix and
confidence intervals for parameters.
summary - returns summary of the regression (either selected or
combined).
vcov - covariance matrix for combined models. This is an approximation:
the exact matrix is complicated to derive and is not yet available.
confint - confidence intervals for combined models.
predict, forecast - point and interval forecasts for the response
variable. forecast method relies on the parameter h (the forecast
horizon), while predict is focused on the newdata. See vignettes for the
details.
nparam - returns the number of estimated parameters in the model
(including location, scale, shift).
nvariate - returns the number of dimensions of the response
variable.
actuals - returns the response variable from the model.
plot - plots several graphs for the analysis of the residuals (see
documentation for more details).
AICc - AICc for regression with normally distributed residuals.
BICc - BICc for regression with normally distributed residuals.
is.greybox, is.alm etc. - functions to check if the object was
generated by respective functions.
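A brief sketch of how some of the methods above might be applied to a fitted model. The split of mtcars into the first 25 rows for estimation and the remaining 7 for prediction is an arbitrary assumption for the example.

```r
library(greybox)

# Fit a model on the first 25 observations
model <- alm(mpg ~ wt, data = mtcars[1:25, ], distribution = "dnorm")

# Number of estimated parameters and the response variable used in the fit
nparam(model)
head(actuals(model))

# Information criterion corrected for small samples
AICc(model)

# Point forecasts and 95% prediction intervals for the remaining rows
newPredictions <- predict(model, newdata = mtcars[26:32, ],
                          interval = "prediction", level = 0.95)
newPredictions$mean
newPredictions$lower
newPredictions$upper
```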
Experimental functions:
lmDynamic - linear regression with time varying parameters based on
pAIC.
Installation
The stable version of the package is available on CRAN, so you can
install it by running:
> install.packages("greybox")
A recent development version is available on GitHub and can be
installed using "remotes" in R. First make sure that you have remotes:
> if (!require("remotes")){install.packages("remotes")}
and after that run:
> remotes::install_github("config-i1/greybox")