Title: Multivariate (Dynamic) Generalized Additive Models
Description: Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2022) <doi:10.1111/2041-210X.13974>.
Authors: Nicholas J Clark [aut, cre], Scott Pease [ctb] (broom enhancements), Matthijs Hollanders [ctb] (ggplot visualizations)
Maintainer: Nicholas J Clark <[email protected]>
License: MIT + file LICENSE
Version: 1.1.4
Built: 2024-11-22 10:25:23 UTC
Source: https://github.com/nicholasjclark/mvgam
Calculate randomized quantile residuals for mvgam objects
add_residuals(object, ...)

## S3 method for class 'mvgam'
add_residuals(object, ...)
object: An object of class mvgam.
...: unused
For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics. If the fitted model is appropriate then Dunn-Smyth residuals will be standard normal in distribution and no autocorrelation will be evident. When a particular observation is missing, the residual is calculated by comparing independent draws from the model's posterior distribution.
A list object of class mvgam with residuals included in the 'resids' slot.
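A minimal usage sketch (the simulation settings and model below are illustrative assumptions, not from this page):

# Simulate a single series and fit a small model
simdat <- sim_mvgam(n_series = 1)
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(),
             data = simdat$data_train,
             chains = 2, silent = 2)

# Add Dunn-Smyth residuals to the returned object
mod <- add_residuals(mod)
str(mod$resids)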
A dataset containing time series of Amblyomma americanum and Ixodes scapularis nymph abundances at NEON sites
all_neon_tick_data
A tibble/dataframe containing covariate information alongside the main fields of:
Year of sampling
Epidemiological week of sampling
NEON plot ID for survey location
NEON site ID for survey location
Counts of A. americanum nymphs
Counts of I. scapularis nymphs
https://www.neonscience.org/data
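A quick way to inspect the dataset (assuming the mvgam package is loaded):

data(all_neon_tick_data)
str(all_neon_tick_data)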
Add fits and residuals to the data, implementing the generic augment from the broom package.
## S3 method for class 'mvgam'
augment(x, robust = FALSE, probs = c(0.025, 0.975), ...)
x: An object of class mvgam.
robust: If FALSE (the default), summaries are computed with the mean as the measure of central tendency and the standard deviation as the measure of variability; if TRUE, the median and the median absolute deviation (MAD) are used instead.
probs: The percentiles to be computed by the quantile function.
...: Unused, included for generic consistency only.
A list is returned if class(x$obs_data) == 'list', otherwise a tibble is returned, but the contents of either object are the same. The arguments robust and probs are applied to both the fit and residuals calls (see fitted.mvgam() and residuals.mvgam() for details).
A list or tibble (see details) combining:

The data supplied to mvgam().

The outcome variable, named as .observed.

The fitted backcasts, along with their variability and credible bounds.

The residuals, along with their variability and credible bounds.
## Not run: 
set.seed(0)
dat <- sim_mvgam(T = 80,
                 n_series = 3,
                 mu = 2,
                 trend_model = AR(p = 1),
                 prop_missing = 0.1,
                 prop_trend = 0.6)
mod1 <- mvgam(formula = y ~ s(season, bs = 'cc', k = 6),
              data = dat$data_train,
              trend_model = AR(),
              family = poisson(),
              noncentred = TRUE,
              chains = 2,
              silent = 2)
augment(mod1, robust = TRUE, probs = c(0.25, 0.75))
## End(Not run)
Generate Stan code and data objects for mvgam models
code(object)

## S3 method for class 'mvgam_prefit'
stancode(object, ...)

## S3 method for class 'mvgam'
stancode(object, ...)

## S3 method for class 'mvgam_prefit'
standata(object, ...)
object: An object of class mvgam or mvgam_prefit.
...: ignored

Either a character string containing the fully commented Stan code to fit an mvgam model, or a named list containing the data objects needed to fit the model in Stan.
simdat <- sim_mvgam()
mod <- mvgam(y ~ s(season) + s(time, by = series),
             family = poisson(),
             data = simdat$data_train,
             run_model = FALSE)

# View Stan model code
stancode(mod)

# View Stan model data
sdata <- standata(mod)
str(sdata)
Display conditional effects of one or more numeric and/or categorical predictors in mvgam models, including two-way interaction effects.
## S3 method for class 'mvgam'
conditional_effects(
  x,
  effects = NULL,
  type = "response",
  points = FALSE,
  rug = FALSE,
  ...
)

## S3 method for class 'mvgam_conditional_effects'
plot(x, plot = TRUE, ask = FALSE, ...)

## S3 method for class 'mvgam_conditional_effects'
print(x, ...)
x: An object of class mvgam or mvgam_conditional_effects.
effects: An optional character vector naming effects (main effects or interactions) for which to compute conditional plots. Interactions are specified by a ':' between variable names. If NULL (the default), plots are generated for all main effects.
type: The scale of the predictions; either 'response' (the default) or 'link'.
points: Logical; indicates if the observed data points should be added to the plot. Defaults to FALSE.
rug: Logical; indicates if a rug plot of the covariate values should be added. Defaults to FALSE.
...: other arguments to pass to plot_predictions.
plot: Logical; indicates if plots should be plotted directly in the active graphic device. Defaults to TRUE.
ask: Logical; indicates if the user is prompted before a new page of plots is drawn. Defaults to FALSE.
This function acts as a wrapper to the more flexible plot_predictions. When creating conditional_effects for a particular predictor (or interaction of two predictors), one has to choose the values of all other predictors to condition on. By default, the mean is used for continuous variables and the reference category is used for factors. Use plot_predictions to change these defaults and create more bespoke conditional effects plots.
conditional_effects returns an object of class mvgam_conditional_effects, which is a named list with one slot per effect containing a ggplot object that can be further customized using the ggplot2 package. The corresponding plot method will draw these plots in the active graphic device.
Nicholas J Clark
# Simulate some data
simdat <- sim_mvgam(family = poisson(), seasonality = 'hierarchical')

# Fit a model
mod <- mvgam(y ~ s(season, by = series, k = 5) + year:series,
             family = poisson(),
             data = simdat$data_train,
             chains = 2)

# Plot all main effects on the response scale
conditional_effects(mod)

# Change the prediction interval to 70% using plot_predictions() argument
# 'conf_level'
conditional_effects(mod, conf_level = 0.7)

# Plot all main effects on the link scale
conditional_effects(mod, type = 'link')

# Works the same for smooth terms, including smooth interactions
set.seed(0)
dat <- mgcv::gamSim(1, n = 200, scale = 2)
mod <- mvgam(y ~ te(x0, x1, k = 5) + s(x2, k = 6) + s(x3, k = 6),
             data = dat,
             family = gaussian(),
             chains = 2)
conditional_effects(mod)
conditional_effects(mod, conf_level = 0.5, type = 'link')

## Not run: 
# ggplot objects can be modified and combined with the help of many
# additional packages. Here is an example using the patchwork package

# Simulate some nonlinear data
dat <- mgcv::gamSim(1, n = 200, scale = 2)
mod <- mvgam(y ~ s(x1, bs = 'moi') + te(x0, x2),
             data = dat,
             family = gaussian())

# Extract the list of ggplot conditional_effect plots
m <- plot(conditional_effects(mod), plot = FALSE)

# Add custom labels and arrange plots together using patchwork::wrap_plots()
library(patchwork)
library(ggplot2)
wrap_plots(m[[1]] + labs(title = 's(x1, bs = "moi")'),
           m[[2]] + labs(title = 'te(x0, x2)'))
## End(Not run)
Set up time-varying (dynamic) coefficients for use in mvgam models. Currently, only low-rank Gaussian Process smooths are available for estimating the dynamics of the time-varying coefficient.
dynamic(variable, k, rho = 5, stationary = TRUE, scale = TRUE)
variable: The variable that the dynamic smooth will be a function of.
k: Optional number of basis functions for computing approximate GPs. If missing, k is set automatically based on rho (see details).
rho: A positive numeric stating the length scale to be used for approximating the squared exponential Gaussian Process smooth. Defaults to 5.
stationary: Logical; defaults to TRUE.
scale: Logical; defaults to TRUE.
mvgam currently sets up dynamic coefficients as low-rank squared exponential Gaussian Process smooths via the call s(time, by = variable, bs = "gp", m = c(2, rho, 2)). These smooths, if specified with reasonable values for the length scale parameter, will give more realistic out-of-sample forecasts than standard splines such as thin plate or cubic. But the user must set the value for rho, as there is currently no support for estimating this value in mgcv. This may not be too much of a problem, as estimating latent length scales is often difficult anyway. The rho parameter should be thought of as a prior on the smoothness of the latent dynamic coefficient function (where higher values of rho lead to smoother functions with more temporal covariance structure). Values of k are set automatically to ensure enough basis functions are used to approximate the expected wiggliness of the underlying dynamic function (k will increase as rho decreases).
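As a concrete illustration of the translation described above, a call of the form dynamic(predictor, rho = 8) is set up internally as the following mgcv-style smooth specification (with k chosen automatically):

s(time, by = predictor, bs = "gp", m = c(2, 8, 2))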
A list object for internal use in mvgam.
# Simulate a time-varying coefficient
# (as a Gaussian Process with length scale = 10)
set.seed(1111)
N <- 200

# A function to simulate from a squared exponential Gaussian Process
sim_gp = function(N, c, alpha, rho){
  Sigma <- alpha ^ 2 *
           exp(-0.5 * ((outer(1:N, 1:N, "-") / rho) ^ 2)) +
           diag(1e-9, N)
  c + mgcv::rmvn(1, mu = rep(0, N), V = Sigma)
}

beta <- sim_gp(alpha = 0.75, rho = 10, c = 0.5, N = N)
plot(beta, type = 'l', lwd = 3, bty = 'l',
     xlab = 'Time', ylab = 'Coefficient', col = 'darkred')

# Simulate the predictor as a standard normal
predictor <- rnorm(N, sd = 1)

# Simulate a Gaussian outcome variable
out <- rnorm(N, mean = 4 + beta * predictor, sd = 0.25)
time <- seq_along(predictor)
plot(out, type = 'l', lwd = 3, bty = 'l',
     xlab = 'Time', ylab = 'Outcome', col = 'darkred')

# Gather into a data.frame and fit a dynamic coefficient model
data <- data.frame(out, predictor, time)

# Split into training and testing
data_train <- data[1:190,]
data_test <- data[191:200,]

# Fit a model using the dynamic function
mod <- mvgam(out ~
               # mis-specify the length scale slightly as this
               # won't be known in practice
               dynamic(predictor, rho = 8, stationary = TRUE),
             family = gaussian(),
             data = data_train,
             chains = 2)

# Inspect the summary
summary(mod)

# Plot the time-varying coefficient estimates
plot(mod, type = 'smooths')

# Extrapolate the coefficient forward in time
plot_mvgam_smooth(mod, smooth = 1, newdata = data)
abline(v = 190, lty = 'dashed', lwd = 2)

# Overlay the true simulated time-varying coefficient
lines(beta, lwd = 2.5, col = 'white')
lines(beta, lwd = 2)
Generate evenly weighted ensemble forecast distributions from mvgam_forecast objects
ensemble(object, ...)

## S3 method for class 'mvgam_forecast'
ensemble(object, ..., ndraws = 5000)
object: An object of class mvgam_forecast, as returned by forecast.mvgam().
...: More mvgam_forecast objects to include in the ensemble.
ndraws: Positive integer specifying the number of draws to use from each forecast distribution for creating the ensemble. If some of the ensemble members have fewer draws than ndraws, their draws are resampled with replacement. Defaults to 5000.
It is widely recognised in the forecasting literature that combining forecasts from different models often results in improved forecast accuracy. The simplest way to create an ensemble is to use evenly weighted combinations of forecasts from the different models. This is straightforward to do in a Bayesian setting with mvgam as the posterior MCMC draws contained in each mvgam_forecast object will already implicitly capture correlations among the temporal posterior predictions.
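To make the even-weighting idea concrete, the sketch below pools equal numbers of posterior draws from two forecast draw matrices (rows = draws, columns = forecast horizons). The objects fc1_draws and fc2_draws are hypothetical placeholders, not objects returned by this function:

ndraws <- 5000
pooled <- rbind(
  fc1_draws[sample(NROW(fc1_draws), ndraws / 2, replace = TRUE), ],
  fc2_draws[sample(NROW(fc2_draws), ndraws / 2, replace = TRUE), ]
)
dim(pooled)  # ndraws x horizon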
An object of class mvgam_forecast containing the ensemble predictions. This object can be readily used with the supplied S3 functions plot and score.
Nicholas J Clark
plot.mvgam_forecast, score.mvgam_forecast
# Simulate some series and fit a few competing dynamic models
set.seed(1)
simdat <- sim_mvgam(n_series = 1,
                    prop_trend = 0.6,
                    mu = 1)
plot_mvgam_series(data = simdat$data_train,
                  newdata = simdat$data_test)

m1 <- mvgam(y ~ 1,
            trend_formula = ~ time + s(season, bs = 'cc', k = 9),
            trend_model = AR(p = 1),
            noncentred = TRUE,
            data = simdat$data_train,
            newdata = simdat$data_test)

m2 <- mvgam(y ~ time,
            trend_model = RW(),
            noncentred = TRUE,
            data = simdat$data_train,
            newdata = simdat$data_test)

# Calculate forecast distributions for each model
fc1 <- forecast(m1)
fc2 <- forecast(m2)

# Generate the ensemble forecast
ensemble_fc <- ensemble(fc1, fc2)

# Plot forecasts
plot(fc1)
plot(fc2)
plot(ensemble_fc)

# Score forecasts
score(fc1)
score(fc2)
score(ensemble_fc)
Evaluate forecasts from fitted mvgam objects
eval_mvgam(
  object,
  n_samples = 5000,
  eval_timepoint = 3,
  fc_horizon = 3,
  n_cores = 1,
  score = "drps",
  log = FALSE,
  weights
)

roll_eval_mvgam(
  object,
  n_evaluations = 5,
  evaluation_seq,
  n_samples = 5000,
  fc_horizon = 3,
  n_cores = 1,
  score = "drps",
  log = FALSE,
  weights
)

compare_mvgams(
  model1,
  model2,
  n_samples = 1000,
  fc_horizon = 3,
  n_evaluations = 10,
  n_cores = 1,
  score = "drps",
  log = FALSE,
  weights
)
object: An object of class mvgam.
n_samples: Positive integer specifying the number of samples to generate from the model's posterior distribution.
eval_timepoint: Positive integer specifying the timepoint from which to generate the forecast.
fc_horizon: Positive integer specifying the length of the forecast horizon (in timesteps).
n_cores: Deprecated. Parallel processing is no longer supported.
score: Character specifying the scoring rule; one of 'drps', 'crps' or 'variogram'.
log: Logical; defaults to FALSE.
weights: Optional vector of weights (used for the 'variogram' score).
n_evaluations: Positive integer specifying the number of evaluations to perform.
evaluation_seq: Optional integer sequence of evaluation timepoints.
model1: An object of class mvgam.
model2: A second object of class mvgam to compare against model1.
eval_mvgam may be useful when both repeated fitting of a model using update.mvgam for exact leave-future-out cross-validation and approximate leave-future-out cross-validation using lfo_cv are impractical. The function generates a set of samples representing fixed parameters estimated from the full mvgam model and latent trend states at a given point in time. The trends are rolled forward a total of fc_horizon timesteps according to their estimated state space dynamics to generate an 'out-of-sample' forecast that is evaluated against the true observations in the horizon window. This function therefore simulates a situation where the model's parameters had already been estimated but we have only observed data up to the evaluation timepoint and would like to generate forecasts from the latent trends that have been observed up to that timepoint. Evaluation involves calculating an appropriate Rank Probability Score and a binary indicator for whether or not the true value lies within the forecast's 90% prediction interval.
roll_eval_mvgam sets up a sequence of evaluation timepoints along a rolling window and iteratively calls eval_mvgam to evaluate 'out-of-sample' forecasts. Evaluation involves calculating the Rank Probability Scores and a binary indicator for whether or not the true value lies within the forecast's 90% prediction interval.

compare_mvgams automates the evaluation to compare two fitted models using rolling window forecast evaluation and provides a series of summary plots to facilitate model selection. It is essentially a wrapper for roll_eval_mvgam.
For eval_mvgam, a list object containing information on specific evaluations for each series (if using 'drps' or 'crps' as the score) or a vector of scores when using 'variogram'.

For roll_eval_mvgam, a list object containing information on specific evaluations for each series as well as a total evaluation summary (taken by summing the forecast score for each series at each evaluation and averaging the coverages at each evaluation).
For compare_mvgams, a series of plots comparing forecast Rank Probability Scores for each competing model. A lower score is preferred. Note however that it is possible to select a model that ultimately would perform poorly in true out-of-sample forecasting. For example, if a wiggly smooth function of 'year' is included in the model then this function will be learned prior to evaluating rolling window forecasts, and the model could generate very tight predictions as a result. But when forecasting ahead to timepoints that the model has not seen (i.e. next year), the smooth function will end up extrapolating, sometimes in very strange and unexpected ways. It is therefore recommended to only use smooth functions for covariates that are adequately measured in the data (i.e. 'seasonality', for example) to reduce possible extrapolation of smooths and let the latent trends in the mvgam model capture any temporal dependencies in the data. These trends are time series models and so will provide much more stable forecasts.
## Not run: 
# Simulate from a Poisson-AR2 model with a seasonal smooth
set.seed(100)
dat <- sim_mvgam(T = 75,
                 n_series = 1,
                 prop_trend = 0.75,
                 trend_model = 'AR2',
                 family = poisson())

# Fit an appropriate model
mod_ar2 <- mvgam(y ~ s(season, bs = 'cc'),
                 trend_model = AR(p = 2),
                 family = poisson(),
                 data = dat$data_train,
                 newdata = dat$data_test,
                 chains = 2)

# Fit a less appropriate model
mod_rw <- mvgam(y ~ s(season, bs = 'cc'),
                trend_model = RW(),
                family = poisson(),
                data = dat$data_train,
                newdata = dat$data_test,
                chains = 2)

# Compare Discrete Ranked Probability Scores for the testing period
fc_ar2 <- forecast(mod_ar2)
fc_rw <- forecast(mod_rw)
score_ar2 <- score(fc_ar2, score = 'drps')
score_rw <- score(fc_rw, score = 'drps')
sum(score_ar2$series_1$score)
sum(score_rw$series_1$score)

# Use rolling evaluation for approximate comparisons of 3-step ahead
# forecasts across the training period
compare_mvgams(mod_ar2,
               mod_rw,
               fc_horizon = 3,
               n_samples = 1000,
               n_evaluations = 5)

# Now use approximate leave-future-out CV to compare
# rolling forecasts; start at time point 40 to reduce
# computational time and to ensure enough data is available
# for estimating model parameters
lfo_ar2 <- lfo_cv(mod_ar2, min_t = 40, fc_horizon = 3)
lfo_rw <- lfo_cv(mod_rw, min_t = 40, fc_horizon = 3)

# Plot Pareto-K values and ELPD estimates
plot(lfo_ar2)
plot(lfo_rw)

# Proportion of timepoints in which AR2 model gives better forecasts
length(which((lfo_ar2$elpds - lfo_rw$elpds) > 0)) /
  length(lfo_ar2$elpds)

# A higher total ELPD is preferred
lfo_ar2$sum_ELPD
lfo_rw$sum_ELPD
## End(Not run)
Compute forecast error variance decompositions from mvgam models with Vector Autoregressive dynamics
fevd(object, ...)

## S3 method for class 'mvgam'
fevd(object, h = 1, ...)
object: An object of class mvgam fitted with a Vector Autoregressive trend model (i.e. VAR()).
...: ignored
h: Positive integer specifying the forecast horizon over which to calculate the decomposition. Defaults to 1.
A forecast error variance decomposition is useful for quantifying the amount of information each series in a Vector Autoregression contributes to the forecast distributions of the other series in the autoregression. This function calculates the forecast error variance decomposition using the orthogonalised impulse response coefficient matrices \Psi_h, which can be used to quantify the contribution of series j to the h-step forecast error variance of series k (following Lütkepohl 2006):

\sigma_k^2(h) = \sum_{j=1}^{K} \left( \psi_{kj,0}^2 + \ldots + \psi_{kj,h-1}^2 \right)

If the orthogonalised impulse responses \psi_{kj,0}^2 + \ldots + \psi_{kj,h-1}^2 are divided by the variance of the forecast error \sigma_k^2(h), this yields an interpretable percentage representing how much of the forecast error variance for series k can be explained by an exogenous shock to series j.
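The calculation can be illustrated with a small hand-rolled sketch for a bivariate VAR(1). The coefficient matrix A and error covariance Sigma below are assumptions chosen for illustration; fevd() performs the analogous computation over posterior draws of the latent VAR parameters:

# Orthogonalised FEVD for a toy VAR(1)
A <- matrix(c(0.5, 0.2,
              0.1, 0.4), 2, 2, byrow = TRUE)
Sigma <- matrix(c(1, 0.3,
                  0.3, 1), 2, 2)
P <- t(chol(Sigma))  # lower-triangular Cholesky factor, P %*% t(P) == Sigma
h <- 10

# Orthogonalised impulse responses Psi_i = A^i %*% P for i = 0, ..., h - 1
Psi <- lapply(0:(h - 1), function(i) {
  Ai <- diag(2)
  if (i > 0) for (s in seq_len(i)) Ai <- Ai %*% A
  Ai %*% P
})

# Sum squared responses over the horizon; rows are the affected series,
# columns are the shocked series
contrib <- Reduce(`+`, lapply(Psi, function(m) m^2))

# Normalise so each row gives the percentage contributions to that series'
# h-step forecast error variance
round(contrib / rowSums(contrib), 3)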
An object of class mvgam_fevd containing the posterior forecast error variance decompositions. This object can be used with the supplied S3 function plot.
Nicholas J Clark
Lütkepohl, H (2006). New Introduction to Multiple Time Series Analysis. Springer, New York.
# Simulate some time series that follow a latent VAR(1) process
simdat <- sim_mvgam(family = gaussian(),
                    n_series = 4,
                    trend_model = VAR(cor = TRUE),
                    prop_trend = 1)
plot_mvgam_series(data = simdat$data_train, series = 'all')

# Fit a model that uses a latent VAR(1)
mod <- mvgam(y ~ -1,
             trend_formula = ~ 1,
             trend_model = VAR(cor = TRUE),
             family = gaussian(),
             data = simdat$data_train,
             silent = 2)

# Calculate forecast error variance decompositions for each series
fevds <- fevd(mod, h = 12)

# Plot median contributions to forecast error variance
plot(fevds)
This method extracts posterior estimates of the fitted values (i.e. the actual predictions, including estimates for any trend states, that were obtained when fitting the model). It also includes an option for obtaining summaries of the computed draws.
## S3 method for class 'mvgam'
fitted(
  object,
  process_error = TRUE,
  scale = c("response", "linear"),
  summary = TRUE,
  robust = FALSE,
  probs = c(0.025, 0.975),
  ...
)
object: An object of class mvgam.
process_error: Logical. If TRUE (the default), uncertainty in the latent process (trend) model is incorporated in the fitted values.
scale: Either "response" (the default) for predictions on the response scale, or "linear" for predictions on the scale of the linear predictor.
summary: Should summary statistics be returned instead of the raw values? Default is TRUE.
robust: If FALSE (the default), summaries are computed with the mean as the measure of central tendency and the standard deviation as the measure of variability; if TRUE, the median and the median absolute deviation (MAD) are used instead.
probs: The percentiles to be computed by the quantile function.
...: Further arguments passed to the underlying prediction functions.
This method gives the actual fitted values from the model (i.e. what you will see if you generate hindcasts from the fitted model using hindcast.mvgam with type = 'expected'). These predictions can be overly precise if a flexible dynamic trend component was included in the model. This is in contrast to the set of predict functions (i.e. posterior_epred.mvgam or predict.mvgam), which will assume any dynamic trend component has reached stationarity when returning hypothetical predictions.
An array of predicted mean response values. If summary = FALSE, the output resembles those of posterior_epred.mvgam and predict.mvgam. If summary = TRUE, the output is an n_observations x E matrix. The number of summary statistics E is equal to 2 + length(probs): the Estimate column contains point estimates (either mean or median depending on argument robust), while the Est.Error column contains uncertainty estimates (either standard deviation or median absolute deviation depending on argument robust). The remaining columns starting with Q contain quantile estimates as specified via argument probs.
## Not run: 
# Simulate some data and fit a model
simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1')
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = 'AR1',
             data = simdat$data_train,
             chains = 2,
             burnin = 300,
             samples = 300)

# Extract fitted values (posterior expectations)
expectations <- fitted(mod)
str(expectations)
## End(Not run)
Extract or compute hindcasts and forecasts for a fitted mvgam object
forecast(object, ...)

## S3 method for class 'mvgam'
forecast(object, newdata, data_test, n_cores = 1, type = "response", ...)
object: An object of class mvgam.
...: Ignored
newdata: Optional data.frame or list of test data containing the same variables used in the model, over the out-of-sample timepoints to be forecast.
data_test: Deprecated. Still works in place of newdata, but users are recommended to use newdata instead.
n_cores: Deprecated. Parallel processing is no longer supported.
type: The scale of the predictions. When this has the value 'response' (the default), posterior predictions are returned on the response scale; other options include 'expected' and 'trend' (see examples).
Posterior predictions are drawn from the fitted mvgam and used to simulate a forecast distribution.

An object of class mvgam_forecast containing hindcast and forecast distributions. See mvgam_forecast-class for details.
simdat <- sim_mvgam(n_series = 3, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             noncentred = TRUE,
             data = simdat$data_train,
             chains = 2,
             silent = 2)

# Hindcasts on response scale
hc <- hindcast(mod)
str(hc)
plot(hc, series = 1)
plot(hc, series = 2)
plot(hc, series = 3)

# Forecasts on response scale
fc <- forecast(mod, newdata = simdat$data_test)
str(fc)
plot(fc, series = 1)
plot(fc, series = 2)
plot(fc, series = 3)

# Forecasts as expectations
fc <- forecast(mod, newdata = simdat$data_test, type = 'expected')
plot(fc, series = 1)
plot(fc, series = 2)
plot(fc, series = 3)

# Dynamic trend extrapolations
fc <- forecast(mod, newdata = simdat$data_test, type = 'trend')
plot(fc, series = 1)
plot(fc, series = 2)
plot(fc, series = 3)
Extract formulae from mvgam objects
## S3 method for class 'mvgam'
formula(x, trend_effects = FALSE, ...)

## S3 method for class 'mvgam_prefit'
formula(x, trend_effects = FALSE, ...)
x: An object of class mvgam or mvgam_prefit.
trend_effects: Logical; if TRUE, the formula from the process (trend) model is returned instead of the observation formula. Defaults to FALSE.
...: Ignored
A formula object.
Nicholas J Clark
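A brief sketch (the simulated data and model settings are illustrative assumptions; run_model = FALSE keeps this fast):

simdat <- sim_mvgam()
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_formula = ~ 1,
             trend_model = AR(),
             data = simdat$data_train,
             run_model = FALSE)

# Observation formula
formula(mod)

# Process (trend) formula
formula(mod, trend_effects = TRUE)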
This function lists the parameters that can have their prior distributions changed for a given mvgam model, as well as listing their default distributions.
get_mvgam_priors(
  formula,
  trend_formula,
  factor_formula,
  knots,
  trend_knots,
  trend_model = "None",
  family = poisson(),
  data,
  unit = time,
  species = series,
  use_lv = FALSE,
  n_lv,
  trend_map,
  ...
)
formula: A formula object specifying the GAM observation model.
trend_formula: An optional formula object specifying the GAM process (trend) model.
factor_formula: Can be supplied instead of trend_formula when setting up latent factor models.
knots: An optional list containing user-specified knot values for basis construction (as in mgcv).
trend_knots: As for knots, but for the trend formula.
trend_model: The latent trend model to use (e.g. RW(), AR(), VAR() or GP()). Defaults to 'None'.
family: Default is poisson().
data: A data.frame or list containing the model response variable and covariates, including any grouping identifiers required by the chosen trend model. Note however that there are special cases where these identifiers are not needed; for example, models with hierarchical temporal correlation processes.
unit: The unquoted name of the variable that represents the unit of analysis in data. Defaults to time.
species: The unquoted name of the factor variable that indexes the different response units in data. Defaults to series.
use_lv: Logical; whether to model the trends using a reduced set of latent dynamic factors. Defaults to FALSE.
n_lv: Optional integer number of latent dynamic factors to use if use_lv = TRUE.
trend_map: Optional data.frame specifying which series should depend on which latent trends.
...: Not currently used
Users can supply a model formula, prior to fitting the model, so that default priors can be inspected and altered. To make alterations, change the contents of the prior column and supply this data.frame to the mvgam function using the argument priors. If using Stan as the backend, users can also modify the parameter bounds by modifying the new_lowerbound and/or new_upperbound columns. This will be necessary if using restrictive distributions on some parameters, such as a Beta distribution for the trend sd parameters (Beta only has support on (0,1), so the upperbound cannot be above 1). Another option is to make use of the prior modification functions in brms (i.e. prior) to change prior distributions and bounds (just use the name of the parameter that you'd like to change as the class argument; see examples below).
Either a data.frame containing the prior definitions (if any suitable priors can be altered by the user) or NULL, indicating that no priors in the model can be modified through the mvgam interface.
Only the prior, new_lowerbound and/or new_upperbound columns of the output should be altered when defining the user-defined priors for the mvgam model. Use only if you are familiar with the underlying probabilistic programming language. There are no sanity checks done to ensure that the code is legal (i.e. to check that lower bounds are smaller than upper bounds, for example).
Nicholas J Clark
# Simulate three integer-valued time series
library(mvgam)
dat <- sim_mvgam(trend_rel = 0.5)

# Get a model file that uses default mvgam priors for inspection (not always
# necessary, but this can be useful for testing whether your updated priors
# are written correctly)
mod_default <- mvgam(y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1,
                     family = nb(),
                     data = dat$data_train,
                     trend_model = AR(p = 2),
                     run_model = FALSE)

# Inspect the model file with default mvgam priors
code(mod_default)

# Look at which priors can be updated in mvgam
test_priors <- get_mvgam_priors(y ~ s(series, bs = 're') +
                                  s(season, bs = 'cc') - 1,
                                family = nb(),
                                data = dat$data_train,
                                trend_model = AR(p = 2))
test_priors

# Make a few changes; first, change the population mean for the series-level
# random intercepts
test_priors$prior[2] <- 'mu_raw ~ normal(0.2, 0.5);'

# Now use stronger regularisation for the series-level AR2 coefficients
test_priors$prior[5] <- 'ar2 ~ normal(0, 0.25);'

# Check that the changes are made to the model file without any warnings by
# setting 'run_model = FALSE'
mod <- mvgam(y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1,
             family = nb(),
             data = dat$data_train,
             trend_model = AR(p = 2),
             priors = test_priors,
             run_model = FALSE)
code(mod)

# No warnings, the model is ready for fitting now in the usual way with the
# addition of the 'priors' argument

# The same can be done using 'brms' functions; here we will also change the
# ar1 prior and put some bounds on the ar coefficients to enforce
# stationarity; we set the prior using the 'class' argument in all brms
# prior functions
brmsprior <- c(prior(normal(0.2, 0.5), class = mu_raw),
               prior(normal(0, 0.25), class = ar1, lb = -1, ub = 1),
               prior(normal(0, 0.25), class = ar2, lb = -1, ub = 1))
brmsprior

mod <- mvgam(y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1,
             family = nb(),
             data = dat$data_train,
             trend_model = AR(p = 2),
             priors = brmsprior,
             run_model = FALSE)
code(mod)

# Look at what is returned when an incorrect spelling is used
test_priors$prior[5] <- 'ar2_bananas ~ normal(0, 0.25);'
mod <- mvgam(y ~ s(series, bs = 're') + s(season, bs = 'cc') - 1,
             family = nb(),
             data = dat$data_train,
             trend_model = AR(p = 2),
             priors = test_priors,
             run_model = FALSE)
code(mod)

# Example of changing parametric (fixed effect) priors
simdat <- sim_mvgam()

# Add a fake covariate
simdat$data_train$cov <- rnorm(NROW(simdat$data_train))

priors <- get_mvgam_priors(y ~ cov + s(season),
                           data = simdat$data_train,
                           family = poisson(),
                           trend_model = AR())

# Change priors for the intercept and fake covariate effects
priors$prior[1] <- '(Intercept) ~ normal(0, 1);'
priors$prior[2] <- 'cov ~ normal(0, 0.1);'

mod2 <- mvgam(y ~ cov + s(season),
              data = simdat$data_train,
              trend_model = AR(),
              family = poisson(),
              priors = priors,
              run_model = FALSE)
code(mod2)

# Likewise using 'brms' utilities (note that you can use Intercept rather
# than `(Intercept)`) to change priors on the intercept
brmsprior <- c(prior(normal(0.2, 0.5), class = cov),
               prior(normal(0, 0.25), class = Intercept))
brmsprior

mod2 <- mvgam(y ~ cov + s(season),
              data = simdat$data_train,
              trend_model = AR(),
              family = poisson(),
              priors = brmsprior,
              run_model = FALSE)
code(mod2)

# The "class = 'b'" shortcut can be used to put the same prior on all
# 'fixed' effect coefficients (apart from any intercepts)
set.seed(0)
dat <- mgcv::gamSim(1, n = 200, scale = 2)
dat$time <- 1:NROW(dat)
mod <- mvgam(y ~ x0 + x1 + s(x2) + s(x3),
             priors = prior(normal(0, 0.75), class = 'b'),
             data = dat,
             family = gaussian(),
             run_model = FALSE)
code(mod)
Set up low-rank approximate Gaussian Process trend models using Hilbert basis expansions in mvgam. This function does not evaluate its arguments; it exists purely to help set up a model with a particular GP trend model.
GP(...)
...: unused
A GP trend is estimated for each series using Hilbert space approximate Gaussian Processes. In mvgam, latent squared exponential GP trends are approximated using, by default, 20 basis functions and a multiplicative factor of c = 5/4, which saves computational costs compared to fitting full GPs while adequately estimating GP alpha and rho parameters.
An object of class mvgam_trend, which contains a list of arguments to be interpreted by the parsing functions in mvgam.
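A brief usage sketch (the simulation settings are illustrative assumptions; run_model = FALSE so that only the generated Stan code is inspected):

simdat <- sim_mvgam(trend_model = GP(), prop_trend = 0.75)
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = GP(),
             data = simdat$data_train,
             run_model = FALSE)
stancode(mod)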
These evaluation and plotting functions exist to allow some popular gratia methods to work with mvgam models.
drawDotmvgam(
  object,
  trend_effects = FALSE,
  data = NULL,
  select = NULL,
  parametric = FALSE,
  terms = NULL,
  residuals = FALSE,
  scales = c("free", "fixed"),
  ci_level = 0.95,
  n = 100,
  n_3d = 16,
  n_4d = 4,
  unconditional = FALSE,
  overall_uncertainty = TRUE,
  constant = NULL,
  fun = NULL,
  dist = 0.1,
  rug = TRUE,
  contour = TRUE,
  grouped_by = FALSE,
  ci_alpha = 0.2,
  ci_col = "black",
  smooth_col = "black",
  resid_col = "steelblue3",
  contour_col = "black",
  n_contour = NULL,
  partial_match = FALSE,
  discrete_colour = NULL,
  discrete_fill = NULL,
  continuous_colour = NULL,
  continuous_fill = NULL,
  position = "identity",
  angle = NULL,
  ncol = NULL,
  nrow = NULL,
  guides = "keep",
  widths = NULL,
  heights = NULL,
  crs = NULL,
  default_crs = NULL,
  lims_method = "cross",
  wrap = TRUE,
  envir = environment(formula(object)),
  ...
)

eval_smoothDothilbertDotsmooth(
  smooth,
  model,
  n = 100,
  n_3d = NULL,
  n_4d = NULL,
  data = NULL,
  unconditional = FALSE,
  overall_uncertainty = TRUE,
  dist = NULL,
  ...
)

eval_smoothDotmodDotsmooth(
  smooth,
  model,
  n = 100,
  n_3d = NULL,
  n_4d = NULL,
  data = NULL,
  unconditional = FALSE,
  overall_uncertainty = TRUE,
  dist = NULL,
  ...
)

eval_smoothDotmoiDotsmooth(
  smooth,
  model,
  n = 100,
  n_3d = NULL,
  n_4d = NULL,
  data = NULL,
  unconditional = FALSE,
  overall_uncertainty = TRUE,
  dist = NULL,
  ...
)
object: a fitted mvgam, the result of a call to mvgam().
trend_effects: logical specifying whether smooth terms from the trend_formula should be drawn instead of terms from the observation formula.
data: a data frame of covariate values at which to evaluate the model's smooth functions.
select: character, logical, or numeric; which smooths to plot. If NULL, the default, then all model smooths are drawn.
parametric: logical; plot parametric terms also? Note that select is used for selecting which smooths to plot, while terms selects which parametric effects are plotted.
terms: character; which model parametric terms should be drawn? The default of NULL will plot all parametric terms.
residuals: currently ignored for mvgam models.
scales: character; should all univariate smooths be plotted with the same y-axis scale? If "free" (the default), each plot uses its own scale; if "fixed", a common scale is used. Currently does not affect the y-axis scale of plots of the parametric terms.
ci_level: numeric between 0 and 1; the coverage of the credible interval.
n: numeric; the number of points over the range of the covariate at which to evaluate the smooth.
n_3d, n_4d: numeric; the number of points over the range of the last covariate in a 3D or 4D smooth. The defaults are 16 and 4, respectively.
unconditional: ignored for mvgam models.
overall_uncertainty: ignored for mvgam models.
constant: numeric; a constant to add to the estimated values of the smooth.
fun: function; a function that will be applied to the estimated values and confidence interval before plotting. Can be a function or the name of a function.
dist: numeric; if greater than 0, this is used to determine when a location is too far from data to be plotted when plotting 2-D smooths. The data are scaled into the unit square before deciding what to exclude, and dist is a distance within that unit square.
rug: logical; draw a rug plot at the bottom of each plot for 1-D smooths or plot locations of data for higher dimensions.
contour: logical; should contours be drawn on the plot?
grouped_by: logical; should factor by smooths be drawn as one panel per level of the factor (FALSE, the default), or combined into a single panel (TRUE)?
ci_alpha: numeric; alpha transparency for confidence or simultaneous interval.
ci_col: colour specification for the confidence/credible intervals band. Affects the fill of the interval.
smooth_col: colour specification for the smooth line.
resid_col: colour specification for residual points. Ignored.
contour_col: colour specification for contour lines.
n_contour: numeric; the number of contour bins.
partial_match: logical; should smooths be selected by partial matches with select?
discrete_colour: a suitable colour scale to be used when plotting discrete variables.
discrete_fill: a suitable fill scale to be used when plotting discrete variables.
continuous_colour: a suitable colour scale to be used when plotting continuous variables.
continuous_fill: a suitable fill scale to be used when plotting continuous variables.
position: Position adjustment, either as a string, or the result of a call to a position adjustment function.
angle: numeric; the angle at which the x axis tick labels are to be drawn.
ncol, nrow: numeric; the numbers of rows and columns over which to spread the plots.
guides: character; how guides are treated when combining plots (e.g. "keep").
widths, heights: The relative widths and heights of each column and row in the grid. Will get repeated to match the dimensions of the grid.
crs: the coordinate reference system (CRS) to use for the plot. All data will be projected into this CRS.
default_crs: the coordinate reference system (CRS) to use for the non-sf layers in the plot.
lims_method: character; affects how the axis limits are determined.
wrap: logical; wrap plots as a patchwork?
envir: an environment to look up the data within.
...: additional arguments passed to other methods.
smooth: a smooth object of the relevant class to be evaluated.
model: a fitted mvgam model.
These methods allow mvgam models to be enhanced if users have the gratia package installed, making available the popular draw() function to plot partial effects of mvgam smooth functions using ggplot2::ggplot() utilities.
Nicholas J Clark
## Not run: 
# Fit a simple GAM and draw partial effects of smooths using gratia
set.seed(0)
library(ggplot2); theme_set(theme_bw())
library(gratia)
dat <- mgcv::gamSim(1, n = 200, scale = 2)
mod <- mvgam(y ~ s(x1, bs = 'moi') + te(x0, x2),
             data = dat,
             family = gaussian())
draw(mod)
## End(Not run)
Extract hindcasts for a fitted mvgam object
hindcast(object, ...)

## S3 method for class 'mvgam'
hindcast(object, type = "response", ...)
object: An object of class mvgam.
...: Ignored
type: The scale of the predictions. When this has the value 'response' (the default), posterior retrodictions are returned on the response scale; other options include 'expected' and 'trend' (see examples).
Posterior retrodictions are drawn from the fitted mvgam and organized into a convenient format.
An object of class mvgam_forecast containing hindcast distributions. See mvgam_forecast-class for details.
simdat <- sim_mvgam(n_series = 3, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(),
             noncentred = TRUE,
             data = simdat$data_train,
             chains = 2)

# Hindcasts on response scale
hc <- hindcast(mod)
str(hc)
plot(hc, series = 1)
plot(hc, series = 2)
plot(hc, series = 3)

# Hindcasts as expectations
hc <- hindcast(mod, type = 'expected')
str(hc)
plot(hc, series = 1)
plot(hc, series = 2)
plot(hc, series = 3)

# Estimated latent trends
hc <- hindcast(mod, type = 'trend')
str(hc)
plot(hc, series = 1)
plot(hc, series = 2)
plot(hc, series = 3)
Index mvgam objects
## S3 method for class 'mvgam'
variables(x, ...)
x: An object of class mvgam.
...: Arguments passed to individual methods (if applicable).
A list object of the variables that can be extracted, along with their aliases.
## Not run: 
simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1')
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             data = simdat$data_train,
             burnin = 300,
             samples = 300,
             chains = 2)
variables(mod)
## End(Not run)
Compute Generalized or Orthogonalized Impulse Response Functions (IRFs) from mvgam models with Vector Autoregressive dynamics
irf(object, ...)

## S3 method for class 'mvgam'
irf(object, h = 1, cumulative = FALSE, orthogonal = FALSE, ...)
object: An object of class mvgam fitted with a Vector Autoregressive trend model (i.e. VAR()).
...: ignored
h: Positive integer specifying the forecast horizon over which to calculate the IRFs. Defaults to 1.
cumulative: Logical; whether to compute cumulative impulse responses. Defaults to FALSE.
orthogonal: Logical; whether to compute orthogonalized (rather than generalized) impulse responses. Defaults to FALSE.
Generalized or Orthogonalized Impulse Response Functions can be computed using the posterior estimates of Vector Autoregressive parameters. This function generates a positive "shock" for a target process at time t = 0 and then calculates how each of the remaining processes in the latent VAR are expected to respond over the forecast horizon h. The function computes IRFs for all processes in the object and returns them in an array that can be plotted using the S3 plot function. To inspect community-level metrics of stability using latent VAR processes, you can use the related stability function.
An object of class mvgam_irf containing the posterior IRFs. This object can be used with the supplied S3 function plot.
Pesaran, PH & Shin, Y (1998). Generalized impulse response analysis in linear multivariate models. Economics Letters 58: 17-29.
VAR, plot.mvgam_irf, stability, fevd
# Simulate some time series that follow a latent VAR(1) process
simdat <- sim_mvgam(family = gaussian(),
                    n_series = 4,
                    trend_model = VAR(cor = TRUE),
                    prop_trend = 1)
plot_mvgam_series(data = simdat$data_train, series = 'all')

# Fit a model that uses a latent VAR(1)
mod <- mvgam(y ~ -1,
             trend_formula = ~ 1,
             trend_model = VAR(cor = TRUE),
             family = gaussian(),
             data = simdat$data_train,
             silent = 2)

# Calculate Generalized IRFs for each series
irfs <- irf(mod, h = 12, cumulative = FALSE)

# Plot them
plot(irfs, series = 1)
plot(irfs, series = 2)
plot(irfs, series = 3)
This function sets up a Joint Species Distribution Model whereby the residual associations among species can be modelled in a reduced-rank format using a set of latent factors. The factor specification is extremely flexible, allowing users to include spatial, temporal or any other type of predictor effects to more efficiently capture unmodelled residual associations, while the observation model can also be highly flexible (including all smooth, GP and other effects that mvgam can handle).
jsdgam(
  formula,
  factor_formula = ~ -1,
  knots,
  factor_knots,
  data,
  newdata,
  family = poisson(),
  unit = time,
  species = series,
  share_obs_params = FALSE,
  priors,
  n_lv = 2,
  backend = getOption("brms.backend", "cmdstanr"),
  algorithm = getOption("brms.algorithm", "sampling"),
  control = list(max_treedepth = 10, adapt_delta = 0.8),
  chains = 4,
  burnin = 500,
  samples = 500,
  thin = 1,
  parallel = TRUE,
  threads = 1,
  silent = 1,
  run_model = TRUE,
  return_model_data = FALSE,
  ...
)
formula: A formula object specifying the GAM observation model.
factor_formula: A formula specifying the linear predictor for the latent factors. Defaults to ~ -1.
knots: An optional list containing user-specified knot values for basis construction.
factor_knots: An optional list of knot values for the factor_formula.
data: A data.frame or list containing the model response variable and covariates.
newdata: Optional data.frame or list of test data for generating predictions.
family: Default is poisson().
unit: The unquoted name of the variable that represents the unit of analysis in data. Defaults to time.
species: The unquoted name of the factor variable that indexes the different response units (species) in data. Defaults to series.
share_obs_params: Logical; whether to share any additional family-specific observation parameters across species. Defaults to FALSE.
priors: An optional data.frame (or brmsprior object) of prior definitions.
n_lv: Number of latent factors to estimate. Defaults to 2.
backend: Character string naming the package to use as the backend for fitting the Stan model. Options are "cmdstanr" (the default) or "rstan". Can be set globally for the current R session via the "brms.backend" option.
algorithm: Character string naming the estimation approach to use. Options are "sampling" (the default) or approximate algorithms such as "meanfield" or "fullrank".
control: A named list for controlling the sampler's behaviour. Defaults to list(max_treedepth = 10, adapt_delta = 0.8).
chains: Number of parallel MCMC chains. Defaults to 4.
burnin: Number of warmup iterations per chain. Defaults to 500.
samples: Number of post-warmup samples per chain. Defaults to 500.
thin: Thinning interval for monitors. Ignored when using an approximate algorithm.
parallel: Logical; whether to run chains in parallel. Defaults to TRUE.
threads: Number of threads to use for within-chain parallelisation. Defaults to 1.
silent: Verbosity level between 0 and 2. Defaults to 1.
run_model: Logical; if FALSE, the model is set up but not fitted. Defaults to TRUE.
return_model_data: Logical; whether to return the data objects used to condition the model. Defaults to FALSE.
...: Other arguments to pass to mvgam.
Joint Species Distribution Models allow for responses of multiple species to be learned hierarchically, whereby responses to environmental variables in formula can be partially pooled and any latent, unmodelled residual associations can also be learned. In mvgam, both of these effects can be modelled with the full power of latent factor Hierarchical GAMs, providing unmatched flexibility to model full communities of species. When calling jsdgam, an initial State-Space model using trend = 'None' is set up and then modified to include the latent factors and their linear predictors. Consequently, you can inspect priors for these models using get_mvgam_priors by supplying the relevant formula, factor_formula, data and family arguments and keeping the default trend = 'None'.
In a JSDGAM, the expectation of the response \(Y_{ij}\) for unit \(i\) and species \(j\) is modelled as

\[g(\mu_{ij}) = \boldsymbol{X}_i\boldsymbol{\beta} + \boldsymbol{u}_i\boldsymbol{\theta}_j\]

where \(g(\cdot)\) is a known link function, \(\boldsymbol{X}\) is a design matrix of linear predictors (with associated \(\boldsymbol{\beta}\) coefficients), \(\boldsymbol{u}_i\) are \(n_{lv}\)-variate latent factors (\(n_{lv} \ll n_{species}\)) and \(\boldsymbol{\theta}_j\) are species-specific loadings on the latent factors, respectively. The design matrix \(\boldsymbol{X}\) and its \(\boldsymbol{\beta}\) coefficients are constructed and modelled using formula and can contain any of mvgam's predictor effects, including random intercepts and slopes, multidimensional penalized smooths, GP effects etc. The factor loadings are constrained for identifiability but can be used to reconstruct an estimate of the species' residual variance-covariance matrix using \(\boldsymbol{\Theta}\boldsymbol{\Theta}'\) (see the example below and residual_cor() for details).

The latent factors are further modelled using

\[\boldsymbol{u}_i \sim \text{Normal}(\boldsymbol{Q}_i\boldsymbol{\beta}_{factor}, 1)\]

where the second design matrix \(\boldsymbol{Q}\) and its associated \(\boldsymbol{\beta}_{factor}\) coefficients are constructed and modelled using factor_formula. Again, the effects that make up this linear predictor can contain any of mvgam's allowed predictor effects, providing enormous flexibility for modelling species' communities.
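To see how the reduced-rank construction works, the following base R sketch (purely illustrative; the loadings are simulated, not taken from any fitted model) shows how a small number of latent factors implies a low-rank residual covariance among species:

set.seed(1)
n_species <- 6
n_lv <- 2

# Hypothetical loadings of six species on two latent factors
Theta <- matrix(rnorm(n_species * n_lv), nrow = n_species, ncol = n_lv)

# The implied residual covariance Theta %*% t(Theta) is a 6 x 6 matrix of
# rank 2, requiring far fewer parameters than an unstructured covariance
Sigma_resid <- Theta %*% t(Theta)
qr(Sigma_resid)$rank   # = 2
cov2cor(Sigma_resid)   # implied residual correlations among the six species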
A list
object of class mvgam
containing model output,
the text representation of the model file,
the mgcv model output (for easily generating simulations at
unsampled covariate values), Dunn-Smyth residuals for each species and key information needed
for other functions in the package. See mvgam-class
for details.
Use methods(class = "mvgam")
for an overview of available methods.
Nicholas J Clark
Nicholas J Clark & Konstans Wells (2023). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 14:3, 771-784.
David I Warton, F Guillaume Blanchet, Robert B O’Hara, Otso Ovaskainen, Sara Taskinen, Steven C
Walker & Francis KC Hui (2015). So many variables: joint modeling in community ecology.
Trends in Ecology & Evolution, 30:12, 766-779.
# Simulate latent count data for 500 spatial locations and 10 species
set.seed(0)
N_points <- 500
N_species <- 10

# Species-level intercepts (on the log scale)
alphas <- runif(N_species, 2, 2.25)

# Simulate a covariate and species-level responses to it
temperature <- rnorm(N_points)
betas <- runif(N_species, -0.5, 0.5)

# Simulate points uniformly over a space
lon <- runif(N_points, min = 150, max = 155)
lat <- runif(N_points, min = -20, max = -19)

# Set up spatial basis functions as a tensor product of lat and lon
sm <- mgcv::smoothCon(mgcv::te(lon, lat, k = 5),
                      data = data.frame(lon, lat),
                      knots = NULL)[[1]]

# The design matrix for this smooth is in the 'X' slot
des_mat <- sm$X
dim(des_mat)

# Function to generate a random covariance matrix where all variables
# have unit variance (i.e. diagonals are all 1)
random_Sigma = function(N){
  L_Omega <- matrix(0, N, N);
  L_Omega[1, 1] <- 1;
  for (i in 2 : N) {
    bound <- 1;
    for (j in 1 : (i - 1)) {
      L_Omega[i, j] <- runif(1, -sqrt(bound), sqrt(bound));
      bound <- bound - L_Omega[i, j] ^ 2;
    }
    L_Omega[i, i] <- sqrt(bound);
  }
  Sigma <- L_Omega %*% t(L_Omega);
  return(Sigma)
}

# Simulate a variance-covariance matrix for the correlations among
# basis coefficients
Sigma <- random_Sigma(N = NCOL(des_mat))

# Now simulate the species-level basis coefficients hierarchically, where
# spatial basis function correlations are a convex sum of a base correlation
# matrix and a species-level correlation matrix
basis_coefs <- matrix(NA, nrow = N_species, ncol = NCOL(Sigma))
base_field <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = Sigma)
for(t in 1:N_species){
  corOmega <- (cov2cor(Sigma) * 0.7) +
    (0.3 * cov2cor(random_Sigma(N = NCOL(des_mat))))
  basis_coefs[t, ] <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = corOmega)
}

# Simulate the latent spatial processes
st_process <- do.call(rbind, lapply(seq_len(N_species), function(t){
  data.frame(lat = lat,
             lon = lon,
             species = paste0('species_', t),
             temperature = temperature,
             process = alphas[t] +
               betas[t] * temperature +
               des_mat %*% basis_coefs[t,])
}))

# Now take noisy observations at some of the points (60)
obs_points <- sample(1:N_points, size = 60, replace = FALSE)
obs_points <- data.frame(lat = lat[obs_points],
                         lon = lon[obs_points],
                         site = 1:60)

# Keep only the process data at these points
st_process %>%
  dplyr::inner_join(obs_points, by = c('lat', 'lon')) %>%
  # now take noisy Poisson observations of the process
  dplyr::mutate(count = rpois(NROW(.), lambda = exp(process))) %>%
  dplyr::mutate(species = factor(species,
                                 levels = paste0('species_', 1:N_species))) %>%
  dplyr::group_by(lat, lon) -> dat

# View the count distributions for each species
library(ggplot2)
ggplot(dat, aes(x = count)) +
  geom_histogram() +
  facet_wrap(~ species, scales = 'free')

ggplot(dat, aes(x = lat, y = lon, col = log(count + 1))) +
  geom_point(size = 2.25) +
  facet_wrap(~ species, scales = 'free') +
  scale_color_viridis_c() +
  theme_classic()

# Inspect default priors for a joint species model with three spatial factors
priors <- get_mvgam_priors(formula = count ~
                             # Environmental model includes random slopes for
                             # a linear effect of temperature
                             s(species, bs = 're', by = temperature),

                           # Each factor estimates a different nonlinear spatial
                           # process, using 'by = trend' as in other mvgam
                           # State-Space models
                           factor_formula = ~ gp(lat, lon, k = 6, by = trend) - 1,
                           n_lv = 3,

                           # The data and grouping variables
                           data = dat,
                           unit = site,
                           species = species,

                           # Poisson observations
                           family = poisson())
head(priors)

# Fit a JSDM that estimates hierarchical temperature responses
# and that uses three latent spatial factors
mod <- jsdgam(formula = count ~
                # Environmental model includes random slopes for a
                # linear effect of temperature
                s(species, bs = 're', by = temperature),

              # Each factor estimates a different nonlinear spatial process,
              # using 'by = trend' as in other mvgam State-Space models
              factor_formula = ~ gp(lat, lon, k = 6, by = trend) - 1,
              n_lv = 3,

              # Change default priors for fixed random effect variances and
              # factor GP marginal deviations to standard normal
              priors = c(prior(std_normal(), class = sigma_raw),
                         prior(std_normal(), class = `alpha_gp_trend(lat, lon):trendtrend1`),
                         prior(std_normal(), class = `alpha_gp_trend(lat, lon):trendtrend2`),
                         prior(std_normal(), class = `alpha_gp_trend(lat, lon):trendtrend3`)),

              # The data and the grouping variables
              data = dat,
              unit = site,
              species = species,

              # Poisson observations
              family = poisson(),
              chains = 2,
              silent = 2)

# Plot species-level intercept estimates
plot_predictions(mod, condition = 'species', type = 'link')

# Plot species' hierarchical responses to temperature
plot_predictions(mod, condition = c('temperature', 'species', 'species'),
                 type = 'link')

# Plot posterior median estimates of the latent spatial factors
plot(mod, type = 'smooths', trend_effects = TRUE)

# Or using gratia, if you have it installed
if(requireNamespace('gratia', quietly = TRUE)){
  gratia::draw(mod, trend_effects = TRUE)
}

# Calculate residual spatial correlations
post_cors <- residual_cor(mod)
names(post_cors)

# Look at lower and upper credible interval estimates for
# some of the estimated correlations
post_cors$cor[1:5, 1:5]
post_cors$cor_upper[1:5, 1:5]
post_cors$cor_lower[1:5, 1:5]

# A quick and dirty plot of the posterior median correlations
image(post_cors$cor)

# Posterior predictive checks and ELPD-LOO can ascertain model fit
pp_check(mod, type = "pit_ecdf_grouped",
         group = "species",
         ndraws = 100)
loo(mod)

# Forecast log(counts) for entire region (site value doesn't matter as long
# as each spatial location has a different and unique site identifier);
# note this calculation takes a few minutes because of the need to calculate
# draws from the stochastic latent factors
newdata <- st_process %>%
  dplyr::mutate(species = factor(species,
                                 levels = paste0('species_', 1:N_species))) %>%
  dplyr::group_by(lat, lon) %>%
  dplyr::mutate(site = dplyr::cur_group_id()) %>%
  dplyr::ungroup()
preds <- predict(mod, newdata = newdata)

# Plot the median log(count) predictions on a grid
newdata$log_count <- preds[,1]
ggplot(newdata, aes(x = lat, y = lon, col = log_count)) +
  geom_point(size = 1.5) +
  facet_wrap(~ species, scales = 'free') +
  scale_color_viridis_c() +
  theme_classic()
Approximate leave-future-out cross-validation of fitted mvgam objects
lfo_cv(object, ...)

## S3 method for class 'mvgam'
lfo_cv(
  object,
  data,
  min_t,
  fc_horizon = 1,
  pareto_k_threshold = 0.7,
  silent = 1,
  ...
)
object |
|
... |
Ignored |
data |
A |
min_t |
Integer specifying the minimum training time required before making predictions
from the data. Default is either the |
fc_horizon |
Integer specifying the number of time steps ahead for evaluating forecasts |
pareto_k_threshold |
Proportion specifying the threshold over which the Pareto shape parameter
is considered unstable, triggering a model refit. Default is |
silent |
Verbosity level between |
Approximate leave-future-out cross-validation uses an expanding training window scheme
to evaluate a model on its forecasting ability. The steps used in this function mirror those laid out
in the lfo vignette from the loo package, written by Paul Bürkner, Jonah Gabry and Aki Vehtari. First, we refit the model using the first min_t
observations to perform a single exact fc_horizon
-ahead forecast step. This forecast is evaluated against
the min_t + fc_horizon
out of sample observations using the Expected Log Predictive Density (ELPD).
Next, we approximate each successive round of
expanding window forecasts by moving forward one step at a time for i in 1:N_evaluations
and re-weighting
draws from the model's posterior predictive distribution using Pareto Smoothed
Importance Sampling (PSIS). In each iteration i
, PSIS weights are obtained for the next observation
that would have been included in the model if we had re-fit (i.e. the last observation that would have
been in the training data, or min_t + i
). If these importance ratios are stable, we consider the
approximation adequate and use the re-weighted posterior's forecast for evaluating the next holdout
set of testing observations ((min_t + i + 1):(min_t + i + fc_horizon)
). At some point the
importance ratio variability will become too large and importance sampling will fail. This is
indicated by the estimated shape parameter k
of the generalized Pareto distribution
crossing a certain threshold pareto_k_threshold
. Only then do we refit the model using
all of the observations up to the time of the failure. We then restart the process and iterate forward
until the next refit is triggered (Bürkner et al. 2020).
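For intuition, exact leave-future-out CV (with a refit at every evaluation step, rather than the PSIS re-weighting that lfo_cv uses to avoid most refits) could be sketched as below. This is a simplified illustration only: it assumes a fitted single-series model mod and its data.frame dat, so that the final columns of the logLik() matrix correspond to the holdout observations.

# Hedged sketch of brute-force leave-future-out CV; lfo_cv() replaces
# most of these refits with PSIS re-weighting
exact_lfo_elpds <- function(mod, dat, min_t, fc_horizon = 1) {
  eval_points <- min_t:(max(dat$time) - fc_horizon)
  sapply(eval_points, function(t) {
    # Refit on observations up to time t; hold out the next fc_horizon steps
    refit <- update(mod,
                    data = subset(dat, time <= t),
                    newdata = subset(dat, time > t & time <= t + fc_horizon))
    lls <- logLik(refit)  # draws x observations (training + testing)
    holdout <- tail(seq_len(NCOL(lls)), fc_horizon)
    # Log of the draw-averaged predictive density gives the ELPD contribution
    sum(log(colMeans(exp(lls[, holdout, drop = FALSE]))), na.rm = TRUE)
  })
}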
A list
of class mvgam_lfo
containing the approximate ELPD scores,
the Pareto-k shape values and the specified pareto_k_threshold.
Nicholas J Clark
Paul-Christian Bürkner, Jonah Gabry & Aki Vehtari (2020). Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation, 90:14, 2499-2523.
forecast
, score
, compare_mvgams
## Not run: # Simulate from a Poisson-AR2 model with a seasonal smooth set.seed(100) dat <- sim_mvgam(T = 75, n_series = 1, prop_trend = 0.75, trend_model = 'AR2', family = poisson()) # Plot the time series plot_mvgam_series(data = dat$data_train, newdata = dat$data_test, series = 1) # Fit an appropriate model mod_ar2 <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(p = 2), family = poisson(), data = dat$data_train, newdata = dat$data_test, burnin = 300, samples = 300, chains = 2, silent = 2) # Fit a less appropriate model mod_rw <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = RW(), family = poisson(), data = dat$data_train, newdata = dat$data_test, burnin = 300, samples = 300, chains = 2, silent = 2) # Compare Discrete Ranked Probability Scores for the testing period fc_ar2 <- forecast(mod_ar2) fc_rw <- forecast(mod_rw) score_ar2 <- score(fc_ar2, score = 'drps') score_rw <- score(fc_rw, score = 'drps') sum(score_ar2$series_1$score) sum(score_rw$series_1$score) # Now use approximate leave-future-out CV to compare # rolling forecasts; start at time point 40 to reduce # computational time and to ensure enough data is available # for estimating model parameters lfo_ar2 <- lfo_cv(mod_ar2, min_t = 40, fc_horizon = 3, silent = 2) lfo_rw <- lfo_cv(mod_rw, min_t = 40, fc_horizon = 3, silent = 2) # Plot Pareto-K values and ELPD estimates plot(lfo_ar2) plot(lfo_rw) # Proportion of timepoints in which AR2 model gives better forecasts length(which((lfo_ar2$elpds - lfo_rw$elpds) > 0)) / length(lfo_ar2$elpds) # A higher total ELPD is preferred lfo_ar2$sum_ELPD lfo_rw$sum_ELPD ## End(Not run)
Compute pointwise Log-Likelihoods from fitted mvgam objects
## S3 method for class 'mvgam'
logLik(object, linpreds, newdata, family_pars, include_forecast = TRUE, ...)
object |
|
linpreds |
Optional |
newdata |
Optional |
family_pars |
Optional |
include_forecast |
Logical. If |
... |
Ignored |
A matrix
of dimension n_samples x n_observations
containing the pointwise
log-likelihood draws for all observations in newdata
. If no newdata
is supplied,
log-likelihood draws are returned for all observations that were originally fed to
the model (training observations and, if supplied to the
original model via the newdata
argument in mvgam
,
testing observations).
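As a brief usage sketch, the returned draws can be converted into pointwise and total ELPD estimates (assuming a fitted model mod; any NA values, e.g. from missing observations, are skipped here):

lls <- logLik(mod)                         # n_samples x n_observations
pointwise_elpd <- log(colMeans(exp(lls)))  # average density over draws, then log
sum(pointwise_elpd, na.rm = TRUE)          # total in-sample ELPD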
## Not run: # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), data = simdat$data_train, burnin = 300, samples = 300, chains = 2) # Extract logLikelihood values lls <- logLik(mod) str(lls) ## End(Not run)
Extract the LOOIC (leave-one-out information criterion) from fitted mvgam models using loo::loo()
## S3 method for class 'mvgam'
loo(x, incl_dynamics = TRUE, ...)

## S3 method for class 'mvgam'
loo_compare(x, ..., model_names = NULL, incl_dynamics = TRUE)
x |
Object of class |
incl_dynamics |
Logical; indicates if any latent dynamic structures that
were included in the model should be considered when calculating in-sample
log-likelihoods. Defaults to |
... |
More |
model_names |
If |
When comparing two (or more) fitted mvgam
models, we can estimate the
difference in their in-sample predictive accuracies using the Expected Log Predictive
Density (ELPD). This metric can be approximated using Pareto Smoothed Importance Sampling, which
is a method to re-weight posterior draws to approximate what predictions the models might have
made for a given datapoint had that datapoint not been included in the original model fit (i.e.
if we were to run a leave-one-out cross-validation and then made a prediction for the held-out
datapoint). See details from loo::loo()
and loo::loo_compare()
for further information
on how this importance sampling works.
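The reliability of this importance sampling can be checked via the Pareto-k diagnostics stored on the returned object (a minimal sketch, assuming a fitted model mod):

l <- loo(mod)
# Pointwise Pareto-k shape estimates; values above ~0.7 flag observations
# for which the PSIS approximation is unreliable
l$diagnostics$pareto_k
sum(l$diagnostics$pareto_k > 0.7)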
There are two fundamentally different ways to calculate ELPD from mvgam
models that included
dynamic latent processes (i.e. "trend_models"). The first is to use the predictions that were
generated when estimating these latent processes by setting incl_dynamics = TRUE
. This works
in the same way as setting incl_autocor = TRUE
in brms::prepare_predictions()
. But it may
also be desirable to compare predictions by considering that the dynamic processes are nuisance
parameters that we'd wish to account for when making inferences about other processes in the
model (i.e. the linear predictor effects). Setting incl_dynamics = FALSE
will accomplish
this by ignoring the dynamic processes when making predictions. This option matches up with
what mvgam
's prediction functions return (i.e. predict.mvgam
, ppc
,
pp_check.mvgam
, posterior_epred.mvgam
) and will be far less forgiving
of models that may be overfitting the training data due to highly flexible dynamic processes
(such as Random Walks, for example). However setting incl_dynamics = FALSE
will often result
in less stable Pareto k diagnostics for models with dynamic trends, making ELPD comparisons
difficult and unstable. It is therefore recommended to generally stick with
incl_dynamics = TRUE
when comparing models based on in-sample fits, and then to perhaps use
forecast evaluations for further scrutiny of models (see for example forecast.mvgam
,
score.mvgam_forecast
and lfo_cv
)
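In practice, the two flavours can be computed side by side for the same fitted model (a minimal sketch, assuming a fitted model mod that includes a latent trend):

# Conditions on the estimated latent process (generally more stable)
loo(mod, incl_dynamics = TRUE)

# Treats the latent process as a nuisance; expect lower ELPD and possibly
# unstable Pareto-k values for very flexible trends
loo(mod, incl_dynamics = FALSE)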
For loo.mvgam
, an object of class psis_loo
(see loo::loo()
for details). For loo_compare.mvgam
, an object of class compare.loo
(see loo::loo_compare() for details).
# Simulate 4 time series with hierarchical seasonality
# and independent AR1 dynamic processes
set.seed(111)
simdat <- sim_mvgam(seasonality = 'hierarchical',
                    trend_model = AR(),
                    family = gaussian())

# Fit a model with shared seasonality
mod1 <- mvgam(y ~ s(season, bs = 'cc', k = 6),
              data = rbind(simdat$data_train, simdat$data_test),
              family = gaussian(),
              chains = 2,
              silent = 2)

# Inspect the model and calculate LOO
conditional_effects(mod1)
mc.cores.def <- getOption('mc.cores')
options(mc.cores = 1)
loo(mod1)

# Now fit a model with hierarchical seasonality
mod2 <- update(mod1,
               formula = y ~ s(season, bs = 'cc', k = 6) +
                 s(season, series, bs = 'fs', xt = list(bs = 'cc'), k = 4),
               chains = 2,
               silent = 2)
conditional_effects(mod2)
loo(mod2)

# Now add AR1 dynamic errors to mod2
mod3 <- update(mod2,
               trend_model = AR(),
               chains = 2,
               silent = 2)
conditional_effects(mod3)
plot(mod3, type = 'trend')
loo(mod3)

# Compare models using LOO
loo_compare(mod1, mod2, mod3)
options(mc.cores = mc.cores.def)

# Compare forecast abilities using an expanding training window and
# forecasting ahead 1 timepoint from each window; the first window includes
# the first 92 timepoints (of the 100 that were simulated)
max(mod2$obs_data$time)
lfo_mod2 <- lfo_cv(mod2, min_t = 92)
lfo_mod3 <- lfo_cv(mod3, min_t = 92)

# Take the difference in forecast ELPDs; a model with higher ELPD is preferred,
# so negative values here indicate that mod3 gave better forecasts for a
# particular out of sample timepoint
plot(y = lfo_mod2$elpds - lfo_mod3$elpds,
     x = lfo_mod2$eval_timepoints,
     pch = 16,
     ylab = 'ELPD_mod2 - ELPD_mod3',
     xlab = 'Evaluation timepoint')
abline(h = 0, lty = 'dashed')
This function uses samples of latent trends for each series from a fitted mvgam model to calculate correlations among series' trends
lv_correlations(object)
object |
|
A list
object containing the mean posterior correlations
and the full array of posterior correlations
## Not run: simdat <- sim_mvgam() mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), use_lv = TRUE, n_lv = 2, data = simdat$data_train, burnin = 300, samples = 300, chains = 2) lvcors <- lv_correlations(mod) names(lvcors) lapply(lvcors, class) ## End(Not run)
Convenient way to call MCMC plotting functions implemented in the bayesplot package
## S3 method for class 'mvgam'
mcmc_plot(
  object,
  type = "intervals",
  variable = NULL,
  regex = FALSE,
  use_alias = TRUE,
  ...
)
object |
An R object typically of class |
type |
The type of the plot.
Supported types are (as names) |
variable |
Names of the variables (parameters) to plot, as given by a
character vector or a regular expression (if |
regex |
Logical; Indicates whether |
use_alias |
Logical. If more informative names for parameters are available
(i.e. for beta coefficients |
... |
Additional arguments passed to the plotting functions.
See |
A ggplot
object
that can be further customized using the ggplot2 package.
mvgam_draws
for an overview of some of the shortcut strings
that can be used for argument variable
## Not run: simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2) mcmc_plot(mod) mcmc_plot(mod, type = 'neff_hist') mcmc_plot(mod, variable = 'betas', type = 'areas') mcmc_plot(mod, variable = 'trend_params', type = 'combo') ## End(Not run)
Extract model.frame from a fitted mvgam object
## S3 method for class 'mvgam'
model.frame(formula, trend_effects = FALSE, ...)

## S3 method for class 'mvgam_prefit'
model.frame(formula, trend_effects = FALSE, ...)
formula |
|
trend_effects |
|
... |
Ignored |
A matrix
containing the fitted model frame
Nicholas J Clark
Uses constructors from package splines2 to build monotonically increasing or decreasing splines. Details also in Wang & Yan (2021).
## S3 method for class 'moi.smooth.spec'
smooth.construct(object, data, knots)

## S3 method for class 'mod.smooth.spec'
smooth.construct(object, data, knots)

## S3 method for class 'moi.smooth'
Predict.matrix(object, data)

## S3 method for class 'mod.smooth'
Predict.matrix(object, data)
object |
A smooth specification object, usually generated by a term
|
data |
a list containing just the data (including any |
knots |
a list containing any knots supplied for basis setup — in same order and with same names as |
The constructor is not normally called directly, but is rather used internally by mvgam. If knots are not supplied, then the knots of the spline are placed evenly throughout the covariate values to which the term refers. For example, if fitting 101 data points with an 11-knot spline of x, there would be a knot at every 10th (ordered) x value.
The spline is an implementation of the closed-form I-spline basis based
on the recursion formula given by Ramsay (1988), in which the basis coefficients
must be constrained to either be non-negative (for monotonically increasing
functions) or non-positive (monotonically decreasing)
Take note that when using either monotonic basis, the number of basis functions
k
must be supplied as an even integer due to the manner in
which monotonic basis functions are constructed
An object of class "moi.smooth"
or "mod.smooth"
. In addition to
the usual elements of a smooth class documented under smooth.construct
,
this object will contain a slot called boundary
that defines the endpoints beyond
which the spline will begin extrapolating (extrapolation is flat due to the first
order penalty placed on the smooth function)
This constructor will result in a valid smooth if used in a call to gam or bam; however, the resulting functions are not guaranteed to be monotonic because constraints on the basis coefficients will not be enforced.
Nicholas J Clark
Wang, Wenjie, and Jun Yan. "Shape-Restricted Regression Splines with R Package splines2."
Journal of Data Science 19.3 (2021).
Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science, 3(4), 425–441.
# Simulate data from a monotonically increasing function set.seed(123123) x <- runif(80) * 4 - 1 x <- sort(x) f <- exp(4 * x) / (1 + exp(4 * x)) y <- f + rnorm(80) * 0.1 plot(x, y) # A standard TRPS smooth doesn't capture monotonicity library(mgcv) mod_data <- data.frame(y = y, x = x) mod <- gam(y ~ s(x, k = 16), data = mod_data, family = gaussian()) library(marginaleffects) plot_predictions(mod, by = 'x', newdata = data.frame(x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100)), points = 0.5) # Using the 'moi' basis in mvgam rectifies this mod_data$time <- 1:NROW(mod_data) mod2 <- mvgam(y ~ s(x, bs = 'moi', k = 18), data = mod_data, family = gaussian(), chains = 2) plot_predictions(mod2, by = 'x', newdata = data.frame(x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100)), points = 0.5) plot(mod2, type = 'smooth', realisations = TRUE) # 'by' terms that produce a different smooth for each level of the 'by' # factor are also allowed set.seed(123123) x <- runif(80) * 4 - 1 x <- sort(x) # Two different monotonic smooths, one for each factor level f <- exp(4 * x) / (1 + exp(4 * x)) f2 <- exp(3.5 * x) / (1 + exp(3 * x)) fac <- c(rep('a', 80), rep('b', 80)) y <- c(f + rnorm(80) * 0.1, f2 + rnorm(80) * 0.2) plot(x, y[1:80]) plot(x, y[81:160]) # Gather all data into a data.frame, including the factor 'by' variable mod_data <- data.frame(y, x, fac = as.factor(fac)) mod_data$time <- 1:NROW(mod_data) # Fit a model with different smooths per factor level mod <- mvgam(y ~ s(x, bs = 'moi', by = fac, k = 8), data = mod_data, family = gaussian(), chains = 2) # Visualise the different monotonic functions plot_predictions(mod, condition = c('x', 'fac', 'fac'), points = 0.5) plot(mod, type = 'smooth', realisations = TRUE) # First derivatives (on the link scale) should never be # negative for either factor level (derivs <- slopes(mod, variables = 'x', by = c('x', 'fac'), type = 'link')) all(derivs$estimate > 0)
This function estimates the posterior distribution for Generalised Additive Models (GAMs) that can include
smooth spline functions, specified in the GAM formula, as well as latent temporal processes,
specified by trend_model
. Further modelling options include State-Space representations to allow covariates
and dynamic processes to occur on the latent 'State' level while also capturing observation-level effects.
Prior specifications are flexible and explicitly encourage users to apply
prior distributions that actually reflect their beliefs. In addition, model fits can easily be assessed and
compared with posterior predictive checks, forecast comparisons and leave-one-out / leave-future-out cross-validation.
mvgam(
  formula,
  trend_formula,
  knots,
  trend_knots,
  trend_model = "None",
  noncentred = FALSE,
  family = poisson(),
  share_obs_params = FALSE,
  data,
  newdata,
  use_lv = FALSE,
  n_lv,
  trend_map,
  priors,
  run_model = TRUE,
  prior_simulation = FALSE,
  residuals = TRUE,
  return_model_data = FALSE,
  backend = getOption("brms.backend", "cmdstanr"),
  algorithm = getOption("brms.algorithm", "sampling"),
  control = list(max_treedepth = 10, adapt_delta = 0.8),
  chains = 4,
  burnin = 500,
  samples = 500,
  thin = 1,
  parallel = TRUE,
  threads = 1,
  save_all_pars = FALSE,
  silent = 1,
  autoformat = TRUE,
  refit = FALSE,
  lfo = FALSE,
  ...
)
formula |
A |
trend_formula |
An optional |
knots |
An optional |
trend_knots |
As for |
trend_model |
For all trend types apart from |
noncentred |
|
family |
Default is |
share_obs_params |
|
data |
A
Note however that there are special cases where these identifiers are not needed. For
example, models with hierarchical temporal correlation processes (e.g. |
newdata |
Optional |
use_lv |
|
n_lv |
|
trend_map |
Optional |
priors |
An optional |
run_model |
|
prior_simulation |
|
residuals |
Logical indicating whether to compute series-level randomized quantile residuals and include
them as part of the returned object. Defaults to |
return_model_data |
|
backend |
Character string naming the package to use as the backend for fitting
the Stan model. Options are "cmdstanr" (the default) or "rstan". Can be set globally
for the current R session via the |
algorithm |
Character string naming the estimation approach to use.
Options are |
control |
A named |
chains |
|
burnin |
|
samples |
|
thin |
Thinning interval for monitors. Ignored
if |
parallel |
|
threads |
|
save_all_pars |
|
silent |
Verbosity level between |
autoformat |
|
refit |
Logical indicating whether this is a refit, called using update.mvgam. Users should leave
as |
lfo |
Logical indicating whether this is part of a call to lfo_cv.mvgam. Returns a
lighter version of the model with no residuals and fewer monitored parameters to speed up
post-processing. But other downstream functions will not work properly, so users should always
leave this set as |
... |
Further arguments passed to Stan.
For |
Dynamic GAMs are useful when we wish to predict future values from time series that show temporal dependence
but we do not want to rely on extrapolating from a smooth term (which can sometimes lead to unpredictable and unrealistic behaviours).
In addition, smooths can often try to wiggle excessively to capture any autocorrelation that is present in a time series,
which exacerbates the problem of forecasting ahead. As GAMs are very naturally viewed through a Bayesian lens, and we often
must model time series that show complex distributional features and missing data, parameters for mvgam
models are estimated
in a Bayesian framework using Markov Chain Monte Carlo by default. A general overview is provided
in the primary vignettes: vignette("mvgam_overview")
and vignette("data_in_mvgam")
.
For a full list of available vignettes see vignette(package = "mvgam")
Formula syntax: Details of the formula syntax used by mvgam can be found in
mvgam_formulae
. Note that it is possible to supply an empty formula where
there are no predictors or intercepts in the observation model (i.e. y ~ 0
or y ~ -1
).
In this case, an intercept-only observation model will be set up but the intercept coefficient
will be fixed at zero. This can be handy if you wish to fit pure State-Space models where
the variation in the dynamic trend controls the average expectation, and/or where intercepts
are non-identifiable (as in piecewise trends, see examples below)
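A minimal sketch of such a 'pure' State-Space specification, assuming data simulated with sim_mvgam() (the smooth and trend choices here are placeholders, not recommendations):

simdat <- sim_mvgam(trend_model = RW())
mod_ss <- mvgam(y ~ -1,  # observation intercept fixed at zero
                trend_formula = ~ s(season, bs = 'cc', k = 6),
                trend_model = RW(),
                data = simdat$data_train,
                chains = 2,
                silent = 2)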
Families and link functions: Details of families supported by mvgam can be found in
mvgam_families
.
Trend models: Details of latent error process models supported by mvgam can be found in
mvgam_trends
.
Priors: Default priors for intercepts and any scale parameters are generated
using the same practice as brms. Prior distributions for most important model parameters can be
altered by the user to inspect model
sensitivities to given priors (see get_mvgam_priors
for details).
Note that latent trends are estimated on the link scale so choose priors
accordingly. However, more control over the model specification can be accomplished by first using mvgam as a baseline, then editing the returned model accordingly. The model file can be edited and run outside of mvgam by setting run_model = FALSE, and this is encouraged for complex modelling tasks. Note that priors are not formally checked for correct syntax, so it is up to the user to ensure they are correct.
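A short sketch of this workflow (the prior class shown is an assumption for illustration; check the output of get_mvgam_priors for the classes that apply to your own model):

# Inspect default priors for the intended model
p <- get_mvgam_priors(y ~ s(season, bs = 'cc', k = 6),
                      trend_model = AR(),
                      family = poisson(),
                      data = dat$data_train)
head(p)

# Supply an updated prior, then return the unfitted model for inspection
mod_skeleton <- mvgam(y ~ s(season, bs = 'cc', k = 6),
                      trend_model = AR(),
                      family = poisson(),
                      data = dat$data_train,
                      priors = prior(normal(0, 0.5), class = sigma),
                      run_model = FALSE)
stancode(mod_skeleton)  # edit by hand and run outside of mvgam if desired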
Random effects: For any smooth terms using the random effect basis (smooth.construct.re.smooth.spec
),
a non-centred parameterisation is automatically employed to avoid degeneracies that are common in hierarchical models.
Note however that centred versions may perform better for series that are particularly informative, so as with any
foray into Bayesian modelling, it is worth building an understanding of the model's assumptions and limitations by following a
principled workflow. Also note that models are parameterised using drop.unused.levels = FALSE
in jagam
to ensure predictions can be made for all levels of the supplied factor variable
Observation level parameters: When more than one series is included in data
and an
observation family that contains more than one parameter is used, additional observation family parameters
(i.e. phi
for nb()
or sigma
for gaussian()
) are
by default estimated independently for each series. But if you wish for the series to share
the same observation parameters, set share_obs_params = TRUE
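For example, a brief sketch forcing all series to share a single dispersion parameter under a Negative Binomial observation model:

mod_shared <- mvgam(y ~ s(season, bs = 'cc', k = 6),
                    family = nb(),
                    share_obs_params = TRUE,  # one phi shared across series
                    data = dat$data_train,
                    chains = 2,
                    silent = 2)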
Residuals: For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics
If the fitted model is appropriate then Dunn-Smyth residuals will be standard normal in distribution and no
autocorrelation will be evident. When a particular observation is missing, the residual is calculated by comparing independent
draws from the model's posterior distribution
Using Stan: mvgam
is primarily designed to use Hamiltonian Monte Carlo for parameter estimation
via the software Stan
(using either the cmdstanr
or rstan
interface).
There are great advantages when using Stan
over Gibbs / Metropolis-Hastings samplers, which include the option to estimate nonlinear effects via Hilbert space approximate Gaussian Processes, the availability of a variety of inference algorithms (e.g. variational inference, Laplace approximation) and
capabilities to enforce stationarity for complex Vector Autoregressions.
Because of the many advantages of Stan
over JAGS
,
further development of the package will only be applied to Stan
. This includes the planned addition
of more response distributions, plans to handle zero-inflation, and plans to incorporate a greater
variety of trend models. Users are strongly encouraged to opt for Stan
over JAGS
in any subsequent workflows.
How to start?: The mvgam
cheatsheet is a
good starting place if you are just learning to use the package. It gives an overview of the package's key functions and objects,
as well as providing a reasonable workflow that new users can follow. In general it is recommended to
1. Check that your time series data are in a suitable long format for mvgam
modeling (see the data formatting vignette for guidance)
2. Inspect features of the data using plot_mvgam_series
. Now is also a good time to familiarise yourself
with the package's example workflows that are detailed in the vignettes. In particular,
the getting started vignette,
the shared latent states vignette,
the time-varying effects vignette and
the State-Space models vignette all provide
detailed information about how to structure, fit and interrogate Dynamic Generalized Additive Models in mvgam
. Some
more specialized how-to articles include
"Incorporating time-varying seasonality in forecast models"
and "Temporal autocorrelation in GAMs and the mvgam
package"
3. Carefully think about how to structure linear predictor effects (i.e. smooth terms using s
,
te
or ti
, GPs using gp
, dynamic time-varying effects using dynamic
, and parametric terms), latent temporal trend components (see mvgam_trends
) and the appropriate
observation family (see mvgam_families
). Use get_mvgam_priors
to see default prior distributions
for stochastic parameters
4. Change default priors using appropriate prior knowledge (see prior
)
5. Fit the model using either Hamiltonian Monte Carlo or an approximation algorithm (i.e. change the backend
argument)
and use summary.mvgam
, conditional_effects.mvgam
, mcmc_plot.mvgam
, pp_check.mvgam
and
plot.mvgam
to inspect / interrogate the model
6. Update the model as needed and use loo_compare.mvgam
for in-sample model comparisons, or alternatively
use forecast.mvgam
and score.mvgam_forecast
to compare models based on out-of-sample forecasts (see the forecast evaluation vignette for guidance)
7. When satisfied with the model structure, use predict.mvgam
,
plot_predictions
and/or plot_slopes
for
more targeted inferences (see "How to interpret and report nonlinear effects from Generalized Additive Models" for some guidance on interpreting GAMs)
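Condensed into code, the steps above might look like the following sketch (simulated data; all modelling choices are placeholders rather than recommendations):

# 1-2: format and inspect the data
dat <- sim_mvgam(T = 100, n_series = 2, trend_model = AR())
plot_mvgam_series(data = dat$data_train, series = 'all')

# 3-4: choose effects, then inspect and update default priors
get_mvgam_priors(y ~ s(season, bs = 'cc', k = 6),
                 trend_model = AR(),
                 family = poisson(),
                 data = dat$data_train)

# 5: fit and interrogate the model
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             family = poisson(),
             data = dat$data_train,
             newdata = dat$data_test,
             chains = 2,
             silent = 2)
summary(mod)
pp_check(mod)
conditional_effects(mod)

# 6: compare candidate models in and out of sample
loo(mod)
fc <- forecast(mod)
score(fc, score = 'drps')

# 7: targeted inference
plot_predictions(mod, condition = 'season', points = 0.5)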
A list
object of class mvgam
containing model output, the text representation of the model file,
the mgcv model output (for easily generating simulations at
unsampled covariate values), Dunn-Smyth residuals for each series and key information needed
for other functions in the package. See mvgam-class
for details.
Use methods(class = "mvgam")
for an overview of available methods.
Nicholas J Clark
Nicholas J Clark & Konstans Wells (2023). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 14:3, 771-784.
jagam
, gam
, gam.models
,
get_mvgam_priors
, jsdgam
# Simulate a collection of three time series that have shared seasonal dynamics # and independent AR1 trends, with a Poisson observation process set.seed(0) dat <- sim_mvgam(T = 80, n_series = 3, mu = 2, trend_model = AR(p = 1), prop_missing = 0.1, prop_trend = 0.6) # Plot key summary statistics for a single series plot_mvgam_series(data = dat$data_train, series = 1) # Plot all series together plot_mvgam_series(data = dat$data_train, series = 'all') # Formulate a model using Stan where series share a cyclic smooth for # seasonality and each series has an independent AR1 temporal process. # Note that 'noncentred = TRUE' will likely give performance gains. # Set run_model = FALSE to inspect the returned objects mod1 <- mvgam(formula = y ~ s(season, bs = 'cc', k = 6), data = dat$data_train, trend_model = AR(), family = poisson(), noncentred = TRUE, run_model = FALSE) # View the model code in Stan language stancode(mod1) # View the data objects needed to fit the model in Stan sdata1 <- standata(mod1) str(sdata1) # Now fit the model mod1 <- mvgam(formula = y ~ s(season, bs = 'cc', k = 6), data = dat$data_train, trend_model = AR(), family = poisson(), noncentred = TRUE, chains = 2) # Extract the model summary summary(mod1) # Plot the estimated historical trend and forecast for one series plot(mod1, type = 'trend', series = 1) plot(mod1, type = 'forecast', series = 1) # Residual diagnostics plot(mod1, type = 'residuals', series = 1) resids <- residuals(mod1) str(resids) # Fitted values and residuals can also be added to training data augment(mod1) # Compute the forecast using covariate information in data_test fc <- forecast(mod1, newdata = dat$data_test) str(fc) plot(fc) # Plot the estimated seasonal smooth function plot(mod1, type = 'smooths') # Plot estimated first derivatives of the smooth plot(mod1, type = 'smooths', derivatives = TRUE) # Plot partial residuals of the smooth plot(mod1, type = 'smooths', residuals = TRUE) # Plot posterior realisations for the smooth plot(mod1, type = 'smooths', realisations = TRUE) # Plot conditional response predictions using marginaleffects library(marginaleffects) conditional_effects(mod1) plot_predictions(mod1, condition = 'season', points = 0.5) # Generate posterior predictive checks using bayesplot pp_check(mod1) # Extract observation model beta coefficient draws as a data.frame beta_draws_df <- as.data.frame(mod1, variable = 'betas') head(beta_draws_df) str(beta_draws_df) # Investigate model fit mc.cores.def <- getOption('mc.cores') options(mc.cores = 1) loo(mod1) options(mc.cores = mc.cores.def) # Example of supplying a trend_map so that some series can share # latent trend processes sim <- sim_mvgam(n_series = 3) mod_data <- sim$data_train # Here, we specify only two latent trends; series 1 and 2 share a trend, # while series 3 has it's own unique latent trend trend_map <- data.frame(series = unique(mod_data$series), trend = c(1, 1, 2)) # Fit the model using AR1 trends mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_map = trend_map, trend_model = AR(), data = mod_data, return_model_data = TRUE, chains = 2) # The mapping matrix is now supplied as data to the model in the 'Z' element mod$model_data$Z code(mod) # The first two series share an identical latent trend; the third is different plot(mod, type = 'trend', series = 1) plot(mod, type = 'trend', series = 2) plot(mod, type = 'trend', series = 3) # Example of how to use dynamic coefficients # Simulate a time-varying coefficient for the effect of temperature set.seed(123) N <- 200 beta_temp <- 
vector(length = N) beta_temp[1] <- 0.4 for(i in 2:N){ beta_temp[i] <- rnorm(1, mean = beta_temp[i - 1] - 0.0025, sd = 0.05) } plot(beta_temp) # Simulate a covariate called 'temp' temp <- rnorm(N, sd = 1) # Simulate some noisy Gaussian observations out <- rnorm(N, mean = 4 + beta_temp * temp, sd = 0.5) # Gather necessary data into a data.frame; split into training / testing data = data.frame(out, temp, time = seq_along(temp)) data_train <- data[1:180,] data_test <- data[181:200,] # Fit the model using the dynamic() formula helper mod <- mvgam(out ~ dynamic(temp, scale = FALSE, k = 40), family = gaussian(), data = data_train, newdata = data_test, chains = 2) # Inspect the model summary, forecast and time-varying coefficient distribution summary(mod) plot(mod, type = 'smooths') fc <- forecast(mod, newdata = data_test) plot(fc) # Propagating the smooth term shows how the coefficient is expected to evolve plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 180, lty = 'dashed', lwd = 2) points(beta_temp, pch = 16) # Example showing how to incorporate an offset; simulate some count data # with different means per series set.seed(100) dat <- sim_mvgam(prop_trend = 0, mu = c(0, 2, 2), seasonality = 'hierarchical') # Add offset terms to the training and testing data dat$data_train$offset <- 0.5 * as.numeric(dat$data_train$series) dat$data_test$offset <- 0.5 * as.numeric(dat$data_test$series) # Fit a model that includes the offset in the linear predictor as well as # hierarchical seasonal smooths mod <- mvgam(formula = y ~ offset(offset) + s(series, bs = 're') + s(season, bs = 'cc') + s(season, by = series, m = 1, k = 5), data = dat$data_train, chains = 2) # Inspect the model file to see the modification to the linear predictor # (eta) code(mod) # Forecasts for the first two series will differ in magnitude fc <- forecast(mod, newdata = dat$data_test) layout(matrix(1:2, ncol = 2)) plot(fc, series = 1, ylim = c(0, 75)) plot(fc, series = 2, ylim = c(0, 75)) layout(1) # Changing the offset for the testing data should lead to changes in # the forecast dat$data_test$offset <- dat$data_test$offset - 2 fc <- forecast(mod, newdata = dat$data_test) plot(fc) # Relative Risks can be computed by fixing the offset to the same value # for each series dat$data_test$offset <- rep(1, NROW(dat$data_test)) preds_rr <- predict(mod, type = 'link', newdata = dat$data_test, summary = FALSE) series1_inds <- which(dat$data_test$series == 'series_1') series2_inds <- which(dat$data_test$series == 'series_2') # Relative Risks are now more comparable among series layout(matrix(1:2, ncol = 2)) plot(preds_rr[1, series1_inds], type = 'l', col = 'grey75', ylim = range(preds_rr), ylab = 'Series1 Relative Risk', xlab = 'Time') for(i in 2:50){ lines(preds_rr[i, series1_inds], col = 'grey75') } plot(preds_rr[1, series2_inds], type = 'l', col = 'darkred', ylim = range(preds_rr), ylab = 'Series2 Relative Risk', xlab = 'Time') for(i in 2:50){ lines(preds_rr[i, series2_inds], col = 'darkred') } layout(1) # Example showcasing how cbind() is needed for Binomial observations # Simulate two time series of Binomial trials trials <- sample(c(20:25), 50, replace = TRUE) x <- rnorm(50) detprob1 <- plogis(-0.5 + 0.9*x) detprob2 <- plogis(-0.1 -0.7*x) dat <- rbind(data.frame(y = rbinom(n = 50, size = trials, prob = detprob1), time = 1:50, series = 'series1', x = x, ntrials = trials), data.frame(y = rbinom(n = 50, size = trials, prob = detprob2), time = 1:50, series = 'series2', x = x, ntrials = trials)) dat <- dplyr::mutate(dat, series = 
as.factor(series)) dat <- dplyr::arrange(dat, time, series) plot_mvgam_series(data = dat, series = 'all') # Fit a model using the binomial() family; must specify observations # and number of trials in the cbind() wrapper mod <- mvgam(cbind(y, ntrials) ~ series + s(x, by = series), family = binomial(), data = dat, chains = 2) summary(mod) pp_check(mod, type = "bars_grouped", group = "series", ndraws = 50) pp_check(mod, type = "ecdf_overlay_grouped", group = "series", ndraws = 50) conditional_effects(mod, type = 'link')
Extract quantities that can be used to diagnose sampling behavior of the algorithms applied by Stan at the back-end of mvgam.
## S3 method for class 'mvgam' nuts_params(object, pars = NULL, ...) ## S3 method for class 'mvgam' log_posterior(object, ...) ## S3 method for class 'mvgam' rhat(x, pars = NULL, ...) ## S3 method for class 'mvgam' neff_ratio(object, pars = NULL, ...)
object , x
|
A |
pars |
An optional character vector of parameter names.
For |
... |
Arguments passed to individual methods. |
For more details see
bayesplot-extractors
.
The exact form of the output depends on the method.
simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2) np <- nuts_params(mod) head(np) # extract the number of divergent transitions sum(subset(np, Parameter == "divergent__")$Value) head(neff_ratio(mod))
Extract posterior draws from fitted mvgam objects
Extract posterior draws in conventional formats as data.frames, matrices, or arrays.
## S3 method for class 'mvgam' as.data.frame( x, row.names = NULL, optional = TRUE, variable = "betas", use_alias = TRUE, regex = FALSE, ... ) ## S3 method for class 'mvgam' as.matrix(x, variable = "betas", regex = FALSE, use_alias = TRUE, ...) ## S3 method for class 'mvgam' as.array(x, variable = "betas", regex = FALSE, use_alias = TRUE, ...) ## S3 method for class 'mvgam' as_draws( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_matrix( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_df( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_array( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_list( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_rvars(x, variable = NULL, regex = FALSE, inc_warmup = FALSE, ...)
x |
|
row.names |
Ignored |
optional |
Ignored |
variable |
A character specifying which parameters to extract. Can either be one of the prespecified options for extraction,
OR can be a character vector providing the variables to extract |
use_alias |
Logical. If more informative names for parameters are available
(i.e. for beta coefficients |
regex |
Logical. If not using one of the prespecified options for extractions,
should |
... |
Ignored |
inc_warmup |
Should warmup draws be included? Defaults to |
A data.frame
, matrix
, or array
containing the posterior draws.
## Not run: sim <- sim_mvgam(family = Gamma()) mod1 <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = sim$data_train, family = Gamma(), chains = 2, samples = 300) beta_draws_df <- as.data.frame(mod1, variable = 'betas') head(beta_draws_df) str(beta_draws_df) beta_draws_mat <- as.matrix(mod1, variable = 'betas') head(beta_draws_mat) str(beta_draws_mat) shape_pars <- as.matrix(mod1, variable = 'shape', regex = TRUE) head(shape_pars) ## End(Not run)
Supported mvgam families
tweedie(link = "log") student_t(link = "identity") betar(...) nb(...) lognormal(...) student(...) bernoulli(...) beta_binomial(...) nmix(link = "log")
link |
a specification for the family link function. At present these cannot be changed |
... |
Arguments to be passed to the mgcv version of the associated functions |
mvgam
currently supports the following standard observation families:
gaussian
with identity link, for real-valued data
poisson
with log-link, for count data
Gamma
with log-link, for non-negative real-valued data
binomial
with logit-link, for count data when the number
of trials is known (and must be supplied)
In addition, the following extended families from the mgcv
and brms
packages are supported:
betar
with logit-link, for proportional data on (0,1)
nb
with log-link, for count data
lognormal
with identity-link, for non-negative real-valued data
bernoulli
with logit-link, for binary data
beta_binomial
with logit-link, as for binomial()
but allows
for overdispersion
Finally, mvgam
supports the three extended families described here:
tweedie
with log-link, for count data (power parameter p
fixed at 1.5
)
student_t()
(or student
) with identity-link, for real-valued data
nmix
for count data with imperfect detection modeled via a
State-Space N-Mixture model. The latent states are Poisson (with log link), capturing the 'true' latent
abundance, while the observation process is Binomial to account for imperfect detection. The
observation formula
in these models is used to set up a linear predictor for the detection
probability (with logit link). See the example below for a more detailed worked explanation
of the nmix()
family
Only poisson()
, nb()
, and tweedie()
are available if
using JAGS
. All families, apart from tweedie()
, are supported if
using Stan
.
Note that currently it is not possible to change the default link
functions in mvgam
, so any call to change these will be silently ignored
Objects of class family
Nicholas J Clark
# Example showing how to set up N-mixture models set.seed(999) # Simulate observations for species 1, which shows a declining trend and 0.7 detection probability data.frame(site = 1, # five replicates per year; six years replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = 'sp_1', # true abundance declines nonlinearly truth = c(rep(28, 5), rep(26, 5), rep(23, 5), rep(16, 5), rep(14, 5), rep(14, 5)), # observations are taken with detection prob = 0.7 obs = c(rbinom(5, 28, 0.7), rbinom(5, 26, 0.7), rbinom(5, 23, 0.7), rbinom(5, 16, 0.7), rbinom(5, 14, 0.7), rbinom(5, 14, 0.7))) %>% # add 'series' information, which is an identifier of site, replicate and species dplyr::mutate(series = paste0('site_', site, '_', species, '_rep_', replicate), time = as.numeric(time), # add a 'cap' variable that defines the maximum latent N to # marginalize over when estimating latent abundance; in other words # how large do we realistically think the true abundance could be? cap = 80) %>% dplyr::select(- replicate) -> testdat # Now add another species that has a different temporal trend and a smaller # detection probability (0.45 for this species) testdat = testdat %>% dplyr::bind_rows(data.frame(site = 1, replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = 'sp_2', truth = c(rep(4, 5), rep(7, 5), rep(15, 5), rep(16, 5), rep(19, 5), rep(18, 5)), obs = c(rbinom(5, 4, 0.45), rbinom(5, 7, 0.45), rbinom(5, 15, 0.45), rbinom(5, 16, 0.45), rbinom(5, 19, 0.45), rbinom(5, 18, 0.45))) %>% dplyr::mutate(series = paste0('site_', site, '_', species, '_rep_', replicate), time = as.numeric(time), cap = 50) %>% dplyr::select(-replicate)) # series identifiers testdat$species <- factor(testdat$species, levels = unique(testdat$species)) testdat$series <- factor(testdat$series, levels = unique(testdat$series)) # The trend_map to state how replicates are structured testdat %>% # each unique combination of site*species is a separate process dplyr::mutate(trend = as.numeric(factor(paste0(site, species)))) %>% dplyr::select(trend, series) %>% dplyr::distinct() -> trend_map trend_map # Fit a model mod <- mvgam( # the observation formula sets up linear predictors for # detection probability on the logit scale formula = obs ~ species - 1, # the trend_formula sets up the linear predictors for # the latent abundance processes on the log scale trend_formula = ~ s(time, by = trend, k = 4) + species, # the trend_map takes care of the mapping trend_map = trend_map, # nmix() family and data family = nmix(), data = testdat, # priors can be set in the usual way priors = c(prior(std_normal(), class = b), prior(normal(1, 1.5), class = Intercept_trend)), chains = 2) # The usual diagnostics summary(mod) # Plotting conditional effects library(ggplot2); library(marginaleffects) plot_predictions(mod, condition = 'species', type = 'detection') + ylab('Pr(detection)') + ylim(c(0, 1)) + theme_classic() + theme(legend.position = 'none') # Example showcasing how cbind() is needed for Binomial observations # Simulate two time series of Binomial trials trials <- sample(c(20:25), 50, replace = TRUE) x <- rnorm(50) detprob1 <- plogis(-0.5 + 0.9*x) detprob2 <- plogis(-0.1 -0.7*x) dat <- rbind(data.frame(y = rbinom(n = 50, size = trials, prob = detprob1), time = 1:50, series = 'series1', x = x, ntrials = trials), data.frame(y = rbinom(n = 50, size = trials, prob = detprob2), time = 1:50, series = 'series2', x = x, ntrials = trials)) dat <- dplyr::mutate(dat, series = as.factor(series)) dat <- dplyr::arrange(dat, time, series) # Fit a model using
the binomial() family; must specify observations # and number of trials in the cbind() wrapper mod <- mvgam(cbind(y, ntrials) ~ series + s(x, by = series), family = binomial(), data = dat) summary(mod)
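As a further minimal sketch (assuming proportional data simulated with sim_mvgam(), as in the plotting examples elsewhere in this manual), swapping among the families above only requires changing the family argument:
# Simulate Beta-distributed observations on (0,1) and fit a model
# using the betar() family with its default logit link
sim <- sim_mvgam(family = betar(), trend_model = RW(), prop_trend = 0.6)
mod_beta <- mvgam(y ~ s(season, bs = 'cc'),
                  trend_model = RW(),
                  family = betar(),
                  data = sim$data_train,
                  chains = 2)
summary(mod_beta)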
mvgam_fevd object description
A mvgam_fevd object returned by function fevd.
Run methods(class = "mvgam_fevd")
to see an overview of available methods.
A mvgam_fevd
object contains a list of posterior forecast
error variance decompositions, each stored as
its own list
Nicholas J Clark
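A minimal sketch of how such an object typically arises, assuming a Vector Autoregression fitted with trend_model = VAR(cor = TRUE) and the default arguments of fevd():
simdat <- sim_mvgam(n_series = 3)
mod <- mvgam(y ~ 1,
             trend_model = VAR(cor = TRUE),
             data = simdat$data_train,
             chains = 2)
fevds <- fevd(mod)
# Plot posterior median contributions to forecast variance
plot(fevds)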
mvgam_forecast object description
A mvgam_forecast object returned by function hindcast or forecast.
Run methods(class = "mvgam_forecast")
to see an overview of available methods.
A mvgam_forecast
object contains the following elements:
call
the original observation model formula
trend_call
If a trend_formula was supplied
, the original trend model formula is
returned. Otherwise NULL
family
character
description of the observation distribution
family_pars
list
containing draws of family-specific parameters (i.e.
shape, scale or overdispersion parameters). Only returned if type = link
. Otherwise NULL
trend_model
character
description of the latent trend model
drift
Logical specifying whether a drift term was used in the trend model
use_lv
Logical flag indicating whether latent dynamic factors were used in the model
fit_engine
Character
describing the fit engine, either as stan
or jags
type
The type of predictions included (either link
, response
or trend
)
series_names
Names of the time series, taken from levels(data$series)
in the original
model fit
train_observations
A list
of training observation vectors of length n_series
train_times
A vector
of the unique training times
test_observations
If the forecast
function was used,
a list
of test observation vectors of length n_series
. Otherwise NULL
test_times
If the forecast
function was used, a
vector
of the unique validation (testing) times. Otherwise NULL
hindcasts
A list
of posterior hindcast distributions of length n_series
.
forecasts
If the forecast
function was used,
a list
of posterior forecast distributions of length n_series
. Otherwise NULL
Nicholas J Clark
mvgam, hindcast.mvgam, forecast.mvgam
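A minimal sketch of inspecting these elements, assuming a fitted model mod1 and test data dat$data_test as in the mvgam() examples:
fc <- forecast(mod1, newdata = dat$data_test)
class(fc)               # 'mvgam_forecast'
fc$series_names         # names of the time series
str(fc$hindcasts[[1]])  # posterior hindcast draws for the first series
str(fc$forecasts[[1]])  # posterior forecast draws for the first series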
Details of formula specifications in mvgam
mvgam
will accept an observation model formula and an optional
process model formula (via the argument trend_formula
). Neither of these formulae can
be specified as lists, contrary to the accepted behaviour in some mgcv
or brms
models.
Note that it is possible to supply an empty formula where
there are no predictors or intercepts in the observation model (i.e. y ~ 0
or y ~ -1
).
In this case, an intercept-only observation model will be set up but the intercept coefficient
will be fixed at zero. This can be handy if you wish to fit pure State-Space models where
the variation in the dynamic trend controls the average expectation, and/or where intercepts
are non-identifiable.
The formulae supplied to mvgam
are exactly like those supplied to
glm
except that smooth terms,
s
,
te
,
ti
and
t2
,
time-varying effects using dynamic
,
monotonically increasing (using s(x, bs = 'moi')
)
or decreasing splines (using s(x, bs = 'mod')
;
see smooth.construct.moi.smooth.spec
for
details), as well as
Gaussian Process functions using gp
and offsets using
offset
can be added to the right hand side (and .
is not supported in mvgam
formulae).
Further details on specifying different kinds of smooth functions, and how to control their behaviours
by modifying their potential complexities and / or how the penalties behave, can be found in the
extensive documentation for the mgcv
package.
Nicholas J Clark
mvgam
,
formula.gam
,
gam.models
,
jagam
,
gam
,
s
,
gp
,
formula
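A minimal sketch of these formula options, assuming simulated data from sim_mvgam(); the first model uses an empty observation formula so that the latent trend (with its own process formula) controls the expectation, while the second places smooth and random effect terms in the observation formula:
dat <- sim_mvgam(T = 80, n_series = 3)
# A pure State-Space model with the observation intercept fixed at zero
mod_ss <- mvgam(y ~ -1,
                trend_formula = ~ s(season, bs = 'cc'),
                trend_model = RW(),
                data = dat$data_train,
                chains = 2)
# Smooth and random effect terms on the right hand side of the
# observation formula
mod_obs <- mvgam(y ~ s(season, bs = 'cc') + s(series, bs = 're'),
                 trend_model = RW(),
                 data = dat$data_train,
                 chains = 2)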
mvgam_irf object description
A mvgam_irf object returned by function irf.
Run methods(class = "mvgam_irf")
to see an overview of available methods.
A mvgam_irf
object contains a list of posterior IRFs, each stored as
its own list
Nicholas J Clark
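A minimal sketch of how such an object typically arises, assuming a Vector Autoregression and using irf()'s forecast horizon argument h as described in that function's documentation:
simdat <- sim_mvgam(n_series = 3)
mod <- mvgam(y ~ 1,
             trend_model = VAR(cor = TRUE),
             data = simdat$data_train,
             chains = 2)
irfs <- irf(mod, h = 12)
# Plot responses to a shock in the first series
plot(irfs, series = 1)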
Helper functions for mvgam marginaleffects calculations
Functions needed for working with marginaleffects
Functions needed for getting data / objects with insight
## S3 method for class 'mvgam' get_coef(model, trend_effects = FALSE, ...) ## S3 method for class 'mvgam' set_coef(model, coefs, trend_effects = FALSE, ...) ## S3 method for class 'mvgam' get_vcov(model, vcov = NULL, ...) ## S3 method for class 'mvgam' get_predict(model, newdata, type = "response", process_error = FALSE, ...) ## S3 method for class 'mvgam' get_data(x, source = "environment", verbose = TRUE, ...) ## S3 method for class 'mvgam_prefit' get_data(x, source = "environment", verbose = TRUE, ...) ## S3 method for class 'mvgam' find_predictors( x, effects = c("fixed", "random", "all"), component = c("all", "conditional", "zi", "zero_inflated", "dispersion", "instruments", "correlation", "smooth_terms"), flatten = FALSE, verbose = TRUE, ... ) ## S3 method for class 'mvgam_prefit' find_predictors( x, effects = c("fixed", "random", "all"), component = c("all", "conditional", "zi", "zero_inflated", "dispersion", "instruments", "correlation", "smooth_terms"), flatten = FALSE, verbose = TRUE, ... )
model |
Model object |
trend_effects |
|
... |
Additional arguments are passed to the |
coefs |
vector of coefficients to insert in the model object |
vcov |
Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
|
newdata |
Grid of predictor values at which we evaluate the slopes.
|
type |
string indicates the type (scale) of the predictions used to
compute contrasts or slopes. This can differ based on the model
type, but will typically be a string such as: "response", "link", "probs",
or "zero". When an unsupported string is entered, the model-specific list of
acceptable values is returned in an error message. When |
process_error |
|
x |
A fitted model. |
source |
String, indicating from where data should be recovered. If
|
verbose |
Toggle messages and warnings. |
effects |
Should model data for fixed effects ( |
component |
Should all predictor variables, predictor variables for the conditional model, the zero-inflated part of the model, the dispersion term or the instrumental variables be returned? Applies to models with zero-inflated and/or dispersion formula, or to models with instrumental variable (so called fixed-effects regressions). May be abbreviated. Note that the conditional component is also called count or mean component, depending on the model. |
flatten |
Logical, if |
Objects suitable for internal 'marginaleffects' functions to proceed.
See marginaleffects::get_coef()
, marginaleffects::set_coef()
,
marginaleffects::get_vcov()
, marginaleffects::get_predict()
,
insight::get_data()
and insight::find_predictors()
for details
Nicholas J Clark
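Users rarely need to call these methods directly; they exist so that the usual marginaleffects and insight workflows run seamlessly on fitted mvgam models. A minimal sketch of that downstream workflow, assuming a fitted model mod:
library(marginaleffects)
# Average predictions on the response scale
avg_predictions(mod, type = 'response')
# Conditional predictions, as used throughout this manual
plot_predictions(mod, condition = 'season', type = 'link')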
Supported mvgam trend models
mvgam
currently supports the following dynamic trend models:
None
(no latent trend component; i.e. the GAM component is all that contributes to the linear predictor,
and the observation process is the only source of error; similarly to what is estimated by gam
)
ZMVN()
(zero-mean correlated errors, useful for modelling time series where no
autoregressive terms are needed or for modelling data that are not sampled as time series)
RW()
AR(p = 1, 2, or 3)
CAR(p = 1)
(continuous time autoregressive trends; only available in Stan
)
VAR()
(only available in Stan
)
PW()
(piecewise linear or logistic trends; only available in Stan
)
GP()
(Gaussian Process with squared exponential kernel;
only available in Stan
)
For most dynamic trend types available in mvgam
(see argument trend_model
), time should be
measured in discrete, regularly spaced intervals (i.e. c(1, 2, 3, ...)
). However, you can
use irregularly spaced intervals if using trend_model = CAR(1)
, though note that any
temporal intervals that are exactly 0
will be adjusted to a very small number
(1e-12
) to prevent sampling errors. For all autoregressive trend types
apart from CAR()
, moving average and/or correlated
process error terms can also be estimated (for example, RW(cor = TRUE)
will set up a
multivariate Random Walk if data
contains >1
series). Hierarchical process error correlations
can also be handled if the data contain relevant observation units that are nested into
relevant grouping and subgrouping levels (i.e. using AR(gr = region, subgr = species)
)
Note that only RW
, AR1
, AR2
and AR3
are available if
using JAGS
. All trend models are supported if using Stan
.
Dynamic factor models can be used in which the latent factors evolve as either
RW
, AR1-3
, VAR
or GP
. For VAR
models
(i.e. VAR
and VARcor
models), users can either fix the trend error covariances to be 0
(using VAR
) or estimate them and potentially allow for contemporaneously correlated errors using
VARcor
. For all VAR
models, stationarity of
the latent process is enforced through the prior using the parameterisation given by
Heaps (2022). Stationarity is not enforced when using AR1
, AR2
or AR3
models,
though this can be changed by the user by specifying lower and upper bounds on autoregressive
parameters using functionality in get_mvgam_priors and the priors
argument in
mvgam. Piecewise trends follow the formulation in the popular prophet
package produced
by Facebook
, where users can allow for changepoints to control the potential flexibility
of the trend. See Taylor and Letham (2018) for details
Sarah E. Heaps (2022) Enforcing stationarity through the prior in Vector Autoregressions. Journal of Computational and Graphical Statistics. 32:1, 1-10.
Sean J. Taylor and Benjamin Letham (2018) Forecasting at scale. The American Statistician 72.1, 37-45.
RW
, AR
, CAR
,
VAR
, PW
, GP
, ZMVN
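A minimal sketch of supplying some of the trend models described above, assuming simulated data from sim_mvgam():
dat <- sim_mvgam(n_series = 3)
# Independent AR1 trends for each series
mod_ar <- mvgam(y ~ s(season, bs = 'cc', k = 6),
                trend_model = AR(p = 1),
                data = dat$data_train,
                chains = 2)
# A multivariate Random Walk with correlated process errors
mod_rw <- mvgam(y ~ s(season, bs = 'cc', k = 6),
                trend_model = RW(cor = TRUE),
                data = dat$data_train,
                chains = 2)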
mvgam object description
A fitted mvgam object returned by function mvgam.
Run methods(class = "mvgam")
to see an overview of available methods.
A mvgam
object contains the following elements:
call
the original observation model formula
trend_call
If a trend_formula was supplied
, the original trend model formula is
returned. Otherwise NULL
family
character
description of the observation distribution
trend_model
character
description of the latent trend model
trend_map
data.frame
describing the mapping of trend states to
observations, if supplied in the original model. Otherwise NULL
drift
Logical specifying whether a drift term was used in the trend model
priors
If the model priors were updated from their defaults, the prior dataframe
will be returned. Otherwise NULL
model_output
The MCMC
object returned by the fitting engine. If the model was fitted
using Stan
, this will be an object of class stanfit
(see stanfit-class
for details).
If JAGS
was used as the backend, this will be an object of class runjags
(see runjags-class
for details)
model_file
The character
string model file used to describe the model in either
Stan
or JAGS
syntax
model_data
If return_model_data
was set to TRUE
when fitting the model, the list
object
containing all data objects needed to condition the model is returned. Each item in the list
is described
in detail at the top of the model_file
. Otherwise NULL
inits
If return_model_data
was set to TRUE
when fitting the model, the initial value
functions used to initialise the MCMC chains will be returned. Otherwise NULL
monitor_pars
The parameters
that were monitored during MCMC sampling are returned as a character vector
sp_names
A character vector
specifying the names for each smoothing parameter
mgcv_model
An object of class gam
containing the mgcv
version of the observation model.
This object is used for generating the linear predictor matrix when making predictions for new data. The
coefficients in this model object will contain the posterior median coefficients from the GAM linear predictor,
but these are only used if generating plots of smooth functions that mvgam
currently cannot handle
(such as plots for three-dimensional smooths). This model therefore should not be used for inference.
See gamObject
for details
trend_mgcv_model
If a trend_formula was supplied
, an object of class gam
containing
the mgcv
version of the trend model. Otherwise NULL
ytimes
The matrix
object used in model fitting for indexing which series and timepoints
were observed in each row of the supplied data. Used internally by some downstream plotting
and prediction functions
resids
A named list
object containing posterior draws of Dunn-Smyth
randomized quantile residuals
use_lv
Logical flag indicating whether latent dynamic factors were used in the model
n_lv
If use_lv == TRUE
, the number of latent dynamic factors used in the model
upper_bounds
If bounds were supplied in the original model fit, they will be returned.
Otherwise NULL
obs_data
The original data object (either a list
or dataframe
) supplied in model
fitting.
test_data
If test data were supplied (as argument newdata
in the original model), it
will be returned. Otherwise NULL
fit_engine
Character
describing the fit engine, either as stan
or jags
backend
Character
describing the backend used for modelling, either as rstan
, cmdstanr
or rjags
algorithm
Character
describing the algorithm used for finding the posterior,
either as sampling
, laplace
, pathfinder
, meanfield
or fullrank
max_treedepth
If the model was fitted using Stan
, the value supplied for the maximum
treedepth tuning parameter is returned (see stan
for details).
Otherwise NULL
adapt_delta
If the model was fitted using Stan
, the value supplied for the adapt_delta
tuning parameter is returned (see stan
for details).
Otherwise NULL
Nicholas J Clark
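A minimal sketch of accessing some of these elements, assuming a fitted model mod:
mod$call               # the observation model formula
mod$fit_engine         # 'stan' or 'jags'
str(mod$resids)        # posterior draws of Dunn-Smyth residuals, one slot per series
class(mod$mgcv_model)  # the mgcv version of the observation model (not for inference)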
Create a matrix of output plots from a mvgam object
A pairs method that is customized for MCMC output.
## S3 method for class 'mvgam' pairs(x, variable = NULL, regex = FALSE, use_alias = TRUE, ...)
x |
An object of class |
variable |
Names of the variables (parameters) to plot, as given by a
character vector or a regular expression (if |
regex |
Logical; Indicates whether |
use_alias |
Logical. If more informative names for parameters are available
(i.e. for beta coefficients |
... |
Further arguments to be passed to
|
For a detailed description see
mcmc_pairs
.
Plottable objects whose classes depend on the arguments supplied.
See mcmc_pairs
for details.
simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2) pairs(mod) pairs(mod, variable = c('ar1', 'sigma'), regex = TRUE)
This function takes a fitted mvgam
object and returns plots and summary statistics for
the latent dynamic factors
plot_mvgam_factors(object, plot = TRUE)
object |
|
plot |
|
If the model in object
was estimated using dynamic factors, it is possible that not all factors
contributed to the estimated trends. This is due to the regularisation penalty that acts independently on each
factor's Gaussian precision, which will squeeze unneeded factors to a white noise process (effectively dropping
that factor from the model). In this function, each factor is tested against a null hypothesis of white noise by
calculating the sum of the factor's 2nd derivatives. A factor that has a larger contribution will have a larger
sum due to the weaker penalty on the factor's precision. If
plot == TRUE
, the factors are also plotted.
A dataframe
of factor contributions and,
optionally, a series of base R
plots
Nicholas J Clark
simdat <- sim_mvgam() mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), use_lv = TRUE, n_lv = 2, data = simdat$data_train, chains = 2) plot_mvgam_factors(mod)
Plot mvgam posterior predictions for a specified series
plot_mvgam_fc( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, hide_xlabels = FALSE, xlab, ylab, ylim, n_cores = 1, return_forecasts = FALSE, return_score = FALSE, ... ) ## S3 method for class 'mvgam_forecast' plot( x, series = 1, realisations = FALSE, n_realisations = 15, hide_xlabels = FALSE, xlab, ylab, ylim, return_score = FALSE, ... )
object |
|
series |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
realisations |
|
n_realisations |
|
hide_xlabels |
|
xlab |
label for x axis. |
ylab |
label for y axis. |
ylim |
Optional |
n_cores |
|
return_forecasts |
|
return_score |
|
... |
further |
x |
Object of class |
plot_mvgam_fc
generates posterior predictions from an object of class mvgam
, calculates posterior
empirical quantiles and plots them against the observed data. If realisations = FALSE
, the returned plot shows
90, 60, 40 and 20 percent posterior quantiles (as ribbons of increasingly darker shades of red)
as well as the posterior median (as a dark red line). If realisations = TRUE, a set of n_realisations
posterior draws is shown.
plot.mvgam_forecast
takes an object of class mvgam_forecast
, in which forecasts have already
been computed, and plots the resulting forecast distribution.
If realisations = FALSE
, these posterior quantiles are plotted along
with the true observed data that was used to train the model. Otherwise, a spaghetti plot is returned
to show possible forecast paths.
A base R
graphics plot and an optional list
containing the forecast distribution
and the out of sample probabilistic forecast score
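A minimal sketch, assuming simulated data as in other examples in this manual:
simdat <- sim_mvgam(n_series = 3, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             data = simdat$data_train,
             chains = 2)
# Posterior predictions against the training data
plot_mvgam_fc(mod, series = 1)
# Supplying newdata extends the plot into the forecast period
plot_mvgam_fc(mod, series = 1, newdata = simdat$data_test)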
This function plots posterior empirical quantiles for partial effects of parametric terms
plot_mvgam_pterms(object, trend_effects = FALSE)
object |
|
trend_effects |
logical. If |
Posterior empirical quantiles of each parametric term's partial effect estimates
(on the link scale) are calculated and visualised as ribbon plots. These effects can
be interpreted as the partial effect that a parametric term contributes when all other
terms in the model have been set to 0
A base R
graphics plot
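A minimal sketch, assuming a model that includes a parametric (linear) term; the covariate x is hypothetical and added to the simulated data purely for illustration:
simdat <- sim_mvgam(n_series = 3)
simdat$data_train$x <- rnorm(NROW(simdat$data_train))
mod <- mvgam(y ~ x + s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             data = simdat$data_train,
             chains = 2)
plot_mvgam_pterms(mod)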
This function plots posterior empirical quantiles for random effect smooths (bs = re)
plot_mvgam_randomeffects(object, trend_effects = FALSE)
object |
|
trend_effects |
logical. If |
Posterior empirical quantiles of random effect coefficient estimates (on the link scale) are calculated and visualised as ribbon plots. Labels for coefficients are taken from the levels of the original factor variable that was used to specify the smooth in the model's formula
A base R
graphics plot
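A minimal sketch, assuming a model with a random effect smooth of the series identifiers (as in the plot.mvgam examples):
dat <- sim_mvgam(T = 80, n_series = 3)
mod <- mvgam(y ~ s(season, bs = 'cc') + s(series, bs = 're'),
             trend_model = RW(),
             data = dat$data_train,
             chains = 2)
plot_mvgam_randomeffects(mod)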
This function takes a fitted mvgam
object and returns various residual diagnostic plots
plot_mvgam_resids(object, series = 1)
object |
|
series |
|
A total of four ggplot plots are generated to examine posterior Dunn-Smyth residuals for the specified series. Plots include a residuals vs fitted values plot, a Q-Q plot, and two plots to check for any remaining temporal autocorrelation in the residuals. Note that all plots only report statistics from a sample of up to 20 posterior draws (to save computational time), so uncertainty in these relationships may not be adequately represented.
A series of facetted ggplot object
Nicholas J Clark
Nicholas J Clark and Matthijs Hollanders
## Not run: simdat <- sim_mvgam(n_series = 3, trend_model = AR()) mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2) # Plot Dunn Smyth residuals for some series plot_mvgam_resids(mod) plot_mvgam_resids(mod, series = 2) ## End(Not run)
This function takes either a fitted mvgam
object or a data.frame
object
and produces plots of observed time series, ACF, CDF and histograms for exploratory data analysis
plot_mvgam_series( object, data, newdata, y = "y", lines = TRUE, series = 1, n_bins, log_scale = FALSE )
object |
Optional |
data |
Optional |
newdata |
Optional |
y |
Character. What is the name of the outcome variable in the supplied data? Defaults to
|
lines |
Logical. If |
series |
Either a |
n_bins |
|
log_scale |
|
A set of ggplot objects. If series
is an integer, the plots will
show observed time series, autocorrelation and
cumulative distribution functions, and a histogram for the series. If series == 'all'
,
a set of observed time series plots is returned in which all series are shown on each plot but
only a single focal series is highlighted, with all remaining series shown as faint gray lines.
Nicholas J Clark and Matthijs Hollanders
# Simulate and plot series with observations bounded at 0 and 1 (Beta responses) sim_data <- sim_mvgam(family = betar(), trend_model = RW(), prop_trend = 0.6) plot_mvgam_series(data = sim_data$data_train, series = 'all') plot_mvgam_series(data = sim_data$data_train, newdata = sim_data$data_test, series = 1) # Now simulate series with overdispersed discrete observations sim_data <- sim_mvgam(family = nb(), trend_model = RW(), prop_trend = 0.6, phi = 10) plot_mvgam_series(data = sim_data$data_train, series = 'all')
This function plots posterior empirical quantiles for a series-specific smooth term
plot_mvgam_smooth( object, trend_effects = FALSE, series = 1, smooth, residuals = FALSE, n_resid_bins = 25, realisations = FALSE, n_realisations = 15, derivatives = FALSE, newdata )
object |
|
trend_effects |
logical. If |
series |
|
smooth |
either a |
residuals |
|
n_resid_bins |
|
realisations |
|
n_realisations |
|
derivatives |
|
newdata |
Optional |
Smooth functions are shown as empirical quantiles (or spaghetti plots) of posterior partial expectations
across a sequence of values between the variable's min
and max
,
while zeroing out effects of all other variables. At present, only univariate and bivariate smooth plots
are allowed, though note that bivariate smooths rely on default behaviour from
plot.gam
. plot_mvgam_smooth
generates posterior predictions from an
object of class mvgam
, calculates posterior empirical quantiles and plots them.
If realisations = FALSE
, the returned plot shows 90, 60, 40 and 20 percent posterior
quantiles (as ribbons of increasingly darker shades of red) as well as the posterior
median (as a dark red line). If realisations = TRUE, a set of n_realisations
posterior draws is shown. For more nuanced visualisation, supply
newdata
just as you would when predicting from a gam
model or use the more flexible conditional_effects.mvgam
. Alternatively, if you prefer to use partial
effect plots in the style of gratia
, and if you have the gratia
package installed, you can
use draw.mvgam
. See gratia_mvgam_enhancements
for details.
A base R
graphics plot
Nicholas J Clark
plot.gam
, conditional_effects.mvgam
,
gratia_mvgam_enhancements
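A minimal sketch, assuming a seasonal smooth as in other examples in this manual:
simdat <- sim_mvgam(n_series = 3, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             data = simdat$data_train,
             chains = 2)
# Ribbon plot of the seasonal smooth for series 1
plot_mvgam_smooth(mod, series = 1, smooth = 'season')
# Spaghetti plot of posterior realisations instead
plot_mvgam_smooth(mod, series = 1, smooth = 1,
                  realisations = TRUE, n_realisations = 10)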
Plot mvgam latent trend for a specified series
plot_mvgam_trend( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, n_cores = 1, derivatives = FALSE, hide_xlabels = FALSE, xlab, ylab, ... )
object |
|
series |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
realisations |
|
n_realisations |
|
n_cores |
|
derivatives |
|
hide_xlabels |
|
xlab |
label for x axis. |
ylab |
label for y axis. |
... |
further |
A base R
graphics plot
simdat <- sim_mvgam(n_series = 3, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2) # Plot estimated trends for some series plot_mvgam_trend(mod) plot_mvgam_trend(mod, series = 2) # Extrapolate trends forward in time and plot on response scale plot_mvgam_trend(mod, newdata = simdat$data_test) plot_mvgam_trend(mod, newdata = simdat$data_test, series = 2) # But it is recommended to compute extrapolations for all series # first and then plot trend_fc <- forecast(mod, newdata = simdat$data_test) plot(trend_fc, series = 1) plot(trend_fc, series = 2)
Plot mvgam forecast uncertainty contributions for a specified series
plot_mvgam_uncertainty( object, series = 1, newdata, data_test, legend_position = "topleft", hide_xlabels = FALSE )
object |
|
series |
|
newdata |
A |
data_test |
Deprecated. Still works in place of |
legend_position |
The location of the legend, given as a single keyword from the list: "none", "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center". This places the legend on the inside of the plot frame at the given location (if it is not "none"). |
hide_xlabels |
|
The basic idea of this function is to compute forecasts by ignoring one of the
two primary components in a correlated residual model (i.e. by either ignoring the
linear predictor effects or by ignoring the residual dynamics). Some caution is required
however, as this function was designed early in the mvgam development cycle and
there are now many types of models that it cannot handle very well. For example,
models with shared latent states, or any type of State-Space models that include terms
in the trend_formula
, will either fail or give nonsensical results. Improvements are
in the works to provide a more general way to decompose forecast uncertainties, so
please check back at a later date.
A base R
graphics plot
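A minimal sketch, assuming a simple correlated residual model without shared latent states or trend_formula terms (the cases this function handles best):
simdat <- sim_mvgam(n_series = 3, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             data = simdat$data_train,
             chains = 2)
plot_mvgam_uncertainty(mod, series = 1, newdata = simdat$data_test)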
This function takes a fitted mvgam
object and produces plots of smooth functions, forecasts, trends and
uncertainty components
## S3 method for class 'mvgam' plot( x, type = "residuals", series = 1, residuals = FALSE, newdata, data_test, trend_effects = FALSE, ... )
x |
|
type |
|
series |
|
residuals |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
trend_effects |
logical. If |
... |
Additional arguments for each individual plotting function. |
These plots are useful for getting an overview of the fitted model and its estimated
random effects or smooth functions,
but the individual plotting functions and the functions from the marginaleffects
and gratia
packages
offer far more customisation.
A base R plot or set of plots
Nicholas J Clark
plot_mvgam_resids
, plot_mvgam_smooth
, plot_mvgam_fc
,
plot_mvgam_trend
, plot_mvgam_uncertainty
, plot_mvgam_factors
,
plot_mvgam_randomeffects
, conditional_effects.mvgam
,
plot_predictions
, plot_slopes
,
gratia_mvgam_enhancements
# Simulate some time series dat <- sim_mvgam(T = 80, n_series = 3) # Fit a basic model mod <- mvgam(y ~ s(season, bs = 'cc') + s(series, bs = 're'), data = dat$data_train, trend_model = RW(), chains = 2) # Plot predictions and residuals for each series plot(mod, type = 'forecast', series = 1) plot(mod, type = 'forecast', series = 2) plot(mod, type = 'forecast', series = 3) plot(mod, type = 'residuals', series = 1) plot(mod, type = 'residuals', series = 2) plot(mod, type = 'residuals', series = 3) # Plot model effects plot(mod, type = 'smooths') plot(mod, type = 're') # More flexible plots with 'marginaleffects' utilities library(marginaleffects) plot_predictions(mod, condition = 'season', type = 'link') plot_predictions(mod, condition = c('season', 'series', 'series'), type = 'link') plot_predictions(mod, condition = 'series', type = 'link') # When using a State-Space model with predictors on the process # model, set trend_effects = TRUE to visualise process effects mod <- mvgam(y ~ -1, trend_formula = ~ s(season, bs = 'cc'), data = dat$data_train, trend_model = RW(), chains = 2) plot(mod, type = 'smooths', trend_effects = TRUE) # But marginaleffects functions work without any modification plot_predictions(mod, condition = 'season', type = 'link')
Plot forecast error variance decompositions from an mvgam_fevd object
This function takes an mvgam_fevd
object and produces
a plot of the posterior median contributions to forecast variance for each series
in the fitted Vector Autoregression
## S3 method for class 'mvgam_fevd' plot(x, ...)
x |
|
... |
ignored |
A ggplot
object,
which can be further customized using the ggplot2 package
Nicholas J Clark
Plot impulse responses from an mvgam_irf object
This function takes an mvgam_irf
object and produces plots of Impulse Response Functions
## S3 method for class 'mvgam_irf' plot(x, series = 1, ...)
x |
|
series |
|
... |
ignored |
A base R plot or set of plots
Nicholas J Clark
This function takes an object of class mvgam_lfo
and creates several
informative diagnostic plots
## S3 method for class 'mvgam_lfo' plot(x, ...)
x |
An object of class |
... |
Ignored |
A ggplot object of Pareto-k and ELPD values over the evaluation timepoints. For the Pareto-k plot, a dashed red line indicates the specified threshold chosen for triggering model refits. For the ELPD plot, a dashed red line indicates the bottom 10% quantile of ELPD values. Points below this threshold may represent outliers that were more difficult to forecast
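A minimal sketch; here we assume the mvgam_lfo object comes from lfo_cv() (approximate leave-future-out cross-validation), with min_t giving the minimum training time before evaluations begin:
simdat <- sim_mvgam(T = 60, n_series = 1)
mod <- mvgam(y ~ s(season, bs = 'cc', k = 6),
             trend_model = AR(),
             data = simdat$data_train,
             chains = 2)
lfo <- lfo_cv(mod, min_t = 30)
plot(lfo)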
A dataset containing timeseries of total captures (across all control plots) for select rodent species from the Portal Project
portal_data
portal_data
A dataframe containing the following fields:
time of sampling in lunar cycles
Total captures of species Dipodomys merriami
Total captures of species Dipodomys ordii
Total captures of species Chaetodipus penicillatus
Total captures of species Onychomys torridus
Sampling year
Sampling month
Monthly mean minimum temperature
Monthly mean precipitation
Monthly mean Normalised Difference Vegetation Index
https://github.com/weecology/PortalData/blob/main/SiteandMethods/Methods.md
Compute posterior draws of the expected value of the posterior predictive
distribution (i.e. the conditional expectation).
Can be performed for the data used to fit the model (posterior
predictive checks) or for new data. By definition, these predictions have
smaller variance than the posterior predictions performed by the
posterior_predict.mvgam
method. This is because only the
uncertainty in the expected value of the posterior predictive distribution is
incorporated in the draws computed by posterior_epred
while the
residual error is ignored there. However, the estimated means of both methods
averaged across draws should be very similar.
## S3 method for class 'mvgam' posterior_epred( object, newdata, data_test, ndraws = NULL, process_error = TRUE, ... )
object |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
ndraws |
Positive |
process_error |
Logical. If |
... |
Ignored |
Note that for all types of predictions for models that did not include
a trend_formula
, uncertainty in the dynamic trend
component can be ignored by setting process_error = FALSE
. However,
if a trend_formula
was supplied in the model, predictions for this component cannot be
ignored. If process_error = TRUE
, trend predictions will ignore autocorrelation
coefficients or GP length scale coefficients, ultimately assuming the process is stationary.
This method is similar to the types of posterior predictions returned from brms
models
when using autocorrelated error predictions for newdata.
This function is therefore more suited to posterior simulation from the GAM components
of a mvgam
model, while the forecasting functions
plot_mvgam_fc
and forecast.mvgam
are better suited to generate h-step ahead forecasts
that respect the temporal dynamics of estimated latent trends.
A matrix of dimension n_samples x n_obs, where n_samples is the number of posterior samples from the fitted object and n_obs is the number of observations in newdata
hindcast.mvgam
posterior_linpred.mvgam
posterior_predict.mvgam
# Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train) # Compute posterior expectations expectations <- posterior_epred(mod) str(expectations)
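The variance statement above can be checked numerically. A quick sketch, reusing mod from the example:

epred <- posterior_epred(mod)    # draws of the conditional expectation
ypred <- posterior_predict(mod)  # draws from the full posterior predictive
# expectation draws should vary less than posterior predictive draws
mean(apply(epred, 2, var) < apply(ypred, 2, var))
# but the posterior means of the two should be very similar
max(abs(colMeans(epred) - colMeans(ypred)))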
Compute posterior draws of the linear predictor, that is draws before applying any link functions or other transformations. Can be performed for the data used to fit the model (posterior predictive checks) or for new data.
## S3 method for class 'mvgam' posterior_linpred( object, transform = FALSE, newdata, ndraws = NULL, data_test, process_error = TRUE, ... )
object |
|
transform |
Logical; if |
newdata |
Optional |
ndraws |
Positive |
data_test |
Deprecated. Still works in place of |
process_error |
Logical. If |
... |
Ignored |
Note that for all types of predictions for models that did not include
a trend_formula
, uncertainty in the dynamic trend
component can be ignored by setting process_error = FALSE
. However,
if a trend_formula
was supplied in the model, predictions for this component cannot be
ignored. If process_error = TRUE
, trend predictions will ignore autocorrelation
coefficients or GP length scale coefficients, ultimately assuming the process is stationary.
This method is similar to the types of posterior predictions returned from brms
models
when using autocorrelated error predictions for newdata.
This function is therefore more suited to posterior simulation from the GAM components
of a mvgam
model, while the forecasting functions
plot_mvgam_fc
and forecast.mvgam
are better suited to generate h-step ahead forecasts
that respect the temporal dynamics of estimated latent trends.
A matrix of dimension n_samples x n_obs, where n_samples is the number of posterior samples from the fitted object and n_obs is the number of observations in newdata
posterior_epred.mvgam
posterior_predict.mvgam
hindcast.mvgam
posterior_epred.mvgam
posterior_predict.mvgam
# Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2) # Extract linear predictor values linpreds <- posterior_linpred(mod) str(linpreds)
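As a follow-up sketch, setting transform = TRUE returns draws of the transformed linear predictor, i.e. the conditional expectation, which should match posterior_epred(mod):

epreds <- posterior_linpred(mod, transform = TRUE)  # response-scale draws
str(epreds)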
Compute posterior draws of the posterior predictive distribution. Can be
performed for the data used to fit the model (posterior predictive checks) or
for new data. By definition, these draws have higher variance than draws
of the expected value of the posterior predictive distribution computed by
posterior_epred.mvgam
. This is because the residual error
is incorporated in posterior_predict
. However, the estimated means of
both methods averaged across draws should be very similar.
## S3 method for class 'mvgam' posterior_predict( object, newdata, data_test, ndraws = NULL, process_error = TRUE, ... )
object |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
ndraws |
Positive |
process_error |
Logical. If |
... |
Ignored |
Note that for all types of predictions for models that did not include
a trend_formula
, uncertainty in the dynamic trend
component can be ignored by setting process_error = FALSE
. However,
if a trend_formula
was supplied in the model, predictions for this component cannot be
ignored. If process_error = TRUE
, trend predictions will ignore autocorrelation
coefficients or GP length scale coefficients, ultimately assuming the process is stationary.
This method is similar to the types of posterior predictions returned from brms
models
when using autocorrelated error predictions for newdata.
This function is therefore more suited to posterior simulation from the GAM components
of a mvgam
model, while the forecasting functions
plot_mvgam_fc
and forecast.mvgam
are better suited to generate h-step ahead forecasts
that respect the temporal dynamics of estimated latent trends.
A matrix of dimension n_samples x n_obs, where n_samples is the number of posterior samples from the fitted object and n_obs is the number of observations in newdata
hindcast.mvgam
posterior_linpred.mvgam
posterior_epred.mvgam
## Not run: # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = 'AR1', data = simdat$data_train) # Compute posterior predictions predictions <- posterior_predict(mod) str(predictions) ## End(Not run)
Posterior Predictive Checks for mvgam Objects

Perform unconditional posterior predictive checks with the help of the bayesplot package.
## S3 method for class 'mvgam' pp_check( object, type, ndraws = NULL, prefix = c("ppc", "ppd"), group = NULL, x = NULL, newdata = NULL, ... )
object |
An object of class |
type |
Type of the ppc plot as given by a character string.
See |
ndraws |
Positive integer indicating how many
posterior draws should be used.
If |
prefix |
The prefix of the bayesplot function to be applied. Either '"ppc"' (posterior predictive check; the default) or '"ppd"' (posterior predictive distribution), the latter being the same as the former except that the observed data is not shown for '"ppd"'. |
group |
Optional name of a factor variable in the model
by which to stratify the ppc plot. This argument is required for
ppc |
x |
Optional name of a variable in the model.
Only used for ppc types having an |
newdata |
Optional |
... |
Further arguments passed to |
Unlike the conditional posterior checks provided by ppc
,
this function computes unconditional posterior predictive checks (i.e. it generates
predictions for fake data without considering the true observations associated with those
fake data). For a detailed explanation of each of the ppc functions,
see the PPC
documentation of the bayesplot
package.
A ggplot object that can be further customized using the ggplot2 package.
Nicholas J Clark
## Not run: simdat <- sim_mvgam(seasonality = 'hierarchical') mod <- mvgam(y ~ series + s(season, bs = 'cc', k = 6) + s(season, series, bs = 'fs', k = 4), data = simdat$data_train, burnin = 300, samples = 300) # Use pp_check(mod, type = "xyz") for a list of available plot types # Default is a density overlay for all observations pp_check(mod) # Rootograms particularly useful for count data pp_check(mod, type = "rootogram") # Grouping plots by series is useful pp_check(mod, type = "bars_grouped", group = "series", ndraws = 50) pp_check(mod, type = "ecdf_overlay_grouped", group = "series", ndraws = 50) pp_check(mod, type = "stat_freqpoly_grouped", group = "series", ndraws = 50) # Custom functions accepted prop_zero <- function(x) mean(x == 0) pp_check(mod, type = "stat", stat = "prop_zero") pp_check(mod, type = "stat_grouped", stat = "prop_zero", group = "series") # Some functions accept covariates to set the x-axes pp_check(mod, x = "season", type = "ribbon_grouped", prob = 0.5, prob_outer = 0.8, group = "series") # Many plots can be made without the observed data pp_check(mod, prefix = "ppd") ## End(Not run)
Plot mvgam conditional posterior predictive checks for a specified series
ppc(object, ...) ## S3 method for class 'mvgam' ppc( object, newdata, data_test, series = 1, type = "hist", n_bins, legend_position, xlab, ylab, ... )
object |
|
... |
further |
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
series |
|
type |
|
n_bins |
|
legend_position |
Position of the legend, specified by setting x to a single keyword from the list "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center". This places the legend on the inside of the plot frame at the given location. Alternatively, use "none" to hide the legend. |
xlab |
label for x axis. |
ylab |
label for y axis. |
Conditional posterior predictions are drawn from the fitted mvgam
and compared against
the empirical distribution of the observed data for a specified series to help evaluate the model's
ability to generate unbiased predictions. For all plots apart from type = 'rootogram'
, posterior predictions
can also be compared to out of sample observations as long as these observations were included as
'data_test' in the original model fit and supplied here. Rootograms are currently only plotted using the
'hanging' style.
Note that the predictions used for these plots are conditional on the observed data, i.e. they
are those predictions that have been generated directly within
the mvgam()
model. They can be misleading if the model included flexible dynamic trend components. For
a broader range of posterior checks that are created using unconditional "new data" predictions, see
pp_check.mvgam
A base R
graphics plot showing either a posterior rootogram (for type == 'rootogram'
),
the predicted vs observed mean for the
series (for type == 'mean'
), predicted vs observed proportion of zeroes for the
series (for type == 'prop_zero'
), predicted vs observed histogram for the
series (for type == 'hist'
), kernel density or empirical CDF estimates for
posterior predictions (for type == 'density'
or type == 'cdf'
) or a Probability
Integral Transform histogram (for type == 'pit'
).
Nicholas J Clark
# Simulate some smooth effects and fit a model set.seed(0) dat <- mgcv::gamSim(1, n = 200, scale = 2) mod <- mvgam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, family = gaussian(), chains = 2) # Posterior checks ppc(mod, type = 'hist') ppc(mod, type = 'density') ppc(mod, type = 'cdf') # Many more options are available with pp_check() pp_check(mod) pp_check(mod, type = "ecdf_overlay") pp_check(mod, type = 'freqpoly')
Predict from the GAM component of an mvgam model
## S3 method for class 'mvgam' predict( object, newdata, data_test, type = "link", process_error = TRUE, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... )
object |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
type |
When this has the value |
process_error |
Logical. If |
summary |
Should summary statistics be returned
instead of the raw values? Default is |
robust |
If |
probs |
The percentiles to be computed by the |
... |
Ignored |
Note that for all types of predictions for models that did not include
a trend_formula
, uncertainty in the dynamic trend
component can be ignored by setting process_error = FALSE
. However,
if a trend_formula
was supplied in the model, predictions for this component cannot be
ignored. If process_error = TRUE
, trend predictions will ignore autocorrelation
coefficients or GP length scale coefficients, ultimately assuming the process is stationary.
This method is similar to the types of posterior predictions returned from brms
models
when using autocorrelated error predictions for newdata.
This function is therefore more suited to posterior simulation from the GAM components
of a mvgam
model, while the forecasting functions
plot_mvgam_fc
and forecast.mvgam
are better suited to generate h-step ahead forecasts
that respect the temporal dynamics of estimated latent trends.
Predicted values on the appropriate scale.
If summary = FALSE
and type != "terms"
,
the output is a matrix of dimension n_draw x n_observations
containing predicted values for each posterior draw in object
.
If summary = TRUE
and type != "terms"
, the output is an n_observations
x E
matrix. The number of summary statistics E
is equal to 2 +
length(probs)
: The Estimate
column contains point estimates (either
mean or median depending on argument robust
), while the
Est.Error
column contains uncertainty estimates (either standard
deviation or median absolute deviation depending on argument
robust
). The remaining columns starting with Q
contain
quantile estimates as specified via argument probs
.
If type = "terms"
and summary = FALSE
, the output is a named list
containing a separate slot for each effect, with the effects returned as
matrices of dimension n_draw x 1
. If summary = TRUE
, the output resembles that
from predict.gam
when using the call
predict.gam(object, type = "terms", se.fit = TRUE)
, where mean contributions
from each effect are returned in matrix
form while standard errors (representing
the interval: (max(probs) - min(probs)) / 2
) are returned in a separate matrix
# Simulate 4 time series with hierarchical seasonality # and independent AR1 dynamic processes set.seed(111) simdat <- sim_mvgam(seasonality = 'hierarchical', trend_model = 'AR1', family = gaussian()) # Fit a model with shared seasonality mod1 <- mvgam(y ~ s(season, bs = 'cc', k = 6), data = simdat$data_train, family = gaussian(), trend_model = AR(), noncentred = TRUE, chains = 2) # Generate predictions against observed data preds <- predict(mod1, summary = TRUE) head(preds) # Generate predictions against test data preds <- predict(mod1, newdata = simdat$data_test, summary = TRUE) head(preds)
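The process_error argument described in the details can be toggled to ignore uncertainty from the dynamic trend component (valid for mod1 because it contains no trend_formula). A brief sketch:

preds_nodyn <- predict(mod1, summary = TRUE, process_error = FALSE)
head(preds_nodyn)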
This function takes a fitted mvgam
object and prints a quick summary
## S3 method for class 'mvgam' print(x, ...)
x |
|
... |
Ignored |
A brief summary of the model's call is printed
A list
is printed on-screen
Nicholas J Clark
Set up piecewise linear or logistic trend models
in mvgam
. These functions do not evaluate their arguments –
they exist purely to help set up a model with particular piecewise
trend models.
PW( n_changepoints = 10, changepoint_range = 0.8, changepoint_scale = 0.05, growth = "linear" )
n_changepoints |
A non-negative integer specifying the number of potential
changepoints. Potential changepoints are selected uniformly from the
first |
changepoint_range |
Proportion of history in |
changepoint_scale |
Parameter modulating the flexibility of the
automatic changepoint selection by altering the scale parameter of a Laplace distribution.
The resulting prior will be |
growth |
Character string specifying either 'linear' or 'logistic' growth of
the trend. If 'logistic', a variable labelled |
Offsets and intercepts:
For each of these trend models, an offset parameter is included in the trend
estimation process. This parameter will be incredibly difficult to identify
if you also include an intercept in the observation formula. For that reason,
it is highly recommended that you drop the intercept from the formula
(i.e. y ~ x + 0
or y ~ x - 1
, where x
are your optional predictor terms).
Logistic growth and the cap variable:
When forecasting growth, there is often some maximum achievable point that
a time series can reach. For example, total market size, total population size
or carrying capacity in population dynamics. It can be advantageous for the forecast
to saturate at or near this point so that predictions are more sensible.
This function allows you to make forecasts using a logistic growth trend model,
with a specified carrying capacity. Note that this capacity does not need to be static
over time; it can vary with each series x timepoint combination if necessary. But you
must supply a cap
value for each observation in the data when using growth = 'logistic'
.
For observation families that use a non-identity link function, the cap
value will
be internally transformed to the link scale (i.e. your specified cap
will be log
transformed if you are using a poisson()
or nb()
family). It is therefore important
that you specify the cap
values on the scale of your outcome. Note also that
no missing values are allowed in cap
.
An object of class mvgam_trend
, which contains a list of
arguments to be interpreted by the parsing functions in mvgam
Taylor, Sean J., and Benjamin Letham. "Forecasting at scale." The American Statistician 72.1 (2018): 37-45.
# Example of logistic growth with possible changepoints # Simple logistic growth model dNt = function(r, N, k){ r * N * (k - N) } # Iterate growth through time Nt = function(r, N, t, k) { for (i in 1:(t - 1)) { # population at next time step is current population + growth, # but we introduce several 'shocks' as changepoints if(i %in% c(5, 15, 25, 41, 45, 60, 80)){ N[i + 1] <- max(1, N[i] + dNt(r + runif(1, -0.1, 0.1), N[i], k)) } else { N[i + 1] <- max(1, N[i] + dNt(r, N[i], k)) } } N } # Simulate expected values set.seed(11) expected <- Nt(0.004, 2, 100, 30) plot(expected, xlab = 'Time') # Take Poisson draws y <- rpois(100, expected) plot(y, xlab = 'Time') # Assemble data into dataframe and model. We set a # fixed carrying capacity of 35 for this example, but note that # this value is not required to be fixed at each timepoint mod_data <- data.frame(y = y, time = 1:100, cap = 35, series = as.factor('series_1')) plot_mvgam_series(data = mod_data) # The intercept is nonidentifiable when using piecewise # trends because the trend functions have their own offset # parameters 'm'; it is recommended to always drop intercepts # when using these trend models mod <- mvgam(y ~ 0, trend_model = PW(growth = 'logistic'), family = poisson(), data = mod_data, chains = 2) summary(mod) # Plot the posterior hindcast plot(mod, type = 'forecast') # View the changepoints with ggplot2 utilities library(ggplot2) mcmc_plot(mod, variable = 'delta_trend', regex = TRUE) + scale_y_discrete(labels = mod$trend_model$changepoints) + labs(y = 'Potential changepoint', x = 'Rate change')
Compute residual correlation estimates from Joint Species Distribution (jsdgam) models using latent factor loadings
residual_cor(object, ...) ## S3 method for class 'jsdgam' residual_cor( object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... )
object |
|
... |
ignored |
summary |
Should summary statistics be returned
instead of the raw values? Default is |
robust |
If |
probs |
The percentiles to be computed by the |
Hui (2016) provides an excellent description of the quantities that this function calculates, so this passage
is heavily paraphrased from his associated boral
package.
In Joint Species Distribution Models, the residual covariance matrix is calculated from the matrix of latent factor loadings \(\Theta\), where the residual covariance matrix is \(\Sigma = \Theta\Theta'\). A strong residual covariance/correlation between two species can be interpreted as evidence of species interaction (e.g., facilitation or competition), missing covariates, or any additional species correlation not accounted for by the shared environmental responses captured in formula.
The residual precision matrix (also known as the partial correlation matrix; Ovaskainen et al., 2016) is defined as the inverse of the residual correlation matrix. The precision matrix is often used to identify direct or causal relationships between two species. For example, two species can have a zero precision but still be correlated, which can be interpreted as saying that the two species are not directly associated but are still correlated through other species; in other words, they are conditionally independent given the other species. It is important to note that the precision matrix does not exhibit the same properties as the correlation matrix, e.g. the diagonal elements are not equal to 1. Nevertheless, relatively larger values of precision may imply stronger direct relationships between two species.
In addition to the residual correlation and precision matrices, the median or mean point estimator of the trace of the residual covariance matrix, \(\text{tr}(\Sigma)\), is returned. Often used in other areas of multivariate statistics, the trace may be interpreted as the amount of covariation explained by the latent factors. One situation where the trace may be useful is when comparing a pure latent factor model (where no terms are supplied to formula) versus a model with latent factors and some additional predictors in formula – the proportional difference in trace between these two models may be interpreted as the proportion of covariation between species explained by the predictors in formula (see the sketch after the references below). Of course, the trace itself is random due to the MCMC sampling, and so it is not always guaranteed to produce sensible answers.
If summary = TRUE
, a list
with the following components:
cor , cor_lower , cor_upper
|
A set of |
sig_cor |
A |
prec , prec_lower , prec_upper
|
A set of |
sig_prec |
A |
cov |
A |
trace |
The median/mean point estimator of the trace (sum of the diagonal elements)
of the residual covariance matrix |
If summary = FALSE
, this function returns a list
containing the following components:
all_cormat |
A |
all_covmat |
A |
all_presmat |
A |
all_trace |
A |
Nicholas J Clark
Francis KC Hui (2016). BORAL - Bayesian ordination and regression analysis of
multivariate abundance data in R. Methods in Ecology and Evolution. 7, 744-750.
Otso Ovaskainen et al. (2016). Using latent variable models to identify large networks of
species-to-species associations at different spatial scales. Methods in Ecology and Evolution,
7, 549-555.
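The trace comparison described in the details might be sketched as follows; dat, the predictor x and the n_lv = 2 setting are hypothetical placeholders rather than objects defined in this manual:

m0 <- jsdgam(y ~ 1, data = dat, family = poisson(), n_lv = 2)  # pure latent factor model
m1 <- jsdgam(y ~ x, data = dat, family = poisson(), n_lv = 2)  # latent factors + predictor
rc0 <- residual_cor(m0)
rc1 <- residual_cor(m1)
# proportion of between-species covariation explained by the predictor x
(rc0$trace - rc1$trace) / rc0$trace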
Posterior draws of mvgam residuals

This method extracts posterior draws of Dunn-Smyth (randomized quantile) residuals in the order in which the data were supplied to the model. It includes additional arguments for obtaining summaries of the computed residuals.
## S3 method for class 'mvgam' residuals(object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ...)
object |
An object of class |
summary |
Should summary statistics be returned
instead of the raw values? Default is |
robust |
If |
probs |
The percentiles to be computed by the |
... |
ignored |
This method gives residuals as Dunn-Smyth (randomized quantile) residuals. Any
observations that were missing (i.e. NA
) in the original data will have missing values
in the residuals
An array
of randomized quantile residual values.
If summary = FALSE
the output resembles those of
posterior_epred.mvgam
and predict.mvgam
.
If summary = TRUE
the output is an n_observations
x E
matrix. The number of summary statistics E
is equal to 2 +
length(probs)
: The Estimate
column contains point estimates (either
mean or median depending on argument robust
), while the
Est.Error
column contains uncertainty estimates (either standard
deviation or median absolute deviation depending on argument
robust
). The remaining columns starting with Q
contain
quantile estimates as specified via argument probs
.
Nicholas J Clark
# Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2) # Extract posterior residuals resids <- residuals(mod) str(resids) # Or add them directly to the observed data, along with fitted values augment(mod, robust = FALSE, probs = c(0.25, 0.75))
Set up autoregressive or autoregressive moving average trend models in mvgam. These functions do not evaluate their arguments – they exist purely to help set up a model with particular autoregressive trend models.
RW(ma = FALSE, cor = FALSE, gr = NA, subgr = NA) AR(p = 1, ma = FALSE, cor = FALSE, gr = NA, subgr = NA) CAR(p = 1) VAR(ma = FALSE, cor = FALSE, gr = NA, subgr = NA)
ma |
|
cor |
|
gr |
An optional grouping variable, which must be a |
subgr |
A subgrouping |
p |
A non-negative integer specifying the autoregressive (AR) order.
Default is |
An object of class mvgam_trend
, which contains a list of
arguments to be interpreted by the parsing functions in mvgam
## Not run: # A short example to illustrate CAR(1) models # Function to simulate CAR1 data with seasonality sim_corcar1 = function(n = 120, phi = 0.5, sigma = 1, sigma_obs = 0.75){ # Sample irregularly spaced time intervals time_dis <- c(0, runif(n - 1, -0.1, 1)) time_dis[time_dis < 0] <- 0; time_dis <- time_dis * 5 # Set up the latent dynamic process x <- vector(length = n); x[1] <- -0.3 for(i in 2:n){ # zero-distances will cause problems in sampling, so mvgam uses a # minimum threshold; this simulation function emulates that process if(time_dis[i] == 0){ x[i] <- rnorm(1, mean = (phi ^ 1e-12) * x[i - 1], sd = sigma) } else { x[i] <- rnorm(1, mean = (phi ^ time_dis[i]) * x[i - 1], sd = sigma) } } # Add 12-month seasonality cov1 <- sin(2 * pi * (1 : n) / 12); cov2 <- cos(2 * pi * (1 : n) / 12) beta1 <- runif(1, 0.3, 0.7); beta2 <- runif(1, 0.2, 0.5) seasonality <- beta1 * cov1 + beta2 * cov2 # Take Gaussian observations with error and return data.frame(y = rnorm(n, mean = x + seasonality, sd = sigma_obs), season = rep(1:12, 20)[1:n], time = cumsum(time_dis)) } # Sample two time series dat <- rbind(dplyr::bind_cols(sim_corcar1(phi = 0.65, sigma_obs = 0.55), data.frame(series = 'series1')), dplyr::bind_cols(sim_corcar1(phi = 0.8, sigma_obs = 0.35), data.frame(series = 'series2'))) %>% dplyr::mutate(series = as.factor(series)) # mvgam with CAR(1) trends and series-level seasonal smooths; the # State-Space representation (using trend_formula) will be more efficient mod <- mvgam(formula = y ~ 1, trend_formula = ~ s(season, bs = 'cc', k = 5, by = trend), trend_model = CAR(), data = dat, family = gaussian(), samples = 300, chains = 2) # View usual summaries and plots summary(mod) conditional_effects(mod, type = 'expected') plot(mod, type = 'trend', series = 1) plot(mod, type = 'trend', series = 2) plot(mod, type = 'residuals', series = 1) plot(mod, type = 'residuals', series = 2) # Now an example illustrating hierarchical dynamics set.seed(123) # Simulate three species monitored in three different # regions, where dynamics can potentially vary across regions simdat1 <- sim_mvgam(trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3)) simdat2 <- sim_mvgam(trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3)) simdat3 <- sim_mvgam(trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3)) # Set up the data but DO NOT include 'series' all_dat <- rbind(simdat1$data_train %>% dplyr::mutate(region = 'qld'), simdat2$data_train %>% dplyr::mutate(region = 'nsw'), simdat3$data_train %>% dplyr::mutate(region = 'vic')) %>% dplyr::mutate(species = gsub('series', 'species', series), species = as.factor(species), region = as.factor(region)) %>% dplyr::arrange(series, time) %>% dplyr::select(-series) # Check priors for a hierarchical AR1 model get_mvgam_priors(formula = y ~ species, trend_model = AR(gr = region, subgr = species), data = all_dat) # Fit the model mod <- mvgam(formula = y ~ species, trend_model = AR(gr = region, subgr = species), data = all_dat) # Check standard outputs summary(mod) conditional_effects(mod, type = 'link') # Inspect posterior estimates for process error and the # correlation weighting parameter mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist') mcmc_plot(mod, variable = 'alpha_cor', type = 'hist') ## End(Not run)
Compute probabilistic forecast scores for mvgam objects
## S3 method for class 'mvgam_forecast' score( object, score = "crps", log = FALSE, weights, interval_width = 0.9, n_cores = 1, ... ) score(object, ...)
object |
|
score |
|
log |
|
weights |
optional |
interval_width |
proportional value on |
n_cores |
|
... |
Ignored |
a list
containing scores and interval coverages per forecast horizon.
If score %in% c('drps', 'crps', 'elpd')
,
the list will also contain the sum of all series-level scores per horizon. If
score %in% c('energy','variogram')
, no series-level scores are computed and the only score returned
will be for all series. For all scores apart from elpd
, the in_interval
column in each series-level
slot is a binary indicator of whether or not the true value was within the forecast's corresponding
posterior empirical quantiles. Intervals are not calculated when using elpd
because forecasts
will only contain the linear predictors
# Simulate observations for three count-valued time series data <- sim_mvgam() # Fit a dynamic model using 'newdata' to automatically produce forecasts mod <- mvgam(y ~ 1, trend_model = RW(), data = data$data_train, newdata = data$data_test, chains = 2) # Extract forecasts into a 'mvgam_forecast' object fc <- forecast(mod) # Compute Discrete Rank Probability Scores and 0.90 interval coverages fc_scores <- score(fc, score = 'drps') str(fc_scores)
Convert univariate or multivariate time series (xts or ts objects) to the format necessary for mvgam

This function converts univariate or multivariate time series (xts or ts objects) to the format necessary for mvgam.
series_to_mvgam(series, freq, train_prop = 0.85)
series |
|
freq |
|
train_prop |
|
A list
object containing outputs needed for mvgam
,
including 'data_train' and 'data_test'
# A ts object example data("sunspots") series <- cbind(sunspots, sunspots) colnames(series) <- c('blood', 'bone') head(series) series_to_mvgam(series, frequency(series), 0.85) # An xts object example library(xts) dates <- seq(as.Date("2001-05-01"), length=30, by="quarter") data <- cbind(c(gas = rpois(30, cumprod(1+rnorm(30, mean = 0.01, sd = 0.001)))), c(oil = rpois(30, cumprod(1+rnorm(30, mean = 0.01, sd = 0.001))))) series <- xts(x = data, order.by = dates) colnames(series) <- c('gas', 'oil') head(series) series_to_mvgam(series, freq = 4, train_prop = 0.85)
This function simulates sets of time series data for fitting a multivariate GAM that includes shared seasonality and dependence on state-space latent dynamic factors. Random dependencies among series, i.e. correlations in their long-term trends, are included in the form of correlated loadings on the latent dynamic factors
sim_mvgam( T = 100, n_series = 3, seasonality = "shared", use_lv = FALSE, n_lv = 0, trend_model = RW(), drift = FALSE, prop_trend = 0.2, trend_rel, freq = 12, family = poisson(), phi, shape, sigma, nu, mu, prop_missing = 0, prop_train = 0.85 )
T |
|
n_series |
|
seasonality |
|
use_lv |
|
n_lv |
|
trend_model |
See mvgam_trends for more details |
drift |
|
prop_trend |
|
trend_rel |
Deprecated. Use |
freq |
|
family |
|
phi |
|
shape |
|
sigma |
|
nu |
|
mu |
|
prop_missing |
|
prop_train |
|
A list
object containing outputs needed for mvgam
,
including 'data_train' and 'data_test', as well as some additional information
about the simulated seasonality and trend dependencies
# Simulate series with observations bounded at 0 and 1 (Beta responses) sim_data <- sim_mvgam(family = betar(), trend_model = RW(), prop_trend = 0.6) plot_mvgam_series(data = sim_data$data_train, series = 'all') # Now simulate series with overdispersed discrete observations sim_data <- sim_mvgam(family = nb(), trend_model = RW(), prop_trend = 0.6, phi = 10) plot_mvgam_series(data = sim_data$data_train, series = 'all')
Compute reactivity, return rates and contributions of interactions to
stationary forecast variance from
mvgam
models with Vector Autoregressive dynamics
stability(object, ...) ## S3 method for class 'mvgam' stability(object, ...)
object |
|
... |
ignored |
These measures of stability can be used to assess how important inter-series dependencies are to the variability of a multivariate system and to ask how systems are expected to respond to environmental perturbations. Using the formula for a latent VAR(1) as:

\(\mu_t \sim \text{MVNormal}(A\mu_{t-1}, \Sigma)\)

this function will calculate the long-term stationary forecast distribution of the system, which has mean \(\mu_\infty\) and variance \(\Sigma_\infty\), to then calculate the following quantities:

prop_int: Proportion of the volume of the stationary forecast distribution that is attributable to lagged interactions (i.e. how important are the autoregressive interaction coefficients in \(A\) for explaining the shape of the stationary forecast distribution?): \(\det(A)^2\)

prop_int_adj: Same as prop_int but scaled by the number of series \(p\) to facilitate direct comparisons among systems with different numbers of interacting variables: \(\det(A)^{2/p}\)

prop_int_offdiag: Sensitivity of prop_int to inter-series interactions (i.e. how important are the off-diagonals of the autoregressive coefficient matrix \(A\) for shaping prop_int?), calculated as the relative magnitude of the off-diagonals in the partial derivative matrix: \([2\det(A)\,(A^{-1})^\top]\)

prop_int_diag: Sensitivity of prop_int to intra-series interactions (i.e. how important are the diagonals of the autoregressive coefficient matrix \(A\) for shaping prop_int?), calculated as the relative magnitude of the diagonals in the partial derivative matrix: \([2\det(A)\,(A^{-1})^\top]\)

prop_cov_offdiag: Sensitivity of \(\Sigma_\infty\) to inter-series error correlations (i.e. how important are off-diagonal covariances in \(\Sigma\) for shaping \(\Sigma_\infty\)?), calculated as the relative magnitude of the off-diagonals in the partial derivative matrix: \([\det(\Sigma_\infty)\,(\Sigma_\infty^{-1})^\top]\)

prop_cov_diag: Sensitivity of \(\Sigma_\infty\) to error variances (i.e. how important are diagonal variances in \(\Sigma\) for shaping \(\Sigma_\infty\)?), calculated as the relative magnitude of the diagonals in the partial derivative matrix: \([\det(\Sigma_\infty)\,(\Sigma_\infty^{-1})^\top]\)

reactivity: A measure of the degree to which the system moves away from a stable equilibrium following a perturbation. Values > 0 suggest the system is reactive, whereby a perturbation of the system in one period can be amplified in the next period. If \(\sigma_{\max}(A)\) is the largest singular value of \(A\), then reactivity is defined as: \(\log \sigma_{\max}(A)\)

mean_return_rate: Asymptotic (long-term) return rate of the mean of the transition distribution to the stationary mean, calculated using the largest eigenvalue of the matrix \(A\): \(\max(\lambda_{A})\). Lower values suggest greater stability.

var_return_rate: Asymptotic (long-term) return rate of the variance of the transition distribution to the stationary variance: \(\max(\lambda_{A \otimes A})\). Again, lower values suggest greater stability.
Major advantages of using mvgam to compute these metrics are that well-calibrated uncertainties are
available and that VAR processes are forced to be stationary. These properties make it simple and
insightful to calculate and inspect aspects of both long-term and short-term stability.
But it is also possible to inspect potential interactions among the time series in a latent VAR process more directly. To do so, you can calculate and plot Generalized or Orthogonalized Impulse Response Functions using the irf function.
A data.frame
containing posterior draws for each stability metric.
Nicholas J Clark
AR Ives, B Dennis, KL Cottingham & SR Carpenter (2003). Estimating community stability and ecological interactions from time-series data. Ecological Monographs. 73, 301-330.
# Simulate some time series that follow a latent VAR(1) process simdat <- sim_mvgam(family = gaussian(), n_series = 4, trend_model = VAR(cor = TRUE), prop_trend = 1) plot_mvgam_series(data = simdat$data_train, series = 'all') # Fit a model that uses a latent VAR(1) mod <- mvgam(y ~ -1, trend_formula = ~ 1, trend_model = VAR(cor = TRUE), family = gaussian(), data = simdat$data_train, silent = 2) # Calulate stability metrics for this system metrics <- stability(mod) # Proportion of stationary forecast distribution # attributable to lagged interactions hist(metrics$prop_int, xlim = c(0, 1), xlab = 'Prop_int', main = '', col = '#B97C7C', border = 'white') # Within this contribution of interactions, how important # are inter-series interactions (offdiagonals of the A matrix) vs # intra-series density dependence (diagonals of the A matrix)? layout(matrix(1:2, nrow = 2)) hist(metrics$prop_int_offdiag, xlim = c(0, 1), xlab = '', main = 'Inter-series interactions', col = '#B97C7C', border = 'white') hist(metrics$prop_int_diag, xlim = c(0, 1), xlab = 'Contribution to interaction effect', main = 'Intra-series interactions (density dependence)', col = 'darkblue', border = 'white') layout(1) # How important are inter-series error covariances # (offdiagonals of the Sigma matrix) vs # intra-series variances (diagonals of the Sigma matrix) for explaining # the variance of the stationary forecast distribution? layout(matrix(1:2, nrow = 2)) hist(metrics$prop_cov_offdiag, xlim = c(0, 1), xlab = '', main = 'Inter-series covariances', col = '#B97C7C', border = 'white') hist(metrics$prop_cov_diag, xlim = c(0, 1), xlab = 'Contribution to forecast variance', main = 'Intra-series variances', col = 'darkblue', border = 'white') layout(1) # Reactivity, i.e. degree to which the system moves # away from a stable equilibrium following a perturbation # (values > 1 suggest a more reactive, less stable system) hist(metrics$reactivity, main = '', xlab = 'Reactivity', col = '#B97C7C', border = 'white', xlim = c(-1*max(abs(metrics$reactivity)), max(abs(metrics$reactivity)))) abline(v = 0, lwd = 2.5)
These functions take a fitted mvgam
object and return various useful summaries
## S3 method for class 'mvgam' summary(object, include_betas = TRUE, smooth_test = TRUE, digits = 2, ...) ## S3 method for class 'mvgam_prefit' summary(object, ...) ## S3 method for class 'mvgam' coef(object, summarise = TRUE, ...)
object |
|
include_betas |
Logical. Print a summary that includes posterior summaries
of all linear predictor beta coefficients (including spline coefficients)?
Defaults to |
smooth_test |
Logical. Compute estimated degrees of freedom and approximate
p-values for smooth terms? Defaults to |
digits |
The number of significant digits for printing out the summary;
defaults to |
... |
Ignored |
summarise |
|
summary.mvgam
and summary.mvgam_prefit
return brief summaries of the model's call, along with posterior intervals for
some of the key parameters in the model. Note that some smooths have extra penalties on the null space,
so summaries for the rho
parameters may include more penalty terms than the number of smooths in
the original model formula. Approximate p-values for smooth terms are also returned,
with methods used for their
calculation following those used for mgcv
equivalents (see summary.gam
for details).
The Estimated Degrees of Freedom (edf) for smooth terms is computed using
either edf.type = 1
for models with no trend component, or edf.type = 0
for models with
trend components. These are described in the documentation for jagam
. Experiments suggest
these p-values tend to be more conservative than those that might be returned from an equivalent
model fit with summary.gam
using method = 'REML'
coef.mvgam
returns either summaries or full posterior estimates for GAM
component
coefficients
For summary.mvgam
and summary.mvgam_prefit
, a list
is printed
on-screen showing the summaries for the model
For coef.mvgam, either a matrix of posterior coefficient distributions (if summarise == FALSE) or a data.frame of coefficient summaries (if summarise == TRUE)
Nicholas J Clark
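A minimal sketch of coef.mvgam, assuming mod is any fitted mvgam object:

coef(mod)                              # data.frame of coefficient summaries
draws <- coef(mod, summarise = FALSE)  # matrix of posterior coefficient draws
str(draws)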
Update an existing mvgam object

This function allows a previously fitted mvgam model to be updated.
## S3 method for class 'mvgam' update( object, formula, trend_formula, knots, trend_knots, trend_model, family, share_obs_params, data, newdata, trend_map, use_lv, n_lv, priors, chains, burnin, samples, threads, algorithm, lfo = FALSE, ... ) ## S3 method for class 'jsdgam' update( object, formula, factor_formula, knots, factor_knots, data, newdata, n_lv, family, share_obs_params, priors, chains, burnin, samples, threads, algorithm, lfo = FALSE, ... )
object |
|
formula |
Optional new |
trend_formula |
An optional |
knots |
An optional |
trend_knots |
As for |
trend_model |
For all trend types apart from |
family |
Default is |
share_obs_params |
|
data |
A
Note however that there are special cases where these identifiers are not needed. For
example, models with hierarchical temporal correlation processes (e.g. |
newdata |
Optional |
trend_map |
Optional |
use_lv |
|
n_lv |
|
priors |
An optional |
chains |
|
burnin |
|
samples |
|
threads |
|
algorithm |
Character string naming the estimation approach to use.
Options are |
lfo |
Logical indicating whether this is part of a call to lfo_cv.mvgam. Returns a
lighter version of the model with no residuals and fewer monitored parameters to speed up
post-processing. But other downstream functions will not work properly, so users should always
leave this set as |
... |
|
factor_formula |
Optional new |
factor_knots |
An optional |
A list
object of class mvgam
containing model output, the text representation of the model file,
the mgcv model output (for easily generating simulations at
unsampled covariate values), Dunn-Smyth residuals for each outcome variable and key information needed
for other functions in the package. See mvgam-class
for details.
Use methods(class = "mvgam")
for an overview on available methods.
A list
object of class mvgam
containing model output, the text representation of the model file,
the mgcv model output (for easily generating simulations at
unsampled covariate values), Dunn-Smyth residuals for each series and key information needed
for other functions in the package. See mvgam-class
for details.
Use methods(class = "mvgam")
for an overview on available methods.
Nicholas J Clark
# Simulate some data and fit a Poisson AR1 model
simdat <- sim_mvgam(n_series = 1, trend_model = AR())
mod <- mvgam(y ~ s(season, bs = 'cc'),
             trend_model = AR(),
             noncentred = TRUE,
             data = simdat$data_train,
             chains = 2)
summary(mod)
conditional_effects(mod, type = 'link')

# Update to an AR2 model
updated_mod <- update(mod,
                      trend_model = AR(p = 2),
                      noncentred = TRUE)
summary(updated_mod)
conditional_effects(updated_mod, type = 'link')

# Now update to a Binomial AR1 by adding information on trials;
# this requires that the supplied data contain the 'trials' variable
simdat$data_train$trials <- max(simdat$data_train$y) + 15
updated_mod <- update(mod,
                      formula = cbind(y, trials) ~ s(season, bs = 'cc'),
                      noncentred = TRUE,
                      data = simdat$data_train,
                      family = binomial())
summary(updated_mod)
conditional_effects(updated_mod, type = 'link')
Set up latent correlated multivariate Gaussian residual processes in mvgam. This function does not evaluate its arguments; it exists purely to help set up a model with particular error processes.
ZMVN(unit = time, gr = NA, subgr = series)
unit |
The unquoted name of the variable that represents the unit of analysis in data over which
latent residuals should be correlated. Defaults to time |
gr |
An optional grouping variable, which must be a factor in the supplied data, for setting up
hierarchical residual correlation structures. Defaults to NA |
subgr |
A subgrouping factor variable specifying which element in data represents the different
observational units. Defaults to series |
An object of class mvgam_trend, which contains a list of arguments to be interpreted by the
parsing functions in mvgam
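Because ZMVN() does not evaluate its arguments, the specification can be built and inspected on its own before any model is fitted; a minimal sketch:

# Build a residual correlation specification; 'site' and 'species'
# need not exist yet since the arguments are not evaluated
spec <- ZMVN(unit = site, subgr = species)
class(spec)  # an object of class 'mvgam_trend'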
## Not run:
# Simulate counts of four species over ten sampling locations
site_dat <- data.frame(site = rep(1:10, 4),
                       species = as.factor(sort(rep(letters[1:4], 10))),
                       y = c(NA, rpois(39, 3)))
head(site_dat)

# Set up a correlated residual (i.e. Joint Species Distribution) model,
# where 'site' represents the unit of analysis
trend_model <- ZMVN(unit = site, subgr = species)
mod <- mvgam(y ~ species,
             trend_model = ZMVN(unit = site, subgr = species),
             data = site_dat,
             chains = 2,
             silent = 2)

# Inspect the estimated species-species residual covariances
mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist')

# A hierarchical correlation example; simulate correlated counts
# for three species at 30 sites nested within two groups
library(dplyr)
Sigma <- matrix(c(1, -0.4, 0.5,
                  -0.4, 1, 0.3,
                  0.5, 0.3, 1),
                byrow = TRUE, nrow = 3)
Sigma

make_site_dat <- function(...){
  errors <- mgcv::rmvn(n = 30,
                       mu = c(0.6, 0.8, 1.8),
                       V = Sigma)
  site_dat <- do.call(rbind, lapply(1:3, function(spec){
    data.frame(y = rpois(30, lambda = exp(errors[, spec])),
               species = paste0('species', spec),
               site = 1:30)
  }))
  site_dat
}
site_dat <- rbind(make_site_dat() %>%
                    dplyr::mutate(group = 'group1'),
                  make_site_dat() %>%
                    dplyr::mutate(group = 'group2')) %>%
  dplyr::mutate(species = as.factor(species),
                group = as.factor(group))

# Fit the hierarchical correlated residual model
mod <- mvgam(y ~ species,
             trend_model = ZMVN(unit = site,
                                gr = group,
                                subgr = species),
             data = site_dat)

# Inspect the estimated species-species residual covariances
mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist')

## End(Not run)