| Title: | Multivariate (Dynamic) Generalized Additive Models |
|---|---|
| Description: | Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2023) <doi:10.1111/2041-210X.13974>. |
| Authors: | Nicholas J Clark [aut, cre] (ORCID: <https://orcid.org/0000-0001-7131-3301>), KANK Karunarathna [ctb] (ARMA parameterisations and factor models, ORCID: <https://orcid.org/0000-0002-8995-5502>), Sarah Heaps [ctb] (VARMA parameterisations, ORCID: <https://orcid.org/0000-0002-5543-037X>), Scott Pease [ctb] (broom enhancements, ORCID: <https://orcid.org/0009-0006-8977-9285>), Matthijs Hollanders [ctb] (ggplot visualizations, ORCID: <https://orcid.org/0000-0003-0796-1018>) |
| Maintainer: | Nicholas J Clark <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.595 |
| Built: | 2026-05-14 09:19:37 UTC |
| Source: | https://github.com/nicholasjclark/mvgam |
Calculate randomized quantile residuals for mvgam objects
add_residuals(object, ...) ## S3 method for class 'mvgam' add_residuals(object, ...)add_residuals(object, ...) ## S3 method for class 'mvgam' add_residuals(object, ...)
object |
|
... |
unused |
For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics If the fitted model is appropriate then Dunn-Smyth residuals will be standard normal in distribution and no autocorrelation will be evident. When a particular observation is missing, the residual is calculated by comparing independent draws from the model's posterior distribution
A list object of class mvgam with residuals included in the 'resids' slot
A dataset containing timeseries of Amblyomma americanum and Ixodes scapularis nymph abundances at NEON sites
all_neon_tick_dataall_neon_tick_data
A tibble/dataframe containing covariate information alongside the main fields of:
Year of sampling
Epidemiological week of sampling
NEON plot ID for survey location
NEON site ID for survey location
Counts of A. americanum nymphs
Counts of I. scapularis nymphs
https://www.neonscience.org/data
mvgam object's dataAdd fits and residuals to the data, implementing the generic augment from
the package broom.
## S3 method for class 'mvgam' augment(x, robust = FALSE, probs = c(0.025, 0.975), ...)## S3 method for class 'mvgam' augment(x, robust = FALSE, probs = c(0.025, 0.975), ...)
x |
An object of class |
robust |
If |
probs |
The percentiles to be computed by the quantile function. |
... |
Unused, included for generic consistency only. |
A list is returned if class(x$obs_data) == 'list', otherwise a tibble
is returned, but the contents of either object is the same.
The arguments robust and probs are applied to both the fit and residuals
calls (see fitted.mvgam() and residuals.mvgam() for details).
A list or tibble (see details) combining:
The data supplied to mvgam().
The outcome variable, named as .observed.
The fitted backcasts, along with their variability and credible bounds.
The residuals, along with their variability and credible bounds.
Other tidiers:
tidy.mvgam()
## Not run: set.seed(0) dat <- sim_mvgam( T = 80, n_series = 3, mu = 2, trend_model = AR(p = 1), prop_missing = 0.1, prop_trend = 0.6 ) mod1 <- mvgam( formula = y ~ s(season, bs = 'cc', k = 6), data = dat$data_train, trend_model = AR(), family = poisson(), noncentred = TRUE, chains = 2, silent = 2 ) augment(mod1, robust = TRUE, probs = c(0.25, 0.75)) ## End(Not run)## Not run: set.seed(0) dat <- sim_mvgam( T = 80, n_series = 3, mu = 2, trend_model = AR(p = 1), prop_missing = 0.1, prop_trend = 0.6 ) mod1 <- mvgam( formula = y ~ s(season, bs = 'cc', k = 6), data = dat$data_train, trend_model = AR(), family = poisson(), noncentred = TRUE, chains = 2, silent = 2 ) augment(mod1, robust = TRUE, probs = c(0.25, 0.75)) ## End(Not run)
Generate Stan code and data objects for mvgam models
code(object) ## S3 method for class 'mvgam_prefit' stancode(object, ...) ## S3 method for class 'mvgam' stancode(object, ...) ## S3 method for class 'mvgam_prefit' standata(object, ...)code(object) ## S3 method for class 'mvgam_prefit' stancode(object, ...) ## S3 method for class 'mvgam' stancode(object, ...) ## S3 method for class 'mvgam_prefit' standata(object, ...)
object |
An object of class |
... |
ignored |
Either a character string containing the fully commented Stan code to fit a mvgam model or a named list containing the data objects needed to fit the model in Stan.
## Not run: simdat <- sim_mvgam() mod <- mvgam(y ~ s(season) + s(time, by = series), family = poisson(), data = simdat$data_train, run_model = FALSE) # View Stan model code stancode(mod) # View Stan model data sdata <- standata(mod) str(sdata) ## End(Not run)## Not run: simdat <- sim_mvgam() mod <- mvgam(y ~ s(season) + s(time, by = series), family = poisson(), data = simdat$data_train, run_model = FALSE) # View Stan model code stancode(mod) # View Stan model data sdata <- standata(mod) str(sdata) ## End(Not run)
Display conditional effects of one or more numeric and/or categorical
predictors in models of class mvgam and jsdgam, including two-way
interaction effects.
## S3 method for class 'mvgam' conditional_effects( x, effects = NULL, type = "expected", points = FALSE, rug = FALSE, ... ) ## S3 method for class 'mvgam_conditional_effects' plot(x, plot = TRUE, ask = FALSE, ...) ## S3 method for class 'mvgam_conditional_effects' print(x, ...)## S3 method for class 'mvgam' conditional_effects( x, effects = NULL, type = "expected", points = FALSE, rug = FALSE, ... ) ## S3 method for class 'mvgam_conditional_effects' plot(x, plot = TRUE, ask = FALSE, ...) ## S3 method for class 'mvgam_conditional_effects' print(x, ...)
x |
Object of class |
effects |
An optional character vector naming effects (main effects or
interactions) for which to compute conditional plots. Interactions are
specified by a |
type |
|
points |
|
rug |
|
... |
other arguments to pass to |
plot |
Logical; indicates if plots should be
plotted directly in the active graphic device.
Defaults to |
ask |
|
This function acts as a wrapper to the more flexible
plot_predictions. When creating
conditional_effects for a particular predictor (or interaction of
two predictors), one has to choose the values of all other predictors to
condition on. By default, the mean is used for continuous variables and the
reference category is used for factors. Use
plot_predictions to change these and create
more bespoke conditional effects plots.
conditional_effects returns an object of class
mvgam_conditional_effects which is a named list with one slot per
effect containing a ggplot object, which can be
further customized using the ggplot2 package. The corresponding
plot method will draw these plots in the active graphic device.
Nicholas J Clark
## Not run: # Simulate some data simdat <- sim_mvgam( family = poisson(), seasonality = 'hierarchical' ) # Fit a model mod <- mvgam( y ~ s(season, by = series, k = 5) + year:series, family = poisson(), data = simdat$data_train, chains = 2, silent = 2 ) # Plot all main effects on the response scale conditional_effects(mod) # Change the prediction interval to 70% using plot_predictions() argument # 'conf_level' conditional_effects(mod, conf_level = 0.7) # Plot all main effects on the link scale conditional_effects(mod, type = 'link') # Works the same for smooth terms, including smooth interactions set.seed(0) dat <- mgcv::gamSim(1, n = 200, scale = 2) mod <- mvgam( y ~ te(x0, x1, k = 5) + s(x2, k = 6) + s(x3, k = 6), data = dat, family = gaussian(), chains = 2, silent = 2 ) conditional_effects(mod) conditional_effects(mod, conf_level = 0.5, type = 'link') # ggplot objects can be modified and combined with the help of many # additional packages. Here is an example using the patchwork package # Simulate some nonlinear data dat <- mgcv::gamSim(1, n = 200, scale = 2) mod <- mvgam( y ~ s(x1, bs = 'moi') + te(x0, x2), data = dat, family = gaussian(), chains = 2, silent = 2 ) # Extract the list of ggplot conditional_effect plots m <- plot(conditional_effects(mod), plot = FALSE) # Add custom labels and arrange plots together using patchwork::wrap_plots() library(patchwork) library(ggplot2) wrap_plots( m[[1]] + labs(title = 's(x1, bs = "moi")'), m[[2]] + labs(title = 'te(x0, x2)') ) ## End(Not run)## Not run: # Simulate some data simdat <- sim_mvgam( family = poisson(), seasonality = 'hierarchical' ) # Fit a model mod <- mvgam( y ~ s(season, by = series, k = 5) + year:series, family = poisson(), data = simdat$data_train, chains = 2, silent = 2 ) # Plot all main effects on the response scale conditional_effects(mod) # Change the prediction interval to 70% using plot_predictions() argument # 'conf_level' conditional_effects(mod, conf_level = 0.7) # Plot all main effects on the link scale conditional_effects(mod, type = 'link') # Works the same for smooth terms, including smooth interactions set.seed(0) dat <- mgcv::gamSim(1, n = 200, scale = 2) mod <- mvgam( y ~ te(x0, x1, k = 5) + s(x2, k = 6) + s(x3, k = 6), data = dat, family = gaussian(), chains = 2, silent = 2 ) conditional_effects(mod) conditional_effects(mod, conf_level = 0.5, type = 'link') # ggplot objects can be modified and combined with the help of many # additional packages. Here is an example using the patchwork package # Simulate some nonlinear data dat <- mgcv::gamSim(1, n = 200, scale = 2) mod <- mvgam( y ~ s(x1, bs = 'moi') + te(x0, x2), data = dat, family = gaussian(), chains = 2, silent = 2 ) # Extract the list of ggplot conditional_effect plots m <- plot(conditional_effects(mod), plot = FALSE) # Add custom labels and arrange plots together using patchwork::wrap_plots() library(patchwork) library(ggplot2) wrap_plots( m[[1]] + labs(title = 's(x1, bs = "moi")'), m[[2]] + labs(title = 'te(x0, x2)') ) ## End(Not run)
Set up time-varying (dynamic) coefficients for use in mvgam models. Currently, only low-rank Gaussian Process smooths are available for estimating the dynamics of the time-varying coefficient.
dynamic(variable, k, rho = 5, stationary = TRUE, scale = TRUE)dynamic(variable, k, rho = 5, stationary = TRUE, scale = TRUE)
variable |
The variable that the dynamic smooth will be a function of |
k |
Optional number of basis functions for computing approximate GPs.
If missing, |
rho |
Either a positive numeric stating the length scale to be used for
approximating the squared exponential Gaussian Process smooth (see
|
stationary |
Logical. If |
scale |
Logical; If |
mvgam currently sets up dynamic coefficients as low-rank
squared exponential Gaussian Process smooths via the call
s(time, by = variable, bs = "gp", m = c(2, rho, 2)). These smooths,
if specified with reasonable values for the length scale parameter, will
give more realistic out of sample forecasts than standard splines such as
thin plate or cubic. But the user must set the value for rho, as there
is currently no support for estimating this value in mgcv. This may
not be too big of a problem, as estimating latent length scales is often
difficult anyway. The rho parameter should be thought of as a prior
on the smoothness of the latent dynamic coefficient function (where higher
values of rho lead to smoother functions with more temporal
covariance structure). Values of k are set automatically to ensure
enough basis functions are used to approximate the expected wiggliness of
the underlying dynamic function (k will increase as rho
decreases).
a list object for internal usage in 'mvgam'
Nicholas J Clark
## Not run: # Simulate a time-varying coefficient # (as a Gaussian Process with length scale = 10) set.seed(1111) N <- 200 # A function to simulate from a squared exponential Gaussian Process sim_gp <- function(N, c, alpha, rho) { Sigma <- alpha ^ 2 * exp(-0.5 * ((outer(1:N, 1:N, "-") / rho) ^ 2)) + diag(1e-9, N) c + mgcv::rmvn(1, mu = rep(0, N), V = Sigma) } beta <- sim_gp(alpha = 0.75, rho = 10, c = 0.5, N = N) plot( beta, type = 'l', lwd = 3, bty = 'l', xlab = 'Time', ylab = 'Coefficient', col = 'darkred' ) # Simulate the predictor as a standard normal predictor <- rnorm(N, sd = 1) # Simulate a Gaussian outcome variable out <- rnorm(N, mean = 4 + beta * predictor, sd = 0.25) time <- seq_along(predictor) plot( out, type = 'l', lwd = 3, bty = 'l', xlab = 'Time', ylab = 'Outcome', col = 'darkred' ) # Gather into a data.frame and fit a dynamic coefficient model data <- data.frame(out, predictor, time) # Split into training and testing data_train <- data[1:190, ] data_test <- data[191:200, ] # Fit a model using the dynamic function mod <- mvgam( out ~ # mis-specify the length scale slightly as this # won't be known in practice dynamic(predictor, rho = 8, stationary = TRUE), family = gaussian(), data = data_train, chains = 2, silent = 2 ) # Inspect the summary summary(mod) # Plot the time-varying coefficient estimates plot(mod, type = 'smooths') # Extrapolate the coefficient forward in time plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 190, lty = 'dashed', lwd = 2) # Overlay the true simulated time-varying coefficient lines(beta, lwd = 2.5, col = 'white') lines(beta, lwd = 2) ## End(Not run)## Not run: # Simulate a time-varying coefficient # (as a Gaussian Process with length scale = 10) set.seed(1111) N <- 200 # A function to simulate from a squared exponential Gaussian Process sim_gp <- function(N, c, alpha, rho) { Sigma <- alpha ^ 2 * exp(-0.5 * ((outer(1:N, 1:N, "-") / rho) ^ 2)) + diag(1e-9, N) c + mgcv::rmvn(1, mu = rep(0, N), V = Sigma) } beta <- sim_gp(alpha = 0.75, rho = 10, c = 0.5, N = N) plot( beta, type = 'l', lwd = 3, bty = 'l', xlab = 'Time', ylab = 'Coefficient', col = 'darkred' ) # Simulate the predictor as a standard normal predictor <- rnorm(N, sd = 1) # Simulate a Gaussian outcome variable out <- rnorm(N, mean = 4 + beta * predictor, sd = 0.25) time <- seq_along(predictor) plot( out, type = 'l', lwd = 3, bty = 'l', xlab = 'Time', ylab = 'Outcome', col = 'darkred' ) # Gather into a data.frame and fit a dynamic coefficient model data <- data.frame(out, predictor, time) # Split into training and testing data_train <- data[1:190, ] data_test <- data[191:200, ] # Fit a model using the dynamic function mod <- mvgam( out ~ # mis-specify the length scale slightly as this # won't be known in practice dynamic(predictor, rho = 8, stationary = TRUE), family = gaussian(), data = data_train, chains = 2, silent = 2 ) # Inspect the summary summary(mod) # Plot the time-varying coefficient estimates plot(mod, type = 'smooths') # Extrapolate the coefficient forward in time plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 190, lty = 'dashed', lwd = 2) # Overlay the true simulated time-varying coefficient lines(beta, lwd = 2.5, col = 'white') lines(beta, lwd = 2) ## End(Not run)
Generate evenly weighted ensemble forecast distributions from
mvgam_forecast objects.
ensemble(object, ...) ## S3 method for class 'mvgam_forecast' ensemble(object, ..., ndraws = 5000)ensemble(object, ...) ## S3 method for class 'mvgam_forecast' ensemble(object, ..., ndraws = 5000)
object |
|
... |
More |
ndraws |
Positive integer specifying the number of draws to use from each
forecast distribution for creating the ensemble. If some of the ensemble members have
fewer draws than |
It is widely recognised in the forecasting literature that
combining forecasts from different models often results in improved
forecast accuracy. The simplest way to create an ensemble is to use
evenly weighted combinations of forecasts from the different models.
This is straightforward to do in a Bayesian setting with mvgam as
the posterior MCMC draws contained in each mvgam_forecast object
will already implicitly capture correlations among the temporal posterior
predictions.
An object of class mvgam_forecast containing the ensemble
predictions. This object can be readily used with the supplied S3
functions plot and score.
Nicholas J Clark
plot.mvgam_forecast,
score.mvgam_forecast
## Not run: # Simulate some series and fit a few competing dynamic models set.seed(1) simdat <- sim_mvgam( n_series = 1, prop_trend = 0.6, mu = 1 ) plot_mvgam_series( data = simdat$data_train, newdata = simdat$data_test ) m1 <- mvgam( y ~ 1, trend_formula = ~ time + s(season, bs = 'cc', k = 9), trend_model = AR(p = 1), noncentred = TRUE, data = simdat$data_train, newdata = simdat$data_test, chains = 2, silent = 2 ) m2 <- mvgam( y ~ time, trend_model = RW(), noncentred = TRUE, data = simdat$data_train, newdata = simdat$data_test, chains = 2, silent = 2 ) # Calculate forecast distributions for each model fc1 <- forecast(m1) fc2 <- forecast(m2) # Generate the ensemble forecast ensemble_fc <- ensemble(fc1, fc2) # Plot forecasts plot(fc1) plot(fc2) plot(ensemble_fc) # Score forecasts score(fc1) score(fc2) score(ensemble_fc) ## End(Not run)## Not run: # Simulate some series and fit a few competing dynamic models set.seed(1) simdat <- sim_mvgam( n_series = 1, prop_trend = 0.6, mu = 1 ) plot_mvgam_series( data = simdat$data_train, newdata = simdat$data_test ) m1 <- mvgam( y ~ 1, trend_formula = ~ time + s(season, bs = 'cc', k = 9), trend_model = AR(p = 1), noncentred = TRUE, data = simdat$data_train, newdata = simdat$data_test, chains = 2, silent = 2 ) m2 <- mvgam( y ~ time, trend_model = RW(), noncentred = TRUE, data = simdat$data_train, newdata = simdat$data_test, chains = 2, silent = 2 ) # Calculate forecast distributions for each model fc1 <- forecast(m1) fc2 <- forecast(m2) # Generate the ensemble forecast ensemble_fc <- ensemble(fc1, fc2) # Plot forecasts plot(fc1) plot(fc2) plot(ensemble_fc) # Score forecasts score(fc1) score(fc2) score(ensemble_fc) ## End(Not run)
Evaluate forecasts from fitted mvgam objects
eval_mvgam( object, n_samples = 5000, eval_timepoint = 3, fc_horizon = 3, n_cores = 1, score = "drps", log = FALSE, weights ) roll_eval_mvgam( object, n_evaluations = 5, evaluation_seq, n_samples = 5000, fc_horizon = 3, n_cores = 1, score = "drps", log = FALSE, weights ) compare_mvgams( model1, model2, n_samples = 1000, fc_horizon = 3, n_evaluations = 10, n_cores = 1, score = "drps", log = FALSE, weights )eval_mvgam( object, n_samples = 5000, eval_timepoint = 3, fc_horizon = 3, n_cores = 1, score = "drps", log = FALSE, weights ) roll_eval_mvgam( object, n_evaluations = 5, evaluation_seq, n_samples = 5000, fc_horizon = 3, n_cores = 1, score = "drps", log = FALSE, weights ) compare_mvgams( model1, model2, n_samples = 1000, fc_horizon = 3, n_evaluations = 10, n_cores = 1, score = "drps", log = FALSE, weights )
object |
|
n_samples |
|
eval_timepoint |
|
fc_horizon |
|
n_cores |
Deprecated. Parallel processing is no longer supported |
score |
|
log |
|
weights |
optional |
n_evaluations |
|
evaluation_seq |
Optional |
model1 |
|
model2 |
|
eval_mvgam may be useful when both repeated fitting of a model
using update.mvgam for exact leave-future-out cross-validation
and approximate leave-future-out cross-validation using lfo_cv
are impractical. The function generates a set of samples representing fixed
parameters estimated from the full mvgam model and latent trend states
at a given point in time. The trends are rolled forward a total of
fc_horizon timesteps according to their estimated state space dynamics
to generate an 'out-of-sample' forecast that is evaluated against the true
observations in the horizon window. This function therefore simulates a
situation where the model's parameters had already been estimated but we have
only observed data up to the evaluation timepoint and would like to generate
forecasts from the latent trends that have been observed up to that timepoint.
Evaluation involves calculating an appropriate Rank Probability Score and a
binary indicator for whether or not the true value lies within the forecast's
90% prediction interval
roll_eval_mvgam sets up a sequence of evaluation timepoints along a rolling
window and iteratively calls eval_mvgam to evaluate 'out-of-sample'
forecasts. Evaluation involves calculating the Rank Probability Scores and a
binary indicator for whether or not the true value lies within the forecast's
90% prediction interval
compare_mvgams automates the evaluation to compare two fitted models using
rolling window forecast evaluation and provides a series of summary plots to
facilitate model selection. It is essentially a wrapper for
roll_eval_mvgam
For eval_mvgam, a list object containing information on
specific evaluations for each series (if using drps or crps as the score)
or a vector of scores when using variogram.
For roll_eval_mvgam, a list object containing information on specific
evaluations for each series as well as a total evaluation summary (taken by
summing the forecast score for each series at each evaluation and averaging
the coverages at each evaluation)
For compare_mvgams, a series of plots comparing forecast Rank Probability
Scores for each competing model. A lower score is preferred. Note however
that it is possible to select a model that ultimately would perform poorly
in true out-of-sample forecasting. For example if a wiggly smooth function
of 'year' is included in the model then this function will be learned prior
to evaluating rolling window forecasts, and the model could generate very
tight predictions as a result. But when forecasting ahead to timepoints that
the model has not seen (i.e. next year), the smooth function will end up
extrapolating, sometimes in very strange and unexpected ways. It is therefore
recommended to only use smooth functions for covariates that are adequately
measured in the data (i.e. 'seasonality', for example) to reduce possible
extrapolation of smooths and let the latent trends in the mvgam model
capture any temporal dependencies in the data. These trends are time series
models and so will provide much more stable forecasts
## Not run: # Simulate from a Poisson-AR2 model with a seasonal smooth set.seed(1) dat <- sim_mvgam( T = 75, n_series = 1, prop_trend = 0.75, trend_model = AR(p = 2), family = poisson() ) # Fit an appropriate model mod_ar2 <- mvgam( formula = y ~ s(season, bs = 'cc'), trend_model = AR(p = 2), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2 ) # Fit a less appropriate model mod_rw <- mvgam( formula = y ~ 1, trend_model = RW(), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2 ) # Compare Discrete Ranked Probability Scores for the testing period fc_ar2 <- forecast(mod_ar2) fc_rw <- forecast(mod_rw) score_ar2 <- score( object = fc_ar2, score = 'drps' ) score_rw <- score( object = fc_rw, score = 'drps' ) sum(score_ar2$series_1$score) sum(score_rw$series_1$score) # Use rolling evaluation for approximate comparisons of 3-step ahead # forecasts across the training period compare_mvgams( model1 = mod_ar2, model2 = mod_rw, fc_horizon = 3, n_samples = 1000, n_evaluations = 5 ) # A more appropriate comparison would be to use approximate # leave-future-out CV to compare forecasts (see ?mvgam::lfo_cv()) ## End(Not run)## Not run: # Simulate from a Poisson-AR2 model with a seasonal smooth set.seed(1) dat <- sim_mvgam( T = 75, n_series = 1, prop_trend = 0.75, trend_model = AR(p = 2), family = poisson() ) # Fit an appropriate model mod_ar2 <- mvgam( formula = y ~ s(season, bs = 'cc'), trend_model = AR(p = 2), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2 ) # Fit a less appropriate model mod_rw <- mvgam( formula = y ~ 1, trend_model = RW(), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2 ) # Compare Discrete Ranked Probability Scores for the testing period fc_ar2 <- forecast(mod_ar2) fc_rw <- forecast(mod_rw) score_ar2 <- score( object = fc_ar2, score = 'drps' ) score_rw <- score( object = fc_rw, score = 'drps' ) sum(score_ar2$series_1$score) sum(score_rw$series_1$score) # Use rolling evaluation for approximate comparisons of 3-step ahead # forecasts across the training period compare_mvgams( model1 = mod_ar2, model2 = mod_rw, fc_horizon = 3, n_samples = 1000, n_evaluations = 5 ) # A more appropriate comparison would be to use approximate # leave-future-out CV to compare forecasts (see ?mvgam::lfo_cv()) ## End(Not run)
Compute forecast error variance decompositions from
mvgam models with Vector Autoregressive dynamics
fevd(object, ...) ## S3 method for class 'mvgam' fevd(object, h = 10, ...)fevd(object, ...) ## S3 method for class 'mvgam' fevd(object, h = 10, ...)
object |
|
... |
ignored |
h |
Positive |
See mvgam_fevd-class for a full description of the quantities that are
computed and returned by this function, along with key references.
Nicholas J Clark
Lütkepohl, H. (2007). New Introduction to Multiple Time Series Analysis. 2nd ed. Springer-Verlag Berlin Heidelberg.
VAR(), irf(), stability(), mvgam_fevd-class
## Not run: # Simulate some time series that follow a latent VAR(1) process simdat <- sim_mvgam( family = gaussian(), n_series = 4, trend_model = VAR(cor = TRUE), prop_trend = 1 ) plot_mvgam_series(data = simdat$data_train, series = "all") # Fit a model that uses a latent VAR(1) mod <- mvgam( formula = y ~ -1, trend_formula = ~ 1, trend_model = VAR(cor = TRUE), family = gaussian(), data = simdat$data_train, chains = 2, silent = 2 ) # Plot the autoregressive coefficient distributions; # use 'dir = "v"' to arrange the order of facets # correctly mcmc_plot( mod, variable = 'A', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Calulate forecast error variance decompositions for each series fevds <- fevd(mod, h = 12) # Plot median contributions to forecast error variance plot(fevds) # View a summary of the error variance decompositions summary(fevds) ## End(Not run)## Not run: # Simulate some time series that follow a latent VAR(1) process simdat <- sim_mvgam( family = gaussian(), n_series = 4, trend_model = VAR(cor = TRUE), prop_trend = 1 ) plot_mvgam_series(data = simdat$data_train, series = "all") # Fit a model that uses a latent VAR(1) mod <- mvgam( formula = y ~ -1, trend_formula = ~ 1, trend_model = VAR(cor = TRUE), family = gaussian(), data = simdat$data_train, chains = 2, silent = 2 ) # Plot the autoregressive coefficient distributions; # use 'dir = "v"' to arrange the order of facets # correctly mcmc_plot( mod, variable = 'A', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Calulate forecast error variance decompositions for each series fevds <- fevd(mod, h = 12) # Plot median contributions to forecast error variance plot(fevds) # View a summary of the error variance decompositions summary(fevds) ## End(Not run)
This method extracts posterior estimates of the fitted values (i.e. the actual predictions, including estimates for any trend states, that were obtained when fitting the model). It also includes an option for obtaining summaries of the computed draws.
## S3 method for class 'mvgam' fitted( object, process_error = TRUE, scale = c("response", "linear"), summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... )## S3 method for class 'mvgam' fitted( object, process_error = TRUE, scale = c("response", "linear"), summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... )
object |
An object of class |
process_error |
Logical. If |
scale |
Either |
summary |
Should summary statistics be returned
instead of the raw values? Default is |
robust |
If |
probs |
The percentiles to be computed by the |
... |
Further arguments passed to |
This method gives the actual fitted values from the model (i.e. what
you will see if you generate hindcasts from the fitted model using
hindcast.mvgam with type = 'expected'). These predictions
can be overly precise if a flexible dynamic trend component was included in
the model. This is in contrast to the set of predict functions (i.e.
posterior_epred.mvgam or predict.mvgam), which
will assume any dynamic trend component has reached stationarity when
returning hypothetical predictions.
An array of predicted mean response values.
If summary = FALSE the output resembles those of
posterior_epred.mvgam and predict.mvgam.
If summary = TRUE the output is an n_observations x E
matrix. The number of summary statistics E is equal to 2 +
length(probs): The Estimate column contains point estimates (either
mean or median depending on argument robust), while the
Est.Error column contains uncertainty estimates (either standard
deviation or median absolute deviation depending on argument
robust). The remaining columns starting with Q contain
quantile estimates as specified via argument probs.
Nicholas J Clark
## Not run: # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Extract fitted values (posterior expectations) expectations <- fitted(mod) str(expectations) ## End(Not run)## Not run: # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Extract fitted values (posterior expectations) expectations <- fitted(mod) str(expectations) ## End(Not run)
mvgam objectExtract or compute hindcasts and forecasts for a fitted
mvgam object
## S3 method for class 'mvgam' forecast(object, newdata, data_test, n_cores = 1, type = "response", ...)## S3 method for class 'mvgam' forecast(object, newdata, data_test, n_cores = 1, type = "response", ...)
object |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
n_cores |
Deprecated. Parallel processing is no longer supported |
type |
When this has the value |
... |
Ignored |
Posterior predictions are drawn from the fitted mvgam and
used to simulate a forecast distribution
An object of class mvgam_forecast containing hindcast and
forecast distributions. See mvgam_forecast-class for
details.
hindcast.mvgam(), plot.mvgam_forecast(),
summary.mvgam_forecast(), score.mvgam_forecast()
ensemble.mvgam_forecast()
## Not run: # Simulate data with 3 series and AR trend model simdat <- sim_mvgam(n_series = 3, trend_model = AR()) # Fit mvgam model mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Hindcasts on response scale hc <- hindcast(mod) str(hc) # Use summary() to extract hindcasts / forecasts for custom plotting head(summary(hc), 12) # Or just use the plot() function for quick plots plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Forecasts on response scale fc <- forecast( mod, newdata = simdat$data_test ) str(fc) head(summary(fc), 12) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Forecasts as expectations fc <- forecast( mod, newdata = simdat$data_test, type = 'expected' ) head(summary(fc), 12) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Dynamic trend extrapolations fc <- forecast( mod, newdata = simdat$data_test, type = 'trend' ) head(summary(fc), 12) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) ## End(Not run)## Not run: # Simulate data with 3 series and AR trend model simdat <- sim_mvgam(n_series = 3, trend_model = AR()) # Fit mvgam model mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Hindcasts on response scale hc <- hindcast(mod) str(hc) # Use summary() to extract hindcasts / forecasts for custom plotting head(summary(hc), 12) # Or just use the plot() function for quick plots plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Forecasts on response scale fc <- forecast( mod, newdata = simdat$data_test ) str(fc) head(summary(fc), 12) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Forecasts as expectations fc <- forecast( mod, newdata = simdat$data_test, type = 'expected' ) head(summary(fc), 12) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Dynamic trend extrapolations fc <- forecast( mod, newdata = simdat$data_test, type = 'trend' ) head(summary(fc), 12) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) ## End(Not run)
Extract formulae from mvgam objects
## S3 method for class 'mvgam' formula(x, trend_effects = FALSE, ...) ## S3 method for class 'mvgam_prefit' formula(x, trend_effects = FALSE, ...)## S3 method for class 'mvgam' formula(x, trend_effects = FALSE, ...) ## S3 method for class 'mvgam_prefit' formula(x, trend_effects = FALSE, ...)
x |
|
trend_effects |
|
... |
Ignored |
A formula object
Nicholas J Clark
This function lists the parameters that can have their prior distributions changed for a given model, as well listing their default distributions
get_mvgam_priors( formula, trend_formula, factor_formula, knots, trend_knots, trend_model = "None", family = poisson(), data, unit = time, species = series, use_lv = FALSE, n_lv, trend_map, ... )get_mvgam_priors( formula, trend_formula, factor_formula, knots, trend_knots, trend_model = "None", family = poisson(), data, unit = time, species = series, use_lv = FALSE, n_lv, trend_map, ... )
formula |
A In |
trend_formula |
An optional Important notes:
|
factor_formula |
Can be supplied instead |
knots |
An optional |
trend_knots |
As for |
trend_model |
Available options:
Additional features:
|
family |
Supported families:
See |
data |
A Required columns for most models:
Special cases:
|
unit |
The unquoted name of the variable that represents the unit of
analysis in |
species |
The unquoted name of the |
use_lv |
|
n_lv |
|
trend_map |
Optional Required structure:
Notes:
|
... |
Not currently used |
Users can supply a model formula, prior to fitting the model, so
that default priors can be inspected and altered. To make alterations,
change the contents of the prior column and supplying this
data.frame to the mvgam or jsdgam
functions using the argument priors. If using Stan as the backend,
users can also modify the parameter bounds by modifying the
new_lowerbound and/or new_upperbound columns. This will be necessary
if using restrictive distributions on some parameters, such as a Beta
distribution for the trend sd parameters for example (Beta only has
support on (0,1)), so the upperbound cannot be above 1. Another
option is to make use of the prior modification functions in brms
(i.e. prior) to change prior distributions and bounds
(just use the name of the parameter that you'd like to change as the
class argument; see examples below)
either a data.frame containing the prior definitions (if any
suitable priors can be altered by the user) or NULL, indicating
that no priors in the model can be modified
Only the prior, new_lowerbound and/or new_upperbound columns of
the output should be altered when defining the user-defined priors for
the model. Use only if you are familiar with the underlying probabilistic
programming language. There are no sanity checks done to ensure that the
code is legal (i.e. to check that lower bounds are smaller than upper
bounds, for example)
Nicholas J Clark
## Not run: # ======================================================================== # Example 1: Simulate data and inspect default priors # ======================================================================== dat <- sim_mvgam(trend_rel = 0.5) # Get a model file that uses default mvgam priors for inspection (not # always necessary, but this can be useful for testing whether your # updated priors are written correctly) mod_default <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), run_model = FALSE ) # Inspect the model file with default mvgam priors stancode(mod_default) # Look at which priors can be updated in mvgam test_priors <- get_mvgam_priors( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2) ) test_priors # ======================================================================== # Example 2: Modify priors manually # ======================================================================== # Make a few changes; first, change the population mean for the # series-level random intercepts test_priors$prior[2] <- "mu_raw ~ normal(0.2, 0.5);" # Now use stronger regularisation for the series-level AR2 coefficients test_priors$prior[5] <- "ar2 ~ normal(0, 0.25);" # Check that the changes are made to the model file without any warnings # by setting 'run_model = FALSE' mod <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), priors = test_priors, run_model = FALSE ) stancode(mod) # No warnings, the model is ready for fitting now in the usual way with # the addition of the 'priors' argument # ======================================================================== # Example 3: Use brms syntax for prior modification # ======================================================================== # The same can be done using 'brms' functions; here we will also change # the ar1 prior and put some bounds on the ar coefficients to enforce # stationarity; we set the prior using the 'class' argument in all brms # prior functions brmsprior <- c( prior(normal(0.2, 0.5), class = mu_raw), prior(normal(0, 0.25), class = ar1, lb = -1, ub = 1), prior(normal(0, 0.25), class = ar2, lb = -1, ub = 1) ) brmsprior mod <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), priors = brmsprior, run_model = FALSE ) stancode(mod) # ======================================================================== # Example 4: Error handling example # ======================================================================== # Look at what is returned when an incorrect spelling is used test_priors$prior[5] <- "ar2_bananas ~ normal(0, 0.25);" mod <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), priors = test_priors, run_model = FALSE ) stancode(mod) # ======================================================================== # Example 5: Parametric (fixed effect) priors # ======================================================================== simdat <- sim_mvgam() # Add a fake covariate simdat$data_train$cov <- rnorm(NROW(simdat$data_train)) priors <- get_mvgam_priors( y ~ cov + s(season), data = simdat$data_train, family = poisson(), trend_model = AR() ) # Change priors for the intercept and fake covariate effects priors$prior[1] <- "(Intercept) ~ normal(0, 1);" priors$prior[2] <- "cov ~ normal(0, 0.1);" mod2 <- mvgam( y ~ cov + s(season), data = simdat$data_train, trend_model = AR(), family = poisson(), priors = priors, run_model = FALSE ) stancode(mod2) # ======================================================================== # Example 6: Alternative brms syntax for fixed effects # ======================================================================== # Likewise using 'brms' utilities (note that you can use Intercept rather # than `(Intercept)`) to change priors on the intercept brmsprior <- c( prior(normal(0.2, 0.5), class = cov), prior(normal(0, 0.25), class = Intercept) ) brmsprior mod2 <- mvgam( y ~ cov + s(season), data = simdat$data_train, trend_model = AR(), family = poisson(), priors = brmsprior, run_model = FALSE ) stancode(mod2) # ======================================================================== # Example 7: Bulk prior assignment # ======================================================================== # The "class = 'b'" shortcut can be used to put the same prior on all # 'fixed' effect coefficients (apart from any intercepts) set.seed(0) dat <- mgcv::gamSim(1, n = 200, scale = 2) dat$time <- 1:NROW(dat) mod <- mvgam( y ~ x0 + x1 + s(x2) + s(x3), priors = prior(normal(0, 0.75), class = "b"), data = dat, family = gaussian(), run_model = FALSE ) stancode(mod) ## End(Not run)## Not run: # ======================================================================== # Example 1: Simulate data and inspect default priors # ======================================================================== dat <- sim_mvgam(trend_rel = 0.5) # Get a model file that uses default mvgam priors for inspection (not # always necessary, but this can be useful for testing whether your # updated priors are written correctly) mod_default <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), run_model = FALSE ) # Inspect the model file with default mvgam priors stancode(mod_default) # Look at which priors can be updated in mvgam test_priors <- get_mvgam_priors( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2) ) test_priors # ======================================================================== # Example 2: Modify priors manually # ======================================================================== # Make a few changes; first, change the population mean for the # series-level random intercepts test_priors$prior[2] <- "mu_raw ~ normal(0.2, 0.5);" # Now use stronger regularisation for the series-level AR2 coefficients test_priors$prior[5] <- "ar2 ~ normal(0, 0.25);" # Check that the changes are made to the model file without any warnings # by setting 'run_model = FALSE' mod <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), priors = test_priors, run_model = FALSE ) stancode(mod) # No warnings, the model is ready for fitting now in the usual way with # the addition of the 'priors' argument # ======================================================================== # Example 3: Use brms syntax for prior modification # ======================================================================== # The same can be done using 'brms' functions; here we will also change # the ar1 prior and put some bounds on the ar coefficients to enforce # stationarity; we set the prior using the 'class' argument in all brms # prior functions brmsprior <- c( prior(normal(0.2, 0.5), class = mu_raw), prior(normal(0, 0.25), class = ar1, lb = -1, ub = 1), prior(normal(0, 0.25), class = ar2, lb = -1, ub = 1) ) brmsprior mod <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), priors = brmsprior, run_model = FALSE ) stancode(mod) # ======================================================================== # Example 4: Error handling example # ======================================================================== # Look at what is returned when an incorrect spelling is used test_priors$prior[5] <- "ar2_bananas ~ normal(0, 0.25);" mod <- mvgam( y ~ s(series, bs = "re") + s(season, bs = "cc") - 1, family = nb(), data = dat$data_train, trend_model = AR(p = 2), priors = test_priors, run_model = FALSE ) stancode(mod) # ======================================================================== # Example 5: Parametric (fixed effect) priors # ======================================================================== simdat <- sim_mvgam() # Add a fake covariate simdat$data_train$cov <- rnorm(NROW(simdat$data_train)) priors <- get_mvgam_priors( y ~ cov + s(season), data = simdat$data_train, family = poisson(), trend_model = AR() ) # Change priors for the intercept and fake covariate effects priors$prior[1] <- "(Intercept) ~ normal(0, 1);" priors$prior[2] <- "cov ~ normal(0, 0.1);" mod2 <- mvgam( y ~ cov + s(season), data = simdat$data_train, trend_model = AR(), family = poisson(), priors = priors, run_model = FALSE ) stancode(mod2) # ======================================================================== # Example 6: Alternative brms syntax for fixed effects # ======================================================================== # Likewise using 'brms' utilities (note that you can use Intercept rather # than `(Intercept)`) to change priors on the intercept brmsprior <- c( prior(normal(0.2, 0.5), class = cov), prior(normal(0, 0.25), class = Intercept) ) brmsprior mod2 <- mvgam( y ~ cov + s(season), data = simdat$data_train, trend_model = AR(), family = poisson(), priors = brmsprior, run_model = FALSE ) stancode(mod2) # ======================================================================== # Example 7: Bulk prior assignment # ======================================================================== # The "class = 'b'" shortcut can be used to put the same prior on all # 'fixed' effect coefficients (apart from any intercepts) set.seed(0) dat <- mgcv::gamSim(1, n = 200, scale = 2) dat$time <- 1:NROW(dat) mod <- mvgam( y ~ x0 + x1 + s(x2) + s(x3), priors = prior(normal(0, 0.75), class = "b"), data = dat, family = gaussian(), run_model = FALSE ) stancode(mod) ## End(Not run)
Set up low-rank approximate Gaussian Process trend models using Hilbert basis expansions in mvgam. This function does not evaluate its arguments – it exists purely to help set up a model with particular GP trend models.
GP(...)GP(...)
... |
unused |
A GP trend is estimated for each series using Hilbert space
approximate Gaussian Processes. In mvgam, latent squared exponential GP
trends are approximated using by default 20 basis functions and
using a multiplicative factor of c = 5/4, which saves computational
costs compared to fitting full GPs while adequately estimating GP
alpha and rho parameters.
An object of class mvgam_trend, which contains a list of
arguments to be interpreted by the parsing functions in mvgam.
Nicholas J Clark
Riutort-Mayol G, Burkner PC, Andersen MR, Solin A and Vehtari A (2023). Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Statistics and Computing 33, 1. https://doi.org/10.1007/s11222-022-10167-2
These evaluation and plotting functions exist to allow some popular gratia
methods to work with mvgam or jsdgam models
drawDotmvgam( object, trend_effects = FALSE, data = NULL, select = NULL, parametric = FALSE, terms = NULL, residuals = FALSE, scales = c("free", "fixed"), ci_level = 0.95, n = 100, n_3d = 16, n_4d = 4, unconditional = FALSE, overall_uncertainty = TRUE, constant = NULL, fun = NULL, dist = 0.1, rug = TRUE, contour = TRUE, grouped_by = FALSE, ci_alpha = 0.2, ci_col = "black", smooth_col = "black", resid_col = "steelblue3", contour_col = "black", n_contour = NULL, partial_match = FALSE, discrete_colour = NULL, discrete_fill = NULL, continuous_colour = NULL, continuous_fill = NULL, position = "identity", angle = NULL, ncol = NULL, nrow = NULL, guides = "keep", widths = NULL, heights = NULL, crs = NULL, default_crs = NULL, lims_method = "cross", wrap = TRUE, envir = environment(formula(object)), ... ) eval_smoothDothilbertDotsmooth( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ... ) eval_smoothDotmodDotsmooth( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ... ) eval_smoothDotmoiDotsmooth( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ... )drawDotmvgam( object, trend_effects = FALSE, data = NULL, select = NULL, parametric = FALSE, terms = NULL, residuals = FALSE, scales = c("free", "fixed"), ci_level = 0.95, n = 100, n_3d = 16, n_4d = 4, unconditional = FALSE, overall_uncertainty = TRUE, constant = NULL, fun = NULL, dist = 0.1, rug = TRUE, contour = TRUE, grouped_by = FALSE, ci_alpha = 0.2, ci_col = "black", smooth_col = "black", resid_col = "steelblue3", contour_col = "black", n_contour = NULL, partial_match = FALSE, discrete_colour = NULL, discrete_fill = NULL, continuous_colour = NULL, continuous_fill = NULL, position = "identity", angle = NULL, ncol = NULL, nrow = NULL, guides = "keep", widths = NULL, heights = NULL, crs = NULL, default_crs = NULL, lims_method = "cross", wrap = TRUE, envir = environment(formula(object)), ... ) eval_smoothDothilbertDotsmooth( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ... ) eval_smoothDotmodDotsmooth( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ... ) eval_smoothDotmoiDotsmooth( smooth, model, n = 100, n_3d = NULL, n_4d = NULL, data = NULL, unconditional = FALSE, overall_uncertainty = TRUE, dist = NULL, ... )
object |
a fitted mvgam, the result of a call to |
trend_effects |
logical specifying whether smooth terms from the |
data |
a data frame of covariate values at which to evaluate the model's smooth functions |
select |
character, logical, or numeric; which smooths to plot. If
|
parametric |
logical; plot parametric terms also? Note that |
terms |
character; which model parametric terms should be drawn? The
Default of |
residuals |
currently ignored for |
scales |
character; should all univariate smooths be plotted with the
same y-axis scale? If Currently does not affect the y-axis scale of plots of the parametric terms |
ci_level |
numeric between 0 and 1; the coverage of credible interval. |
n |
numeric; the number of points over the range of the covariate at which to evaluate the smooth |
n_3d, n_4d
|
numeric; the number of points over the range of last
covariate in a 3D or 4D smooth. The default is |
unconditional |
ignored for |
overall_uncertainty |
ignored for |
constant |
numeric; a constant to add to the estimated values of the
smooth. |
fun |
function; a function that will be applied to the estimated values
and confidence interval before plotting. Can be a function or the name of a
function. Function |
dist |
numeric; if greater than 0, this is used to determine when
a location is too far from data to be plotted when plotting 2-D smooths.
The data are scaled into the unit square before deciding what to exclude,
and |
rug |
logical; draw a rug plot at the bottom of each plot for 1-D smooths or plot locations of data for higher dimensions. |
contour |
logical; should contours be draw on the plot using
|
grouped_by |
logical; should factor by smooths be drawn as one panel
per level of the factor ( |
ci_alpha |
numeric; alpha transparency for confidence or simultaneous interval |
ci_col |
colour specification for the confidence/credible intervals band. Affects the fill of the interval |
smooth_col |
colour specification for the smooth line |
resid_col |
colour specification for residual points. Ignored |
contour_col |
colour specification for contour lines |
n_contour |
numeric; the number of contour bins. Will result in
|
partial_match |
logical; should smooths be selected by partial matches
with |
discrete_colour |
a suitable colour scale to be used when plotting discrete variables |
discrete_fill |
a suitable fill scale to be used when plotting discrete variables. |
continuous_colour |
a suitable colour scale to be used when plotting continuous variables |
continuous_fill |
a suitable fill scale to be used when plotting continuous variables |
position |
Position adjustment, either as a string, or the result of a call to a position adjustment function |
angle |
numeric; the angle at which the x axis tick labels are to be
drawn passed to the |
ncol, nrow
|
numeric; the numbers of rows and columns over which to spread the plots |
guides |
character; one of |
widths, heights
|
The relative widths and heights of each column and
row in the grid. Will get repeated to match the dimensions of the grid. If
there is more than 1 plot and |
crs |
the coordinate reference system (CRS) to use for the plot. All
data will be projected into this CRS. See |
default_crs |
the coordinate reference system (CRS) to use for the
non-sf layers in the plot. If left at the default |
lims_method |
character; affects how the axis limits are determined. See
|
wrap |
logical; wrap plots as a patchwork? If |
envir |
an environment to look up the data within |
... |
additional arguments passed to other methods |
smooth |
a smooth object of class |
model |
a fitted |
These methods allow mvgam models to be Enhanced if users have the gratia
package installed, making available the popular draw() function to plot partial effects
of mvgam smooth functions using ggplot2::ggplot() utilities
Nicholas J Clark
## Not run: # Fit a simple GAM and draw partial effects of smooths using 'gratia' set.seed(0) dat <- mgcv::gamSim( eg = 1, n = 200, scale = 2 ) mod <- mvgam( formula = y ~ s(x1, bs = 'moi') + te(x0, x2), data = dat, family = gaussian(), chains = 2, silent = 2 ) if (require("gratia")) { gratia::draw(mod) } ## End(Not run)## Not run: # Fit a simple GAM and draw partial effects of smooths using 'gratia' set.seed(0) dat <- mgcv::gamSim( eg = 1, n = 200, scale = 2 ) mod <- mvgam( formula = y ~ s(x1, bs = 'moi') + te(x0, x2), data = dat, family = gaussian(), chains = 2, silent = 2 ) if (require("gratia")) { gratia::draw(mod) } ## End(Not run)
mvgam objectExtract hindcasts for a fitted mvgam object
hindcast(object, ...) ## S3 method for class 'mvgam' hindcast(object, type = "response", ...)hindcast(object, ...) ## S3 method for class 'mvgam' hindcast(object, type = "response", ...)
object |
|
... |
Ignored |
type |
When this has the value |
Posterior hindcasts (i.e. retrodictions) are drawn from the fitted mvgam and
organized into a convenient format for plotting
An object of class mvgam_forecast containing hindcast distributions.
See mvgam_forecast-class for details.
plot.mvgam_forecast(), summary.mvgam_forecast(),
forecast.mvgam(), fitted.mvgam(), predict.mvgam()
## Not run: simdat <- sim_mvgam(n_series = 3, trend_model = AR()) mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2) # Hindcasts on response scale hc <- hindcast(mod) str(hc) head(summary(hc), 12) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Hindcasts as expectations hc <- hindcast(mod, type = 'expected') head(summary(hc), 12) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Estimated latent trends hc <- hindcast(mod, type = 'trend') head(summary(hc), 12) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) ## End(Not run)## Not run: simdat <- sim_mvgam(n_series = 3, trend_model = AR()) mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2) # Hindcasts on response scale hc <- hindcast(mod) str(hc) head(summary(hc), 12) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Hindcasts as expectations hc <- hindcast(mod, type = 'expected') head(summary(hc), 12) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Estimated latent trends hc <- hindcast(mod, type = 'trend') head(summary(hc), 12) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) ## End(Not run)
Create a brief but fully referenced methods description, along with a useful
list of references, for fitted mvgam and jsdgam models.
how_to_cite(object, ...) ## S3 method for class 'mvgam' how_to_cite(object, ...)how_to_cite(object, ...) ## S3 method for class 'mvgam' how_to_cite(object, ...)
object |
|
... |
ignored |
This function uses the model's structure to come up with a very basic but hopefully useful methods description that can help users to appropriately acknowledge the hard work of developers and champion open science. Please do not consider the text returned by this function to be a completely adequate methods section; it is only meant to get you started.
An object of class how_to_cite containing a text description
of the methods as well as lists of both primary and additional references.
Nicholas J Clark
## Not run: #-------------------------------------------------- # Simulate 4 time series with hierarchical seasonality # and a VAR(1) dynamic process #-------------------------------------------------- set.seed(0) simdat <- sim_mvgam( seasonality = 'hierarchical', trend_model = VAR(cor = TRUE), family = gaussian() ) # Fit an appropriate model mod1 <- mvgam( y ~ s(season, bs = 'cc', k = 6), data = simdat$data_train, family = gaussian(), trend_model = VAR(cor = TRUE), chains = 2, silent = 2 ) how_to_cite(mod1) #-------------------------------------------------- # For a GP example, simulate data using the mgcv package #-------------------------------------------------- dat <- mgcv::gamSim(1, n = 30, scale = 2) # Fit a model that uses an approximate GP from brms mod2 <- mvgam( y ~ gp(x2, k = 12), data = dat, family = gaussian(), chains = 2, silent = 2 ) how_to_cite(mod2) ## End(Not run)## Not run: #-------------------------------------------------- # Simulate 4 time series with hierarchical seasonality # and a VAR(1) dynamic process #-------------------------------------------------- set.seed(0) simdat <- sim_mvgam( seasonality = 'hierarchical', trend_model = VAR(cor = TRUE), family = gaussian() ) # Fit an appropriate model mod1 <- mvgam( y ~ s(season, bs = 'cc', k = 6), data = simdat$data_train, family = gaussian(), trend_model = VAR(cor = TRUE), chains = 2, silent = 2 ) how_to_cite(mod1) #-------------------------------------------------- # For a GP example, simulate data using the mgcv package #-------------------------------------------------- dat <- mgcv::gamSim(1, n = 30, scale = 2) # Fit a model that uses an approximate GP from brms mod2 <- mvgam( y ~ gp(x2, k = 12), data = dat, family = gaussian(), chains = 2, silent = 2 ) how_to_cite(mod2) ## End(Not run)
mvgam objectsIndex mvgam objects
## S3 method for class 'mvgam' variables(x, ...)## S3 method for class 'mvgam' variables(x, ...)
x |
|
... |
Arguments passed to individual methods (if applicable). |
a list object of the variables that can be extracted, along
with their aliases
Nicholas J Clark
## Not run: # Simulate data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Extract model variables variables(mod) ## End(Not run)## Not run: # Simulate data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Extract model variables variables(mod) ## End(Not run)
Compute Generalized or Orthogonalized Impulse Response Functions (IRFs) from
mvgam models with Vector Autoregressive dynamics
irf(object, ...) ## S3 method for class 'mvgam' irf(object, h = 10, cumulative = FALSE, orthogonal = FALSE, ...)irf(object, ...) ## S3 method for class 'mvgam' irf(object, h = 10, cumulative = FALSE, orthogonal = FALSE, ...)
object |
|
... |
ignored |
h |
Positive |
cumulative |
|
orthogonal |
|
See mvgam_irf-class for a full description of the quantities that are
computed and returned by this function, along with key references.
An object of mvgam_irf-class containing the posterior IRFs. This
object can be used with the supplied S3 functions plot.mvgam_irf()
and summary.mvgam_irf()
Nicholas J Clark
mvgam_irf-class, VAR(), plot.mvgam_irf(), stability(), fevd()
## Not run: # Fit a model to the portal time series that uses a latent VAR(1) mod <- mvgam( formula = captures ~ -1, trend_formula = ~ trend, trend_model = VAR(cor = TRUE), family = poisson(), data = portal_data, chains = 2, silent = 2 ) # Plot the autoregressive coefficient distributions; # use 'dir = "v"' to arrange the order of facets # correctly mcmc_plot( mod, variable = 'A', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Calulate Generalized IRFs for each series irfs <- irf( mod, h = 12, cumulative = FALSE ) # Plot them plot(irfs, series = 1) plot(irfs, series = 2) plot(irfs, series = 3) plot(irfs, series = 4) # Calculate posterior median, upper and lower 95th quantiles # of the impulse responses summary(irfs) ## End(Not run)## Not run: # Fit a model to the portal time series that uses a latent VAR(1) mod <- mvgam( formula = captures ~ -1, trend_formula = ~ trend, trend_model = VAR(cor = TRUE), family = poisson(), data = portal_data, chains = 2, silent = 2 ) # Plot the autoregressive coefficient distributions; # use 'dir = "v"' to arrange the order of facets # correctly mcmc_plot( mod, variable = 'A', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Calulate Generalized IRFs for each series irfs <- irf( mod, h = 12, cumulative = FALSE ) # Plot them plot(irfs, series = 1) plot(irfs, series = 2) plot(irfs, series = 3) plot(irfs, series = 4) # Calculate posterior median, upper and lower 95th quantiles # of the impulse responses summary(irfs) ## End(Not run)
This function sets up a Joint Species Distribution Model whereby the residual associations among species can be modelled in a reduced-rank format using a set of latent factors. The factor specification is extremely flexible, allowing users to include spatial, temporal or any other type of predictor effects to more efficiently capture unmodelled residual associations, while the observation model can also be highly flexible (including all smooth, GP and other effects that mvgam can handle)
jsdgam( formula, factor_formula = ~-1, knots, factor_knots, data, newdata, family = poisson(), unit = time, species = series, share_obs_params = FALSE, priors, n_lv = 2, backend = getOption("brms.backend", "cmdstanr"), algorithm = getOption("brms.algorithm", "sampling"), control = list(max_treedepth = 10, adapt_delta = 0.8), chains = 4, burnin = 500, samples = 500, thin = 1, parallel = TRUE, threads = 1, silent = 1, run_model = TRUE, return_model_data = FALSE, residuals = TRUE, ... )jsdgam( formula, factor_formula = ~-1, knots, factor_knots, data, newdata, family = poisson(), unit = time, species = series, share_obs_params = FALSE, priors, n_lv = 2, backend = getOption("brms.backend", "cmdstanr"), algorithm = getOption("brms.algorithm", "sampling"), control = list(max_treedepth = 10, adapt_delta = 0.8), chains = 4, burnin = 500, samples = 500, thin = 1, parallel = TRUE, threads = 1, silent = 1, run_model = TRUE, return_model_data = FALSE, residuals = TRUE, ... )
formula |
A |
factor_formula |
A |
knots |
An optional |
factor_knots |
An optional |
data |
A |
newdata |
Optional |
family |
Default is |
unit |
The unquoted name of the variable that represents the unit of
analysis in |
species |
The unquoted name of the |
share_obs_params |
|
priors |
An optional |
n_lv |
|
backend |
Character string naming the package for Stan model fitting.
Options are |
algorithm |
Character string naming the estimation approach:
Can be set globally via |
control |
Named |
chains |
|
burnin |
|
samples |
|
thin |
Thinning interval for monitors. Ignored for variational inference algorithms. |
parallel |
|
threads |
|
silent |
Verbosity level between |
run_model |
|
return_model_data |
|
residuals |
|
... |
Other arguments to pass to mvgam |
Joint Species Distribution Models allow for responses of multiple
species to be learned hierarchically, whereby responses to environmental
variables in formula can be partially pooled and any latent, unmodelled
residual associations can also be learned. In mvgam, both of these
effects can be modelled with the full power of latent factor Hierarchical
GAMs, providing unmatched flexibility to model full communities of species.
When calling jsdgam, an initial State-Space model using trend = 'None' is
set up and then modified to include the latent factors and their linear
predictors. Consequently, you can inspect priors for these models using
get_mvgam_priors by supplying the relevant formula, factor_formula,
data and family arguments and keeping the default trend = 'None'.
In a JSDGAM, the expectation of response is modelled with
where is a known link function,
is a design matrix of linear predictors (with associated
coefficients), are -variate latent factors
(<<) and are species-specific
loadings on the latent factors, respectively. The design matrix and
coefficients are constructed and modelled using formula and
can contain any of mvgam's predictor effects, including random intercepts
and slopes, multidimensional penalized smooths, GP effects etc... The factor
loadings are constrained for identifiability but can be used
to reconstruct an estimate of the species' residual variance-covariance
matrix using (see the example below and
residual_cor() for details). The latent factors are further modelled using:
where the second design matrix and associated
coefficients are constructed and modelled using factor_formula. Again, the
effects that make up this linear predictor can contain any of mvgam's
allowed predictor effects, providing enormous flexibility for modelling
species' communities.
A list object of class mvgam containing model output,
the text representation of the model file, the mgcv model output (for easily
generating simulations at unsampled covariate values), Dunn-Smyth residuals
for each species and key information needed for other functions in the
package. See mvgam-class for details. Use
methods(class = "mvgam") for an overview on available methods
Nicholas J Clark
Nicholas J Clark & Konstans Wells (2023). Dynamic generalised
additive models (DGAMs) for forecasting discrete ecological time series.
Methods in Ecology and Evolution. 14:3, 771-784.
David I Warton, F Guillaume Blanchet, Robert B O'Hara, Otso Ovaskainen, Sara
Taskinen, Steven C Walker & Francis KC Hui (2015). So many variables: joint
modeling in community ecology. Trends in Ecology & Evolution 30:12, 766-779.
## Not run: # ======================================================================== # Example 1: Basic JSDGAM with Portal Data # ======================================================================== # Fit a JSDGAM to the portal_data captures mod <- jsdgam( formula = captures ~ # Fixed effects of NDVI and mintemp, row effect as a GP of time ndvi_ma12:series + mintemp:series + gp(time, k = 15), factor_formula = ~ -1, data = portal_data, unit = time, species = series, family = poisson(), n_lv = 2, silent = 2, chains = 2 ) # Plot covariate effects library(ggplot2); theme_set(theme_bw()) plot_predictions( mod, condition = c('ndvi_ma12', 'series', 'series') ) plot_predictions( mod, condition = c('mintemp', 'series', 'series') ) # A residual correlation plot plot(residual_cor(mod)) # An ordination biplot can also be constructed # from the factor scores and their loadings if(requireNamespace('ggrepel', quietly = TRUE)){ ordinate(mod, alpha = 0.7) } # ======================================================================== # Example 2: Advanced JSDGAM with Spatial Predictors # ======================================================================== # Simulate latent count data for 500 spatial locations and 10 species set.seed(0) N_points <- 500 N_species <- 10 # Species-level intercepts (on the log scale) alphas <- runif(N_species, 2, 2.25) # Simulate a covariate and species-level responses to it temperature <- rnorm(N_points) betas <- runif(N_species, -0.5, 0.5) # Simulate points uniformly over a space lon <- runif(N_points, min = 150, max = 155) lat <- runif(N_points, min = -20, max = -19) # Set up spatial basis functions as a tensor product of lat and lon sm <- mgcv::smoothCon( mgcv::te(lon, lat, k = 5), data = data.frame(lon, lat), knots = NULL )[[1]] # The design matrix for this smooth is in the 'X' slot des_mat <- sm$X dim(des_mat) # Function to generate a random covariance matrix where all variables # have unit variance (i.e. diagonals are all 1) random_Sigma = function(N){ L_Omega <- matrix(0, N, N); L_Omega[1, 1] <- 1; for (i in 2 : N) { bound <- 1; for (j in 1 : (i - 1)) { L_Omega[i, j] <- runif(1, -sqrt(bound), sqrt(bound)); bound <- bound - L_Omega[i, j] ^ 2; } L_Omega[i, i] <- sqrt(bound); } Sigma <- L_Omega %*% t(L_Omega); return(Sigma) } # Simulate a variance-covariance matrix for the correlations among # basis coefficients Sigma <- random_Sigma(N = NCOL(des_mat)) # Now simulate the species-level basis coefficients hierarchically, where # spatial basis function correlations are a convex sum of a base correlation # matrix and a species-level correlation matrix basis_coefs <- matrix(NA, nrow = N_species, ncol = NCOL(Sigma)) base_field <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = Sigma) for(t in 1:N_species){ corOmega <- (cov2cor(Sigma) * 0.7) + (0.3 * cov2cor(random_Sigma(N = NCOL(des_mat)))) basis_coefs[t, ] <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = corOmega) } # Simulate the latent spatial processes st_process <- do.call(rbind, lapply(seq_len(N_species), function(t){ data.frame( lat = lat, lon = lon, species = paste0('species_', t), temperature = temperature, process = alphas[t] + betas[t] * temperature + des_mat %*% basis_coefs[t,] ) })) # Now take noisy observations at some of the points (60) obs_points <- sample(1:N_points, size = 60, replace = FALSE) obs_points <- data.frame( lat = lat[obs_points], lon = lon[obs_points], site = 1:60 ) # Keep only the process data at these points st_process %>% dplyr::inner_join(obs_points, by = c('lat', 'lon')) %>% # now take noisy Poisson observations of the process dplyr::mutate(count = rpois(NROW(.), lambda = exp(process))) %>% dplyr::mutate(species = factor( species, levels = paste0('species_', 1:N_species) )) %>% dplyr::group_by(lat, lon) -> dat # View the count distributions for each species ggplot(dat, aes(x = count)) + geom_histogram() + facet_wrap(~ species, scales = 'free') ggplot(dat, aes(x = lon, y = lat, col = log(count + 1))) + geom_point(size = 2.25) + facet_wrap(~ species, scales = 'free') + scale_color_viridis_c() # ------------------------------------------------------------------------ # Model Fitting with Custom Priors # ------------------------------------------------------------------------ # Inspect default priors for a joint species model with three spatial factors priors <- get_mvgam_priors( formula = count ~ # Environmental model includes random slopes for # a linear effect of temperature s(species, bs = 're', by = temperature), # Each factor estimates a different nonlinear spatial process, using # 'by = trend' as in other mvgam State-Space models factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1, n_lv = 3, # The data and grouping variables data = dat, unit = site, species = species, # Poisson observations family = poisson() ) head(priors) # Fit a JSDM that estimates hierarchical temperature responses # and that uses three latent spatial factors mod <- jsdgam( formula = count ~ # Environmental model includes random slopes for a # linear effect of temperature s(species, bs = 're', by = temperature), # Each factor estimates a different nonlinear spatial process, using # 'by = trend' as in other mvgam State-Space models factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1, n_lv = 3, # Change default priors for fixed random effect variances and # factor GP marginal deviations to standard normal priors = c( prior(std_normal(), class = sigma_raw), prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend1`), prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend2`), prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend3`) ), # The data and the grouping variables data = dat, unit = site, species = species, # Poisson observations family = poisson(), chains = 2, silent = 2 ) # ------------------------------------------------------------------------ # Model Visualization and Diagnostics # ------------------------------------------------------------------------ # Plot the implicit species-level intercept estimates plot_predictions(mod, condition = 'species', type = 'link') # Plot species' hierarchical responses to temperature plot_predictions( mod, condition = c('temperature', 'species', 'species'), type = 'link' ) # Plot posterior median estimates of the latent spatial factors plot(mod, type = 'smooths', trend_effects = TRUE) # Or using gratia, if you have it installed if(requireNamespace('gratia', quietly = TRUE)){ gratia::draw(mod, trend_effects = TRUE, dist = 0) } # Plot species' randomized quantile residual distributions # as a function of latitude pp_check( mod, type = 'resid_ribbon_grouped', group = 'species', x = 'lat', ndraws = 200 ) # ------------------------------------------------------------------------ # Residual Correlation Analysis # ------------------------------------------------------------------------ # Calculate residual spatial correlations post_cors <- residual_cor(mod) names(post_cors) # Look at lower and upper credible interval estimates for # some of the estimated correlations post_cors$cor[1:5, 1:5] post_cors$cor_upper[1:5, 1:5] post_cors$cor_lower[1:5, 1:5] # Plot of the posterior median correlations for those estimated # to be non-zero plot(post_cors, cluster = TRUE) # An ordination biplot can also be constructed # from the factor scores and their loadings if(requireNamespace('ggrepel', quietly = TRUE)){ ordinate(mod) } # ------------------------------------------------------------------------ # Model Validation and Prediction # ------------------------------------------------------------------------ # Posterior predictive checks and ELPD-LOO can ascertain model fit pp_check( mod, type = "pit_ecdf_grouped", group = "species", ndraws = 200 ) loo(mod) # Forecast log(counts) for entire region (site value doesn't matter as long # as each spatial location has a different and unique site identifier); # note this calculation takes a few minutes because of the need to calculate # draws from the stochastic latent factors newdata <- st_process %>% dplyr::mutate(species = factor( species, levels = paste0('species_', 1:N_species) )) %>% dplyr::group_by(lat, lon) %>% dplyr::mutate(site = dplyr::cur_group_id()) %>% dplyr::ungroup() preds <- predict(mod, newdata = newdata) # Plot the median log(count) predictions on a grid newdata$log_count <- preds[,1] ggplot(newdata, aes(x = lon, y = lat, col = log_count)) + geom_point(size = 1.5) + facet_wrap(~ species, scales = 'free') + scale_color_viridis_c() + theme_classic() # Not needed for general use; cleans up connections for automated testing closeAllConnections() ## End(Not run)## Not run: # ======================================================================== # Example 1: Basic JSDGAM with Portal Data # ======================================================================== # Fit a JSDGAM to the portal_data captures mod <- jsdgam( formula = captures ~ # Fixed effects of NDVI and mintemp, row effect as a GP of time ndvi_ma12:series + mintemp:series + gp(time, k = 15), factor_formula = ~ -1, data = portal_data, unit = time, species = series, family = poisson(), n_lv = 2, silent = 2, chains = 2 ) # Plot covariate effects library(ggplot2); theme_set(theme_bw()) plot_predictions( mod, condition = c('ndvi_ma12', 'series', 'series') ) plot_predictions( mod, condition = c('mintemp', 'series', 'series') ) # A residual correlation plot plot(residual_cor(mod)) # An ordination biplot can also be constructed # from the factor scores and their loadings if(requireNamespace('ggrepel', quietly = TRUE)){ ordinate(mod, alpha = 0.7) } # ======================================================================== # Example 2: Advanced JSDGAM with Spatial Predictors # ======================================================================== # Simulate latent count data for 500 spatial locations and 10 species set.seed(0) N_points <- 500 N_species <- 10 # Species-level intercepts (on the log scale) alphas <- runif(N_species, 2, 2.25) # Simulate a covariate and species-level responses to it temperature <- rnorm(N_points) betas <- runif(N_species, -0.5, 0.5) # Simulate points uniformly over a space lon <- runif(N_points, min = 150, max = 155) lat <- runif(N_points, min = -20, max = -19) # Set up spatial basis functions as a tensor product of lat and lon sm <- mgcv::smoothCon( mgcv::te(lon, lat, k = 5), data = data.frame(lon, lat), knots = NULL )[[1]] # The design matrix for this smooth is in the 'X' slot des_mat <- sm$X dim(des_mat) # Function to generate a random covariance matrix where all variables # have unit variance (i.e. diagonals are all 1) random_Sigma = function(N){ L_Omega <- matrix(0, N, N); L_Omega[1, 1] <- 1; for (i in 2 : N) { bound <- 1; for (j in 1 : (i - 1)) { L_Omega[i, j] <- runif(1, -sqrt(bound), sqrt(bound)); bound <- bound - L_Omega[i, j] ^ 2; } L_Omega[i, i] <- sqrt(bound); } Sigma <- L_Omega %*% t(L_Omega); return(Sigma) } # Simulate a variance-covariance matrix for the correlations among # basis coefficients Sigma <- random_Sigma(N = NCOL(des_mat)) # Now simulate the species-level basis coefficients hierarchically, where # spatial basis function correlations are a convex sum of a base correlation # matrix and a species-level correlation matrix basis_coefs <- matrix(NA, nrow = N_species, ncol = NCOL(Sigma)) base_field <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = Sigma) for(t in 1:N_species){ corOmega <- (cov2cor(Sigma) * 0.7) + (0.3 * cov2cor(random_Sigma(N = NCOL(des_mat)))) basis_coefs[t, ] <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = corOmega) } # Simulate the latent spatial processes st_process <- do.call(rbind, lapply(seq_len(N_species), function(t){ data.frame( lat = lat, lon = lon, species = paste0('species_', t), temperature = temperature, process = alphas[t] + betas[t] * temperature + des_mat %*% basis_coefs[t,] ) })) # Now take noisy observations at some of the points (60) obs_points <- sample(1:N_points, size = 60, replace = FALSE) obs_points <- data.frame( lat = lat[obs_points], lon = lon[obs_points], site = 1:60 ) # Keep only the process data at these points st_process %>% dplyr::inner_join(obs_points, by = c('lat', 'lon')) %>% # now take noisy Poisson observations of the process dplyr::mutate(count = rpois(NROW(.), lambda = exp(process))) %>% dplyr::mutate(species = factor( species, levels = paste0('species_', 1:N_species) )) %>% dplyr::group_by(lat, lon) -> dat # View the count distributions for each species ggplot(dat, aes(x = count)) + geom_histogram() + facet_wrap(~ species, scales = 'free') ggplot(dat, aes(x = lon, y = lat, col = log(count + 1))) + geom_point(size = 2.25) + facet_wrap(~ species, scales = 'free') + scale_color_viridis_c() # ------------------------------------------------------------------------ # Model Fitting with Custom Priors # ------------------------------------------------------------------------ # Inspect default priors for a joint species model with three spatial factors priors <- get_mvgam_priors( formula = count ~ # Environmental model includes random slopes for # a linear effect of temperature s(species, bs = 're', by = temperature), # Each factor estimates a different nonlinear spatial process, using # 'by = trend' as in other mvgam State-Space models factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1, n_lv = 3, # The data and grouping variables data = dat, unit = site, species = species, # Poisson observations family = poisson() ) head(priors) # Fit a JSDM that estimates hierarchical temperature responses # and that uses three latent spatial factors mod <- jsdgam( formula = count ~ # Environmental model includes random slopes for a # linear effect of temperature s(species, bs = 're', by = temperature), # Each factor estimates a different nonlinear spatial process, using # 'by = trend' as in other mvgam State-Space models factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1, n_lv = 3, # Change default priors for fixed random effect variances and # factor GP marginal deviations to standard normal priors = c( prior(std_normal(), class = sigma_raw), prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend1`), prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend2`), prior(std_normal(), class = `alpha_gp_trend(lon, lat):trendtrend3`) ), # The data and the grouping variables data = dat, unit = site, species = species, # Poisson observations family = poisson(), chains = 2, silent = 2 ) # ------------------------------------------------------------------------ # Model Visualization and Diagnostics # ------------------------------------------------------------------------ # Plot the implicit species-level intercept estimates plot_predictions(mod, condition = 'species', type = 'link') # Plot species' hierarchical responses to temperature plot_predictions( mod, condition = c('temperature', 'species', 'species'), type = 'link' ) # Plot posterior median estimates of the latent spatial factors plot(mod, type = 'smooths', trend_effects = TRUE) # Or using gratia, if you have it installed if(requireNamespace('gratia', quietly = TRUE)){ gratia::draw(mod, trend_effects = TRUE, dist = 0) } # Plot species' randomized quantile residual distributions # as a function of latitude pp_check( mod, type = 'resid_ribbon_grouped', group = 'species', x = 'lat', ndraws = 200 ) # ------------------------------------------------------------------------ # Residual Correlation Analysis # ------------------------------------------------------------------------ # Calculate residual spatial correlations post_cors <- residual_cor(mod) names(post_cors) # Look at lower and upper credible interval estimates for # some of the estimated correlations post_cors$cor[1:5, 1:5] post_cors$cor_upper[1:5, 1:5] post_cors$cor_lower[1:5, 1:5] # Plot of the posterior median correlations for those estimated # to be non-zero plot(post_cors, cluster = TRUE) # An ordination biplot can also be constructed # from the factor scores and their loadings if(requireNamespace('ggrepel', quietly = TRUE)){ ordinate(mod) } # ------------------------------------------------------------------------ # Model Validation and Prediction # ------------------------------------------------------------------------ # Posterior predictive checks and ELPD-LOO can ascertain model fit pp_check( mod, type = "pit_ecdf_grouped", group = "species", ndraws = 200 ) loo(mod) # Forecast log(counts) for entire region (site value doesn't matter as long # as each spatial location has a different and unique site identifier); # note this calculation takes a few minutes because of the need to calculate # draws from the stochastic latent factors newdata <- st_process %>% dplyr::mutate(species = factor( species, levels = paste0('species_', 1:N_species) )) %>% dplyr::group_by(lat, lon) %>% dplyr::mutate(site = dplyr::cur_group_id()) %>% dplyr::ungroup() preds <- predict(mod, newdata = newdata) # Plot the median log(count) predictions on a grid newdata$log_count <- preds[,1] ggplot(newdata, aes(x = lon, y = lat, col = log_count)) + geom_point(size = 1.5) + facet_wrap(~ species, scales = 'free') + scale_color_viridis_c() + theme_classic() # Not needed for general use; cleans up connections for automated testing closeAllConnections() ## End(Not run)
Approximate leave-future-out cross-validation of fitted mvgam objects
lfo_cv(object, ...) ## S3 method for class 'mvgam' lfo_cv( object, data, min_t, fc_horizon = 1, pareto_k_threshold = 0.7, silent = 1, ... )lfo_cv(object, ...) ## S3 method for class 'mvgam' lfo_cv( object, data, min_t, fc_horizon = 1, pareto_k_threshold = 0.7, silent = 1, ... )
object |
|
... |
Ignored |
data |
A |
min_t |
Integer specifying the minimum training time required before making predictions
from the data. Default is either the |
fc_horizon |
Integer specifying the number of time steps ahead for evaluating forecasts |
pareto_k_threshold |
Proportion specifying the threshold over which the Pareto shape parameter
is considered unstable, triggering a model refit. Default is |
silent |
Verbosity level between |
Approximate leave-future-out cross-validation uses an expanding training window scheme
to evaluate a model on its forecasting ability. The steps used in this function mirror those laid out
in the lfo vignette from the loo package,
written by Paul Bürkner, Jonah Gabry, Aki Vehtari. First, we refit the model using the first min_t
observations to perform a single exact fc_horizon-ahead forecast step. This forecast is evaluated against
the min_t + fc_horizon out of sample observations using the Expected Log Predictive Density (ELPD).
Next, we approximate each successive round of
expanding window forecasts by moving forward one step at a time for i in 1:N_evaluations and re-weighting
draws from the model's posterior predictive distribution using Pareto Smoothed
Importance Sampling (PSIS). In each iteration i, PSIS weights are obtained for the next observation
that would have been included in the model if we had re-fit (i.e. the last observation that would have
been in the training data, or min_t + i). If these importance ratios are stable, we consider the
approximation adequate and use the re-weighted posterior's forecast for evaluating the next holdout
set of testing observations ((min_t + i + 1):(min_t + i + fc_horizon)). At some point the
importance ratio variability will become too large and importance sampling will fail. This is
indicated by the estimated shape parameter k of the generalized Pareto distribution
crossing a certain threshold pareto_k_threshold. Only then do we refit the model using
all of the observations up to the time of the failure. We then restart the process and iterate forward
until the next refit is triggered (Bürkner et al. 2020).
A list of class mvgam_lfo containing the approximate ELPD scores,
the Pareto-k shape values and 'the specified pareto_k_threshold
Nicholas J Clark
Paul-Christian Bürkner, Jonah Gabry & Aki Vehtari (2020). Approximate leave-future-out cross-validation for Bayesian time series models Journal of Statistical Computation and Simulation. 90:14, 2499-2523.
forecast, score, compare_mvgams
## Not run: # Simulate from a Poisson-AR2 model with a seasonal smooth set.seed(100) dat <- sim_mvgam(T = 75, n_series = 1, prop_trend = 0.75, trend_model = 'AR2', family = poisson()) # Plot the time series plot_mvgam_series(data = dat$data_train, newdata = dat$data_test, series = 1) # Fit an appropriate model mod_ar2 <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(p = 2), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2) # Fit a less appropriate model mod_rw <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = RW(), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2) # Compare Discrete Ranked Probability Scores for the testing period fc_ar2 <- forecast(mod_ar2) fc_rw <- forecast(mod_rw) score_ar2 <- score(fc_ar2, score = 'drps') score_rw <- score(fc_rw, score = 'drps') sum(score_ar2$series_1$score) sum(score_rw$series_1$score) # Now use approximate leave-future-out CV to compare # rolling forecasts; start at time point 40 to reduce # computational time and to ensure enough data is available # for estimating model parameters lfo_ar2 <- lfo_cv(mod_ar2, min_t = 40, fc_horizon = 3, silent = 2) lfo_rw <- lfo_cv(mod_rw, min_t = 40, fc_horizon = 3, silent = 2) # Plot Pareto-K values and ELPD estimates plot(lfo_ar2) plot(lfo_rw) # Proportion of timepoints in which AR2 model gives better forecasts length(which((lfo_ar2$elpds - lfo_rw$elpds) > 0)) / length(lfo_ar2$elpds) # A higher total ELPD is preferred lfo_ar2$sum_ELPD lfo_rw$sum_ELPD ## End(Not run)## Not run: # Simulate from a Poisson-AR2 model with a seasonal smooth set.seed(100) dat <- sim_mvgam(T = 75, n_series = 1, prop_trend = 0.75, trend_model = 'AR2', family = poisson()) # Plot the time series plot_mvgam_series(data = dat$data_train, newdata = dat$data_test, series = 1) # Fit an appropriate model mod_ar2 <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(p = 2), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2) # Fit a less appropriate model mod_rw <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = RW(), family = poisson(), data = dat$data_train, newdata = dat$data_test, chains = 2, silent = 2) # Compare Discrete Ranked Probability Scores for the testing period fc_ar2 <- forecast(mod_ar2) fc_rw <- forecast(mod_rw) score_ar2 <- score(fc_ar2, score = 'drps') score_rw <- score(fc_rw, score = 'drps') sum(score_ar2$series_1$score) sum(score_rw$series_1$score) # Now use approximate leave-future-out CV to compare # rolling forecasts; start at time point 40 to reduce # computational time and to ensure enough data is available # for estimating model parameters lfo_ar2 <- lfo_cv(mod_ar2, min_t = 40, fc_horizon = 3, silent = 2) lfo_rw <- lfo_cv(mod_rw, min_t = 40, fc_horizon = 3, silent = 2) # Plot Pareto-K values and ELPD estimates plot(lfo_ar2) plot(lfo_rw) # Proportion of timepoints in which AR2 model gives better forecasts length(which((lfo_ar2$elpds - lfo_rw$elpds) > 0)) / length(lfo_ar2$elpds) # A higher total ELPD is preferred lfo_ar2$sum_ELPD lfo_rw$sum_ELPD ## End(Not run)
Compute pointwise Log-Likelihoods from fitted mvgam objects
## S3 method for class 'mvgam' logLik(object, linpreds, newdata, family_pars, include_forecast = TRUE, ...)## S3 method for class 'mvgam' logLik(object, linpreds, newdata, family_pars, include_forecast = TRUE, ...)
object |
|
linpreds |
Optional |
newdata |
Optional |
family_pars |
Optional |
include_forecast |
Logical. If |
... |
Ignored |
A matrix of dimension n_samples x n_observations containing the
pointwise log-likelihood draws for all observations in newdata. If no
newdata is supplied, log-likelihood draws are returned for all observations
that were originally fed to the model (training observations and, if supplied
to the original model via the newdata argument in mvgam,
testing observations).
Nicholas J Clark
## Not run: # Simulate some data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Extract log-likelihood values lls <- logLik(mod) str(lls) ## End(Not run)## Not run: # Simulate some data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Extract log-likelihood values lls <- logLik(mod) str(lls) ## End(Not run)
Extract the LOOIC (leave-one-out information criterion) using loo::loo().
## S3 method for class 'mvgam' loo(x, incl_dynamics = FALSE, ...) ## S3 method for class 'mvgam' loo_compare(x, ..., model_names = NULL, incl_dynamics = FALSE)## S3 method for class 'mvgam' loo(x, incl_dynamics = FALSE, ...) ## S3 method for class 'mvgam' loo_compare(x, ..., model_names = NULL, incl_dynamics = FALSE)
x |
Object of class |
incl_dynamics |
Deprecated and currently ignored |
... |
More |
model_names |
If |
When comparing two (or more) fitted mvgam models, we can estimate the
difference in their in-sample predictive accuracies using the Expected Log
Predictive Density (ELPD). This metric can be approximated using Pareto
Smoothed Importance Sampling (PSIS), which re-weights posterior draws to
approximate predictions for a datapoint had it not been included in the
original model fit (i.e. leave-one-out cross-validation).
See loo::loo() and loo::loo_compare() for further details on how this
importance sampling works.
Note: In-sample predictive metrics such as PSIS-LOO can sometimes be overly
optimistic for models that include process error components (e.g. those with
trend_model, trend_formula, or factor_formula). Consider using
out-of-sample evaluations for further scrutiny (see
forecast.mvgam, score.mvgam_forecast,
lfo_cv).
For loo.mvgam, an object of class psis_loo (see loo::loo()
for details). For loo_compare.mvgam, an object of class compare.loo
(see loo::loo_compare() for details).
Nicholas J Clark
## Not run: #-------------------------------------------------- # Simulate 4 time series with hierarchical seasonality # and independent AR1 dynamic processes #-------------------------------------------------- set.seed(111) simdat <- sim_mvgam( seasonality = 'hierarchical', trend_model = AR(), family = gaussian() ) # Fit a model with shared seasonality mod1 <- mvgam( y ~ s(season, bs = 'cc', k = 6), data = rbind(simdat$data_train, simdat$data_test), family = gaussian(), chains = 2, silent = 2 ) conditional_effects(mod1) mc.cores.def <- getOption('mc.cores') options(mc.cores = 1) loo(mod1) # Fit a model with hierarchical seasonality mod2 <- update( mod1, formula = y ~ s(season, bs = 'cc', k = 6) + s(season, series, bs = 'fs', xt = list(bs = 'cc'), k = 4), chains = 2, silent = 2 ) conditional_effects(mod2) loo(mod2) # Add AR1 dynamic errors to mod2 mod3 <- update( mod2, trend_model = AR(), chains = 2, silent = 2 ) conditional_effects(mod3) plot(mod3, type = 'trend') loo(mod3) #-------------------------------------------------- # Compare models using LOO #-------------------------------------------------- loo_compare(mod1, mod2, mod3) options(mc.cores = mc.cores.def) #-------------------------------------------------- # Compare forecast abilities using LFO-CV #-------------------------------------------------- lfo_mod2 <- lfo_cv(mod2, min_t = 92) lfo_mod3 <- lfo_cv(mod3, min_t = 92) # Plot forecast ELPD differences plot( y = lfo_mod2$elpds - lfo_mod3$elpds, x = lfo_mod2$eval_timepoints, pch = 16, ylab = 'ELPD_mod2 - ELPD_mod3', xlab = 'Evaluation timepoint' ) abline(h = 0, lty = 'dashed') ## End(Not run)## Not run: #-------------------------------------------------- # Simulate 4 time series with hierarchical seasonality # and independent AR1 dynamic processes #-------------------------------------------------- set.seed(111) simdat <- sim_mvgam( seasonality = 'hierarchical', trend_model = AR(), family = gaussian() ) # Fit a model with shared seasonality mod1 <- mvgam( y ~ s(season, bs = 'cc', k = 6), data = rbind(simdat$data_train, simdat$data_test), family = gaussian(), chains = 2, silent = 2 ) conditional_effects(mod1) mc.cores.def <- getOption('mc.cores') options(mc.cores = 1) loo(mod1) # Fit a model with hierarchical seasonality mod2 <- update( mod1, formula = y ~ s(season, bs = 'cc', k = 6) + s(season, series, bs = 'fs', xt = list(bs = 'cc'), k = 4), chains = 2, silent = 2 ) conditional_effects(mod2) loo(mod2) # Add AR1 dynamic errors to mod2 mod3 <- update( mod2, trend_model = AR(), chains = 2, silent = 2 ) conditional_effects(mod3) plot(mod3, type = 'trend') loo(mod3) #-------------------------------------------------- # Compare models using LOO #-------------------------------------------------- loo_compare(mod1, mod2, mod3) options(mc.cores = mc.cores.def) #-------------------------------------------------- # Compare forecast abilities using LFO-CV #-------------------------------------------------- lfo_mod2 <- lfo_cv(mod2, min_t = 92) lfo_mod3 <- lfo_cv(mod3, min_t = 92) # Plot forecast ELPD differences plot( y = lfo_mod2$elpds - lfo_mod3$elpds, x = lfo_mod2$eval_timepoints, pch = 16, ylab = 'ELPD_mod2 - ELPD_mod3', xlab = 'Evaluation timepoint' ) abline(h = 0, lty = 'dashed') ## End(Not run)
This function uses factor loadings from a fitted dynamic factor
mvgam model to calculate temporal correlations among series' trends.
lv_correlations(object)lv_correlations(object)
object |
|
Although this function will still work, it is now recommended to use
residual_cor() to obtain residual correlation information in a more
user-friendly format that allows for a deeper investigation of relationships
among the time series.
A list object containing the mean posterior correlations and
the full array of posterior correlations.
residual_cor(), plot.mvgam_residcor()
## Not run: #-------------------------------------------------- # Fit a model that uses two AR(1) dynamic factors to model # the temporal dynamics of the four rodent species in the portal_data #-------------------------------------------------- mod <- mvgam( captures ~ series, trend_model = AR(), use_lv = TRUE, n_lv = 2, data = portal_data, chains = 2, silent = 2 ) # Plot the two dynamic factors plot(mod, type = 'factors') # Calculate correlations among the series lvcors <- lv_correlations(mod) names(lvcors) lapply(lvcors, class) # Recommended: use residual_cor() instead lvcors <- residual_cor(mod) names(lvcors) lvcors$cor # Plot credible correlations as a matrix plot(lvcors, cluster = TRUE) # Not needed for general use; cleans up connections for automated testing closeAllConnections() ## End(Not run)## Not run: #-------------------------------------------------- # Fit a model that uses two AR(1) dynamic factors to model # the temporal dynamics of the four rodent species in the portal_data #-------------------------------------------------- mod <- mvgam( captures ~ series, trend_model = AR(), use_lv = TRUE, n_lv = 2, data = portal_data, chains = 2, silent = 2 ) # Plot the two dynamic factors plot(mod, type = 'factors') # Calculate correlations among the series lvcors <- lv_correlations(mod) names(lvcors) lapply(lvcors, class) # Recommended: use residual_cor() instead lvcors <- residual_cor(mod) names(lvcors) lvcors$cor # Plot credible correlations as a matrix plot(lvcors, cluster = TRUE) # Not needed for general use; cleans up connections for automated testing closeAllConnections() ## End(Not run)
Convenient way to call MCMC plotting functions implemented in the bayesplot package for mvgam models
## S3 method for class 'mvgam' mcmc_plot( object, type = "intervals", variable = NULL, regex = FALSE, use_alias = TRUE, ... )## S3 method for class 'mvgam' mcmc_plot( object, type = "intervals", variable = NULL, regex = FALSE, use_alias = TRUE, ... )
object |
An R object typically of class |
type |
The type of the plot.
Supported types are (as names) |
variable |
Names of the variables (parameters) to plot, as given by a
character vector or a regular expression (if |
regex |
Logical; Indicates whether |
use_alias |
Logical. If more informative names for parameters are
available (i.e. for beta coefficients |
... |
Additional arguments passed to the plotting functions.
See |
A ggplot object
that can be further customized using the ggplot2 package.
mvgam_draws for an overview of some of the shortcut strings
that can be used for argument variable
## Not run: simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2) mcmc_plot(mod) mcmc_plot(mod, type = 'neff_hist') mcmc_plot(mod, variable = 'betas', type = 'areas') mcmc_plot(mod, variable = 'trend_params', type = 'combo') ## End(Not run)## Not run: simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2) mcmc_plot(mod) mcmc_plot(mod, type = 'neff_hist') mcmc_plot(mod, variable = 'betas', type = 'areas') mcmc_plot(mod, variable = 'trend_params', type = 'combo') ## End(Not run)
Extract model.frame from a fitted mvgam object
## S3 method for class 'mvgam' model.frame(formula, trend_effects = FALSE, ...) ## S3 method for class 'mvgam_prefit' model.frame(formula, trend_effects = FALSE, ...)## S3 method for class 'mvgam' model.frame(formula, trend_effects = FALSE, ...) ## S3 method for class 'mvgam_prefit' model.frame(formula, trend_effects = FALSE, ...)
formula |
|
trend_effects |
|
... |
Ignored |
A matrix containing the fitted model frame
Nicholas J Clark
Uses constructors from package splines2 to build monotonically increasing or decreasing splines. Details also in Wang & Yan (2021).
## S3 method for class 'moi.smooth.spec' smooth.construct(object, data, knots) ## S3 method for class 'mod.smooth.spec' smooth.construct(object, data, knots) ## S3 method for class 'moi.smooth' Predict.matrix(object, data) ## S3 method for class 'mod.smooth' Predict.matrix(object, data)## S3 method for class 'moi.smooth.spec' smooth.construct(object, data, knots) ## S3 method for class 'mod.smooth.spec' smooth.construct(object, data, knots) ## S3 method for class 'moi.smooth' Predict.matrix(object, data) ## S3 method for class 'mod.smooth' Predict.matrix(object, data)
object |
A smooth specification object, usually generated by a term
|
data |
a list containing just the data (including any |
knots |
a list containing any knots supplied for basis setup — in same order and with same names as |
The constructor is not normally called directly,
but is rather used internally by mvgam. If they are not supplied then the
knots of the spline are placed evenly throughout the covariate values to
which the term refers: For example, if fitting 101 data with an 11
knot spline of x then there would be a knot at every 10th (ordered) x value.
The spline is an implementation of the closed-form I-spline basis based
on the recursion formula given by Ramsay (1988), in which the basis coefficients
must be constrained to either be non-negative (for monotonically increasing
functions) or non-positive (monotonically decreasing)
Take note that when using either monotonic basis, the number of basis functions
k must be supplied as an even integer due to the manner in
which monotonic basis functions are constructed
An object of class "moi.smooth" or "mod.smooth". In addition to
the usual elements of a smooth class documented under smooth.construct,
this object will contain a slot called boundary that defines the endpoints beyond
which the spline will begin extrapolating (extrapolation is flat due to the first
order penalty placed on the smooth function)
This constructor will result in a valid smooth if using a call to
gam or bam, however the resulting
functions will not be guaranteed to be monotonic because constraints on
basis coefficients will not be enforced
Nicholas J Clark
Wang, Wenjie, and Jun Yan. "Shape-Restricted Regression Splines with R Package splines2."
Journal of Data Science 19.3 (2021).
Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science, 3(4), 425–441.
## Not run: # Simulate data from a monotonically increasing function set.seed(123123) x <- runif(80) * 4 - 1 x <- sort(x) f <- exp(4 * x) / (1 + exp(4 * x)) y <- f + rnorm(80) * 0.1 plot(x, y) # A standard TRPS smooth doesn't capture monotonicity library(mgcv) mod_data <- data.frame(y = y, x = x) mod <- gam( y ~ s(x, k = 16), data = mod_data, family = gaussian() ) library(marginaleffects) plot_predictions( mod, by = 'x', newdata = data.frame( x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100) ), points = 0.5 ) # Using the 'moi' basis in mvgam rectifies this mod_data$time <- 1:NROW(mod_data) mod2 <- mvgam( y ~ s(x, bs = 'moi', k = 18), data = mod_data, family = gaussian(), chains = 2, silent = 2 ) plot_predictions( mod2, by = 'x', newdata = data.frame( x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100) ), points = 0.5 ) plot(mod2, type = 'smooth', realisations = TRUE) # 'by' terms that produce a different smooth for each level of the 'by' # factor are also allowed x <- runif(80) * 4 - 1 x <- sort(x) # Two different monotonic smooths, one for each factor level f <- exp(4 * x) / (1 + exp(4 * x)) f2 <- exp(3.5 * x) / (1 + exp(3 * x)) fac <- c(rep('a', 80), rep('b', 80)) y <- c( f + rnorm(80) * 0.1, f2 + rnorm(80) * 0.2 ) plot(x, y[1:80]) plot(x, y[81:160]) # Gather all data into a data.frame, including the factor 'by' variable mod_data <- data.frame(y, x, fac = as.factor(fac)) mod_data$time <- 1:NROW(mod_data) # Fit a model with different smooths per factor level mod <- mvgam( y ~ s(x, bs = 'moi', by = fac, k = 8), data = mod_data, family = gaussian(), chains = 2, silent = 2 ) # Visualise the different monotonic functions plot_predictions( mod, condition = c('x', 'fac', 'fac'), points = 0.5 ) plot(mod, type = 'smooth', realisations = TRUE) # First derivatives (on the link scale) should never be # negative for either factor level (derivs <- slopes( mod, variables = 'x', by = c('x', 'fac'), type = 'link' )) all(derivs$estimate > 0) ## End(Not run)## Not run: # Simulate data from a monotonically increasing function set.seed(123123) x <- runif(80) * 4 - 1 x <- sort(x) f <- exp(4 * x) / (1 + exp(4 * x)) y <- f + rnorm(80) * 0.1 plot(x, y) # A standard TRPS smooth doesn't capture monotonicity library(mgcv) mod_data <- data.frame(y = y, x = x) mod <- gam( y ~ s(x, k = 16), data = mod_data, family = gaussian() ) library(marginaleffects) plot_predictions( mod, by = 'x', newdata = data.frame( x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100) ), points = 0.5 ) # Using the 'moi' basis in mvgam rectifies this mod_data$time <- 1:NROW(mod_data) mod2 <- mvgam( y ~ s(x, bs = 'moi', k = 18), data = mod_data, family = gaussian(), chains = 2, silent = 2 ) plot_predictions( mod2, by = 'x', newdata = data.frame( x = seq(min(x) - 0.5, max(x) + 0.5, length.out = 100) ), points = 0.5 ) plot(mod2, type = 'smooth', realisations = TRUE) # 'by' terms that produce a different smooth for each level of the 'by' # factor are also allowed x <- runif(80) * 4 - 1 x <- sort(x) # Two different monotonic smooths, one for each factor level f <- exp(4 * x) / (1 + exp(4 * x)) f2 <- exp(3.5 * x) / (1 + exp(3 * x)) fac <- c(rep('a', 80), rep('b', 80)) y <- c( f + rnorm(80) * 0.1, f2 + rnorm(80) * 0.2 ) plot(x, y[1:80]) plot(x, y[81:160]) # Gather all data into a data.frame, including the factor 'by' variable mod_data <- data.frame(y, x, fac = as.factor(fac)) mod_data$time <- 1:NROW(mod_data) # Fit a model with different smooths per factor level mod <- mvgam( y ~ s(x, bs = 'moi', by = fac, k = 8), data = mod_data, family = gaussian(), chains = 2, silent = 2 ) # Visualise the different monotonic functions plot_predictions( mod, condition = c('x', 'fac', 'fac'), points = 0.5 ) plot(mod, type = 'smooth', realisations = TRUE) # First derivatives (on the link scale) should never be # negative for either factor level (derivs <- slopes( mod, variables = 'x', by = c('x', 'fac'), type = 'link' )) all(derivs$estimate > 0) ## End(Not run)
This function estimates the posterior distribution for Generalised Additive
Models (GAMs) that can include smooth spline functions, specified in the GAM
formula, as well as latent temporal processes, specified by trend_model.
Further modelling options include State-Space representations to allow covariates and dynamic processes to occur on the latent 'State' level while also capturing observation-level effects. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs.
In addition, model fits can easily be assessed and compared with posterior predictive checks, forecast comparisons and leave-one-out / leave-future-out cross-validation.
mvgam( formula, trend_formula, knots, trend_knots, trend_model = "None", noncentred = FALSE, family = poisson(), share_obs_params = FALSE, data, newdata, use_lv = FALSE, n_lv, trend_map, priors, run_model = TRUE, prior_simulation = FALSE, residuals = TRUE, return_model_data = FALSE, backend = getOption("brms.backend", "cmdstanr"), algorithm = getOption("brms.algorithm", "sampling"), control = list(max_treedepth = 10, adapt_delta = 0.8), chains = 4, burnin = 500, samples = 500, thin = 1, parallel = TRUE, threads = 1, save_all_pars = FALSE, silent = 1, autoformat = TRUE, refit = FALSE, lfo = FALSE, ... )mvgam( formula, trend_formula, knots, trend_knots, trend_model = "None", noncentred = FALSE, family = poisson(), share_obs_params = FALSE, data, newdata, use_lv = FALSE, n_lv, trend_map, priors, run_model = TRUE, prior_simulation = FALSE, residuals = TRUE, return_model_data = FALSE, backend = getOption("brms.backend", "cmdstanr"), algorithm = getOption("brms.algorithm", "sampling"), control = list(max_treedepth = 10, adapt_delta = 0.8), chains = 4, burnin = 500, samples = 500, thin = 1, parallel = TRUE, threads = 1, save_all_pars = FALSE, silent = 1, autoformat = TRUE, refit = FALSE, lfo = FALSE, ... )
formula |
A In |
trend_formula |
An optional Important notes:
|
knots |
An optional |
trend_knots |
As for |
trend_model |
Available options:
Additional features:
|
noncentred |
|
family |
Supported families:
See |
share_obs_params |
|
data |
A Required columns for most models:
Special cases:
|
newdata |
Optional |
use_lv |
|
n_lv |
|
trend_map |
Optional Required structure:
Notes:
|
priors |
An optional |
run_model |
|
prior_simulation |
|
residuals |
|
return_model_data |
|
backend |
Character string naming the package for Stan model fitting.
Options are |
algorithm |
Character string naming the estimation approach:
Can be set globally via |
control |
Named |
chains |
|
burnin |
|
samples |
|
thin |
Thinning interval for monitors. Ignored for variational inference algorithms. |
parallel |
|
threads |
|
save_all_pars |
|
silent |
Verbosity level between |
autoformat |
|
refit |
|
lfo |
|
... |
Further arguments passed to Stan: |
Dynamic GAMs are useful when we wish to predict future values from time series that show temporal dependence but we do not want to rely on extrapolating from a smooth term (which can sometimes lead to unpredictable and unrealistic behaviours). In addition, smooths can often try to wiggle excessively to capture any autocorrelation that is present in a time series, which exacerbates the problem of forecasting ahead.
As GAMs are very naturally viewed through a Bayesian lens, and we often must model time series that show complex distributional features and missing data, parameters for mvgam models are estimated in a Bayesian framework using Markov Chain Monte Carlo by default.
Getting Started Resources:
General overview: vignette("mvgam_overview") and vignette("data_in_mvgam")
Full list of vignettes: vignette(package = "mvgam")
Real-world examples: mvgam_use_cases
Quick reference: mvgam cheatsheet
A list object of class mvgam containing model output, the text
representation of the model file, the mgcv model output (for easily generating
simulations at unsampled covariate values), Dunn-Smyth residuals for each
series and key information needed for other functions in the package. See
mvgam-class for details. Use methods(class = "mvgam") for an
overview on available methods.
Formula Syntax: Details of the formula syntax used by mvgam can be
found in mvgam_formulae. Note that it is possible to supply an
empty formula where there are no predictors or intercepts in the observation
model (i.e. y ~ 0 or y ~ -1). In this case, an intercept-only observation
model will be set up but the intercept coefficient will be fixed at zero. This
can be handy if you wish to fit pure State-Space models where the variation in
the dynamic trend controls the average expectation, and/or where intercepts are
non-identifiable (as in piecewise trends).
Families and Link Functions: Details of families supported by mvgam
can be found in mvgam_families.
Trend Models: Details of latent error process models supported by mvgam
can be found in mvgam_trends.
Default priors for intercepts and any variance parameters are chosen to be
vaguely informative, but these should always be checked by the user. Prior
distributions for most important model parameters can be altered (see
get_mvgam_priors() for details). Note that latent trends are estimated on
the link scale so choose priors accordingly.
However more control over the model specification can be accomplished by setting
run_model = FALSE and then editing the model code (found in the
model_file slot in the returned object) before running the model using either
rstan or cmdstanr. This is encouraged for complex modelling tasks.
Important: No priors are formally checked to ensure they are in the right syntax so it is up to the user to ensure these are correct.
Random Effects: For any smooth terms using the random effect basis
(smooth.construct.re.smooth.spec), a non-centred
parameterisation is automatically employed to avoid degeneracies that are common
in hierarchical models. Note however that centred versions may perform better
for series that are particularly informative, so as with any foray into Bayesian
modelling, it is worth building an understanding of the model's assumptions and
limitations by following a principled workflow. Also note that models are
parameterised using drop.unused.levels = FALSE in jagam
to ensure predictions can be made for all levels of the supplied factor variable.
Observation Level Parameters: When more than one series is included in
data and an observation family that contains more than one parameter is
used, additional observation family parameters (i.e. phi for nb() or sigma
for gaussian()) are by default estimated independently for each series. But if
you wish for the series to share the same observation parameters, set
share_obs_params = TRUE.
Residuals: For each series, randomized quantile (i.e. Dunn-Smyth) residuals are calculated for inspecting model diagnostics. If the fitted model is appropriate then Dunn-Smyth residuals will be standard normal in distribution and no autocorrelation will be evident. When a particular observation is missing, the residual is calculated by comparing independent draws from the model's posterior distribution.
Using Stan: mvgam is primarily designed to use Hamiltonian Monte Carlo
for parameter estimation via the software Stan (using either the cmdstanr
or rstan interface). There are great advantages when using Stan over Gibbs /
Metropolis Hastings samplers, which includes the option to estimate nonlinear
effects via Hilbert space approximate Gaussian Processes,
the availability of a variety of inference algorithms (i.e. variational inference,
laplacian inference etc...) and capabilities to enforce stationarity for complex Vector Autoregressions.
Because of the many advantages of Stan over JAGS, further development of
the package will only be applied to Stan. This includes the planned addition
of more response distributions, plans to handle zero-inflation, and plans to
incorporate a greater variety of trend models. Users are strongly encouraged to
opt for Stan over JAGS in any proceeding workflows.
How to Start: The mvgam cheatsheet
is a good starting place if you are just learning to use the package. It gives
an overview of the package's key functions and objects, as well as providing a
reasonable workflow that new users can follow.
Recommended Steps:
Data Preparation: Check that your data are in a suitable tidy format for mvgam modeling (see the data formatting vignette for guidance)
Data Exploration: Inspect features of the data using plot_mvgam_series.
Now is also a good time to familiarise yourself with the package's example
workflows that are detailed in the vignettes:
Model Structure: Carefully think about how to structure linear predictor
effects (i.e. smooth terms using s(), te()
or ti(), GPs using gp(), dynamic
time-varying effects using dynamic(), and parametric terms), latent temporal
trend components (see mvgam_trends) and the appropriate
observation family (see mvgam_families). Use get_mvgam_priors()
to see default prior distributions for stochastic parameters.
Prior Specification: Change default priors using appropriate prior knowledge
(see prior()). When using State-Space models with a
trend_formula, pay particular attention to priors for any variance parameters
such as process errors and observation errors. Default priors on these parameters
are chosen to be vaguely informative and to avoid zero (using Inverse Gamma
priors), but more informative priors will often help with model efficiency
and convergence.
Model Fitting: Fit the model using either Hamiltonian Monte Carlo or an
approximation algorithm (i.e. change the backend argument) and use
summary.mvgam(), conditional_effects.mvgam(), mcmc_plot.mvgam(),
pp_check.mvgam(), pairs.mvgam() and plot.mvgam() to inspect /
interrogate the model.
Model Comparison: Update the model as needed and use loo_compare.mvgam()
for in-sample model comparisons, or alternatively use forecast.mvgam(),
lfo_cv.mvgam() and score.mvgam_forecast() to compare models based on
out-of-sample forecasts (see the forecast evaluation vignette
for guidance).
Inference and Prediction: When satisfied with the model structure, use
predict.mvgam(), plot_predictions() and/or
plot_slopes() for more targeted simulation-based
inferences (see "How to interpret and report nonlinear effects from Generalized Additive Models"
for some guidance on interpreting GAMs). For time series models, use
hindcast.mvgam(), fitted.mvgam(), augment.mvgam() and forecast.mvgam()
to inspect posterior hindcast / forecast distributions.
Documentation: Use how_to_cite() to obtain a scaffold methods section
(with full references) to begin describing this model in scientific publications.
Nicholas J Clark
Nicholas J Clark & Konstans Wells (2023). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution. 14:3, 771-784.
Nicholas J Clark, SK Morgan Ernest, Henry Senyondo, Juniper Simonis, Ethan P White, Glenda M Yenni, KANK Karunarathna (2025). Beyond single-species models: leveraging multispecies forecasts to navigate the dynamics of ecological predictability. PeerJ. 13:e18929 https://doi.org/10.7717/peerj.18929
jagam(), gam(),
gam.models, get_mvgam_priors(), jsdgam(),
hindcast.mvgam(), forecast.mvgam(), predict.mvgam()
## Not run: # ============================================================================= # Basic Multi-Series Time Series Modeling # ============================================================================= # Simulate three time series that have shared seasonal dynamics, # independent AR(1) trends, and Poisson observations set.seed(0) dat <- sim_mvgam( T = 80, n_series = 3, mu = 2, trend_model = AR(p = 1), prop_missing = 0.1, prop_trend = 0.6 ) # Plot key summary statistics for a single series plot_mvgam_series(data = dat$data_train, series = 1) # Plot all series together plot_mvgam_series(data = dat$data_train, series = "all") # Formulate a model using Stan where series share a cyclic smooth for # seasonality and each series has an independent AR1 temporal process. # Note that 'noncentred = TRUE' will likely give performance gains. # Set run_model = FALSE to inspect the returned objects mod1 <- mvgam( formula = y ~ s(season, bs = "cc", k = 6), data = dat$data_train, trend_model = AR(), family = poisson(), noncentred = TRUE, run_model = FALSE ) # View the model code in Stan language stancode(mod1) # View the data objects needed to fit the model in Stan sdata1 <- standata(mod1) str(sdata1) # Now fit the model mod1 <- mvgam( formula = y ~ s(season, bs = "cc", k = 6), data = dat$data_train, trend_model = AR(), family = poisson(), noncentred = TRUE, chains = 2, silent = 2 ) # Extract the model summary summary(mod1) # Plot the historical trend and hindcast distributions for one series hc_trend <- hindcast(mod1, type = "trend") plot(hc_trend) hc_predicted <- hindcast(mod1, type = "response") plot(hc_predicted) # Residual diagnostics plot(mod1, type = "residuals", series = 1) resids <- residuals(mod1) str(resids) # Fitted values and residuals can be added directly to the training data augment(mod1) # Compute the forecast using covariate information in data_test fc <- forecast(mod1, newdata = dat$data_test) str(fc) fc_summary <- summary(fc) head(fc_summary, 12) plot(fc) # Plot the estimated seasonal smooth function plot(mod1, type = "smooths") # Plot estimated first derivatives of the smooth plot(mod1, type = "smooths", derivatives = TRUE) # Plot partial residuals of the smooth plot(mod1, type = "smooths", residuals = TRUE) # Plot posterior realisations for the smooth plot(mod1, type = "smooths", realisations = TRUE) # Plot conditional response predictions using marginaleffects conditional_effects(mod1) plot_predictions(mod1, condition = "season", points = 0.5) # Generate posterior predictive checks using bayesplot pp_check(mod1) # Extract observation model beta coefficient draws as a data.frame beta_draws_df <- as.data.frame(mod1, variable = "betas") head(beta_draws_df) str(beta_draws_df) # Investigate model fit mc.cores.def <- getOption("mc.cores") options(mc.cores = 1) loo(mod1) options(mc.cores = mc.cores.def) # ============================================================================= # Vector Autoregressive (VAR) Models # ============================================================================= # Fit a model to the portal time series that uses a latent # Vector Autoregression of order 1 mod <- mvgam( formula = captures ~ -1, trend_formula = ~ trend, trend_model = VAR(cor = TRUE), family = poisson(), data = portal_data, chains = 2, silent = 2 ) # Plot the autoregressive coefficient distributions; # use 'dir = "v"' to arrange the order of facets correctly mcmc_plot( mod, variable = 'A', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Plot the process error variance-covariance matrix in the same way mcmc_plot( mod, variable = 'Sigma', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Calculate Generalized Impulse Response Functions for each series irfs <- irf( mod, h = 12, cumulative = FALSE ) # Plot some of them plot(irfs, series = 1) plot(irfs, series = 2) # Calculate forecast error variance decompositions for each series fevds <- fevd(mod, h = 12) # Plot median contributions to forecast error variance plot(fevds) # ============================================================================= # Dynamic Factor Models # ============================================================================= # Now fit a model that uses two RW dynamic factors to model # the temporal dynamics of the four rodent species mod <- mvgam( captures ~ series, trend_model = RW(), use_lv = TRUE, n_lv = 2, data = portal_data, chains = 2, silent = 2 ) # Plot the factors plot(mod, type = 'factors') # Plot the hindcast distributions hcs <- hindcast(mod) plot(hcs, series = 1) plot(hcs, series = 2) plot(hcs, series = 3) plot(hcs, series = 4) # Use residual_cor() to calculate temporal correlations among the series # based on the factor loadings lvcors <- residual_cor(mod) names(lvcors) lvcors$cor # For those correlations whose credible intervals did not include # zero, plot them as a correlation matrix (all other correlations # are shown as zero on this plot) plot(lvcors, cluster = TRUE) # ============================================================================= # Shared Latent Trends with Custom Trend Mapping # ============================================================================= # Example of supplying a trend_map so that some series can share # latent trend processes sim <- sim_mvgam(n_series = 3) mod_data <- sim$data_train # Here, we specify only two latent trends; series 1 and 2 share a trend, # while series 3 has its own unique latent trend trend_map <- data.frame( series = unique(mod_data$series), trend = c(1, 1, 2) ) # Fit the model using AR1 trends mod <- mvgam( formula = y ~ s(season, bs = "cc", k = 6), trend_map = trend_map, trend_model = AR(), data = mod_data, return_model_data = TRUE, chains = 2, silent = 2 ) # The mapping matrix is now supplied as data to the model in the 'Z' element mod$model_data$Z # The first two series share an identical latent trend; the third is different plot(residual_cor(mod)) plot(mod, type = "trend", series = 1) plot(mod, type = "trend", series = 2) plot(mod, type = "trend", series = 3) # ============================================================================= # Time-Varying (Dynamic) Coefficients # ============================================================================= # Example of how to use dynamic coefficients # Simulate a time-varying coefficient for the effect of temperature set.seed(123) N <- 200 beta_temp <- vector(length = N) beta_temp[1] <- 0.4 for (i in 2:N) { beta_temp[i] <- rnorm(1, mean = beta_temp[i - 1] - 0.0025, sd = 0.05) } plot(beta_temp) # Simulate a covariate called 'temp' temp <- rnorm(N, sd = 1) # Simulate some noisy Gaussian observations out <- rnorm(N, mean = 4 + beta_temp * temp, sd = 0.5 ) # Gather necessary data into a data.frame; split into training / testing data <- data.frame(out, temp, time = seq_along(temp)) data_train <- data[1:180, ] data_test <- data[181:200, ] # Fit the model using the dynamic() function mod <- mvgam( formula = out ~ dynamic( temp, scale = FALSE, k = 40 ), family = gaussian(), data = data_train, newdata = data_test, chains = 2, silent = 2 ) # Inspect the model summary, forecast and time-varying coefficient distribution summary(mod) plot(mod, type = "smooths") fc <- forecast(mod, newdata = data_test) plot(fc) # Propagating the smooth term shows how the coefficient is expected to evolve plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 180, lty = "dashed", lwd = 2) points(beta_temp, pch = 16) # ============================================================================= # Working with Offset Terms # ============================================================================= # Example showing how to incorporate an offset; simulate some count data # with different means per series set.seed(100) dat <- sim_mvgam( prop_trend = 0, mu = c(0, 2, 2), seasonality = "hierarchical" ) # Add offset terms to the training and testing data dat$data_train$offset <- 0.5 * as.numeric(dat$data_train$series) dat$data_test$offset <- 0.5 * as.numeric(dat$data_test$series) # Fit a model that includes the offset in the linear predictor as well as # hierarchical seasonal smooths mod <- mvgam( formula = y ~ offset(offset) + s(series, bs = "re") + s(season, bs = "cc") + s(season, by = series, m = 1, k = 5), data = dat$data_train, chains = 2, silent = 2 ) # Inspect the model file to see the modification to the linear predictor (eta) stancode(mod) # Forecasts for the first two series will differ in magnitude fc <- forecast(mod, newdata = dat$data_test) plot(fc, series = 1, ylim = c(0, 75)) plot(fc, series = 2, ylim = c(0, 75)) # Changing the offset for the testing data should lead to changes in # the forecast dat$data_test$offset <- dat$data_test$offset - 2 fc <- forecast(mod, newdata = dat$data_test) plot(fc) # Relative Risks can be computed by fixing the offset to the same value # for each series dat$data_test$offset <- rep(1, NROW(dat$data_test)) preds_rr <- predict(mod, type = "link", newdata = dat$data_test, summary = FALSE ) series1_inds <- which(dat$data_test$series == "series_1") series2_inds <- which(dat$data_test$series == "series_2") # Relative Risks are now more comparable among series layout(matrix(1:2, ncol = 2)) plot(preds_rr[1, series1_inds], type = "l", col = "grey75", ylim = range(preds_rr), ylab = "Series1 Relative Risk", xlab = "Time" ) for (i in 2:50) { lines(preds_rr[i, series1_inds], col = "grey75") } plot(preds_rr[1, series2_inds], type = "l", col = "darkred", ylim = range(preds_rr), ylab = "Series2 Relative Risk", xlab = "Time" ) for (i in 2:50) { lines(preds_rr[i, series2_inds], col = "darkred") } layout(1) # ============================================================================= # Binomial Family Models # ============================================================================= # Example showcasing how cbind() is needed for Binomial observations # Simulate two time series of Binomial trials trials <- sample(c(20:25), 50, replace = TRUE) x <- rnorm(50) detprob1 <- plogis(-0.5 + 0.9 * x) detprob2 <- plogis(-0.1 - 0.7 * x) dat <- rbind( data.frame( y = rbinom(n = 50, size = trials, prob = detprob1), time = 1:50, series = "series1", x = x, ntrials = trials ), data.frame( y = rbinom(n = 50, size = trials, prob = detprob2), time = 1:50, series = "series2", x = x, ntrials = trials ) ) dat <- dplyr::mutate(dat, series = as.factor(series)) dat <- dplyr::arrange(dat, time, series) plot_mvgam_series(data = dat, series = "all") # Fit a model using the binomial() family; must specify observations # and number of trials in the cbind() wrapper mod <- mvgam( formula = cbind(y, ntrials) ~ series + s(x, by = series), family = binomial(), data = dat, chains = 2, silent = 2 ) summary(mod) pp_check(mod, type = "bars_grouped", group = "series", ndraws = 50 ) pp_check(mod, type = "ecdf_overlay_grouped", group = "series", ndraws = 50 ) conditional_effects(mod, type = "link") # To view predictions on the probability scale, # use ntrials = 1 in datagrid() plot_predictions( mod, by = c('x', 'series'), newdata = datagrid( x = runif(100, -2, 2), series = unique, ntrials = 1 ), type = 'expected' ) # Not needed for general use; cleans up connections for automated testing closeAllConnections() ## End(Not run)## Not run: # ============================================================================= # Basic Multi-Series Time Series Modeling # ============================================================================= # Simulate three time series that have shared seasonal dynamics, # independent AR(1) trends, and Poisson observations set.seed(0) dat <- sim_mvgam( T = 80, n_series = 3, mu = 2, trend_model = AR(p = 1), prop_missing = 0.1, prop_trend = 0.6 ) # Plot key summary statistics for a single series plot_mvgam_series(data = dat$data_train, series = 1) # Plot all series together plot_mvgam_series(data = dat$data_train, series = "all") # Formulate a model using Stan where series share a cyclic smooth for # seasonality and each series has an independent AR1 temporal process. # Note that 'noncentred = TRUE' will likely give performance gains. # Set run_model = FALSE to inspect the returned objects mod1 <- mvgam( formula = y ~ s(season, bs = "cc", k = 6), data = dat$data_train, trend_model = AR(), family = poisson(), noncentred = TRUE, run_model = FALSE ) # View the model code in Stan language stancode(mod1) # View the data objects needed to fit the model in Stan sdata1 <- standata(mod1) str(sdata1) # Now fit the model mod1 <- mvgam( formula = y ~ s(season, bs = "cc", k = 6), data = dat$data_train, trend_model = AR(), family = poisson(), noncentred = TRUE, chains = 2, silent = 2 ) # Extract the model summary summary(mod1) # Plot the historical trend and hindcast distributions for one series hc_trend <- hindcast(mod1, type = "trend") plot(hc_trend) hc_predicted <- hindcast(mod1, type = "response") plot(hc_predicted) # Residual diagnostics plot(mod1, type = "residuals", series = 1) resids <- residuals(mod1) str(resids) # Fitted values and residuals can be added directly to the training data augment(mod1) # Compute the forecast using covariate information in data_test fc <- forecast(mod1, newdata = dat$data_test) str(fc) fc_summary <- summary(fc) head(fc_summary, 12) plot(fc) # Plot the estimated seasonal smooth function plot(mod1, type = "smooths") # Plot estimated first derivatives of the smooth plot(mod1, type = "smooths", derivatives = TRUE) # Plot partial residuals of the smooth plot(mod1, type = "smooths", residuals = TRUE) # Plot posterior realisations for the smooth plot(mod1, type = "smooths", realisations = TRUE) # Plot conditional response predictions using marginaleffects conditional_effects(mod1) plot_predictions(mod1, condition = "season", points = 0.5) # Generate posterior predictive checks using bayesplot pp_check(mod1) # Extract observation model beta coefficient draws as a data.frame beta_draws_df <- as.data.frame(mod1, variable = "betas") head(beta_draws_df) str(beta_draws_df) # Investigate model fit mc.cores.def <- getOption("mc.cores") options(mc.cores = 1) loo(mod1) options(mc.cores = mc.cores.def) # ============================================================================= # Vector Autoregressive (VAR) Models # ============================================================================= # Fit a model to the portal time series that uses a latent # Vector Autoregression of order 1 mod <- mvgam( formula = captures ~ -1, trend_formula = ~ trend, trend_model = VAR(cor = TRUE), family = poisson(), data = portal_data, chains = 2, silent = 2 ) # Plot the autoregressive coefficient distributions; # use 'dir = "v"' to arrange the order of facets correctly mcmc_plot( mod, variable = 'A', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Plot the process error variance-covariance matrix in the same way mcmc_plot( mod, variable = 'Sigma', regex = TRUE, type = 'hist', facet_args = list(dir = 'v') ) # Calculate Generalized Impulse Response Functions for each series irfs <- irf( mod, h = 12, cumulative = FALSE ) # Plot some of them plot(irfs, series = 1) plot(irfs, series = 2) # Calculate forecast error variance decompositions for each series fevds <- fevd(mod, h = 12) # Plot median contributions to forecast error variance plot(fevds) # ============================================================================= # Dynamic Factor Models # ============================================================================= # Now fit a model that uses two RW dynamic factors to model # the temporal dynamics of the four rodent species mod <- mvgam( captures ~ series, trend_model = RW(), use_lv = TRUE, n_lv = 2, data = portal_data, chains = 2, silent = 2 ) # Plot the factors plot(mod, type = 'factors') # Plot the hindcast distributions hcs <- hindcast(mod) plot(hcs, series = 1) plot(hcs, series = 2) plot(hcs, series = 3) plot(hcs, series = 4) # Use residual_cor() to calculate temporal correlations among the series # based on the factor loadings lvcors <- residual_cor(mod) names(lvcors) lvcors$cor # For those correlations whose credible intervals did not include # zero, plot them as a correlation matrix (all other correlations # are shown as zero on this plot) plot(lvcors, cluster = TRUE) # ============================================================================= # Shared Latent Trends with Custom Trend Mapping # ============================================================================= # Example of supplying a trend_map so that some series can share # latent trend processes sim <- sim_mvgam(n_series = 3) mod_data <- sim$data_train # Here, we specify only two latent trends; series 1 and 2 share a trend, # while series 3 has its own unique latent trend trend_map <- data.frame( series = unique(mod_data$series), trend = c(1, 1, 2) ) # Fit the model using AR1 trends mod <- mvgam( formula = y ~ s(season, bs = "cc", k = 6), trend_map = trend_map, trend_model = AR(), data = mod_data, return_model_data = TRUE, chains = 2, silent = 2 ) # The mapping matrix is now supplied as data to the model in the 'Z' element mod$model_data$Z # The first two series share an identical latent trend; the third is different plot(residual_cor(mod)) plot(mod, type = "trend", series = 1) plot(mod, type = "trend", series = 2) plot(mod, type = "trend", series = 3) # ============================================================================= # Time-Varying (Dynamic) Coefficients # ============================================================================= # Example of how to use dynamic coefficients # Simulate a time-varying coefficient for the effect of temperature set.seed(123) N <- 200 beta_temp <- vector(length = N) beta_temp[1] <- 0.4 for (i in 2:N) { beta_temp[i] <- rnorm(1, mean = beta_temp[i - 1] - 0.0025, sd = 0.05) } plot(beta_temp) # Simulate a covariate called 'temp' temp <- rnorm(N, sd = 1) # Simulate some noisy Gaussian observations out <- rnorm(N, mean = 4 + beta_temp * temp, sd = 0.5 ) # Gather necessary data into a data.frame; split into training / testing data <- data.frame(out, temp, time = seq_along(temp)) data_train <- data[1:180, ] data_test <- data[181:200, ] # Fit the model using the dynamic() function mod <- mvgam( formula = out ~ dynamic( temp, scale = FALSE, k = 40 ), family = gaussian(), data = data_train, newdata = data_test, chains = 2, silent = 2 ) # Inspect the model summary, forecast and time-varying coefficient distribution summary(mod) plot(mod, type = "smooths") fc <- forecast(mod, newdata = data_test) plot(fc) # Propagating the smooth term shows how the coefficient is expected to evolve plot_mvgam_smooth(mod, smooth = 1, newdata = data) abline(v = 180, lty = "dashed", lwd = 2) points(beta_temp, pch = 16) # ============================================================================= # Working with Offset Terms # ============================================================================= # Example showing how to incorporate an offset; simulate some count data # with different means per series set.seed(100) dat <- sim_mvgam( prop_trend = 0, mu = c(0, 2, 2), seasonality = "hierarchical" ) # Add offset terms to the training and testing data dat$data_train$offset <- 0.5 * as.numeric(dat$data_train$series) dat$data_test$offset <- 0.5 * as.numeric(dat$data_test$series) # Fit a model that includes the offset in the linear predictor as well as # hierarchical seasonal smooths mod <- mvgam( formula = y ~ offset(offset) + s(series, bs = "re") + s(season, bs = "cc") + s(season, by = series, m = 1, k = 5), data = dat$data_train, chains = 2, silent = 2 ) # Inspect the model file to see the modification to the linear predictor (eta) stancode(mod) # Forecasts for the first two series will differ in magnitude fc <- forecast(mod, newdata = dat$data_test) plot(fc, series = 1, ylim = c(0, 75)) plot(fc, series = 2, ylim = c(0, 75)) # Changing the offset for the testing data should lead to changes in # the forecast dat$data_test$offset <- dat$data_test$offset - 2 fc <- forecast(mod, newdata = dat$data_test) plot(fc) # Relative Risks can be computed by fixing the offset to the same value # for each series dat$data_test$offset <- rep(1, NROW(dat$data_test)) preds_rr <- predict(mod, type = "link", newdata = dat$data_test, summary = FALSE ) series1_inds <- which(dat$data_test$series == "series_1") series2_inds <- which(dat$data_test$series == "series_2") # Relative Risks are now more comparable among series layout(matrix(1:2, ncol = 2)) plot(preds_rr[1, series1_inds], type = "l", col = "grey75", ylim = range(preds_rr), ylab = "Series1 Relative Risk", xlab = "Time" ) for (i in 2:50) { lines(preds_rr[i, series1_inds], col = "grey75") } plot(preds_rr[1, series2_inds], type = "l", col = "darkred", ylim = range(preds_rr), ylab = "Series2 Relative Risk", xlab = "Time" ) for (i in 2:50) { lines(preds_rr[i, series2_inds], col = "darkred") } layout(1) # ============================================================================= # Binomial Family Models # ============================================================================= # Example showcasing how cbind() is needed for Binomial observations # Simulate two time series of Binomial trials trials <- sample(c(20:25), 50, replace = TRUE) x <- rnorm(50) detprob1 <- plogis(-0.5 + 0.9 * x) detprob2 <- plogis(-0.1 - 0.7 * x) dat <- rbind( data.frame( y = rbinom(n = 50, size = trials, prob = detprob1), time = 1:50, series = "series1", x = x, ntrials = trials ), data.frame( y = rbinom(n = 50, size = trials, prob = detprob2), time = 1:50, series = "series2", x = x, ntrials = trials ) ) dat <- dplyr::mutate(dat, series = as.factor(series)) dat <- dplyr::arrange(dat, time, series) plot_mvgam_series(data = dat, series = "all") # Fit a model using the binomial() family; must specify observations # and number of trials in the cbind() wrapper mod <- mvgam( formula = cbind(y, ntrials) ~ series + s(x, by = series), family = binomial(), data = dat, chains = 2, silent = 2 ) summary(mod) pp_check(mod, type = "bars_grouped", group = "series", ndraws = 50 ) pp_check(mod, type = "ecdf_overlay_grouped", group = "series", ndraws = 50 ) conditional_effects(mod, type = "link") # To view predictions on the probability scale, # use ntrials = 1 in datagrid() plot_predictions( mod, by = c('x', 'series'), newdata = datagrid( x = runif(100, -2, 2), series = unique, ntrials = 1 ), type = 'expected' ) # Not needed for general use; cleans up connections for automated testing closeAllConnections() ## End(Not run)
Extract quantities that can be used to diagnose sampling behavior of the algorithms applied by Stan at the back-end of mvgam.
## S3 method for class 'mvgam' nuts_params(object, pars = NULL, ...) ## S3 method for class 'mvgam' log_posterior(object, ...) ## S3 method for class 'mvgam' rhat(x, pars = NULL, ...) ## S3 method for class 'mvgam' neff_ratio(object, pars = NULL, ...)## S3 method for class 'mvgam' nuts_params(object, pars = NULL, ...) ## S3 method for class 'mvgam' log_posterior(object, ...) ## S3 method for class 'mvgam' rhat(x, pars = NULL, ...) ## S3 method for class 'mvgam' neff_ratio(object, pars = NULL, ...)
object, x
|
A |
pars |
An optional character vector of parameter names.
For |
... |
Arguments passed to individual methods. |
For more details see
bayesplot-extractors.
The exact form of the output depends on the method.
## Not run: simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2) np <- nuts_params(mod) head(np) # extract the number of divergence transitions sum(subset(np, Parameter == "divergent__")$Value) head(neff_ratio(mod)) ## End(Not run)## Not run: simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2) np <- nuts_params(mod) head(np) # extract the number of divergence transitions sum(subset(np, Parameter == "divergent__")$Value) head(neff_ratio(mod)) ## End(Not run)
Extract posterior draws in conventional formats as data.frames, matrices, or arrays.
## S3 method for class 'mvgam' as.data.frame( x, row.names = NULL, optional = TRUE, variable = "betas", use_alias = TRUE, regex = FALSE, ... ) ## S3 method for class 'mvgam' as.matrix(x, variable = "betas", regex = FALSE, use_alias = TRUE, ...) ## S3 method for class 'mvgam' as.array(x, variable = "betas", regex = FALSE, use_alias = TRUE, ...) ## S3 method for class 'mvgam' as_draws( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_matrix( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_df( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_array( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_list( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_rvars(x, variable = NULL, regex = FALSE, inc_warmup = FALSE, ...)## S3 method for class 'mvgam' as.data.frame( x, row.names = NULL, optional = TRUE, variable = "betas", use_alias = TRUE, regex = FALSE, ... ) ## S3 method for class 'mvgam' as.matrix(x, variable = "betas", regex = FALSE, use_alias = TRUE, ...) ## S3 method for class 'mvgam' as.array(x, variable = "betas", regex = FALSE, use_alias = TRUE, ...) ## S3 method for class 'mvgam' as_draws( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_matrix( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_df( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_array( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_list( x, variable = NULL, regex = FALSE, inc_warmup = FALSE, use_alias = TRUE, ... ) ## S3 method for class 'mvgam' as_draws_rvars(x, variable = NULL, regex = FALSE, inc_warmup = FALSE, ...)
x |
|
row.names |
Ignored |
optional |
Ignored |
variable |
A character specifying which parameters to extract. Can either be one of the following options:
OR can be a character vector providing the variables to extract. |
use_alias |
Logical. If more informative names for parameters are
available (i.e. for beta coefficients |
regex |
Logical. If not using one of the prespecified options for
extractions, should |
... |
Ignored |
inc_warmup |
Should warmup draws be included? Defaults to |
A data.frame, matrix, or array containing the posterior draws.
Nicholas J Clark
## Not run: sim <- sim_mvgam(family = Gamma()) mod1 <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), data = sim$data_train, family = Gamma(), chains = 2, silent = 2 ) beta_draws_df <- as.data.frame(mod1, variable = 'betas') head(beta_draws_df) str(beta_draws_df) beta_draws_mat <- as.matrix(mod1, variable = 'betas') head(beta_draws_mat) str(beta_draws_mat) shape_pars <- as.matrix(mod1, variable = 'shape', regex = TRUE) head(shape_pars) ## End(Not run)## Not run: sim <- sim_mvgam(family = Gamma()) mod1 <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), data = sim$data_train, family = Gamma(), chains = 2, silent = 2 ) beta_draws_df <- as.data.frame(mod1, variable = 'betas') head(beta_draws_df) str(beta_draws_df) beta_draws_mat <- as.matrix(mod1, variable = 'betas') head(beta_draws_mat) str(beta_draws_mat) shape_pars <- as.matrix(mod1, variable = 'shape', regex = TRUE) head(shape_pars) ## End(Not run)
Supported mvgam families
tweedie(link = "log") student_t(link = "identity") betar(...) nb(...) lognormal(...) student(...) bernoulli(...) beta_binomial(...) nmix(link = "log")tweedie(link = "log") student_t(link = "identity") betar(...) nb(...) lognormal(...) student(...) bernoulli(...) beta_binomial(...) nmix(link = "log")
link |
a specification for the family link function. At present these cannot be changed |
... |
Arguments to be passed to the mgcv version of the associated functions |
mvgam currently supports the following standard observation families:
gaussian with identity link, for real-valued data
poisson with log-link, for count data
Gamma with log-link, for non-negative real-valued data
binomial with logit-link, for count data when the number
of trials is known (and must be supplied)
In addition, the following extended families from the mgcv and brms packages are supported:
betar with logit-link, for proportional data on (0,1)
nb with log-link, for count data
lognormal with identity-link, for non-negative real-valued data
bernoulli with logit-link, for binary data
beta_binomial with logit-link, as for binomial() but allows
for overdispersion
Finally, mvgam supports the three extended families described here:
tweedie with log-link, for count data (power parameter p fixed at 1.5)
student_t() (or student) with identity-link, for real-valued data
nmix for count data with imperfect detection modeled via a
State-Space N-Mixture model. The latent states are Poisson (with log link), capturing the 'true' latent
abundance, while the observation process is Binomial to account for imperfect detection. The
observation formula in these models is used to set up a linear predictor for the detection
probability (with logit link). See the example below for a more detailed worked explanation
of the nmix() family
Only poisson(), nb(), and tweedie() are available if
using JAGS. All families, apart from tweedie(), are supported if
using Stan.
Note that currently it is not possible to change the default link
functions in mvgam, so any call to change these will be silently ignored
Objects of class family
Nicholas J Clark
## Not run: # ============================================================================= # N-mixture Models # ============================================================================= set.seed(999) # Simulate observations for species 1, which shows a declining trend and # 0.7 detection probability data.frame( site = 1, # five replicates per year; six years replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = 'sp_1', # true abundance declines nonlinearly truth = c( rep(28, 5), rep(26, 5), rep(23, 5), rep(16, 5), rep(14, 5), rep(14, 5) ), # observations are taken with detection prob = 0.7 obs = c( rbinom(5, 28, 0.7), rbinom(5, 26, 0.7), rbinom(5, 23, 0.7), rbinom(5, 15, 0.7), rbinom(5, 14, 0.7), rbinom(5, 14, 0.7) ) ) %>% # add 'series' information, which is an identifier of site, replicate # and species dplyr::mutate( series = paste0( 'site_', site, '_', species, '_rep_', replicate ), time = as.numeric(time), # add a 'cap' variable that defines the maximum latent N to # marginalize over when estimating latent abundance; in other words # how large do we realistically think the true abundance could be? cap = 80 ) %>% dplyr::select(-replicate) -> testdat # Now add another species that has a different temporal trend and a # smaller detection probability (0.45 for this species) testdat <- testdat %>% dplyr::bind_rows( data.frame( site = 1, replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = 'sp_2', truth = c( rep(4, 5), rep(7, 5), rep(15, 5), rep(16, 5), rep(19, 5), rep(18, 5) ), obs = c( rbinom(5, 4, 0.45), rbinom(5, 7, 0.45), rbinom(5, 15, 0.45), rbinom(5, 16, 0.45), rbinom(5, 19, 0.45), rbinom(5, 18, 0.45) ) ) %>% dplyr::mutate( series = paste0( 'site_', site, '_', species, '_rep_', replicate ), time = as.numeric(time), cap = 50 ) %>% dplyr::select(-replicate) ) # series identifiers testdat$species <- factor( testdat$species, levels = unique(testdat$species) ) testdat$series <- factor( testdat$series, levels = unique(testdat$series) ) # The trend_map to state how replicates are structured testdat %>% # each unique combination of site*species is a separate process dplyr::mutate( trend = as.numeric(factor(paste0(site, species))) ) %>% dplyr::select(trend, series) %>% dplyr::distinct() -> trend_map trend_map # Fit a model mod <- mvgam( # the observation formula sets up linear predictors for # detection probability on the logit scale formula = obs ~ species - 1, # the trend_formula sets up the linear predictors for # the latent abundance processes on the log scale trend_formula = ~ s(time, by = trend, k = 4) + species, # the trend_map takes care of the mapping trend_map = trend_map, # nmix() family and data family = nmix(), data = testdat, # priors can be set in the usual way priors = c( prior(std_normal(), class = b), prior(normal(1, 1.5), class = Intercept_trend) ), chains = 2 ) # The usual diagnostics summary(mod) # Plotting conditional effects library(ggplot2) plot_predictions( mod, condition = 'species', type = 'detection' ) + ylab('Pr(detection)') + ylim(c(0, 1)) + theme_classic() + theme(legend.position = 'none') # ============================================================================= # Binomial Models # ============================================================================= # Simulate two time series of Binomial trials trials <- sample(c(20:25), 50, replace = TRUE) x <- rnorm(50) detprob1 <- plogis(-0.5 + 0.9 * x) detprob2 <- plogis(-0.1 - 0.7 * x) dat <- rbind( data.frame( y = rbinom(n = 50, size = trials, prob = detprob1), time = 1:50, series = 'series1', x = x, ntrials = trials ), data.frame( y = rbinom(n = 50, size = trials, prob = detprob2), time = 1:50, series = 'series2', x = x, ntrials = trials ) ) dat <- dplyr::mutate(dat, series = as.factor(series)) dat <- dplyr::arrange(dat, time, series) # Fit a model using the binomial() family; must specify observations # and number of trials in the cbind() wrapper mod <- mvgam( cbind(y, ntrials) ~ series + s(x, by = series), family = binomial(), data = dat ) summary(mod) ## End(Not run)## Not run: # ============================================================================= # N-mixture Models # ============================================================================= set.seed(999) # Simulate observations for species 1, which shows a declining trend and # 0.7 detection probability data.frame( site = 1, # five replicates per year; six years replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = 'sp_1', # true abundance declines nonlinearly truth = c( rep(28, 5), rep(26, 5), rep(23, 5), rep(16, 5), rep(14, 5), rep(14, 5) ), # observations are taken with detection prob = 0.7 obs = c( rbinom(5, 28, 0.7), rbinom(5, 26, 0.7), rbinom(5, 23, 0.7), rbinom(5, 15, 0.7), rbinom(5, 14, 0.7), rbinom(5, 14, 0.7) ) ) %>% # add 'series' information, which is an identifier of site, replicate # and species dplyr::mutate( series = paste0( 'site_', site, '_', species, '_rep_', replicate ), time = as.numeric(time), # add a 'cap' variable that defines the maximum latent N to # marginalize over when estimating latent abundance; in other words # how large do we realistically think the true abundance could be? cap = 80 ) %>% dplyr::select(-replicate) -> testdat # Now add another species that has a different temporal trend and a # smaller detection probability (0.45 for this species) testdat <- testdat %>% dplyr::bind_rows( data.frame( site = 1, replicate = rep(1:5, 6), time = sort(rep(1:6, 5)), species = 'sp_2', truth = c( rep(4, 5), rep(7, 5), rep(15, 5), rep(16, 5), rep(19, 5), rep(18, 5) ), obs = c( rbinom(5, 4, 0.45), rbinom(5, 7, 0.45), rbinom(5, 15, 0.45), rbinom(5, 16, 0.45), rbinom(5, 19, 0.45), rbinom(5, 18, 0.45) ) ) %>% dplyr::mutate( series = paste0( 'site_', site, '_', species, '_rep_', replicate ), time = as.numeric(time), cap = 50 ) %>% dplyr::select(-replicate) ) # series identifiers testdat$species <- factor( testdat$species, levels = unique(testdat$species) ) testdat$series <- factor( testdat$series, levels = unique(testdat$series) ) # The trend_map to state how replicates are structured testdat %>% # each unique combination of site*species is a separate process dplyr::mutate( trend = as.numeric(factor(paste0(site, species))) ) %>% dplyr::select(trend, series) %>% dplyr::distinct() -> trend_map trend_map # Fit a model mod <- mvgam( # the observation formula sets up linear predictors for # detection probability on the logit scale formula = obs ~ species - 1, # the trend_formula sets up the linear predictors for # the latent abundance processes on the log scale trend_formula = ~ s(time, by = trend, k = 4) + species, # the trend_map takes care of the mapping trend_map = trend_map, # nmix() family and data family = nmix(), data = testdat, # priors can be set in the usual way priors = c( prior(std_normal(), class = b), prior(normal(1, 1.5), class = Intercept_trend) ), chains = 2 ) # The usual diagnostics summary(mod) # Plotting conditional effects library(ggplot2) plot_predictions( mod, condition = 'species', type = 'detection' ) + ylab('Pr(detection)') + ylim(c(0, 1)) + theme_classic() + theme(legend.position = 'none') # ============================================================================= # Binomial Models # ============================================================================= # Simulate two time series of Binomial trials trials <- sample(c(20:25), 50, replace = TRUE) x <- rnorm(50) detprob1 <- plogis(-0.5 + 0.9 * x) detprob2 <- plogis(-0.1 - 0.7 * x) dat <- rbind( data.frame( y = rbinom(n = 50, size = trials, prob = detprob1), time = 1:50, series = 'series1', x = x, ntrials = trials ), data.frame( y = rbinom(n = 50, size = trials, prob = detprob2), time = 1:50, series = 'series2', x = x, ntrials = trials ) ) dat <- dplyr::mutate(dat, series = as.factor(series)) dat <- dplyr::arrange(dat, time, series) # Fit a model using the binomial() family; must specify observations # and number of trials in the cbind() wrapper mod <- mvgam( cbind(y, ntrials) ~ series + s(x, by = series), family = binomial(), data = dat ) summary(mod) ## End(Not run)
mvgam_fevd object descriptionA mvgam_fevd object returned by function fevd(). Run
methods(class = "mvgam_fevd") to see an overview of available methods.
A forecast error variance decomposition is useful for quantifying
the amount of information each series that in a Vector Autoregression
contributes to the forecast distributions of the other series in the
autoregression. This object contains the forecast error variance
decomposition using the orthogonalised impulse response coefficient
matrices , which can be used to quantify the contribution of
series to the h-step forecast error variance of series :
If the orthogonalised impulse reponses are divided by the variance of the forecast error
, this yields an interpretable percentage representing
how much of the forecast error variance for can be explained by an
exogenous shock to . This percentage is what is calculated and
returned in objects of class mvgam_fevd, where the posterior
distribution of variance decompositions for each variable in the original
model is contained in a separate slot within the returned list object
Nicholas J Clark
Lütkepohl, H (2006). New Introduction to Multiple Time Series Analysis. Springer, New York.
mvgam_forecast object descriptionA mvgam_forecast object returned by function hindcast
or forecast. Run methods(class = "mvgam_forecast") to see
an overview of available methods.
A mvgam_forecast object contains the following elements:
call the original observation model formula
trend_call If a trend_formula was supplied, the original trend
model formula is returned. Otherwise NULL
family character description of the observation distribution
family_pars list containing draws of family-specific
parameters (i.e. shape, scale or overdispersion parameters). Only
returned if type = link. Otherwise NULL
trend_model character description of the latent trend model
drift Logical specifying whether a drift term was used in the
trend model
use_lv Logical flag indicating whether latent dynamic factors were
used in the model
fit_engine Character describing the fit engine, either as stan
or jags
type The type of predictions included (either link, response
or trend)
series_names Names of the time series, taken from
levels(data$series) in the original model fit
train_observations A list of training observation vectors of
length n_series
train_times A list of the unique training times of length
n_series
test_observations If the forecast function was used,
a list of test observation vectors of length n_series. Otherwise
NULL
test_times If the forecast function was used, a
list of the unique testing (validation) times of length n_series.
Otherwise NULL
hindcasts A list of posterior hindcast distributions of length
n_series.
forecasts If the forecast function was used, a
list of posterior forecast distributions of length n_series.
Otherwise NULL
Nicholas J Clark
mvgam, hindcast.mvgam, forecast.mvgam
Details of formula specifications in mvgam models
mvgam will accept an observation model formula and an optional
process model formula (via the argument trend_formula). Neither of these formulae can
be specified as lists, contrary to the accepted behaviour in some mgcv or brms models.
Note that it is possible to supply an empty formula where
there are no predictors or intercepts in the observation model (i.e. y ~ 0 or y ~ -1).
In this case, an intercept-only observation model will be set up but the intercept coefficient
will be fixed at zero. This can be handy if you wish to fit pure State-Space models where
the variation in the dynamic trend controls the average expectation, and/or where intercepts
are non-identifiable.
The formulae supplied to mvgam and jsdgam
are exactly like those supplied to
glm except that smooth terms,
s,
te,
ti and
t2,
time-varying effects using dynamic,
monotonically increasing (using s(x, bs = 'moi'))
or decreasing splines (using s(x, bs = 'mod');
see smooth.construct.moi.smooth.spec for
details), as well as
Gaussian Process functions using gp and offsets using
offset
can be added to the right hand side (and . is not supported in mvgam formulae).
Further details on specifying different kinds of smooth functions, and how to control their behaviours
by modifying their potential complexities and / or how the penalties behave, can be found in the
extensive documentation for the mgcv package.
Nicholas J Clark
mvgam,
formula.gam,
gam.models,
jagam,
gam,
s,
gp,
formula
mvgam_irf object descriptionA mvgam_irf object returned by function irf.
Run methods(class = "mvgam_irf") to see an overview of available methods.
Generalized or Orthogonalized Impulse Response Functions can be
computed using the posterior estimates of Vector Autoregressive parameters.
This function generates a positive "shock" for a target process at time
t = 0 and then calculates how each of the remaining processes in the
latent VAR are expected to respond over the forecast horizon h. The
function computes IRFs for all processes in the object and returns them in
an array that can be plotted using the S3 plot function. To inspect
community-level metrics of stability using latent VAR processes, you can
use the related stability() function.
A mvgam_irf object contains a list of posterior impulse response
functions, each stored as its own list
Nicholas J Clark
PH Pesaran & Shin Yongcheol (1998). Generalized impulse response analysis in linear multivariate models. Economics Letters 58: 17–29.
Helper functions for marginaleffects calculations in mvgam models
Functions needed for working with marginaleffects
Functions needed for getting data / objects with insight
## S3 method for class 'mvgam' get_coef(model, trend_effects = FALSE, ...) ## S3 method for class 'mvgam' set_coef(model, coefs, trend_effects = FALSE, ...) ## S3 method for class 'mvgam' get_vcov(model, vcov = NULL, ...) ## S3 method for class 'mvgam' get_predict( model, newdata, type = "response", mfx, newparams, ndraws, se.fit, process_error = FALSE, ... ) ## S3 method for class 'mvgam' get_data(x, source = "environment", verbose = TRUE, ...) ## S3 method for class 'mvgam_prefit' get_data(x, source = "environment", verbose = TRUE, ...) ## S3 method for class 'mvgam' find_predictors( x, effects = c("fixed", "random", "all"), component = c("all", "conditional", "zi", "zero_inflated", "dispersion", "instruments", "correlation", "smooth_terms"), flatten = FALSE, verbose = TRUE, ... ) ## S3 method for class 'mvgam_prefit' find_predictors( x, effects = c("fixed", "random", "all"), component = c("all", "conditional", "zi", "zero_inflated", "dispersion", "instruments", "correlation", "smooth_terms"), flatten = FALSE, verbose = TRUE, ... )## S3 method for class 'mvgam' get_coef(model, trend_effects = FALSE, ...) ## S3 method for class 'mvgam' set_coef(model, coefs, trend_effects = FALSE, ...) ## S3 method for class 'mvgam' get_vcov(model, vcov = NULL, ...) ## S3 method for class 'mvgam' get_predict( model, newdata, type = "response", mfx, newparams, ndraws, se.fit, process_error = FALSE, ... ) ## S3 method for class 'mvgam' get_data(x, source = "environment", verbose = TRUE, ...) ## S3 method for class 'mvgam_prefit' get_data(x, source = "environment", verbose = TRUE, ...) ## S3 method for class 'mvgam' find_predictors( x, effects = c("fixed", "random", "all"), component = c("all", "conditional", "zi", "zero_inflated", "dispersion", "instruments", "correlation", "smooth_terms"), flatten = FALSE, verbose = TRUE, ... ) ## S3 method for class 'mvgam_prefit' find_predictors( x, effects = c("fixed", "random", "all"), component = c("all", "conditional", "zi", "zero_inflated", "dispersion", "instruments", "correlation", "smooth_terms"), flatten = FALSE, verbose = TRUE, ... )
model |
Model object |
trend_effects |
|
... |
Additional arguments are passed to the |
coefs |
vector of coefficients to insert in the model object |
vcov |
Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
|
newdata |
Grid of predictor values at which we evaluate the slopes.
|
type |
string indicates the type (scale) of the predictions used to
compute contrasts or slopes. This can differ based on the model
type, but will typically be a string such as: "response", "link", "probs",
or "zero". When an unsupported string is entered, the model-specific list of
acceptable values is returned in an error message. When |
mfx |
Ignored |
newparams |
Ignored |
ndraws |
Ignored |
se.fit |
Ignored |
process_error |
|
x |
A fitted model. |
source |
String, indicating from where data should be recovered. If
|
verbose |
Toggle messages and warnings. |
effects |
Should model data for fixed effects ( |
component |
Which type of parameters to return, such as parameters for the conditional model, the zero-inflated part of the model, the dispersion term, the instrumental variables or marginal effects be returned? Applies to models with zero-inflated and/or dispersion formula, or to models with instrumental variables (so called fixed-effects regressions), or models with marginal effects (from mfx). See details in section Model Components .May be abbreviated. Note that the conditional component also refers to the count or mean component - names may differ, depending on the modeling package. There are three convenient shortcuts (not applicable to all model classes):
|
flatten |
Logical, if |
Objects suitable for internal 'marginaleffects' functions to proceed.
See marginaleffects::get_coef(), marginaleffects::set_coef(),
marginaleffects::get_vcov(), marginaleffects::get_predict(),
insight::get_data() and insight::find_predictors() for details
Nicholas J Clark
mvgam_residcor object descriptionA mvgam_residcor object returned by function residual_cor().
Run methods(class = "mvgam_residcor") to see an overview of available methods.
Hui (2016) provides an excellent description of the quantities that this function calculates, so this passage is heavily paraphrased from his associated boral package.
In latent factor models, the residual covariance matrix is calculated
based on the matrix of latent factor loading matrix , where the residual covariance
matrix . A strong residual covariance/correlation matrix
between two species can be interpreted as evidence of species interactions (e.g.,
facilitation or competition),
missing covariates, as well as any additional species correlation not accounted for by shared
environmental captured in formula.
The residual precision matrix (also known as partial correlation matrix, Ovaskainen et al., 2016) is defined as the inverse of the residual correlation matrix. The precision matrix is often used to identify direct or causal relationships between two species e.g., two species can have a zero precision but still be correlated, which can be interpreted as saying that two species are not directly associated, but they are still correlated through other species. In other words, they are conditionally independent given the other species. It is important that the precision matrix does not exhibit the exact same properties of the correlation e.g., the diagonal elements are not equal to 1. Nevertheless, relatively larger values of precision may imply stronger direct relationships between two species.
In addition to the residual correlation and precision matrices, the median or mean point estimator
of trace of the residual covariance matrix is returned,
. Often used in other areas of multivariate
statistics, the trace may be interpreted as the amount of covariation explained by the latent factors.
One situation where the trace may be useful is when comparing a pure latent factor model
(where no terms are suppled to formula) versus a model with latent
factors and some additional predictors in formula – the proportional difference in trace
between these two models may be interpreted as the proportion of covariation between species explained
by the predictors in formula. Of course, the trace itself is random due to the MCMC sampling, and so it
is not always guaranteed to produce sensible answers.
Objects of this class are structured as a list with the following components:
cor, cor_lower, cor_upper
|
A set of |
sig_cor |
A |
prec, prec_lower, prec_upper
|
A set of |
sig_prec |
A |
cov |
A |
trace |
The median/mean point estimator of the trace (sum of the diagonal elements)
of the residual covariance matrix |
Nicholas J Clark
Francis KC Hui (2016). BORAL - Bayesian ordination and regression analysis of
multivariate abundance data in R. Methods in Ecology and Evolution. 7, 744-750.
Otso Ovaskainen et al. (2016). Using latent variable models to identify large networks of
species-to-species associations at different spatial scales. Methods in Ecology and Evolution,
7, 549-555.
Supported latent trend models in mvgam
mvgam currently supports the following dynamic trend models:
None (no latent trend component; i.e. the GAM component is all
that contributes to the linear predictor, and the observation process is
the only source of error; similar to what is estimated by
gam)
ZMVN() (zero-mean correlated errors, useful for modelling time
series where no autoregressive terms are needed or for modelling data
that are not sampled as time series)
RW()
AR(p = 1, 2, or 3)
CAR(p = 1) (continuous time autoregressive trends; only available
in Stan)
VAR() (only available in Stan)
PW() (piecewise linear or logistic trends; only available in
Stan)
GP() (Gaussian Process with squared exponential kernel; only
available in Stan)
For most dynamic trend types available in mvgam (see argument
trend_model), time should be measured in discrete, regularly spaced
intervals (i.e. c(1, 2, 3, ...)). However, you can use irregularly spaced
intervals if using trend_model = CAR(1), though note that any temporal
intervals that are exactly 0 will be adjusted to a very small number
(1e-12) to prevent sampling errors.
For all autoregressive trend types apart from CAR(), moving average and/or
correlated process error terms can also be estimated (for example,
RW(cor = TRUE) will set up a multivariate Random Walk if data contains
>1 series). Hierarchical process error correlations can also be handled if
the data contain relevant observation units that are nested into relevant
grouping and subgrouping levels (i.e. using
AR(gr = region, subgr = species)).
Note that only RW, AR1, AR2 and AR3 are available if using JAGS.
All trend models are supported if using Stan.
Dynamic factor models can be used in which the latent factors evolve as
either RW, AR1-3, VAR or GP. For VAR models (i.e. VAR and
VARcor models), users can either fix the trend error covariances to be 0
(using VAR) or estimate them and potentially allow for contemporaneously
correlated errors using VARcor.
For all VAR models, stationarity of the latent process is enforced through
the prior using the parameterisation given by Heaps (2022). Stationarity is
not enforced when using AR1, AR2 or AR3 models, though this can be
changed by the user by specifying lower and upper bounds on autoregressive
parameters using functionality in get_mvgam_priors and the priors
argument in mvgam.
Piecewise trends follow the formulation in the popular prophet package
produced by Facebook, where users can allow for changepoints to control
the potential flexibility of the trend. See Taylor and Letham (2018) for
details.
Sarah E. Heaps (2022) Enforcing stationarity through the prior in Vector Autoregressions. Journal of Computational and Graphical Statistics. 32:1, 1–10.
Sean J. Taylor and Benjamin Letham (2018) Forecasting at scale. The American Statistician 72.1, 37–45.
RW,
AR,
CAR,
VAR,
PW,
GP,
ZMVN
mvgam is a package for fitting dynamic generalized additive models (GAMs) to univariate or multivariate data. It combines the flexibility of smooth functions with latent temporal processes to model autocorrelation, seasonality, and uncertainty. The package supports both univariate and multivariate time series, making it especially useful for ecological and environmental forecasting. Bayesian inference via Stan allows for full uncertainty quantification and forecasting in complex, non-Gaussian settings.
This help page provides external links to example applications and discussions relevant to the use of mvgam models. These examples span non-Gaussian time series modelling, multivariate abundance forecasting, and the use of complex predictors such as time-varying seasonality, monotonic nonlinear effects and Gaussian processes.
Non-Gaussian time series modelling and forecasting
mvgam is designed for real-world time series data that include discrete, zero-inflated, or overdispersed observations. It supports latent dynamic components and smooth terms to model autocorrelation, trends, and uncertainty.
Uncertain serial autocorrelation in GAM count model residuals
Fitting an autoregressive model and Poisson process interdependently
Visualising autocorrelation in irregularly spaced count data
Video tutorial: Ecological forecasting with Dynamic Generalized Additive Models
Multivariate time series modelling and forecasting
mvgam supports multivariate models with shared or correlated latent trends, making it suitable for a broad range of applications that gather data on multiple time series simultaneously.
Ecological modelling: multivariate abundance time-series data
Chains stuck in a local optimum: correlated Poisson distributions
Blog post: Hierarchical distributed lag models in mgcv and mvgam
Video tutorial: Time series in R and Stan using the mvgam package: hierarchical GAMs
Seasonality and other complex predictors
mvgam allows for flexible modelling of seasonal patterns and nonlinear effects using cyclic smooths, Gaussian processes, monotonic smooths and hierarchical structures.
Fitting a GAM with double seasonality to a daily time series
Blog post: Incorporating time-varying seasonality in forecast models
Video tutorial: Time series in R and Stan using the mvgam package: an introduction
Nicholas J Clark
mvgam object descriptionA fitted mvgam object returned by function mvgam.
Run methods(class = "mvgam") to see an overview of available methods.
A mvgam object contains the following elements:
call the original observation model formula
trend_call If a trend_formula was supplied, the original trend model
formula is returned. Otherwise NULL
family character description of the observation distribution
trend_model character description of the latent trend model
trend_map data.frame describing the mapping of trend states to
observations, if supplied in the original model. Otherwise NULL
drift Logical specifying whether a drift term was used in the trend
model
priors If the model priors were updated from their defaults, the prior
dataframe will be returned. Otherwise NULL
model_output The MCMC object returned by the fitting engine. If the
model was fitted using Stan, this will be an object of class stanfit
(see stanfit-class for details). If JAGS was used
as the backend, this will be an object of class runjags (see
runjags-class for details)
model_file The character string model file used to describe the model
in either Stan or JAGS syntax
model_data If return_model_data was set to TRUE when fitting the
model, the list object containing all data objects needed to condition
the model is returned. Each item in the list is described in detail at
the top of the model_file. Otherwise NULL
inits If return_model_data was set to TRUE when fitting the model,
the initial value functions used to initialise the MCMC chains will be
returned. Otherwise NULL
monitor_pars The parameters that were monitored during MCMC sampling
are returned as a character vector
sp_names A character vector specifying the names for each smoothing
parameter
mgcv_model An object of class gam containing the mgcv version of
the observation model. This object is used for generating the linear
predictor matrix when making predictions for new data. The coefficients
in this model object will contain the posterior median coefficients from
the GAM linear predictor, but these are only used if generating plots of
smooth functions that mvgam currently cannot handle (such as plots for
three-dimensional smooths). This model therefore should not be used for
inference. See gamObject for details
trend_mgcv_model If a trend_formula was supplied, an object of class
gam containing the mgcv version of the trend model. Otherwise NULL
ytimes The matrix object used in model fitting for indexing which
series and timepoints were observed in each row of the supplied data.
Used internally by some downstream plotting and prediction functions
resids A named list object containing posterior draws of Dunn-Smyth
randomized quantile residuals
use_lv Logical flag indicating whether latent dynamic factors were used
in the model
n_lv If use_lv == TRUE, the number of latent dynamic factors used in
the model
upper_bounds If bounds were supplied in the original model fit, they
will be returned. Otherwise NULL
obs_data The original data object (either a list or dataframe)
supplied in model fitting.
test_data If test data were supplied (as argument newdata in the
original model), it will be returned. Othwerise NULL
fit_engine Character describing the fit engine, either as stan or
jags
backend Character describing the backend used for modelling, either
as rstan, cmdstanr or rjags
algorithm Character describing the algorithm used for finding the
posterior, either as sampling, laplace, pathfinder, meanfield or
fullrank
max_treedepth If the model was fitted using Stan, the value supplied
for the maximum treedepth tuning parameter is returned (see
stan for details). Otherwise NULL
adapt_delta If the model was fitted using Stan, the value supplied
for the adapt_delta tuning parameter is returned (see
stan for details). Otherwise NULL
Nicholas J Clark
Plot an ordination of latent variables and their factor loadings from
jsdgam models
ordinate(object, ...) ## S3 method for class 'jsdgam' ordinate( object, which_lvs = c(1, 2), biplot = TRUE, alpha = 0.5, label_sites = TRUE, ... )ordinate(object, ...) ## S3 method for class 'jsdgam' ordinate( object, which_lvs = c(1, 2), biplot = TRUE, alpha = 0.5, label_sites = TRUE, ... )
object |
|
... |
ignored |
which_lvs |
A |
biplot |
|
alpha |
A proportional numeric scalar between |
label_sites |
|
This function constructs a two-dimensional scatterplot in ordination space.
The chosen latent variables are first re-rotated using singular value
decomposition, so that the first plotted latent variable does not have to
be the first latent variable that was estimated in the original model.
Posterior median estimates of the variables and the species' loadings on
these variables are then used to construct the resulting plot. Some attempt
at de-cluttering the resulting plot is made by using geom_label_repel()
and geom_text_repel from the ggrepel package, but if there are many
sites and/or species then some labels may be removed automatically. Note
that you can typically get better, more readable plot layouts if you also
have the ggarrow and ggpp packages installed
An ggplot object
Nicholas J Clark
## Not run: # Fit a JSDGAM to the portal_data captures mod <- jsdgam( formula = captures ~ # Fixed effects of NDVI and mintemp, row effect as a GP of time ndvi_ma12:series + mintemp:series + gp(time, k = 15), factor_formula = ~ -1, data = portal_data, unit = time, species = series, family = poisson(), n_lv = 2, silent = 2, chains = 2 ) # Plot a residual ordination biplot ordinate( mod, alpha = 0.7 ) # Compare to a residual correlation plot plot( residual_cor(mod) ) ## End(Not run)## Not run: # Fit a JSDGAM to the portal_data captures mod <- jsdgam( formula = captures ~ # Fixed effects of NDVI and mintemp, row effect as a GP of time ndvi_ma12:series + mintemp:series + gp(time, k = 15), factor_formula = ~ -1, data = portal_data, unit = time, species = series, family = poisson(), n_lv = 2, silent = 2, chains = 2 ) # Plot a residual ordination biplot ordinate( mod, alpha = 0.7 ) # Compare to a residual correlation plot plot( residual_cor(mod) ) ## End(Not run)
mvgam objectA pairs
method that is customized for MCMC output.
## S3 method for class 'mvgam' pairs(x, variable = NULL, regex = FALSE, use_alias = TRUE, ...)## S3 method for class 'mvgam' pairs(x, variable = NULL, regex = FALSE, use_alias = TRUE, ...)
x |
An object of class |
variable |
Names of the variables (parameters) to plot, as given by a
character vector or a regular expression (if |
regex |
Logical; Indicates whether |
use_alias |
Logical. If more informative names for parameters are
available (i.e. for beta coefficients |
... |
Further arguments to be passed to
|
For a detailed description see
mcmc_pairs.
Plottable objects whose classes depend on the arguments supplied.
See mcmc_pairs for details.
## Not run: simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2) pairs(mod) pairs(mod, variable = c('ar1', 'sigma'), regex = TRUE) ## End(Not run)## Not run: simdat <- sim_mvgam(n_series = 1, trend_model = 'AR1') mod <- mvgam(y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2) pairs(mod) pairs(mod, variable = c('ar1', 'sigma'), regex = TRUE) ## End(Not run)
This function takes a fitted mvgam object and returns plots and
summary statistics for the latent dynamic factors
plot_mvgam_factors(object, plot = TRUE)plot_mvgam_factors(object, plot = TRUE)
object |
|
plot |
|
If the model in object was estimated using dynamic factors,
it is possible that not all factors contributed to the estimated trends.
This is due to the regularisation penalty that acts independently on each
factor's Gaussian precision, which will squeeze un-needed factors to a
white noise process (effectively dropping that factor from the model). In
this function, each factor is tested against a null hypothesis of white
noise by calculating the sum of the factor's 2nd derivatives. A factor
that has a larger contribution will have a larger sum due to the weaker
penalty on the factor's precision. If plot == TRUE, the factors
are also plotted.
A data.frame of factor contributions
Nicholas J Clark
## Not run: simdat <- sim_mvgam() mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), use_lv = TRUE, n_lv = 2, data = simdat$data_train, chains = 2, silent = 2 ) plot_mvgam_factors(mod) ## End(Not run)## Not run: simdat <- sim_mvgam() mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), use_lv = TRUE, n_lv = 2, data = simdat$data_train, chains = 2, silent = 2 ) plot_mvgam_factors(mod) ## End(Not run)
Plot posterior forecast predictions from mvgam models
plot_mvgam_fc( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, hide_xlabels = FALSE, xlab, ylab, ylim, n_cores = 1, return_forecasts = FALSE, return_score = FALSE, ... ) ## S3 method for class 'mvgam_forecast' plot( x, series = 1, realisations = FALSE, n_realisations = 15, xlab, ylab, ylim, ... )plot_mvgam_fc( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, hide_xlabels = FALSE, xlab, ylab, ylim, n_cores = 1, return_forecasts = FALSE, return_score = FALSE, ... ) ## S3 method for class 'mvgam_forecast' plot( x, series = 1, realisations = FALSE, n_realisations = 15, xlab, ylab, ylim, ... )
object |
|
series |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
realisations |
|
n_realisations |
|
hide_xlabels |
|
xlab |
Label for x axis |
ylab |
Label for y axis |
ylim |
Optional |
n_cores |
|
return_forecasts |
|
return_score |
|
... |
Further |
x |
Object of class |
plot_mvgam_fc generates posterior predictions from an object of
class mvgam, calculates posterior empirical quantiles and plots
them against the observed data. If realisations = FALSE, the returned
plot shows 90, 60, 40 and 20 percent posterior quantiles (as ribbons of
increasingly darker shades of red) as well as the posterior median (as a
dark red line). If realisations = TRUE, a set of n_realisations
posterior draws are shown. This function produces an older style base
R plot, as opposed to plot.mvgam_forecast
plot.mvgam_forecast takes an object of class mvgam_forecast, in which
forecasts have already been computed, and plots the resulting forecast
distribution as a ggplot object. This function is therefore more
versatile and is recommended over the older and clunkier
plot_mvgam_fc version
If realisations = FALSE, these posterior quantiles are plotted
along with the true observed data that was used to train the model.
Otherwise, a spaghetti plot is returned to show possible forecast paths.
A base R graphics plot (for plot_mvgam_fc) or a ggplot
object (for plot.mvgam_forecast) and an optional list containing
the forecast distribution and the out of sample probabilistic forecast
score
Nicholas J Clark
## Not run: simdat <- sim_mvgam( n_series = 3, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Hindcasts on response scale hc <- hindcast(mod) str(hc) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Forecasts on response scale fc <- forecast( mod, newdata = simdat$data_test ) str(fc) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Forecasts as expectations fc <- forecast( mod, newdata = simdat$data_test, type = 'expected' ) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Dynamic trend extrapolations fc <- forecast( mod, newdata = simdat$data_test, type = 'trend' ) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) ## End(Not run)## Not run: simdat <- sim_mvgam( n_series = 3, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Hindcasts on response scale hc <- hindcast(mod) str(hc) plot(hc, series = 1) plot(hc, series = 2) plot(hc, series = 3) # Forecasts on response scale fc <- forecast( mod, newdata = simdat$data_test ) str(fc) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Forecasts as expectations fc <- forecast( mod, newdata = simdat$data_test, type = 'expected' ) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) # Dynamic trend extrapolations fc <- forecast( mod, newdata = simdat$data_test, type = 'trend' ) plot(fc, series = 1) plot(fc, series = 2) plot(fc, series = 3) ## End(Not run)
This function plots posterior empirical quantiles for partial effects of parametric terms
plot_mvgam_pterms(object, trend_effects = FALSE)plot_mvgam_pterms(object, trend_effects = FALSE)
object |
|
trend_effects |
logical. If |
Posterior empirical quantiles of each parametric term's partial
effect estimates (on the link scale) are calculated and visualised as
ribbon plots. These effects can be interpreted as the partial effect that
a parametric term contributes when all other terms in the model have been
set to 0
A base R graphics plot
Nicholas J Clark
This function plots posterior empirical quantiles for random effect smooths (bs = re)
plot_mvgam_randomeffects(object, trend_effects = FALSE)plot_mvgam_randomeffects(object, trend_effects = FALSE)
object |
|
trend_effects |
logical. If |
Posterior empirical quantiles of random effect coefficient estimates (on the link scale) are calculated and visualised as ribbon plots. Labels for coefficients are taken from the levels of the original factor variable that was used to specify the smooth in the model's formula
A base R graphics plot
Nicholas J Clark
This function takes a fitted mvgam object and returns various
residual diagnostic plots
plot_mvgam_resids(object, series = 1, n_draws = 100L, n_points = 1000L)plot_mvgam_resids(object, series = 1, n_draws = 100L, n_points = 1000L)
object |
|
series |
|
n_draws |
|
n_points |
|
A total of four ggplot plots are generated to examine posterior
Dunn-Smyth residuals for the specified series. Plots include a residuals
vs fitted values plot, a Q-Q plot, and two plots to check for any
remaining temporal autocorrelation in the residuals. Note, all plots only
report statistics from a sample of up to 100 posterior draws (to save
computational time), so uncertainty in these relationships may not be
adequately represented.
A facetted ggplot object
Nicholas J Clark
Nicholas J Clark and Matthijs Hollanders
## Not run: simdat <- sim_mvgam( n_series = 3, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Plot Dunn Smyth residuals for some series plot_mvgam_resids(mod) plot_mvgam_resids(mod, series = 2) ## End(Not run)## Not run: simdat <- sim_mvgam( n_series = 3, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Plot Dunn Smyth residuals for some series plot_mvgam_resids(mod) plot_mvgam_resids(mod, series = 2) ## End(Not run)
This function takes either a fitted mvgam object or a
data.frame object and produces plots of observed time series, ACF,
CDF and histograms for exploratory data analysis
plot_mvgam_series( object, data, newdata, y = "y", lines = TRUE, series = 1, n_bins = NULL, log_scale = FALSE )plot_mvgam_series( object, data, newdata, y = "y", lines = TRUE, series = 1, n_bins = NULL, log_scale = FALSE )
object |
Optional |
data |
Optional |
newdata |
Optional |
y |
Character. What is the name of the outcome variable in the supplied
data? Defaults to |
lines |
Logical. If |
series |
Either an |
n_bins |
|
log_scale |
|
A set of ggplot objects. If series is an integer, the plots
will show observed time series, autocorrelation and cumulative
distribution functions, and a histogram for the series. If
series == 'all', a set of observed time series plots is returned in
which all series are shown on each plot but only a single focal series is
highlighted, with all remaining series shown as faint gray lines.
Nicholas J Clark and Matthijs Hollanders
# Simulate and plot series with observations bounded at 0 and 1 (Beta responses) sim_data <- sim_mvgam( family = betar(), trend_model = RW(), prop_trend = 0.6 ) plot_mvgam_series( data = sim_data$data_train, series = 'all' ) plot_mvgam_series( data = sim_data$data_train, newdata = sim_data$data_test, series = 1 ) # Now simulate series with overdispersed discrete observations sim_data <- sim_mvgam( family = nb(), trend_model = RW(), prop_trend = 0.6, phi = 10 ) plot_mvgam_series( data = sim_data$data_train, series = 'all' )# Simulate and plot series with observations bounded at 0 and 1 (Beta responses) sim_data <- sim_mvgam( family = betar(), trend_model = RW(), prop_trend = 0.6 ) plot_mvgam_series( data = sim_data$data_train, series = 'all' ) plot_mvgam_series( data = sim_data$data_train, newdata = sim_data$data_test, series = 1 ) # Now simulate series with overdispersed discrete observations sim_data <- sim_mvgam( family = nb(), trend_model = RW(), prop_trend = 0.6, phi = 10 ) plot_mvgam_series( data = sim_data$data_train, series = 'all' )
This function plots posterior empirical quantiles for a series-specific smooth term
plot_mvgam_smooth( object, trend_effects = FALSE, series = 1, smooth, residuals = FALSE, n_resid_bins = 25, realisations = FALSE, n_realisations = 15, derivatives = FALSE, newdata )plot_mvgam_smooth( object, trend_effects = FALSE, series = 1, smooth, residuals = FALSE, n_resid_bins = 25, realisations = FALSE, n_realisations = 15, derivatives = FALSE, newdata )
object |
|
trend_effects |
logical. If |
series |
|
smooth |
Either a |
residuals |
|
n_resid_bins |
|
realisations |
|
n_realisations |
|
derivatives |
|
newdata |
Optional |
Smooth functions are shown as empirical quantiles (or spaghetti
plots) of posterior partial expectations across a sequence of values
between the variable's min and max, while zeroing out
effects of all other variables. At present, only univariate and bivariate
smooth plots are allowed, though note that bivariate smooths rely on
default behaviour from plot.gam. plot_mvgam_smooth
generates posterior predictions from an object of class mvgam,
calculates posterior empirical quantiles and plots them. If
realisations = FALSE, the returned plot shows 90, 60, 40 and 20 percent
posterior quantiles (as ribbons of increasingly darker shades of red) as
well as the posterior median (as a dark red line). If
realisations = TRUE, a set of n_realisations posterior draws are
shown. For more nuanced visualisation, supply newdata just as you
would when predicting from a gam model or use the more
flexible conditional_effects.mvgam. Alternatively, if you
prefer to use partial effect plots in the style of gratia, and if you
have the gratia package installed, you can use draw.mvgam. See
gratia_mvgam_enhancements for details.
A base R graphics plot
Nicholas J Clark
plot.gam,
conditional_effects.mvgam,
gratia_mvgam_enhancements
Plot latent trend predictions from mvgam models
plot_mvgam_trend( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, n_cores = 1, derivatives = FALSE, xlab, ylab )plot_mvgam_trend( object, series = 1, newdata, data_test, realisations = FALSE, n_realisations = 15, n_cores = 1, derivatives = FALSE, xlab, ylab )
object |
|
series |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
realisations |
|
n_realisations |
|
n_cores |
Deprecated. Parallel processing is no longer supported |
derivatives |
|
xlab |
Label for x axis |
ylab |
Label for y axis |
A ggplot object
Nicholas J Clark
## Not run: simdat <- sim_mvgam( n_series = 3, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2 ) # Plot estimated trends for some series plot_mvgam_trend(mod) plot_mvgam_trend(mod, series = 2) # Extrapolate trends forward in time and plot on response scale plot_mvgam_trend( mod, newdata = simdat$data_test ) plot_mvgam_trend( mod, newdata = simdat$data_test, series = 2 ) # But it is recommended to compute extrapolations for all series # first and then plot trend_fc <- forecast( mod, newdata = simdat$data_test ) plot(trend_fc, series = 1) plot(trend_fc, series = 2) ## End(Not run)## Not run: simdat <- sim_mvgam( n_series = 3, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2 ) # Plot estimated trends for some series plot_mvgam_trend(mod) plot_mvgam_trend(mod, series = 2) # Extrapolate trends forward in time and plot on response scale plot_mvgam_trend( mod, newdata = simdat$data_test ) plot_mvgam_trend( mod, newdata = simdat$data_test, series = 2 ) # But it is recommended to compute extrapolations for all series # first and then plot trend_fc <- forecast( mod, newdata = simdat$data_test ) plot(trend_fc, series = 1) plot(trend_fc, series = 2) ## End(Not run)
Plot forecast uncertainty contributions from mvgam models
plot_mvgam_uncertainty( object, series = 1, newdata, data_test, legend_position = "topleft", hide_xlabels = FALSE )plot_mvgam_uncertainty( object, series = 1, newdata, data_test, legend_position = "topleft", hide_xlabels = FALSE )
object |
|
series |
|
newdata |
A |
data_test |
Deprecated. Still works in place of |
legend_position |
The location may also be specified by setting x to a single keyword from the list: "none", "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center". This places the legend on the inside of the plot frame at the given location (if it is not "none"). |
hide_xlabels |
|
The basic idea of this function is to compute forecasts by ignoring
one of the two primary components in a correlated residual model (i.e. by
either ignoring the linear predictor effects or by ignoring the residual
dynamics). Some caution is required however, as this function was designed
early in the mvgam development cycle and there are now many types of
models that it cannot handle very well. For example, models with shared
latent states, or any type of State-Space models that include terms in the
trend_formula, will either fail or give nonsensical results. Improvements
are in the works to provide a more general way to decompose forecast
uncertainties, so please check back at a later date.
A base R graphics plot
This function takes a fitted mvgam object and produces plots of
smooth functions, forecasts, trends and uncertainty components
## S3 method for class 'mvgam' plot( x, type = "residuals", series = 1, residuals = FALSE, newdata, data_test, trend_effects = FALSE, ... )## S3 method for class 'mvgam' plot( x, type = "residuals", series = 1, residuals = FALSE, newdata, data_test, trend_effects = FALSE, ... )
x |
|
type |
|
series |
|
residuals |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
trend_effects |
logical. If |
... |
Additional arguments for each individual plotting function. |
These plots are useful for getting an overview of the fitted
model and its estimated random effects or smooth functions, but the
individual plotting functions and the functions from the marginaleffects
and gratia packages offer far more customisation.
A base R plot or set of plots
Nicholas J Clark
plot_mvgam_resids,
plot_mvgam_smooth,
plot_mvgam_fc,
plot_mvgam_trend,
plot_mvgam_uncertainty,
plot_mvgam_factors,
plot_mvgam_randomeffects,
conditional_effects.mvgam,
plot_predictions,
plot_slopes,
gratia_mvgam_enhancements
## Not run: # Simulate some time series dat <- sim_mvgam( T = 80, n_series = 3 ) # Fit a basic model mod <- mvgam( y ~ s(season, bs = 'cc') + s(series, bs = 're'), data = dat$data_train, trend_model = RW(), chains = 2, silent = 2 ) # Plot predictions and residuals for each series plot(mod, type = 'forecast', series = 1) plot(mod, type = 'forecast', series = 2) plot(mod, type = 'forecast', series = 3) plot(mod, type = 'residuals', series = 1) plot(mod, type = 'residuals', series = 2) plot(mod, type = 'residuals', series = 3) # Plot model effects plot(mod, type = 'smooths') plot(mod, type = 're') # More flexible plots with 'marginaleffects' utilities library(marginaleffects) plot_predictions( mod, condition = 'season', type = 'link' ) plot_predictions( mod, condition = c('season', 'series', 'series'), type = 'link' ) plot_predictions( mod, condition = 'series', type = 'link' ) # When using a State-Space model with predictors on the process # model, set trend_effects = TRUE to visualise process effects mod <- mvgam( y ~ -1, trend_formula = ~ s(season, bs = 'cc'), data = dat$data_train, trend_model = RW(), chains = 2, silent = 2 ) plot(mod, type = 'smooths', trend_effects = TRUE) # But 'marginaleffects' functions work without any modification plot_predictions( mod, condition = 'season', type = 'link' ) ## End(Not run)## Not run: # Simulate some time series dat <- sim_mvgam( T = 80, n_series = 3 ) # Fit a basic model mod <- mvgam( y ~ s(season, bs = 'cc') + s(series, bs = 're'), data = dat$data_train, trend_model = RW(), chains = 2, silent = 2 ) # Plot predictions and residuals for each series plot(mod, type = 'forecast', series = 1) plot(mod, type = 'forecast', series = 2) plot(mod, type = 'forecast', series = 3) plot(mod, type = 'residuals', series = 1) plot(mod, type = 'residuals', series = 2) plot(mod, type = 'residuals', series = 3) # Plot model effects plot(mod, type = 'smooths') plot(mod, type = 're') # More flexible plots with 'marginaleffects' utilities library(marginaleffects) plot_predictions( mod, condition = 'season', type = 'link' ) plot_predictions( mod, condition = c('season', 'series', 'series'), type = 'link' ) plot_predictions( mod, condition = 'series', type = 'link' ) # When using a State-Space model with predictors on the process # model, set trend_effects = TRUE to visualise process effects mod <- mvgam( y ~ -1, trend_formula = ~ s(season, bs = 'cc'), data = dat$data_train, trend_model = RW(), chains = 2, silent = 2 ) plot(mod, type = 'smooths', trend_effects = TRUE) # But 'marginaleffects' functions work without any modification plot_predictions( mod, condition = 'season', type = 'link' ) ## End(Not run)
mvgam_fevd objectThis function takes an mvgam_fevd object and produces
a plot of the posterior median contributions to forecast variance for each series
in the fitted Vector Autoregression
## S3 method for class 'mvgam_fevd' plot(x, ...)## S3 method for class 'mvgam_fevd' plot(x, ...)
x |
|
... |
ignored |
A ggplot object,
which can be further customized using the ggplot2 package
Nicholas J Clark
mvgam_irf objectThis function takes an mvgam_irf object and produces plots of
Impulse Response Functions
## S3 method for class 'mvgam_irf' plot(x, series = 1, ...)## S3 method for class 'mvgam_irf' plot(x, series = 1, ...)
x |
|
series |
|
... |
ignored |
A ggplot object showing the expected response of each latent time
series to a shock of the focal series
Nicholas J Clark
mvgam_lfo objectThis function takes an object of class mvgam_lfo and creates several
informative diagnostic plots
## S3 method for class 'mvgam_lfo' plot(x, ...)## S3 method for class 'mvgam_lfo' plot(x, ...)
x |
An object of class |
... |
Ignored |
A ggplot object presenting Pareto-k and ELPD values over the
evaluation timepoints. For the Pareto-k plot, a dashed red line indicates the
specified threshold chosen for triggering model refits. For the ELPD plot,
a dashed red line indicates the bottom 10% quantile of ELPD values. Points below
this threshold may represent outliers that were more difficult to forecast
Plot residual correlation estimates from Joint Species Distribution
(jsdgam) or dynamic factor (mvgam) models
## S3 method for class 'mvgam_residcor' plot(x, cluster = FALSE, ...)## S3 method for class 'mvgam_residcor' plot(x, cluster = FALSE, ...)
x |
|
cluster |
Logical. Should the variables be re-arranged within the plot
to group the correlation matrix into clusters of positive and negative correlations?
Defaults to |
... |
ignored |
This function plots the significant residual correlations from a
mvgam_residcor object, whereby the posterior mean (if robust = FALSE)
or posterior median (if robust = TRUE) correlations are shown
only those correlations whose credible interval does not contain zero. All other
correlations are set to zero in the returned plot
A ggplot object
Nicholas J Clark
jsdgam(), lv_correlations(), residual_cor()
A dataset containing time series of total captures (across all control plots) for select rodent species from the Portal Project
portal_dataportal_data
A data.frame containing the following fields:
time of sampling, in lunar monthly cycles
factor indicator of the time series, i.e. the species
total captures across all control plots at each time point
12-month moving average of the mean Normalised Difference Vegetation Index
monthly mean of minimum temperature
https://github.com/weecology/PortalData/blob/main/SiteandMethods/Methods.md
Compute posterior draws of the expected value of the posterior predictive
distribution (i.e. the conditional expectation). Can be performed for the
data used to fit the model (posterior predictive checks) or for new data.
By definition, these predictions have smaller variance than the posterior
predictions performed by the posterior_predict.mvgam method.
This is because only the uncertainty in the expected value of the posterior
predictive distribution is incorporated in the draws computed by
posterior_epred while the residual error is ignored there. However,
the estimated means of both methods averaged across draws should be very
similar.
## S3 method for class 'mvgam' posterior_epred( object, newdata, data_test, ndraws = NULL, process_error = TRUE, ... )## S3 method for class 'mvgam' posterior_epred( object, newdata, data_test, ndraws = NULL, process_error = TRUE, ... )
object |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
ndraws |
Positive |
process_error |
|
... |
Ignored |
Note that for all types of predictions for models that did not
include a trend_formula, uncertainty in the dynamic trend component can
be ignored by setting process_error = FALSE. However, if a
trend_formula was supplied in the model, predictions for this component
cannot be ignored. If process_error = TRUE, trend predictions will
ignore autocorrelation coefficients or GP length scale coefficients,
ultimately assuming the process is stationary. This method is similar to
the types of posterior predictions returned from brms models when using
autocorrelated error predictions for newdata. This function is therefore
more suited to posterior simulation from the GAM components of a
mvgam model, while the forecasting functions
plot_mvgam_fc and forecast.mvgam are better
suited to generate h-step ahead forecasts that respect the temporal
dynamics of estimated latent trends.
A matrix of dimension n_samples x n_obs, where
n_samples is the number of posterior samples from the fitted object
and n_obs is the number of observations in newdata
Nicholas J Clark
hindcast.mvgam,
posterior_linpred.mvgam,
posterior_predict.mvgam
## Not run: # Simulate some data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Compute posterior expectations expectations <- posterior_epred(mod) str(expectations) ## End(Not run)## Not run: # Simulate some data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Compute posterior expectations expectations <- posterior_epred(mod) str(expectations) ## End(Not run)
Compute posterior draws of the linear predictor, that is draws before applying any link functions or other transformations. Can be performed for the data used to fit the model (posterior predictive checks) or for new data.
## S3 method for class 'mvgam' posterior_linpred( object, transform = FALSE, newdata, ndraws = NULL, data_test, process_error = TRUE, ... )## S3 method for class 'mvgam' posterior_linpred( object, transform = FALSE, newdata, ndraws = NULL, data_test, process_error = TRUE, ... )
object |
|
transform |
|
newdata |
Optional |
ndraws |
Positive |
data_test |
Deprecated. Still works in place of |
process_error |
|
... |
Ignored |
Note that for all types of predictions for models that did not
include a trend_formula, uncertainty in the dynamic trend component can
be ignored by setting process_error = FALSE. However, if a
trend_formula was supplied in the model, predictions for this component
cannot be ignored. If process_error = TRUE, trend predictions will
ignore autocorrelation coefficients or GP length scale coefficients,
ultimately assuming the process is stationary. This method is similar to
the types of posterior predictions returned from brms models when using
autocorrelated error predictions for newdata. This function is therefore
more suited to posterior simulation from the GAM components of a
mvgam model, while the forecasting functions
plot_mvgam_fc and forecast.mvgam are better
suited to generate h-step ahead forecasts that respect the temporal
dynamics of estimated latent trends.
A matrix of dimension n_samples x n_obs, where
n_samples is the number of posterior samples from the fitted object
and n_obs is the number of observations in newdata
Nicholas J Clark
hindcast.mvgam,
posterior_epred.mvgam,
posterior_predict.mvgam
## Not run: # Simulate some data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Extract linear predictor values linpreds <- posterior_linpred(mod) str(linpreds) ## End(Not run)## Not run: # Simulate some data and fit a model simdat <- sim_mvgam( n_series = 1, trend_model = AR() ) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Extract linear predictor values linpreds <- posterior_linpred(mod) str(linpreds) ## End(Not run)
Compute posterior draws of the posterior predictive distribution. Can be
performed for the data used to fit the model (posterior predictive checks)
or for new data. By definition, these draws have higher variance than draws
of the expected value of the posterior predictive distribution computed by
posterior_epred.mvgam. This is because the residual error is
incorporated in posterior_predict. However, the estimated means of
both methods averaged across draws should be very similar.
## S3 method for class 'mvgam' posterior_predict( object, newdata, data_test, ndraws = NULL, process_error = TRUE, ... )## S3 method for class 'mvgam' posterior_predict( object, newdata, data_test, ndraws = NULL, process_error = TRUE, ... )
object |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
ndraws |
Positive |
process_error |
Logical. If |
... |
Ignored |
Note that for all types of predictions for models that did not
include a trend_formula, uncertainty in the dynamic trend component can
be ignored by setting process_error = FALSE. However, if a
trend_formula was supplied in the model, predictions for this component
cannot be ignored. If process_error = TRUE, trend predictions will
ignore autocorrelation coefficients or GP length scale coefficients,
ultimately assuming the process is stationary. This method is similar to
the types of posterior predictions returned from brms models when using
autocorrelated error predictions for newdata. This function is therefore
more suited to posterior simulation from the GAM components of a
mvgam model, while the forecasting functions
plot_mvgam_fc and forecast.mvgam are better
suited to generate h-step ahead forecasts that respect the temporal
dynamics of estimated latent trends.
A matrix of dimension n_samples x new_obs, where
n_samples is the number of posterior samples from the fitted object
and n_obs is the number of observations in newdata
Nicholas J Clark
hindcast.mvgam,
posterior_linpred.mvgam,
posterior_epred.mvgam
## Not run: # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Compute posterior predictions predictions <- posterior_predict(mod) str(predictions) ## End(Not run)## Not run: # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), data = simdat$data_train, chains = 2, silent = 2 ) # Compute posterior predictions predictions <- posterior_predict(mod) str(predictions) ## End(Not run)
mvgam modelsPerform unconditional posterior predictive checks with the help of the bayesplot package.
## S3 method for class 'mvgam' pp_check( object, type, ndraws = NULL, prefix = c("ppc", "ppd"), group = NULL, x = NULL, newdata = NULL, ... )## S3 method for class 'mvgam' pp_check( object, type, ndraws = NULL, prefix = c("ppc", "ppd"), group = NULL, x = NULL, newdata = NULL, ... )
object |
An object of class |
type |
Type of the ppc plot as given by a character string.
See |
ndraws |
Positive integer indicating how many
posterior draws should be used.
If |
prefix |
The prefix of the bayesplot function to be applied. Either '"ppc"' (posterior predictive check; the default) or '"ppd"' (posterior predictive distribution), the latter being the same as the former except that the observed data is not shown for '"ppd"'. |
group |
Optional name of a factor variable in the model
by which to stratify the ppc plot. This argument is required for
ppc |
x |
Optional name of a variable in the model.
Only used for ppc types having an |
newdata |
Optional |
... |
Further arguments passed to |
Unlike the conditional posterior checks provided by ppc,
This function computes unconditional posterior predictive checks (i.e. it generates
predictions for fake data without considering the true observations associated with those
fake data). For a detailed explanation of each of the ppc functions,
see the PPC
documentation of the bayesplot
package.
A ggplot object that can be further customized using the ggplot2 package.
Nicholas J Clark
## Not run: simdat <- sim_mvgam(seasonality = "hierarchical") mod <- mvgam( y ~ series + s(season, bs = "cc", k = 6) + s(season, series, bs = "fs", k = 4), data = simdat$data_train, chains = 2, silent = 2 ) # Use pp_check(mod, type = "xyz") for a list of available plot types # Default is a density overlay for all observations pp_check(mod) # Rootograms particularly useful for count data pp_check(mod, type = "rootogram") # Grouping plots by series is useful pp_check(mod, type = "bars_grouped", group = "series", ndraws = 50 ) pp_check(mod, type = "ecdf_overlay_grouped", group = "series", ndraws = 50 ) pp_check(mod, type = "stat_freqpoly_grouped", group = "series", ndraws = 50 ) # Several types can be used to plot distributions of randomized # quantile residuals pp_check( object = mod, x = "season", type = "resid_ribbon" ) pp_check( object = mod, x = "season", group = "series", type = "resid_ribbon_grouped" ) pp_check(mod, ndraws = 5, type = "resid_hist_grouped", group = "series" ) # Custom functions accepted pp_check(mod, type = "stat", stat = function(x) mean(x == 0)) pp_check(mod, type = "stat_grouped", stat = function(x) mean(x == 0), group = "series" ) # Some functions accept covariates to set the x-axes pp_check(mod, x = "season", type = "ribbon_grouped", prob = 0.5, prob_outer = 0.8, group = "series" ) # Many plots can be made without the observed data pp_check(mod, prefix = "ppd") ## End(Not run)## Not run: simdat <- sim_mvgam(seasonality = "hierarchical") mod <- mvgam( y ~ series + s(season, bs = "cc", k = 6) + s(season, series, bs = "fs", k = 4), data = simdat$data_train, chains = 2, silent = 2 ) # Use pp_check(mod, type = "xyz") for a list of available plot types # Default is a density overlay for all observations pp_check(mod) # Rootograms particularly useful for count data pp_check(mod, type = "rootogram") # Grouping plots by series is useful pp_check(mod, type = "bars_grouped", group = "series", ndraws = 50 ) pp_check(mod, type = "ecdf_overlay_grouped", group = "series", ndraws = 50 ) pp_check(mod, type = "stat_freqpoly_grouped", group = "series", ndraws = 50 ) # Several types can be used to plot distributions of randomized # quantile residuals pp_check( object = mod, x = "season", type = "resid_ribbon" ) pp_check( object = mod, x = "season", group = "series", type = "resid_ribbon_grouped" ) pp_check(mod, ndraws = 5, type = "resid_hist_grouped", group = "series" ) # Custom functions accepted pp_check(mod, type = "stat", stat = function(x) mean(x == 0)) pp_check(mod, type = "stat_grouped", stat = function(x) mean(x == 0), group = "series" ) # Some functions accept covariates to set the x-axes pp_check(mod, x = "season", type = "ribbon_grouped", prob = 0.5, prob_outer = 0.8, group = "series" ) # Many plots can be made without the observed data pp_check(mod, prefix = "ppd") ## End(Not run)
Plot conditional posterior predictive checks from mvgam models
ppc(object, ...) ## S3 method for class 'mvgam' ppc( object, newdata, data_test, series = 1, type = "hist", n_bins, legend_position, xlab, ylab, ... )ppc(object, ...) ## S3 method for class 'mvgam' ppc( object, newdata, data_test, series = 1, type = "hist", n_bins, legend_position, xlab, ylab, ... )
object |
|
... |
Further |
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
series |
|
type |
|
n_bins |
|
legend_position |
The location may also be specified by setting x to a single keyword from the list "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center". This places the legend on the inside of the plot frame at the given location. Or alternatively, use "none" to hide the legend. |
xlab |
Label for x axis |
ylab |
Label for y axis |
Conditional posterior predictions are drawn from the fitted
mvgam and compared against the empirical distribution of the
observed data for a specified series to help evaluate the model's ability
to generate unbiased predictions. For all plots apart from
type = 'rootogram', posterior predictions can also be compared to out
of sample observations as long as these observations were included as
'data_test' in the original model fit and supplied here. Rootograms are
currently only plotted using the 'hanging' style.
Note that the predictions used for these plots are conditional on
the observed data, i.e. they are those predictions that have been
generated directly within the mvgam() model. They can be misleading if
the model included flexible dynamic trend components. For a broader range
of posterior checks that are created using unconditional "new data"
predictions, see pp_check.mvgam
A base R graphics plot showing either a posterior rootogram
(for type == 'rootogram'), the predicted vs observed mean for the
series (for type == 'mean'), predicted vs observed proportion of
zeroes for the series (for type == 'prop_zero'), predicted vs
observed histogram for the series (for type == 'hist'), kernel
density or empirical CDF estimates for posterior predictions (for
type == 'density' or type == 'cdf') or a Probability
Integral Transform histogram (for type == 'pit').
Nicholas J Clark
## Not run: # Simulate some smooth effects and fit a model set.seed(0) dat <- mgcv::gamSim( 1, n = 200, scale = 2 ) mod <- mvgam( y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, family = gaussian(), chains = 2, silent = 2 ) # Posterior checks ppc(mod, type = "hist") ppc(mod, type = "density") ppc(mod, type = "cdf") # Many more options are available with pp_check() pp_check(mod) pp_check(mod, type = "ecdf_overlay") pp_check(mod, type = "freqpoly") ## End(Not run)## Not run: # Simulate some smooth effects and fit a model set.seed(0) dat <- mgcv::gamSim( 1, n = 200, scale = 2 ) mod <- mvgam( y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, family = gaussian(), chains = 2, silent = 2 ) # Posterior checks ppc(mod, type = "hist") ppc(mod, type = "density") ppc(mod, type = "cdf") # Many more options are available with pp_check() pp_check(mod) pp_check(mod, type = "ecdf_overlay") pp_check(mod, type = "freqpoly") ## End(Not run)
Predict from a fitted mvgam model
## S3 method for class 'mvgam' predict( object, newdata, data_test, type = "link", process_error = FALSE, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... )## S3 method for class 'mvgam' predict( object, newdata, data_test, type = "link", process_error = FALSE, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... )
object |
|
newdata |
Optional |
data_test |
Deprecated. Still works in place of |
type |
When this has the value |
process_error |
Logical. If |
summary |
Should summary statistics be returned
instead of the raw values? Default is |
robust |
If |
probs |
The percentiles to be computed by the |
... |
Ignored |
Note that if your model included a latent temporal trend (i.e. if
you used something other than "None" for the trend_model argument), the
predictions returned by this function will ignore autocorrelation
coefficients or GP length scale coefficients by assuming the process is
stationary. This approach is similar to how predictions are computed from
other types of regression models that can include correlated residuals,
ultimately treating the temporal dynamics as random effect nuisance
parameters. The predict function is therefore more suited to
scenario-based posterior simulation from the GAM components of a
mvgam model, while the hindcast / forecast functions
hindcast.mvgam() and forecast.mvgam() are better suited to generate
predictions that respect the temporal dynamics of estimated latent trends
at the actual time points supplied in data and newdata.
Predicted values on the appropriate scale.
If summary = FALSE and type != "terms", the output is a matrix of
dimension n_draw x n_observations containing predicted values for each
posterior draw in object.
If summary = TRUE and type != "terms", the output is an
n_observations x E matrix. The number of summary statistics
E is equal to 2 + length(probs): The Estimate column
contains point estimates (either mean or median depending on argument
robust), while the Est.Error column contains uncertainty
estimates (either standard deviation or median absolute deviation depending
on argument robust). The remaining columns starting with Q
contain quantile estimates as specified via argument probs.
If type = "terms" and summary = FALSE, the output is a named list
containing a separate slot for each effect, with the effects returned as
matrices of dimension n_draw x 1. If summary = TRUE, the output
resembles that from predict.gam when using the call
predict.gam(object, type = "terms", se.fit = TRUE), where mean
contributions from each effect are returned in matrix form while standard
errors (representing the interval: (max(probs) - min(probs)) / 2) are
returned in a separate matrix
Nicholas J Clark
hindcast.mvgam(),
forecast.mvgam(),
fitted.mvgam(),
augment.mvgam()
## Not run: # Simulate 4 time series with hierarchical seasonality # and independent AR1 dynamic processes set.seed(123) simdat <- sim_mvgam( seasonality = 'hierarchical', prop_trend = 0.75, trend_model = AR(), family = gaussian() ) # Fit a model with shared seasonality # and AR(1) dynamics mod1 <- mvgam( y ~ s(season, bs = 'cc', k = 6), data = simdat$data_train, family = gaussian(), trend_model = AR(), noncentred = TRUE, chains = 2, silent = 2 ) # Generate predictions against observed data preds <- predict( mod1, summary = TRUE ) head(preds) # Generate predictions against test data preds <- predict( mod1, newdata = simdat$data_test, summary = TRUE ) head(preds) # Use plot_predictions(), which relies on predict() # to more easily see how the latent AR(1) dynamics are # being ignored when using predict() plot_predictions( mod1, by = c('time', 'series', 'series'), points = 0.5 ) # Using the hindcast() function will give a more accurate # representation of how the AR(1) processes were estimated to give # accurate predictions to the in-sample training data hc <- hindcast(mod1) plot(hc) + plot(hc, series = 2) + plot(hc, series = 3) ## End(Not run)## Not run: # Simulate 4 time series with hierarchical seasonality # and independent AR1 dynamic processes set.seed(123) simdat <- sim_mvgam( seasonality = 'hierarchical', prop_trend = 0.75, trend_model = AR(), family = gaussian() ) # Fit a model with shared seasonality # and AR(1) dynamics mod1 <- mvgam( y ~ s(season, bs = 'cc', k = 6), data = simdat$data_train, family = gaussian(), trend_model = AR(), noncentred = TRUE, chains = 2, silent = 2 ) # Generate predictions against observed data preds <- predict( mod1, summary = TRUE ) head(preds) # Generate predictions against test data preds <- predict( mod1, newdata = simdat$data_test, summary = TRUE ) head(preds) # Use plot_predictions(), which relies on predict() # to more easily see how the latent AR(1) dynamics are # being ignored when using predict() plot_predictions( mod1, by = c('time', 'series', 'series'), points = 0.5 ) # Using the hindcast() function will give a more accurate # representation of how the AR(1) processes were estimated to give # accurate predictions to the in-sample training data hc <- hindcast(mod1) plot(hc) + plot(hc, series = 2) + plot(hc, series = 3) ## End(Not run)
This function takes a fitted mvgam or jsdgam object and prints
a quick summary.
## S3 method for class 'mvgam' print(x, ...)## S3 method for class 'mvgam' print(x, ...)
x |
|
... |
Ignored |
A brief summary of the model's call is printed
A list is printed on-screen
Nicholas J Clark
Print method for mvgam_summary objects
## S3 method for class 'mvgam_summary' print(x, ...)## S3 method for class 'mvgam_summary' print(x, ...)
x |
An object of class |
... |
Additional arguments (ignored) |
Invisibly returns the input object after printing
Set up piecewise linear or logistic trend models in mvgam. These
functions do not evaluate their arguments – they exist purely to help set up
a model with particular piecewise trend models.
PW( n_changepoints = 10, changepoint_range = 0.8, changepoint_scale = 0.05, growth = "linear" )PW( n_changepoints = 10, changepoint_range = 0.8, changepoint_scale = 0.05, growth = "linear" )
n_changepoints |
A non-negative integer specifying the number of
potential changepoints. Potential changepoints are selected uniformly from
the first |
changepoint_range |
Proportion of history in |
changepoint_scale |
Parameter modulating the flexibility of the
automatic changepoint selection by altering the scale parameter of a
Laplace distribution. The resulting prior will be
|
growth |
Character string specifying either |
Offsets and intercepts:
For each of these trend models, an offset parameter is included in the trend
estimation process. This parameter will be incredibly difficult to identify
if you also include an intercept in the observation formula. For that
reason, it is highly recommended that you drop the intercept from the
formula (i.e. y ~ x + 0 or y ~ x - 1, where x are your optional
predictor terms).
Logistic growth and the cap variable: When forecasting growth, there is often some maximum achievable point that a time series can reach. For example, total market size, total population size or carrying capacity in population dynamics. It can be advantageous for the forecast to saturate at or near this point so that predictions are more sensible.
This function allows you to make forecasts using a logistic growth trend
model, with a specified carrying capacity. Note that this capacity does not
need to be static over time; it can vary with each series × timepoint
combination if necessary. But you must supply a cap value for each
observation in the data when using growth = 'logistic'.
For observation families that use a non-identity link function, the cap
value will be internally transformed to the link scale (i.e. your specified
cap will be log-transformed if you are using a poisson() or nb()
family). It is therefore important that you specify the cap values on the
scale of your outcome. Note also that no missing values are allowed in
cap.
An object of class mvgam_trend, which contains a list of
arguments to be interpreted by the parsing functions in mvgam.
Nicholas J Clark
Taylor, Sean J., and Benjamin Letham. "Forecasting at scale." The American Statistician 72.1 (2018): 37–45.
## Not run: # Example of logistic growth with possible changepoints dNt = function(r, N, k) { r * N * (k - N) } Nt = function(r, N, t, k) { for (i in 1:(t - 1)) { if (i %in% c(5, 15, 25, 41, 45, 60, 80)) { N[i + 1] <- max( 1, N[i] + dNt(r + runif(1, -0.1, 0.1), N[i], k) ) } else { N[i + 1] <- max(1, N[i] + dNt(r, N[i], k)) } } N } set.seed(11) expected <- Nt(0.004, 2, 100, 30) plot(expected, xlab = 'Time') y <- rpois(100, expected) plot(y, xlab = 'Time') mod_data <- data.frame( y = y, time = 1:100, cap = 35, series = as.factor('series_1') ) plot_mvgam_series(data = mod_data) mod <- mvgam( y ~ 0, trend_model = PW(growth = 'logistic'), family = poisson(), data = mod_data, chains = 2, silent = 2 ) summary(mod) hc <- hindcast(mod) plot(hc) library(ggplot2) mcmc_plot(mod, variable = 'delta_trend', regex = TRUE) + scale_y_discrete(labels = mod$trend_model$changepoints) + labs( y = 'Potential changepoint', x = 'Rate change' ) how_to_cite(mod) ## End(Not run)## Not run: # Example of logistic growth with possible changepoints dNt = function(r, N, k) { r * N * (k - N) } Nt = function(r, N, t, k) { for (i in 1:(t - 1)) { if (i %in% c(5, 15, 25, 41, 45, 60, 80)) { N[i + 1] <- max( 1, N[i] + dNt(r + runif(1, -0.1, 0.1), N[i], k) ) } else { N[i + 1] <- max(1, N[i] + dNt(r, N[i], k)) } } N } set.seed(11) expected <- Nt(0.004, 2, 100, 30) plot(expected, xlab = 'Time') y <- rpois(100, expected) plot(y, xlab = 'Time') mod_data <- data.frame( y = y, time = 1:100, cap = 35, series = as.factor('series_1') ) plot_mvgam_series(data = mod_data) mod <- mvgam( y ~ 0, trend_model = PW(growth = 'logistic'), family = poisson(), data = mod_data, chains = 2, silent = 2 ) summary(mod) hc <- hindcast(mod) plot(hc) library(ggplot2) mcmc_plot(mod, variable = 'delta_trend', regex = TRUE) + scale_y_discrete(labels = mod$trend_model$changepoints) + labs( y = 'Potential changepoint', x = 'Rate change' ) how_to_cite(mod) ## End(Not run)
Compute residual correlation estimates from Joint Species Distribution
(jsdgam) or mvgam models that either used latent factors
or included correlated process errors directly
residual_cor(object, ...) ## S3 method for class 'mvgam' residual_cor( object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... ) ## S3 method for class 'jsdgam' residual_cor( object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... )residual_cor(object, ...) ## S3 method for class 'mvgam' residual_cor( object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... ) ## S3 method for class 'jsdgam' residual_cor( object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ... )
object |
|
... |
ignored |
summary |
Should summary statistics be returned
instead of the raw values? Default is |
robust |
If |
probs |
The percentiles to be computed by the |
See mvgam_residcor-class for a description of the quantities
that are computed and returned by this function, along with key references.
If summary = TRUE, a list of
mvgam_residcor-class with the following components:
cor, cor_lower, cor_upper
|
A set of |
sig_cor |
A |
prec, prec_lower, prec_upper
|
A set of |
sig_prec |
A |
cov |
A |
trace |
The median/mean point estimator of the trace (sum of the
diagonal elements) of the residual covariance matrix |
If summary = FALSE, this function returns a list containing the
following components:
all_cormat |
A |
all_covmat |
A |
all_presmat |
A |
all_trace |
A |
Hui, F. K. C. (2016). boral – Bayesian Ordination and Regression Analysis of Multivariate Abundance Data in r. Methods in Ecology and Evolution, 7(6), 744-750. doi:10.1111/2041-210X.12514
jsdgam(), lv_correlations(), mvgam_residcor-class
## Not run: # Fit a JSDGAM to the portal_data captures mod <- jsdgam( formula = captures ~ # Fixed effects of NDVI and mintemp, row effect as a GP of time ndvi_ma12:series + mintemp:series + gp(time, k = 15), factor_formula = ~ -1, data = portal_data, unit = time, species = series, family = poisson(), n_lv = 2, silent = 2, chains = 2 ) # Plot residual correlations plot( residual_cor(mod) ) # Compare to a residual ordination biplot if(requireNamespace('ggrepel', quietly = TRUE)){ ordinate(mod) } # Not needed for general use; cleans up connections for automated testing closeAllConnections() ## End(Not run)## Not run: # Fit a JSDGAM to the portal_data captures mod <- jsdgam( formula = captures ~ # Fixed effects of NDVI and mintemp, row effect as a GP of time ndvi_ma12:series + mintemp:series + gp(time, k = 15), factor_formula = ~ -1, data = portal_data, unit = time, species = series, family = poisson(), n_lv = 2, silent = 2, chains = 2 ) # Plot residual correlations plot( residual_cor(mod) ) # Compare to a residual ordination biplot if(requireNamespace('ggrepel', quietly = TRUE)){ ordinate(mod) } # Not needed for general use; cleans up connections for automated testing closeAllConnections() ## End(Not run)
This method extracts posterior draws of Dunn-Smyth (randomized quantile) residuals in the order in which the data were supplied to the model. It includes additional arguments for obtaining summaries of the computed residuals.
## S3 method for class 'mvgam' residuals(object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ...)## S3 method for class 'mvgam' residuals(object, summary = TRUE, robust = FALSE, probs = c(0.025, 0.975), ...)
object |
An object of class |
summary |
Should summary statistics be returned
instead of the raw values? Default is |
robust |
If |
probs |
The percentiles to be computed by the |
... |
Ignored |
This method gives residuals as Dunn-Smyth (randomized quantile)
residuals. Any observations that were missing (i.e. NA) in the original
data will have missing values in the residuals.
An array of randomized quantile residual values.
If summary = FALSE the output resembles those of
posterior_epred.mvgam and predict.mvgam.
If summary = TRUE the output is an n_observations x E
matrix. The number of summary statistics E is equal to 2 +
length(probs). The Estimate column contains point estimates (either
mean or median depending on argument robust), while the
Est.Error column contains uncertainty estimates (either standard
deviation or median absolute deviation depending on argument
robust). The remaining columns starting with Q contain
quantile estimates as specified via argument probs.
Nicholas J Clark
## Not run: # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Extract posterior residuals resids <- residuals(mod) str(resids) # Or add them directly to the observed data, along with fitted values augment(mod, robust = FALSE, probs = c(0.25, 0.75)) ## End(Not run)## Not run: # Simulate some data and fit a model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2, silent = 2 ) # Extract posterior residuals resids <- residuals(mod) str(resids) # Or add them directly to the observed data, along with fitted values augment(mod, robust = FALSE, probs = c(0.25, 0.75)) ## End(Not run)
Set up autoregressive or autoregressive moving average trend models in mvgam. These functions do not evaluate their arguments – they exist purely to help set up a model with particular autoregressive trend models.
RW(ma = FALSE, cor = FALSE, gr = NA, subgr = NA) AR(p = 1, ma = FALSE, cor = FALSE, gr = NA, subgr = NA) CAR(p = 1) VAR(ma = FALSE, cor = FALSE, gr = NA, subgr = NA)RW(ma = FALSE, cor = FALSE, gr = NA, subgr = NA) AR(p = 1, ma = FALSE, cor = FALSE, gr = NA, subgr = NA) CAR(p = 1) VAR(ma = FALSE, cor = FALSE, gr = NA, subgr = NA)
ma |
|
cor |
|
gr |
An optional grouping variable, which must be a
where When used within a |
subgr |
A subgrouping For example, if you are modelling temporal counts for a group of species
(labelled as
|
p |
A non-negative integer specifying the autoregressive (AR) order.
Default is |
Use vignette("mvgam_overview") to see the full details of
available stochastic trend types in mvgam, or view the rendered
version on the package website at:
https://nicholasjclark.github.io/mvgam/articles/mvgam_overview.html
An object of class mvgam_trend, which contains a list of
arguments to be interpreted by the parsing functions in mvgam.
Nicholas J Clark
## Not run: # A short example to illustrate CAR(1) models # Function to simulate CAR1 data with seasonality sim_corcar1 = function(n = 125, phi = 0.5, sigma = 2, sigma_obs = 0.75) { # Sample irregularly spaced time intervals time_dis <- c(1, runif(n - 1, 0, 5)) # Set up the latent dynamic process x <- vector(length = n); x[1] <- -0.3 for (i in 2:n) { # zero-distances will cause problems in sampling, so mvgam uses a # minimum threshold; this simulation function emulates that process if (time_dis[i] == 0) { x[i] <- rnorm( 1, mean = (phi^1e-3) * x[i - 1], sd = sigma * (1 - phi^(2 * 1e-3)) / (1 - phi^2) ) } else { x[i] <- rnorm( 1, mean = (phi^time_dis[i]) * x[i - 1], sd = sigma * (1 - phi^(2 * time_dis[i])) / (1 - phi^2) ) } } # Add 12-month seasonality cov1 <- sin(2 * pi * (1:n) / 12) cov2 <- cos(2 * pi * (1:n) / 12) beta1 <- runif(1, 0.3, 0.7) beta2 <- runif(1, 0.2, 0.5) seasonality <- beta1 * cov1 + beta2 * cov2 # Take Gaussian observations with error and return data.frame( y = rnorm(n, mean = x + seasonality, sd = sigma_obs), season = rep(1:12, 20)[1:n], time = cumsum(time_dis) ) } # Sample two time series dat <- rbind( dplyr::bind_cols( sim_corcar1(phi = 0.65, sigma_obs = 0.55), data.frame(series = 'series1') ), dplyr::bind_cols( sim_corcar1(phi = 0.8, sigma_obs = 0.35), data.frame(series = 'series2') ) ) %>% dplyr::mutate(series = as.factor(series)) # mvgam with CAR(1) trends and series-level seasonal smooths mod <- mvgam( formula = y ~ -1, trend_formula = ~ s(season, bs = 'cc', k = 5, by = trend), trend_model = CAR(), priors = c( prior(exponential(3), class = sigma), prior(beta(4, 4), class = sigma_obs) ), data = dat, family = gaussian(), chains = 2, silent = 2 ) # View usual summaries and plots summary(mod) conditional_effects(mod, type = 'expected') plot(mod, type = 'trend', series = 1) plot(mod, type = 'trend', series = 2) plot(mod, type = 'residuals', series = 1) plot(mod, type = 'residuals', series = 2) mcmc_plot( mod, variable = 'ar1', regex = TRUE, type = 'hist' ) # Now an example illustrating hierarchical dynamics set.seed(123) # Simulate three species monitored in three different regions simdat1 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3) ) simdat2 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3) ) simdat3 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3) ) # Set up the data but DO NOT include 'series' all_dat <- rbind( simdat1$data_train %>% dplyr::mutate(region = 'qld'), simdat2$data_train %>% dplyr::mutate(region = 'nsw'), simdat3$data_train %>% dplyr::mutate(region = 'vic') ) %>% dplyr::mutate( species = gsub('series', 'species', series), species = as.factor(species), region = as.factor(region) ) %>% dplyr::arrange(series, time) %>% dplyr::select(-series) # Check priors for a hierarchical AR1 model get_mvgam_priors( formula = y ~ species, trend_model = AR(gr = region, subgr = species), data = all_dat ) # Fit the model mod <- mvgam( formula = y ~ species, trend_model = AR(gr = region, subgr = species), data = all_dat, chains = 2, silent = 2 ) # Check standard outputs summary(mod) # Inspect posterior estimates for the correlation weighting parameter mcmc_plot(mod, variable = 'alpha_cor', type = 'hist') ## End(Not run)## Not run: # A short example to illustrate CAR(1) models # Function to simulate CAR1 data with seasonality sim_corcar1 = function(n = 125, phi = 0.5, sigma = 2, sigma_obs = 0.75) { # Sample irregularly spaced time intervals time_dis <- c(1, runif(n - 1, 0, 5)) # Set up the latent dynamic process x <- vector(length = n); x[1] <- -0.3 for (i in 2:n) { # zero-distances will cause problems in sampling, so mvgam uses a # minimum threshold; this simulation function emulates that process if (time_dis[i] == 0) { x[i] <- rnorm( 1, mean = (phi^1e-3) * x[i - 1], sd = sigma * (1 - phi^(2 * 1e-3)) / (1 - phi^2) ) } else { x[i] <- rnorm( 1, mean = (phi^time_dis[i]) * x[i - 1], sd = sigma * (1 - phi^(2 * time_dis[i])) / (1 - phi^2) ) } } # Add 12-month seasonality cov1 <- sin(2 * pi * (1:n) / 12) cov2 <- cos(2 * pi * (1:n) / 12) beta1 <- runif(1, 0.3, 0.7) beta2 <- runif(1, 0.2, 0.5) seasonality <- beta1 * cov1 + beta2 * cov2 # Take Gaussian observations with error and return data.frame( y = rnorm(n, mean = x + seasonality, sd = sigma_obs), season = rep(1:12, 20)[1:n], time = cumsum(time_dis) ) } # Sample two time series dat <- rbind( dplyr::bind_cols( sim_corcar1(phi = 0.65, sigma_obs = 0.55), data.frame(series = 'series1') ), dplyr::bind_cols( sim_corcar1(phi = 0.8, sigma_obs = 0.35), data.frame(series = 'series2') ) ) %>% dplyr::mutate(series = as.factor(series)) # mvgam with CAR(1) trends and series-level seasonal smooths mod <- mvgam( formula = y ~ -1, trend_formula = ~ s(season, bs = 'cc', k = 5, by = trend), trend_model = CAR(), priors = c( prior(exponential(3), class = sigma), prior(beta(4, 4), class = sigma_obs) ), data = dat, family = gaussian(), chains = 2, silent = 2 ) # View usual summaries and plots summary(mod) conditional_effects(mod, type = 'expected') plot(mod, type = 'trend', series = 1) plot(mod, type = 'trend', series = 2) plot(mod, type = 'residuals', series = 1) plot(mod, type = 'residuals', series = 2) mcmc_plot( mod, variable = 'ar1', regex = TRUE, type = 'hist' ) # Now an example illustrating hierarchical dynamics set.seed(123) # Simulate three species monitored in three different regions simdat1 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3) ) simdat2 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3) ) simdat3 <- sim_mvgam( trend_model = VAR(cor = TRUE), prop_trend = 0.95, n_series = 3, mu = c(1, 2, 3) ) # Set up the data but DO NOT include 'series' all_dat <- rbind( simdat1$data_train %>% dplyr::mutate(region = 'qld'), simdat2$data_train %>% dplyr::mutate(region = 'nsw'), simdat3$data_train %>% dplyr::mutate(region = 'vic') ) %>% dplyr::mutate( species = gsub('series', 'species', series), species = as.factor(species), region = as.factor(region) ) %>% dplyr::arrange(series, time) %>% dplyr::select(-series) # Check priors for a hierarchical AR1 model get_mvgam_priors( formula = y ~ species, trend_model = AR(gr = region, subgr = species), data = all_dat ) # Fit the model mod <- mvgam( formula = y ~ species, trend_model = AR(gr = region, subgr = species), data = all_dat, chains = 2, silent = 2 ) # Check standard outputs summary(mod) # Inspect posterior estimates for the correlation weighting parameter mcmc_plot(mod, variable = 'alpha_cor', type = 'hist') ## End(Not run)
Compute probabilistic forecast scores for mvgam models
## S3 method for class 'mvgam_forecast' score( object, score = "crps", log = FALSE, weights, interval_width = 0.9, n_cores = 1, ... ) score(object, ...)## S3 method for class 'mvgam_forecast' score( object, score = "crps", log = FALSE, weights, interval_width = 0.9, n_cores = 1, ... ) score(object, ...)
object |
|
score |
|
log |
|
weights |
optional |
interval_width |
proportional value on |
n_cores |
|
... |
Ignored |
A list containing scores and interval coverages per
forecast horizon. If score %in% c('drps', 'crps', 'elpd', 'brier'),
the list will also contain return the sum of all series-level scores
per horizon. If score %in% c('energy','variogram'), no
series-level scores are computed and the only score returned will be
for all series. For all scores apart from elpd and brier, the
in_interval column in each series-level slot is a binary indicator of
whether or not the true value was within the forecast's corresponding
posterior empirical quantiles. Intervals are not calculated when using
elpd because forecasts will only contain the linear predictors
Nicholas J Clark
Gneiting, T. and Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477), 359-378. doi:10.1198/016214506000001437
## Not run: # Simulate observations for three count-valued time series data <- sim_mvgam() # Fit a dynamic model using 'newdata' to automatically produce forecasts mod <- mvgam( y ~ 1, trend_model = RW(), data = data$data_train, newdata = data$data_test, chains = 2, silent = 2 ) # Extract forecasts into a 'mvgam_forecast' object fc <- forecast(mod) plot(fc) # Compute Discrete Rank Probability Scores and 0.90 interval coverages fc_scores <- score(fc, score = 'drps') str(fc_scores) # An example using binary data data <- sim_mvgam(family = bernoulli()) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), data = data$data_train, newdata = data$data_test, family = bernoulli(), chains = 2, silent = 2 ) # Extract forecasts on the expectation (probability) scale fc <- forecast(mod, type = 'expected') plot(fc) # Compute Brier scores fc_scores <- score(fc, score = 'brier') str(fc_scores) ## End(Not run)## Not run: # Simulate observations for three count-valued time series data <- sim_mvgam() # Fit a dynamic model using 'newdata' to automatically produce forecasts mod <- mvgam( y ~ 1, trend_model = RW(), data = data$data_train, newdata = data$data_test, chains = 2, silent = 2 ) # Extract forecasts into a 'mvgam_forecast' object fc <- forecast(mod) plot(fc) # Compute Discrete Rank Probability Scores and 0.90 interval coverages fc_scores <- score(fc, score = 'drps') str(fc_scores) # An example using binary data data <- sim_mvgam(family = bernoulli()) mod <- mvgam( y ~ s(season, bs = 'cc', k = 6), trend_model = AR(), data = data$data_train, newdata = data$data_test, family = bernoulli(), chains = 2, silent = 2 ) # Extract forecasts on the expectation (probability) scale fc <- forecast(mod, type = 'expected') plot(fc) # Compute Brier scores fc_scores <- score(fc, score = 'brier') str(fc_scores) ## End(Not run)
This function converts univariate or multivariate time series (xts or
ts objects) to the format necessary for mvgam.
series_to_mvgam(series, freq, train_prop = 0.85)series_to_mvgam(series, freq, train_prop = 0.85)
series |
|
freq |
|
train_prop |
|
A list object containing outputs needed for
mvgam, including 'data_train' and 'data_test'
# A ts object example data("sunspots") series <- cbind(sunspots, sunspots) colnames(series) <- c('blood', 'bone') head(series) series_to_mvgam(series, frequency(series), 0.85) # An xts object example library(xts) dates <- seq(as.Date("2001-05-01"), length = 30, by = "quarter") data <- cbind( c(gas = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001)))), c(oil = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001)))) ) series <- xts(x = data, order.by = dates) colnames(series) <- c('gas', 'oil') head(series) series_to_mvgam(series, freq = 4, train_prop = 0.85)# A ts object example data("sunspots") series <- cbind(sunspots, sunspots) colnames(series) <- c('blood', 'bone') head(series) series_to_mvgam(series, frequency(series), 0.85) # An xts object example library(xts) dates <- seq(as.Date("2001-05-01"), length = 30, by = "quarter") data <- cbind( c(gas = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001)))), c(oil = rpois(30, cumprod(1 + rnorm(30, mean = 0.01, sd = 0.001)))) ) series <- xts(x = data, order.by = dates) colnames(series) <- c('gas', 'oil') head(series) series_to_mvgam(series, freq = 4, train_prop = 0.85)
This function simulates sets of time series data for fitting a multivariate GAM that includes shared seasonality and dependence on State-Space latent dynamic factors. Random dependencies among series, i.e. correlations in their long-term trends, are included in the form of correlated loadings on the latent dynamic factors
sim_mvgam( T = 100, n_series = 3, seasonality = "shared", use_lv = FALSE, n_lv = 0, trend_model = RW(), drift = FALSE, prop_trend = 0.2, trend_rel, freq = 12, family = poisson(), phi, shape, sigma, nu, mu, prop_missing = 0, prop_train = 0.85 )sim_mvgam( T = 100, n_series = 3, seasonality = "shared", use_lv = FALSE, n_lv = 0, trend_model = RW(), drift = FALSE, prop_trend = 0.2, trend_rel, freq = 12, family = poisson(), phi, shape, sigma, nu, mu, prop_missing = 0, prop_train = 0.85 )
T |
|
n_series |
|
seasonality |
|
use_lv |
|
n_lv |
|
trend_model |
See mvgam_trends for more details |
drift |
|
prop_trend |
|
trend_rel |
Deprecated. Use |
freq |
|
family |
|
phi |
|
shape |
|
sigma |
|
nu |
|
mu |
|
prop_missing |
|
prop_train |
|
A list object containing outputs needed for
mvgam, including 'data_train' and 'data_test', as well
as some additional information about the simulated seasonality and
trend dependencies
Clark, N. J. and Wells, K. (2022). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series. Methods in Ecology and Evolution, 13(11), 2388-2404. doi:10.1111/2041-210X.13974
# Simulate series with observations bounded at 0 and 1 (Beta responses) sim_data <- sim_mvgam( family = betar(), trend_model = RW(), prop_trend = 0.6 ) plot_mvgam_series(data = sim_data$data_train, series = 'all') # Now simulate series with overdispersed discrete observations sim_data <- sim_mvgam( family = nb(), trend_model = RW(), prop_trend = 0.6, phi = 10 ) plot_mvgam_series(data = sim_data$data_train, series = 'all')# Simulate series with observations bounded at 0 and 1 (Beta responses) sim_data <- sim_mvgam( family = betar(), trend_model = RW(), prop_trend = 0.6 ) plot_mvgam_series(data = sim_data$data_train, series = 'all') # Now simulate series with overdispersed discrete observations sim_data <- sim_mvgam( family = nb(), trend_model = RW(), prop_trend = 0.6, phi = 10 ) plot_mvgam_series(data = sim_data$data_train, series = 'all')
Compute reactivity, return rates and contributions of interactions to stationary forecast variance from mvgam models with Vector Autoregressive dynamics.
stability(object, ...) ## S3 method for class 'mvgam' stability(object, ...)stability(object, ...) ## S3 method for class 'mvgam' stability(object, ...)
object |
|
... |
Ignored |
These measures of stability can be used to assess how important inter-series dependencies are to the variability of a multivariate system and to ask how systems are expected to respond to environmental perturbations. Using the formula for a latent VAR(1) as:
this function will calculate the long-term stationary forecast distribution
of the system, which has mean and variance
, to then calculate the following quantities:
prop_int: Proportion of the volume of the stationary forecast
distribution that is attributable to lagged interactions:
\item `prop_int_adj`: Same as `prop_int` but scaled by the number of
series \eqn{p}:
\deqn{ det(A)^{2/p} }
\item `prop_int_offdiag`: Sensitivity of `prop_int` to inter-series
interactions (off-diagonals of \eqn{A}):
\deqn{ [2~det(A) (A^{-1})^T] }
\item `prop_int_diag`: Sensitivity of `prop_int` to intra-series
interactions (diagonals of \eqn{A}):
\deqn{ [2~det(A) (A^{-1})^T] }
\item `prop_cov_offdiag`: Sensitivity of \eqn{\Sigma_{\infty}} to
inter-series error correlations:
\deqn{ [2~det(\Sigma_{\infty}) (\Sigma_{\infty}^{-1})^T] }
\item `prop_cov_diag`: Sensitivity of \eqn{\Sigma_{\infty}} to error
variances:
\deqn{ [2~det(\Sigma_{\infty}) (\Sigma_{\infty}^{-1})^T] }
\item `reactivity`: Degree to which the system moves away from a stable
equilibrium following a perturbation. If \eqn{\sigma_{max}(A)} is the
largest singular value of \eqn{A}:
\deqn{ \log\sigma_{max}(A) }
\item `mean_return_rate`: Asymptotic return rate of the mean of the
transition distribution to the stationary mean:
\deqn{ \max(\lambda_{A}) }
\item `var_return_rate`: Asymptotic return rate of the variance of the
transition distribution to the stationary variance:
\deqn{ \max(\lambda_{A \otimes A}) }
Major advantages of using mvgam to compute these metrics are that well-calibrated uncertainties are available and that VAR processes are forced to be stationary. These properties make it simple and insightful to calculate and inspect aspects of both long-term and short-term stability.
You can also inspect interactions among the time series in a latent VAR
process using irf for impulse response functions or
fevd for forecast error variance decompositions.
A data.frame containing posterior draws for each stability
metric.
Nicholas J Clark
AR Ives, B Dennis, KL Cottingham & SR Carpenter (2003). Estimating community stability and ecological interactions from time-series data. Ecological Monographs, 73, 301–330.
## Not run: # Simulate some time series that follow a latent VAR(1) process simdat <- sim_mvgam( family = gaussian(), n_series = 4, trend_model = VAR(cor = TRUE), prop_trend = 1 ) plot_mvgam_series(data = simdat$data_train, series = 'all') # Fit a model that uses a latent VAR(1) mod <- mvgam( y ~ -1, trend_formula = ~ 1, trend_model = VAR(cor = TRUE), family = gaussian(), data = simdat$data_train, chains = 2, silent = 2 ) # Calculate stability metrics for this system metrics <- stability(mod) # Proportion of stationary forecast distribution attributable to interactions hist( metrics$prop_int, xlim = c(0, 1), xlab = 'Prop_int', main = '', col = '#B97C7C', border = 'white' ) # Inter- vs intra-series interaction contributions layout(matrix(1:2, nrow = 2)) hist( metrics$prop_int_offdiag, xlim = c(0, 1), xlab = '', main = 'Inter-series interactions', col = '#B97C7C', border = 'white' ) hist( metrics$prop_int_diag, xlim = c(0, 1), xlab = 'Contribution to interaction effect', main = 'Intra-series interactions (density dependence)', col = 'darkblue', border = 'white' ) layout(1) # Inter- vs intra-series contributions to forecast variance layout(matrix(1:2, nrow = 2)) hist( metrics$prop_cov_offdiag, xlim = c(0, 1), xlab = '', main = 'Inter-series covariances', col = '#B97C7C', border = 'white' ) hist( metrics$prop_cov_diag, xlim = c(0, 1), xlab = 'Contribution to forecast variance', main = 'Intra-series variances', col = 'darkblue', border = 'white' ) layout(1) # Reactivity: system response to perturbation hist( metrics$reactivity, main = '', xlab = 'Reactivity', col = '#B97C7C', border = 'white', xlim = c( -1 * max(abs(metrics$reactivity)), max(abs(metrics$reactivity)) ) ) abline(v = 0, lwd = 2.5) ## End(Not run)## Not run: # Simulate some time series that follow a latent VAR(1) process simdat <- sim_mvgam( family = gaussian(), n_series = 4, trend_model = VAR(cor = TRUE), prop_trend = 1 ) plot_mvgam_series(data = simdat$data_train, series = 'all') # Fit a model that uses a latent VAR(1) mod <- mvgam( y ~ -1, trend_formula = ~ 1, trend_model = VAR(cor = TRUE), family = gaussian(), data = simdat$data_train, chains = 2, silent = 2 ) # Calculate stability metrics for this system metrics <- stability(mod) # Proportion of stationary forecast distribution attributable to interactions hist( metrics$prop_int, xlim = c(0, 1), xlab = 'Prop_int', main = '', col = '#B97C7C', border = 'white' ) # Inter- vs intra-series interaction contributions layout(matrix(1:2, nrow = 2)) hist( metrics$prop_int_offdiag, xlim = c(0, 1), xlab = '', main = 'Inter-series interactions', col = '#B97C7C', border = 'white' ) hist( metrics$prop_int_diag, xlim = c(0, 1), xlab = 'Contribution to interaction effect', main = 'Intra-series interactions (density dependence)', col = 'darkblue', border = 'white' ) layout(1) # Inter- vs intra-series contributions to forecast variance layout(matrix(1:2, nrow = 2)) hist( metrics$prop_cov_offdiag, xlim = c(0, 1), xlab = '', main = 'Inter-series covariances', col = '#B97C7C', border = 'white' ) hist( metrics$prop_cov_diag, xlim = c(0, 1), xlab = 'Contribution to forecast variance', main = 'Intra-series variances', col = 'darkblue', border = 'white' ) layout(1) # Reactivity: system response to perturbation hist( metrics$reactivity, main = '', xlab = 'Reactivity', col = '#B97C7C', border = 'white', xlim = c( -1 * max(abs(metrics$reactivity)), max(abs(metrics$reactivity)) ) ) abline(v = 0, lwd = 2.5) ## End(Not run)
These functions take a fitted mvgam or jsdgam object and
return various useful summaries
## S3 method for class 'mvgam' summary(object, include_betas = TRUE, smooth_test = TRUE, digits = 2, ...) ## S3 method for class 'mvgam_prefit' summary(object, ...) ## S3 method for class 'mvgam' coef(object, summarise = TRUE, ...)## S3 method for class 'mvgam' summary(object, include_betas = TRUE, smooth_test = TRUE, digits = 2, ...) ## S3 method for class 'mvgam_prefit' summary(object, ...) ## S3 method for class 'mvgam' coef(object, summarise = TRUE, ...)
object |
|
include_betas |
Logical. Print a summary that includes posterior
summaries of all linear predictor beta coefficients (including spline
coefficients)? Defaults to |
smooth_test |
Logical. Compute estimated degrees of freedom and
approximate p-values for smooth terms? Defaults to |
digits |
The number of significant digits for printing out the summary;
defaults to |
... |
Ignored |
summarise |
|
summary.mvgam and summary.mvgam_prefit return brief summaries of
the model's call, along with posterior intervals for some of the key
parameters in the model. Note that some smooths have extra penalties on the
null space, so summaries for the rho parameters may include more
penalty terms than the number of smooths in the original model formula.
Approximate p-values for smooth terms are also returned, with methods used
for their calculation following those used for mgcv equivalents (see
summary.gam for details). The Estimated Degrees of
Freedom (edf) for smooth terms is computed using either edf.type = 1 for
models with no trend component, or edf.type = 0 for models with trend
components. These are described in the documentation for
jagam. Experiments suggest these p-values tend to be
more conservative than those that might be returned from an equivalent model
fit with summary.gam using method = 'REML'
coef.mvgam returns either summaries or full posterior estimates for GAM
component coefficients
For summary.mvgam, an object of class mvgam_summary containing:
model_spec: Model specification details (formulas, family, dimensions)
parameters: Parameter estimates and significance tests
diagnostics: MCMC convergence diagnostics
sampling_info: Sampling algorithm details
For summary.mvgam_prefit, a list is printed on-screen showing
the model specifications
For coef.mvgam, either a matrix of posterior coefficient
distributions (if summarise == FALSE or data.frame of
coefficient summaries)
Nicholas J Clark
## Not run: simdat <- sim_mvgam(seasonality = "hierarchical") mod <- mvgam( y ~ series + s(season, bs = "cc", k = 6) + s(season, series, bs = "fs", k = 4), data = simdat$data_train, chains = 2, silent = 2 ) mod_summary <- summary(mod) mod_summary ## End(Not run)## Not run: simdat <- sim_mvgam(seasonality = "hierarchical") mod <- mvgam( y ~ series + s(season, bs = "cc", k = 6) + s(season, series, bs = "fs", k = 4), data = simdat$data_train, chains = 2, silent = 2 ) mod_summary <- summary(mod) mod_summary ## End(Not run)
This function takes an mvgam_fevd object and calculates
a posterior summary of the error variance decompositions of each series,
at all horizons
## S3 method for class 'mvgam_fevd' summary(object, probs = c(0.025, 0.975), ...)## S3 method for class 'mvgam_fevd' summary(object, probs = c(0.025, 0.975), ...)
object |
an object of class |
probs |
The upper and lower percentiles to be computed by the
|
... |
ignored |
A long-format tibble / data.frame reporting the posterior median,
upper and lower percentiles of the error variance decompositions of each
series at all horizons.
Nicholas J Clark
This function takes an mvgam_forecast object and
calculates a posterior summary of the hindcast and forecast distributions
of each series, along with any true values that were included in data
and newdata if type = 'response' was used in the call to
hindcast() or function()
## S3 method for class 'mvgam_forecast' summary(object, probs = c(0.025, 0.975), ...)## S3 method for class 'mvgam_forecast' summary(object, probs = c(0.025, 0.975), ...)
object |
an object of class |
probs |
The upper and lower percentiles to be computed by the
|
... |
ignored |
A long-format tibble / data.frame reporting the posterior median,
upper and lower percentiles of the predictions for each series at each of
the timepoints that were originally supplied in data and, optionally,
in newdata.
Nicholas J Clark
forecast.mvgam, plot.mvgam_forecast
This function takes an mvgam_irf object and
calculates a posterior summary of the impulse responses of each
series to shocks from each of the other series, at all horizons
## S3 method for class 'mvgam_irf' summary(object, probs = c(0.025, 0.975), ...)## S3 method for class 'mvgam_irf' summary(object, probs = c(0.025, 0.975), ...)
object |
an object of class |
probs |
The upper and lower percentiles to be computed by the
|
... |
ignored |
A long-format tibble / data.frame reporting the posterior median,
upper and lower percentiles of the impulse responses of each series to
shocks from each of the other series at all horizons.
Nicholas J Clark
mvgam object's parameter posteriorsGet parameters' posterior statistics, implementing the generic tidy from
the package broom.
## S3 method for class 'mvgam' tidy(x, probs = c(0.025, 0.5, 0.975), ...)## S3 method for class 'mvgam' tidy(x, probs = c(0.025, 0.5, 0.975), ...)
x |
An object of class |
probs |
The desired probability levels of the parameters' posteriors.
Defaults to |
... |
Unused, included for generic consistency only. |
The parameters are categorized by the column "type". For instance, the
intercept of the observation model (i.e. the "formula" arg to mvgam()) has
the "type" "observation_beta". The possible "type"s are:
observation_family_extra_param: any extra parameters for your observation model, e.g. sigma for a gaussian observation model. These parameters are not directly derived from the latent trend components (contrast to mu).
observation_beta: betas from your observation model, excluding any
smooths. If your formula was y ~ x1 + s(x2, bs='cr'), then your
intercept and x1's beta would be categorized as this.
random_effect_group_level: Group-level random effects parameters, i.e. the mean and sd of the distribution from which the specific random intercepts/slopes are considered to be drawn from.
random_effect_beta: betas for the individual random intercepts/slopes.
trend_model_param: parameters from your trend_model.
trend_beta: analog of "observation_beta", but for any trend_formula.
trend_random_effect_group_level: analog of
"random_effect_group_level", but for any trend_formula.
trend_random_effect_beta: analog of "random_effect_beta", but for any
trend_formula.
Additionally, GP terms can be incorporated in several ways, leading to different "type"s (or absence!):
s(bs = "gp"): No parameters returned.
gp() in formula: "type" of "observation_param".
gp() in trend_formula: "type" of "trend_formula_param".
GP() in trend_model: "type" of "trend_model_param".
A tibble containing:
"parameter": The parameter in question.
"type": The component of the model that the parameter belongs to (see details).
"mean": The posterior mean.
"sd": The posterior standard deviation.
percentile(s): Any percentiles of interest from these posteriors.
Other tidiers:
augment.mvgam()
## Not run: set.seed(0) simdat <- sim_mvgam( T = 100, n_series = 3, trend_model = AR(), prop_trend = 0.75, family = gaussian() ) simdat$data_train$x <- rnorm(nrow(simdat$data_train)) simdat$data_train$year_fac <- factor(simdat$data_train$year) mod <- mvgam( y ~ -1 + s(time, by = series, bs = 'cr', k = 20) + x, trend_formula = ~ s(year_fac, bs = 're') - 1, trend_model = AR(cor = TRUE), family = gaussian(), data = simdat$data_train, silent = 2 ) tidy(mod, probs = c(0.2, 0.5, 0.8)) ## End(Not run)## Not run: set.seed(0) simdat <- sim_mvgam( T = 100, n_series = 3, trend_model = AR(), prop_trend = 0.75, family = gaussian() ) simdat$data_train$x <- rnorm(nrow(simdat$data_train)) simdat$data_train$year_fac <- factor(simdat$data_train$year) mod <- mvgam( y ~ -1 + s(time, by = series, bs = 'cr', k = 20) + x, trend_formula = ~ s(year_fac, bs = 're') - 1, trend_model = AR(cor = TRUE), family = gaussian(), data = simdat$data_train, silent = 2 ) tidy(mod, probs = c(0.2, 0.5, 0.8)) ## End(Not run)
This function allows a previously fitted mvgam model to be updated.
## S3 method for class 'mvgam' update( object, formula, trend_formula, knots, trend_knots, trend_model, family, share_obs_params, data, newdata, trend_map, use_lv, n_lv, priors, chains, burnin, samples, threads, algorithm, lfo = FALSE, ... ) ## S3 method for class 'jsdgam' update( object, formula, factor_formula, knots, factor_knots, data, newdata, n_lv, family, share_obs_params, priors, chains, burnin, samples, threads, algorithm, lfo = FALSE, ... )## S3 method for class 'mvgam' update( object, formula, trend_formula, knots, trend_knots, trend_model, family, share_obs_params, data, newdata, trend_map, use_lv, n_lv, priors, chains, burnin, samples, threads, algorithm, lfo = FALSE, ... ) ## S3 method for class 'jsdgam' update( object, formula, factor_formula, knots, factor_knots, data, newdata, n_lv, family, share_obs_params, priors, chains, burnin, samples, threads, algorithm, lfo = FALSE, ... )
object |
|
formula |
Optional new |
trend_formula |
An optional Important notes:
|
knots |
An optional |
trend_knots |
As for |
trend_model |
Available options:
Additional features:
|
family |
Supported families:
See |
share_obs_params |
|
data |
A Required columns for most models:
Special cases:
|
newdata |
Optional |
trend_map |
Optional Required structure:
Notes:
|
use_lv |
|
n_lv |
|
priors |
An optional |
chains |
|
burnin |
|
samples |
|
threads |
|
algorithm |
Character string naming the estimation approach:
Can be set globally via |
lfo |
|
... |
|
factor_formula |
Optional new |
factor_knots |
An optional |
A list object of class mvgam containing model output,
the text representation of the model file, the mgcv model output (for
easily generating simulations at unsampled covariate values), Dunn-Smyth
residuals for each outcome variable and key information needed for other
functions in the package. See mvgam-class for details. Use
methods(class = "mvgam") for an overview on available methods.
A list object of class mvgam containing model output,
the text representation of the model file, the mgcv model output (for
easily generating simulations at unsampled covariate values), Dunn-Smyth
residuals for each series and key information needed for other functions in
the package. See mvgam-class for details. Use
methods(class = "mvgam") for an overview on available methods.
Nicholas J Clark
## Not run: # Simulate some data and fit a Poisson AR1 model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2 ) summary(mod) conditional_effects(mod, type = 'link') # Update to an AR2 model updated_mod <- update( mod, trend_model = AR(p = 2), noncentred = TRUE ) summary(updated_mod) conditional_effects(updated_mod, type = 'link') # Now update to a Binomial AR1 by adding information on trials # requires that we supply newdata that contains the 'trials' variable simdat$data_train$trials <- max(simdat$data_train$y) + 15 updated_mod <- update( mod, formula = cbind(y, trials) ~ s(season, bs = 'cc'), noncentred = TRUE, data = simdat$data_train, family = binomial() ) summary(updated_mod) conditional_effects(updated_mod, type = 'link') ## End(Not run)## Not run: # Simulate some data and fit a Poisson AR1 model simdat <- sim_mvgam(n_series = 1, trend_model = AR()) mod <- mvgam( y ~ s(season, bs = 'cc'), trend_model = AR(), noncentred = TRUE, data = simdat$data_train, chains = 2 ) summary(mod) conditional_effects(mod, type = 'link') # Update to an AR2 model updated_mod <- update( mod, trend_model = AR(p = 2), noncentred = TRUE ) summary(updated_mod) conditional_effects(updated_mod, type = 'link') # Now update to a Binomial AR1 by adding information on trials # requires that we supply newdata that contains the 'trials' variable simdat$data_train$trials <- max(simdat$data_train$y) + 15 updated_mod <- update( mod, formula = cbind(y, trials) ~ s(season, bs = 'cc'), noncentred = TRUE, data = simdat$data_train, family = binomial() ) summary(updated_mod) conditional_effects(updated_mod, type = 'link') ## End(Not run)
Set up latent correlated multivariate Gaussian residual processes in mvgam. This function does not evaluate its arguments – it exists purely to help set up a model with particular error processes
ZMVN(unit = time, gr = NA, subgr = series)ZMVN(unit = time, gr = NA, subgr = series)
unit |
The unquoted name of the variable that represents the unit of
analysis in |
gr |
An optional grouping variable, which must be a
where |
subgr |
A subgrouping Models that use the hierarchical correlations (by supplying a value for
For example, if you are modelling counts for a group of species (labelled
as Internally,
|
An object of class mvgam_trend, which contains a list of
arguments to be interpreted by the parsing functions in mvgam
## Not run: # Simulate counts of four species over ten sampling locations site_dat <- data.frame( site = rep(1:10, 4), species = as.factor(sort(rep(letters[1:4], 10))), y = c(NA, rpois(39, 3)) ) head(site_dat) # Set up a correlated residual (i.e. Joint Species Distribution) model trend_model <- ZMVN(unit = site, subgr = species) mod <- mvgam( y ~ species, trend_model = ZMVN(unit = site, subgr = species), data = site_dat, chains = 2, silent = 2 ) # Inspect the estimated species-species residual covariances mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist') # A hierarchical correlation example Sigma <- matrix( c(1, -0.4, 0.5, -0.4, 1, 0.3, 0.5, 0.3, 1), byrow = TRUE, nrow = 3 ) make_site_dat <- function(...) { errors <- mgcv::rmvn( n = 30, mu = c(0.6, 0.8, 1.8), V = Sigma ) site_dat <- do.call(rbind, lapply(1:3, function(spec) { data.frame( y = rpois(30, lambda = exp(errors[, spec])), species = paste0('species', spec), site = 1:30 ) })) site_dat } site_dat <- rbind( make_site_dat() %>% dplyr::mutate(group = 'group1'), make_site_dat() %>% dplyr::mutate(group = 'group2') ) %>% dplyr::mutate( species = as.factor(species), group = as.factor(group) ) # Fit the hierarchical correlated residual model mod <- mvgam( y ~ species, trend_model = ZMVN(unit = site, gr = group, subgr = species), data = site_dat, chains = 2, silent = 2 ) # Inspect the estimated species-species residual covariances mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist') ## End(Not run)## Not run: # Simulate counts of four species over ten sampling locations site_dat <- data.frame( site = rep(1:10, 4), species = as.factor(sort(rep(letters[1:4], 10))), y = c(NA, rpois(39, 3)) ) head(site_dat) # Set up a correlated residual (i.e. Joint Species Distribution) model trend_model <- ZMVN(unit = site, subgr = species) mod <- mvgam( y ~ species, trend_model = ZMVN(unit = site, subgr = species), data = site_dat, chains = 2, silent = 2 ) # Inspect the estimated species-species residual covariances mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist') # A hierarchical correlation example Sigma <- matrix( c(1, -0.4, 0.5, -0.4, 1, 0.3, 0.5, 0.3, 1), byrow = TRUE, nrow = 3 ) make_site_dat <- function(...) { errors <- mgcv::rmvn( n = 30, mu = c(0.6, 0.8, 1.8), V = Sigma ) site_dat <- do.call(rbind, lapply(1:3, function(spec) { data.frame( y = rpois(30, lambda = exp(errors[, spec])), species = paste0('species', spec), site = 1:30 ) })) site_dat } site_dat <- rbind( make_site_dat() %>% dplyr::mutate(group = 'group1'), make_site_dat() %>% dplyr::mutate(group = 'group2') ) %>% dplyr::mutate( species = as.factor(species), group = as.factor(group) ) # Fit the hierarchical correlated residual model mod <- mvgam( y ~ species, trend_model = ZMVN(unit = site, gr = group, subgr = species), data = site_dat, chains = 2, silent = 2 ) # Inspect the estimated species-species residual covariances mcmc_plot(mod, variable = 'Sigma', regex = TRUE, type = 'hist') ## End(Not run)