Tutorial

Jair Andrade

2024-04-22

1 Overview

In this tutorial, I will guide you through the steps to perform statisticalinference on a compartmental model, using thereadsdr package as a bridgingtool to streamline the process. Specifically, I will use theSusceptible-Exposed-Infectious-Recovered (SEIR) model as an example. Thistutorial assumes familiarity with System Dynamics (SD) software (e.g., Stella)and concepts related to ordinary differential equation (ODE) models andstatistics.

library(cmdstanr)library(dplyr)library(ggplot2)library(gt)library(posterior)library(readr)library(readsdr) # Development versionlibrary(stringr)library(tidyr)

2 Model

Previously, the SEIR model was built in Stella. You can download ithere.Let’s read the SEIR intoR with theread_xmile function and see itsinitial conditions (sd_stocks) and constant values (sd_constants).

model_path <- "https://jandraor.github.io/models/SEIR_standard.stmx"mdl        <- read_xmile(model_path)

stocks_df <- sd_stocks(mdl)stocks_df

##   name init_value## 1    S       9999## 2    E          0## 3    I          1## 4    R          0## 5    C          0

constants_df <- sd_constants(mdl) |>   mutate(value = format(value, scientific = FALSE))constants_df

##        name    value## 1  par_beta     1.00## 2         N 10000.00## 3 par_sigma     0.50## 4 par_gamma     0.50## 5   par_rho     0.75## 6        R0     0.00## 7        I0     1.00

3 Synthetic data

Briefly, a Data Generating Process can be thought of as the amalgamation of aprocess and a measurement component. The process component corresponds to theODE model, whereas the measurement component concerns the structure thataccounts for the discrepancies between model prediction and actual data. In thiscase, the measurement component is formulated in terms of a Negative Binomialdistribution. To generate synthetic data, we employ the functionsd_measurements. Notice that the measurement component is defined using Stanlanguage. Furthermore, the quantity of interest corresponds to the differenceat discrete times of the stock C (cumulative incidence).

meas_mdl <- list("y ~ neg_binomial_2(net_flow(C), 10)")set.seed(789)dat_df <- sd_measurements(n_meas       = 1,                          ds_inputs    = mdl$deSolve_components,                          meas_model   = meas_mdl,                          start_time   = 0,                          stop_time    = 70,                          timestep     = 1/16,                          integ_method = "rk4")

ggplot(dat_df, aes(time, measurement)) +  geom_point(colour = "#7B92DC") +  labs(x = "Day", y = "Incidence [People per day]") +  theme_classic()

3.1 Inference

meas_mdl <- list("y ~ neg_binomial_2(net_flow(C), phi)")est_params <- list(sd_prior(par_name  = "par_beta",                             dist      = "lognormal",                            dist_pars =  c(0, 1)),                   sd_prior("par_rho", "beta", c(2, 2)),                   # This parameter only affects initial conditions                    sd_prior("I0", "lognormal", c(0, 1), type = "init"))stan_file <- sd_Bayes(filepath         = model_path,                      meas_mdl         = meas_mdl,                      estimated_params = est_params)cat(stan_file)

## // Code generated by the R package readsdr v0.3.0.9000## // See more info at github https://github.com/jandraor/readsdr## functions {##   vector X_model(real time, vector y, array[] real params) {##     vector[5] dydt;##     real S_to_E;##     real E_to_I;##     real I_to_R;##     real C_in;##     S_to_E = params[1]*y[1]*y[3]/10000;##     E_to_I = 0.5*y[2];##     I_to_R = 0.5*y[3];##     C_in = params[2]*E_to_I;##     dydt[1] = -S_to_E;##     dydt[2] = S_to_E-E_to_I;##     dydt[3] = E_to_I-I_to_R;##     dydt[4] = I_to_R;##     dydt[5] = C_in;##     return dydt;##   }## }## data {##   int<lower = 1> n_obs;##   array[n_obs] int y;##   array[n_obs] real ts;## }## parameters {##   real<lower = 0> par_beta;##   real<lower = 0, upper = 1> par_rho;##   real<lower = 0> I0;##   real<lower = 0> inv_phi;## }## transformed parameters{##   array[n_obs] vector[5] x; // Output from the ODE solver##   array[2] real params;##   vector[5] x0; // init values##   array[n_obs] real delta_x_1;##   real phi;##   phi = 1 / inv_phi;##   x0[1] = (10000) - I0 - (0); // S##   x0[2] = 0; // E##   x0[3] = I0; // I##   x0[4] = 0; // R##   x0[5] = 0; // C##   params[1] = par_beta;##   params[2] = par_rho;##   x = ode_rk45(X_model, x0, 0, ts, params);##   delta_x_1[1] =  x[1, 5] - x0[5] + 1e-5;##   for (i in 1:n_obs-1) {##     delta_x_1[i + 1] = x[i + 1, 5] - x[i, 5] + 1e-5;##   }## }## model {##   par_beta ~ lognormal(0, 1);##   par_rho ~ beta(2, 2);##   I0 ~ lognormal(0, 1);##   inv_phi ~ exponential(5);##   y ~ neg_binomial_2(delta_x_1, phi);## }## generated quantities {##   real log_lik;##   array[n_obs] int sim_y;##   log_lik = neg_binomial_2_lpmf(y | delta_x_1, phi);##   sim_y = neg_binomial_2_rng(delta_x_1, phi);## }

stan_path <- "./Stan_files/SEIR.stan"write_file(stan_file, file = stan_path)

stan_d <- list(n_obs = nrow(dat_df),               y     = dat_df$measurement,               ts    = 1:max(dat_df$time))mod           <- cmdstan_model(stan_path)fit <- mod$sample(data              = stan_d,                  chains          = 4,                  parallel_chains = 4,                  iter_warmup     = 1000,                  iter_sampling   = 1000,                  refresh         = 100,                  save_warmup     = FALSE)

## Running MCMC with 4 parallel chains...## ## Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) ## Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) ## Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) ## Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) ## Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) ## Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) ## Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) ## Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) ## Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) ## Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) ## Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) ## Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) ## Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) ## Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) ## Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) ## Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) ## Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) ## Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) ## Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) ## Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) ## Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) ## Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) ## Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) ## Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) ## Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) ## Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) ## Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) ## Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) ## Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) ## Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) ## Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) ## Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) ## Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) ## Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) ## Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) ## Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) ## Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) ## Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) ## Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) ## Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) ## Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) ## Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) ## Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) ## Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) ## Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) ## Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) ## Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) ## Chain 1 finished in 14.7 seconds.## Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) ## Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) ## Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) ## Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) ## Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) ## Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) ## Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) ## Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) ## Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) ## Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) ## Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) ## Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) ## Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) ## Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) ## Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) ## Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) ## Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) ## Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) ## Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) ## Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) ## Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) ## Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) ## Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) ## Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) ## Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) ## Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) ## Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) ## Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) ## Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) ## Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) ## Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) ## Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) ## Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) ## Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) ## Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) ## Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) ## Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) ## Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) ## Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) ## Chain 2 finished in 13.6 seconds.## Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) ## Chain 4 finished in 13.7 seconds.## Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) ## Chain 3 finished in 13.7 seconds.## ## All 4 chains finished successfully.## Mean chain execution time: 13.9 seconds.## Total execution time: 21.5 seconds.

3.1.1 Diagnostics

Before proceeding with any further analysis, it is imperative that we trust theresults from the Hamiltonian Monte Carlo (HMC) sampler. To do so, Stan providesa range of tests that check the validity of the sampling process. If we type thecode below, one should get the message‘Processing complete, no problems’. Anyother result suggests model reformulation or adjustment to the algorithm’sparameters.

fit$cmdstan_diagnose()

## Processing csv files: /var/folders/51/j15clly97bl0z25f9q1s9hdm0000gn/T/RtmpLxEEdc/SEIR-202404221731-1-35840e.csv, /var/folders/51/j15clly97bl0z25f9q1s9hdm0000gn/T/RtmpLxEEdc/SEIR-202404221731-2-35840e.csv, /var/folders/51/j15clly97bl0z25f9q1s9hdm0000gn/T/RtmpLxEEdc/SEIR-202404221731-3-35840e.csv, /var/folders/51/j15clly97bl0z25f9q1s9hdm0000gn/T/RtmpLxEEdc/SEIR-202404221731-4-35840e.csv## ## Checking sampler transitions treedepth.## Treedepth satisfactory for all transitions.## ## Checking sampler transitions for divergences.## No divergent transitions found.## ## Checking E-BFMI - sampler transitions HMC potential energy.## E-BFMI satisfactory.## ## Effective sample size satisfactory.## ## Split R-hat values satisfactory all parameters.## ## Processing complete, no problems detected.

3.1.2 Posterior predictive checks

posterior_df <- as_draws_df(fit$draws())summarise_predicted_observations <- function(posterior_df, var_name) {    y_tilde <- readsdr::extract_timeseries_var(var_name, posterior_df)    y_tilde |> group_by(time) |>  summarise(q2.5  = quantile(value, 0.025),            q25   = quantile(value, 0.25),            mean  = mean(value),            q75   = quantile(value, 0.75),            q97.5 = quantile(value, 0.975))}summary_y <- summarise_predicted_observations(posterior_df, "sim_y")

ggplot(summary_y, aes(time, mean)) +  geom_line(colour = "grey60") +  geom_ribbon(aes(ymin = q25, ymax = q75), fill = "grey90", alpha = 0.5) +  geom_ribbon(aes(ymin = q2.5, ymax = q97.5), fill = "grey90", alpha = 0.25) +  geom_point(data = dat_df, aes(time, measurement), colour = "#7B92DC") +  labs(y = "Incidence", x = "Day") +  theme_classic() +  theme(axis.line = element_line(colour = "grey70"),        axis.text = element_text(colour = "grey70"),        axis.ticks = element_line(colour = "grey70"))

pars_df <- posterior_df[, c("par_beta", "I0", "par_rho", "inv_phi")] |>   mutate(par_phi = 1 / inv_phi) |>   dplyr::select(-inv_phi) |>   mutate(iter = row_number()) |>   pivot_longer(-iter, names_to = "par")

3.1.3 Parameter estimates (marginal distributions)

Vertical lines denote the true value.

actual_vals <- data.frame(par   = c("I0", "par_beta", "par_rho", "par_phi"),                          value = c(1, 1, 0.75, 10))ggplot(pars_df, aes(value)) +  geom_histogram(colour = "white", fill = "grey60", alpha = 0.75) +  facet_wrap(vars(par), scales = "free") +  geom_vline(data = actual_vals, aes(xintercept = value),              colour = "#7B92DC", linewidth = 1.5) +  theme_classic()

4 Cumberland

Thereadsdr package contains a data set of daily influenza cases detected bythe US Public Health Service in Maryland during the 1918 influenza pandemic,from 22 September to 30 November 1918(Frost and Sydenstricker 1919).

Maryland <- Maryland |>  mutate(time = row_number())ggplot(Maryland, aes(time, Cumberland)) +  geom_col(fill = "#E35A43") +  labs(x = "Day", y = "Cases") +  theme_bw()

4.1 Inference

This example replicates the work presented inAndrade and Duggan (2021). However,here I use the negative binomial distribution instead of Poisson.

meas_mdl <- list("y ~ neg_binomial_2(net_flow(C), phi)")est_params <- list(sd_prior(par_name  = "par_beta",                             dist      = "lognormal",                            dist_pars =  c(0, 1)),                   sd_prior("par_rho", "beta", c(2, 2)),                   sd_prior("I0", "lognormal", c(0, 1), type = "init"))stan_file <- sd_Bayes(filepath         = model_path,                      meas_mdl         = meas_mdl,                      estimated_params = est_params,                      # Override values in the .stmx file                      const_list = list(N  = 5234,                                      # Initial number of recovered individuals                                        R0 = round(5234 * 0.3)))cat(stan_file)

## // Code generated by the R package readsdr v0.3.0.9000## // See more info at github https://github.com/jandraor/readsdr## functions {##   vector X_model(real time, vector y, array[] real params) {##     vector[5] dydt;##     real S_to_E;##     real E_to_I;##     real I_to_R;##     real C_in;##     S_to_E = params[1]*y[1]*y[3]/5234;##     E_to_I = 0.5*y[2];##     I_to_R = 0.5*y[3];##     C_in = params[2]*E_to_I;##     dydt[1] = -S_to_E;##     dydt[2] = S_to_E-E_to_I;##     dydt[3] = E_to_I-I_to_R;##     dydt[4] = I_to_R;##     dydt[5] = C_in;##     return dydt;##   }## }## data {##   int<lower = 1> n_obs;##   array[n_obs] int y;##   array[n_obs] real ts;## }## parameters {##   real<lower = 0> par_beta;##   real<lower = 0, upper = 1> par_rho;##   real<lower = 0> I0;##   real<lower = 0> inv_phi;## }## transformed parameters{##   array[n_obs] vector[5] x; // Output from the ODE solver##   array[2] real params;##   vector[5] x0; // init values##   array[n_obs] real delta_x_1;##   real phi;##   phi = 1 / inv_phi;##   x0[1] = (5234) - I0 - (1570); // S##   x0[2] = 0; // E##   x0[3] = I0; // I##   x0[4] = 1570; // R##   x0[5] = 0; // C##   params[1] = par_beta;##   params[2] = par_rho;##   x = ode_rk45(X_model, x0, 0, ts, params);##   delta_x_1[1] =  x[1, 5] - x0[5] + 1e-5;##   for (i in 1:n_obs-1) {##     delta_x_1[i + 1] = x[i + 1, 5] - x[i, 5] + 1e-5;##   }## }## model {##   par_beta ~ lognormal(0, 1);##   par_rho ~ beta(2, 2);##   I0 ~ lognormal(0, 1);##   inv_phi ~ exponential(5);##   y ~ neg_binomial_2(delta_x_1, phi);## }## generated quantities {##   real log_lik;##   array[n_obs] int sim_y;##   log_lik = neg_binomial_2_lpmf(y | delta_x_1, phi);##   sim_y = neg_binomial_2_rng(delta_x_1, phi);## }

stan_path <- "./Stan_files/SEIR_Cumberland.stan"write_file(stan_file, file = stan_path)

stan_d <- list(n_obs = nrow(Maryland),               y     = Maryland$Cumberland,               ts    = 1:max(Maryland$time))mod           <- cmdstan_model(stan_path)fit <- mod$sample(data              = stan_d,                  chains          = 4,                  parallel_chains = 4,                  iter_warmup     = 1000,                  iter_sampling   = 1000,                  refresh         = 100,                  save_warmup     = FALSE)

4.1.1 Diagnostics

Don’t forget the diagnostics.

fit$cmdstan_diagnose()

## Processing csv files: /var/folders/51/j15clly97bl0z25f9q1s9hdm0000gn/T/RtmpLxEEdc/SEIR_Cumberland-202404221732-1-6b3946.csv, /var/folders/51/j15clly97bl0z25f9q1s9hdm0000gn/T/RtmpLxEEdc/SEIR_Cumberland-202404221732-2-6b3946.csv, /var/folders/51/j15clly97bl0z25f9q1s9hdm0000gn/T/RtmpLxEEdc/SEIR_Cumberland-202404221732-3-6b3946.csv, /var/folders/51/j15clly97bl0z25f9q1s9hdm0000gn/T/RtmpLxEEdc/SEIR_Cumberland-202404221732-4-6b3946.csv## ## Checking sampler transitions treedepth.## Treedepth satisfactory for all transitions.## ## Checking sampler transitions for divergences.## No divergent transitions found.## ## Checking E-BFMI - sampler transitions HMC potential energy.## E-BFMI satisfactory.## ## Effective sample size satisfactory.## ## Split R-hat values satisfactory all parameters.## ## Processing complete, no problems detected.

4.1.2 Posterior predictive checks

posterior_df <- as_draws_df(fit$draws())summary_y    <- summarise_predicted_observations(posterior_df, "sim_y")

plot_predictive_checks <- function(summary_y, Maryland) {    ggplot(summary_y, aes(time, mean)) +  geom_line(colour = "grey60") +  geom_ribbon(aes(ymin = q25, ymax = q75), fill = "grey90", alpha = 0.5) +  geom_ribbon(aes(ymin = q2.5, ymax = q97.5), fill = "grey90", alpha = 0.25) +  geom_point(data = Maryland, aes(time, Cumberland), colour = "#E35A43") +  labs(y = "Incidence", x = "Day") +  theme_classic() +  theme(axis.line = element_line(colour = "grey70"),        axis.text = element_text(colour = "grey70"),        axis.ticks = element_line(colour = "grey70"))}plot_predictive_checks(summary_y, Maryland)

4.1.3 Basic reproduction number

We can estimate the 95% uncertainty interval of\(\Re_0\) using the posteriordistribution.

# \Re_0 = beta / gammagamma_val         <- constants_df |> filter(name == "par_sigma") |>   pull(value) |> as.numeric()R0_expected_value <- posterior_df$par_beta / gamma_val quantile(R0_expected_value, c(0.025, 0.975)) |> round(2)

##  2.5% 97.5% ##  2.23  2.42

4.2 Fixing the mean generation time

However, the estimation of the reproduction number is sensitive to the chosenmean generation time. To confirm this observation, we employ theparameterisation suggested byAndrade and Duggan (2023). In particular, we use thealternativeSEI2Rparameterisationwithone stage in the latent compartment andtwo stages in the infectiousclass. In this parameterisation, the mean generation time (\(\tau\)) is anexogenous parameter.

Recall that the mean generation time in the SEIR model can be expressed as afunction (Eq(4.1)) that depends on the number of stages of the infectiouscompartment (\(j\)), and the latent (\(\sigma^{-1}\)) and infectious(\(\gamma^{-1}\)). According to Eq(4.1), the assumed mean generation timein the model fitted in the previous section is\((2) + \frac{(1+1)}{(2\times1)}(2) = 4\).

\[\begin{equation} \tau = \sigma ^{-1} + \frac{j +1}{2j} \gamma^{-1} \tag{4.1}\end{equation}\]

Then, I demonstrate that changes in the parameterisation or in the infectiousperiod distribution do not lead to variations in the estimated reproductionnumber, provided that the mean generation time is held at4. In this example,I also show how to change the value of fixed parameters without having to createa new Stan file. Notice the last line of the call to thesd_Bayes function,where I declare (data_params = "par_tau") thatpar_tau will be configured inStan’sdata block.

model_path <- "https://jandraor.github.io/models/SE1I2R_alt.stmx"meas_mdl <- list("y ~ neg_binomial_2(net_flow(C), phi)")est_params <- list(sd_prior(par_name  = "par_inv_R0",                             dist      = "beta",                            dist_pars =  c(2, 2)),                   sd_prior("par_rho", "beta", c(2, 2)),                   sd_prior("I0", "lognormal", c(0, 1), type = "init"))stan_file <- sd_Bayes(filepath         = model_path,                      meas_mdl         = meas_mdl,                      estimated_params = est_params,                      # Override values in the .stmx file                      const_list = list(N  = 5234,                                      # Initial number of recovered individuals                                        xi = 0.3),                      data_params = "par_tau")cat(stan_file)

## // Code generated by the R package readsdr v0.3.0.9000## // See more info at github https://github.com/jandraor/readsdr## functions {##   vector X_model(real time, vector y, array[] real params) {##     vector[6] dydt;##     real E1_to_I1;##     real C_in;##     real aux_j;##     real aux_tau;##     real var_beta;##     real var_gamma;##     real I2_to_R;##     real S_to_E;##     real I1_to_I2;##     E1_to_I1 = 0.5*y[2];##     C_in = params[2]*E1_to_I1;##     aux_j = (2+1)/(2.0*2);##     aux_tau = params[3]-(1/0.5);##     var_beta = (1/params[1])*(aux_j/aux_tau);##     var_gamma = var_beta*params[1];##     I2_to_R = 2*var_gamma*y[6];##     S_to_E = var_beta*y[1]*(y[3]+y[6])/5234;##     I1_to_I2 = 2*var_gamma*y[3];##     dydt[1] = -S_to_E;##     dydt[2] = S_to_E-E1_to_I1;##     dydt[3] = E1_to_I1-I1_to_I2;##     dydt[4] = I2_to_R;##     dydt[5] = C_in;##     dydt[6] = I1_to_I2-I2_to_R;##     return dydt;##   }## }## data {##   int<lower = 1> n_obs;##   array[n_obs] int y;##   array[n_obs] real ts;##   real par_tau;## }## parameters {##   real<lower = 0, upper = 1> par_inv_R0;##   real<lower = 0, upper = 1> par_rho;##   real<lower = 0> I0;##   real<lower = 0> inv_phi;## }## transformed parameters{##   array[n_obs] vector[6] x; // Output from the ODE solver##   array[3] real params;##   vector[6] x0; // init values##   array[n_obs] real delta_x_1;##   real phi;##   phi = 1 / inv_phi;##   x0[1] = (5234) * (1 -  (0.3)) - I0; // S##   x0[2] = 0; // E1##   x0[3] = I0; // I1##   x0[4] = 1570.2; // R##   x0[5] = I0; // C##   x0[6] = 0; // I2##   params[1] = par_inv_R0;##   params[2] = par_rho;##   params[3] = par_tau;##   x = ode_rk45(X_model, x0, 0, ts, params);##   delta_x_1[1] =  x[1, 5] - x0[5] + 1e-5;##   for (i in 1:n_obs-1) {##     delta_x_1[i + 1] = x[i + 1, 5] - x[i, 5] + 1e-5;##   }## }## model {##   par_inv_R0 ~ beta(2, 2);##   par_rho ~ beta(2, 2);##   I0 ~ lognormal(0, 1);##   inv_phi ~ exponential(5);##   y ~ neg_binomial_2(delta_x_1, phi);## }## generated quantities {##   real log_lik;##   array[n_obs] int sim_y;##   log_lik = neg_binomial_2_lpmf(y | delta_x_1, phi);##   sim_y = neg_binomial_2_rng(delta_x_1, phi);## }

stan_path <- "./Stan_files/SEI2R_Cumberland.stan"write_file(stan_file, file = stan_path)

mod           <- cmdstan_model(stan_path)stan_d <- list(n_obs   = nrow(Maryland),               y       = Maryland$Cumberland,               ts      = 1:max(Maryland$time),               par_tau = 4)fit <- mod$sample(data            = stan_d,                  chains          = 4,                  parallel_chains = 4,                  iter_warmup     = 1000,                  iter_sampling   = 1000,                  refresh         = 100,                  save_warmup     = FALSE)

posterior_df <- as_draws_df(fit$draws())summary_y    <- summarise_predicted_observations(posterior_df, "sim_y")plot_predictive_checks(summary_y, Maryland)

Despite I changed the infectious period distribution (from exponential toErlang), the 95% credible interval is nearly identical to that calculated insection4.2.1. As I stated, the only quantity that matters is the meangeneration time.

R0_expected_value <- 1/posterior_df$par_inv_R0quantile(R0_expected_value, c(0.025, 0.975)) |> round(2)

##  2.5% 97.5% ##  2.22  2.41

However, if I run the sampler with a different mean generation time (2.85 days),a value obtained from the literature(Wallinga and Lipsitch 2007; Hirotsu et al. 2004),…

stan_d <- list(n_obs   = nrow(Maryland),               y       = Maryland$Cumberland,               ts      = 1:max(Maryland$time),               par_tau = 2.85)fit <- mod$sample(data            = stan_d,                  chains          = 4,                  parallel_chains = 4,                  iter_warmup     = 1000,                  iter_sampling   = 1000,                  refresh         = 100,                  save_warmup     = FALSE)

I obtain a similar fit…

posterior_df <- as_draws_df(fit$draws())summary_y    <- summarise_predicted_observations(posterior_df, "sim_y")plot_predictive_checks(summary_y, Maryland)

But lower\(\Re_0\) estimates (first row in the table below).

pars_df <- posterior_df |> select(par_rho, I0, par_inv_R0, phi) |>   mutate(R0 = 1/ par_inv_R0, .before = everything()) |>   select(-par_inv_R0)# Credible intervalscr_I <- apply(pars_df, 2, function(par) quantile(par, c(0.025, 0.975))) |>  t() |>  as.data.frame() par_names <- c("\u211c", "\u03C1", "I", "\u03D5")cr_I |> mutate(Parameter = par_names, .before = everything()) |>   gt() |> fmt_number(columns = 2:3, decimals = 2) |>   text_transform(    locations = cells_body(columns = c(Parameter), rows = c(1, 3)),     fn        = function(x) str_glue("{x}<sub>0</sub>"))

Parameter	2.5%	97.5%
ℜ₀	1.97	2.10
ρ	0.82	0.98
I₀	1.66	3.89
ϕ	1.91	4.28

5 Less data & forecast

Now, imagine that you only had access to the first 30 measurements. How wouldthat change the estimate, and would the model forecast the remaining data pointsaccurately?

first_30 <- Maryland |> filter(time < 31)

To generate forecasts, we set the parameterforecast toTRUE in thesd_bayes function. This parameter instructs Stan to simulaten_fcst daysafter the last measurement. The user specifies the value ofn_fcstin the data block. In this case, we will forecast 61 days.

model_path <- "https://jandraor.github.io/models/SE1I2R_alt.stmx"meas_mdl <- list("y ~ neg_binomial_2(net_flow(C), phi)")est_params <- list(sd_prior(par_name  = "par_inv_R0",                             dist      = "beta",                            dist_pars =  c(2, 2)),                   sd_prior("par_rho", "beta", c(2, 2)),                   sd_prior("I0", "lognormal", c(0, 1), type = "init"))stan_file <- sd_Bayes(filepath         = model_path,                      meas_mdl         = meas_mdl,                      estimated_params = est_params,                      # Override values in the .stmx file                      const_list = list(N  = 5234,                                      # Initial number of recovered individuals                                        xi = 0.3,                                        par_tau = 2.85),                       forecast = TRUE)cat(stan_file)

## // Code generated by the R package readsdr v0.3.0.9000## // See more info at github https://github.com/jandraor/readsdr## functions {##   vector X_model(real time, vector y, array[] real params) {##     vector[6] dydt;##     real E1_to_I1;##     real C_in;##     real aux_j;##     real aux_tau;##     real var_beta;##     real var_gamma;##     real I2_to_R;##     real S_to_E;##     real I1_to_I2;##     E1_to_I1 = 0.5*y[2];##     C_in = params[2]*E1_to_I1;##     aux_j = (2+1)/(2.0*2);##     aux_tau = 2.85-(1/0.5);##     var_beta = (1/params[1])*(aux_j/aux_tau);##     var_gamma = var_beta*params[1];##     I2_to_R = 2*var_gamma*y[6];##     S_to_E = var_beta*y[1]*(y[3]+y[6])/5234;##     I1_to_I2 = 2*var_gamma*y[3];##     dydt[1] = -S_to_E;##     dydt[2] = S_to_E-E1_to_I1;##     dydt[3] = E1_to_I1-I1_to_I2;##     dydt[4] = I2_to_R;##     dydt[5] = C_in;##     dydt[6] = I1_to_I2-I2_to_R;##     return dydt;##   }## }## data {##   int<lower = 1> n_obs;##   array[n_obs] int y;##   array[n_obs] real ts;##   int<lower = 1> n_fcst;## }## parameters {##   real<lower = 0, upper = 1> par_inv_R0;##   real<lower = 0, upper = 1> par_rho;##   real<lower = 0> I0;##   real<lower = 0> inv_phi;## }## transformed parameters{##   array[n_obs] vector[6] x; // Output from the ODE solver##   array[2] real params;##   vector[6] x0; // init values##   array[n_obs] real delta_x_1;##   real phi;##   phi = 1 / inv_phi;##   x0[1] = (5234) * (1 -  (0.3)) - I0; // S##   x0[2] = 0; // E1##   x0[3] = I0; // I1##   x0[4] = 1570.2; // R##   x0[5] = I0; // C##   x0[6] = 0; // I2##   params[1] = par_inv_R0;##   params[2] = par_rho;##   x = ode_rk45(X_model, x0, 0, ts, params);##   delta_x_1[1] =  x[1, 5] - x0[5] + 1e-5;##   for (i in 1:n_obs-1) {##     delta_x_1[i + 1] = x[i + 1, 5] - x[i, 5] + 1e-5;##   }## }## model {##   par_inv_R0 ~ beta(2, 2);##   par_rho ~ beta(2, 2);##   I0 ~ lognormal(0, 1);##   inv_phi ~ exponential(5);##   y ~ neg_binomial_2(delta_x_1, phi);## }## generated quantities {##   real log_lik;##   array[n_obs] int sim_y;##   array[n_fcst] int fcst_y;##   array[n_fcst] vector[6] x_fcst; // Forecast##   array[n_fcst] real t_fcst;##   vector[6] x_fcst0; // Forecast init values##   array[n_fcst] real delta_x_fcst_1;##   log_lik = neg_binomial_2_lpmf(y | delta_x_1, phi);##   // Simulate forecast##   x_fcst0 = x[n_obs, :];##   t_fcst = linspaced_array(n_fcst, 1, n_fcst);##   x_fcst = ode_rk45(X_model, x_fcst0, 0, t_fcst, params);##   delta_x_fcst_1[1] =  x_fcst[1, 5] - x_fcst0[5] + 1e-5;##   for (i in 1:n_fcst-1) {##     delta_x_fcst_1[i + 1] = x_fcst[i + 1, 5] - x_fcst[i, 5] + 1e-5;##   }##   sim_y = neg_binomial_2_rng(delta_x_1, phi);##   fcst_y = neg_binomial_2_rng(delta_x_fcst_1, phi);## }

stan_path <- "./Stan_files/SEI2R_forecast.stan"write_file(stan_file, file = stan_path)

We run the inference process…

mod           <- cmdstan_model(stan_path)stan_d <- list(n_obs   = nrow(first_30),               y       = first_30$Cumberland,               ts      = 1:max(first_30$time),               n_fcst  = 61)fit <- mod$sample(data            = stan_d,                  chains          = 4,                  parallel_chains = 4,                  iter_warmup     = 1000,                  iter_sampling   = 1000,                  refresh         = 100,                  save_warmup     = FALSE)

To understand the code, let’s define two key variables.sim_y representssimulated measurements up to and including the last available data point. Inother words, it’s a hindcast (grey ribbon). Conversely,fcst_y representssimulated measurements after the last available data point, making it aforecast (blue ribbon).

posterior_df <- as_draws_df(fit$draws())summary_y_tilde <- summarise_predicted_observations(posterior_df, "sim_y")summary_y_tilde_fcst <- summarise_predicted_observations(posterior_df,                                                          "fcst_y") |>  mutate(time = time + 30) |>   bind_rows(summary_y_tilde |> tail(1))  ggplot(summary_y_tilde, aes(time, mean)) +  geom_line(colour = "grey60") +  geom_ribbon(aes(ymin = q25, ymax = q75), fill = "grey90", alpha = 0.5) +  geom_ribbon(aes(ymin = q2.5, ymax = q97.5), fill = "grey90", alpha = 0.25) +  geom_line(data = summary_y_tilde_fcst, colour = "#7B92DC") +  geom_ribbon(data = summary_y_tilde_fcst,              aes(ymin = q25, ymax = q75), fill = "#7B92DC", alpha = 0.2) +  geom_ribbon(data = summary_y_tilde_fcst,              aes(x = time, ymin = q2.5, ymax = q97.5), fill = "#7B92DC", alpha = 0.1) +  geom_point(data = Maryland, aes(time, Cumberland), colour = "#E35A43") +  labs(y = "Incidence", x = "Day") +  theme_classic() +  theme(axis.line = element_line(colour = "grey70"),        axis.text = element_text(colour = "grey70"),        axis.ticks = element_line(colour = "grey70"))

While using only the first 30 data points leads to a less accurate capture oftransmission dynamics compared to using all the data, the estimates of\(\Re_0\)remain fairly similar.

R0_expected_value <- 1/posterior_df$par_inv_R0quantile(R0_expected_value, c(0.025, 0.975)) |> round(2)

##  2.5% 97.5% ##  1.93  2.23

6 Computational environment

sessionInfo()

## R version 4.3.1 (2023-06-16)## Platform: aarch64-apple-darwin20 (64-bit)## Running under: macOS Sonoma 14.4.1## ## Matrix products: default## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib ## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0## ## locale:## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8## ## time zone: Europe/Dublin## tzcode source: internal## ## attached base packages:## [1] stats     graphics  grDevices utils     datasets  methods   base     ## ## other attached packages:## [1] tidyr_1.3.0        stringr_1.5.0      readsdr_0.3.0.9000 readr_2.1.4       ## [5] posterior_1.4.1    gt_0.9.0           ggplot2_3.4.4      dplyr_1.1.2       ## [9] cmdstanr_0.6.1    ## ## loaded via a namespace (and not attached):##  [1] tensorA_0.36.2       sass_0.4.7           utf8_1.2.4          ##  [4] generics_0.1.3       xml2_1.3.5           jpeg_0.1-10         ##  [7] stringi_1.7.12       hms_1.1.3            digest_0.6.33       ## [10] magrittr_2.0.3       evaluate_0.21        grid_4.3.1          ## [13] bookdown_0.35        fastmap_1.1.1        jsonlite_1.8.7      ## [16] processx_3.8.2       backports_1.4.1      deSolve_1.36        ## [19] ps_1.7.5             purrr_1.0.2          fansi_1.0.6         ## [22] scales_1.3.0         jquerylib_0.1.4      abind_1.4-5         ## [25] cli_3.6.1            rlang_1.1.2          munsell_0.5.0       ## [28] withr_2.5.2          cachem_1.0.8         yaml_2.3.7          ## [31] tools_4.3.1          tzdb_0.4.0           checkmate_2.2.0     ## [34] colorspace_2.1-0     curl_5.0.2           vctrs_0.6.5         ## [37] R6_2.5.1             lifecycle_1.0.4      pkgconfig_2.0.3     ## [40] pillar_1.9.0         bslib_0.5.1          gtable_0.3.4        ## [43] data.table_1.14.8    glue_1.6.2           highr_0.10          ## [46] xfun_0.40            tibble_3.2.1         tidyselect_1.2.0    ## [49] rstudioapi_0.15.0    knitr_1.43           farver_2.1.1        ## [52] htmltools_0.5.6      labeling_0.4.3       rmarkdown_2.24      ## [55] compiler_4.3.1       distributional_0.3.2

References

Andrade, Jair, and Jim Duggan. 2021.“A Bayesian approach to calibrate system dynamics models using Hamiltonian Monte Carlo.”System Dynamics Review 37 (4): 283–309.https://doi.org/10.1002/sdr.1693.

———. 2023.“Anchoring the Mean Generation Time in the SEIR to Mitigate Biases in R0 Estimates Due to Uncertainty in the Distribution of the Epidemiological Delays.”Royal Society Open Science 10 (8): 230515.https://doi.org/10.1098/rsos.230515.

Frost, W. H., and Edgar Sydenstricker. 1919.“Influenza in Maryland: Preliminary Statistics of Certain Localities.”Public Health Reports (1896-1970) 34 (11): 491–504.http://www.jstor.org/stable/4575056.

Hirotsu, Nobuo, Hideyuki Ikematsu, Norio Iwaki, Naoki Kawai, Takeshi Shigematsu, Osamu Kunishima, and Seizaburou Kashiwagi. 2004.“Effects of Antiviral Drugs on Viral Detection in Influenza Patients and on the Sequential Infection to Their Family Members—Serial Examination by Rapid Diagnosis (Capilia) and Virus Culture.”International Congress Series 1263: 105–8.https://doi.org/10.1016/j.ics.2004.02.020.

Wallinga, J, and M Lipsitch. 2007.“How Generation Intervals Shape the Relationship Between Growth Rates and Reproductive Numbers.”Proceedings of the Royal Society B: Biological Sciences 274 (February): 599–604.https://doi.org/10.1098/rspb.2006.3754.

Movatterモバイル変換