First, we explain how to estimate the Lee-Carter model using thelc_stan function in theStanMoMo package.
For illustration, the package already includes the objectFRMaleData containing deaths (FRMaleData$Dxt)and central exposures (FRMaleData$Ext) of French males forthe period 1816-2017 and for ages 0-110 extracted from theHuman Mortality Database. In thisexample, we focus on ages 60-90 and the period 1980-2010. This can beobtained via:
ages.fit<-60:90years.fit<-1980:2010deathFR<-FRMaleData$Dxt[formatC(ages.fit),formatC(years.fit)]exposureFR<-FRMaleData$Ext[formatC(ages.fit),formatC(years.fit)]As a reminder, the Lee-Carter model assumes that the log death ratesare given by\[\log \mu_{xt}=\alpha_x+\beta_x \kappa_t\] To ensure identifiability of the model, we assume that\[\sum_x \beta_x=1,\kappa_1=0\]
Moreover, we assume that the period parameter follows a random walkwith drift:\[\kappa_t \sim \mathcal{N}(c+ \kappa_{t-1},\sigma)\] The choice of priors for each parameter can be found in thedocumentation of each function.
All the parameters can be estimated either under a Poisson model withargumentfamily = "poisson" or under a Negative-Binomialmodel which includes an additional overdispersion parameter\(\phi\) with argumentfamily="nb":\[D_{xt} \sim Poisson (e_{xt}\mu_{xt}) \text{ or } D_{xt} \sim NB(e_{xt}\mu_{xt},\phi)\] Given the matrix of deathsdeathFR and the matrixof central exposuresexposureFR, we can infer the posteriordistribution of all the parameters and obtain death rates forecasts forthe next 10 years under a Poisson model by a simple call to thelc_stan function:
fitLC=lc_stan(death = deathFR,exposure=exposureFR,forecast =10,family ="poisson",chains=1,iter=1000,cores=1)## ## SAMPLING FOR MODEL 'leecarter' NOW (CHAIN 1).## Chain 1: ## Chain 1: Gradient evaluation took 0.000235 seconds## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.35 seconds.## Chain 1: Adjust your expectations accordingly!## Chain 1: ## Chain 1: ## Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)## Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)## Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)## Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)## Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)## Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)## Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)## Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)## Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)## Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)## Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)## Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)## Chain 1: ## Chain 1: Elapsed Time: 3.77 seconds (Warm-up)## Chain 1: 2.354 seconds (Sampling)## Chain 1: 6.124 seconds (Total)## Chain 1:Internally, the sampling is performed through therstan::sampling function. By default, Stan samples fourMarkov chains of 2000 iterations. For each chain, there are 1000 warmupiterations (hence, 4000 post warm-up draws in total). Moreover, bydefault, Stan uses 1 core but we recommend using as many processors asthe hardware and RAM allow (up to the number of chains). All this can beset using the following additional arguments : for instance, if onewants to sample 2 chains of 3000 iterations with 1000 warm-up samples,one can addchains=2,iter=3000,warmup=1000.
For additional arguments, the reader may refer to therstan::sampling documentation.
The output is an object of classstanfit (cf. therstan package) which contains, among others,posterior draws, posterior summary statistics and convergencediagnostics.
The easiest way to extract the posterior draws is to call therstan::extract function which returns a list with namedcomponents corresponding to the model parameters.
## [1] "a" "b" "c" "ks" "sigma" "k" "phi" ## [8] "k_p" "mufor" "log_lik" "pos" "pos2" "pos3" "lp__"Among these parameters, we find
a :\(\alpha_x\).b :\(\beta_x\).k :\(\kappa_t\).k_p : forecasts of\(\kappa_t\).mufor : forecasts of death rates\(\mu_{xt}\).log_lik : log-likelihoods.phi : overdispersion parameter for the NB model.c andsigma : drift and standard deviationof the random walk with drift.The user can have access to an interface for interactive MCMCdiagnostics, plots and tables helpful for analyzing posterior samplesthrough theshinystan package (fore more details, you cancheck theshinystan webpage)
Note: you need to close the shiny app before continuing in R.
Moreover, the package also includes functions to represent boxplotsof the posterior distribution based on theggplot2package.
For instance, boxplots for the parameters from the Lee-Carter modelcan be obtained via:
All forecast death rates\(\mu_{xt}\) are stored in the outputmufor. We point out that the outputmufor is amatrix of dimension\(N\)\(\times\)\((A .M)\) where the number of rows\(N\) is the posterior sample size and thenumber of columns\(A.M\) is theproduct of\(A\) the age dimension and\(M\) the number of forecast years.
In a similar fashion, the models RH, APC, CBD and M6 can be estimatedwith the following functions:
| Model | Predictor |
|---|---|
| LC | \(\log \mu_{xt} = \alpha_x +\beta_x\kappa_t\) |
| RH | \(\log \mu_{xt} = \alpha_x +\beta_x\kappa_t+\gamma_{t-x}\) |
| APC | \(\log \mu_{xt} = \alpha_x +\kappa_t +\gamma_{t-x}\) |
| CBD | \(\log \mu_{xt} =\kappa_t^{(1)} + (x-\bar{x})\kappa_t^{(2)}\) |
| M6 | \(\log \mu_{xt} =\kappa_t^{(1)} + (x-\bar{x})\kappa_t^{(2)}+\gamma_{t-x}\) |
fitRH=rh_stan(death = deathFR,exposure=exposureFR,forecast =10,family ="poisson",cores=4)fitAPC=apc_stan(death = deathFR,exposure=exposureFR,forecast =10,family ="poisson",cores=4)fitCBD=cbd_stan(death = deathFR,exposure=exposureFR,age=ages.fit,forecast=10,family ="poisson",cores=4)fitM6=m6_stan(death = deathFR,exposure=exposureFR,age=ages.fit,forecast =10,family ="poisson",cores=4)For information, the estimated parameters\(\kappa_t^{(1)}\),\(\kappa_t^{(2)}\) and\(\gamma_{t-x}\) are respectively stored inthe variablesk,k2 andg. Theforecast parameters are respectively stored in the variablesk_p,k2_p andg_p.
You can check the next vignette to know more about model averagingtechniques based on leave-future-out validation.