
A variety oflatent Markov models(Mews,Koslik, and Langrock 2024), includinghidden Markovmodels (HMMs),hidden semi-Markov models(HSMMs),state-space models (SSMs) andcontinuous-time variants can be formulated andestimated within the same framework via directly maximising thelikelihood function using the so-calledforwardalgorithm(Zucchini,MacDonald, and Langrock 2016). Applied researchers often need custommodels that standard software does not easily support. Writing tailoredR code offers flexibility but suffers from slow estimationspeeds. ThisR package solves these issues by providingeasy-to-use functions (written in C++ for speed) for common tasks likethe forward algorithm. These functions can be combined into custommodels in a Lego-type approach, offering up to 10-20 times fasterestimation via standard numerical optimisers. In its most recentiteration,LaMa allows for automatic differentiation withtheRTMB package which drastically increases speed andaccuracy even more.
The most important families of functions are
theforward family that calculates thelog-likelihood for various different models,
thetpm family for calculating transitionprobability matrices,
thestationary family to compute stationary andperiodically stationary distributions
as well as thestateprobs andviterbifamilies for local and global decoding.
You can install the released package version fromCRAN with:
install.packages("LaMa")or the development version from Github:
remotes::install_github("janoleko/LaMa")To aid in building fully custom likelihood functions, this packagecontains several vignettes that demonstrate how to simulate data fromand estimate a wide range of models using the functions included in thispackage.
HMMs, from simple to complex:
Other latent Markov model classes:
We analyse thetrex data set contained in the package.It contains hourly step lengths of a Tyrannosaurus rex, living 66million years ago. To these data, we fit a simple 2-state HMM withstate-dependent gamma distributions for the step lengths.
library(LaMa)#> Loading required package: RTMBhead(trex,3)#> tod step angle state#> 1 9 0.3252437 NA 1#> 2 10 0.2458265 2.234562 1#> 3 11 0.2173252 -2.262418 1We start by defining the negative log-likelihood function. This ismade really convenient by the functionstpm() whichcomputes the transition probability matrix via the multinomial logitlink,stationary() which computes the stationarydistribution of the Markov chain andforward() whichcalculates the log-likelihood via the forward algorithm.
nll=function(par, step){# parameter transformations for unconstrained optimisation Gamma=tpm(par[1:2])# rowwise softmax delta=stationary(Gamma)# stationary distribution mu=exp(par[3:4])# state-dependent means sigma=exp(par[5:6])# state-dependent sds# calculating all state-dependent probabilities allprobs=matrix(1,length(step),2) ind=which(!is.na(step))for(jin1:2) allprobs[ind,j]=dgamma2(step[ind], mu[j], sigma[j])# simple forward algorithm to calculate log-likelihood-forward(delta, Gamma, allprobs)}To fit the model, we define the intial parameter vector andnumerically optimise the above function usingnlm():
par=c(-2,-2,# initial tpm params (logit-scale)log(c(0.3,2.5)),# initial means for step length (log-transformed)log(c(0.2,1.5)))# initial sds for step length (log-transformed)system.time( mod<-nlm(nll, par,step = trex$step))#> user system elapsed#> 0.368 0.010 0.380Really fast for 10.000 data points!
After tranforming the working (unconstrained) parameters to naturalparameters usingtpm() andstationary(), wecan visualise the results:
# transform parameters to working(Gamma =tpm(mod$estimate[1:2]))#> S1 S2#> S1 0.8269546 0.1730454#> S2 0.1608470 0.8391530(delta =stationary(Gamma))# stationary HMM#> S1 S2#> 0.481733 0.518267(mu =exp(mod$estimate[3:4]))#> [1] 0.3034926 2.5057053(sigma =exp(mod$estimate[5:6]))#> [1] 0.2015258 1.4908153hist(trex$step,prob =TRUE,bor ="white",breaks =40,main ="",xlab ="step length")curve(delta[1]*dgamma2(x, mu[1], sigma[1]),add =TRUE,lwd =2,col ="orange",n=500)curve(delta[2]*dgamma2(x, mu[2], sigma[2]),add =TRUE,lwd =2,col ="deepskyblue",n=500)legend("topright",col =c("orange","deepskyblue"),lwd =2,bty ="n",legend =c("state 1","state 2"))