Mavrogiannis-Ioannis/dsmmR
The `dsmmR` R package allows the user to estimate, simulate and define different Drifting semi-Markov model (DSMM) specifications.
```r
# Install the released version from CRAN
install.packages('dsmmR')

# Or the development version from GitHub
# install.packages("devtools")
devtools::install_github("Mavrogiannis-Ioannis/dsmmR")
```
The main functions of `dsmmR` are the following:

- `fit_dsmm()`: estimate a DSMM (parametric or non-parametric estimation is possible).
- `parametric_dsmm()`: define a parametric DSMM.
- `nonparametric_dsmm()`: define a non-parametric DSMM.
- `simulate()`: simulate a sequence from a DSMM.
- `get_kernel()`: obtain the Drifting semi-Markov kernel.
Drifting semi-Markov models are best suited to capture non-homogeneities which evolve in a linear (or polynomial) way. For example, this approach accounts for non-homogeneities that arise from the intrinsic evolution of the system or from the interactions between the system and the environment.
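To make the idea of a linear drift concrete, here is a minimal base-R sketch that does not use `dsmmR`. It assumes the degree-1 case, in which the matrix in effect at position `t` of a sequence of length `n` is the convex combination `(1 - t/n) * p_start + (t/n) * p_end` of a starting and an ending matrix. The helper `drift_linear()` and the matrix values are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of dsmmR): degree-1 drift between two
# row-stochastic matrices, evaluated at position t of a sequence of length n.
drift_linear <- function(p_start, p_end, t, n) {
  stopifnot(n > 0, t >= 0, t <= n,
            identical(dim(p_start), dim(p_end)))
  w <- t / n                       # weight moves from 0 to 1 along the sequence
  (1 - w) * p_start + w * p_end    # convex combination of the two matrices
}

# Illustrative values only.
p_start <- matrix(c(0,   0.5, 0.5,
                    0.5, 0,   0.5,
                    0.5, 0.5, 0),
                  ncol = 3, byrow = TRUE)
p_end   <- matrix(c(0,   0.9, 0.1,
                    0.2, 0,   0.8,
                    0.6, 0.4, 0),
                  ncol = 3, byrow = TRUE)

# Matrix in effect halfway through a sequence of length 100.
p_mid <- drift_linear(p_start, p_end, t = 50, n = 100)
print(p_mid)
```

Because every intermediate matrix is a convex combination of two row-stochastic matrices, its rows still sum to 1.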
For a detailed introduction to Drifting semi-Markov models, see the documentation through `?dsmmR`.
For an extensive description of this approach, consider visiting the complete documentation of the package on the official CRAN page.
The easiest way to use `dsmmR` is through the main function `fit_dsmm()` in the non-parametric case. This function can estimate a Drifting semi-Markov model from a sequence of states (i.e. a character vector in R). Example data is included in the package, namely the DNA sequence `lambda`. Some parameters also need to be specified before using `fit_dsmm()`, most notably the polynomial `degree` and the model of our choice. The model is chosen by defining whether the sojourn time distributions `f` and the transition matrices `p` are drifting or not.
```r
# Loading the package
library(dsmmR)

# Obtaining the sequence
data("lambda", package = "dsmmR")
sequence <- c(lambda)

# Obtaining the states
states <- sort(unique(sequence))

# Defining the polynomial degree
degree <- 1  # we define a linear evolution in time (state jumps of the embedded Markov chain)

# Defining the model
f_is_drifting <- TRUE   # sojourn time distributions are drifting in time (state jumps of the EMC)
p_is_drifting <- FALSE  # transition matrices are not drifting in time (state jumps of the EMC)
# When f is drifting and p is not drifting, we have Model 3.

# Fitting the drifting semi-Markov model on the sequence.
fitted_model <- fit_dsmm(sequence = sequence,
                         states = states,
                         degree = degree,
                         f_is_drifting = f_is_drifting,
                         p_is_drifting = p_is_drifting)
```
For more details about the estimation, see the extended documentation through `?fit_dsmm`.
After fitting a DSMM (or defining it through `nonparametric_dsmm()` or `parametric_dsmm()`), we can simulate a sequence from that DSMM. This is straightforward:
```r
sim_seq <- simulate(fitted_model)
```
Since we follow an object-oriented approach, the previous object `fitted_model` is the only argument that must be provided.
For more information, see the documentation through `?simulate.dsmm`.
Because of the dimension of the DSM kernel, a separate function was necessary to compute it. You can obtain the DSM kernel through the command:
```r
kernel <- get_kernel(fitted_model)
```
The dimensionality of the DSM kernel can be reduced further through the arguments of the function.
For more information, see the documentation through `?get_kernel`.
We can put together all the previous concepts in a showcase of parametric estimation. First, we will define the drifting transition matrices and the drifting sojourn time distributions. Then, we will create a `dsmm_parametric` object, simulate a sequence from it, and finally estimate a drifting semi-Markov model from that simulated sequence.
For more information, see the documentation through `?parametric_dsmm` and `?nonparametric_dsmm`.
First of all we load the package,

```r
library(dsmmR)
```

and then we define the states and we set the degree equal to 1.

```r
states <- c("a", "b", "c")
s <- length(states)
degree <- 1
```
Since `degree` is equal to 1, we then define the 2 drifting transition matrices:

```r
p_dist_1 <- matrix(c(0,   0.4, 0.6,
                     0.5, 0,   0.5,
                     0.3, 0.7, 0  ),
                   ncol = s, byrow = TRUE)

p_dist_2 <- matrix(c(0,    0.55, 0.45,
                     0.25, 0,    0.75,
                     0.5,  0.5,  0   ),
                   ncol = s, byrow = TRUE)

p_dist <- array(c(p_dist_1, p_dist_2), dim = c(s, s, degree + 1))
```
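Before passing such matrices to the package, it can help to check them by hand. The following base-R sketch assumes, as in the matrices above, that each transition matrix is square, non-negative, has a zero diagonal (no self-transitions) and has rows summing to 1. The helper `is_valid_transition_matrix()` is hypothetical and not part of `dsmmR`, whose own input checks may differ.

```r
# Hypothetical helper (not part of dsmmR): basic sanity checks for one
# transition matrix of the embedded Markov chain.
is_valid_transition_matrix <- function(p, tol = 1e-8) {
  is.matrix(p) &&
    nrow(p) == ncol(p) &&
    all(p >= 0) &&
    all(abs(diag(p)) < tol) &&          # assumed: no self-transitions
    all(abs(rowSums(p) - 1) < tol)      # each row is a probability distribution
}

states <- c("a", "b", "c")
s <- length(states)
degree <- 1

p_dist_1 <- matrix(c(0,   0.4, 0.6,
                     0.5, 0,   0.5,
                     0.3, 0.7, 0  ),
                   ncol = s, byrow = TRUE)
p_dist_2 <- matrix(c(0,    0.55, 0.45,
                     0.25, 0,    0.75,
                     0.5,  0.5,  0   ),
                   ncol = s, byrow = TRUE)
p_dist <- array(c(p_dist_1, p_dist_2), dim = c(s, s, degree + 1))

# Check every slice along the third dimension of the array.
valid <- apply(p_dist, 3, is_valid_transition_matrix)
print(valid)
```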
Let us also consider the case where only the parameters of the distributions modeling the sojourn times are drifting across the sequence. Note that distributions like the Negative Binomial and the Discrete Weibull require two parameters, which we define in two matrices for each distribution.

```r
f_dist_1 <- matrix(c(NA,     "nbinom",   "unif",
                     "geom", NA,         "pois",
                     "pois", "dweibull", NA    ),
                   nrow = s, ncol = s, byrow = TRUE)

f_dist_1_pars_1 <- matrix(c(NA,  4,   3,
                            0.7, NA,  5,
                            3,   0.6, NA),
                          nrow = s, ncol = s, byrow = TRUE)

f_dist_1_pars_2 <- matrix(c(NA, 0.5, NA,
                            NA, NA,  NA,
                            NA, 0.8, NA),
                          nrow = s, ncol = s, byrow = TRUE)

f_dist_2 <- f_dist_1

f_dist_2_pars_1 <- matrix(c(NA,  3,   5,
                            0.3, NA,  2,
                            5,   0.3, NA),
                          nrow = s, ncol = s, byrow = TRUE)

f_dist_2_pars_2 <- matrix(c(NA, 0.4, NA,
                            NA, NA,  NA,
                            NA, 0.5, NA),
                          nrow = s, ncol = s, byrow = TRUE)

f_dist <- array(c(f_dist_1, f_dist_2), dim = c(s, s, degree + 1))

f_dist_pars <- array(c(f_dist_1_pars_1, f_dist_1_pars_2,
                       f_dist_2_pars_1, f_dist_2_pars_2),
                     dim = c(s, s, 2, degree + 1))
```
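The four-dimensional parameter array can be hard to read at a glance. The base-R sketch below rebuilds it with the same values and attaches labels, assuming that rows index the current state, columns the next state, the third dimension the parameter and the fourth the drift index. The labels `par_1`, `par_2`, `f_0` and `f_1` are hypothetical names added for readability and are not required by the package.

```r
states <- c("a", "b", "c")
s <- length(states)
degree <- 1

# Same parameter values as in the example above.
f_dist_1_pars_1 <- matrix(c(NA, 4, 3, 0.7, NA, 5, 3, 0.6, NA),
                          nrow = s, ncol = s, byrow = TRUE)
f_dist_1_pars_2 <- matrix(c(NA, 0.5, NA, NA, NA, NA, NA, 0.8, NA),
                          nrow = s, ncol = s, byrow = TRUE)
f_dist_2_pars_1 <- matrix(c(NA, 3, 5, 0.3, NA, 2, 5, 0.3, NA),
                          nrow = s, ncol = s, byrow = TRUE)
f_dist_2_pars_2 <- matrix(c(NA, 0.4, NA, NA, NA, NA, NA, 0.5, NA),
                          nrow = s, ncol = s, byrow = TRUE)

f_dist_pars <- array(c(f_dist_1_pars_1, f_dist_1_pars_2,
                       f_dist_2_pars_1, f_dist_2_pars_2),
                     dim = c(s, s, 2, degree + 1))

# Hypothetical labels, for readability only.
dimnames(f_dist_pars) <- list(states, states,
                              c("par_1", "par_2"),
                              c("f_0", "f_1"))

# Both parameters for the transition a -> b, at the start and at the end
# of the drift.
ab_start <- f_dist_pars["a", "b", , "f_0"]
ab_end   <- f_dist_pars["a", "b", , "f_1"]
print(rbind(ab_start, ab_end))
```

Distributions with a single parameter, such as the one for the transition b -> a, simply carry `NA` in the second parameter slot.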
Then, defining a `dsmm_parametric` object is done simply through the function `parametric_dsmm()`:

```r
dsmm_model <- parametric_dsmm(model_size = 10000,
                              states = states,
                              initial_dist = c(0.6, 0.3, 0.1),
                              degree = degree,
                              p_dist = p_dist,
                              f_dist = f_dist,
                              f_dist_pars = f_dist_pars,
                              p_is_drifting = TRUE,
                              f_is_drifting = TRUE)
```
We can then simulate a sequence from this parametric object like so:

```r
sim_seq <- simulate(dsmm_model, klim = 30, seed = 1)
```
To fit this sequence with a drifting semi-Markov model, one can use:

```r
fitted_model <- fit_dsmm(sequence = sim_seq,
                         states = states,
                         degree = degree,
                         f_is_drifting = TRUE,
                         p_is_drifting = TRUE,
                         estimation = 'parametric',
                         f_dist = f_dist)
```
Finally, the drifting transition matrix is estimated as:

```r
print(fitted_model$dist$p_drift, digits = 2)
```

with output:

```
, , p_0

     a    b    c
a 0.00 0.40 0.60
b 0.51 0.00 0.49
c 0.27 0.73 0.00

, , p_1

     a    b    c
a 0.00 0.54 0.46
b 0.23 0.00 0.77
c 0.51 0.49 0.00
```
and the parameters for the drifting sojourn time distributions are:

```r
print(fitted_model$dist$f_drift_parameters, digits = 2)
```

with output:

```
, , 1, fpars_0

     a    b   c
a   NA 3.66 3.0
b 0.65   NA 4.8
c 3.09 0.62  NA

, , 2, fpars_0

   a    b  c
a NA 0.46 NA
b NA   NA NA
c NA 0.84 NA

, , 1, fpars_1

     a    b   c
a   NA 2.74 5.0
b 0.31   NA 2.1
c 5.02 0.29  NA

, , 2, fpars_1

   a    b  c
a NA 0.38 NA
b NA   NA NA
c NA 0.50 NA
```
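The printed estimates can be compared with the values used to define the model. The base-R sketch below does this for the transition matrices only. The estimates are copied by hand from the rounded output above rather than read from `fitted_model`, so the result is only accurate to two decimal places.

```r
# True transition matrices used to define the parametric model.
p_true <- array(c(matrix(c(0,   0.4, 0.6,
                           0.5, 0,   0.5,
                           0.3, 0.7, 0  ), ncol = 3, byrow = TRUE),
                  matrix(c(0,    0.55, 0.45,
                           0.25, 0,    0.75,
                           0.5,  0.5,  0   ), ncol = 3, byrow = TRUE)),
                dim = c(3, 3, 2))

# Estimates copied from the printed output (rounded to two digits).
p_est <- array(c(matrix(c(0,    0.40, 0.60,
                          0.51, 0,    0.49,
                          0.27, 0.73, 0   ), ncol = 3, byrow = TRUE),
                 matrix(c(0,    0.54, 0.46,
                          0.23, 0,    0.77,
                          0.51, 0.49, 0   ), ncol = 3, byrow = TRUE)),
               dim = c(3, 3, 2))

# Largest absolute deviation for each of the two drifting matrices.
max_abs_error <- apply(abs(p_est - p_true), 3, max)
print(max_abs_error)
```

With these rounded values, the largest deviations are about 0.03 for the first matrix and 0.02 for the second.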
Regarding semi-Markov models, the book Semi-Markov Chains and Hidden Semi-Markov Models toward Applications gives a good overview of the topic and also combines the flexibility of the semi-Markov chain with the known advantages of hidden semi-Markov models.
If you are not familiar with Drifting Markov models, they were first introduced in Drifting Markov models with Polynomial Drift and Applications to DNA Sequences, while a comprehensive overview is provided in Reliability and Survival Analysis for Drifting Markov Models: Modeling and Estimation.
Third parties wishing to contribute to the software, or to report issues or problems with it, can do so directly through the development GitHub page of the package.
Automated tests are in place to help the user catch invalid input and to ensure that the functions return the expected output. These strict tests also make it possible for users to properly define their own `dsmm` objects and use them with the generic functions of the package.
If you are in need of support, please contact the maintainer at mavrogiannis.ioa@gmail.com.
Barbu, V. S., Limnios, N. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.
Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics and Molecular Biology, 7(1).
Barbu, V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modelling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.
The author and developer Mavrogiannis Ioannis would like to acknowledge that this research work was primarily conducted while he was affiliated with the LMRS laboratory (UMR CNRS 6085, University of Rouen Normandy, France), while he is currently affiliated with the M2P2 laboratory (Aix-Marseille Université, CNRS, Centrale Marseille, M2P2 UMR 7340, Marseille, France).
The authors acknowledge the DATALAB Project https://lmrs-num.math.cnrs.fr/projet-datalab.html (financed by the European Union with the European Regional Development Fund (ERDF) and by the Normandy Region) and the HSMM-INCA Project (financed by the French Agence Nationale de la Recherche (ANR) under grant ANR-21-CE40-0005).