Variable Length Markov Chains with Covariates (COVLMC)

library(mixvlmc)
library(ggplot2)

Limitations of Variable Length Markov Chains

Variable Length Markov Chains (VLMC) are very useful to capture complex structures in discrete time series as they can mix short memory with long memory in a contextual way, leading to sparse models.

However, they cannot capture the influence of exogenous variables on the behaviour of a time series. As a consequence, a VLMC adjusted on sequences influenced by covariates could fail to capture interesting patterns.

A typical example

Let us focus again on the electrical usage data set used in the package introduction, using the following two weeks of data:

pc_week_15_16 <- powerconsumption[powerconsumption$week %in% c(15, 16), ]
elec <- pc_week_15_16$active_power
ggplot(pc_week_15_16, aes(x = date_time, y = active_power)) +
  geom_line() +
  xlab("Date") +
  ylab("Active power (kW)")

We build again a discrete time series using the following thresholds:

elec_dts <- cut(elec, breaks = c(0, 0.4, 2, 8), labels = c("low", "typical", "high"))

Adjusting a VLMC to this sequence gives the following model (using BIC for model selection):

elec_vlmc_tune <- tune_vlmc(elec_dts)
best_elec_vlmc <- as_vlmc(elec_vlmc_tune)
draw(best_elec_vlmc)
#> * (0.3269, 0.4841, 0.189)
#> +-- low (0.8088, 0.1821, 0.009105)
#> '-- typical (0.1292, 0.8041, 0.06667)
#> |   +-- low (0.3667, 0.5833, 0.05)
#> |   '-- typical (0.1022, 0.8365, 0.0613)
#> |       '-- low (0.4058, 0.5507, 0.04348)
#> '-- high (0, 0.1864, 0.8136)

Considering the apparent regularity of the time series, one may wonder whether this model is really capturing the dynamics of electric power consumption. In particular, the longest memory of 3 time steps, i.e., 30 minutes, may seem a bit short compared to the somewhat long periods of stability.

Theoretical aspects

VLMC with covariates have been introduced in Variable length Markov chain with exogenous covariates. The core idea is to enable the conditional probabilities of the next state given the context to depend on exogenous covariates.

Variable memory

A COVLMC models a pair of sequences. The main sequence is denoted \(\mathbf{X}=X_1, X_2, \ldots, X_n, \ldots\). The random variables take values in a finite state space \(S\), exactly as in a standard VLMC. In addition, we have another sequence of random variables \(\mathbf{Y}=Y_1, Y_2, \ldots, Y_n, \ldots\) which take values in \(\mathbb{R}^p\). This latter assumption, \(Y_l\in \mathbb{R}^p\), can be relaxed to more general spaces.

A COVLMC puts restrictions on the conditional probabilities of \(X_n\) given the past of both sequences. To simplify the presentation, we denote by \(X^k_m\) the sequence of random variables \[X^k_m=(X_k, X_{k-1}, \ldots, X_m),\] and we use similar notations for the values taken by the variables, \(x^k_m\), as well as for \(Y^k_m\) and \(y^k_m\). The notation accommodates both temporal order if \(k<m\) and reverse temporal order if \(k>m\).

A pair of sequences \(\mathbf{X}\) and \(\mathbf{Y}\) is a COVLMC if there is a maximal order \(l_{\max}\) and a function \(l\) from \(S^{l_{\max}}\) to \(\{0,\ldots,l_{\max}\}\) such that for all \(n>l_{\max}\) \[\begin{multline}\mathbb{P}(X_n=x_n\mid X^{n-1}_1=x^{n-1}_1, Y^{n-1}_1=y^{n-1}_1)=\\\mathbb{P}\left(X_n=x_n\mid X^{n-1}_{n-l\left(x^{n-1}_{n-l_{\max}}\right)}=x^{n-1}_{n-l\left(x^{n-1}_{n-l_{\max}}\right)}, Y^{n-1}_{n-l\left(x^{n-1}_{n-l_{\max}}\right)}=y^{n-1}_{n-l\left(x^{n-1}_{n-l_{\max}}\right)}\right).\end{multline}\] As in VLMC, the \(\mathbf{X}\) process has a finite and variable memory, but this memory applies to both \(\mathbf{X}\) and \(\mathbf{Y}\). Notice that the memory order depends only on \(\mathbf{X}\) and that no assumptions are made on the temporal behaviour of \(\mathbf{Y}\). Thus a COVLMC on \(\mathbf{X}\) and \(\mathbf{Y}\) is not a VLMC on the pair \((\mathbf{X}, \mathbf{Y})\) (which can make sense if \(\mathbf{Y}\) is discrete).

As for VLMC, the memory length function generates a context function \(c\) which keeps in the past the part needed to obtain the conditional distribution: \(c\) is a function from \(S^{l_{\max}}\) to \(\bigcup_{k=0}^{l_{\max}}S^k\) given by \[c(x^{n-1}_{n-l_{\max}})=x^{n-1}_{n-l\left(x^{n-1}_{n-l_{\max}}\right)}.\] The image of \(S^{l_{\max}}\) by \(c\) is the set of contexts of the COVLMC, which is entirely specified by \(l\) and one conditional distribution per unique context.

As the notations are somewhat opaque at first, we can illustrate the definition with a simple example. We consider a binary sequence (values in \(S=\{0, 1\}\)) and a single numerical covariate \(Y_t\in\mathbb{R}\). As in the theoretical example in vignette("variable-length-markov-chains"), we assume that \(l_{\max}=3\) and that \(l\) is given by \[\begin{align*}l(a, b, 0)&=1&\forall a, \forall b,\\l(c, 1, 1)&=2&\forall c,\\l(0, 0, 1)&=3,&\\l(1, 0, 1)&=3.&\\\end{align*}\] In practice the COVLMC is therefore fully described by specifying the following probabilities: \[\begin{align*}\mathbb{P}(X_t=1\mid& X_{t-1}=0, Y_{t-1}=y_{t-1})\\\mathbb{P}(X_t=1\mid& X_{t-1}=1, X_{t-2}=1, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2})\\\mathbb{P}(X_t=1\mid& X_{t-1}=1, X_{t-2}=0, X_{t-3}=0, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2}, Y_{t-3}=y_{t-3})\\\mathbb{P}(X_t=1\mid& X_{t-1}=1, X_{t-2}=0, X_{t-3}=1, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2}, Y_{t-3}=y_{t-3})\end{align*}\]
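
To make the definition concrete, here is a direct R transcription of this example memory length function; l_example is a hypothetical helper written for this illustration only, not a mixvlmc function:

## hypothetical helper: the memory length l of the example, where `past`
## contains the last l_max = 3 values in temporal order,
## i.e. past = c(x_{t-3}, x_{t-2}, x_{t-1})
l_example <- function(past) {
  if (past[3] == 0) {
    1 # l(a, b, 0) = 1
  } else if (past[2] == 1) {
    2 # l(c, 1, 1) = 2
  } else {
    3 # l(0, 0, 1) = l(1, 0, 1) = 3
  }
}
l_example(c(1, 1, 0)) # memory of length 1
l_example(c(0, 1, 1)) # memory of length 2
l_example(c(1, 0, 1)) # memory of length 3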

Conditional distributions

The main difficulty induced by COVLMC compared to VLMC is the specification of the conditional distributions. Indeed the conditional distributions depend on \(\mathbf{Y}\) and cannot simply be given by a table. For instance, in the example above, we need to specify, among others, \(\mathbb{P}(X_t=1\mid X_{t-1}=0, Y_{t-1}=y_{t-1})\) for all values of \(y_{t-1}\in \mathbb{R}\) (in \(\mathbb{R}^p\) in the general case).

A natural choice in this particular example is to use a logistic model, i.e. to assume \[\mathbb{P}(X_t=1\mid X_{t-1}=0, Y_{t-1}=y_{t-1})=g(\alpha^0+y_{t-1}\beta_1^0),\] where \(g(t)=\frac{1}{1+\exp(-t)}\) is the logistic function. The superscript on \(\alpha^0\) and \(\beta^0_1\) refers to the context, here \(0\), while the subscript on \(\beta^0_1\) refers to the time delay (here \(1\)). By extension, we have for example \[\begin{multline*}\mathbb{P}(X_t=1\mid X_{t-1}=1, X_{t-2}=1, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2})=\\g\left(\alpha^{1,1}+y_{t-1}\beta^{1,1}_1+y_{t-2}\beta^{1,1}_2\right).\end{multline*}\]
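
As a quick numerical illustration of the link function (a sketch with made-up coefficient values, not estimates):

## logistic function g(t) = 1 / (1 + exp(-t)); base R provides it as plogis()
g <- function(t) 1 / (1 + exp(-t))

## made-up coefficients alpha^0 and beta^0_1 for context "0"
alpha_0 <- -1
beta_0_1 <- 2

## P(X_t = 1 | X_{t-1} = 0, Y_{t-1} = y) over a grid of covariate values
y <- seq(0, 1, by = 0.25)
g(alpha_0 + beta_0_1 * y)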

More generally, the probability distribution associated with a context could be given by any function that maps the values of the covariates to a distribution on \(S\), the state space. Following the original paper, mixvlmc uses multinomial logistic regression as implemented by VGAM::vglm() or nnet::multinom(), or a logistic regression provided by stats::glm() for state spaces with only 2 states. This has several advantages over a more general solution:

  1. during the estimation phase, one probability distribution is estimated for each relevant context: this could induce a large computational burden for models more complex than multinomial logistic ones;
  2. having a simple model makes it possible to fit contexts with a limited number of occurrences, which is important to allow searching for long term dependencies;
  3. logistic models with different memory orders are easy to compare using a likelihood-ratio test.

The last point is used in mixvlmc (as in the original paper) to simplify local models with respect to the covariates. For a context of length \(l\), the probability distribution is assumed to depend on the \(l\) past values of \(\mathbf{Y}\), but in practice we allow a dependency on only \(k<l\) past values. For instance, we could have \[\mathbb{P}(X_t=1\mid X_{t-1}=1, X_{t-2}=1, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2})=g\left(\alpha^{1,1}+y_{t-1}\beta^{1,1}_1\right).\]
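
The comparison of nested models is a standard likelihood-ratio test; the following self-contained sketch uses simulated data and stats::glm() (the binary case) rather than the multinomial models used internally:

## simulated binary data where only the most recent covariate value matters
set.seed(1)
n <- 200
df <- data.frame(y_1 = runif(n), y_2 = runif(n))
df$target <- rbinom(n, 1, plogis(-1 + 2 * df$y_1))

## full model with both lags versus reduced model without the oldest lag
m_full <- glm(target ~ y_1 + y_2, family = binomial(), data = df)
m_reduced <- glm(target ~ y_1, family = binomial(), data = df)

## likelihood-ratio test: a large p-value favours the simpler model
anova(m_reduced, m_full, test = "Chisq")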

Beta-Context algorithm

Estimating a COVLMC model from two time series is more complex than estimating a VLMC model. mixvlmc implements the Beta-Context algorithm proposed in the original paper. It is inspired by the context algorithm used for VLMC (and proposed in Variable length Markov chains). It can be summarized as follows:

  1. the first step consists in building a context tree (see vignette("context-trees")) from the \(\mathbf{X}\) discrete sequence, almost exactly as for a VLMC: the only difference is that to be kept in the tree a context must appear a number of times that depends on its length, on the dimension of the covariates and on the number of states. This guarantees a minimal number of observations for the (maximum likelihood) estimation of the context-dependent multinomial logistic regression.
  2. the second step is an estimation one: a multinomial logistic regression model is estimated for each context, using a number of past values of \(\mathbf{Y}\) equal to the length of the context.
  3. the rest of the algorithm consists in pruning the context tree and the logistic models:
    1. leaf contexts are first assessed in terms of model simplification. The likelihood-ratio test is used to decide whether the oldest value of \(\mathbf{Y}\) is relevant or not. Essentially we compare e.g. \(g\left(\alpha^{1,1}+y_{t-1}\beta^{1,1}_1\right)\) to \(g\left(\alpha^{1,1}+y_{t-1}\beta^{1,1}_1+y_{t-2}\beta^{1,1}_2\right)\) as an estimator of \(\mathbb{P}(X_t=1\mid X_{t-1}=1, X_{t-2}=1, Y_{t-1}=y_{t-1}, Y_{t-2}=y_{t-2})\);
    2. based on the results of those tests, it may be possible to completely remove a context from the tree, see the paper for details.

This last pruning phase is carried out repeatedly as context removals turn internal contexts into leaves that can then be further simplified.

COVLMC in practice

Estimation

COVLMC estimation is provided by the covlmc() function. We first build a simple example data set as follows:

set.seed(0)
nb_obs <- 200
covariates <- data.frame(y = runif(nb_obs))
x <- 0
for (k in 2:nb_obs) {
  ## we induce a simple dependency to the covariate
  ## and an order 1 memory
  if (covariates$y[k - 1] < 0.5) {
    if (x[k - 1] == 0) {
      x[k] <- sample(c(0, 1), 1, prob = c(0.7, 0.3))
    } else {
      x[k] <- sample(c(0, 1), 1, prob = c(0.3, 0.7))
    }
  } else {
    if (x[k - 1] == 0) {
      x[k] <- sample(c(0, 1), 1, prob = c(0.1, 0.9))
    } else {
      x[k] <- sample(c(0, 1), 1, prob = c(0.5, 0.5))
    }
  }
}

Then we estimate a COVLMC as follows:

model <- covlmc(x, covariates)
model
#> VLMC with covariate context tree on 0, 1
#>  cutoff in quantile scale: 0.05
#>  Number of contexts: 2
#>  Maximum context length: 1

The estimation process is controlled by three parameters:

  1. max_depth: the maximum length of a context, and thus the maximum depth of the context tree;
  2. min_size: the minimum number of occurrences of a context for it to be kept in the tree (internally adjusted for the context length, the number of states and the dimension of the covariates);
  3. alpha: the significance level used by the likelihood-ratio tests during the pruning phase (reported as "cutoff in quantile scale" when the model is printed).
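
For reference, here is the same estimation with the defaults spelled out explicitly; the exact default values are an assumption based on the package documentation and should be checked with help(covlmc):

## sketch: (assumed) default values made explicit
model_explicit <- covlmc(x, covariates, min_size = 5, alpha = 0.05, max_depth = 100)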

The default parameters work well on the previous example, as shown by the obtained model:

draw(model, model = "full", p_value = FALSE)
#> *
#> +-- 0 ([ (I)    y_1
#> |        -2.951 6.236 ])
#> '-- 1 ([ (I)   y_1
#>          1.719 -1.881 ])

The model has 2 contexts, 0 and 1, as expected. The logistic models are described by their parameters. For context 0, the intercept is negative and the coefficient of \(y_{t-1}\) is positive: the probability of switching to 1 is small when \(y_{t-1}\) is small and increases with \(y_{t-1}\). For context 1, the situation is reversed and the effect of \(y_{t-1}\) is smaller. This is consistent with the way the series were constructed.
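
This reading of the coefficients can be checked numerically with plogis() (the logistic function \(g\)), using the values reported by draw() above:

## context 0: probability of switching to 1 for a small and a large y_{t-1}
plogis(-2.951 + 6.236 * c(0.1, 0.9))
## context 1: the effect is reversed and weaker
plogis(1.719 - 1.881 * c(0.1, 0.9))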

Notice however that obtaining an interesting model with the default parameters should not be seen as a general property, and proper model choice must be implemented.

Model choice

As COVLMC estimation fits a potentially large number of logistic models to the data, the use of a penalized likelihood approach is recommended to set its parameters and avoid overfitting.

However, model choice is more complex in the case of the COVLMC than for VLMC. In particular, the pair cutoff()/prune() does not work as well for COVLMC as for VLMC (see vignette("variable-length-markov-chains")). Indeed the pruning process of the Beta-Context algorithm is such that its effects cannot be predicted as easily as the ones of the Context algorithm. In practice, computing the largest alpha (test level) that is guaranteed to make a minimal but actual pruning of a given COVLMC is easy, but subsequent cutoff values could be misleading. To avoid any problem, it is therefore recommended to rely on the tune_covlmc() function that has been designed to explore the full "pruning space" associated with a given data set.

Used on the artificial example above, it gives:

model_tune <- tune_covlmc(x, covariates)
model_tune
#> VLMC with covariate context tree on 0, 1
#>  cutoff in quantile scale: 0.1236
#>  Number of contexts: 2
#>  Maximum context length: 1
#>  Selected by BIC (245.1046) with likelihood function "truncated" (-112.3135)

and

draw(as_covlmc(model_tune), model = "full", p_value = FALSE)
#> *
#> +-- 0 ([ (I)    y_1
#> |        -2.951 6.236 ])
#> '-- 1 ([ (I)   y_1
#>          1.719 -1.881 ])

The resulting model is the same as the one obtained before, but it was properly obtained by minimising the BIC.

As for tune_vlmc(), the max_depth parameter is automatically increased to avoid using it inadvertently as a regularisation parameter. However, while min_size is generally considered as not having a major influence on the model selection for VLMC, this is not the case for COVLMC. There is currently no support in mixvlmc for an automatic choice of min_size and one should therefore test several values and compare the obtained BIC/AIC. In the artificial case, we can try for instance min_size=2 as follows:

model_tune_2 <- tune_covlmc(x, covariates, min_size = 2)
model_tune_2
#> VLMC with covariate context tree on 0, 1
#>  cutoff in quantile scale: 0.01195
#>  Number of contexts: 2
#>  Maximum context length: 1
#>  Selected by BIC (242.3184) with likelihood function "truncated" (-112.3135)

and min_size=10 as follows:

model_tune_10 <- tune_covlmc(x, covariates, min_size = 10)
model_tune_10
#> VLMC with covariate context tree on 0, 1
#>  cutoff in quantile scale: 0.1236
#>  Number of contexts: 2
#>  Maximum context length: 1
#>  Selected by BIC (245.6423) with likelihood function "truncated" (-112.3135)

Here we see no particular effect of the parameter because of the simplicity of the problem.
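
The comparison can be automated, since BIC() applies directly to the fitted models:

## BIC of the best model selected for each tested min_size value
sapply(list(default = as_covlmc(model_tune),
            min_size_2 = as_covlmc(model_tune_2),
            min_size_10 = as_covlmc(model_tune_10)),
       BIC)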

Let us now come back to the electricity consumption example described above. We introduce a very basic day/night covariate as follows:

elec_cov <- data.frame(day = (pc_week_15_16$hour >= 7 & pc_week_15_16$hour <= 18))

and fit a COVLMC with

elec_tune <- tune_covlmc(elec_dts, elec_cov)
elec_tune
#> VLMC with covariate context tree on low, typical, high
#>  cutoff in quantile scale: 0.007218
#>  Number of contexts: 7
#>  Maximum context length: 3
#>  Selected by BIC (2243.59) with likelihood function "truncated" (-1064.74)

The model selection process can be represented graphically as follows:

ggplot(elec_tune$results, aes(x = alpha, y = BIC)) +
  geom_line() +
  geom_point()

or automatically using e.g. autoplot() as follows:

print(autoplot(elec_tune))

The final model is:

draw(as_covlmc(elec_tune), model = "full", p_value = FALSE, with_state = TRUE)
#> *
#> +-- low ([ (low)   | (I)
#> |          typical | -1.491
#> |          high    | -4.487 ])
#> '-- typical
#> |   +-- low ([ (low)   | (I)
#> |   |          typical | 0.4643
#> |   |          high    | -1.992 ])
#> |   '-- typical
#> |   |   +-- low ([ (low)   | (I)
#> |   |   |          typical | 0.3054
#> |   |   |          high    | -2.234 ])
#> |   |   '-- typical ([ (low)   | (I)     day_1TRUE
#> |   |   |              typical | 1.839   0.975
#> |   |   |              high    | -0.1823 -0.1462   ])
#> |   |   '-- high ([ (low)   | (I)
#> |   |               typical | 2.773
#> |   |               high    | 0.8473 ])
#> |   '-- high ([ (low)   | (I)
#> |               typical | 3.367
#> |               high    | 1.705 ])
#> '-- high ([ (typical) | (I)
#>             high      | 1.474 ])

It shows some interesting patterns: in particular, the day/night covariate is kept only for the typical, typical, typical context, while the conditional distributions of all the other contexts depend only on the past of the discrete series itself.

Notice finally that for this real world example, the min_size parameter has some influence on the results. Setting it to a smaller value does not change the final model, as shown below:

elec_tune_3 <- tune_covlmc(elec_dts, elec_cov, min_size = 3)
elec_tune_3
#> VLMC with covariate context tree on low, typical, high
#>  cutoff in quantile scale: 0.007218
#>  Number of contexts: 7
#>  Maximum context length: 3
#>  Selected by BIC (2239.393) with likelihood function "truncated" (-1064.74)

However, increasing the parameter to, e.g., 10 generates a simpler but weaker model, as shown here:

elec_tune_10 <- tune_covlmc(elec_dts, elec_cov, min_size = 10)
elec_tune_10
#> VLMC with covariate context tree on low, typical, high
#>  cutoff in quantile scale: 0.03454
#>  Number of contexts: 5
#>  Maximum context length: 2
#>  Selected by BIC (2260.505) with likelihood function "truncated" (-1088.409)

Diagnostics

The package provides numerous ways to analyse a COVLMC. Basic functions include states(), depth(), covariate_depth() and context_number().

For instance, the large model obtained above has the following characteristics:

elec_model <- as_covlmc(elec_tune)
states(elec_model)
#> [1] low     typical high
#> Levels: low typical high
depth(elec_model)
#> [1] 3
covariate_depth(elec_model)
#> [1] 1
context_number(elec_model)
#> [1] 7

COVLMC objects support classical statistical functions such as:

logLik(elec_model)
#> 'log Lik.' -1064.74 (df=15)
AIC(elec_model)
#> [1] 2159.48
BIC(elec_model)
#> [1] 2243.59

Contexts

The model can be explored in detail by drawing its context tree (see vignette("context-trees") for details) as follows:

draw(elec_model)
#> * (merging (low and high): 8.121e-230)
#> +-- low (0.03454 [ -1.491
#> |                  -4.487 ])
#> '-- typical (merging (low and high): 8.264e-09)
#> |   +-- low (0.8447 [ 0.4643
#> |   |                 -1.992 ])
#> |   '-- typical (collapsing: 3.961e-10)
#> |   |   +-- low (0.2323 [ 0.3054
#> |   |   |                 -2.234 ])
#> |   |   '-- typical (6.269e-05 [ 1.839   0.975
#> |   |   |                        -0.1823 -0.1462 ])
#> |   |   '-- high (0.6393 [ 2.773
#> |   |                      0.8473 ])
#> |   '-- high (0.1638 [ 3.367
#> |                      1.705 ])
#> '-- high (0.776 [ 1.474 ])

The draw.covlmc() function is more advanced than its VLMC counterpart. It provides more detailed information, particularly regarding the p-values associated with pruning operations. The "merging" p-value corresponds to replacing some of the contexts with a single joint model, while the "collapsing" p-value is associated with pruning all the sub-contexts of a context. P-values associated with specific models correspond to reducing the memory of the corresponding model, that is discarding the dependency of the conditional probability on the oldest covariates.

To explore the contexts in a programmatic way, one should rely on the contexts() function. COVLMC contexts have additional characteristics compared to VLMC and context trees. In particular, the contexts() function can report the model associated with each context, either by its parameters only:

contexts(elec_model, model = "coef")

context        coef
low            -1.49102….
low, typical   0.464305….
low, typ….     0.305381….
typical,….     1.839226….
high, ty….     2.772588….
high, ty….     3.367295….
high           1.473892….

or using the models themselves:

contexts(elec_model, model = "full")

context        model
low            <S4 class ‘vglm’ [package “VGAM”] with 37 slots>
low, typical   <S4 class ‘vglm’ [package “VGAM”] with 37 slots>
low, typ….     <S4 class ‘vglm’ [package “VGAM”] with 37 slots>
typical,….     <S4 class ‘vglm’ [package “VGAM”] with 37 slots>
high, ty….     <S4 class ‘vglm’ [package “VGAM”] with 37 slots>
high, ty….     <S4 class ‘vglm’ [package “VGAM”] with 37 slots>
high           <S4 class ‘vglm’ [package “VGAM”] with 37 slots>

As for context trees, focused analysis of specific contexts can be done by requesting a ctx_node_covlmc representation of the context of interest, for instance via the find_sequence.covlmc() function, or with the default result type of contexts():

ctxs <- contexts(elec_model)
ctxs
#> Contexts:
#>  low
#>  low, typical
#>  low, typical, typical
#>  typical, typical, typical
#>  high, typical, typical
#>  high, typical
#>  high
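
A single context can also be retrieved directly; the call below is a sketch that assumes find_sequence() accepts the states in the order printed by contexts() above (see vignette("context-trees") for the exact conventions):

## sketch: direct access to one context and its local model
node <- find_sequence(elec_model, c("low", "typical"))
model(node)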

Individual contexts can be analysed using a collection of dedicated functions, for instance:

model(ctxs[[2]])
#>            [,1]
#> [1,]  0.4643056
#> [2,] -1.9924301
model(ctxs[[3]], type = "full")
#>
#> Call:
#> VGAM::vglm(formula = target ~ 1, family = VGAM::multinomial(refLevel = 1),
#>     data = mm, control = VGAM::vglm.control(maxit = options()[["mixvlmc.maxit"]]),
#>     model = FALSE, x.arg = FALSE, y.arg = FALSE)
#>
#>
#> Coefficients:
#> (Intercept):1 (Intercept):2
#>     0.3053816    -2.2335922
#>
#> Degrees of Freedom: 138 Total; 136 Residual
#> Residual deviance: 114.655
#> Log-likelihood: -57.32751
#>
#> This is a multinomial logit model with 3 levels

See contexts.covlmc() for details.

