
Type: Package
Title: Variable Length Markov Chains with Covariates
Version: 0.2.2
Description: Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) <doi:10.1214/aos/1018031204> for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) <doi:10.1111/jtsa.12615> for VLMC with covariates.
License: GPL (≥ 3)
URL: https://github.com/fabrice-rossi/mixvlmc, https://fabrice-rossi.github.io/mixvlmc/
BugReports: https://github.com/fabrice-rossi/mixvlmc/issues
Encoding: UTF-8
LazyData: true
Imports: assertthat, butcher, ggplot2, methods, nnet, pROC, Rcpp (≥ 1.0.8.3), rlang, stats, stringr, VGAM, withr
LinkingTo: Rcpp
RoxygenNote: 7.3.2
Suggests: data.table, foreach, geodist, knitr, rmarkdown, testthat (≥ 3.0.0), tibble, vdiffr, waldo
Config/testthat/edition:
Config/testthat/parallel: true
Config/testthat/start-first: covlmc*
Depends: R (≥ 2.10)
VignetteBuilder: knitr
NeedsCompilation: yes
Packaged: 2025-05-26 11:53:23 UTC; fabrice
Author: Fabrice Rossi [aut, cre, cph], Hugo Le Picard [ctb], Guénolé Joubioux [ctb]
Maintainer: Fabrice Rossi <Fabrice.Rossi@apiacoa.org>
Repository: CRAN
Date/Publication: 2025-05-26 12:30:01 UTC

mixvlmc: Variable Length Markov Chains with Covariates

Description

Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) doi:10.1214/aos/1018031204 for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) doi:10.1111/jtsa.12615 for VLMC with covariates.

Package options

Mixvlmc uses the following options():

mixvlmc.maxit: maximum number of iterations in model fitting for covlmc()
mixvlmc.predictive: specifies the computing engine used for model fitting for covlmc(). Two values are supported:
- "glm" (default value): covlmc() uses stats::glm() with a binomial link (stats::binomial()) for a state space with two values, and VGAM::vglm() with a multinomial link (VGAM::multinomial()) for a state space with three or more values;
- "multinom": covlmc() uses nnet::multinom() in all cases.
The first option "glm" is recommended as both stats::glm() and VGAM::vglm() are able to detect and deal with degeneracy in the data set.
mixvlmc.backend: specifies the implementation used for the context tree construction in ctx_tree(), vlmc() and tune_vlmc(). Two values are supported:
- "R" (default value): this corresponds to the original almost pure R implementation.
- "C++": this corresponds to the experimental C++ implementation. This version is significantly faster than the R version, but is still considered experimental.
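These are ordinary R global options, so they can presumably be set and read with base R's options() and getOption(). The following is a minimal sketch rather than package code; the value 200 used for mixvlmc.maxit is an arbitrary illustration, not a recommended setting.

```r
# Set the three options, keeping the previous values so they can be restored.
old <- options(
  mixvlmc.maxit = 200,         # arbitrary illustrative value (assumption)
  mixvlmc.predictive = "glm",  # documented default engine
  mixvlmc.backend = "C++"      # experimental C++ back end
)

# Read an option back, falling back to "R" when it is unset.
backend <- getOption("mixvlmc.backend", "R")
print(backend)

# Restore the previous state of the options.
options(old)
```

For a temporary change limited to a single call, withr::with_options() (used in the covlmc() examples) avoids having to restore the options by hand.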

Author(s)

Maintainer: Fabrice Rossi Fabrice.Rossi@apiacoa.org (ORCID) [copyright holder]

Other contributors:

Hugo Le Picard lepicardhugo@gmail.com (ORCID) [contributor]
Guénolé Joubioux guenole.joubioux@gmail.com [contributor]

Convert an object to a Variable Length Markov Chain with covariates (coVLMC)

Description

This generic function converts an object into a covlmc.

Usage

as_covlmc(x, ...)

## S3 method for class 'tune_covlmc'
as_covlmc(x, ...)

Arguments

x

an object to convert into a covlmc.

...

additional arguments for conversion functions.

Value

a covlmc

Examples

## conversion from the results of tune_covlmc
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
dts_best_model_tune <- tune_covlmc(dts, dts_cov)
dts_best_model <- as_covlmc(dts_best_model_tune)
draw(dts_best_model)

Extract the sequence encoded by a node

Description

This function returns the sequence represented by the node object.

Usage

as_sequence(node, reverse)

Arguments

node

a ctx_node object as returned by find_sequence()

reverse

specifies whether the sequence should be reported in reverse temporal order (TRUE) or in the temporal order (FALSE). Defaults to the order associated to the ctx_node which is determined by the parameters of the call to contexts() or find_sequence().

Value

the sequence represented by the node object, a vector

Examples

dts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
dts_tree <- ctx_tree(dts, max_depth = 3)
res <- find_sequence(dts_tree, "A")
as_sequence(res)

Convert an object to a Variable Length Markov Chain (VLMC)

Description

This generic function converts an object into a vlmc.

Usage

as_vlmc(x, ...)

## S3 method for class 'ctx_tree'
as_vlmc(x, alpha, cutoff, ...)

## S3 method for class 'tune_vlmc'
as_vlmc(x, ...)

Arguments

x

an object to convert into a vlmc.

...

additional arguments for conversion functions.

alpha

cut off parameter applied during the conversion, quantile scale (if specified)

cutoff

cut off parameter applied during the conversion, native scale (if specified)

Details

This function converts a context tree into a VLMC. If alpha or cutoff is specified, it is used to reduce the complexity of the tree as in a direct call to vlmc() (prune()).

Value

a vlmc

Examples

## conversion from a context tree
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
draw(dts_ctree)
dts_vlmc <- as_vlmc(dts_ctree)
class(dts_vlmc)
draw(dts_vlmc)

## conversion from the result of tune_vlmc
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
tune_result <- tune_vlmc(dts)
tune_result
dts_best_vlmc <- as_vlmc(tune_result)
draw(dts_best_vlmc)

Convert an object to a Variable Length Markov Chain (VLMC)

Description

This generic function converts an object into a vlmc.

Usage

## S3 method for class 'ctx_tree_cpp'
as_vlmc(x, alpha, cutoff, ...)

Arguments

x

an object to convert into a vlmc.

alpha

cut off parameter applied during the conversion, quantile scale (if specified)

cutoff

cut off parameter applied during the conversion, native scale (if specified)

...

additional arguments for conversion functions.

Details

This function converts a context tree into a VLMC. If alpha or cutoff is specified, it is used to reduce the complexity of the tree as in a direct call to vlmc() (prune()).

Value

a vlmc

Examples

## conversion from a context tree
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3, backend = "C++")
draw(dts_ctree)
dts_vlmc <- as_vlmc(dts_ctree)
class(dts_vlmc)
draw(dts_vlmc)

Create a complete ggplot for the results of automatic COVLMC complexity selection

Description

This function prepares a plot of the results of tune_covlmc() using ggplot2. The result can be passed to print() to display the result.

Usage

## S3 method for class 'tune_covlmc'
autoplot(object, ...)

Arguments

object

a tune_covlmc object

...

additional parameters (not used currently)

Details

The graphical representation proposed by this function is complete, while the one produced by plot.tune_covlmc() is minimalistic. We use here the faceting capabilities of ggplot2 to combine on a single graphical representation the evolution of multiple characteristics of the VLMC during the pruning process, while plot.tune_covlmc() shows only the selection criterion or the log likelihood. Each facet of the resulting plot shows a quantity as a function of the cut off expressed in quantile or native scale.

Value

a ggplot object

Examples

pc <- powerconsumption[powerconsumption$week %in% 10:12, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
dts_best_model_tune <- tune_covlmc(dts, dts_cov, criterion = "AIC")
covlmc_plot <- ggplot2::autoplot(dts_best_model_tune)
print(covlmc_plot)

Create a complete ggplot for the results of automatic VLMC complexity selection

Description

This function prepares a plot of the results of tune_vlmc() using ggplot2. The result can be passed to print() to display the result.

Usage

## S3 method for class 'tune_vlmc'
autoplot(object, cutoff = c("quantile", "native"), ...)

Arguments

object

a tune_vlmc object

cutoff

the scale used for the cut off criterion (default "quantile")

...

additional parameters (not used currently)

Details

The graphical representation proposed by this function is complete, while the one produced by plot.tune_vlmc() is minimalistic. We use here the faceting capabilities of ggplot2 to combine on a single graphical representation the evolution of multiple characteristics of the VLMC during the pruning process, while plot.tune_vlmc() shows only the selection criterion or the log likelihood. Each facet of the resulting plot shows a quantity as a function of the cut off expressed in quantile or native scale.

Value

a ggplot object

Examples

pc <- powerconsumption[powerconsumption$week %in% 10:11, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_best_model_tune <- tune_vlmc(dts, criterion = "BIC")
vlmc_plot <- ggplot2::autoplot(dts_best_model_tune)
print(vlmc_plot)
## simple post customisation
print(vlmc_plot + ggplot2::geom_point())

Find the children nodes of a node in a context tree

Description

This function returns a list (possibly empty) of ctx_node objects. Each object represents one of the children of the node represented by the node parameter.

Usage

children(node)

## S3 method for class 'ctx_node'
children(node)

## S3 method for class 'ctx_node_cpp'
children(node)

Arguments

node

a ctx_node object as returned by find_sequence()

Details

Each node of a context tree represents a sequence. When find_sequence() is called with success, the returned object represents the corresponding node in the context tree. If this node has no child, the present function returns an empty list. When the node has at least one child, the function returns a list with one value for each element in the state space (see states()). The value is NULL if the corresponding child is empty, while it is a ctx_node object when the child is present. Each ctx_node object is associated to the sequence obtained by adding to the past of the sequence represented by node an observation of the associated state (this corresponds to an extension to the left of the sequence in temporal order).

Value

a list of ctx_node objects, see details.

Examples

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
ctx_00 <- find_sequence(dts_ctree, c(0, 0))
## this context can only be extended in the past by 1:
children(ctx_00)
ctx_10 <- find_sequence(dts_ctree, c(1, 0))
## this context can be extended by both states
children(ctx_10)

## C++ backend
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3, backend = "C++")
ctx_00 <- find_sequence(dts_ctree, c(0, 0))
## this context can only be extended in the past by 1:
children(ctx_00)
ctx_10 <- find_sequence(dts_ctree, c(1, 0))
## this context can be extended by both states
children(ctx_10)

Number of contexts of a context tree

Description

This function returns the number of distinct contexts in a context tree.

Usage

context_number(ct)

Arguments

ct

a context tree.

Value

the number of contexts of the tree.

Examples

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
# should be 8
context_number(dts_ctree)

Contexts number of a VLMC with covariates

Description

This function returns the total number of contexts of a VLMC with covariates.

Usage

## S3 method for class 'covlmc'
context_number(ct)

Arguments

ct

a fitted covlmc model.

Value

the number of contexts present in the VLMC with covariates.

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10)
# should be 3
context_number(m_cov)

Contexts of a context tree

Description

This function extracts from a context tree a description of all of its contexts.

Usage

contexts(ct, sequence = FALSE, reverse = FALSE, ...)

Arguments

ct

a context tree.

sequence

if TRUE the function returns its results as a data.frame, if FALSE (default) as a list of ctx_node objects. (see details)

reverse

logical (defaults to FALSE). See details.

...

additional arguments for the contexts function.

Details

The default behaviour consists in returning a list of all the contexts contained in the tree using ctx_node objects (as returned by e.g. find_sequence()) (with type="list"). The properties of the contexts can then be explored using adapted functions such as counts() and positions(). The result list is of class contexts. When sequence=TRUE, the method returns a data.frame whose first column, named context, contains the contexts as vectors (i.e. the value returned by as_sequence() applied to a ctx_node object). Other columns contain context specific values which depend on the actual class of the tree and on additional parameters. In all implementations of contexts(), setting the additional parameters to any non default value leads to a data.frame result.

Value

A list of class contexts containing the contexts represented in this tree (as ctx_node) or a data.frame.

State order in a context

Notice that contexts are given by default in the temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. Set reverse to TRUE for the reverse convention, which is somewhat easier to relate to the way the context trees are represented by draw() (i.e. recent values at the top of the tree).

Examples

dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
dts_tree <- ctx_tree(dts, max_depth = 3, min_size = 5)
contexts(dts_tree)
contexts(dts_tree, TRUE, TRUE)

Contexts of a VLMC with covariates

Description

This function returns the different contexts present in a VLMC with covariates, possibly with some associated data.

Usage

## S3 method for class 'covlmc'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  local = FALSE,
  metrics = FALSE,
  model = NULL,
  hsize = FALSE,
  merging = FALSE,
  ...
)

Arguments

ct

a fitted covlmc model.

sequence

if TRUE the function returns its results as a data.frame, if FALSE (default) as a list of ctx_node objects. (see details)

reverse

logical (defaults to FALSE). See details.

frequency

specifies the counts to be included in the result data.frame. The default value of NULL does not include anything. "total" gives the number of occurrences of each context in the original sequence. "detailed" includes in addition the break down of these occurrences into all the possible states.

positions

logical (defaults to FALSE). Specify whether the positions of each context in the time series used to build the context tree should be reported in a positions column of the result data frame. The availability of the positions depends on the way the context tree was built. See details for the definition of a position.

local

specifies how the counts reported by frequency are computed. When local is FALSE (default value) the counts include both counts that are specific to the context (if any) and counts from the descendants of the context in the tree. When local is TRUE the counts include only the number of times the context appears without being the last part of a longer context.

metrics

if TRUE, adds predictive metrics for each context (see metrics() for the definition of predictive metrics).

model

specifies whether to include the model associated to each context. The default result with model=NULL does not include any model. Setting model to "coef" adds the coefficients of the models in a coef column, while "full" includes the models themselves (as R objects) in a model column.

hsize

if TRUE, adds a hsize column to the result data frame that gives for each context the size of the history of covariates used by the model.

merging

if TRUE, adds a merged column to the result data frame. For a normal context, the value of merged is FALSE. Contexts that share the same model have a TRUE merged value.

...

additional arguments for the contexts function.

Details

The default behaviour of the function is to return a list of all the contexts using ctx_node_covlmc objects (as returned by find_sequence.covlmc()). The properties of the contexts can then be explored using adapted functions such as counts(), covariate_memory(), cutoff.ctx_node(), metrics.ctx_node(), model(), merged_with() and positions().

When sequence=TRUE the method returns a data.frame whose first column, named context, contains the contexts as vectors (i.e. the value returned by as_sequence() applied to a ctx_node object). Other columns contain context specific values specified by the additional parameters. Setting any of those parameters to a value that asks for reporting information will toggle the result type of the function to data.frame.

See contexts.ctx_tree() for details about the frequency parameter. When model is non NULL, the resulting data.frame contains the models associated to each context (either the full R model or its coefficients). Other columns are added if the corresponding parameters are set to TRUE.

Value

A list of class contexts containing the contexts represented in this tree (as ctx_node_covlmc) or a data.frame.

Positions

A position of a context ctx in the time series x is an index value t such that the context ends with x[t]. Thus x[t+1] is after the context. For instance if x=c(0, 0, 1, 1) and ctx=c(0, 1) (in standard state order), then the position of ctx in x is 3.
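The definition can be checked by hand in base R. The sketch below is only an illustration of the definition: context_positions() is a hypothetical helper, not a function of mixvlmc, and it scans the series naively instead of using a context tree.

```r
# Hypothetical helper (not part of mixvlmc): return every index t such that
# the context `ctx` (given in temporal order) ends at x[t].
context_positions <- function(x, ctx) {
  k <- length(ctx)
  if (k > length(x)) {
    return(integer(0))
  }
  # Candidate end indices: the window x[(t - k + 1):t] must fit in the series.
  ends <- k:length(x)
  matches <- vapply(
    ends,
    function(t) all(x[(t - k + 1):t] == ctx),
    logical(1)
  )
  ends[matches]
}

x <- c(0, 0, 1, 1)
pos <- context_positions(x, c(0, 1))
print(pos)
```

On the series of the text, the only window equal to c(0, 1) is x[2:3], which ends at index 3.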

State order in a context

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(0, median(pc$active_power), max(pc$active_power))
dts <- cut(pc$active_power, breaks = breaks)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
## direct representation with ctx_node_covlmc objects
m_cov_ctxs <- contexts(m_cov)
m_cov_ctxs
sapply(m_cov_ctxs, covariate_memory)
sapply(m_cov_ctxs, is_merged)
sapply(m_cov_ctxs, model)
## data.frame interface
contexts(m_cov, model = "coef")
contexts(m_cov, model = "full", hsize = TRUE)

Contexts of a context tree

Description

This function extracts from a context tree a description of all of its contexts.

Usage

## S3 method for class 'ctx_tree'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  ...
)

## S3 method for class 'ctx_tree_cpp'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  ...
)

Arguments

ct

a context tree.

sequence

if TRUE the function returns its results as a data.frame, if FALSE (default) as a list of ctx_node objects. (see details)

reverse

logical (defaults to FALSE). See details.

frequency

positions

...

additional arguments for the contexts function.

Details

The default behaviour of the function is to return a list of all the contexts using ctx_node objects (as returned by find_sequence()). The properties of the contexts can then be explored using adapted functions such as counts() and positions().

If frequency="total", an additional column named freq gives the number of occurrences of each context in the series used to build the tree. If frequency="detailed", one additional column is added per state in the context space. Each column records the number of times a given context is followed by the corresponding value in the original series.
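To make the two kinds of counts concrete, here is a rough base R sketch. following_counts() is a hypothetical helper, not part of mixvlmc; it assumes that an occurrence at the very end of the series, which is followed by no value, is not counted, and it scans the series directly rather than reading a context tree.

```r
# Hypothetical helper (not part of mixvlmc): count the values that follow
# each occurrence of the context `ctx` (temporal order) in the series `x`.
following_counts <- function(x, ctx, states = sort(unique(x))) {
  k <- length(ctx)
  n <- length(x)
  # End positions t of the context, with t < n so that x[t + 1] exists.
  ends <- if (n > k) k:(n - 1) else integer(0)
  hit <- vapply(ends, function(t) all(x[(t - k + 1):t] == ctx), logical(1))
  # Values observed just after each occurrence, tabulated by state.
  nxt <- factor(x[ends[hit] + 1], levels = states)
  detailed <- table(nxt)
  list(total = sum(detailed), detailed = detailed)
}

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
res <- following_counts(dts, c(1, 0))
res$total     # analogue of the freq column
res$detailed  # analogue of the per state columns
```

The detailed counts always sum to the total, which is why the "detailed" output can be seen as a break down of the "total" one.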

Value

A list of class contexts containing the contexts represented in this tree (as ctx_node) or a data.frame.

Positions

State order in a context

Examples

dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
dts_tree <- ctx_tree(dts, max_depth = 3, min_size = 5)
## direct representation with ctx_node objects
contexts(dts_tree)
## data.frame format
contexts(dts_tree, sequence = TRUE)
contexts(dts_tree, frequency = "total")
contexts(dts_tree, frequency = "detailed")

Contexts of a VLMC

Description

This function extracts all the contexts from a fitted VLMC, possibly with some associated data.

Usage

## S3 method for class 'vlmc'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  local = FALSE,
  cutoff = NULL,
  metrics = FALSE,
  ...
)

## S3 method for class 'vlmc_cpp'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  local = FALSE,
  cutoff = NULL,
  metrics = FALSE,
  ...
)

Arguments

ct

a context tree.

sequence

if TRUE the function returns its results as a data.frame, if FALSE (default) as a list of ctx_node objects. (see details)

reverse

logical (defaults to FALSE). See details.

frequency

positions

local

cutoff

specifies whether to include the cut off value associated to each context (see cutoff() and prune()). The default result with cutoff=NULL does not include those values. Setting cutoff to "quantile" adds the cut off values in quantile scale, while cutoff="native" adds them in the native scale. The returned values are directly based on the log likelihood ratio computed in the context tree and are not modified to ensure pruning (as when cutoff() is called with raw=TRUE).

metrics

if TRUE, adds predictive metrics for each context (see metrics() for the definition of predictive metrics).

...

additional arguments for the contexts function.

Details

The frequency parameter is described in detail in the documentation of contexts.ctx_tree(). When cutoff is non NULL, the resulting data.frame contains a cutoff column with the cut off values, either in quantile or in native scale. See cutoff.vlmc() and prune.vlmc() for the definitions of cut off values and of the two scales.

Value

A list of class contexts containing the contexts represented in this tree (as ctx_node) or a data.frame.

Cut off values

The cut off values reported by contexts.vlmc() can be different from the ones reported by cutoff.vlmc() for three reasons:

cutoff.vlmc() reports only useful cut off values, i.e., cut off values that should induce a simplification of the VLMC when used in prune(). This excludes cut off values associated to simple contexts that are smaller than the ones of their descendants in the context tree. Those values are reported by contexts.vlmc().
contexts.vlmc() reports only cut off values of actual contexts, while cutoff.vlmc() reports cut off values for all nodes of the context tree.
values are not modified to induce pruning, contrary to the default behaviour of cutoff.vlmc()

Positions

State order in a context

Examples

dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
model <- vlmc(dts, alpha = 0.5)
## direct representation with ctx_node objects
model_ctxs <- contexts(model)
model_ctxs
sapply(model_ctxs, cutoff, scale = "quantile")
sapply(model_ctxs, cutoff, scale = "native")
sapply(model_ctxs, function(x) metrics(x)$accuracy)
## data.frame format
contexts(model, frequency = "total")
contexts(model, cutoff = "quantile")
contexts(model, cutoff = "native", metrics = TRUE)

Report the distribution of values that follow occurrences of a sequence

Description

This function reports the number of occurrences of the sequence represented by node in the original time series used to build the associated context tree (not including a possible final occurrence not followed by any value at the end of the original time series). In addition if frequency=="detailed", the function reports the frequencies of each of the possible values of the time series when they appear just after the sequence.

Usage

counts(node, frequency = c("detailed", "total"), local = FALSE)

## S3 method for class 'ctx_node'
counts(node, frequency = c("detailed", "total"), local = FALSE)

## S3 method for class 'ctx_node_cpp'
counts(node, frequency = c("detailed", "total"), local = FALSE)

Arguments

node

a ctx_node object as returned by find_sequence()

frequency

specifies the counts to be included in the result. "total" gives the number of occurrences of the sequence in the original sequence. "detailed" includes in addition the break down of these occurrences into all the possible states.

local

specifies how the counts are computed. When local is FALSE (default value) the counts include both counts that are specific to the context (if any) and counts from the descendants of the context in the tree. When local is TRUE the counts include only the number of times the context appears without being the last part of a longer context.

Value

either an integer when frequency="total" which gives the total number of occurrences of the sequence represented by node or a data.frame with a total column with the same value and a column for each of the possible values of the original time series, reporting counts in each column (see the description above).

Examples

dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
dts_tree <- ctx_tree(dts, max_depth = 3, min_size = 5)
subseq <- find_sequence(dts_tree, factor(c("A", "A"), levels = c("A", "B", "C")))
if (!is.null(subseq)) {
  counts(subseq)
}

Maximal covariate memory of a VLMC with covariates

Description

This function returns the longest covariate memory used by a VLMC with covariates.

Usage

covariate_depth(model)

Arguments

model

a covlmc object

Value

the longest covariate memory of this model

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
m_nocovariate <- vlmc(dts)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10)
covariate_depth(m_cov)

Covariate memory length for a COVLMC context

Description

This function returns the length of the memory of a COVLMC context represented by a ctx_node_covlmc object.

Usage

covariate_memory(node)

Arguments

node

A ctx_node_covlmc object as returned by find_sequence() or contexts.covlmc()

Value

the memory length, an integer

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10)
ctxs <- contexts(m_cov)
## get all the memory lengths
sapply(ctxs, covariate_memory)

Fit a Variable Length Markov Chain with Covariates (coVLMC)

Description

This function fits a Variable Length Markov Chain with covariates (coVLMC) to a discrete time series coupled with a time series of covariates.

Usage

covlmc(
  x,
  covariate,
  alpha = 0.05,
  min_size = 5L,
  max_depth = 100L,
  keep_data = TRUE,
  control = covlmc_control(...),
  ...
)

Arguments

x

a discrete time series; can be numeric, character, factor or logical.

covariate

a data frame of covariates.

alpha

number in (0,1) (default: 0.05), cut off value in the pruning phase (in quantile scale).

min_size

number >= 1 (default: 5). Tune the minimum number of observations for a context in the growing phase of the context tree (see below for details).

max_depth

integer >= 1 (default: 100). Longest context considered in the growing phase of the context tree.

keep_data

logical (defaults to TRUE). If TRUE, the original data are stored in the resulting object to enable post pruning (see prune.covlmc()).

control

a list with control parameters, see covlmc_control().

...

arguments passed to covlmc_control().

Details

The model is built using the algorithm described in Zanin Zambom et al. As for the vlmc() approach, the algorithm builds first a context tree (see ctx_tree()). The min_size parameter is used to compute the actual number of observations per context in the growing phase of the tree. It is computed as min_size*(1+ncol(covariate)*d)*(s-1) where d is the length of the context (a.k.a. the depth in the tree) and s is the number of states. This corresponds to ensuring min_size observations per parameter of the logistic regression during the estimation phase.
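The formula can be evaluated directly to see how quickly the required number of observations grows with the depth. The sketch below simply transcribes the expression given above; required_observations() and its argument names are hypothetical and are not part of mixvlmc.

```r
# Hypothetical helper (not part of mixvlmc) transcribing the formula
# min_size * (1 + ncol(covariate) * d) * (s - 1).
required_observations <- function(min_size, n_covariates, d, s) {
  min_size * (1 + n_covariates * d) * (s - 1)
}

# One covariate column, two states and the default min_size = 5,
# for context lengths d = 0, 1, 2 and 3.
required <- required_observations(min_size = 5, n_covariates = 1, d = 0:3, s = 2)
print(required)
```

With more states or more covariate columns the threshold increases proportionally, which limits the depth reachable on short series.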

Then logistic models are adjusted in the leaves of the tree: the goal of each logistic model is to estimate the conditional distribution of the next state of the time series given the context (the recent past of the time series) and delayed versions of the covariates. A pruning strategy is used to simplify the models (mainly to reduce the time window associated to the covariates) and the tree itself.

Parameters specified by control are used to fine tune the behaviour of the algorithm.

Value

a fitted covlmc model.

Logistic models

By default, covlmc uses two different computing engines for logistic models:

when the time series has only two states, covlmc uses stats::glm() with a binomial link (stats::binomial());
when the time series has at least three states, covlmc uses VGAM::vglm() with a multinomial link (VGAM::multinomial()).

Both engines are able to detect degenerate cases and lead to more robust results than using nnet::multinom(). It is nevertheless possible to replace stats::glm() and VGAM::vglm() with nnet::multinom() by setting the global option mixvlmc.predictive to "multinom" (the default value is "glm"). Notice that while results should be comparable, there is no guarantee that they will be identical.

References

Bühlmann, P. and Wyner, A. J. (1999), "Variable length Markov chains." Ann. Statist. 27 (2) 480-513 doi:10.1214/aos/1018031204
Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022), "Variable length Markov chain with exogenous covariates." J. Time Ser. Anal., 43 (2) 312-328 doi:10.1111/jtsa.12615

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(1 / 3, 2 / 3, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 15)
draw(m_cov)
withr::with_options(
  list(mixvlmc.predictive = "multinom"),
  m_cov_nnet <- covlmc(dts, dts_cov, min_size = 15)
)
draw(m_cov_nnet)

Control for coVLMC fitting

Description

This function creates a list with parameters used to fine tune the coVLMC fitting algorithm.

Usage

covlmc_control(pseudo_obs = 1)

Arguments

pseudo_obs

number of fake observations of each state to add to the observed ones.

Details

pseudo_obs is used to regularize the probability estimations when a context is only observed followed by always the same state. Transition probabilities are computed after adding pseudo_obs pseudo observations of each of the states (including the observed one). This corresponds to a Bayesian posterior mean estimation with a Dirichlet prior.
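The effect of the pseudo observations can be illustrated with a few lines of base R. This is a sketch of the smoothing rule as described above, not the code used by covlmc(); smoothed_probabilities() is a hypothetical helper and the counts are made up for the illustration.

```r
# Hypothetical helper (not part of mixvlmc): add `pseudo_obs` fake
# observations to the count of every state, then normalise.
smoothed_probabilities <- function(counts, pseudo_obs = 1) {
  (counts + pseudo_obs) / sum(counts + pseudo_obs)
}

# Made-up degenerate case: the context is always followed by state "1".
counts <- c("0" = 0, "1" = 100)
p_default <- smoothed_probabilities(counts, pseudo_obs = 1)
p_strong <- smoothed_probabilities(counts, pseudo_obs = 10)
print(p_default)
print(p_strong)
```

Without smoothing the estimated probability of state "0" would be exactly zero; a larger pseudo_obs pulls the estimate further towards the uniform distribution.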

Value

a list.

Examples

dts <- rep(c(0, 1), 100)
dts_cov <- data.frame(y = rep(0, length(dts)))
default_model <- covlmc(dts, dts_cov)
contexts(default_model, type = "data.frame", model = "coef")$coef
control <- covlmc_control(pseudo_obs = 10)
model <- covlmc(dts, dts_cov, control = control)
contexts(model, type = "data.frame", model = "coef")$coef

Build a context tree for a discrete time series

Description

This function builds a context tree for a time series.

Usage

ctx_tree(
  x,
  min_size = 2L,
  max_depth = 100L,
  keep_position = TRUE,
  backend = getOption("mixvlmc.backend", "R")
)

Arguments

x

a discrete time series; can be numeric, character, factor or logical.

min_size

integer >= 1 (default: 2). Minimum number of observations for a context to be included in the tree.

max_depth

integer >= 1 (default: 100). Maximum length of a context to be included in the tree.

keep_position

logical (default: TRUE). Should the context tree keep the position of the contexts.

backend

"R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See details.

Details

The tree represents all the sequences of symbols/states of length smaller than or equal to max_depth that appear at least min_size times in the time series and stores the frequencies of the states that follow each context. Optionally, the positions of the contexts in the time series can be stored in the tree.
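The role of min_size and max_depth can be illustrated by listing the frequent subsequences of a short series by brute force. This is a rough sketch and not the algorithm used by ctx_tree(): frequent_sequences() is a hypothetical helper, it returns plain tables instead of a tree, and it counts every occurrence of a window, including one at the very end of the series, which may differ from the counting convention of the package.

```r
# Hypothetical helper (not part of mixvlmc): list the subsequences of
# length 1 to max_depth that appear at least min_size times in `x`.
frequent_sequences <- function(x, min_size = 2, max_depth = 3) {
  n <- length(x)
  result <- list()
  for (k in seq_len(min(max_depth, n))) {
    # All windows of length k, encoded as strings such as "0 1".
    windows <- vapply(
      seq_len(n - k + 1),
      function(i) paste(x[i:(i + k - 1)], collapse = " "),
      character(1)
    )
    tab <- table(windows)
    # Keep only the windows that are frequent enough.
    result[[k]] <- tab[tab >= min_size]
  }
  result
}

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
freq <- frequent_sequences(dts, min_size = 2, max_depth = 2)
print(freq)
```

Raising min_size removes rare sequences, while lowering max_depth caps the length of the sequences that are considered at all.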

Value

a context tree (of class that inherits fromctx_tree).

Back ends

Two back ends are available to compute context trees:

the "R" back end represents the tree in pure R data structures (nested lists) that can be easily processed further in pure R (C++ helper functions are used to speed up the construction).
the "C++" back end represents the tree with C++ classes. This back end is considered experimental. The tree is built with an optimised suffix tree algorithm which speeds up the construction by a factor of at least 10 in standard settings. As the tree is kept outside of R direct reach, context trees built with the C++ back end must be restored after a saveRDS()/readRDS() sequence. This is done automatically by recomputing completely the context tree.

Examples

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
## get all contexts of length 2
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 2)
draw(dts_ctree)

Cut off values for VLMC like model

Description

This generic function returns one or more cut off values that are guaranteed to have an effect on the model passed to the function when a simplification procedure is applied (in general a tree pruning operation as provided by prune()).

Usage

cutoff(model, ...)

Arguments

model

a model.

...

additional arguments for the cutoff function implementations

Details

The exact definition of a cut off value depends on the model type and is documented in the concrete implementations of the function.

Value

a cut off value or a vector of cut off values.

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts)
draw(model)
model_cuts <- cutoff(model)
model_2 <- prune(model, model_cuts[2])
draw(model_2)

Cut off values for pruning the context tree of a VLMC with covariates

Description

This function returns all the cut off values that should induce a pruning of the context tree of a VLMC with covariates.

Usage

## S3 method for class 'covlmc'
cutoff(model, raw = FALSE, tolerance = .Machine$double.eps^0.5, ...)

Arguments

model

a fitted COVLMC model.

raw

specify whether the returned values should be limit values computed in the model or modified values that guarantee pruning (see details).

tolerance

specify the minimum separation between two consecutive values of the cut off in native mode (before any transformation). See details.

...

additional arguments for the cutoff function.

Details

Notice that the list of cut off values returned by the function is not as complete as the one computed for a VLMC without covariates. Indeed, pruning the COVLMC tree creates new pruning opportunities that are not evaluated during the construction of the initial model, while all pruning opportunities are computed during the construction of a VLMC context tree. Nevertheless, the largest value returned by the function is guaranteed to produce the least pruned tree consistent with the reference one.

For large COVLMC, some cut off values can be almost identical, with a difference of the order of the machine epsilon value. The tolerance parameter is used to keep only values that are different enough. This is done in the quantile scale, before transformations implemented when raw is FALSE.

Notice that the log-likelihood scale is not directly useful in COVLMC as the differences in model sizes are not constant through the pruning process. As a consequence, this function does not provide a scale parameter, contrary to cutoff.vlmc().

Setting raw to TRUE removes the small perturbations that are subtracted from the log-likelihood ratio values computed from the COVLMC (in quantile scale).

As automated model selection is provided by tune_covlmc(), the direct use of cutoff should be reserved to advanced exploration of the set of trees that can be obtained from a complex one, e.g. to implement model selection techniques that are not provided by tune_covlmc().

Value

a vector of cut off values, NULL if none can be computed.

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
m_nocovariate <- vlmc(dts)
draw(m_nocovariate)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
draw(m_cov)
cutoff(m_cov)

Cut off value for pruning a node in the context tree of a VLMC

Description

This function returns the cut off value associated to a specific node in the context tree interpreted as a VLMC. The node is represented by a ctx_node object as returned by find_sequence() or contexts(). For details, see cutoff.vlmc().

Usage

## S3 method for class 'ctx_node'
cutoff(model, scale = c("quantile", "native"), raw = FALSE, ...)

Arguments

model

a ctx_node object as returned by find_sequence()

scale

specify whether the results should be "native" log likelihood ratio values or expressed in a "quantile" scale of a chi-squared distribution (defaults to "quantile").

raw

specify whether the returned values should be limit values computed in the model or modified values that guarantee pruning (see details in cutoff.vlmc())

...

additional arguments for the cutoff function.

Value

a cut off value

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts)
model_ctxs <- contexts(model)
cutoff(model_ctxs[[1]])
cutoff(model_ctxs[[2]], scale = "native", raw = TRUE)

Cut off values for pruning the context tree of a VLMC

Description

This function returns a collection of cut off values that are guaranteed to induce all valid pruned trees of the context tree of a VLMC. Pruning is implemented by the prune() function.

Usage

## S3 method for class 'vlmc'
cutoff(
  model,
  scale = c("quantile", "native"),
  raw = FALSE,
  tolerance = .Machine$double.eps^0.5,
  ...
)

## S3 method for class 'vlmc_cpp'
cutoff(
  model,
  scale = c("quantile", "native"),
  raw = FALSE,
  tolerance = .Machine$double.eps^0.5,
  ...
)

Arguments

model

a fitted VLMC model.

scale

specify whether the results should be "native" log likelihood ratio values or expressed in a "quantile" scale of a chi-squared distribution (defaults to "quantile").

raw

specify whether the returned values should be limit values computed in the model or modified values that guarantee pruning (see details).

tolerance

specify the minimum separation between two consecutive values of the cut off in the native scale (before any transformation). See details.

...

additional arguments for the cutoff function.

Details

By default, the function returns values that can be used directly to induce pruning in the context tree. This is done by computing the log likelihood ratios used by the context algorithm on the reference VLMC and by keeping the relevant ones. From them the function selects intermediate values that are guaranteed to generate via pruning all the VLMC models that could be generated by using larger values of the cutoff parameter that was used to build the reference model (or smaller values of the alpha parameter in "quantile" scale).

Setting the raw parameter to TRUE removes this operation on the values and asks the function to return the relevant log likelihood ratios.

For large VLMC, some log likelihood ratios can be almost identical, with a difference of the order of the machine epsilon value. The tolerance parameter is used to keep only values that are different enough. This is done in the native scale, before transformations implemented when raw is FALSE.
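The two ideas above, tolerance filtering and the change of scale, can be sketched in base R as follows. The helper names are hypothetical, and the conversion rests on an assumption not stated in this page: that the quantile scale is the upper tail probability of a chi-squared distribution with (number of states - 1) degrees of freedom, evaluated at twice the log likelihood ratio.

```r
# Assumed conversion between the native and quantile scales (see lead-in).
native_to_quantile <- function(native, nb_states) {
  stats::pchisq(2 * native, df = nb_states - 1, lower.tail = FALSE)
}
quantile_to_native <- function(alpha, nb_states) {
  stats::qchisq(alpha, df = nb_states - 1, lower.tail = FALSE) / 2
}

# Hypothetical helper: sort the values and drop those that are within
# `tolerance` of their predecessor.
drop_near_duplicates <- function(values, tolerance = .Machine$double.eps^0.5) {
  values <- sort(values)
  keep <- c(TRUE, diff(values) > tolerance)
  values[keep]
}

ratios <- c(1.5, 1.5 + 1e-12, 2.25, 4)
kept <- drop_near_duplicates(ratios)
kept
native_to_quantile(kept, nb_states = 4)
```

Under this assumption a larger native cut off maps to a smaller value in the quantile scale, which matches the relation between cutoff and alpha described above.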

As automated model selection is provided by tune_vlmc(), the direct use of cutoff should be reserved to advanced exploration of the set of trees that can be obtained from a complex one, e.g. to implement model selection techniques that are not provided by tune_vlmc().

Value

a vector of cut off values.

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts)
draw(model)
model_cuts <- cutoff(model)
model_2 <- prune(model, model_cuts[2])
draw(model_2)

Depth of a context tree

Description

This function returns the depth of a context tree, i.e. the length of the longest context represented in the tree.

Usage

depth(ct)

Arguments

ct

a context tree.

Value

the depth of the tree.

Examples

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
## should be 3
depth(dts_ctree)

Text based representation of a context tree

Description

This function 'draws' a context tree as a text.

Usage

draw(ct, control = draw_control(), ...)

Arguments

ct

a context tree.

control

a list of low level control parameters of the text representation. See details and draw_control().

...

additional arguments for draw.

Details

The function uses basic "ascii art" to represent the context tree. Characters used to represent the structure of the tree, e.g. branches, can be modified using draw_control().

In addition to the structure of the context tree, draw can represent information attached to the node (contexts and partial contexts). This is controlled by additional parameters depending on the type of the context tree.

Value

the context tree (invisibly).

Examples

dts <- sample(c(0, 1), 100, replace = TRUE)
ctree <- ctx_tree(dts, min_size = 10, max_depth = 2)
draw(ctree)
dts_c <- sample(c("A", "B", "CD"), 100, replace = TRUE)
ctree_c <- ctx_tree(dts_c, min_size = 10, max_depth = 2)
draw(ctree_c, draw_control(root = "x"))

Text based representation of a covlmc model

Description

This function 'draws' a context tree as a text.

Usage

## S3 method for class 'covlmc'
draw(
  ct,
  control = draw_control(),
  model = c("coef", "full"),
  p_value = TRUE,
  digits = 4,
  with_state = FALSE,
  ...
)

Arguments

ct

a fitted covlmc model.

control

a list of low level control parameters of the text representation. See details and draw_control().

model

this parameter controls the display of logistic models associated to nodes. The default model="coef" represents the coefficients of the logistic models associated to each context. model="full" includes the name of the variables in the representation (see details). Setting model=NULL removes the model representations. Additional parameters can be used to tweak model representations (see details).

p_value

specifies whether the p-values of the likelihood ratio tests conducted during the covlmc construction must be included in the representation.

digits

numerical parameters and p-values are represented using the base::signif function, using the number of significant digits specified with this parameter.

with_state

specifies whether to display the state associated to each dimension of the logistic model (see details).

...

additional arguments for draw.

Details

The function uses basic "ascii art" to represent the context tree. Characters used to represent the structure of the tree, e.g. branches, can be modified using draw_control().

Value

the context tree (invisibly).

Tweaking model representation

Model representations are affected by the following additional parameter:

time_sep: character(s) used to split the coefficients list by blocks associated to time delays in the covariate inclusion into the logistic model. The first block contains the intercept(s), the second block the covariate values at time t-1, the third block at time t-2, etc.

Variable representation

When model="full", the representation includes the names of the variables used by the logistic models. Names are the ones generated by the underlying logistic model, e.g. stats::glm(). Numerical variable names are used as is, while factors have levels appended. The intercept is denoted (I) to save space. The time delays are represented by an underscore followed by the time delay. For instance if the model uses the numerical covariate y with two delays, it will appear as two variables y_1 and y_2.

State representation

When model is not NULL, the coefficients of the logistic models are presented, organized in rows associated to states. One state is used as the reference state and the logistic model aims at predicting the ratio of probability between another state and the reference one (in log scale). When with_state is TRUE, the display includes for each row of coefficients the target state. This is useful when using e.g. VGAM::vglm as unused levels of the target variable will be automatically dropped from the model, leading to a reduced number of rows. The reference state is either shown on the first row if model is "full" or after the state on each row if model is "coef".
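To make the role of the reference state concrete, the base R sketch below turns rows of coefficients into a probability distribution. The states, coefficient values and covariate value are all hypothetical and do not come from a fitted model.

```r
# Hypothetical coefficients for a 3-state model with one covariate at delay 1.
# Each row targets one non-reference state; "A" is the reference state.
coefs <- rbind(
  "B" = c("(I)" = 0.5, "y_1" = -1.0),
  "C" = c("(I)" = -0.2, "y_1" = 0.3)
)
covariates <- c(1, y_1 = 2)  # the leading 1 multiplies the intercept

# Log ratio of probabilities between each target state and the reference.
log_ratios <- drop(coefs %*% covariates)

# Turn the log ratios into a distribution over the three states.
unnormalised <- c("A" = 1, exp(log_ratios))
probs <- unnormalised / sum(unnormalised)
probs
```

Each row of coefficients therefore describes one non-reference state, which is why dropping unused levels reduces the number of rows.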

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
draw(m_cov, digits = 3)
draw(m_cov, model = NULL)
draw(m_cov, p_value = FALSE)
draw(m_cov, p_value = FALSE, time_sep = " | ")
draw(m_cov, model = "full", time_sep = " | ")

Text based representation of a context tree

Description

This function 'draws' a context tree as a text.

Usage

## S3 method for class 'ctx_tree_cpp'
draw(ct, control = draw_control(), frequency = NULL, ...)

## S3 method for class 'ctx_tree'
draw(ct, control = draw_control(), frequency = NULL, ...)

Arguments

ct

a context tree.

control

a list of low level control parameters of the text representation. See details and draw_control().

frequency

this parameter controls the display of node level information in the tree. The default NULL value does not include anything. Setting frequency to "total" includes the frequency of the (partial) context of the node, while "detailed" includes the frequency of the states that follow the context (as in contexts.ctx_tree()).

...

additional arguments for draw.

Details

The function uses basic "ascii art" to represent the context tree. Characters used to represent the structure of the tree, e.g. branches, can be modified using draw_control().

Value

the context tree (invisibly).

Examples

dts_c <- sample(c("A", "B", "CD"), 100, replace = TRUE)
ctree_c <- ctx_tree(dts_c, min_size = 10, max_depth = 2)
draw(ctree_c, frequency = "total")
draw(ctree_c, frequency = "detailed")

Text based representation of a vlmc

Description

This function 'draws' a context tree as a text.

Usage

## S3 method for class 'vlmc'
draw(ct, control = draw_control(), prob = TRUE, ...)

## S3 method for class 'vlmc_cpp'
draw(ct, control = draw_control(), prob = TRUE, ...)

Arguments

ct

a fitted vlmc.

control

a list of low level control parameters of the text representation. See details and draw_control().

prob

this parameter controls the display of node level information in the tree. The default prob=TRUE represents the conditional distribution of the states given the (partial) context associated to the node. Setting prob=FALSE replaces the conditional distribution by the frequency of the states that follow the context as in draw.ctx_tree(). Setting prob=NULL removes all additional information.

...

additional arguments for draw.

Details

The function uses basic "ascii art" to represent the context tree. Characters used to represent the structure of the tree, e.g. branches, can be modified using draw_control().

Value

the context tree (invisibly).

Examples

dts <- sample(c("A", "B", "C"), 500, replace = TRUE)
model <- vlmc(dts, alpha = 0.05)
draw(model)
draw(model, prob = FALSE)
draw(model, prob = NULL)

Control parameters for draw

Description

This function returns a list used to fine tune the draw() function behaviour.

Usage

draw_control(
  root = "*",
  first_node = "+",
  next_node = "'",
  vbranch = "|",
  hbranch = "--",
  open_ct = "(",
  close_ct = ")"
)

Arguments

root

character used for the root node.

first_node

characters used for the first child of a node.

next_node

characters used for other children of a node.

vbranch

characters used to represent a branch in a vertical way.

hbranch

characters used to represent a branch in a horizontal way.

open_ct

characters used to start each node specific text representation.

close_ct

characters used to end each node specific text representation.

Value

a list

Examples

draw_control(open_ct = "[", close_ct = "]")

Find the node of a sequence in a context tree

Description

This function checks whether the sequence ctx is represented in the context tree ct. If this is the case, it returns a description of the matching node, an object of class ctx_node. If the sequence is not represented in the tree, the function returns NULL.

Usage

find_sequence(ct, ctx, reverse = FALSE, ...)

## S3 method for class 'ctx_tree'
find_sequence(ct, ctx, reverse = FALSE, ...)

## S3 method for class 'ctx_tree_cpp'
find_sequence(ct, ctx, reverse = FALSE, ...)

Arguments

ct

a context tree.

ctx

a sequence to search in the context tree

reverse

specifies whether the sequence ctx is given in the temporal order (FALSE, default value) or in the reverse temporal order (TRUE). See the dedicated section.

...

additional parameters for the find_sequence function

Details

The function looks for sequences in general. The is_context() function can be used on the resulting object to test if the sequence is in addition a proper context.

Value

an object of class ctx_node if the sequence ctx is represented in the context tree, NULL when this is not the case.

State order in a sequence

Sequences are given by default in the temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. In the present function, reverse refers both to the order used for the ctx parameter and to the default order used by the resulting ctx_node object.
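The two conventions can be compared with a small base R sketch. The helper find_positions is hypothetical: it scans a plain vector rather than a context tree, and only illustrates how a reverse flag changes the reading of the searched sequence.

```r
# Hypothetical helper: start positions of `ctx` in `x`. With reverse = TRUE,
# `ctx` is read with the most recent value on the left.
find_positions <- function(x, ctx, reverse = FALSE) {
  if (reverse) {
    ctx <- rev(ctx)
  }
  d <- length(ctx)
  n <- length(x)
  if (d > n) {
    return(integer(0))
  }
  starts <- seq_len(n - d + 1)
  hits <- vapply(
    starts,
    function(i) all(x[i:(i + d - 1)] == ctx),
    logical(1)
  )
  starts[hits]
}

dts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
find_positions(dts, c("A", "B"))                  # "A" then "B", temporal order
find_positions(dts, c("B", "A"), reverse = TRUE)  # same sequence, reverse order
find_positions(dts, c("A", "C"))                  # "A" then "C" never occurs
```

Both of the first two calls describe the same sub-sequence and therefore return the same positions.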

Examples

dts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
dts_tree <- ctx_tree(dts, max_depth = 3)
find_sequence(dts_tree, "A")
## returns NULL as "A" "C" does not appear in dts
find_sequence(dts_tree, c("A", "C"))

Find the node of a sequence in a COVLMC context tree

Description

This function checks whether the sequence ctx is represented in the context tree of the COVLMC model ct. If this is the case, it returns a description of the matching node, an object of class ctx_node_covlmc. If the sequence is not represented in the tree, the function returns NULL.

Usage

## S3 method for class 'covlmc'
find_sequence(ct, ctx, reverse = FALSE, ...)

Arguments

ct

a context tree.

ctx

a sequence to search in the context tree

reverse

specifies whether the sequence ctx is given in the temporal order (FALSE, default value) or in the reverse temporal order (TRUE). See the dedicated section.

...

additional parameters for the find_sequence function

Details

The function looks for sequences in general. The is_context() function can be used on the resulting object to test if the sequence is in addition a proper context.

Value

an object of class ctx_node_covlmc if the sequence ctx is represented in the context tree, NULL when this is not the case.

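State order in a sequence

Sequences are given by default in the temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. In the present function, reverse refers both to the order used for the ctx parameter and to the default order used by the resulting ctx_node object.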

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10)
## not in the tree
vals <- states(m_cov)
find_sequence(m_cov, c(vals[2], vals[2]))
## in the tree but not a context
node <- find_sequence(m_cov, c(vals[1]))
node
is_context(node)
## in the tree and a context
node <- find_sequence(m_cov, c(vals[1], vals[1]))
node
is_context(node)
model(node)

Significant Earthquake Dataset

Description

A data set containing earthquakes that occurred during the period 1900-2022, with GPS coordinates and magnitudes.

Usage

globalearthquake

Format

A data frame with 98785 rows and 12 variables:

date_time: Date and time in POSIXct format
latitude: latitude of the earthquake, from -90° to 90°
longitude: longitude of the earthquake, from -180° to 180°
mag: the magnitude of the earthquake, indicating its strength
Date: date when the earthquake occurred
nbweeks: number of weeks since 1900/01/01
year: year
month: month of the year
month_day: day of the month
week: week number
week_day: day of the week from 1 = Sunday to 7 = Saturday
year_day: day of the year from 1 to 366

Details

This is a compiled version of the full data set available on U.S. Geological Survey Earthquake Events (USGS), which is in the public domain.

The data set contains only the earthquakes between 1900 and 2022 with a magnitude higher than 5.

Source

Earthquake Catalog, U.S. Geological Survey, Department of the Interior. https://www.usgs.gov/programs/earthquake-hazards

Report the nature of a node in a context tree

Description

This function returns TRUE if the node is a proper context, FALSE otherwise.

Usage

is_context(node)

Arguments

node

a ctx_node object as returned by find_sequence()

Value

TRUE if the node node is a proper context, FALSE when this is not the case

Examples

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
draw(dts_ctree)
## 0, 0 is a context but 1, 0 is not
is_context(find_sequence(dts_ctree, c(0, 0)))
is_context(find_sequence(dts_ctree, c(1, 0)))

Test if the object is a covlmc model

Description

This function returns TRUE for VLMC models with covariates and FALSE for other objects.

Usage

is_covlmc(x)

Arguments

x

an R object.

Value

TRUE for VLMC models with covariates.

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
# should be true
is_ctx_tree(m_cov)
# should be true
is_covlmc(m_cov)
# should be false
is_vlmc(m_cov)

Test if the object is a context tree

Description

This function returns TRUE for context trees and FALSE for other objects.

Usage

is_ctx_tree(x)

Arguments

x

an R object.

Value

TRUE for context trees.

Examples

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 2)
is_ctx_tree(dts_ctree)
is_ctx_tree(dts)

Merging status of a COVLMC context

Description

The function returns TRUE if the context represented by this node is merged with at least another one and FALSE if this is not the case.

Usage

is_merged(node)

Arguments

node

a ctx_node_covlmc object as returned by find_sequence() or contexts.covlmc()

Details

When a COVLMC is built on a time series with at least three distinct states, some contexts can be merged: they use the same logistic model, leading to a more parsimonious model. Those contexts are reported individually by functions such as contexts.covlmc(). The present function can be used to detect such merging, while merged_with() can be used to recover the other contexts.

Value

TRUE or FALSE, depending on the nature of the context

Examples

pc <- powerconsumption[powerconsumption$week == 15, ]
dts <- cut(pc$active_power, breaks = c(0, 1, 2, 3, 8))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5, alpha = 0.1)
ctxs <- contexts(m_cov)
## no merging
sapply(ctxs, is_merged)

Report the ordering convention of the node

Description

This function returns TRUE if the node uses a reverse temporal ordering and FALSE otherwise.

Usage

is_reversed(node)

Arguments

node

a ctx_node object as returned by find_sequence()

Value

TRUE if the node node uses a reverse temporal ordering, FALSE when this is not the case

Examples

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
is_reversed(find_sequence(dts_ctree, c(0, 0)))
is_reversed(find_sequence(dts_ctree, c(1, 0), reverse = TRUE))

Test if the object is a vlmc model

Description

This function returns TRUE for VLMC models and FALSE for other objects.

Usage

is_vlmc(x)

Arguments

x

an R object.

Value

TRUE for VLMC models.

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts)
# should be true
is_ctx_tree(model)
# should be true
is_vlmc(model)
# should be false
is_covlmc(model)

Log-Likelihood of a VLMC with covariates

Description

This function evaluates the log-likelihood of a VLMC with covariates fitted on a discrete time series.

Usage

## S3 method for class 'covlmc'
logLik(object, initial = c("truncated", "specific", "extended"), ...)

Arguments

object

the covlmc representation.

initial

specifies the likelihood function, more precisely the way the first few observations for which contexts cannot be calculated are integrated in the likelihood. Defaults to "truncated". See loglikelihood() for details.

...

additional parameters for logLik.

Value

an object of class logLik. This is a number, the log-likelihood of the (CO)VLMC with the following attributes:

df: the number of parameters used by the VLMC for this likelihood calculation
nobs: the number of observations included in this likelihood calculation
initial: the value of the initial parameter used to compute this likelihood

Examples

## Likelihood for a fitted VLMC with covariates.
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
ll <- logLik(m_cov)
attributes(ll)

Log-Likelihood of a VLMC

Description

This function evaluates the log-likelihood of a VLMC fitted on a discrete time series.

Usage

## S3 method for class 'vlmc'
logLik(object, initial = c("truncated", "specific", "extended"), ...)

## S3 method for class 'vlmc_cpp'
logLik(object, initial = c("truncated", "specific", "extended"), ...)

Arguments

object

the vlmc representation.

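initial

specifies the likelihood function, more precisely the way the first few observations for which contexts cannot be calculated are integrated in the likelihood. Defaults to "truncated". See loglikelihood() for details.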

...

additional parameters for logLik.

Value

an object of class logLik. This is a number, the log-likelihood of the (CO)VLMC with the following attributes:

df: the number of parameters used by the VLMC for this likelihood calculation
nobs: the number of observations included in this likelihood calculation
initial: the value of the initial parameter used to compute this likelihood

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
m_nocovariate <- vlmc(dts)
ll <- logLik(m_nocovariate)
ll
attributes(ll)

Log-Likelihood of a VLMC

Description

This function evaluates the log-likelihood of a VLMC fitted on a discrete time series. When the optional argument newdata is provided, the function evaluates instead the log-likelihood for this (new) discrete time series.

Usage

loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  ...
)

## S3 method for class 'vlmc'
loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  ...
)

## S3 method for class 'vlmc_cpp'
loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  ...
)

Arguments

vlmc

the vlmc representation.

newdata

an optional discrete time series.

initial

specifies the likelihood function, more precisely the way the first few observations for which contexts cannot be calculated are integrated in the likelihood. Defaults to "truncated". See below for details.

ignore

specifies the number of initial values for which the log-likelihood will not be computed. The minimal number depends on the likelihood function as detailed below.

...

additional parameters for loglikelihood.

Details

The definition of the likelihood function depends on the value of the initial parameter, see the section below as well as the dedicated vignette: vignette("likelihood", package = "mixvlmc").

For VLMC objects, the method loglikelihood.vlmc will be used. For VLMC with covariates, loglikelihood.covlmc will instead be called. For more information on loglikelihood methods, use methods(loglikelihood) and their associated documentation.

Value

an object of class logLikMixVLMC and logLik. This is a number, the log-likelihood of the (CO)VLMC with the following attributes:

df: the number of parameters used by the VLMC for this likelihood calculation
nobs: the number of observations included in this likelihood calculation
initial: the value of the initial parameter used to compute this likelihood

Likelihood calculation

In a (CO)VLMC of depth()=k, we need k past values in order to compute the context of a given observation. As a consequence, in a time series x, the contexts of x[1] to x[k] are unknown. Depending on the value of initial, different likelihood functions are used to tackle this difficulty:

initial=="truncated" (default): the likelihood is computed using only x[(k+1):length(x)]
initial=="specific": the likelihood is computed on the full time series using a specific context for the initial values, x[1] to x[k]. Each of the specific contexts is unique, leading to a perfect likelihood of 1 (0 in log scale). Thus the numerical value of the likelihood is identical to the one obtained with initial=="truncated" but it is computed on length(x) observations with a model with more parameters than in this previous case.
initial=="extended": the likelihood is computed on the full time series using an extended context matching for the initial values, x[1] to x[k]. This can be seen as a compromise between the two other possibilities: the relaxed context matching needs in general to turn internal nodes of the context tree into actual contexts, increasing the number of parameters, but not as much as with "specific". However, the likelihood of say x[1] with an empty context is generally not 1 and thus the full likelihood is smaller than the one computed with "specific".

In all cases, the ignore first values of the time series are not included in the computed likelihood, but still used to compute contexts. If ignore is not specified, it is set to the minimal possible value, that is k for the truncated likelihood and 0 for the other ones. If it is specified, it must be larger than or equal to k for truncated.
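The relation between the "truncated" and "specific" likelihoods can be checked by hand on a toy model. The base R sketch below uses a hypothetical order 1 Markov chain (so k = 1) with made-up transition probabilities; it does not call the package.

```r
# Hypothetical order 1 chain (depth k = 1) on the states "0" and "1".
trans <- matrix(
  c(0.9, 0.1,
    0.4, 0.6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("0", "1"), c("0", "1"))
)
x <- c("0", "0", "1", "1", "0", "0", "0", "1")
k <- 1
n <- length(x)

# Truncated likelihood: only x[(k + 1):n] contribute, each value being
# conditioned on its context (here the previous value).
idx <- (k + 1):n
ll_truncated <- sum(log(trans[cbind(x[idx - 1], x[idx])]))
nobs_truncated <- n - k

# Specific likelihood: x[1] to x[k] each get a dedicated context with
# probability 1, so the value is unchanged but all n observations count.
ll_specific <- ll_truncated + k * log(1)
nobs_specific <- n

c(truncated = ll_truncated, specific = ll_specific)
c(truncated = nobs_truncated, specific = nobs_specific)
```

The two values coincide while the number of observations differs by k, which is why the choice matters when a criterion that depends on nobs or df is computed from the result.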

See the dedicated vignette for a more mathematically oriented discussion: vignette("likelihood", package = "mixvlmc").

Examples

## Likelihood for a fitted VLMC.
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
m_nocovariate <- vlmc(dts)
ll <- loglikelihood(m_nocovariate)
ll
attr(ll, "nobs")
attr(ll, "df")

## Likelihood for a new time series with previously fitted VLMC.
pc_new <- powerconsumption[powerconsumption$week == 11, ]
dts_new <- cut(pc_new$active_power, breaks = breaks, labels = labels)
ll_new <- loglikelihood(m_nocovariate, newdata = dts_new)
ll_new
attributes(ll_new)
ll_new_specific <- loglikelihood(m_nocovariate, initial = "specific", newdata = dts_new)
ll_new_specific
attributes(ll_new_specific)
ll_new_extended <- loglikelihood(m_nocovariate, initial = "extended", newdata = dts_new)
ll_new_extended
attributes(ll_new_extended)

Log-Likelihood of a VLMC with covariates

Description

This function evaluates the log-likelihood of a VLMC with covariates fitted on a discrete time series. When the optional argument newdata is provided, the function evaluates instead the log-likelihood for this (new) discrete time series on the new covariates which must be provided through the newcov parameter.

Usage

## S3 method for class 'covlmc'
loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  newcov,
  ...
)

Arguments

vlmc

the covlmc representation.

newdata

an optional discrete time series.

initial

specifies the likelihood function, more precisely the way thefirst few observations for which contexts cannot be calculated are integratedin the likelihood. Defaults to"truncated". See below for details.

ignore

specifies the number of initial values for which the loglikelihood will not be computed. The minimal number depends on the likelihood function as detailed below.

newcov

an optional data frame with the new values for the covariates.

...

additional parameters for loglikelihood.

Details

The definition of the likelihood function depends on the value of the initial parameter, see the section below as well as the dedicated vignette: vignette("likelihood", package = "mixvlmc").

Value

an object of class logLikMixVLMC and logLik. This is a number, the log-likelihood of the (CO)VLMC with the following attributes:

df: the number of parameters used by the VLMC for this likelihood calculation
nobs: the number of observations included in this likelihood calculation
initial: the value of the initial parameter used to compute this likelihood

likelihood calculation

initial=="truncated": the likelihood is computed using only x[(k+1):length(x)]
initial=="specific": the likelihood is computed on the full time series using a specific context for the initial values, x[1] to x[k]. Each of the specific contexts is unique, leading to a perfect likelihood of 1 (0 in log scale). Thus the numerical value of the likelihood is identical to the one obtained with initial=="truncated" but it is computed on length(x) observations with a model with more parameters than in this previous case.
initial=="extended": the likelihood is computed on the full time series using an extended context matching for the initial values, x[1] to x[k]. This can be seen as a compromise between the two other possibilities: the relaxed context matching needs in general to turn internal nodes of the context tree into actual contexts, increasing the number of parameters, but not as much as with "specific". However, the likelihood of say x[1] with an empty context is generally not 1 and thus the full likelihood is smaller than the one computed with "specific".
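As a rough illustration of the difference between "truncated" and "specific", the base R sketch below uses a hypothetical first order Markov chain (k = 1) estimated by simple counting. It does not call mixvlmc and is not the package's implementation; the names x, k, probs, ll_truncated and ll_specific are assumptions introduced for the example.

```r
# Hypothetical binary time series and a fixed order k = 1 (assumptions).
x <- c(0, 1, 1, 0, 1, 1, 1, 0, 0, 1)
k <- 1

# Pairs (previous value, next value) for which a context of length k exists.
prev <- head(x, -k)
nxt <- tail(x, -k)

# Transition probabilities estimated by counting.
counts <- table(prev, nxt)
probs <- prop.table(counts, margin = 1)

# "truncated": only x[(k+1):length(x)] contribute to the likelihood.
ll_truncated <- sum(log(probs[cbind(as.character(prev), as.character(nxt))]))
nobs_truncated <- length(x) - k

# "specific": x[1] to x[k] each get a dedicated context with probability 1,
# so they add log(1) = 0 while the number of observations becomes length(x).
ll_specific <- ll_truncated + k * log(1)
nobs_specific <- length(x)

c(truncated = ll_truncated, specific = ll_specific)
c(truncated = nobs_truncated, specific = nobs_specific)
```

The two log-likelihood values coincide while the number of observations differs, which is why information criteria computed from them can disagree.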

See the dedicated vignette for a more mathematically oriented discussion: vignette("likelihood", package = "mixvlmc").

Examples

## Likelihood for a fitted VLMC with covariates.
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
ll <- loglikelihood(m_cov)
ll
attr(ll, "nobs")
## Likelihood for new time series and covariates with previously
## fitted VLMC with covariates
pc_new <- powerconsumption[powerconsumption$week == 11, ]
dts_new <- cut(pc_new$active_power, breaks = breaks, labels = labels)
dts_cov_new <- data.frame(day_night = (pc_new$hour >= 7 & pc_new$hour <= 17))
ll_new <- loglikelihood(m_cov, newdata = dts_new, newcov = dts_cov_new)
ll_new
attributes(ll_new)

Merged contexts in a COVLMC

Description

The function returns NULL when the context represented by the node parameter is not merged with another context (see is_merged()). In the other case, it returns a list of contexts with which this one is merged.

Usage

merged_with(node)

Arguments

node

A ctx_node_covlmc object as returned by find_sequence() or contexts.covlmc()

Details

If the context is merged, the function returns a list with one value for each element in the state space (see states()). The value is NULL if the corresponding context is not merged with the node context, while it is a ctx_node_covlmc object in the other case. A context merged with node differs from the context represented by node only in its last value (in temporal order) which is used as its name in the list. For instance, if the context ABC is merged only with CBC (when represented in temporal ordering), then the resulting list is of the form list("A" = NULL, "B" = NULL, "C" = ctx_node_covlmc(CBC)).

Value

NULL or a list of contexts merged with node represented by ctx_node_covlmc objects

Examples

pc_week_15_16 <- powerconsumption[powerconsumption$week %in% c(15, 16), ]
elec <- pc_week_15_16$active_power
elec_dts <- cut(elec, breaks = c(0, 0.4, 2, 8), labels = c("low", "typical", "high"))
elec_cov <- data.frame(day = (pc_week_15_16$hour >= 7 & pc_week_15_16$hour <= 18))
elec_tune <- tune_covlmc(elec_dts, elec_cov, min_size = 5)
elec_model <- prune(as_covlmc(elec_tune), alpha = 3.961e-10)
ctxs <- contexts(elec_model)
for (ctx in ctxs) {
  if (is_merged(ctx)) {
    print(ctx)
    cat("\nis merged with\n\n")
    print(merged_with(ctx))
  }
}

Predictive quality metrics for context based models

Description

This function computes and returns predictive quality metrics for context based models such as VLMC and VLMC with covariates.

Usage

metrics(model, ...)

Arguments

model

The context based model on which to compute predictive metrics.

...

Additional parameters for predictive metrics computation.

Details

A context based model computes transition probabilities for its contexts. Using a maximum transition probability decision rule, this can be used to predict the new state that is the most likely to follow the current one, given the context (see predict.vlmc()). The quality of these predictions is evaluated using standard metrics including:

accuracy
the full confusion matrix
the area under the ROC curve (AUC), considering the context based model as a (conditional) probability estimator. We use Hand and Till (2001) multiclass AUC in case of a state space with more than 2 states
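The first two metrics can be sketched in base R as follows. This is only an illustration of the maximum probability decision rule on a hypothetical matrix of state probabilities (probs) and hypothetical true states (truth); it is not the code used by metrics(), and the AUC is left out.

```r
# Hypothetical state space, predicted probabilities and true states.
states <- c("low", "high")
probs <- matrix(
  c(
    0.7, 0.3,
    0.4, 0.6,
    0.5, 0.5,
    0.2, 0.8
  ),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, states)
)
truth <- factor(c("low", "high", "high", "high"), levels = states)

# Maximum probability decision rule; ties go to the first state.
predicted <- factor(states[max.col(probs, ties.method = "first")], levels = states)

# Confusion matrix with predicted values in rows and true values in columns.
conf_mat <- table(predicted = predicted, truth = truth)

# Accuracy: proportion of correct predictions.
accuracy <- mean(predicted == truth)

conf_mat
accuracy
```

The third row is a tie, resolved here in favour of the first state, so one of the four predictions is wrong.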

Value

The returned value is guaranteed to have at least three components

accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values in rows and true values in columns
auc: the AUC of the predictive model

References

David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. doi:10.1023/A:1010920819831.

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
model <- vlmc(dts)
metrics(model)

Predictive quality metrics for VLMC with covariates

Description

This function computes and returns predictive quality metrics for context based models such as VLMC and VLMC with covariates.

Usage

## S3 method for class 'covlmc'
metrics(model, ...)

## S3 method for class 'metrics.covlmc'
print(x, ...)

Arguments

model

The context based model on which to compute predictive metrics.

...

Additional parameters for predictive metrics computation.

x

A metrics.covlmc object, result of a call to metrics.covlmc()

Details

accuracy
the full confusion matrix
the area under the ROC curve (AUC), considering the context based model as a (conditional) probability estimator. We use Hand and Till (2001) multiclass AUC in case of a state space with more than 2 states

Value

An object of class metrics.covlmc with the following components:

accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values in rows and true values in columns
auc: the AUC of the predictive model

The object has a print method that recalls basic information about the model together with the values of the components above.

Methods (by generic)

print(metrics.covlmc): Prints the predictive metrics of the VLMC model with covariates.

Extended contexts

As explained in detail in the loglikelihood.covlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first values of a time series do not in general have a proper context for a COVLMC with a non zero order. In order to predict something meaningful for those values, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.covlmc() with the parameter initial="extended". All covlmc functions that need to manipulate initial values with no proper context use the same approach.

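References

David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. doi:10.1023/A:1010920819831.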

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
metrics(m_cov)

Predictive quality metrics for a node of a context tree

Description

This function computes and returns predictive quality metrics for a node (ctx_node) extracted from a context tree.

Usage

## S3 method for class 'ctx_node'
metrics(model, ...)

Arguments

model

A ctx_node object as returned by find_sequence().

...

Additional parameters for predictive metrics computation.

Details

Compared to metrics.vlmc(), this function focuses on a single context and assesses the quality of its predictions, disregarding observations that have other contexts. Apart from this limited scope, the function operates as metrics.vlmc().

Value

The returned value is guaranteed to have at least three components

accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values in rows and true values in columns
auc: the AUC of the predictive model

References

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts)
model_ctxs <- contexts(model)
metrics(model_ctxs[[4]])

Predictive quality metrics for a node of a COVLMC context tree

Description

This function computes and returns predictive quality metrics for a node (ctx_node_covlmc) extracted from a covlmc.

Usage

## S3 method for class 'ctx_node_covlmc'
metrics(model, ...)

Arguments

model

A ctx_node_covlmc object as returned by find_sequence() or contexts.covlmc()

...

Additional parameters for predictive metrics computation.

Details

Compared to metrics.covlmc(), this function focuses on a single context and assesses the quality of its predictions, disregarding observations that have other contexts. Apart from this limited scope, the function operates as metrics.covlmc().

Value

an object of class metrics.covlmc with the following components:

accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values in rows and true values in columns
auc: the AUC of the predictive model

References

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
m_ctxs <- contexts(m_cov)
## get the predictive metrics for each context
lapply(m_ctxs, metrics)

Predictive quality metrics for VLMC

Description

This function computes and returns predictive quality metrics for context based models such as VLMC and VLMC with covariates.

Usage

## S3 method for class 'vlmc'
metrics(model, ...)

## S3 method for class 'metrics.vlmc'
print(x, ...)

Arguments

model

The context based model on which to compute predictive metrics.

...

Additional parameters for predictive metrics computation.

x

A metrics.vlmc object, result of a call to metrics.vlmc()

Details

accuracy
the full confusion matrix
the area under the ROC curve (AUC), considering the context based model as a (conditional) probability estimator. We use Hand and Till (2001) multiclass AUC in case of a state space with more than 2 states

Value

An object of class metrics.vlmc with the following components:

accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values in rows and true values in columns
auc: the AUC of the predictive model

The object has a print method that recalls basic information about the model together with the values of the components above.

Methods (by generic)

print(metrics.vlmc): Prints the predictive metrics of the VLMC model.

Extended contexts

As explained in detail in the loglikelihood.vlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first values of a time series do not in general have a proper context for a VLMC with a non zero order. In order to predict something meaningful for those values, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.vlmc() with the parameter initial="extended". All vlmc functions that need to manipulate initial values with no proper context use the same approach.

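References

David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. doi:10.1023/A:1010920819831.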

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
model <- vlmc(dts)
metrics(model)

Logistic model of a COVLMC context

Description

This function returns a representation of the logistic model associated with a COVLMC context from its node in the associated context tree.

Usage

model(node, type = c("coef", "full"))

Arguments

node

A ctx_node_covlmc object as returned by find_sequence() or contexts.covlmc()

type

specifies the model information to return, either the coefficients only (type="coef", default case) or the full model object (type="full")

Details

Full model extraction is only possible if the COVLMC model was not fully trimmed (see trim.covlmc()). Notice that find_sequence.covlmc() can produce nodes that are not contexts: in this case this function returns NULL.

Value

if node is a context, the coefficients of the logistic model (as a vector or a matrix depending on the size of the state space) or a logistic model as an R object. If node is not a context, NULL.

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10)
vals <- states(m_cov)
node <- find_sequence(m_cov, c(vals[1], vals[1]))
node
model(node)
model(node, type = "full")

Find the parent of a node in a context tree

Description

This function returns the parent node of the node represented by the node parameter. The result is NULL if node is the root node of its context tree (representing the empty sequence).

Usage

parent(node)

## S3 method for class 'ctx_node'
parent(node)

## S3 method for class 'ctx_node_cpp'
parent(node)

Arguments

node

a ctx_node object as returned by find_sequence()

Details

Each node of a context tree represents a sequence. When find_sequence() is called with success, the returned object represents the corresponding node in the context tree. Unless the original sequence is empty, this node has a parent node which is returned as a ctx_node object by the present function. Another interpretation is that the function returns the node object associated to the sequence obtained by removing the oldest value from the original sequence.
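The second interpretation can be sketched on plain vectors. The helper parent_sequence() below is hypothetical and works on sequences written in temporal order (oldest value first, an assumption of this example); it only mimics the relation between a node and its parent and does not manipulate ctx_node objects.

```r
# Hypothetical helper: the sequence of the parent node is obtained by
# dropping the oldest value, assumed here to be the first element.
parent_sequence <- function(s) {
  if (length(s) == 0) {
    # the empty sequence is the root and has no parent
    return(NULL)
  }
  s[-1]
}

parent_sequence(c("A", "B", "C")) # sequence "B", "C"
parent_sequence(c(0, 0))          # sequence 0
parent_sequence(c(0))             # empty sequence, i.e. the root
parent_sequence(numeric(0))       # NULL, the root has no parent
```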

Value

a ctx_node object if node does not correspond to the empty sequence, or NULL when node is the root of the context tree

Examples

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
ctx_00 <- find_sequence(dts_ctree, c(0, 0))
## the parent sequence/node corresponds to the 0 context
parent(ctx_00)
identical(parent(ctx_00), find_sequence(dts_ctree, c(0)))
## C++ backend
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3, backend = "C++")
ctx_00 <- find_sequence(dts_ctree, c(0, 0))
## the parent sequence/node corresponds to the 0 context
parent(ctx_00)
identical(parent(ctx_00), find_sequence(dts_ctree, c(0)))

Plot the results of automatic (CO)VLMC complexity selection

Description

This function plots the results of tune_vlmc() or tune_covlmc().

Usage

## S3 method for class 'tune_vlmc'
plot(
  x,
  value = c("criterion", "likelihood"),
  cutoff = c("quantile", "native"),
  ...
)

## S3 method for class 'tune_covlmc'
plot(
  x,
  value = c("criterion", "likelihood"),
  cutoff = c("quantile", "native"),
  ...
)

Arguments

x

a tune_vlmc object

value

the criterion to plot (default "criterion").

cutoff

the scale used for the cut off criterion (default "quantile")

...

additional parameters passed to base::plot()

Details

The standard plot consists in showing the evolution of the criterion used to select the model (AIC() or BIC()) as a function of the cut off criterion expressed in the quantile scale (the quantile is used by default to offer a common default behaviour between vlmc() and covlmc()). Parameters can be used to display instead the loglikelihood() of the model (by setting value="likelihood") and to use the native scale for the cut off when available (by setting cutoff="native").

Value

the tune_vlmc object invisibly

Customisation

The function sets several defaults before calling base::plot(), namely:

type: "l" by default to use a line representation;
xlab: "Cut off (quantile scale)" by default, adapted to the actual scale;
ylab: the name of the criterion or "Log likelihood".

These parameters can be overridden by specifying other values when calling the function. All parameters specified in addition to x, value and cutoff are passed to base::plot().

Examples

dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
tune_result <- tune_vlmc(dts)
## default plot
plot(tune_result)
## likelihood
plot(tune_result, value = "likelihood")
## parameters overriding
plot(tune_result,
  value = "likelihood",
  xlab = "Cut off", type = "b"
)
pc <- powerconsumption[powerconsumption$week %in% 10:12, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
dts_best_model_tune <- tune_covlmc(dts, dts_cov, criterion = "AIC")
plot(dts_best_model_tune)
plot(dts_best_model_tune, value = "likelihood")

Report the positions of a sequence associated to a node

Description

This function returns the positions of the sequence represented by node in the time series used to build the context tree in which the sequence is represented. This is only possible if those positions were saved during the construction of the context tree. If positions were not saved, a call to this function produces an error.

Usage

positions(node)

## S3 method for class 'ctx_node'
positions(node)

## S3 method for class 'ctx_node_cpp'
positions(node)

Arguments

node

a ctx_node object as returned by find_sequence()

Details

A position of a sequence ctx in the time series x is an index value t such that the sequence ends with x[t]. Thus x[t+1] is after the context. For instance if x=c(0, 0, 1, 1) and ctx=c(0, 1) (in standard state order), then the position of ctx in x is 3.
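The definition can be checked with a brute force search in base R. The helper find_positions() is hypothetical and written for this illustration only (it is not how the package stores or retrieves positions); it expects ctx in temporal order and of length at least 1.

```r
# Hypothetical brute force search: return every index t such that the
# values x[(t - k + 1):t] are exactly ctx, where k = length(ctx).
find_positions <- function(x, ctx) {
  k <- length(ctx)
  stopifnot(k >= 1)
  if (k > length(x)) {
    return(integer(0))
  }
  ends <- k:length(x)
  keep <- vapply(
    ends,
    function(t) all(x[(t - k + 1):t] == ctx),
    logical(1)
  )
  ends[keep]
}

# Example from the text: ctx ends at x[3].
find_positions(c(0, 0, 1, 1), c(0, 1))
# A sequence that appears several times.
find_positions(c(0, 1, 0, 1, 1), c(0, 1))
```

In the first call the result is 3, as stated above; in the second call the sequence ends at positions 2 and 4.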

Value

positions of the sequence represented by node in the original time series as an integer vector

Examples

dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
dts_tree <- ctx_tree(dts, max_depth = 3, min_size = 5)
subseq <- find_sequence(dts_tree, factor(c("B", "A"), levels = c("A", "B", "C")))
if (!is.null(subseq)) {
  positions(subseq)
}

Individual household electric power consumption

Description

A data set containing measurements of the electric power consumption of one household with a time resolution of 10 minutes for the full year of 2008.

Usage

powerconsumption

Format

A data frame with 52704 rows and 15 variables:

month: month of 2008
month_day: day of the month
hour: hour (0 to 23)
minute: starting minute of the 10 minutes period of this row
active_power: global average active power on the 10 minute period (in kilowatt)
reactive_power: global average reactive power on the 10 minute period (in kilowatt)
voltage: average voltage on the 10 minute period (in volt)
intensity: global average current intensity on the 10 minute period (in ampere)
sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy averaged over the 10 minute period). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered)
sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy averaged over the 10 minute period). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy averaged over the 10 minute period). It corresponds to an electric water-heater and an air-conditioner.
week: week number
week_day: day of the week from 1 = Sunday to 7 = Saturday
year_day: day of the year from 1 to 366 (2008 is a leap year)
date_time: Date and time in POSIXct format

Details

This is a simplified version of the full data available on the UCI Machine Learning Repository under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, and provided by Georges Hebrail and Alice Berard.

The original data have been averaged over a 10 minute time period (discarding missing data in each period). The data set contains only the measurements from year 2008.

Notice that the different variables are expressed in the adapted units. In particular, the sub-meters are measuring active energy (in watt-hour) while the global active power is expressed in kilowatt.

Source

Individual household electric power consumption, 2012, G. Hebrail and A. Berard, UC Irvine Machine Learning Repository. doi:10.24432/C58K54

Next state prediction in a discrete time series for a VLMC with covariates

Description

This function computes one step ahead predictions for a discrete time series based on a VLMC with covariates.

Usage

## S3 method for class 'covlmc'
predict(
  object,
  newdata,
  newcov,
  type = c("raw", "probs"),
  final_pred = TRUE,
  ...
)

Arguments

object

a fitted covlmc object.

newdata

a time series adapted to the covlmc object.

newcov

a data frame with the new values for the covariates.

type

character indicating the type of prediction required. The default "raw" returns actual predictions in the form of a new time series. The alternative "probs" returns a matrix of prediction probabilities (see details).

final_pred

if TRUE (default value), the predictions include a final prediction step, made by computing the context of the full time series. When FALSE this final prediction is not included.

...

additional arguments.

Details

Given a time series X, at time step t, a context is computed using observations from X[1] to X[t-1] (see the dedicated section). The prediction is then the most probable state for X[t] given the logistic model of this context and the corresponding values of the covariates. The time series of predictions is returned by the function when type="raw" (default case).

When type="probs", the function returns the probabilities of each state for X[t] as estimated by the logistic models. Those probabilities are returned as a matrix of probabilities with column names given by the state names.

Value

A vector of predictions if type="raw" or a matrix of state probabilities if type="probs".

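Extended contexts

As explained in detail in the loglikelihood.covlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first values of a time series do not in general have a proper context for a COVLMC with a non zero order. In order to predict something meaningful for those values, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.covlmc() with the parameter initial="extended". All covlmc functions that need to manipulate initial values with no proper context use the same approach.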

Examples

pc <- powerconsumption[powerconsumption$week == 10, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.2, 0.7, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5, alpha = 0.5)
dts_probs <- predict(m_cov, dts[1:144], dts_cov[1:144, , drop = FALSE], type = "probs")
dts_preds <- predict(m_cov, dts[1:144], dts_cov[1:144, , drop = FALSE],
  type = "raw", final_pred = FALSE
)

Next state prediction in a discrete time series for a VLMC

Description

This function computes one step ahead predictions for a discrete time series based on a VLMC.

Usage

## S3 method for class 'vlmc'
predict(object, newdata, type = c("raw", "probs"), final_pred = TRUE, ...)

## S3 method for class 'vlmc_cpp'
predict(object, newdata, type = c("raw", "probs"), final_pred = TRUE, ...)

Arguments

object

a fitted vlmc object.

newdata

a time series adapted to the vlmc object.

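type

character indicating the type of prediction required. The default "raw" returns actual predictions in the form of a new time series. The alternative "probs" returns a matrix of prediction probabilities (see details).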

final_pred

if TRUE (default value), the predictions include a final prediction step, made by computing the context of the full time series. When FALSE this final prediction is not included.

...

additional arguments.

Details

Given a time series X, at time step t, a context is computed using observations from X[1] to X[t-1] (see the dedicated section). The prediction is then the most probable state for X[t] given this context. Ties are broken according to the natural order in the state space, favouring "small" values. The time series of predictions is returned by the function when type="raw" (default case).

When type="probs", each X[t] is associated to the conditional probabilities of the next state given the context. Those probabilities are returned as a matrix of probabilities with column names given by the state names.
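The decision rule and the role of final_pred can be sketched with a hypothetical first order Markov chain in base R. The transition matrix trans, the distribution marginal used when no past value is available, and the function predict_order1() are all assumptions made for this illustration; the sketch also assumes that the final prediction step adds one value after the end of the series. It is not the implementation of predict.vlmc().

```r
# Hypothetical two state model (assumption): rows are the previous state.
states <- c("A", "B")
trans <- matrix(
  c(
    0.9, 0.1,
    0.3, 0.7
  ),
  nrow = 2, byrow = TRUE,
  dimnames = list(states, states)
)
# Distribution used for X[1], for which no past value exists (assumption).
marginal <- c(A = 0.5, B = 0.5)

predict_order1 <- function(x, final_pred = TRUE) {
  n <- length(x)
  # the optional final step predicts the value following x[n]
  steps <- if (final_pred) n + 1 else n
  preds <- character(steps)
  for (t in seq_len(steps)) {
    p <- if (t == 1) marginal else trans[x[t - 1], ]
    # which.max() returns the first maximum: ties favour "small" states
    preds[t] <- states[which.max(p)]
  }
  preds
}

x <- c("A", "B", "B", "A")
predict_order1(x)
predict_order1(x, final_pred = FALSE)
```

The first prediction comes from a tie between the two states and is therefore "A", the first state in the natural order.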

Value

A vector of predictions if type="raw" or a matrix of state probabilities if type="probs".

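Extended contexts

As explained in detail in the loglikelihood.vlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first values of a time series do not in general have a proper context for a VLMC with a non zero order. In order to predict something meaningful for those values, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.vlmc() with the parameter initial="extended". All vlmc functions that need to manipulate initial values with no proper context use the same approach.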

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts, min_size = 5)
predict(model, dts[1:5])
predict(model, dts[1:5], "probs")
## C++ backend
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts, min_size = 5, backend = "C++")
predict(model, dts[1:5])
predict(model, dts[1:5], "probs")

Print a context list

Description

This function prints a list of contexts, i.e. a contexts object listing ctx_node objects.

Usage

## S3 method for class 'contexts'
print(x, reverse = TRUE, ...)

Arguments

x

the contexts object to print

reverse

specifies whether the contexts should be reported in temporal order (FALSE, default value) or in reverse temporal order (TRUE). If the parameter is not specified, the contexts are displayed in the order specified by the call to contexts() used to build the context list.

...

additional arguments for the print function.

Value

the x object, invisibly

Examples

dts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
dts_tree <- ctx_tree(dts, max_depth = 3)
print(contexts(dts_tree))

Prune a Variable Length Markov Chain (VLMC)

Description

This function prunes a VLMC.

Usage

prune(vlmc, alpha = 0.05, cutoff = NULL, ...)

## S3 method for class 'vlmc'
prune(vlmc, alpha = 0.05, cutoff = NULL, ...)

## S3 method for class 'vlmc_cpp'
prune(vlmc, alpha = 0.05, cutoff = NULL, ...)

Arguments

vlmc

a fitted VLMC model.

alpha

number in (0,1] (default: 0.05): cut off value in quantile scale for pruning.

cutoff

positive number: cut off value in native (log likelihood ratio) scale for pruning. Defaults to the value obtained from alpha. Takes precedence over alpha if specified.

...

additional arguments for the prune function.

Details

In general, pruning a VLMC is more efficient than constructing two VLMC (the base one and the pruned one). Up to numerical instabilities, building a VLMC with a cut off value a and then pruning it with a cut off value b (with a>b) should produce the same VLMC as building directly the VLMC with the cut off value b. Interesting cut off values can be extracted from a VLMC using the cutoff() function.

As automated model selection is provided by tune_vlmc(), the direct use of cutoff should be reserved to advanced exploration of the set of trees that can be obtained from a complex one, e.g. to implement model selection techniques that are not provided by tune_vlmc().

Value

a pruned VLMC

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
base_model <- vlmc(dts, alpha = 0.1)
model_cuts <- cutoff(base_model)
pruned_model <- prune(base_model, model_cuts[3])
draw(pruned_model)
direct_simple <- vlmc(dts, alpha = model_cuts[3])
draw(direct_simple)
# pruned_model and direct_simple should be identical
all.equal(pruned_model, direct_simple)

Prune a Variable Length Markov Chain with covariates

Description

This function prunes a vlmc with covariates. This model must have been estimated with keep_data=TRUE to enable the pruning.

Usage

## S3 method for class 'covlmc'
prune(vlmc, alpha = 0.05, cutoff = NULL, ...)

Arguments

vlmc

a fitted VLMC model with covariates.

alpha

number in (0,1) (default: 0.05): cut off value in quantile scale for pruning.

cutoff

not supported by the vlmc with covariates.

...

additional arguments for the prune function.

Details

Post pruning a VLMC with covariates is not as straightforward as the same procedure applied to vlmc() (see cutoff.vlmc() and prune.vlmc()). For efficiency reasons, covlmc() estimates only the logistic models that are considered useful for a given set of construction parameters. With a more aggressive pruning threshold, some contexts become leaves of the context tree and new logistic models must be estimated. Thus the pruning opportunities given by cutoff.covlmc() are only a subset of interesting cut offs for a given covlmc.

Nevertheless, covlmc() shares with vlmc() the principle that post pruning a covlmc should give the same model as building directly the covlmc, provided that the post pruning alpha is smaller than the alpha used to build the initial model.

Value

a pruned covlmc.

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5, keep_data = TRUE)
draw(m_cov)
m_cov_cuts <- cutoff(m_cov)
p_cov <- prune(m_cov, m_cov_cuts[1])
draw(p_cov)

Reverse Sequence

Description

This function reverses the order in which the sequence represented by the ctx_node parameter will be reported in other functions, mainly as_sequence().

Usage

## S3 method for class 'ctx_node'
rev(x)

Arguments

x

a ctx_node object as returned by find_sequence()

Value

a ctx_node using the opposite ordering convention as the parameter of the function

Examples

dts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
dts_tree <- ctx_tree(dts, max_depth = 3)
res <- find_sequence(dts_tree, c("A", "B"))
print(res)
r_res <- rev(res)
print(r_res)
as_sequence(r_res)

Simulate a discrete time series for a covlmc

Description

This function simulates a time series from the distribution estimated by the given covlmc object.

Usage

## S3 method for class 'covlmc'
simulate(object, nsim = 1, seed = NULL, covariate, init = NULL, ...)

Arguments

object

a fitted covlmc object.

nsim

length of the simulated time series (defaults to 1).

seed

an optional random seed (see the dedicated section).

covariate

values of the covariates.

init

an optional initial sequence for the time series.

...

additional arguments.

Details

A VLMC with covariates model needs covariates to compute its transition probabilities. The covariates must be submitted as a data frame using the covariate argument. In addition, the time series can be initiated by a fixed sequence specified via the init parameter.
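To make the role of the covariates concrete, here is a base R sketch of a simulation in which the transition probabilities depend both on the previous state and on a covariate through a logistic model. The function simulate_logistic_chain(), the coefficients in coef_by_prev and the two state space are hypothetical and chosen for the illustration; a real covlmc uses variable length contexts and fitted models, which this sketch does not reproduce.

```r
# Hypothetical first order chain on the states "low" and "high".
# For each previous state, a logistic model (intercept, slope for the
# day_night covariate) gives the probability of moving to "high".
simulate_logistic_chain <- function(covariate, coef_by_prev, init = "low") {
  n <- nrow(covariate)
  out <- character(n)
  prev <- init
  for (t in seq_len(n)) {
    beta <- coef_by_prev[[prev]]
    p_high <- plogis(beta[1] + beta[2] * covariate$day_night[t])
    out[t] <- if (runif(1) < p_high) "high" else "low"
    prev <- out[t]
  }
  factor(out, levels = c("low", "high"))
}

# Assumed coefficients, one model per previous state.
coef_by_prev <- list(low = c(-1, 2), high = c(0.5, 1))
# One covariate value is needed for each simulated time step.
new_cov <- data.frame(day_night = rep(c(FALSE, TRUE), each = 6))

set.seed(0)
simulate_logistic_chain(new_cov, coef_by_prev, init = "low")
```

As in the sketch, the number of values that can be simulated is bounded by the number of rows available in the covariate data frame.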

Value

a simulated discrete time series of the same type as the one used to build the covlmc with a seed attribute (see the Random seed section). The result also has the dts class to hide the seed attribute when using print or similar functions.

Extended contexts

As explained in detail in the loglikelihood.covlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first values of a time series do not in general have a proper context for a COVLMC with a non zero order. In order to simulate something meaningful for those values, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.covlmc() with the parameter initial="extended". All covlmc functions that need to manipulate initial values with no proper context use the same approach.

Random seed

This function reproduces the behaviour of stats::simulate(). If seed is NULL, the function does not change the random generator state and returns the value of .Random.seed as a seed attribute in the return value. This can be used to reproduce exactly the simulation results by setting .Random.seed to this value. Notice that if the random seed has not been initialised by R so far, the function issues a call to runif(1) to perform this initialisation (as is done in stats::simulate()).

If seed is an integer, it is used in a call to set.seed() before the simulation takes place. The integer is saved as a seed attribute in the return value. The integer seed is completed by an attribute kind which contains the value as.list(RNGkind()), exactly as with stats::simulate(). The random generator state is reset to its original value at the end of the call.
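
The seed protocol described above can be sketched in base R, without mixvlmc. The helper draw_with_seed() below is hypothetical (it is not part of the package); it only mirrors the logic used by stats::simulate() methods, applied to a plain call to sample() instead of a fitted model.

```r
# Hypothetical helper (not part of mixvlmc): draws n symbols while
# following the same seed protocol as stats::simulate() methods.
draw_with_seed <- function(n, seed = NULL) {
  # make sure the random generator has been initialised
  if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
    runif(1)
  }
  if (is.null(seed)) {
    # keep the current state so that the draw can be replayed later
    rng_state <- get(".Random.seed", envir = .GlobalEnv)
  } else {
    # remember the current state, seed the generator, restore on exit
    old_state <- get(".Random.seed", envir = .GlobalEnv)
    set.seed(seed)
    rng_state <- structure(seed, kind = as.list(RNGkind()))
    on.exit(assign(".Random.seed", old_state, envir = .GlobalEnv))
  }
  x <- sample(c("A", "B", "C"), n, replace = TRUE)
  attr(x, "seed") <- rng_state
  x
}

# same integer seed, same result
a <- draw_with_seed(10, seed = 42)
b <- draw_with_seed(10, seed = 42)
identical(a, b)

# with seed = NULL, the saved state can be used to replay the draw
x <- draw_with_seed(5)
assign(".Random.seed", attr(x, "seed"), envir = .GlobalEnv)
y <- sample(c("A", "B", "C"), 5, replace = TRUE)
identical(as.vector(x), y)
```

With an integer seed the caller's random generator state is left untouched, which is why repeated calls with the same seed can be mixed freely with other random draws.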

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
# new week with day light from 6:00 to 18:00
new_cov <- data.frame(day_night = rep(c(rep(FALSE, 59), rep(TRUE, 121), rep(FALSE, 60)), times = 7))
new_dts <- simulate(m_cov, nrow(new_cov), seed = 0, covariate = new_cov)
new_dts_2 <- simulate(m_cov, nrow(new_cov), seed = 0, covariate = new_cov, init = dts[1:10])

Simulate a discrete time series for a vlmc

Description

This function simulates a time series from the distribution estimated by the given vlmc object.

Usage

## S3 method for class 'vlmc'
simulate(object, nsim = 1L, seed = NULL, init = NULL, burnin = 0L, ...)

Arguments

object

a fitted vlmc object.

nsim

length of the simulated time series (defaults to 1).

seed

an optional random seed (see the dedicated section).

init

an optional initial sequence for the time series.

burnin

number of initial observations to discard or "auto" (see the dedicated section).

...

additional arguments.

Details

The time series can be initiated by a fixed sequence specified via the init parameter.

Value

a simulated discrete time series of the same type as the one used to build the vlmc, with a seed attribute (see the Random seed section). The result also has the dts class to hide the seed attribute when using print or similar functions.

Burn in (Warm up) period

When using a VLMC for simulation purposes, we are generally interested in the stationary distribution of the corresponding Markov chain. To reduce the dependence of the samples on the initial values and get closer to this stationary distribution (if it exists), it is recommended to discard the first samples, which are produced in a so-called "burn in" (or "warm up") period. The burnin parameter can be used to implement this approach. The VLMC is used to produce a sample of size burnin + nsim but the first burnin values are discarded. Notice that these burn in values can be partially given by the init parameter if it is specified.

If burnin is set to "auto", the burnin period is set to 64 * context_number(object), following the heuristic proposed in Mächler and Bühlmann (2004).
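
As an illustration of the burn in options, the sketch below fits a VLMC to a toy binary chain and simulates from it with burnin = "auto" and with the same value computed by hand. The toy data and the variable names are assumptions made for this example only; the sketch assumes that mixvlmc is installed.

```r
library(mixvlmc)

# toy first order binary chain (illustrative data, not from the package)
set.seed(1)
n <- 1000
dts <- integer(n)
for (i in 2:n) {
  p_one <- if (dts[i - 1] == 1L) 0.8 else 0.3
  dts[i] <- rbinom(1, 1, p_one)
}
model <- vlmc(dts)

# burn in length given by the heuristic described above
auto_burnin <- as.integer(64 * context_number(model))

# let simulate() pick the burn in period...
sim_auto <- simulate(model, nsim = 200, seed = 0, burnin = "auto")
# ...or pass the value explicitly
sim_manual <- simulate(model, nsim = 200, seed = 0, burnin = auto_burnin)

# in both cases only nsim values are returned
length(sim_auto)
length(sim_manual)
```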

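Random seed

The seed parameter and the seed attribute of the result follow the behaviour of stats::simulate(), as described in the Random seed section of simulate.covlmc().

Extended contexts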

As explained in detail in the loglikelihood.vlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first values of a time series do not in general have a proper context for a VLMC with a non zero order. In order to simulate something meaningful for those values when init is not provided, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.vlmc() with the parameter initial="extended". All vlmc functions that need to manipulate initial values with no proper context use the same approach.

References

Mächler, M. and Bühlmann, P. (2004) "Variable Length Markov Chains: Methodology, Computing, and Software" Journal of Computational and Graphical Statistics, 13 (2), 435-455, doi:10.1198/1061860043524

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts, min_size = 5)
new_dts <- simulate(model, 500, seed = 0)
new_dts_2 <- simulate(model, 500, seed = 0, init = dts[1:5])
new_dts_3 <- simulate(model, 500, seed = 0, burnin = 500)

Simulate a discrete time series for a vlmc

Description

This function simulates a time series from the distribution estimated by the given vlmc object.

Usage

## S3 method for class 'vlmc_cpp'
simulate(
  object,
  nsim = 1,
  seed = NULL,
  init = NULL,
  burnin = 0L,
  sample = c("fast", "slow", "R"),
  ...
)

Arguments

object

a fitted vlmc object.

nsim

length of the simulated time series (defaults to 1).

seed

an optional random seed (see the dedicated section).

init

an optional initial sequence for the time series.

burnin

number of initial observations to discard or "auto" (see the dedicated section).

sample

specifies which implementation of base::sample() to use. See the dedicated section.

...

additional arguments.

Details

The time series can be initiated by a fixed sequence specified via the init parameter.

Value

Sampling method

The R backend for vlmc() uses base::sample() to generate samples for each context. Internally, this function sorts the probabilities of each state in decreasing order (among other things), which is not needed in our case. The C++ backend can be used with three different implementations:

sample="fast" uses a dedicated C++ implementation adapted to the data structures used internally. In general, the simulated time series obtained with this implementation will be different from the one generated with the R backend, even when using the same seed.
sample="slow" uses another C++ implementation that mimics base::sample() in order to maximize the chance of providing identical simulation results regardless of the backend (when using the same random seed). This process is not perfect, as we use the C++ standard library sort algorithm, which is not guaranteed to give results identical to those of R's internal 'revsort'.
sample="R" uses direct calls to base::sample(). Results are guaranteed to be identical between the two backends, but at the price of a higher running time.
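
The following sketch compares the three options on a small random series. The data and the variable names are assumptions made for this example; it assumes that mixvlmc is installed with its C++ backend available. Only the raw values are compared, since the results also carry a seed attribute.

```r
library(mixvlmc)

# illustrative data, not from the package
set.seed(1)
dts <- sample(c("A", "B", "C"), 500, replace = TRUE)

# same model fitted with the two backends
model_r <- vlmc(dts)
model_cpp <- vlmc(dts, backend = "C++")

# reference simulation with the R backend
sim_r <- simulate(model_r, 100, seed = 0)

# C++ backend with the three sampling implementations
sim_cpp_r <- simulate(model_cpp, 100, seed = 0, sample = "R")
sim_cpp_slow <- simulate(model_cpp, 100, seed = 0, sample = "slow")
sim_cpp_fast <- simulate(model_cpp, 100, seed = 0, sample = "fast")

# as.character() drops the attributes and keeps the simulated values
same_with_r <- identical(as.character(sim_r), as.character(sim_cpp_r))
same_with_slow <- identical(as.character(sim_r), as.character(sim_cpp_slow))
same_with_fast <- identical(as.character(sim_r), as.character(sim_cpp_fast))
c(R = same_with_r, slow = same_with_slow, fast = same_with_fast)
```

According to the description above, only sample="R" is documented as matching the R backend; the other two comparisons may or may not agree on a given data set.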

Burn in (Warm up) period

If burnin is set to "auto", the burnin period is set to 64 * context_number(object), following the heuristic proposed in Mächler and Bühlmann (2004).

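Random seed

The seed parameter and the seed attribute of the result follow the behaviour of stats::simulate(), as described in the Random seed section of simulate.covlmc().

Extended contexts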

As explained in detail in the loglikelihood.vlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first values of a time series do not in general have a proper context for a VLMC with a non zero order. In order to simulate something meaningful for those values when init is not provided, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.vlmc() with the parameter initial="extended". All vlmc functions that need to manipulate initial values with no proper context use the same approach.

References

Mächler, M. and Bühlmann, P. (2004) "Variable Length Markov Chains: Methodology, Computing, and Software" Journal of Computational and Graphical Statistics, 13 (2), 435-455, doi:10.1198/1061860043524

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts, min_size = 5)
new_dts <- simulate(model, 500, seed = 0)
new_dts_2 <- simulate(model, 500, seed = 0, init = dts[1:5])
new_dts_3 <- simulate(model, 500, seed = 0, burnin = 500)

State space of a context tree

Description

This function returns the state space of a context tree.

Usage

states(ct)

Arguments

ct

a context tree.

Value

the state space of the context tree.

Examples

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 2)
## should be c(0, 1)
states(dts_ctree)

Trim a context tree

Description

This function returns a trimmed context tree from which match positions have been removed.

Usage

trim(ct, ...)

Arguments

ct

a context tree.

...

additional arguments for the trim function.

Value

a trimmed context tree.

Examples

## context tree trimming
dts <- sample(as.factor(c("A", "B", "C")), 1000, replace = TRUE)
dts_tree <- ctx_tree(dts, max_depth = 10, min_size = 5, keep_position = TRUE)
print(object.size(dts_tree))
dts_tree <- trim(dts_tree)
print(object.size(dts_tree))

Trim a COVLMC

Description

This function returns a trimmed COVLMC from which cached data have been removed.

Usage

## S3 method for class 'covlmc'
trim(ct, keep_model = FALSE, ...)

Arguments

ct

a context tree.

keep_model

specifies whether to keep the internal models (or not)

...

additional arguments for the trim function.

Details

Called with keep_model set to FALSE (default case), the trimming is maximal and reduces further usability of the model. In particular, loglikelihood.covlmc() cannot be used for new data, contexts.covlmc() does not support model extraction, and simulate.covlmc(), metrics.covlmc() and prune.covlmc() cannot be used at all.

Called with keep_model set to TRUE, the trimming process is less complete. In particular, internal models are simplified using butcher::butcher() and some additional minor reductions. This saves less memory but enables the use of loglikelihood.covlmc() for new data as well as the use of simulate.covlmc().

Value

a trimmed context tree.

Examples

pc <- powerconsumption[powerconsumption$week %in% 5:7, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10, keep_data = TRUE)
print(object.size(m_cov), units = "Mb")
t_m_cov_model <- trim(m_cov, keep_model = TRUE)
print(object.size(t_m_cov_model), units = "Mb")
t_m_cov <- trim(m_cov)
print(object.size(t_m_cov), units = "Mb")

This function returns a trimmed VLMC from which match positions have been removed.

Description

This function returns a trimmed context tree from which match positions have been removed.

Usage

## S3 method for class 'vlmc'
trim(ct, ...)

Arguments

ct

a VLMC.

...

additional arguments for the trim function.

Value

a trimmed VLMC

Examples

## VLMC trimming is generally useless unless match positions were kept
pc <- powerconsumption[powerconsumption$week %in% 5:6, ]
dts <- cut(pc$active_power, breaks = 4)
model <- vlmc(dts, keep_match = TRUE)
print(object.size(model))
model <- trim(model)
## memory use should be reduced
print(object.size(model))
nm_model <- vlmc(dts)
print(object.size(nm_model))
nm_model <- trim(nm_model)
## no effect when match positions are not kept
print(object.size(nm_model))

This function returns a trimmed VLMC from which match positions have been removed.

Description

This function returns a trimmed context tree from which match positions have been removed.

Usage

## S3 method for class 'vlmc_cpp'
trim(ct, ...)

Arguments

ct

a VLMC.

...

additional arguments for the trim function.

Details

Trimming in the C++ backend is done directly in the Rcpp managed memory and cannot be detected at R level using e.g. utils::object.size().

Value

a trimmed VLMC

Examples

## VLMC trimming is generally useless unless match positions were kept
pc <- powerconsumption[powerconsumption$week %in% 5:6, ]
dts <- cut(pc$active_power, breaks = 4)
model <- vlmc(dts, backend = "C++", keep_match = TRUE)
model <- trim(model)

Fit an optimal Variable Length Markov Chain with Covariates (coVLMC)

Description

This function fits a Variable Length Markov Chain with Covariates (coVLMC) to a discrete time series coupled with a time series of covariates by optimizing an information criterion (BIC or AIC).

Usage

tune_covlmc(
  x,
  covariate,
  criterion = c("BIC", "AIC"),
  initial = c("truncated", "specific", "extended"),
  alpha_init = NULL,
  min_size = 5,
  max_depth = 100,
  verbose = 0,
  save = c("best", "initial", "all"),
  trimming = c("full", "partial", "none"),
  best_trimming = c("none", "partial", "full")
)

Arguments

x

a discrete time series; can be numeric, character, factor or logical.

covariate

a data frame of covariates.

criterion

criterion used to select the best model. Either "BIC" (default) or "AIC" (see details).

initial

specifies the likelihood function, more precisely the way the first few observations for which contexts cannot be calculated are integrated in the likelihood. See loglikelihood() for details.

alpha_init

if non NULL, used as the initial cut off parameter (in quantile scale) to build the initial VLMC

min_size

integer >= 1 (default: 5). Tune the minimum number of observations for a context in the growing phase of the context tree (see covlmc() for details).

max_depth

integer >= 1 (default: 100). Longest context considered in the growing phase of the initial context tree (see details).

verbose

integer >= 0 (default: 0). Verbosity level of the pruning process.

save

specify which models are saved during the pruning process. The default value "best" asks the function to keep only the best model according to the criterion. When save="initial", the function keeps in addition the initial (complex) model which is then pruned during the selection process. When save="all", the function returns all the models considered during the selection process. See details for memory occupation.

trimming

specify the type of trimming used when saving the intermediate models, see details.

best_trimming

specify the type of trimming used when saving the best model and the initial one (see details).

Details

This function automates the process of fitting a large coVLMC to a discrete time series with covlmc() and of pruning the tree (with cutoff() and prune()) to get an optimal model with respect to an information criterion. To avoid missing long term dependencies, the function uses the max_depth parameter as an initial guess but then relies on an automatic increase of the value to make sure the initial context tree is only limited by the min_size parameter. The initial value of the alpha parameter of covlmc() is also set to a conservative value (0.5) to avoid prior simplification of the context tree. This can be overridden by setting the alpha_init parameter to a more adapted value.

Once the initial coVLMC is obtained, the cutoff() and prune() functions are used to build all the coVLMC models that could be generated using smaller values of the alpha parameter. The best model is selected from this collection, including the initial complex tree, as the one that minimizes the chosen information criterion.

Value

a list with the following components:

best_model: the optimal COVLMC
criterion: the criterion used to select the optimal COVLMC
initial: the likelihood function used to select the optimal COVLMC
results: a data frame with details about the pruning process
saved_models: a list of intermediate COVLMCs if save="initial" or save="all". It contains an initial component with the large coVLMC obtained first and an all component with a list of all the other coVLMCs obtained by pruning the initial one.

Memory occupation

covlmc objects tend to be large and saving all the models during the search for the optimal model can lead to an unreasonable use of memory. To avoid this problem, intermediate models are kept only in trimmed form, using trim.covlmc() with keep_model=FALSE, while both the initial model and the best one are saved untrimmed. This default behaviour corresponds to trimming="full". Setting trimming="partial" asks the function to use keep_model=TRUE in trim.covlmc() for intermediate models. Finally, trimming="none" turns off trimming, which is discouraged except for small data sets.

In parallel processing contexts (e.g. using foreach::%dopar%), the memory occupation of the results can become very large as models tend to keep environments attached to the formulas. In this situation, it is highly recommended to trim all saved models, including the best one and the initial one. This can be done via the best_trimming parameter whose possible values are identical to the ones of trimming.
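
As an illustration of these options, the sketch below runs the selection while saving every intermediate model in fully trimmed form and the best and initial models in partially trimmed form. It reuses the powerconsumption data set from the package examples; the name light_tune is an assumption made for this example, and the actual memory savings depend on the data.

```r
library(mixvlmc)

pc <- powerconsumption[powerconsumption$week %in% 6:7, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))

# keep all the models, but in a memory friendly form:
# - intermediate models are trimmed with keep_model = FALSE
# - the best and initial models are trimmed with keep_model = TRUE
light_tune <- tune_covlmc(dts, dts_cov,
  save = "all",
  trimming = "full",
  best_trimming = "partial"
)
print(object.size(light_tune), units = "Mb")
```

With best_trimming="partial", the best model should remain usable with loglikelihood.covlmc() on new data and with simulate.covlmc(), as described in the trim.covlmc() documentation.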

Examples

pc <- powerconsumption[powerconsumption$week %in% 6:7, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
dts_best_model_tune <- tune_covlmc(dts, dts_cov)
draw(as_covlmc(dts_best_model_tune))

Fit an optimal Variable Length Markov Chain (VLMC)

Description

This function fits a Variable Length Markov Chain (VLMC) to a discrete time series by optimizing an information criterion (BIC or AIC).

Usage

tune_vlmc(
  x,
  criterion = c("BIC", "AIC"),
  initial = c("truncated", "specific", "extended"),
  alpha_init = NULL,
  cutoff_init = NULL,
  min_size = 2L,
  max_depth = 100L,
  backend = getOption("mixvlmc.backend", "R"),
  verbose = 0,
  save = c("best", "initial", "all")
)

Arguments

x

a discrete time series; can be numeric, character, factor or logical.

criterion

criterion used to select the best model. Either "BIC" (default) or "AIC" (see details).

initial

specifies the likelihood function, more precisely the way the first few observations for which contexts cannot be calculated are integrated in the likelihood. Defaults to "truncated". See loglikelihood() for details.

alpha_init

if non NULL, used as the initial cut off parameter (in quantile scale) to build the initial VLMC

cutoff_init

if non NULL, used as the initial cut off parameter to build the initial VLMC. Takes precedence over alpha_init if specified.

min_size

integer >= 1 (default: 2). Minimum number of observations for a context in the growing phase of the initial context tree.

max_depth

integer >= 1 (default: 100). Longest context considered in the growing phase of the initial context tree (see details).

backend

backend "R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See vlmc() for details.

verbose

integer >= 0 (default: 0). Verbosity level of the pruning process.

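save

specify which models are saved during the pruning process. The default value "best" asks the function to keep only the best model according to the criterion. When save="initial", the function keeps in addition the initial (complex) model which is then pruned during the selection process. When save="all", the function returns all the models considered during the selection process.

Details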

This function automates the process of fitting a large VLMC to a discrete time series with vlmc() and of pruning the tree (with cutoff() and prune()) to get an optimal model with respect to an information criterion. To avoid missing long term dependencies, the function uses the max_depth parameter as an initial guess but then relies on an automatic increase of the value to make sure the initial context tree is only limited by the min_size parameter. The initial value of the cutoff parameter of vlmc() is also set to conservative values (depending on the criterion) to avoid prior simplification of the context tree. This default value can be overridden using the cutoff_init or alpha_init parameter.

Once the initial VLMC is obtained, the cutoff() and prune() functions are used to build all the VLMC models that could be generated using larger values of the initial cut off parameter. The best model is selected from this collection, including the initial complex tree, as the one that minimizes the chosen information criterion.

Value

a list with the following components:

best_model: the optimal VLMC
criterion: the criterion used to select the optimal VLMC
initial: the likelihood function used to select the optimal VLMC
results: a data frame with details about the pruning process
saved_models: a list of intermediate VLMCs if save="initial" or save="all". It contains an initial component with the large VLMC obtained first and an all component with a list of all the other VLMCs obtained by pruning the initial one.
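
The components listed above can be inspected as in the following sketch. The toy first order chain and the name tuned are assumptions made for this example, which assumes that mixvlmc is installed; the columns of the results data frame are not described here and are simply printed.

```r
library(mixvlmc)

# toy first order binary chain (illustrative data, not from the package)
set.seed(1)
n <- 1000
dts <- integer(n)
for (i in 2:n) {
  p_one <- if (dts[i - 1] == 1L) 0.8 else 0.3
  dts[i] <- rbinom(1, 1, p_one)
}

# select the model with AIC and keep every model met during the search
tuned <- tune_vlmc(dts, criterion = "AIC", save = "all")

# criterion and likelihood function used for the selection
tuned$criterion
tuned$initial

# details about the pruning process
head(tuned$results)

# the best model is a regular VLMC
context_number(tuned$best_model)

# large initial model and models obtained by pruning it
context_number(tuned$saved_models$initial)
length(tuned$saved_models$all)
```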

Examples

dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
tune_result <- tune_vlmc(dts)
draw(tune_result$best_model)

Fit a Variable Length Markov Chain (VLMC)

Description

This function fits a Variable Length Markov Chain (VLMC) to a discrete time series.

Usage

vlmc(
  x,
  alpha = 0.05,
  cutoff = NULL,
  min_size = 2L,
  max_depth = 100L,
  prune = TRUE,
  keep_match = FALSE,
  backend = getOption("mixvlmc.backend", "R")
)

Arguments

x

a discrete time series; can be numeric, character, factor or logical.

alpha

number in (0,1] (default: 0.05): cut off value in quantile scale in the pruning phase.

cutoff

non negative number: cut off value in native (likelihood ratio) scale in the pruning phase. Defaults to the value obtained from alpha. Takes precedence over alpha if specified.

min_size

integer >= 1 (default: 2). Minimum number of observations for a context in the growing phase of the context tree.

max_depth

integer >= 1 (default: 100). Longest context considered in the growing phase of the context tree.

prune

logical: specify whether the context tree should be pruned (default behaviour).

keep_match

logical: specify whether to keep the context matches (defaults to FALSE)

backend

"R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See details.

Details

The VLMC is built using Bühlmann and Wyner's algorithm, which consists in fitting a context tree (see ctx_tree()) to a time series and then pruning it in such a way that the conditional distribution of the next state of the time series given the context is significantly different from the distribution given a truncated version of the context.

The construction of the context tree is controlled by min_size and max_depth, exactly as in ctx_tree(). Significance is measured using a likelihood ratio test: the threshold can be specified in terms of the ratio itself with cutoff, or in quantile scale with alpha.

Pruning can be postponed by setting prune=FALSE. Using a combination of cutoff() and prune(), the complexity of the VLMC can then be adjusted. Any VLMC model can be pruned after construction; prune=FALSE is a convenience parameter to avoid setting alpha=1 (which essentially prevents any pruning). Automated model selection is provided by tune_vlmc().
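
A minimal sketch of the effect of the prune parameter is given below. The toy data and the variable names are assumptions made for this example, which assumes that mixvlmc is installed: the same series is fitted with and without the pruning phase and the number of contexts of the two models is compared.

```r
library(mixvlmc)

# toy first order binary chain (illustrative data, not from the package)
set.seed(1)
n <- 1000
dts <- integer(n)
for (i in 2:n) {
  p_one <- if (dts[i - 1] == 1L) 0.8 else 0.3
  dts[i] <- rbinom(1, 1, p_one)
}

# context tree limited only by min_size and max_depth (no pruning)
full_model <- vlmc(dts, prune = FALSE)

# same growing phase followed by the pruning phase at the default level
pruned_model <- vlmc(dts, alpha = 0.05)

# the pruning phase can only remove contexts
context_number(full_model)
context_number(pruned_model)
```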

Value

a fitted vlmc model.

Back ends

Two back ends are available to compute context trees:

the "R" back end represents the tree in pure R data structures (nested lists) that can be easily processed further in pure R (C++ helper functions are used to speed up the construction).
the "C++" back end represents the tree with C++ classes. This back end is considered experimental. The tree is built with an optimised suffix tree algorithm which speeds up the construction by at least a factor of 10 in standard settings. As the tree is kept outside of R's direct reach, context trees built with the C++ back end must be restored after a saveRDS()/readRDS() sequence. This is done automatically by completely recomputing the context tree.
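
The serialisation behaviour of the "C++" back end can be illustrated by the following sketch. The data, the temporary file and the variable names are assumptions made for this example, which assumes that mixvlmc is installed with its C++ back end available.

```r
library(mixvlmc)

# illustrative data, not from the package
set.seed(1)
dts <- sample(c("A", "B", "C"), 500, replace = TRUE)
model_cpp <- vlmc(dts, backend = "C++")

# round trip through a temporary RDS file
rds_file <- tempfile(fileext = ".rds")
saveRDS(model_cpp, rds_file)
restored <- readRDS(rds_file)

# the context tree is recomputed when the restored model is used
context_number(model_cpp)
context_number(restored)

unlink(rds_file)
```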

References

Bühlmann, P. and Wyner, A. J. (1999), "Variable length Markov chains." Ann. Statist. 27 (2) 480-513 doi:10.1214/aos/1018031204

Examples

pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power,
  breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1)))
)
model <- vlmc(dts)
draw(model)
depth(model)
## reduce the depth of the model
shallow_model <- vlmc(dts, max_depth = 3)
draw(shallow_model, prob = FALSE)
## improve probability estimates
robust_model <- vlmc(dts, min_size = 25)
draw(robust_model, prob = FALSE) ## show the frequencies
draw(robust_model)
