| Type: | Package |
| Title: | Variable Length Markov Chains with Covariates |
| Version: | 0.2.2 |
| Description: | Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) <doi:10.1214/aos/1018031204> for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) <doi:10.1111/jtsa.12615> for VLMC with covariates. |
| License: | GPL (≥ 3) |
| URL: | https://github.com/fabrice-rossi/mixvlmc,https://fabrice-rossi.github.io/mixvlmc/ |
| BugReports: | https://github.com/fabrice-rossi/mixvlmc/issues |
| Encoding: | UTF-8 |
| LazyData: | true |
| Imports: | assertthat, butcher, ggplot2, methods, nnet, pROC, Rcpp (≥1.0.8.3), rlang, stats, stringr, VGAM, withr |
| LinkingTo: | Rcpp |
| RoxygenNote: | 7.3.2 |
| Suggests: | data.table, foreach, geodist, knitr, rmarkdown, testthat (≥3.0.0), tibble, vdiffr, waldo |
| Config/testthat/edition: | 3 |
| Config/testthat/parallel: | true |
| Config/testthat/start-first: | covlmc* |
| Depends: | R (≥ 2.10) |
| VignetteBuilder: | knitr |
| NeedsCompilation: | yes |
| Packaged: | 2025-05-26 11:53:23 UTC; fabrice |
| Author: | Fabrice Rossi |
| Maintainer: | Fabrice Rossi <Fabrice.Rossi@apiacoa.org> |
| Repository: | CRAN |
| Date/Publication: | 2025-05-26 12:30:01 UTC |
mixvlmc: Variable Length Markov Chains with Covariates
Description
Estimates Variable Length Markov Chains (VLMC) models and VLMC with covariates models from discrete sequences. Supports model selection via information criteria and simulation of new sequences from an estimated model. See Bühlmann, P. and Wyner, A. J. (1999) doi:10.1214/aos/1018031204 for VLMC and Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022) doi:10.1111/jtsa.12615 for VLMC with covariates.
Package options
Mixvlmc uses the following options():
mixvlmc.maxit: maximum number of iterations in model fitting for covlmc().
mixvlmc.predictive: specifies the computing engine used for model fitting for covlmc(). Two values are supported:
  "glm" (default value): covlmc() uses stats::glm() with a binomial link (stats::binomial()) for a two-value state space, and VGAM::vglm() with a multinomial link (VGAM::multinomial()) for a state space with three or more values;
  "multinom": covlmc() uses nnet::multinom() in all cases.
  The first option, "glm", is recommended because both stats::glm() and VGAM::vglm() are able to detect and deal with degeneracy in the data set.
mixvlmc.backend: specifies the implementation used for the context tree construction in ctx_tree(), vlmc() and tune_vlmc(). Two values are supported:
  "R" (default value): this corresponds to the original, almost pure R implementation;
  "C++": this corresponds to the experimental C++ implementation. This version is significantly faster than the R version, but is still considered experimental.
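These are ordinary R options. They can be changed for a whole session with options(), or temporarily, as the covlmc() examples do with withr::with_options(). The following is a minimal base R sketch. It only calls options() and getOption(), so it runs without loading mixvlmc.

```r
# Switch covlmc() to nnet::multinom() and the context tree construction to the
# experimental C++ back end, keeping the previous values in `old`.
old <- options(mixvlmc.predictive = "multinom", mixvlmc.backend = "C++")
engine <- getOption("mixvlmc.predictive")
backend <- getOption("mixvlmc.backend")

# Restore the previous values; options that were unset before become unset again.
options(old)

# ctx_tree() reads getOption("mixvlmc.backend", "R"), so the "R" default
# applies again once the option is unset.
restored_backend <- getOption("mixvlmc.backend", "R")
```

Saving the return value of options() and passing it back is the base R way to scope such a change. withr::with_options() packages the same pattern around a single expression.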
Author(s)
Maintainer: Fabrice Rossi <Fabrice.Rossi@apiacoa.org> (ORCID) [copyright holder]
Other contributors:
Hugo Le Picard <lepicardhugo@gmail.com> (ORCID) [contributor]
Guénolé Joubioux <guenole.joubioux@gmail.com> [contributor]
See Also
Useful links:
Report bugs at https://github.com/fabrice-rossi/mixvlmc/issues
Convert an object to a Variable Length Markov Chain with covariates (coVLMC)
Description
This generic function converts an object into a covlmc.
Usage
as_covlmc(x, ...)
## S3 method for class 'tune_covlmc'
as_covlmc(x, ...)
Arguments
x | an object to convert into a covlmc. |
... | additional arguments for conversion functions. |
Value
a covlmc
See Also
Examples
## conversion from the results of tune_covlmc
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
dts_best_model_tune <- tune_covlmc(dts, dts_cov)
dts_best_model <- as_covlmc(dts_best_model_tune)
draw(dts_best_model)
Extract the sequence encoded by a node
Description
This function returns the sequence represented by thenode object.
Usage
as_sequence(node, reverse)Arguments
node | a |
reverse | specifies whether the sequence should be reported in reversetemporal order ( |
Value
the sequence represented by thenode object, a vector
Examples
dts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
dts_tree <- ctx_tree(dts, max_depth = 3)
res <- find_sequence(dts_tree, "A")
as_sequence(res)
Convert an object to a Variable Length Markov Chain (VLMC)
Description
This generic function converts an object into a vlmc.
Usage
as_vlmc(x, ...)
## S3 method for class 'ctx_tree'
as_vlmc(x, alpha, cutoff, ...)
## S3 method for class 'tune_vlmc'
as_vlmc(x, ...)
Arguments
x | an object to convert into a vlmc. |
... | additional arguments for conversion functions. |
alpha | cut off parameter applied during the conversion, quantile scale (if specified) |
cutoff | cut off parameter applied during the conversion, native scale (if specified) |
Details
This function converts a context tree into a VLMC. If alpha or cutoff is specified, it is used to reduce the complexity of the tree as in a direct call to vlmc() (see prune()).
Value
a vlmc
See Also
Examples
## conversion from a context tree
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
draw(dts_ctree)
dts_vlmc <- as_vlmc(dts_ctree)
class(dts_vlmc)
draw(dts_vlmc)
## conversion from the result of tune_vlmc
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
tune_result <- tune_vlmc(dts)
tune_result
dts_best_vlmc <- as_vlmc(tune_result)
draw(dts_best_vlmc)
Convert an object to a Variable Length Markov Chain (VLMC)
Description
This generic function converts an object into a vlmc.
Usage
## S3 method for class 'ctx_tree_cpp'
as_vlmc(x, alpha, cutoff, ...)
Arguments
x | an object to convert into a vlmc. |
alpha | cut off parameter applied during the conversion, quantile scale (if specified) |
cutoff | cut off parameter applied during the conversion, native scale (if specified) |
... | additional arguments for conversion functions. |
Details
This function converts a context tree into a VLMC. If alpha or cutoff is specified, it is used to reduce the complexity of the tree as in a direct call to vlmc() (see prune()).
Value
a vlmc
See Also
Examples
## conversion from a context tree
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3, backend = "C++")
draw(dts_ctree)
dts_vlmc <- as_vlmc(dts_ctree)
class(dts_vlmc)
draw(dts_vlmc)
Create a complete ggplot for the results of automatic COVLMC complexity selection
Description
This function prepares a plot of the results of tune_covlmc() using ggplot2. The result can be passed to print() to display the result.
Usage
## S3 method for class 'tune_covlmc'
autoplot(object, ...)
Arguments
object | a |
... | additional parameters (not used currently) |
Details
The graphical representation proposed by this function is complete, while the one produced by plot.tune_covlmc() is minimalistic. Here we use the faceting capabilities of ggplot2 to combine, on a single graphical representation, the evolution of multiple characteristics of the VLMC during the pruning process, while plot.tune_covlmc() shows only the selection criterion or the log likelihood. Each facet of the resulting plot shows a quantity as a function of the cut off expressed in quantile or native scale.
Value
a ggplot object
Examples
pc <- powerconsumption[powerconsumption$week %in% 10:12, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
dts_best_model_tune <- tune_covlmc(dts, dts_cov, criterion = "AIC")
covlmc_plot <- ggplot2::autoplot(dts_best_model_tune)
print(covlmc_plot)
Create a complete ggplot for the results of automatic VLMC complexity selection
Description
This function prepares a plot of the results of tune_vlmc() using ggplot2. The result can be passed to print() to display the result.
Usage
## S3 method for class 'tune_vlmc'
autoplot(object, cutoff = c("quantile", "native"), ...)
Arguments
object | a |
cutoff | the scale used for the cut off criterion (default "quantile") |
... | additional parameters (not used currently) |
Details
The graphical representation proposed by this function is complete, while the one produced by plot.tune_vlmc() is minimalistic. Here we use the faceting capabilities of ggplot2 to combine, on a single graphical representation, the evolution of multiple characteristics of the VLMC during the pruning process, while plot.tune_vlmc() shows only the selection criterion or the log likelihood. Each facet of the resulting plot shows a quantity as a function of the cut off expressed in quantile or native scale.
Value
a ggplot object
Examples
pc <- powerconsumption[powerconsumption$week %in% 10:11, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_best_model_tune <- tune_vlmc(dts, criterion = "BIC")
vlmc_plot <- ggplot2::autoplot(dts_best_model_tune)
print(vlmc_plot)
## simple post customisation
print(vlmc_plot + ggplot2::geom_point())
Find the children nodes of a node in a context tree
Description
This function returns a list (possibly empty) of ctx_node objects. Each object represents one of the children of the node represented by the node parameter.
Usage
children(node)
## S3 method for class 'ctx_node'
children(node)
## S3 method for class 'ctx_node_cpp'
children(node)
Arguments
node | a |
Details
Each node of a context tree represents a sequence. When find_sequence() is called with success, the returned object represents the corresponding node in the context tree. If this node has no child, the present function returns an empty list. When the node has at least one child, the function returns a list with one value for each element in the state space (see states()). The value is NULL if the corresponding child is empty, and a ctx_node object when the child is present. Each ctx_node object is associated with the sequence obtained by adding, to the past of the sequence represented by node, an observation of the associated state (this corresponds to an extension to the left of the sequence in temporal order).
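The left extension described above can be illustrated in base R without building a tree. The helpers below, occurs() and child_sequences(), are hypothetical and not part of mixvlmc. They list the one-step extensions of a context into the past and keep only those that appear in the series followed by at least one value. On the series used in the Examples, they reproduce the behaviour described there for c(0, 0) and c(1, 0). The sketch ignores min_size and max_depth, which also restrict the children of an actual context tree.

```r
# Hypothetical helper: TRUE if `ctx` (temporal order, oldest value first)
# occurs in `x` and is followed by at least one more value.
occurs <- function(x, ctx) {
  k <- length(ctx)
  starts <- seq_len(max(length(x) - k, 0))
  any(vapply(starts, function(i) all(x[i:(i + k - 1)] == ctx), logical(1)))
}

# Hypothetical helper: one-step extensions of `ctx` to the left (into the past),
# one entry per state, NULL when the extension is never observed.
child_sequences <- function(x, ctx, states = sort(unique(x))) {
  res <- lapply(states, function(s) {
    ext <- c(s, ctx)
    if (occurs(x, ext)) ext else NULL
  })
  names(res) <- states
  # no observed extension at all: return an empty list, as children() does
  if (all(vapply(res, is.null, logical(1)))) list() else res
}

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
child_sequences(dts, c(0, 0))  # only the extension by 1 is observed
child_sequences(dts, c(1, 0))  # both extensions are observed
```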
Value
a list of ctx_node objects, see details.
Examples
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
ctx_00 <- find_sequence(dts_ctree, c(0, 0))
## this context can only be extended in the past by 1:
children(ctx_00)
ctx_10 <- find_sequence(dts_ctree, c(1, 0))
## this context can be extended by both states
children(ctx_10)
## C++ backend
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3, backend = "C++")
ctx_00 <- find_sequence(dts_ctree, c(0, 0))
## this context can only be extended in the past by 1:
children(ctx_00)
ctx_10 <- find_sequence(dts_ctree, c(1, 0))
## this context can be extended by both states
children(ctx_10)
Number of contexts of a context tree
Description
This function returns the number of distinct contexts in a context tree.
Usage
context_number(ct)
Arguments
ct | a context tree. |
Value
the number of contexts of the tree.
Examples
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
# should be 8
context_number(dts_ctree)
Number of contexts of a VLMC with covariates
Description
This function returns the total number of contexts of a VLMC with covariates.
Usage
## S3 method for class 'covlmc'
context_number(ct)
Arguments
ct | a fitted covlmc model. |
Value
the number of contexts present in the VLMC with covariates.
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10)
# should be 3
context_number(m_cov)
Contexts of a context tree
Description
This function extracts from a context tree a description of all of its contexts.
Usage
contexts(ct, sequence = FALSE, reverse = FALSE, ...)
Arguments
ct | a context tree. |
sequence | if |
reverse | logical (defaults to |
... | additional arguments for the contexts function. |
Details
The default behaviour consists in returning a list of all the contexts contained in the tree using ctx_node objects (as returned by e.g. find_sequence()), with sequence=FALSE. The properties of the contexts can then be explored using adapted functions such as counts() and positions(). The result list is of class contexts. When sequence=TRUE, the method returns a data.frame whose first column, named context, contains the contexts as vectors (i.e. the value returned by as_sequence() applied to a ctx_node object). Other columns contain context specific values, which depend on the actual class of the tree and on additional parameters. In all implementations of contexts(), setting the additional parameters to any non-default value leads to a data.frame result.
Value
A list of class contexts containing the contexts represented in this tree (as ctx_node) or a data.frame.
State order in a context
Notice that contexts are given by default in temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. Set reverse to TRUE for the reverse convention, which is somewhat easier to relate to the way context trees are represented by draw() (i.e. recent values at the top of the tree).
See Also
find_sequence() and find_sequence.covlmc() for direct access to a specific context, and contexts.ctx_tree(), contexts.vlmc() and contexts.covlmc() for concrete implementations of contexts().
Examples
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
dts_tree <- ctx_tree(dts, max_depth = 3, min_size = 5)
contexts(dts_tree)
contexts(dts_tree, TRUE, TRUE)
Contexts of a VLMC with covariates
Description
This function returns the different contexts present in a VLMC with covariates, possibly with some associated data.
Usage
## S3 method for class 'covlmc'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  local = FALSE,
  metrics = FALSE,
  model = NULL,
  hsize = FALSE,
  merging = FALSE,
  ...
)
Arguments
ct | a fitted covlmc model. |
sequence | if |
reverse | logical (defaults to |
frequency | specifies the counts to be included in the result data.frame. The default value of |
positions | logical (defaults to FALSE). Specify whether the positions of each context in the time series used to build the context tree should be reported in a |
local | specifies how the counts reported by |
metrics | if TRUE, adds predictive metrics for each context (see |
model | specifies whether to include the model associated with each context. The default result with |
hsize | if TRUE, adds a |
merging | if TRUE, adds a |
... | additional arguments for the contexts function. |
Details
The default behaviour of the function is to return a list of all the contexts using ctx_node_covlmc objects (as returned by find_sequence.covlmc()). The properties of the contexts can then be explored using adapted functions such as counts(), covariate_memory(), cutoff.ctx_node(), metrics.ctx_node(), model(), merged_with() and positions().
When sequence=TRUE, the method returns a data.frame whose first column, named context, contains the contexts as vectors (i.e. the value returned by as_sequence() applied to a ctx_node object). Other columns contain context specific values specified by the additional parameters. Setting any of those parameters to a value that asks for reporting information toggles the result type of the function to data.frame.
See contexts.ctx_tree() for details about the frequency parameter. When model is non-NULL, the resulting data.frame contains the models associated with each context (either the full R model or its coefficients). Other columns are added if the corresponding parameters are set to TRUE.
Value
A list of class contexts containing the contexts represented in this tree (as ctx_node_covlmc) or a data.frame.
Positions
A position of a context ctx in the time series x is an index value t such that the context ends with x[t]. Thus x[t+1] is after the context. For instance, if x=c(0, 0, 1, 1) and ctx=c(0, 1) (in standard state order), then the position of ctx in x is 3.
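The definition of a position can be checked with a few lines of base R. context_positions() is a hypothetical helper, not a mixvlmc function. It returns every index t such that x[(t - k + 1):t] equals a context of length k given in temporal order, and it returns 3 on the example above. This page does not say whether mixvlmc reports an occurrence that ends at the very last value of the series, which is followed by nothing. The sketch does include such occurrences.

```r
# Hypothetical helper: all positions t of `ctx` (temporal order) in `x`,
# i.e. the indices at which an occurrence of the context ends.
context_positions <- function(x, ctx) {
  k <- length(ctx)
  if (length(x) < k) return(integer(0))
  ends <- k:length(x)
  ends[vapply(ends, function(t) all(x[(t - k + 1):t] == ctx), logical(1))]
}

context_positions(c(0, 0, 1, 1), c(0, 1))                    # 3
context_positions(c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0), c(1, 0))  # 5 8 10
```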
State order in a context
Notice that contexts are given by default in temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. Set reverse to TRUE for the reverse convention, which is somewhat easier to relate to the way context trees are represented by draw() (i.e. recent values at the top of the tree).
See Also
find_sequence() and find_sequence.covlmc() for direct access to a specific context, and contexts.ctx_tree(), contexts.vlmc() and contexts.covlmc() for concrete implementations of contexts().
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(0, median(pc$active_power), max(pc$active_power))
dts <- cut(pc$active_power, breaks = breaks)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
## direct representation with ctx_node_covlmc objects
m_cov_ctxs <- contexts(m_cov)
m_cov_ctxs
sapply(m_cov_ctxs, covariate_memory)
sapply(m_cov_ctxs, is_merged)
sapply(m_cov_ctxs, model)
## data.frame interface
contexts(m_cov, model = "coef")
contexts(m_cov, model = "full", hsize = TRUE)
Contexts of a context tree
Description
This function extracts from a context tree a description of all of its contexts.
Usage
## S3 method for class 'ctx_tree'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  ...
)
## S3 method for class 'ctx_tree_cpp'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  ...
)
Arguments
ct | a context tree. |
sequence | if |
reverse | logical (defaults to |
frequency | specifies the counts to be included in the result data.frame. The default value of |
positions | logical (defaults to FALSE). Specify whether the positions of each context in the time series used to build the context tree should be reported in a |
... | additional arguments for the contexts function. |
Details
The default behaviour of the function is to return a list of all the contexts using ctx_node objects (as returned by find_sequence()). The properties of the contexts can then be explored using adapted functions such as counts() and positions().
When sequence=TRUE, the method returns a data.frame whose first column, named context, contains the contexts as vectors (i.e. the value returned by as_sequence() applied to a ctx_node object). Other columns contain context specific values specified by the additional parameters. Setting any of those parameters to a value that asks for reporting information toggles the result type of the function to data.frame.
If frequency="total", an additional column named freq gives the number of occurrences of each context in the series used to build the tree. If frequency="detailed", one additional column is added per state in the state space. Each column records the number of times a given context is followed by the corresponding value in the original series.
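The two frequency modes can be mimicked in base R to make the definitions concrete. context_counts() is a hypothetical helper, not part of mixvlmc. It keeps only the occurrences of a context that are followed by a value and tabulates that next value. This matches the description of counts(), which leaves out a final occurrence that has no successor. The table plays the role of the "detailed" columns, and its sum plays the role of freq.

```r
# Hypothetical helper: distribution of the value that follows each occurrence
# of `ctx` (temporal order) in `x`; an occurrence ending at the last value of
# `x` has no successor and is skipped.
context_counts <- function(x, ctx, states = sort(unique(x))) {
  k <- length(ctx)
  # candidate end positions t, each followed by x[t + 1]
  ends <- seq_len(max(length(x) - k, 0)) + k - 1
  hits <- ends[vapply(ends, function(t) all(x[(t - k + 1):t] == ctx), logical(1))]
  table(factor(x[hits + 1], levels = states))
}

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
detailed <- context_counts(dts, c(1, 0))  # "detailed": one count per next state
total <- sum(detailed)                    # "total": 2, the final "1 0" is not followed
```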
Value
A list of class contexts containing the contexts represented in this tree (as ctx_node) or a data.frame.
Positions
A position of a context ctx in the time series x is an index value t such that the context ends with x[t]. Thus x[t+1] is after the context. For instance, if x=c(0, 0, 1, 1) and ctx=c(0, 1) (in standard state order), then the position of ctx in x is 3.
State order in a context
Notice that contexts are given by default in temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. Set reverse to TRUE for the reverse convention, which is somewhat easier to relate to the way context trees are represented by draw() (i.e. recent values at the top of the tree).
See Also
find_sequence() and find_sequence.covlmc() for direct access to a specific context, and contexts.ctx_tree(), contexts.vlmc() and contexts.covlmc() for concrete implementations of contexts().
Examples
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
dts_tree <- ctx_tree(dts, max_depth = 3, min_size = 5)
## direct representation with ctx_node objects
contexts(dts_tree)
## data.frame format
contexts(dts_tree, sequence = TRUE)
contexts(dts_tree, frequency = "total")
contexts(dts_tree, frequency = "detailed")
Contexts of a VLMC
Description
This function extracts all the contexts from a fitted VLMC, possibly with some associated data.
Usage
## S3 method for class 'vlmc'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  local = FALSE,
  cutoff = NULL,
  metrics = FALSE,
  ...
)
## S3 method for class 'vlmc_cpp'
contexts(
  ct,
  sequence = FALSE,
  reverse = FALSE,
  frequency = NULL,
  positions = FALSE,
  local = FALSE,
  cutoff = NULL,
  metrics = FALSE,
  ...
)
Arguments
ct | a context tree. |
sequence | if |
reverse | logical (defaults to |
frequency | specifies the counts to be included in the result data.frame. The default value of |
positions | logical (defaults to FALSE). Specify whether the positions of each context in the time series used to build the context tree should be reported in a |
local | specifies how the counts reported by |
cutoff | specifies whether to include the cut off value associated with each context (see |
metrics | if TRUE, adds predictive metrics for each context (see |
... | additional arguments for the contexts function. |
Details
The default behaviour of the function is to return a list of all the contexts using ctx_node objects (as returned by find_sequence()). The properties of the contexts can then be explored using adapted functions such as counts(), cutoff.ctx_node(), metrics.ctx_node() and positions().
When sequence=TRUE, the method returns a data.frame whose first column, named context, contains the contexts as vectors (i.e. the value returned by as_sequence() applied to a ctx_node object). Other columns contain context specific values specified by the additional parameters. Setting any of those parameters to a value that asks for reporting information toggles the result type of the function to data.frame.
The frequency parameter is described in detail in the documentation of contexts.ctx_tree(). When cutoff is non-NULL, the resulting data.frame contains a cutoff column with the cut off values, either in quantile or in native scale. See cutoff.vlmc() and prune.vlmc() for the definitions of cut off values and of the two scales.
Value
A list of class contexts containing the contexts represented in this tree (as ctx_node) or a data.frame.
Cut off values
The cut off values reported by contexts.vlmc can be different from the ones reported by cutoff.vlmc() for three reasons:
1. cutoff.vlmc() reports only useful cut off values, i.e., cut off values that should induce a simplification of the VLMC when used in prune(). This excludes cut off values associated with simple contexts that are smaller than the ones of their descendants in the context tree. Those values are reported by contexts.vlmc.
2. contexts.vlmc reports only cut off values of actual contexts, while cutoff.vlmc() reports cut off values for all nodes of the context tree.
3. the values reported by contexts.vlmc are not modified to induce pruning, contrary to the default behaviour of cutoff.vlmc().
Positions
A position of a context ctx in the time series x is an index value t such that the context ends with x[t]. Thus x[t+1] is after the context. For instance, if x=c(0, 0, 1, 1) and ctx=c(0, 1) (in standard state order), then the position of ctx in x is 3.
State order in a context
Notice that contexts are given by default in temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. Set reverse to TRUE for the reverse convention, which is somewhat easier to relate to the way context trees are represented by draw() (i.e. recent values at the top of the tree).
See Also
find_sequence() and find_sequence.covlmc() for direct access to a specific context, and contexts.ctx_tree(), contexts.vlmc() and contexts.covlmc() for concrete implementations of contexts().
Examples
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
model <- vlmc(dts, alpha = 0.5)
## direct representation with ctx_node objects
model_ctxs <- contexts(model)
model_ctxs
sapply(model_ctxs, cutoff, scale = "quantile")
sapply(model_ctxs, cutoff, scale = "native")
sapply(model_ctxs, function(x) metrics(x)$accuracy)
## data.frame format
contexts(model, frequency = "total")
contexts(model, cutoff = "quantile")
contexts(model, cutoff = "native", metrics = TRUE)
Report the distribution of values that follow occurrences of a sequence
Description
This function reports the number of occurrences of the sequence represented by node in the original time series used to build the associated context tree (not including a possible final occurrence not followed by any value at the end of the original time series). In addition, if frequency=="detailed", the function reports the frequencies of each of the possible values of the time series when they appear just after the sequence.
Usage
counts(node, frequency = c("detailed", "total"), local = FALSE)
## S3 method for class 'ctx_node'
counts(node, frequency = c("detailed", "total"), local = FALSE)
## S3 method for class 'ctx_node_cpp'
counts(node, frequency = c("detailed", "total"), local = FALSE)
node | a |
frequency | specifies the counts to be included in the result. |
local | specifies how the counts are computed. When |
Value
either an integer when frequency="total", which gives the total number of occurrences of the sequence represented by node, or a data.frame with a total column with the same value and a column for each of the possible values of the original time series, reporting counts in each column (see the description above).
See Also
contexts() and contexts.ctx_tree()
Examples
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
dts_tree <- ctx_tree(dts, max_depth = 3, min_size = 5)
subseq <- find_sequence(dts_tree, factor(c("A", "A"), levels = c("A", "B", "C")))
if (!is.null(subseq)) {
  counts(subseq)
}
Maximal covariate memory of a VLMC with covariates
Description
This function returns the longest covariate memory used by a VLMC with covariates.
Usage
covariate_depth(model)
Arguments
model | a covlmc object |
Value
the longest covariate memory of this model
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
m_nocovariate <- vlmc(dts)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10)
covariate_depth(m_cov)
Covariate memory length for a COVLMC context
Description
This function returns the length of the memory of a COVLMC context represented by a ctx_node_covlmc object.
Usage
covariate_memory(node)
Arguments
node | A |
Value
the memory length, an integer
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10)
ctxs <- contexts(m_cov)
## get all the memory lengths
sapply(ctxs, covariate_memory)
Fit a Variable Length Markov Chain with Covariates (coVLMC)
Description
This function fits a Variable Length Markov Chain with covariates (coVLMC) to a discrete time series coupled with a time series of covariates.
Usage
covlmc(
  x,
  covariate,
  alpha = 0.05,
  min_size = 5L,
  max_depth = 100L,
  keep_data = TRUE,
  control = covlmc_control(...),
  ...
)
Arguments
x | a discrete time series; can be numeric, character, factor or logical. |
covariate | a data frame of covariates. |
alpha | number in (0,1) (default: 0.05) cut off value in the pruning phase (in quantile scale). |
min_size | number >= 1 (default: 5). Tune the minimum number of observations for a context in the growing phase of the context tree (see below for details). |
max_depth | integer >= 1 (default: 100). Longest context considered in the growing phase of the context tree. |
keep_data | logical (defaults to |
control | a list with control parameters, see |
... | arguments passed to |
Details
The model is built using the algorithm described in Zanin Zambom et al. As in the vlmc() approach, the algorithm first builds a context tree (see ctx_tree()). The min_size parameter is used to compute the actual number of observations per context in the growing phase of the tree. It is computed as min_size*(1+ncol(covariate)*d)*(s-1), where d is the length of the context (a.k.a. the depth in the tree) and s is the number of states. This corresponds to ensuring min_size observations per parameter of the logistic regression during the estimation phase.
Then logistic models are fitted in the leaves of the tree: the goal of each logistic model is to estimate the conditional distribution of the next state of the time series given the context (the recent past of the time series) and delayed versions of the covariates. A pruning strategy is used to simplify the models (mainly to reduce the time window associated with the covariates) and the tree itself.
Parameters specified by control are used to fine tune the behaviour of the algorithm.
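The growing-phase threshold above can be evaluated directly to show how quickly it grows with the context length. Following the explanation above, the factor (1+ncol(covariate)*d)*(s-1) can be read as the number of parameters of a logistic model with an intercept and d delayed copies of each covariate, for each of the s-1 non-reference states. covlmc_min_obs() is a hypothetical helper that transcribes the documented expression. It is not exported by mixvlmc. The values use min_size = 15, one covariate and three states, as in the covlmc() Examples.

```r
# Transcription of the documented growing-phase threshold of covlmc():
# min_size * (1 + ncol(covariate) * d) * (s - 1)
covlmc_min_obs <- function(min_size, n_covariates, d, s) {
  min_size * (1 + n_covariates * d) * (s - 1)
}

# one covariate, three states, contexts of length 1, 2 and 3
covlmc_min_obs(min_size = 15, n_covariates = 1, d = 1:3, s = 3)  # 60 90 120
```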
Value
a fitted covlmc model.
Logistic models
By default, covlmc uses two different computing engines for logistic models:
when the time series has only two states, covlmc uses stats::glm() with a binomial link (stats::binomial());
when the time series has at least three states, covlmc uses VGAM::vglm() with a multinomial link (VGAM::multinomial()).
Both engines are able to detect degenerate cases and lead to more robust results than nnet::multinom(). It is nevertheless possible to replace stats::glm() and VGAM::vglm() with nnet::multinom() by setting the global option mixvlmc.predictive to "multinom" (the default value is "glm"). Notice that while results should be comparable, there is no guarantee that they will be identical.
References
Bühlmann, P. and Wyner, A. J. (1999), "Variable length Markov chains." Ann. Statist., 27 (2), 480-513. doi:10.1214/aos/1018031204
Zanin Zambom, A., Kim, S. and Lopes Garcia, N. (2022), "Variable length Markov chain with exogenous covariates." J. Time Ser. Anal., 43 (2), 312-328. doi:10.1111/jtsa.12615
See Also
cutoff.covlmc() and prune.covlmc() for post-pruning.
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(1 / 3, 2 / 3, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 15)
draw(m_cov)
withr::with_options(
  list(mixvlmc.predictive = "multinom"),
  m_cov_nnet <- covlmc(dts, dts_cov, min_size = 15)
)
draw(m_cov_nnet)
Control for coVLMC fitting
Description
This function creates a list with parameters used to fine tune the coVLMC fitting algorithm.
Usage
covlmc_control(pseudo_obs = 1)
Arguments
pseudo_obs | number of fake observations of each state to add to the observed ones. |
Details
pseudo_obs is used to regularize the probability estimates when a context is always followed by the same state. Transition probabilities are computed after adding pseudo_obs pseudo observations of each of the states (including the observed one). This corresponds to a Bayesian posterior mean estimation with a Dirichlet prior.
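For a single context, the estimate described above is (n_j + pseudo_obs) / (n + s * pseudo_obs). Here n_j counts the transitions to state j, n is their total and s is the number of states. smoothed_probs() is a hypothetical base R helper that evaluates this posterior mean. The sketch does not show how covlmc() turns these probabilities into the coefficients of its logistic models.

```r
# Posterior mean of the transition probabilities under a symmetric
# Dirichlet(pseudo_obs) prior: add pseudo_obs to each state count.
smoothed_probs <- function(counts, pseudo_obs = 1) {
  (counts + pseudo_obs) / (sum(counts) + length(counts) * pseudo_obs)
}

# a context observed 10 times, always followed by the first of two states
smoothed_probs(c(10, 0), pseudo_obs = 1)   # 11/12 and 1/12
smoothed_probs(c(10, 0), pseudo_obs = 10)  # 2/3 and 1/3
```

Larger values of pseudo_obs pull the estimate further towards the uniform distribution, which is why the covlmc_control() example compares pseudo_obs = 1 with pseudo_obs = 10.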
Value
a list.
Examples
dts <- rep(c(0, 1), 100)
dts_cov <- data.frame(y = rep(0, length(dts)))
default_model <- covlmc(dts, dts_cov)
## model = "coef" already produces a data.frame
contexts(default_model, model = "coef")$coef
control <- covlmc_control(pseudo_obs = 10)
model <- covlmc(dts, dts_cov, control = control)
contexts(model, model = "coef")$coef
Build a context tree for a discrete time series
Description
This function builds a context tree for a time series.
Usage
ctx_tree(
  x,
  min_size = 2L,
  max_depth = 100L,
  keep_position = TRUE,
  backend = getOption("mixvlmc.backend", "R")
)
Arguments
x | a discrete time series; can be numeric, character, factor or logical. |
min_size | integer >= 1 (default: 2). Minimum number of observations for a context to be included in the tree. |
max_depth | integer >= 1 (default: 100). Maximum length of a context to be included in the tree. |
keep_position | logical (default: TRUE). Should the context tree keep the position of the contexts? |
backend | "R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See details. |
Details
The tree represents all the sequences of symbols/states of length at most max_depth that appear at least min_size times in the time series and stores the frequencies of the states that follow each context. Optionally, the positions of the contexts in the time series can be stored in the tree.
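The counting step can be sketched in base R. count_contexts() is a hypothetical helper written only to illustrate the idea. It stores next-state frequencies keyed by the context written in temporal order, and it counts only occurrences that are followed by an observation, which may differ from the way ctx_tree() counts occurrences:

```r
# Illustrative sketch (not mixvlmc's implementation): for every sequence of
# length 1..max_depth, count which state follows it, then keep the sequences
# followed by an observation at least min_size times.
# Keys are in temporal order (older values on the left), separated by spaces.
count_contexts <- function(x, max_depth = 100L, min_size = 2L) {
  x <- as.character(x)
  states <- sort(unique(x))
  counts <- list()
  for (l in seq_len(max_depth)) {
    if (length(x) <= l) break  # no position has l previous values
    for (t in (l + 1L):length(x)) {
      key <- paste(x[(t - l):(t - 1L)], collapse = " ")
      if (is.null(counts[[key]])) {
        counts[[key]] <- setNames(integer(length(states)), states)
      }
      counts[[key]][x[t]] <- counts[[key]][x[t]] + 1L
    }
  }
  counts[vapply(counts, sum, numeric(1)) >= min_size]
}

dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
ctx <- count_contexts(dts, max_depth = 2, min_size = 1)
ctx[["0"]]    # "0" is followed once by 0 and three times by 1
ctx[["0 1"]]  # "0 1" is followed twice by 0 and once by 1
```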
Value
a context tree (of class that inherits from ctx_tree).
Back ends
Two back ends are available to compute context trees:
- the "R" back end represents the tree in pure R data structures (nested lists) that can be easily processed further in pure R (C++ helper functions are used to speed up the construction).
- the "C++" back end represents the tree with C++ classes. This back end is considered experimental. The tree is built with an optimised suffix tree algorithm which speeds up the construction by at least a factor of 10 in standard settings. As the tree is kept outside of R's direct reach, context trees built with the C++ back end must be restored after a saveRDS()/readRDS() sequence. This is done automatically by recomputing the context tree completely.
Examples
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
## get all contexts of length 2
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 2)
draw(dts_ctree)
Cut off values for VLMC-like models
Description
This generic function returns one or more cut off values that are guaranteed to have an effect on the model passed to the function when a simplification procedure is applied (in general a tree pruning operation as provided by prune()).
Usage
cutoff(model, ...)
Arguments
model | a model. |
... | additional arguments for the cutoff function implementations |
Details
The exact definition of a cut off value depends on the model type and is documented in the concrete implementations of the function.
Value
a cut off value or a vector of cut off values.
See Also
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts)
draw(model)
model_cuts <- cutoff(model)
model_2 <- prune(model, model_cuts[2])
draw(model_2)
Cut off values for pruning the context tree of a VLMC with covariates
Description
This function returns all the cut off values that should induce a pruning of the context tree of a VLMC with covariates.
Usage
## S3 method for class 'covlmc'
cutoff(model, raw = FALSE, tolerance = .Machine$double.eps^0.5, ...)
Arguments
model | a fitted COVLMC model. |
raw | specify whether the returned values should be limit values computed in the model or modified values that guarantee pruning (see details) |
tolerance | specify the minimum separation between two consecutive values of the cut off in native mode (before any transformation). See details. |
... | additional arguments for the cutoff function. |
Details
Notice that the list of cut off values returned by the function is not as complete as the one computed for a VLMC without covariates. Indeed, pruning the COVLMC tree creates new pruning opportunities that are not evaluated during the construction of the initial model, while all pruning opportunities are computed during the construction of a VLMC context tree. Nevertheless, the largest value returned by the function is guaranteed to produce the least pruned tree consistent with the reference one.
For large COVLMC, some cut off values can be almost identical, with a difference of the order of the machine epsilon value. The tolerance parameter is used to keep only values that are different enough. This is done in the quantile scale, before transformations implemented when raw is FALSE.
Notice that the log-likelihood scale is not directly useful in COVLMC, as the differences in model sizes are not constant through the pruning process. As a consequence, this function does not provide a scale parameter, contrary to cutoff.vlmc().
Setting raw to TRUE removes the small perturbations that are subtracted from the log-likelihood ratio values computed from the COVLMC (in quantile scale).
As automated model selection is provided by tune_covlmc(), the direct use of cutoff should be reserved for advanced exploration of the set of trees that can be obtained from a complex one, e.g. to implement model selection techniques that are not provided by tune_covlmc().
Value
a vector of cut off values, or NULL if none can be computed.
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
m_nocovariate <- vlmc(dts)
draw(m_nocovariate)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
draw(m_cov)
cutoff(m_cov)
Cut off value for pruning a node in the context tree of a VLMC
Description
This function returns the cut off value associated with a specific node in the context tree interpreted as a VLMC. The node is represented by a ctx_node object as returned by find_sequence() or contexts(). For details, see cutoff.vlmc().
Usage
## S3 method for class 'ctx_node'
cutoff(model, scale = c("quantile", "native"), raw = FALSE, ...)
Arguments
model | a ctx_node object. |
scale | specify whether the results should be "native" log likelihood ratio values or expressed in a "quantile" scale of a chi-squared distribution (defaults to "quantile"). |
raw | specify whether the returned values should be limit values computed in the model or modified values that guarantee pruning (see details in cutoff.vlmc()). |
... | additional arguments for the cutoff function. |
Value
a cut off value
See Also
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts)
model_ctxs <- contexts(model)
cutoff(model_ctxs[[1]])
cutoff(model_ctxs[[2]], scale = "native", raw = TRUE)
Cut off values for pruning the context tree of a VLMC
Description
This function returns a collection of cut off values that are guaranteed to induce all valid pruned trees of the context tree of a VLMC. Pruning is implemented by the prune() function.
Usage
## S3 method for class 'vlmc'
cutoff(
  model,
  scale = c("quantile", "native"),
  raw = FALSE,
  tolerance = .Machine$double.eps^0.5,
  ...
)

## S3 method for class 'vlmc_cpp'
cutoff(
  model,
  scale = c("quantile", "native"),
  raw = FALSE,
  tolerance = .Machine$double.eps^0.5,
  ...
)
Arguments
model | a fitted VLMC model. |
scale | specify whether the results should be "native" log likelihood ratio values or expressed in a "quantile" scale of a chi-squared distribution (defaults to "quantile"). |
raw | specify whether the returned values should be limit values computed in the model or modified values that guarantee pruning (see details) |
tolerance | specify the minimum separation between two consecutive values of the cut off in native mode (before any transformation). See details. |
... | additional arguments for the cutoff function. |
Details
By default, the function returns values that can be used directly to induce pruning in the context tree. This is done by computing the log likelihood ratios used by the context algorithm on the reference VLMC and by keeping the relevant ones. From them, the function selects intermediate values that are guaranteed to generate via pruning all the VLMC models that could be generated by using larger values of the cutoff parameter that was used to build the reference model (or smaller values of the alpha parameter in "quantile" scale).
Setting the raw parameter to TRUE removes this operation on the values and asks the function to return the relevant log likelihood ratios.
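The relation between the two scales can be illustrated as follows. This sketch relies on an assumption that is not stated on this page: a quantile-scale value alpha is taken to correspond to the native log likelihood ratio threshold qchisq(1 - alpha, df) / 2, with df equal to the number of states minus one, as in the classical chi-squared approximation of the likelihood ratio test. to_native() and to_quantile() are hypothetical helpers, not mixvlmc functions:

```r
# Assumed conversion between the two cut off scales (hypothetical helpers):
# a log likelihood ratio is compared with qchisq(1 - alpha, df) / 2,
# where df is the number of states minus one.
to_native <- function(alpha, n_states) {
  stats::qchisq(alpha, df = n_states - 1, lower.tail = FALSE) / 2
}
to_quantile <- function(native, n_states) {
  stats::pchisq(2 * native, df = n_states - 1, lower.tail = FALSE)
}

# A smaller alpha corresponds to a larger native cut off, hence more pruning
to_native(c(0.1, 0.05, 0.01), n_states = 4)
to_quantile(to_native(0.05, n_states = 4), n_states = 4)  # approximately 0.05
```

Under this assumption, larger native cut offs correspond to smaller alpha values, which matches the direction described above.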
For large VLMC, some log likelihood ratios can be almost identical, with a difference of the order of the machine epsilon value. The tolerance parameter is used to keep only values that are different enough. This is done in the native scale, before transformations implemented when raw is FALSE.
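The effect of tolerance can be pictured with a small sketch. It assumes a simple greedy rule: sort the values, then keep a value only if it exceeds the last kept one by more than tolerance. dedup_cutoffs() is a hypothetical helper, and mixvlmc may apply a different rule:

```r
# Hypothetical helper: sort the values and keep one representative of each
# group of values closer than `tolerance` to the last kept value.
dedup_cutoffs <- function(values, tolerance = .Machine$double.eps^0.5) {
  values <- sort(values)
  kept <- values[seq_len(min(1L, length(values)))]
  for (v in values[-1L]) {
    if (v - kept[length(kept)] > tolerance) {
      kept <- c(kept, v)
    }
  }
  kept
}

dedup_cutoffs(c(2, 1, 1 + 1e-12, 3))  # 1 2 3
```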
As automated model selection is provided by tune_vlmc(), the direct use of cutoff should be reserved for advanced exploration of the set of trees that can be obtained from a complex one, e.g. to implement model selection techniques that are not provided by tune_vlmc().
Value
a vector of cut off values.
See Also
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts)
draw(model)
model_cuts <- cutoff(model)
model_2 <- prune(model, model_cuts[2])
draw(model_2)
Depth of a context tree
Description
This function returns the depth of a context tree, i.e. the length of the longest context represented in the tree.
Usage
depth(ct)
Arguments
ct | a context tree. |
Value
the depth of the tree.
Examples
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
## should be 3
depth(dts_ctree)
Text based representation of a context tree
Description
This function 'draws' a context tree as text.
Usage
draw(ct, control = draw_control(), ...)
Arguments
ct | a context tree. |
control | a list of low level control parameters of the text representation. See details and draw_control(). |
... | additional arguments for draw. |
Details
The function uses basic "ascii art" to represent the context tree. Characters used to represent the structure of the tree, e.g. branches, can be modified using draw_control().
In addition to the structure of the context tree, draw can represent information attached to the nodes (contexts and partial contexts). This is controlled by additional parameters depending on the type of the context tree.
Value
the context tree (invisibly).
Examples
dts <- sample(c(0, 1), 100, replace = TRUE)
ctree <- ctx_tree(dts, min_size = 10, max_depth = 2)
draw(ctree)
dts_c <- sample(c("A", "B", "CD"), 100, replace = TRUE)
ctree_c <- ctx_tree(dts_c, min_size = 10, max_depth = 2)
draw(ctree_c, draw_control(root = "x"))
Text based representation of a covlmc model
Description
This function 'draws' a context tree as text.
Usage
## S3 method for class 'covlmc'
draw(
  ct,
  control = draw_control(),
  model = c("coef", "full"),
  p_value = TRUE,
  digits = 4,
  with_state = FALSE,
  ...
)
Arguments
ct | a fitted covlmc model. |
control | a list of low level control parameters of the text representation. See details and draw_control(). |
model | this parameter controls the display of logistic models associated with nodes. The default |
p_value | specifies whether the p-values of the likelihood ratio tests conducted during the covlmc construction must be included in the representation. |
digits | numerical parameters and p-values are represented using the base::signif function, with the number of significant digits specified by this parameter. |
with_state | specifies whether to display the state associated with each dimension of the logistic model (see details). |
... | additional arguments for draw. |
Details
The function uses basic "ascii art" to represent the context tree. Characters used to represent the structure of the tree, e.g. branches, can be modified using draw_control().
In addition to the structure of the context tree, draw can represent information attached to the nodes (contexts and partial contexts). This is controlled by additional parameters depending on the type of the context tree.
Value
the context tree (invisibly).
Tweaking model representation
Model representations are affected by the following additional parameter:
time_sep: character(s) used to split the coefficient list into blocks associated with the time delays of the covariates included in the logistic model. The first block contains the intercept(s), the second block the covariate values at time t-1, the third block those at time t-2, etc.
Variable representation
When model="full", the representation includes the names of the variables used by the logistic models. Names are those generated by the underlying logistic model, e.g. stats::glm(). Numerical variable names are used as is, while factors have levels appended. The intercept is denoted (I) to save space. The time delays are represented by an underscore followed by the time delay. For instance, if the model uses the numerical covariate y with two delays, it will appear as two variables, y_1 and y_2.
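The naming convention and the block order (t-1, then t-2) can be illustrated by building delayed copies of a numerical covariate. lag_covariates() is a hypothetical base R helper, not the way mixvlmc builds its design matrices internally. It handles numerical or logical covariates only:

```r
# Hypothetical helper illustrating the naming convention only: build the
# delayed copies of each covariate, block by block (t-1 first, then t-2, ...),
# naming them <variable>_<delay>. Numerical or logical covariates only.
lag_covariates <- function(cov, delays) {
  n <- nrow(cov)
  stopifnot(delays < n)
  out <- list()
  for (d in seq_len(delays)) {
    for (v in names(cov)) {
      # value at time t - d, NA when t - d falls before the start of the series
      out[[paste0(v, "_", d)]] <- c(rep(NA, d), cov[[v]][seq_len(n - d)])
    }
  }
  as.data.frame(out)
}

lag_covariates(data.frame(y = 1:4), delays = 2)  # columns y_1 and y_2
```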
State representation
When model is not NULL, the coefficients of the logistic models are presented, organized in rows associated with states. One state is used as the reference state, and the logistic model aims at predicting the ratio of probability between another state and the reference one (in log scale). When with_state is TRUE, the display includes the target state for each row of coefficients. This is useful when using e.g. VGAM::vglm, as unused levels of the target variable will be automatically dropped from the model, leading to a reduced number of rows. The reference state is either shown on the first row if model is "full" or after the state on each row if model is "coef".
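Each row of coefficients yields, for one non-reference state, a linear predictor equal to the log of its probability ratio with the reference state. Inverting these ratios gives the transition probabilities. The sketch below shows this standard multinomial logit inversion. The helper and the linear predictor values are hypothetical, and the choice of the reference state depends on the engine:

```r
# Map the linear predictors of a multinomial logistic model to probabilities.
# eta[j] = log(p_j / p_ref) for each non-reference state j.
multilogit_probs <- function(eta) {
  denom <- 1 + sum(exp(eta))
  c(reference = 1 / denom, exp(eta) / denom)
}

p <- multilogit_probs(c(b = log(2), c = 0))
p                                 # reference 0.25, b 0.50, c 0.25
log(p[["b"]] / p[["reference"]])  # log(2), the log ratio for state b
```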
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
draw(m_cov, digits = 3)
draw(m_cov, model = NULL)
draw(m_cov, p_value = FALSE)
draw(m_cov, p_value = FALSE, time_sep = " | ")
draw(m_cov, model = "full", time_sep = " | ")
Text based representation of a context tree
Description
This function 'draws' a context tree as text.
Usage
## S3 method for class 'ctx_tree_cpp'
draw(ct, control = draw_control(), frequency = NULL, ...)

## S3 method for class 'ctx_tree'
draw(ct, control = draw_control(), frequency = NULL, ...)
Arguments
ct | a context tree. |
control | a list of low level control parameters of the text representation. See details and draw_control(). |
frequency | this parameter controls the display of node level information in the tree. The default |
... | additional arguments for draw. |
Details
The function uses basic "ascii art" to represent the context tree. Characters used to represent the structure of the tree, e.g. branches, can be modified using draw_control().
In addition to the structure of the context tree, draw can represent information attached to the nodes (contexts and partial contexts). This is controlled by additional parameters depending on the type of the context tree.
Value
the context tree (invisibly).
Examples
dts_c <- sample(c("A", "B", "CD"), 100, replace = TRUE)
ctree_c <- ctx_tree(dts_c, min_size = 10, max_depth = 2)
draw(ctree_c, frequency = "total")
draw(ctree_c, frequency = "detailed")
Text based representation of a vlmc
Description
This function 'draws' a context tree as text.
Usage
## S3 method for class 'vlmc'
draw(ct, control = draw_control(), prob = TRUE, ...)

## S3 method for class 'vlmc_cpp'
draw(ct, control = draw_control(), prob = TRUE, ...)
Arguments
ct | a fitted vlmc. |
control | a list of low level control parameters of the text representation. See details and draw_control(). |
prob | this parameter controls the display of node level information in the tree. The default |
... | additional arguments for draw. |
Details
The function uses basic "ascii art" to represent the context tree. Characters used to represent the structure of the tree, e.g. branches, can be modified using draw_control().
In addition to the structure of the context tree, draw can represent information attached to the nodes (contexts and partial contexts). This is controlled by additional parameters depending on the type of the context tree.
Value
the context tree (invisibly).
Examples
dts <- sample(c("A", "B", "C"), 500, replace = TRUE)
model <- vlmc(dts, alpha = 0.05)
draw(model)
draw(model, prob = FALSE)
draw(model, prob = NULL)
Control parameters for draw
Description
This function returns a list used to fine tune the behaviour of the draw() function.
Usage
draw_control(
  root = "*",
  first_node = "+",
  next_node = "'",
  vbranch = "|",
  hbranch = "--",
  open_ct = "(",
  close_ct = ")"
)
Arguments
root | character used for the root node. |
first_node | characters used for the first child of a node. |
next_node | characters used for other children of a node. |
vbranch | characters used to represent a branch in a vertical way. |
hbranch | characters used to represent a branch in a horizontal way. |
open_ct | characters used to start each node specific text representation. |
close_ct | characters used to end each node specific text representation. |
Value
a list
Examples
draw_control(open_ct = "[", close_ct = "]")
Find the node of a sequence in a context tree
Description
This function checks whether the sequence ctx is represented in the context tree ct. If this is the case, it returns a description of the matching node, an object of class ctx_node. If the sequence is not represented in the tree, the function returns NULL.
Usage
find_sequence(ct, ctx, reverse = FALSE, ...)

## S3 method for class 'ctx_tree'
find_sequence(ct, ctx, reverse = FALSE, ...)

## S3 method for class 'ctx_tree_cpp'
find_sequence(ct, ctx, reverse = FALSE, ...)
Arguments
ct | a context tree. |
ctx | a sequence to search in the context tree |
reverse | specifies whether the sequence |
... | additional parameters for the find_sequence function |
Details
The function looks for sequences in general. The is_context() function can be used on the resulting object to test whether the sequence is, in addition, a proper context.
Value
an object of class ctx_node if the sequence ctx is represented in the context tree, NULL when this is not the case.
State order in a sequence
Sequences are given by default in the temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. In the present function, reverse refers both to the order used for the ctx parameter and to the default order used by the resulting ctx_node object.
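The ordering convention explains the NULL result in the example of this page. appears_in() is a hypothetical base R helper that checks whether a sequence occurs as consecutive values of a series, both read in temporal order:

```r
# Hypothetical helper: does `ctx` occur as consecutive values of `x`,
# reading both in temporal order (older values on the left)?
appears_in <- function(x, ctx) {
  l <- length(ctx)
  if (l == 0L || l > length(x)) return(FALSE)
  starts <- seq_len(length(x) - l + 1L)
  any(vapply(starts, function(i) all(x[i:(i + l - 1L)] == ctx), logical(1)))
}

dts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
appears_in(dts, c("A", "C"))       # FALSE: "A" is never directly followed by "C"
appears_in(dts, rev(c("A", "C")))  # TRUE: "C" then "A" occurs
```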
Examples
dts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
dts_tree <- ctx_tree(dts, max_depth = 3)
find_sequence(dts_tree, "A")
## returns NULL as "A" "C" does not appear in dts
find_sequence(dts_tree, c("A", "C"))
Find the node of a sequence in a COVLMC context tree
Description
This function checks whether the sequence ctx is represented in the context tree of the COVLMC model ct. If this is the case, it returns a description of the matching node, an object of class ctx_node_covlmc. If the sequence is not represented in the tree, the function returns NULL.
Usage
## S3 method for class 'covlmc'
find_sequence(ct, ctx, reverse = FALSE, ...)
Arguments
ct | a context tree. |
ctx | a sequence to search in the context tree |
reverse | specifies whether the sequence |
... | additional parameters for the find_sequence function |
Details
The function looks for sequences in general. The is_context() function can be used on the resulting object to test whether the sequence is, in addition, a proper context.
Value
an object of class ctx_node_covlmc if the sequence ctx is represented in the context tree, NULL when this is not the case.
State order in a sequence
Sequences are given by default in the temporal order and not in the "reverse" order used by many VLMC research papers: older values are on the left. For instance, the context c(1, 0) is reported if the sequence 1, then 0 appeared in the time series used to build the context tree. In the present function, reverse refers both to the order used for the ctx parameter and to the default order used by the resulting ctx_node object.
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10)
## not in the tree
vals <- states(m_cov)
find_sequence(m_cov, c(vals[2], vals[2]))
## in the tree but not a context
node <- find_sequence(m_cov, c(vals[1]))
node
is_context(node)
## in the tree and a context
node <- find_sequence(m_cov, c(vals[1], vals[1]))
node
is_context(node)
model(node)
Significant Earthquake Dataset
Description
A data set containing the earthquakes that occurred during the period 1900-2022, with GPS coordinates and magnitudes.
Usage
globalearthquake
Format
A data frame with 98785 rows and 12 variables:
- date_time
Date and time in POSIXct format
- latitude
latitude of the earthquake, from -90° to 90°
- longitude
longitude of the earthquake, from -180° to 180°
- mag
the magnitude of the earthquake, indicating its strength
- Date
date when the earthquake occurred
- nbweeks
number of weeks since 1900/01/01
- year
year
- month
month of the year
- month_day
day of the month
- week
week number
- week_day
day of the week from 1 = Sunday to 7 = Saturday
- year_day
day of the year from 1 to 366
Details
This is a compiled version of the full data set available on U.S. Geological Survey Earthquake Events (USGS), which is in the public domain.
The data set contains only the earthquakes that occurred between 1900 and 2022 with a magnitude greater than 5.
Source
Earthquake Catalog, U.S. Geological Survey, Department of the Interior. https://www.usgs.gov/programs/earthquake-hazards
Report the nature of a node in a context tree
Description
This function returns TRUE if the node is a proper context, and FALSE otherwise.
Usage
is_context(node)
Arguments
node | a |
Value
TRUE if node is a proper context, FALSE when this is not the case.
Examples
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
draw(dts_ctree)
## 0, 0 is a context but 1, 0 is not
is_context(find_sequence(dts_ctree, c(0, 0)))
is_context(find_sequence(dts_ctree, c(1, 0)))
Test if the object is a covlmc model
Description
This function returns TRUE for VLMC models with covariates and FALSE for other objects.
Usage
is_covlmc(x)
Arguments
x | an R object. |
Value
TRUE for VLMC models with covariates.
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
# should be true
is_ctx_tree(m_cov)
# should be true
is_covlmc(m_cov)
# should be false
is_vlmc(m_cov)
Test if the object is a context tree
Description
This function returns TRUE for context trees and FALSE for other objects.
Usage
is_ctx_tree(x)
Arguments
x | an R object. |
Value
TRUE for context trees.
Examples
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 2)
is_ctx_tree(dts_ctree)
is_ctx_tree(dts)
Merging status of a COVLMC context
Description
The function returns TRUE if the context represented by this node is merged with at least one other context and FALSE if this is not the case.
Usage
is_merged(node)
Arguments
node | A |
Details
When a COVLMC is built on a time series with at least three distinct states, some contexts can be merged: they use the same logistic model, leading to a more parsimonious model. Those contexts are reported individually by functions such as contexts.covlmc(). The present function can be used to detect such merging, while merged_with() can be used to recover the other contexts.
Value
TRUE or FALSE, depending on the nature of the context
See Also
Examples
pc <- powerconsumption[powerconsumption$week == 15, ]
dts <- cut(pc$active_power, breaks = c(0, 1, 2, 3, 8))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5, alpha = 0.1)
ctxs <- contexts(m_cov)
## no merging
sapply(ctxs, is_merged)
Report the ordering convention of the node
Description
This function returns TRUE if the node uses a reverse temporal ordering and FALSE otherwise.
Usage
is_reversed(node)
Arguments
node | a |
Value
TRUE if node uses a reverse temporal ordering, FALSE when this is not the case.
See Also
Examples
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
is_reversed(find_sequence(dts_ctree, c(0, 0)))
is_reversed(find_sequence(dts_ctree, c(1, 0), reverse = TRUE))
Test if the object is a vlmc model
Description
This function returns TRUE for VLMC models and FALSE for other objects.
Usage
is_vlmc(x)
Arguments
x | an R object. |
Value
TRUE for VLMC models.
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts)
# should be true
is_ctx_tree(model)
# should be true
is_vlmc(model)
# should be false
is_covlmc(model)
Log-Likelihood of a VLMC with covariates
Description
This function evaluates the log-likelihood of a VLMC with covariates fitted on a discrete time series.
Usage
## S3 method for class 'covlmc'
logLik(object, initial = c("truncated", "specific", "extended"), ...)
Arguments
object | the covlmc representation. |
initial | specifies the likelihood function, more precisely the way the first few observations for which contexts cannot be calculated are integrated in the likelihood. Defaults to "truncated". |
... | additional parameters for logLik. |
Value
an object of class logLik. This is a number, the log-likelihood of the (CO)VLMC with the following attributes:
- df: the number of parameters used by the VLMC for this likelihood calculation
- nobs: the number of observations included in this likelihood calculation
- initial: the value of the initial parameter used to compute this likelihood
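Because the result carries the df and nobs attributes of a standard logLik object, it can presumably be passed to the information criteria of the stats package. The sketch below builds such an object by hand, with made-up values, to show how those two attributes enter stats::AIC() and stats::BIC():

```r
# A logLik object with made-up values, built by hand to show how the df and
# nobs attributes are used by the generic information criteria in stats.
ll <- structure(-120.5, df = 6L, nobs = 200L, class = "logLik")
stats::AIC(ll)  # -2 * (-120.5) + 2 * 6 = 253
stats::BIC(ll)  # -2 * (-120.5) + 6 * log(200)
```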
See Also
Examples
## Likelihood for a fitted VLMC with covariates.
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
ll <- logLik(m_cov)
attributes(ll)
Log-Likelihood of a VLMC
Description
This function evaluates the log-likelihood of a VLMC fitted on a discrete time series.
Usage
## S3 method for class 'vlmc'
logLik(object, initial = c("truncated", "specific", "extended"), ...)

## S3 method for class 'vlmc_cpp'
logLik(object, initial = c("truncated", "specific", "extended"), ...)
Arguments
object | the vlmc representation. |
initial | specifies the likelihood function, more precisely the way the first few observations for which contexts cannot be calculated are integrated in the likelihood. Defaults to "truncated". |
... | additional parameters for logLik. |
Value
an object of class logLik. This is a number, the log-likelihood of the (CO)VLMC with the following attributes:
- df: the number of parameters used by the VLMC for this likelihood calculation
- nobs: the number of observations included in this likelihood calculation
- initial: the value of the initial parameter used to compute this likelihood
See Also
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
m_nocovariate <- vlmc(dts)
ll <- logLik(m_nocovariate)
ll
attributes(ll)
Log-Likelihood of a VLMC
Description
This function evaluates the log-likelihood of a VLMC fitted on a discrete time series. When the optional argument newdata is provided, the function instead evaluates the log-likelihood of this (new) discrete time series.
Usage
loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  ...
)

## S3 method for class 'vlmc'
loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  ...
)

## S3 method for class 'vlmc_cpp'
loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  ...
)
Arguments
vlmc | the vlmc representation. |
newdata | an optional discrete time series. |
initial | specifies the likelihood function, more precisely the way the first few observations for which contexts cannot be calculated are integrated in the likelihood. Defaults to "truncated". |
ignore | specifies the number of initial values for which the log-likelihood will not be computed. The minimal number depends on the likelihood function as detailed below. |
... | additional parameters for loglikelihood. |
Details
The definition of the likelihood function depends on the value of the initial parameter, see the section below as well as the dedicated vignette: vignette("likelihood", package = "mixvlmc").
For VLMC objects, the method loglikelihood.vlmc will be used. For VLMC with covariates, loglikelihood.covlmc will be called instead. For more information on loglikelihood methods, use methods(loglikelihood) and the associated documentation.
Value
an object of class logLikMixVLMC and logLik. This is a number, the log-likelihood of the (CO)VLMC with the following attributes:
- df: the number of parameters used by the VLMC for this likelihood calculation
- nobs: the number of observations included in this likelihood calculation
- initial: the value of the initial parameter used to compute this likelihood
Likelihood calculation
In a (CO)VLMC of depth() = k, we need k past values in order to compute the context of a given observation. As a consequence, in a time series x, the contexts of x[1] to x[k] are unknown. Depending on the value of initial, different likelihood functions are used to tackle this difficulty:
- initial=="truncated" (default): the likelihood is computed using only x[(k+1):length(x)].
- initial=="specific": the likelihood is computed on the full time series using a specific context for each of the initial values, x[1] to x[k]. Each of these specific contexts is unique, leading to a perfect likelihood of 1 (0 in log scale). Thus the numerical value of the likelihood is identical to the one obtained with initial=="truncated", but it is computed on length(x) observations with a model that has more parameters than in the previous case.
- initial=="extended": the likelihood is computed on the full time series using an extended context matching for the initial values, x[1] to x[k]. This can be seen as a compromise between the two other possibilities: the relaxed context matching generally needs to turn internal nodes of the context tree into actual contexts, increasing the number of parameters, but not as much as with "specific". However, the likelihood of, say, x[1] with an empty context is generally not 1, and thus the full likelihood is smaller than the one computed with "specific".
In all cases, the first ignore values of the time series are not included in the computed likelihood, but they are still used to compute contexts. If ignore is not specified, it is set to the minimal possible value, that is k for the truncated likelihood and 0 for the other ones. If it is specified, it must be greater than or equal to k for the truncated likelihood.
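The truncated case and the role of ignore can be sketched on a hypothetical depth 1 model given as a transition matrix P. truncated_loglik() is an illustrative helper, not mixvlmc code. Following the description above, the "specific" likelihood of the same series would have the same value but would be reported with nobs = length(x):

```r
# Hypothetical order-1 (depth k = 1) model: P[previous state, next state]
P <- matrix(c(0.8, 0.2,
              0.3, 0.7),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("a", "b"), c("a", "b")))

# Truncated log-likelihood: x[1] has no context, so it is skipped; `ignore`
# skips more initial values, which are still used as contexts.
truncated_loglik <- function(x, P, ignore = 1L) {
  stopifnot(ignore >= 1L)  # must be at least k = 1 for the truncated version
  idx <- if (length(x) > ignore) (ignore + 1L):length(x) else integer(0)
  # character matrix indexing: one (previous, next) pair per row
  structure(sum(log(P[cbind(x[idx - 1L], x[idx])])), nobs = length(idx))
}

x <- c("a", "a", "b", "b", "a")
truncated_loglik(x, P)              # log(0.8 * 0.2 * 0.7 * 0.3), nobs = 4
truncated_loglik(x, P, ignore = 2)  # log(0.2 * 0.7 * 0.3), nobs = 3
```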
See the dedicated vignette for a more mathematically oriented discussion: vignette("likelihood", package = "mixvlmc").
See Also
Examples
## Likelihood for a fitted VLMC.
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
m_nocovariate <- vlmc(dts)
ll <- loglikelihood(m_nocovariate)
ll
attr(ll, "nobs")
attr(ll, "df")
## Likelihood for a new time series with previously fitted VLMC.
pc_new <- powerconsumption[powerconsumption$week == 11, ]
dts_new <- cut(pc_new$active_power, breaks = breaks, labels = labels)
ll_new <- loglikelihood(m_nocovariate, newdata = dts_new)
ll_new
attributes(ll_new)
ll_new_specific <- loglikelihood(m_nocovariate, initial = "specific", newdata = dts_new)
ll_new_specific
attributes(ll_new_specific)
ll_new_extended <- loglikelihood(m_nocovariate, initial = "extended", newdata = dts_new)
ll_new_extended
attributes(ll_new_extended)
Log-Likelihood of a VLMC with covariates
Description
This function evaluates the log-likelihood of a VLMC with covariates fitted on a discrete time series. When the optional argument newdata is provided, the function instead evaluates the log-likelihood of this (new) discrete time series on the new covariates, which must be provided through the newcov parameter.
Usage
## S3 method for class 'covlmc'
loglikelihood(
  vlmc,
  newdata,
  initial = c("truncated", "specific", "extended"),
  ignore,
  newcov,
  ...
)
Arguments
vlmc | the covlmc representation. |
newdata | an optional discrete time series. |
initial | specifies the likelihood function, more precisely the way thefirst few observations for which contexts cannot be calculated are integratedin the likelihood. Defaults to |
ignore | specifies the number of initial values for which the log-likelihood will not be computed. The minimal number depends on the likelihood function, as detailed below. |
newcov | an optional data frame with the new values for the covariates. |
... | additional parameters for loglikelihood. |
Details
The definition of the likelihood function depends on the value of the initial parameter. See the section below as well as the dedicated vignette: vignette("likelihood", package = "mixvlmc").
Value
an object of class logLikMixVLMC and logLik. This is a number, the log-likelihood of the (CO)VLMC, with the following attributes:
df: the number of parameters used by the VLMC for this likelihood calculation
nobs: the number of observations included in this likelihood calculation
initial: the value of the initial parameter used to compute this likelihood
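The df and nobs attributes are what make the result usable with information criteria. The sketch below uses a plain stats logLik object with made-up values, not an actual logLikMixVLMC result, to show how stats::AIC() and stats::BIC() read these two attributes.

```r
# Made-up log-likelihood value and attributes, for illustration only
ll <- structure(-120.5, df = 4, nobs = 100, class = "logLik")
aic <- AIC(ll) # -2 * (-120.5) + 2 * 4 = 249, uses the df attribute
bic <- BIC(ll) # -2 * (-120.5) + log(100) * 4, also uses the nobs attribute
c(AIC = aic, BIC = bic)
```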
likelihood calculation
In a (CO)VLMC of depth()=k, we need k past values to compute the context of a given observation. As a consequence, in a time series x, the contexts of x[1] to x[k] are unknown. Depending on the value of initial, different likelihood functions are used to tackle this difficulty:
initial=="truncated" (default): the likelihood is computed using only x[(k+1):length(x)].
initial=="specific": the likelihood is computed on the full time series using a specific context for the initial values, x[1] to x[k]. Each of these specific contexts is unique, which gives a perfect likelihood of 1 (0 in log scale). The numerical value of the likelihood is therefore identical to the one obtained with initial=="truncated". However, it is computed on length(x) observations with a model that has more parameters than in the truncated case.
initial=="extended": the likelihood is computed on the full time series using an extended context matching for the initial values, x[1] to x[k]. This is a compromise between the two other possibilities. The relaxed context matching generally requires turning internal nodes of the context tree into actual contexts, which increases the number of parameters, but not as much as with "specific". However, the likelihood of, say, x[1] with an empty context is generally not 1, so the full likelihood is smaller than the one computed with "specific".
In all cases, the first ignore values of the time series are not included in the computed likelihood, but they are still used to compute contexts. If ignore is not specified, it is set to the minimal possible value: k for the truncated likelihood and 0 for the other ones. If it is specified, it must be greater than or equal to k for truncated.
See the dedicated vignette for a more mathematically oriented discussion: vignette("likelihood", package = "mixvlmc").
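With covariates, each context carries its own logistic model. The truncated log-likelihood is then the sum, over the retained observations, of the log-probabilities that these models assign to the observed states.

The base R sketch below illustrates this for a binary state space. It fits one stats::glm() model with a binomial link per length-1 context; the package options describe this engine for two-state spaces. The following are all assumptions made for the illustration:

- the simulated series and covariate;
- the fixed length-1 contexts;
- the use of the covariate value at the previous time step.

```r
set.seed(1)
n <- 200
x <- rbinom(n, 1, 0.5)                      # hypothetical binary series
day <- rep(c(FALSE, TRUE), length.out = n)  # hypothetical covariate
# k = 1: x[1] has no context and is dropped (truncated likelihood).
# Response x[t], context x[t-1], covariate taken (by assumption) at t-1.
d <- data.frame(y = x[-1], ctx = factor(x[-n]), day = day[-n])
# One logistic model per context
fits <- lapply(split(d, d$ctx), function(dc) {
  glm(y ~ day, family = binomial(), data = dc)
})
# Sum over contexts of the log-probabilities of the observed states
ll_manual <- sum(sapply(fits, function(f) {
  sum(dbinom(f$y, size = 1, prob = fitted(f), log = TRUE))
}))
ll_glm <- sum(sapply(fits, function(f) as.numeric(logLik(f))))
c(manual = ll_manual, glm = ll_glm, nobs = nrow(d))
```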
See Also
Examples
## Likelihood for a fitted VLMC with covariates.
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
ll <- loglikelihood(m_cov)
ll
attr(ll, "nobs")
## Likelihood for new time series and covariates with previously
## fitted VLMC with covariates
pc_new <- powerconsumption[powerconsumption$week == 11, ]
dts_new <- cut(pc_new$active_power, breaks = breaks, labels = labels)
dts_cov_new <- data.frame(day_night = (pc_new$hour >= 7 & pc_new$hour <= 17))
ll_new <- loglikelihood(m_cov, newdata = dts_new, newcov = dts_cov_new)
ll_new
attributes(ll_new)
Merged contexts in a COVLMC
Description
The function returns NULL when the context represented by the node parameter is not merged with another context (see is_merged()). Otherwise, it returns a list of the contexts with which this one is merged.
Usage
merged_with(node)
Arguments
node | A |
Details
If the context is merged, the function returns a list with one value for each element in the state space (see states()). The value is NULL if the corresponding context is not merged with the node context; otherwise it is a ctx_node_covlmc object. A context merged with node differs from the context represented by node only in its oldest value. That is the first value in temporal order, and the last one in the reverse order used by the context tree; this value is used as its name in the list. For instance, suppose the context ABC is merged only with CBC (both represented in temporal ordering). The resulting list is then of the form list("A" = NULL, "B" = NULL, "C" = ctx_node_covlmc(CBC)).
Value
NULL or a list of contexts merged with node, represented by ctx_node_covlmc objects
See Also
Examples
pc_week_15_16 <- powerconsumption[powerconsumption$week %in% c(15, 16), ]
elec <- pc_week_15_16$active_power
elec_dts <- cut(elec, breaks = c(0, 0.4, 2, 8), labels = c("low", "typical", "high"))
elec_cov <- data.frame(day = (pc_week_15_16$hour >= 7 & pc_week_15_16$hour <= 18))
elec_tune <- tune_covlmc(elec_dts, elec_cov, min_size = 5)
elec_model <- prune(as_covlmc(elec_tune), alpha = 3.961e-10)
ctxs <- contexts(elec_model)
for (ctx in ctxs) {
  if (is_merged(ctx)) {
    print(ctx)
    cat("\nis merged with\n\n")
    print(merged_with(ctx))
  }
}
Predictive quality metrics for context based models
Description
This function computes and returns predictive quality metrics for context based models such as VLMC and VLMC with covariates.
Usage
metrics(model, ...)
Arguments
model | The context based model on which to compute predictive metrics. |
... | Additional parameters for predictive metrics computation. |
Details
A context based model computes transition probabilities for its contexts. With a maximum transition probability decision rule, these probabilities can be used to predict the state most likely to follow the current one, given the context (see predict.vlmc()). The quality of these predictions is evaluated using standard metrics, including:
accuracy
the full confusion matrix
the area under the ROC curve (AUC), considering the context based model as a (conditional) probability estimator. We use the Hand and Till (2001) multiclass AUC when the state space has more than 2 states.
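The three metrics can be illustrated in base R on made-up data for a two-state series. The sketch below does three things:

- it applies the maximum probability rule to hypothetical next-state probabilities;
- it builds the confusion matrix with predicted values in rows and true values in columns;
- it computes the two-class AUC with the rank (Mann-Whitney) formula.

The package itself lists pROC among its imports. The rank formula is shown here as a standard equivalent computation, not as the package's code.

```r
# Hypothetical observed next states and predicted P(next state = "1")
truth <- factor(c("0", "1", "1", "0", "1", "0", "1", "1"), levels = c("0", "1"))
prob1 <- c(0.2, 0.8, 0.6, 0.4, 0.3, 0.1, 0.9, 0.7)

# Maximum transition probability rule (two states: predict "1" if P("1") > 0.5)
pred <- factor(ifelse(prob1 > 0.5, "1", "0"), levels = levels(truth))

accuracy <- mean(pred == truth)
# Predicted values in rows, true values in columns
conf_mat <- table(predicted = pred, true = truth)

# Two-class AUC via the Mann-Whitney rank statistic
auc_binary <- function(score, positive) {
  r <- rank(score)
  n_pos <- sum(positive)
  n_neg <- sum(!positive)
  (sum(r[positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
auc <- auc_binary(prob1, truth == "1")

accuracy # 0.875
conf_mat
auc      # 14/15, about 0.933
```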
Value
The returned value is guaranteed to have at least three components:
accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values in rows and true values in columns
auc: the AUC of the predictive model
References
David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. doi:10.1023/A:1010920819831.
See Also
metrics.vlmc(), metrics.ctx_node(), contexts.vlmc(), predict.vlmc().
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
model <- vlmc(dts)
metrics(model)
Predictive quality metrics for VLMC with covariates
Description
This function computes and returns predictive quality metrics for context based models such as VLMC and VLMC with covariates.
Usage
## S3 method for class 'covlmc'
metrics(model, ...)
## S3 method for class 'metrics.covlmc'
print(x, ...)
Arguments
model | The context based model on which to compute predictive metrics. |
... | Additional parameters for predictive metrics computation. |
x | A metrics.covlmc object, results of a call to |
Details
A context based model computes transition probabilities for its contexts. With a maximum transition probability decision rule, these probabilities can be used to predict the state most likely to follow the current one, given the context (see predict.vlmc()). The quality of these predictions is evaluated using standard metrics, including:
accuracy
the full confusion matrix
the area under the ROC curve (AUC), considering the context based model as a (conditional) probability estimator. We use the Hand and Till (2001) multiclass AUC when the state space has more than 2 states.
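For more than two states, the Hand and Till (2001) measure averages pairwise AUCs. A(i|j) is the probability that a random observation of state i receives a higher estimated probability of state i than a random observation of state j. The multiclass AUC is the mean, over all unordered pairs {i, j}, of (A(i|j) + A(j|i)) / 2.

The base R sketch of that formula below is illustrative only. hand_till_auc() is a hypothetical helper that assumes every state occurs in truth, and the package's own computation may differ in implementation details.

```r
# Hand and Till (2001) multiclass AUC from a matrix of state probabilities
# (one column per state, named by state) and the observed states.
# Assumes every state occurs at least once in truth.
hand_till_auc <- function(probs, truth) {
  truth <- factor(truth, levels = colnames(probs))
  # A(i | j): AUC of the score P(state i) for separating i from j
  a_ij <- function(i, j) {
    keep <- truth %in% c(i, j)
    score <- probs[keep, i]
    pos <- truth[keep] == i
    r <- rank(score)
    n_pos <- sum(pos)
    n_neg <- sum(!pos)
    (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  }
  pairs <- combn(levels(truth), 2)
  mean(apply(pairs, 2, function(p) (a_ij(p[1], p[2]) + a_ij(p[2], p[1])) / 2))
}

# Made-up probabilities that separate every pair of states perfectly
probs <- rbind(
  c(0.8, 0.1, 0.1), c(0.7, 0.2, 0.1),
  c(0.1, 0.8, 0.1), c(0.2, 0.7, 0.1),
  c(0.1, 0.1, 0.8), c(0.1, 0.2, 0.7)
)
colnames(probs) <- c("low", "typical", "high")
truth <- c("low", "low", "typical", "typical", "high", "high")
hand_till_auc(probs, truth) # 1
```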
Value
An object of class metrics.covlmc with the following components:
accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values in rows and true values in columns
auc: the AUC of the predictive model
The object has a print method that recalls basic information about the model together with the values of the components above.
Methods (by generic)
print(metrics.covlmc): Prints the predictive metrics of the VLMC model with covariates.
Extended contexts
As explained in detail in the loglikelihood.covlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first initial values of a time series do not in general have a proper context for a COVLMC with a non zero order. To predict something meaningful for those values, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.covlmc() with the parameter initial="extended". All covlmc functions that need to manipulate initial values with no proper context use the same approach.
References
David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. doi:10.1023/A:1010920819831.
See Also
metrics.vlmc(), metrics.ctx_node(), contexts.vlmc(), predict.vlmc().
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
metrics(m_cov)
Predictive quality metrics for a node of a context tree
Description
This function computes and returns predictive quality metrics for a node (ctx_node) extracted from a context tree.
Usage
## S3 method for class 'ctx_node'
metrics(model, ...)
Arguments
model | T |
... | Additional parameters for predictive metrics computation. |
Details
Compared to metrics.vlmc(), this function focuses on a single context and assesses the quality of its predictions, disregarding observations that have other contexts. Apart from this limited scope, the function operates as metrics.vlmc().
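Restricting a metric to one context amounts to keeping only the time steps at which that context is active. The sketch below does this for accuracy on made-up data. It uses a context of length 1 (the previous value), which is a simplification, since VLMC contexts have variable lengths.

```r
# Made-up observed series and one-step-ahead predictions:
# pred[t] is the prediction for x[t] made from the past (none for t = 1)
x    <- c("A", "B", "A", "A", "B", "B", "A", "B")
pred <- c(NA,  "B", "A", "B", "B", "A", "A", "B")

ctx <- "A" # a context of length 1: the previous value
# time steps t whose previous value x[t - 1] equals ctx
t_ctx <- which(c(FALSE, x[-length(x)] == ctx))
# accuracy restricted to those time steps
ctx_accuracy <- mean(pred[t_ctx] == x[t_ctx])
t_ctx        # 2 4 5 8
ctx_accuracy # 0.75
```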
Value
The returned value is guaranteed to have at least three components:
accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values in rows and true values in columns
auc: the AUC of the predictive model
References
David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. doi:10.1023/A:1010920819831.
See Also
metrics.vlmc(), metrics.ctx_node(), contexts.vlmc(), predict.vlmc().
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts)
model_ctxs <- contexts(model)
metrics(model_ctxs[[4]])
Predictive quality metrics for a node of a COVLMC context tree
Description
This function computes and returns predictive quality metrics for a node (ctx_node_covlmc) extracted from a covlmc.
Usage
## S3 method for class 'ctx_node_covlmc'
metrics(model, ...)
Arguments
model | A |
... | Additional parameters for predictive metrics computation. |
Details
Compared to metrics.covlmc(), this function focuses on a single context and assesses the quality of its predictions, disregarding observations that have other contexts. Apart from this limited scope, the function operates as metrics.covlmc().
Value
an object of class metrics.covlmc with the following components:
accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values in rows and true values in columns
auc: the AUC of the predictive model
References
David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. doi:10.1023/A:1010920819831.
See Also
metrics.vlmc(), metrics.ctx_node(), contexts.vlmc(), predict.vlmc().
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
m_ctxs <- contexts(m_cov)
## get the predictive metrics for each context
lapply(m_ctxs, metrics)
Predictive quality metrics for VLMC
Description
This function computes and returns predictive quality metrics for context based models such as VLMC and VLMC with covariates.
Usage
## S3 method for class 'vlmc'
metrics(model, ...)
## S3 method for class 'metrics.vlmc'
print(x, ...)
Arguments
model | The context based model on which to compute predictive metrics. |
... | Additional parameters for predictive metrics computation. |
x | A metrics.vlmc object, results of a call to |
Details
A context based model computes transition probabilities for its contexts. With a maximum transition probability decision rule, these probabilities can be used to predict the state most likely to follow the current one, given the context (see predict.vlmc()). The quality of these predictions is evaluated using standard metrics, including:
accuracy
the full confusion matrix
the area under the ROC curve (AUC), considering the context based model as a (conditional) probability estimator. We use the Hand and Till (2001) multiclass AUC when the state space has more than 2 states.
Value
An object of class metrics.vlmc with the following components:
accuracy: the accuracy of the predictions
conf_mat: the confusion matrix of the predictions, with predicted values in rows and true values in columns
auc: the AUC of the predictive model
The object has a print method that recalls basic information about the model together with the values of the components above.
Methods (by generic)
print(metrics.vlmc): Prints the predictive metrics of the VLMC model.
Extended contexts
As explained in detail in the loglikelihood.vlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first initial values of a time series do not in general have a proper context for a VLMC with a non zero order. To predict something meaningful for those values, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.vlmc() with the parameter initial="extended". All vlmc functions that need to manipulate initial values with no proper context use the same approach.
References
David J. Hand and Robert J. Till (2001). "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems." Machine Learning 45(2), p. 171–186. doi:10.1023/A:1010920819831.
See Also
metrics.vlmc(), metrics.ctx_node(), contexts.vlmc(), predict.vlmc().
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
breaks <- c(
  0,
  median(powerconsumption$active_power, na.rm = TRUE),
  max(powerconsumption$active_power, na.rm = TRUE)
)
labels <- c(0, 1)
dts <- cut(pc$active_power, breaks = breaks, labels = labels)
model <- vlmc(dts)
metrics(model)
Logistic model of a COVLMC context
Description
This function returns a representation of the logistic model associated with a COVLMC context, from its node in the associated context tree.
Usage
model(node, type = c("coef", "full"))
Arguments
node | A |
type | specifies the model information to return, either thecoefficients only ( |
Details
Full model extraction is only possible if the COVLMC model was not fully trimmed (see trim.covlmc()). Notice that find_sequence.covlmc() can produce nodes that are not contexts: in this case, this function returns NULL.
Value
if node is a context, either the coefficients of the logistic model (as a vector or a matrix, depending on the size of the state space) or the logistic model itself as an R object. If node is not a context, NULL.
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10)
vals <- states(m_cov)
node <- find_sequence(m_cov, c(vals[1], vals[1]))
node
model(node)
model(node, type = "full")
Find the parent of a node in a context tree
Description
This function returns the parent node of the node represented by the node parameter. The result is NULL if node is the root node of its context tree (representing the empty sequence).
Usage
parent(node)
## S3 method for class 'ctx_node'
parent(node)
## S3 method for class 'ctx_node_cpp'
parent(node)
Arguments
node | a |
Details
Each node of a context tree represents a sequence. When find_sequence() is called with success, the returned object represents the corresponding node in the context tree. Unless the original sequence is empty, this node has a parent node, which is returned as a ctx_node object by the present function. Another interpretation is that the function returns the node object associated with the sequence obtained by removing the oldest value from the original sequence.
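The second interpretation can be stated on plain vectors. The helper below is hypothetical. It works on sequences stored in temporal order (oldest value first), not on ctx_node objects. The parent drops the oldest value, and the empty sequence (the root) has no parent.

```r
# Hypothetical helper on plain vectors in temporal order (oldest value first)
parent_sequence <- function(s) {
  if (length(s) == 0) {
    return(NULL) # the empty sequence is the root: no parent
  }
  s[-1] # drop the oldest value
}

parent_sequence(c(1, 0, 0)) # c(0, 0)
parent_sequence(c(0))       # numeric(0), the empty sequence (root)
```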
Value
a ctx_node object if node does not correspond to the empty sequence, or NULL when it does
Examples
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3)
ctx_00 <- find_sequence(dts_ctree, c(0, 0))
## the parent sequence/node corresponds to the 0 context
parent(ctx_00)
identical(parent(ctx_00), find_sequence(dts_ctree, c(0)))
## C++ backend
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 3, backend = "C++")
ctx_00 <- find_sequence(dts_ctree, c(0, 0))
## the parent sequence/node corresponds to the 0 context
parent(ctx_00)
identical(parent(ctx_00), find_sequence(dts_ctree, c(0)))
Plot the results of automatic (CO)VLMC complexity selection
Description
This function plots the results of tune_vlmc() or tune_covlmc().
Usage
## S3 method for class 'tune_vlmc'
plot(
  x,
  value = c("criterion", "likelihood"),
  cutoff = c("quantile", "native"),
  ...
)
## S3 method for class 'tune_covlmc'
plot(
  x,
  value = c("criterion", "likelihood"),
  cutoff = c("quantile", "native"),
  ...
)
Arguments
x | a |
value | the criterion to plot (default "criterion"). |
cutoff | the scale used for the cut off criterion (default "quantile") |
... | additional parameters passed to |
Details
The standard plot shows the evolution of the criterion used to select the model (AIC() or BIC()) as a function of the cut off criterion expressed in the quantile scale. The quantile scale is used by default to offer a common default behaviour between vlmc() and covlmc(). Parameters can be used to display the loglikelihood() of the model instead (by setting value="likelihood") and to use the native scale for the cut off when available (by setting cutoff="native").
Value
the tune_vlmc (or tune_covlmc) object x, invisibly
Customisation
The function sets several defaults before calling base::plot(), namely:
type: "l" by default to use a line representation;
xlab: "Cut off (quantile scale)" by default, adapted to the actual scale;
ylab: the name of the criterion or "Log likelihood".
These parameters can be overridden by specifying other values when calling the function. All parameters specified in addition to x, value and cutoff are passed to base::plot().
Examples
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
tune_result <- tune_vlmc(dts)
## default plot
plot(tune_result)
## likelihood
plot(tune_result, value = "likelihood")
## parameters overriding
plot(tune_result, value = "likelihood", xlab = "Cut off", type = "b")
pc <- powerconsumption[powerconsumption$week %in% 10:12, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
dts_best_model_tune <- tune_covlmc(dts, dts_cov, criterion = "AIC")
plot(dts_best_model_tune)
plot(dts_best_model_tune, value = "likelihood")
Report the positions of a sequence associated to a node
Description
This function returns the positions of the sequence represented by node in the time series used to build the context tree in which the sequence is represented. This is only possible if those positions were saved during the construction of the context tree. If positions were not saved, a call to this function produces an error.
Usage
positions(node)
## S3 method for class 'ctx_node'
positions(node)
## S3 method for class 'ctx_node_cpp'
positions(node)
Arguments
node | a |
Details
A position of a sequence ctx in the time series x is an index value t such that the sequence ends with x[t]. Thus x[t+1] is after the context. For instance, if x=c(0, 0, 1, 1) and ctx=c(0, 1) (in standard state order), then the position of ctx in x is 3.
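The definition can be checked on plain vectors with a naive scan. sequence_positions() below is a hypothetical helper; the package records positions while building the context tree. The helper reports every end index t at which ctx matches, including t = length(x), for which no x[t+1] exists. This documentation does not say whether the package keeps such a final match.

```r
# Hypothetical helper: all end indices t such that x[(t - k + 1):t] equals ctx
# (both in temporal order), found by a naive scan
sequence_positions <- function(x, ctx) {
  k <- length(ctx)
  if (k == 0 || k > length(x)) {
    return(integer(0))
  }
  ends <- k:length(x)
  hits <- vapply(ends, function(t) all(x[(t - k + 1):t] == ctx), logical(1))
  ends[hits]
}

sequence_positions(c(0, 0, 1, 1), c(0, 1)) # 3, as in the example above
sequence_positions(c(0, 1, 0, 1), c(0, 1)) # 2 4
```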
Value
positions of the sequence represented by node in the original time series, as an integer vector
Examples
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
dts_tree <- ctx_tree(dts, max_depth = 3, min_size = 5)
subseq <- find_sequence(dts_tree, factor(c("B", "A"), levels = c("A", "B", "C")))
if (!is.null(subseq)) {
  positions(subseq)
}
Individual household electric power consumption
Description
A data set containing measurements of the electric power consumption of onehousehold with a time resolution of 10 minutes for the full year of 2008.
Usage
powerconsumption
Format
A data frame with 52704 rows and 15 variables:
- month
month of 2008
- month_day
day of the month
- hour
hour (0 to 23)
- minute
starting minute of the 10 minutes period of this row
- active_power
global average active power on the 10 minute period (in kilowatt)
- reactive_power
global average reactive power on the 10 minute period (in kilowatt)
- voltage
Average voltage on the 10 minute period (in volt)
- intensity
global average current intensity on the 10 minute period (in ampere)
- sub_metering_1
energy sub-metering No. 1 (in watt-hour of active energy averaged over the 10 minute period). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered)
- sub_metering_2
energy sub-metering No. 2 (in watt-hour of active energy averaged over the 10 minute period). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
- sub_metering_3
energy sub-metering No. 3 (in watt-hour of active energy averaged over the 10 minute period). It corresponds to an electric water-heater and an air-conditioner.
- week
week number
- week_day
day of the week from 1 = Sunday to 7 = Saturday
- year_day
day of the year from 1 to 366 (2008 is a leap year)
- date_time
Date and time in POSIXct format
Details
This is a simplified version of the full data available on the UCI Machine Learning Repository under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, and provided by Georges Hebrail and Alice Berard.
The original data have been averaged over a 10 minute time period (discarding missing data in each period). The data set contains only the measurements from year 2008.
Notice that the variables are expressed in different units. In particular, the sub-meters measure active energy (in watt-hour), while the global active power is expressed in kilowatt.
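The 10 minute averaging described above can be sketched in base R with tapply() on made-up minute-level readings. The actual preprocessing script used to build powerconsumption is not part of this documentation. Here missing readings are discarded within each period through na.rm = TRUE.

```r
set.seed(42)
minutes <- 0:59
power <- runif(60, 0.2, 3)    # made-up minute-level active power (kW)
power[c(5, 17, 18)] <- NA     # made-up missing readings
period <- minutes %/% 10 * 10 # starting minute of each 10 minute period
# Average per period, discarding missing readings
# (a period with only missing readings would give NaN)
avg <- tapply(power, period, mean, na.rm = TRUE)
avg
```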
Source
Individual household electric power consumption, 2012, G. Hebrail and A. Berard, UC Irvine Machine Learning Repository. doi:10.24432/C58K54
Next state prediction in a discrete time series for a VLMC with covariates
Description
This function computes one step ahead predictions for a discrete time series based on a VLMC with covariates.
Usage
## S3 method for class 'covlmc'
predict(
  object,
  newdata,
  newcov,
  type = c("raw", "probs"),
  final_pred = TRUE,
  ...
)
Arguments
object | a fitted covlmc object. |
newdata | a time series adapted to the covlmc object. |
newcov | a data frame with the new values for the covariates. |
type | character indicating the type of prediction required. The default |
final_pred | if |
... | additional arguments. |
Details
Given a time series X, at time step t, a context is computed using observations from X[1] to X[t-1] (see the dedicated section). The prediction is then the most probable state for X[t] given the logistic model of this context and the corresponding values of the covariates. The time series of predictions is returned by the function when type="raw" (default case).
When type="probs", the function returns the probabilities of each state for X[t] as estimated by the logistic models. Those probabilities are returned as a matrix with column names given by the state names.
Value
A vector of predictions if type="raw" or a matrix of state probabilities if type="probs".
Extended contexts
As explained in detail in the loglikelihood.covlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first initial values of a time series do not in general have a proper context for a COVLMC with a non zero order. To predict something meaningful for those values, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.covlmc() with the parameter initial="extended". All covlmc functions that need to manipulate initial values with no proper context use the same approach.
Examples
pc <- powerconsumption[powerconsumption$week == 10, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.2, 0.7, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5, alpha = 0.5)
dts_probs <- predict(m_cov, dts[1:144], dts_cov[1:144, , drop = FALSE], type = "probs")
dts_preds <- predict(m_cov, dts[1:144], dts_cov[1:144, , drop = FALSE],
  type = "raw", final_pred = FALSE
)
Next state prediction in a discrete time series for a VLMC
Description
This function computes one step ahead predictions for a discrete time series based on a VLMC.
Usage
## S3 method for class 'vlmc'
predict(object, newdata, type = c("raw", "probs"), final_pred = TRUE, ...)
## S3 method for class 'vlmc_cpp'
predict(object, newdata, type = c("raw", "probs"), final_pred = TRUE, ...)
Arguments
object | a fitted vlmc object. |
newdata | a time series adapted to the vlmc object. |
type | character indicating the type of prediction required. The default |
final_pred | if |
... | additional arguments. |
Details
Given a time series X, at time step t, a context is computed using observations from X[1] to X[t-1] (see the dedicated section). The prediction is then the most probable state for X[t] given this context. Ties are broken according to the natural order in the state space, favouring "small" values. The time series of predictions is returned by the function when type="raw" (default case).
When type="probs", each X[t] is associated with the conditional probabilities of the next state given the context. Those probabilities are returned as a matrix with column names given by the state names.
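The maximum probability rule with this tie-breaking convention can be mimicked on a probability matrix with base R's max.col(ties.method = "first"). This assumes that the columns follow the natural order of the state space. The probability matrix below is made up for illustration.

```r
# Made-up next-state probabilities, columns in the natural state order
probs <- matrix(
  c(0.5, 0.5,  0.0,
    0.2, 0.3,  0.5,
    0.1, 0.45, 0.45),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("low", "typical", "high"))
)
# Most probable state per row; ties go to the first ("smallest") state
raw <- colnames(probs)[max.col(probs, ties.method = "first")]
raw # "low" "high" "typical"
```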
Value
A vector of predictions if type="raw" or a matrix of state probabilities if type="probs".
Extended contexts
As explained in detail in the loglikelihood.vlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first initial values of a time series do not in general have a proper context for a VLMC with a non zero order. To predict something meaningful for those values, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.vlmc() with the parameter initial="extended". All vlmc functions that need to manipulate initial values with no proper context use the same approach.
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts, min_size = 5)
predict(model, dts[1:5])
predict(model, dts[1:5], "probs")
## C++ backend
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts, min_size = 5, backend = "C++")
predict(model, dts[1:5])
predict(model, dts[1:5], "probs")
Print a context list
Description
This function prints a list of contexts, i.e. a contexts object listing ctx_node objects.
Usage
## S3 method for class 'contexts'
print(x, reverse = TRUE, ...)
Arguments
x | the |
reverse | specifies whether the contexts should be reported intemporal order ( |
... | additional arguments for the print function. |
Value
the x object, invisibly
See Also
Examples
dts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
dts_tree <- ctx_tree(dts, max_depth = 3)
print(contexts(dts_tree))
Prune a Variable Length Markov Chain (VLMC)
Description
This function prunes a VLMC.
Usage
prune(vlmc, alpha = 0.05, cutoff = NULL, ...)
## S3 method for class 'vlmc'
prune(vlmc, alpha = 0.05, cutoff = NULL, ...)
## S3 method for class 'vlmc_cpp'
prune(vlmc, alpha = 0.05, cutoff = NULL, ...)
Arguments
vlmc | a fitted VLMC model. |
alpha | number in (0,1] (default: 0.05) cut off value in quantile scale for pruning. |
cutoff | positive number: cut off value in native (log likelihood ratio) scale for pruning. Defaults to the value obtained from |
... | additional arguments for the prune function. |
Details
In general, pruning a VLMC is more efficient than constructing two VLMC (the base one and the pruned one). Consider building a VLMC with an a cut off and then pruning it with a b cut off (in quantile scale, with a > b). Up to numerical instabilities, this should produce the same VLMC as building the VLMC directly with a b cut off. Interesting cut off values can be extracted from a VLMC using the cutoff() function.
As automated model selection is provided by tune_vlmc(), the direct use of cutoff should be reserved for advanced exploration of the set of trees that can be obtained from a complex one, e.g. to implement model selection techniques that are not provided by tune_vlmc().
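The quantile and native scales can be related through a likelihood ratio test. Under the usual chi-squared approximation, twice the log-likelihood ratio has (number of states - 1) degrees of freedom, which gives the conversion sketched below.

This conversion is an assumption made for illustration: the helpers are hypothetical, and the package's exact conversion may differ. The sketch does show why a smaller alpha in quantile scale corresponds to a larger native cut off, and hence to more pruning.

```r
# Assumed conversion (chi-squared approximation of 2 * log likelihood ratio
# with n_states - 1 degrees of freedom); hypothetical helpers
quantile_to_native <- function(alpha, n_states) {
  stats::qchisq(alpha, df = n_states - 1, lower.tail = FALSE) / 2
}
native_to_quantile <- function(cutoff, n_states) {
  stats::pchisq(2 * cutoff, df = n_states - 1, lower.tail = FALSE)
}

quantile_to_native(c(0.1, 0.05, 0.01), 4) # increasing native cut offs
native_to_quantile(quantile_to_native(0.05, 4), 4) # back to 0.05
```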
Value
a pruned VLMC
See Also
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
base_model <- vlmc(dts, alpha = 0.1)
model_cuts <- cutoff(base_model)
pruned_model <- prune(base_model, model_cuts[3])
draw(pruned_model)
direct_simple <- vlmc(dts, alpha = model_cuts[3])
draw(direct_simple)
# pruned_model and direct_simple should be identical
all.equal(pruned_model, direct_simple)
Prune a Variable Length Markov Chain with covariates
Description
This function prunes a VLMC with covariates. This model must have been estimated with keep_data=TRUE to enable the pruning.
Usage
## S3 method for class 'covlmc'
prune(vlmc, alpha = 0.05, cutoff = NULL, ...)
Arguments
vlmc | a fitted VLMC model with covariates. |
alpha | number in (0,1) (default: 0.05) cutoff value in quantile scale for pruning. |
cutoff | not supported by the vlmc with covariates. |
... | additional arguments for the prune function. |
Details
Post pruning a VLMC with covariates is not as straightforward as the sameprocedure applied tovlmc() (seecutoff.vlmc() andprune.vlmc()). Forefficiency reasons,covlmc() estimates only the logistic models that areconsidered useful for a given set construction parameters. With a moreaggressive pruning threshold, some contexts become leaves of the context treeand new logistic models must be estimated. Thus the pruning opportunitiesgiven bycutoff.covlmc() are only a subset of interesting cut offs for agiven covlmc.
Nevertheless, covlmc() shares with vlmc() the principle that post pruning a covlmc should give the same model as building the covlmc directly, provided that the post pruning alpha is smaller than the alpha used to build the initial model.
Value
a pruned covlmc.
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5, keep_data = TRUE)
draw(m_cov)
m_cov_cuts <- cutoff(m_cov)
p_cov <- prune(m_cov, m_cov_cuts[1])
draw(p_cov)
Reverse Sequence
Description
This function reverses the order in which the sequence represented by the ctx_node parameter will be reported in other functions, mainly as_sequence().
Usage
## S3 method for class 'ctx_node'
rev(x)
Arguments
x | a |
Value
a ctx_node using the opposite ordering convention to that of the parameter of the function
See Also
Examples
dts <- c("A", "B", "C", "A", "A", "B", "B", "C", "C", "A")
dts_tree <- ctx_tree(dts, max_depth = 3)
res <- find_sequence(dts_tree, c("A", "B"))
print(res)
r_res <- rev(res)
print(r_res)
as_sequence(r_res)
Simulate a discrete time series for a covlmc
Description
This function simulates a time series from the distribution estimated by the given covlmc object.
Usage
## S3 method for class 'covlmc'
simulate(object, nsim = 1, seed = NULL, covariate, init = NULL, ...)
Arguments
object | a fitted covlmc object. |
nsim | length of the simulated time series (defaults to 1). |
seed | an optional random seed (see the dedicated section). |
covariate | values of the covariates. |
init | an optional initial sequence for the time series. |
... | additional arguments. |
Details
A VLMC with covariates model needs covariates to compute its transition probabilities. The covariates must be submitted as a data frame using the covariate argument. In addition, the time series can be initiated by a fixed sequence specified via the init parameter.
Value
a simulated discrete time series of the same type as the one used to build the covlmc, with a seed attribute (see the Random seed section). The result also has the dts class, which hides the seed attribute when using print or similar functions.
Extended contexts
As explained in detail in the loglikelihood.covlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first few values of a time series do not, in general, have a proper context for a COVLMC with a non zero order. In order to simulate something meaningful for those values, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.covlmc() with the parameter initial="extended". All covlmc functions that need to manipulate initial values with no proper context use the same approach.
Random seed
This function reproduces the behaviour of stats::simulate(). If seed is NULL, the function does not change the random generator state and returns the value of .Random.seed as a seed attribute in the return value. This can be used to reproduce the simulation results exactly by setting .Random.seed to this value. Notice that if the random seed has not been initialised by R so far, the function issues a call to runif(1) to perform this initialisation (as is done in stats::simulate()).
If seed is an integer, it is used in a call to set.seed() before the simulation takes place. The integer is saved as a seed attribute in the return value. The integer seed is completed by an attribute kind which contains the value as.list(RNGkind()), exactly as with stats::simulate(). The random generator state is reset to its original value at the end of the call.
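The seed protocol described above can be sketched in base R. The helper with_simulation_seed() below is hypothetical (it is not part of mixvlmc): it wraps an arbitrary generator function and applies the seed handling that this section attributes to stats::simulate(). Treat it as an illustration of the protocol, not as the package implementation.

```r
# Hypothetical helper, not part of mixvlmc: applies the seed protocol
# described above (modelled on stats::simulate()) around a generator.
with_simulation_seed <- function(seed, generator) {
  # initialise the random generator if needed, as stats::simulate() does
  if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
    runif(1)
  }
  if (is.null(seed)) {
    # leave the generator untouched and record its current state
    rng_state <- get(".Random.seed", envir = .GlobalEnv)
  } else {
    # save the current state and restore it when the function exits
    saved_state <- get(".Random.seed", envir = .GlobalEnv)
    on.exit(assign(".Random.seed", saved_state, envir = .GlobalEnv))
    set.seed(seed)
    rng_state <- structure(seed, kind = as.list(RNGkind()))
  }
  result <- generator()
  attr(result, "seed") <- rng_state
  result
}

# an integer seed gives reproducible draws
x1 <- with_simulation_seed(0, function() sample(c("A", "B", "C"), 20, replace = TRUE))
x2 <- with_simulation_seed(0, function() sample(c("A", "B", "C"), 20, replace = TRUE))

# with seed = NULL, the stored state can be used to replay the simulation
y <- with_simulation_seed(NULL, function() runif(3))
assign(".Random.seed", attr(y, "seed"), envir = .GlobalEnv)
z <- runif(3)
```

Restoring .Random.seed from the seed attribute of y replays exactly the draws stored in y. The calls that use an integer seed leave the global generator state as it was before the call.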
See Also
stats::simulate() for details and examples on the random number generator setting
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 5)
# new week with day light from 6:00 to 18:00
new_cov <- data.frame(day_night = rep(c(rep(FALSE, 59), rep(TRUE, 121), rep(FALSE, 60)), times = 7))
new_dts <- simulate(m_cov, nrow(new_cov), seed = 0, covariate = new_cov)
new_dts_2 <- simulate(m_cov, nrow(new_cov), seed = 0, covariate = new_cov, init = dts[1:10])
Simulate a discrete time series for a vlmc
Description
This function simulates a time series from the distribution estimated by the given vlmc object.
Usage
## S3 method for class 'vlmc'
simulate(object, nsim = 1L, seed = NULL, init = NULL, burnin = 0L, ...)
Arguments
object | a fitted vlmc object. |
nsim | length of the simulated time series (defaults to 1). |
seed | an optional random seed (see the dedicated section). |
init | an optional initial sequence for the time series. |
burnin | number of initial observations to discard or |
... | additional arguments. |
Details
The time series can be initiated by a fixed sequence specified via the init parameter.
Value
a simulated discrete time series of the same type as the one used to build the vlmc, with a seed attribute (see the Random seed section). The result also has the dts class, which hides the seed attribute when using print or similar functions.
Burn in (Warm up) period
When using a VLMC for simulation purposes, we are generally interested in the stationary distribution of the corresponding Markov chain. To reduce the dependence of the samples on the initial values and get closer to this stationary distribution (if it exists), it is recommended to discard the first samples, which are produced in a so-called "burn in" (or "warm up") period. The burnin parameter can be used to implement this approach. The VLMC is used to produce a sample of size burnin + nsim, but the first burnin values are discarded. Notice that these burn in values can be partially given by the init parameter if it is specified.
If burnin is set to "auto", the burnin period is set to 64 * context_number(object), following the heuristic proposed in Mächler and Bühlmann (2004).
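As a rough illustration of the burn in mechanism, the sketch below simulates a first order Markov chain (a special case of a VLMC) given by a transition matrix. It generates burnin + nsim values and discards the first burnin ones. The function simulate_with_burnin() and the matrix trans are hypothetical. The value 64 * 2 assumes that a full first order chain on two states has two contexts, which mimics the "auto" heuristic.

```r
# Sketch, not mixvlmc code: simulate a first order Markov chain given by a
# transition matrix 'trans' (rows sum to one), producing burnin + nsim
# values and discarding the first burnin ones.
simulate_with_burnin <- function(trans, nsim, burnin = 0L, init_state = 1L) {
  total <- burnin + nsim
  path <- integer(total)
  current <- init_state
  for (i in seq_len(total)) {
    # draw the next state from the row of the current state
    current <- sample.int(nrow(trans), 1L, prob = trans[current, ])
    path[i] <- current
  }
  # keep only the values generated after the burn in period
  path[burnin + seq_len(nsim)]
}

trans <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)
# assumption: two contexts for a full first order chain on two states,
# so an "auto"-like burn in would be 64 * 2 values
auto_burnin <- 64 * 2
set.seed(1)
sim <- simulate_with_burnin(trans, nsim = 500, burnin = auto_burnin)
```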
Random seed
This function reproduces the behaviour of stats::simulate(). If seed is NULL, the function does not change the random generator state and returns the value of .Random.seed as a seed attribute in the return value. This can be used to reproduce the simulation results exactly by setting .Random.seed to this value. Notice that if the random seed has not been initialised by R so far, the function issues a call to runif(1) to perform this initialisation (as is done in stats::simulate()).
If seed is an integer, it is used in a call to set.seed() before the simulation takes place. The integer is saved as a seed attribute in the return value. The integer seed is completed by an attribute kind which contains the value as.list(RNGkind()), exactly as with stats::simulate(). The random generator state is reset to its original value at the end of the call.
Extended contexts
As explained in detail in the loglikelihood.vlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first few values of a time series do not, in general, have a proper context for a VLMC with a non zero order. In order to simulate something meaningful for those values when init is not provided, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.vlmc() with the parameter initial="extended". All vlmc functions that need to manipulate initial values with no proper context use the same approach.
References
Mächler, M. and Bühlmann, P. (2004) "Variable Length Markov Chains: Methodology, Computing, and Software" Journal of Computational and Graphical Statistics, 13 (2), 435-455, doi:10.1198/1061860043524
See Also
stats::simulate() for details and examples on the random number generator setting
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts, min_size = 5)
new_dts <- simulate(model, 500, seed = 0)
new_dts_2 <- simulate(model, 500, seed = 0, init = dts[1:5])
new_dts_3 <- simulate(model, 500, seed = 0, burnin = 500)
Simulate a discrete time series for a vlmc
Description
This function simulates a time series from the distribution estimated by the given vlmc object.
Usage
## S3 method for class 'vlmc_cpp'
simulate(
  object,
  nsim = 1,
  seed = NULL,
  init = NULL,
  burnin = 0L,
  sample = c("fast", "slow", "R"),
  ...
)
Arguments
object | a fitted vlmc object. |
nsim | length of the simulated time series (defaults to 1). |
seed | an optional random seed (see the dedicated section). |
init | an optional initial sequence for the time series. |
burnin | number of initial observations to discard or |
sample | specifies which implementation of |
... | additional arguments. |
Details
The time series can be initiated by a fixed sequence specified via the init parameter.
Value
a simulated discrete time series of the same type as the one used to build the vlmc, with a seed attribute (see the Random seed section). The result also has the dts class, which hides the seed attribute when using print or similar functions.
Sampling method
The R backend for vlmc() uses base::sample() to generate samples for each context. Internally, this function sorts the probabilities of each state in decreasing probability order (among other things), which is not needed in our case. The C++ backend can be used with three different implementations:
sample="fast" uses a dedicated C++ implementation adapted to the data structures used internally. In general, the simulated time series obtained with this implementation will be different from the one generated with the R backend, even using the same seed.
sample="slow" uses another C++ implementation that mimics base::sample() in order to maximize the chance of providing identical simulation results regardless of the backend (when using the same random seed). This process is not perfect, as we use the std::lib sort algorithm, which is not guaranteed to give results identical to those of R's internal 'revsort'.
sample="R" uses direct calls to base::sample(). Results are guaranteed to be identical between the two backends, but at the price of a higher running time.
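To make the difference between these strategies concrete, the following sketch draws states by inverting the cumulative distribution. It uses one uniform number per draw and skips the sorting step performed by base::sample(). The function sample_inverse_cdf() is hypothetical and is not the C++ code used by the "fast" implementation. It only shows why two correct samplers can follow the same probabilities while producing different sequences from the same seed.

```r
# Hypothetical sampler, not the mixvlmc C++ code: draw n states from the
# probability vector 'prob' by inverting the cumulative distribution,
# using one uniform number per draw and no sorting step.
sample_inverse_cdf <- function(n, prob) {
  cum_prob <- cumsum(prob) / sum(prob)
  # findInterval() counts the cumulative probabilities that are <= u;
  # pmin() guards against rounding when u is extremely close to 1
  pmin(findInterval(runif(n), cum_prob) + 1L, length(prob))
}

prob <- c(0.2, 0.5, 0.3)
set.seed(0)
fast_draws <- sample_inverse_cdf(10000, prob)
set.seed(0)
r_draws <- sample(3, 10000, replace = TRUE, prob = prob)
# both follow 'prob', but the two sequences need not coincide
rbind(inverse_cdf = tabulate(fast_draws, 3), base_sample = tabulate(r_draws, 3)) / 10000
```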
Burn in (Warm up) period
When using a VLMC for simulation purposes, we are generally interested in the stationary distribution of the corresponding Markov chain. To reduce the dependence of the samples on the initial values and get closer to this stationary distribution (if it exists), it is recommended to discard the first samples, which are produced in a so-called "burn in" (or "warm up") period. The burnin parameter can be used to implement this approach. The VLMC is used to produce a sample of size burnin + nsim, but the first burnin values are discarded. Notice that these burn in values can be partially given by the init parameter if it is specified.
If burnin is set to "auto", the burnin period is set to 64 * context_number(object), following the heuristic proposed in Mächler and Bühlmann (2004).
Random seed
This function reproduces the behaviour of stats::simulate(). If seed is NULL, the function does not change the random generator state and returns the value of .Random.seed as a seed attribute in the return value. This can be used to reproduce the simulation results exactly by setting .Random.seed to this value. Notice that if the random seed has not been initialised by R so far, the function issues a call to runif(1) to perform this initialisation (as is done in stats::simulate()).
If seed is an integer, it is used in a call to set.seed() before the simulation takes place. The integer is saved as a seed attribute in the return value. The integer seed is completed by an attribute kind which contains the value as.list(RNGkind()), exactly as with stats::simulate(). The random generator state is reset to its original value at the end of the call.
Extended contexts
As explained in detail in the loglikelihood.vlmc() documentation and in the dedicated vignette("likelihood", package = "mixvlmc"), the first few values of a time series do not, in general, have a proper context for a VLMC with a non zero order. In order to simulate something meaningful for those values when init is not provided, we rely on the notion of extended context defined in the documents mentioned above. This follows the same logic as using loglikelihood.vlmc() with the parameter initial="extended". All vlmc functions that need to manipulate initial values with no proper context use the same approach.
References
Mächler, M. and Bühlmann, P. (2004) "Variable Length Markov Chains: Methodology, Computing, and Software" Journal of Computational and Graphical Statistics, 13 (2), 435-455, doi:10.1198/1061860043524
See Also
stats::simulate() for details and examples on the random number generator setting
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts, min_size = 5)
new_dts <- simulate(model, 500, seed = 0)
new_dts_2 <- simulate(model, 500, seed = 0, init = dts[1:5])
new_dts_3 <- simulate(model, 500, seed = 0, burnin = 500)
State space of a context tree
Description
This function returns the state space of a context tree.
Usage
states(ct)
Arguments
ct | a context tree. |
Value
the state space of the context tree.
Examples
dts <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
dts_ctree <- ctx_tree(dts, min_size = 1, max_depth = 2)
## should be c(0, 1)
states(dts_ctree)
Trim a context tree
Description
This function returns a trimmed context tree from which match positions have been removed.
Usage
trim(ct, ...)
Arguments
ct | a context tree. |
... | additional arguments for the trim function. |
Value
a trimmed context tree.
Examples
## context tree trimming
dts <- sample(as.factor(c("A", "B", "C")), 1000, replace = TRUE)
dts_tree <- ctx_tree(dts, max_depth = 10, min_size = 5, keep_position = TRUE)
print(object.size(dts_tree))
dts_tree <- trim(dts_tree)
print(object.size(dts_tree))
Trim a COVLMC
Description
This function returns a trimmed COVLMC from which cached data have been removed.
Usage
## S3 method for class 'covlmc'
trim(ct, keep_model = FALSE, ...)
Arguments
ct | a context tree. |
keep_model | specifies whether to keep the internal models (or not) |
... | additional arguments for the trim function. |
Details
When called with keep_model set to FALSE (the default), trimming is maximal and reduces the further usability of the model. In particular, loglikelihood.covlmc() cannot be used for new data, contexts.covlmc() does not support model extraction, and simulate.covlmc(), metrics.covlmc() and prune.covlmc() cannot be used at all.
When called with keep_model set to TRUE, the trimming process is less complete. In particular, internal models are simplified using butcher::butcher() and some additional minor reductions. This saves less memory but enables the use of loglikelihood.covlmc() for new data as well as the use of simulate.covlmc().
Value
a trimmed context tree.
See Also
Examples
pc <- powerconsumption[powerconsumption$week %in% 5:7, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
m_cov <- covlmc(dts, dts_cov, min_size = 10, keep_data = TRUE)
print(object.size(m_cov), units = "Mb")
t_m_cov_model <- trim(m_cov, keep_model = TRUE)
print(object.size(t_m_cov_model), units = "Mb")
t_m_cov <- trim(m_cov)
print(object.size(t_m_cov), units = "Mb")
This function returns a trimmed VLMC from which match positions have been removed.
Description
This function returns a trimmed context tree from which match positions have been removed.
Usage
## S3 method for class 'vlmc'
trim(ct, ...)
Arguments
ct | a VLMC. |
... | additional arguments for the trim function. |
Value
a trimmed VLMC
Examples
## VLMC trimming is generally useless unless match positions were kept
pc <- powerconsumption[powerconsumption$week %in% 5:6, ]
dts <- cut(pc$active_power, breaks = 4)
model <- vlmc(dts, keep_match = TRUE)
print(object.size(model))
model <- trim(model)
## memory use should be reduced
print(object.size(model))
nm_model <- vlmc(dts)
print(object.size(nm_model))
nm_model <- trim(nm_model)
## no effect when match positions are not kept
print(object.size(nm_model))
This function returns a trimmed VLMC from which match positions have been removed.
Description
This function returns a trimmed context tree from which match positions have been removed.
Usage
## S3 method for class 'vlmc_cpp'
trim(ct, ...)
Arguments
ct | a VLMC. |
... | additional arguments for the trim function. |
Details
Trimming in the C++ backend is done directly in the Rcpp managed memory and cannot be detected at R level using e.g. utils::object.size().
Value
a trimmed VLMC
Examples
## VLMC trimming is generally useless unless match positions were kept
pc <- powerconsumption[powerconsumption$week %in% 5:6, ]
dts <- cut(pc$active_power, breaks = 4)
model <- vlmc(dts, backend = "C++", keep_match = TRUE)
model <- trim(model)
Fit an optimal Variable Length Markov Chain with Covariates (coVLMC)
Description
This function fits a Variable Length Markov Chain with Covariates (coVLMC) to a discrete time series coupled with a time series of covariates by optimizing an information criterion (BIC or AIC).
Usage
tune_covlmc(
  x,
  covariate,
  criterion = c("BIC", "AIC"),
  initial = c("truncated", "specific", "extended"),
  alpha_init = NULL,
  min_size = 5,
  max_depth = 100,
  verbose = 0,
  save = c("best", "initial", "all"),
  trimming = c("full", "partial", "none"),
  best_trimming = c("none", "partial", "full")
)
Arguments
x | a discrete time series; can be numeric, character, factor or logical. |
covariate | a data frame of covariates. |
criterion | criterion used to select the best model. Either |
initial | specifies the likelihood function, more precisely the way the first few observations for which contexts cannot be calculated are integrated in the likelihood. See |
alpha_init | if non |
min_size | integer >= 1 (default: 5). Tunes the minimum number of observations for a context in the growing phase of the context tree (see |
max_depth | integer >= 1 (default: 100). Longest context considered in the growing phase of the initial context tree (see details). |
verbose | integer >= 0 (default: 0). Verbosity level of the pruning process. |
save | specify which BIC models are saved during the pruning process. The default value |
trimming | specify the type of trimming used when saving the intermediate models, see details. |
best_trimming | specify the type of trimming used when saving the best model and the initial one (see details). |
Details
This function automates the process of fitting a large coVLMC to a discrete time series with covlmc() and of pruning the tree (with cutoff() and prune()) to get an optimal model with respect to an information criterion. To avoid missing long term dependencies, the function uses the max_depth parameter as an initial guess but then relies on an automatic increase of the value to make sure the initial context tree is only limited by the min_size parameter. The initial value of the alpha parameter of covlmc() is also set to a conservative value (0.5) to avoid prior simplification of the context tree. This can be overridden by setting the alpha_init parameter to a more adapted value.
Once the initial coVLMC is obtained, the cutoff() and prune() functions are used to build all the coVLMC models that could be generated using smaller values of the alpha parameter. The best model is selected from this collection, including the initial complex tree, as the one that minimizes the chosen information criterion.
Value
a list with the following components:
best_model: the optimal COVLMC
criterion: the criterion used to select the optimal COVLMC
initial: the likelihood function used to select the optimal COVLMC
results: a data frame with details about the pruning process
saved_models: a list of intermediate COVLMCs if save="initial" or save="all". It contains an initial component with the large coVLMC obtained first and an all component with a list of all the other coVLMCs obtained by pruning the initial one.
Memory occupation
covlmc objects tend to be large and saving all the models during the search for the optimal model can lead to an unreasonable use of memory. To avoid this problem, models are kept in trimmed form only, using trim.covlmc() with keep_model=FALSE. Both the initial model and the best one are saved untrimmed. This default behaviour corresponds to trimming="full". Setting trimming="partial" asks the function to use keep_model=TRUE in trim.covlmc() for intermediate models. Finally, trimming="none" turns off trimming, which is discouraged except for small data sets.
In parallel processing contexts (e.g. using foreach::%dopar%), the memory occupation of the results can become very large as models tend to keep environments attached to the formulas. In this situation, it is highly recommended to trim all saved models, including the best one and the initial one. This can be done via the best_trimming parameter, whose possible values are identical to the ones of trimming.
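The formula environment issue mentioned above can be reproduced in base R, independently of mixvlmc. In this sketch, make_formula() is a hypothetical helper. A formula created inside a function keeps the function's environment alive. utils::object.size() does not report that environment, but serialisation writes it out together with the formula. Serialisation is what happens when results travel between worker processes.

```r
# Base R illustration: a formula built inside a function captures that
# function's environment, including large objects it never uses.
make_formula <- function() {
  big_data <- rnorm(1e6)  # roughly 8 MB of doubles, unused by the formula
  y ~ x
}

f <- make_formula()
# the large object is still reachable through the formula environment
captured <- exists("big_data", envir = environment(f), inherits = FALSE)
# utils::object.size() does not count the contents of that environment
mem_size <- utils::object.size(f)
# serialisation does, since the environment is written out with the formula
size_before <- length(serialize(f, NULL))

# attaching the formula to the global environment removes the overhead
environment(f) <- globalenv()
size_after <- length(serialize(f, NULL))
c(captured = captured, before = size_before, after = size_after)
```

Resetting the environment by hand is shown here only to expose the overhead. For covlmc models, the trimming options described above are the documented route.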
See Also
Examples
pc <- powerconsumption[powerconsumption$week %in% 6:7, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.5, 1))))
dts_cov <- data.frame(day_night = (pc$hour >= 7 & pc$hour <= 17))
dts_best_model_tune <- tune_covlmc(dts, dts_cov)
draw(as_covlmc(dts_best_model_tune))
Fit an optimal Variable Length Markov Chain (VLMC)
Description
This function fits a Variable Length Markov Chain (VLMC) to a discrete time series by optimizing an information criterion (BIC or AIC).
Usage
tune_vlmc(
  x,
  criterion = c("BIC", "AIC"),
  initial = c("truncated", "specific", "extended"),
  alpha_init = NULL,
  cutoff_init = NULL,
  min_size = 2L,
  max_depth = 100L,
  backend = getOption("mixvlmc.backend", "R"),
  verbose = 0,
  save = c("best", "initial", "all")
)
Arguments
x | a discrete time series; can be numeric, character, factor or logical. |
criterion | criterion used to select the best model. Either |
initial | specifies the likelihood function, more precisely the way the first few observations for which contexts cannot be calculated are integrated in the likelihood. Defaults to |
alpha_init | if non |
cutoff_init | if non |
min_size | integer >= 1 (default: 2). Minimum number of observations for a context in the growing phase of the initial context tree. |
max_depth | integer >= 1 (default: 100). Longest context considered in the growing phase of the initial context tree (see details). |
backend | backend "R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See |
verbose | integer >= 0 (default: 0). Verbosity level of the pruning process. |
save | specify which BIC models are saved during the pruning process. The default value |
Details
This function automates the process of fitting a large VLMC to a discrete time series with vlmc() and of pruning the tree (with cutoff() and prune()) to get an optimal model with respect to an information criterion. To avoid missing long term dependencies, the function uses the max_depth parameter as an initial guess but then relies on an automatic increase of the value to make sure the initial context tree is only limited by the min_size parameter. The initial value of the cutoff parameter of vlmc() is also set to conservative values (depending on the criterion) to avoid prior simplification of the context tree. This default value can be overridden using the cutoff_init or alpha_init parameter.
Once the initial VLMC is obtained, the cutoff() and prune() functions are used to build all the VLMC models that could be generated using larger values of the initial cut off parameter. The best model is selected from this collection, including the initial complex tree, as the one that minimizes the chosen information criterion.
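The selection principle (compute the criterion for every candidate model and keep the minimum) can be sketched on a toy problem with two nested candidates: an order 0 and an order 1 Markov chain. This is a simplified illustration, not tune_vlmc(). The function compare_orders() is hypothetical. Scoring both models on observations 2 to n is an assumption made here so that the two likelihoods are computed on the same data. AIC is -2 log L + 2 df and BIC is -2 log L + df log(n).

```r
# Simplified illustration, not tune_vlmc(): compare an order 0 and an order 1
# Markov chain with AIC or BIC. Both models are scored on observations 2..n
# (an assumption of this sketch) so that their likelihoods are comparable.
compare_orders <- function(x, criterion = c("BIC", "AIC")) {
  criterion <- match.arg(criterion)
  x <- factor(x)
  k <- nlevels(x)
  prev <- x[-length(x)]
  nxt <- x[-1]
  n <- length(nxt)
  # order 0: marginal frequencies of the scored observations
  c0 <- table(nxt)
  ll0 <- sum(c0[c0 > 0] * log(c0[c0 > 0] / n))
  # order 1: transition counts, each row normalised by its total
  c1 <- table(prev, nxt)
  p1 <- c1 / rowSums(c1)
  ll1 <- sum(c1[c1 > 0] * log(p1[c1 > 0]))
  loglik <- c(ll0, ll1)
  df <- c(k - 1, k * (k - 1))  # free parameters of each model
  penalty <- if (criterion == "BIC") log(n) else 2
  data.frame(order = 0:1, loglik = loglik, df = df,
             value = -2 * loglik + penalty * df)
}

res <- compare_orders(rep(c("A", "B"), 100))
res$order[which.min(res$value)]
```

On the strictly alternating sequence, the order 1 chain is deterministic (log-likelihood 0), so BIC selects order 1 despite its larger number of parameters.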
Value
a list with the following components:
best_model: the optimal VLMC
criterion: the criterion used to select the optimal VLMC
initial: the likelihood function used to select the optimal VLMC
results: a data frame with details about the pruning process
saved_models: a list of intermediate VLMCs if save="initial" or save="all". It contains an initial component with the large VLMC obtained first and an all component with a list of all the other VLMCs obtained by pruning the initial one.
See Also
Examples
dts <- sample(as.factor(c("A", "B", "C")), 100, replace = TRUE)
tune_result <- tune_vlmc(dts)
draw(tune_result$best_model)
Fit a Variable Length Markov Chain (VLMC)
Description
This function fits a Variable Length Markov Chain (VLMC) to a discrete time series.
Usage
vlmc(
  x,
  alpha = 0.05,
  cutoff = NULL,
  min_size = 2L,
  max_depth = 100L,
  prune = TRUE,
  keep_match = FALSE,
  backend = getOption("mixvlmc.backend", "R")
)
Arguments
x | a discrete time series; can be numeric, character, factor orlogical. |
alpha | number in (0,1] (default: 0.05) cut off value in quantile scale in the pruning phase. |
cutoff | non negative number: cut off value in native (likelihood ratio) scale in the pruning phase. Defaults to the value obtained from |
min_size | integer >= 1 (default: 2). Minimum number of observations for a context in the growing phase of the context tree. |
max_depth | integer >= 1 (default: 100). Longest context considered in the growing phase of the context tree. |
prune | logical: specify whether the context tree should be pruned (default behaviour). |
keep_match | logical: specify whether to keep the context matches (defaults to FALSE) |
backend | "R" or "C++" (default: as specified by the "mixvlmc.backend" option). Specifies the implementation used to represent the context tree and to build it. See details. |
Details
The VLMC is built using Bühlmann and Wyner's algorithm, which consists in fitting a context tree (see ctx_tree()) to a time series and then pruning it in such a way that the conditional distribution of the next state of the time series given the context is significantly different from the distribution given a truncated version of the context.
The construction of the context tree is controlled by min_size and max_depth, exactly as in ctx_tree(). Significance is measured using a likelihood ratio test, whose threshold can be specified either in terms of the ratio itself with cutoff or in quantile scale with alpha.
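The test can be sketched for a single context and its parent (the context shortened by one symbol), following the approach of Bühlmann and Wyner (1999). Both functions below are hypothetical. lr_statistic() compares the two next-state distributions from their counts. alpha_to_cutoff() assumes that the quantile-scale threshold alpha corresponds to a native-scale cutoff equal to half the upper alpha quantile of a chi-squared distribution with (number of states - 1) degrees of freedom. That conversion is an assumption of this sketch, not a statement about the mixvlmc code.

```r
# Sketch following Bühlmann and Wyner (1999), not the mixvlmc code:
# 'child_counts' are the counts of the next state after a long context and
# 'parent_counts' the counts after its one-symbol-shorter suffix.
lr_statistic <- function(child_counts, parent_counts) {
  p_child <- child_counts / sum(child_counts)
  p_parent <- parent_counts / sum(parent_counts)
  keep <- child_counts > 0  # terms with a zero count contribute 0
  sum(child_counts[keep] * log(p_child[keep] / p_parent[keep]))
}

# Assumed conversion from quantile scale (alpha) to native scale (cutoff):
# twice the statistic is compared with a chi-squared quantile with
# (number of states - 1) degrees of freedom.
alpha_to_cutoff <- function(alpha, state_count) {
  qchisq(alpha, df = state_count - 1, lower.tail = FALSE) / 2
}

threshold <- alpha_to_cutoff(0.05, state_count = 2)
# the longer context is kept when the two distributions differ enough
lr_statistic(c(18, 2), c(30, 30)) > threshold
# identical distributions give a statistic of 0, so the context is pruned
lr_statistic(c(10, 20), c(30, 60)) > threshold
```

Since the statistic is never negative, alpha_to_cutoff(1, k) returns 0. This is consistent with alpha = 1 essentially preventing any pruning.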
Pruning can be postponed by setting prune=FALSE. Using a combination of cutoff() and prune(), the complexity of the VLMC can then be adjusted. Any VLMC model can be pruned after construction; prune=FALSE is a convenience parameter to avoid setting alpha=1 (which essentially prevents any pruning). Automated model selection is provided by tune_vlmc().
Value
a fitted vlmc model.
Back ends
Two back ends are available to compute context trees:
the "R" back end represents the tree in pure R data structures (nested lists) that can be easily processed further in pure R (C++ helper functions are used to speed up the construction).
the "C++" back end represents the tree with C++ classes. This back end is considered experimental. The tree is built with an optimised suffix tree algorithm which speeds up the construction by at least a factor of 10 in standard settings. As the tree is kept outside of R's direct reach, context trees built with the C++ back end must be restored after a saveRDS()/readRDS() sequence. This is done automatically by recomputing the context tree completely.
References
Bühlmann, P. and Wyner, A. J. (1999) "Variable length Markov chains" Ann. Statist., 27 (2), 480-513, doi:10.1214/aos/1018031204
See Also
cutoff(), prune() and tune_vlmc()
Examples
pc <- powerconsumption[powerconsumption$week == 5, ]
dts <- cut(pc$active_power, breaks = c(0, quantile(pc$active_power, probs = c(0.25, 0.5, 0.75, 1))))
model <- vlmc(dts)
draw(model)
depth(model)
## reduce the depth of the model
shallow_model <- vlmc(dts, max_depth = 3)
draw(shallow_model, prob = FALSE)
## improve probability estimates
robust_model <- vlmc(dts, min_size = 25)
draw(robust_model, prob = FALSE) ## show the frequencies
draw(robust_model)