theft


Tools for Handling Extraction of Features from Time series (theft)

Installation

You can install the stable version of theft from CRAN:

install.packages("theft")

You can install the development version of theft from GitHub using the following:

devtools::install_github("hendersontrent/theft")

Please also check out our paper Feature-Based Time-Series Analysis in R using the theft Package, which discusses the motivation and theoretical underpinnings of theft and walks through all of its functionality using the Bonn EEG dataset, a well-studied neuroscience dataset.

General purpose

theft is a software package for R that facilitates user-friendly access to a consistent interface for the extraction of time-series features. The package provides a single point of access to >1100 time-series features from a range of existing R and Python packages as well as enabling users to calculate their own features. The packages which theft ‘steals’ features from currently are catch22, feasts, and tsfeatures (R), and Kats, tsfresh, and TSFEL (Python).

As of v0.6.1, users can also calculate their own individual features or sets of features! In addition, two basic feature sets, "quantiles" (a set of 100 quantiles) and "moments" (the first four moments of the distribution: mean, variance, skewness, and kurtosis), are available for users seeking to compute simple baselines against which to compare the more sophisticated feature sets (see this recent paper for more discussion on this idea).
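To make the idea of these baseline sets concrete, here is a minimal base R sketch of what a "moments" and a "quantiles" feature vector could look like for a single series. The helper names `moment_features` and `quantile_features` are hypothetical, and the exact definitions used inside theft (the quantile grid, and the skewness and kurtosis estimators) are assumptions here, so treat this as an illustration rather than the package's implementation.

```r
# Hypothetical helpers for illustration only; theft's own definitions may differ.

# First four moments: mean, variance, skewness, and (non-excess) kurtosis
moment_features <- function(x) {
  m <- mean(x)
  pop_sd <- sqrt(mean((x - m)^2))   # population standard deviation
  z <- (x - m) / pop_sd             # standardised values
  c(mean = m,
    variance = var(x),              # sample variance (n - 1 denominator)
    skewness = mean(z^3),
    kurtosis = mean(z^4))
}

# 100 quantiles on an assumed grid of probabilities 0.01, 0.02, ..., 1
quantile_features <- function(x) {
  probs <- (1:100) / 100
  q <- quantile(x, probs = probs, names = FALSE)
  names(q) <- sprintf("quantile_%03d", seq_along(probs))
  q
}

set.seed(123)
x <- rnorm(200)                     # a toy "time series"
baseline <- c(moment_features(x), quantile_features(x))
length(baseline)
```

Both helpers return named numeric vectors, so they can be concatenated into one row of a feature table per series.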

Note that Kats, tsfresh and TSFEL are Python packages. theft has built-in functionality for helping you install these libraries; all you need to do is install Python on your machine (preferably Python >=3.10). If you wish to access the Python feature sets, please run ?install_python_pkgs in R after downloading theft or consult the vignette in the package for more information. For a comprehensive comparison of these six feature sets across a range of domains (including computation speed, within-set feature composition, and between-set feature correlations), please refer to the paper An Empirical Evaluation of Time-Series Feature Sets.

Also note that as of v0.8.2, parallelisation is supported for "tsfresh" and "tsfel" (see the vignette for more information)!

Package extensibility

The companion package theftdlc (‘theft downloadable content’, just like the DLCs and expansions you get for video games) contains an extensive suite of functions for analysing, interpreting, and visualising time-series features calculated from theft. Collectively, these packages are referred to as the ‘theft ecosystem’.

Hex stickers of the theft and theftdlc packages for R

A high-level overview of how the theft ecosystem for R is typically accessed by users is shown below. Note that prior to v0.6.1 of theft, many of the theftdlc functions were contained in theft but under other names. To ensure the theft ecosystem is as user-friendly as possible and can scale to meet future demands, theft has been refactored to just perform feature extraction, while theftdlc handles all the processing, analysis, and visualisation of the extracted features.

Schematic of the theft ecosystem in R

Many more functions and options for customisation are available within the packages and users are encouraged to explore the vignettes and helper files for more information.

Quick tour

theft and theftdlc combine to create an intuitive and efficient workflow consistent with the broader tidyverts collection of packages for tidy time-series analysis. Here is a single code chunk that calculates features for a tsibble (tidy temporal data frame) of some simulated time series processes, including Gaussian noise, AR(1), ARMA(1,1), MA(1), noisy sinusoid, and a random walk. simData comes with theft. We’ll just use the catch22 feature set and a custom set of mean and standard deviation for now. Using tidy principles and pipes, we can, in the same code chunk, feed the calculated features straight into theftdlc’s project function to project the 24-dimensional feature space into an interpretable two-dimensional space using principal components analysis:

library(dplyr)
library(theft)
library(theftdlc)

calculate_features(data = theft::simData,
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  project(norm_method = "RobustSigmoid",
          unit_int = TRUE,
          low_dim_method = "PCA") %>%
  plot()

In that example, calculate_features comes from theft, while project and the plot generic come from theftdlc.
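For readers who want a feel for what such a pipeline does conceptually, the following base R sketch applies a named list of feature functions to some toy series, normalises each feature, and projects the result onto two principal components. It does not call theft or theftdlc. The toy data, the helper names `robust_sigmoid` and `unit_interval`, and the specific outlier-robust sigmoid formula are assumptions for illustration and may not match the packages' internals.

```r
set.seed(42)

# Toy data: 10 Gaussian noise series and 10 random walks, each of length 100
series <- c(
  replicate(10, rnorm(100), simplify = FALSE),
  replicate(10, cumsum(rnorm(100)), simplify = FALSE)
)

# A named list of feature functions, in the spirit of the `features` argument
feature_funs <- list(
  mean = mean,
  sd = sd,
  median = median,
  acf_lag1 = function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]
)

# One row per time series, one column per feature
feature_matrix <- t(sapply(series, function(x) {
  sapply(feature_funs, function(f) f(x))
}))

# Assumed form of an outlier-robust sigmoid: centre on the median, scale by IQR
robust_sigmoid <- function(x) {
  1 / (1 + exp(-(x - median(x)) / (IQR(x) / 1.35)))
}

# Rescale a vector linearly onto [0, 1]
unit_interval <- function(x) (x - min(x)) / (max(x) - min(x))

# Normalise each feature (column) independently
normalised <- apply(feature_matrix, 2, function(col) {
  unit_interval(robust_sigmoid(col))
})

# Project the 4-dimensional feature space onto its first two principal components
pca <- prcomp(normalised, center = TRUE, scale. = FALSE)
embedding <- pca$x[, 1:2]
head(embedding)
```

Each row of `embedding` is one time series placed in the two-dimensional space, which is the kind of object a scatterplot of a low-dimensional projection would draw.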

Similarly, we can perform time-series classification using the same workflow to compare the performance of catch22 against our custom set of the first two moments of the distribution:

calculate_features(data = theft::simData,
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  classify(by_set = TRUE,
           n_resamples = 10,
           use_null = TRUE) %>%
  compare_features(by_set = TRUE,
                   hypothesis = "pairwise") %>%
  head()

               hypothesis feature_set_a feature_set_b   metric set_a_mean
1 All features != catch22  All features       catch22 accuracy  0.8022222
2    All features != User  All features          User accuracy  0.8022222
3         catch22 != User       catch22          User accuracy  0.7400000
  set_b_mean t_statistic    p.value
1  0.7400000  2.35154855 0.04319536
2  0.8044444 -0.03932757 0.96948780
3  0.8044444 -1.23794041 0.24705786

In this example, classify and compare_features come from theftdlc.
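As a rough illustration of what a pairwise comparison between two feature sets involves, the base R sketch below runs a paired t-test on per-resample accuracies. The accuracy values are made up for the example and are not results from theft or theftdlc, and a plain paired t-test is an assumption: theftdlc may apply a correction for the dependence between resamples.

```r
# Made-up accuracies over 10 resamples; NOT results from theft or theftdlc
acc_a <- c(0.80, 0.78, 0.84, 0.76, 0.82, 0.79, 0.83, 0.81, 0.77, 0.80)
acc_b <- c(0.74, 0.75, 0.78, 0.70, 0.76, 0.73, 0.77, 0.72, 0.71, 0.74)

# Paired test: both sets are assumed to be evaluated on the same resamples
result <- t.test(acc_a, acc_b, paired = TRUE)

# Collect the summary in a layout similar to the output shown above
comparison <- data.frame(
  hypothesis = "set_a != set_b",
  set_a_mean = mean(acc_a),
  set_b_mean = mean(acc_b),
  t_statistic = unname(result$statistic),
  p.value = result$p.value
)
comparison
```

Pairing matters here because both feature sets are scored on the same train-test splits, so the test works on the per-resample differences rather than on two independent samples.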

We can also easily see how each set performs relative to an empirical null distribution (i.e., how much better does each set do than we would expect due to chance?):

calculate_features(data = theft::simData,
                   feature_set = "catch22",
                   features = list("mean" = mean, "sd" = sd)) %>%
  classify(by_set = TRUE,
           n_resamples = 10,
           use_null = TRUE) %>%
  compare_features(by_set = TRUE,
                   hypothesis = "null") %>%
  head()

                hypothesis  feature_set   metric  set_mean null_mean
1 All features != own null All features accuracy 0.8022222 0.1355556
2         User != own null         User accuracy 0.8044444 0.1511111
3      catch22 != own null      catch22 accuracy 0.7400000 0.1222222
  t_statistic      p.value
1    6.826807 3.835233e-05
2    5.882092 1.171092e-04
3    6.879652 3.614676e-05
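One common way to build an empirical null for a classifier is to shuffle the class labels and repeat the same resampling procedure, so that any remaining accuracy reflects chance alone. The base R sketch below shows that idea with a toy two-class problem and a simple nearest-class-mean classifier. The data, the classifier, and the helper `resample_accuracy` are all hypothetical, and whether theftdlc constructs its null in exactly this way is an assumption.

```r
set.seed(7)

# Toy problem: one feature, two classes whose means differ by 2 units
n_per_class <- 40
labels <- factor(rep(c("A", "B"), each = n_per_class))
feature <- c(rnorm(n_per_class, mean = 0), rnorm(n_per_class, mean = 2))

# Accuracy of a nearest-class-mean classifier on one random 80/20 split
resample_accuracy <- function(x, y) {
  train <- sample(seq_along(x), size = floor(0.8 * length(x)))
  centroids <- tapply(x[train], y[train], mean)        # class means on train
  dists <- abs(outer(x[-train], centroids, "-"))       # test-to-centroid distances
  predicted <- names(centroids)[apply(dists, 1, which.min)]
  mean(predicted == as.character(y[-train]))
}

n_resamples <- 10

# Main models: true labels
main_acc <- replicate(n_resamples, resample_accuracy(feature, labels))

# Null models: labels shuffled before each resample
null_acc <- replicate(n_resamples, resample_accuracy(feature, sample(labels)))

c(set_mean = mean(main_acc), null_mean = mean(null_acc))
```

Comparing `main_acc` against `null_acc` then answers the question posed above: how much better the features do than chance on this particular dataset.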

Please see the vignette for more information and the full functionality of both packages.

Citation

If you use theft or theftdlc in your own work, please cite both the paper:

T. Henderson and Ben D. Fulcher. Feature-Based Time-Series Analysis in R using the theft Package. arXiv, (2022).

and the software:

To cite package 'theft' in publications use:

  Henderson T (2025). _theft: Tools for Handling Extraction of Features
  from Time Series_. R package version 0.8.2,
  <https://hendersontrent.github.io/theft/>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {theft: Tools for Handling Extraction of Features from Time Series},
    author = {Trent Henderson},
    year = {2025},
    note = {R package version 0.8.2},
    url = {https://hendersontrent.github.io/theft/},
  }

To cite package 'theftdlc' in publications use:

  Henderson T (2025). _theftdlc: Analyse and Interpret Time Series
  Features_. R package version 0.2.0,
  <https://hendersontrent.github.io/theftdlc/>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {theftdlc: Analyse and Interpret Time Series Features},
    author = {Trent Henderson},
    year = {2025},
    note = {R package version 0.2.0},
    url = {https://hendersontrent.github.io/theftdlc/},
  }

Acknowledgements

Big thanks to Joshua Moore for his assistance in solving issues with the Python side of things, including the correct specification of dependencies for the install_python_pkgs function.

