Benchmark time series data sets for PyTorch


PyTorch data sets for supervised time series classification and prediction problems, including:

  • All UEA/UCR classification repository data sets
  • PhysioNet Challenge 2012 (in-hospital mortality)
  • PhysioNet Challenge 2019 (sepsis prediction)
  • A binary prediction variant of the 2019 PhysioNet Challenge

Why use torchtime?

  1. Saves time. You don't have to write your own PyTorch data classes.
  2. Better research. Use common, reproducible implementations of data sets for a level playing field when evaluating models.

Installation

Install PyTorch followed by torchtime:

$ pip install torchtime

or

$ conda install torchtime -c conda-forge

There is currently no Windows build for conda. Feedback is welcome from conda users in particular.

Getting started

Data classes have a common API. The split argument determines whether training ("train"), validation ("val") or test ("test") data are returned. The sizes of the splits are controlled with the train_prop and (optional) val_prop arguments.
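
For example, a minimal sketch (using the PhysioNet2012 class introduced below, and assuming the remainder after the training and validation proportions forms the test split) that returns validation data from a 70/20/10% split:

from torchtime.data import PhysioNet2012

# 70% training, 20% validation; the remaining 10% is assumed to form the test split
val_data = PhysioNet2012(
    split="val",
    train_prop=0.7,
    val_prop=0.2,
)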

PhysioNet data sets

Three PhysioNet data sets are currently supported: the PhysioNet Challenge 2012 (in-hospital mortality), the PhysioNet Challenge 2019 (sepsis prediction) and a binary prediction variant of the 2019 challenge.

For example, to load training data for the 2012 challenge with a 70/30% training/validation split and create a DataLoader for model training:

from torch.utils.data import DataLoader
from torchtime.data import PhysioNet2012

physionet2012 = PhysioNet2012(
    split="train",
    train_prop=0.7,
)
dataloader = DataLoader(physionet2012, batch_size=32)

UEA/UCR repository data sets

The torchtime.data.UEA class returns the UEA/UCR repository data set specified by the dataset argument, for example:

from torch.utils.data import DataLoader
from torchtime.data import UEA

arrowhead = UEA(
    dataset="ArrowHead",
    split="train",
    train_prop=0.7,
)
dataloader = DataLoader(arrowhead, batch_size=32)

Using the DataLoader

Batches are dictionaries of tensors X, y and length:

  • X are the time series data. The package follows the batch first convention, therefore X has shape (n, s, c) where n is the batch size, s is the (longest) trajectory length and c is the number of channels. By default, the first channel is a time stamp.
  • y are the one-hot encoded labels, a tensor of shape (n, l) where l is the number of classes.
  • length is the length of each trajectory (before padding, if sequences are of irregular length), i.e. a tensor of shape (n).

For example, ArrowHead is a univariate time series, therefore X has two channels: the time stamp followed by the time series (c = 2). Each series has 251 observations (s = 251) and there are three classes (l = 3). For a batch size of 32:

next_batch = next(iter(dataloader))
next_batch["X"].shape       # torch.Size([32, 251, 2])
next_batch["y"].shape       # torch.Size([32, 3])
next_batch["length"].shape  # torch.Size([32])
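
As an illustration of consuming these batches, a minimal sketch of a training loop with a hypothetical mean-pooling classifier (not part of torchtime) that maps the ArrowHead shapes above to class logits:

import torch
from torch import nn

# Hypothetical baseline: mean-pool each series over the time dimension,
# then map the c = 2 channels to the l = 3 ArrowHead classes
class MeanPoolClassifier(nn.Module):
    def __init__(self, channels=2, classes=3):
        super().__init__()
        self.linear = nn.Linear(channels, classes)

    def forward(self, X):
        return self.linear(X.mean(dim=1))  # (n, s, c) -> (n, c) -> (n, l)

model = MeanPoolClassifier()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for batch in dataloader:
    X, y = batch["X"], batch["y"]              # padded series and one-hot labels
    loss = loss_fn(model(X), y.argmax(dim=1))  # convert one-hot labels to class indices
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()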

See Using DataLoaders for more information.

Advanced options

  • Missing data can be imputed by setting impute to mean (replace with training data channel means) or forward (replace with the previous observation). Alternatively, a custom imputation function can be passed to the impute argument.
  • A time stamp (added by default), a missing data mask and the time since the previous observation can be appended with the boolean arguments time, mask and delta respectively.
  • Time series data can be standardised using the standardise boolean argument.
  • The location of cached data can be changed with the path argument, for example to share a single cache location across projects.
  • For reproducibility, an optional random seed can be specified.
  • Missing data can be simulated using the missing argument to drop data at random from UEA/UCR data sets.
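
A minimal sketch combining several of these options for a UEA/UCR data set; the argument names follow the list above, while the specific values (seed, cache path and proportion of simulated missing data) are illustrative assumptions:

from torchtime.data import UEA

arrowhead = UEA(
    dataset="ArrowHead",
    split="train",
    train_prop=0.7,
    impute="forward",   # replace missing values with the previous observation
    mask=True,          # append a missing data mask
    delta=True,         # append the time since the previous observation
    standardise=True,   # standardise the time series data
    missing=0.5,        # simulate missing data by dropping observations at random
    seed=456,           # random seed for reproducibility
    path=".torchtime",  # cache location (hypothetical path)
)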

See the tutorials and API for more information.

Other resources

If you're looking for the TensorFlow equivalent for PhysioNet data sets, try medical_ts_datasets.

Acknowledgements

torchtime uses some of the data processing ideas in Kidger et al, 2020 [1] and Che et al, 2018 [2].

This work is supported by the Engineering and Physical Sciences Research Council, Centre for Doctoral Training in Cloud Computing for Big Data, Newcastle University (grant number EP/L015358/1).

Citing torchtime

If you use this software, please cite the paper:

@software{darke_torchtime_2022,
  author    = {Darke, Philip and Missier, Paolo and Bacardit, Jaume},
  title     = {Benchmark time series data sets for {PyTorch} - the torchtime package},
  month     = {July},
  year      = {2022},
  publisher = {arXiv},
  doi       = {10.48550/arXiv.2207.12503},
  url       = {https://doi.org/10.48550/arXiv.2207.12503},
}

DOIs are also available for each version of the package here.

References

  1. Kidger, P, Morrill, J, Foster, J, et al. Neural Controlled Differential Equations for Irregular Time Series. arXiv 2005.08926 (2020). [arXiv]

  2. Che, Z, Purushotham, S, Cho, K, et al. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci Rep 8, 6085 (2018). [doi]

  3. Silva, I, Moody, G, Scott, DJ, et al. Predicting In-Hospital Mortality of ICU Patients: The PhysioNet/Computing in Cardiology Challenge 2012. Comput Cardiol 2012;39:245-248 (2010). [hdl]

  4. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge. Critical Care Medicine 48(2): 210-217 (2019). [doi]

  5. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019 (version 1.0.0). PhysioNet (2019). [doi]

  6. Goldberger, A, Amaral, L, Glass, L, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101(23), pp. e215-e220 (2000). [doi]

  7. Löning, M, Bagnall, A, Ganesh, S, et al. sktime: A Unified Interface for Machine Learning with Time Series. Workshop on Systems for ML at NeurIPS 2019 (2019). [doi]

  8. Löning, M, Bagnall, A, Middlehurst, M, et al. alan-turing-institute/sktime: v0.10.1 (v0.10.1). Zenodo (2022). [doi]

License

Released under the MIT license.

