A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
meta-pytorch/data
What is TorchData? | Stateful DataLoader | Install guide | Contributing | License
The TorchData project is an iterative enhancement to the PyTorch torch.utils.data.DataLoader and torch.utils.data.Dataset/IterableDataset to make them scalable, performant dataloading solutions. We will be iterating on the enhancements under the torchdata repo.
Our first change adds checkpointing to torch.utils.data.DataLoader. It lives in stateful_dataloader, a drop-in replacement for torch.utils.data.DataLoader that defines load_state_dict and state_dict methods to enable mid-epoch checkpointing, along with an API for users to track custom iteration progress and other custom state from the dataloader workers, such as token buffers and/or RNG states.
torchdata.stateful_dataloader.StatefulDataLoader is a drop-in replacement for torch.utils.data.DataLoader which provides state_dict and load_state_dict functionality. See the Stateful DataLoader main page for more information and examples. Also check out the examples in this Colab notebook.
torchdata.nodes is a library of composable iterators (not iterables!) that let you chain together common dataloading and pre-proc operations. It follows a streaming programming model, although "sampler + Map-style" can still be configured if you desire. See the torchdata.nodes main page for more details. Stay tuned for a tutorial on torchdata.nodes, coming soon!
The following table lists torchdata versions with their corresponding torch versions and supported Python versions.
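To make the idea of mid-epoch checkpointing concrete, here is a minimal pure-Python sketch of the state_dict / load_state_dict pattern. `ToyStatefulLoader` and its `next_index` state key are hypothetical names invented for this illustration. This is not torchdata's implementation, and the real StatefulDataLoader tracks considerably more state (for example, per-worker state).

```python
from typing import Any, Dict, Iterator, List, Sequence


class ToyStatefulLoader:
    """Hypothetical stand-in used only to illustrate the checkpointing pattern."""

    def __init__(self, data: Sequence[int], batch_size: int) -> None:
        self.data = data
        self.batch_size = batch_size
        self._next_index = 0  # position of the next unread sample

    def __iter__(self) -> Iterator[List[int]]:
        # Start from the stored position, which may come from a checkpoint.
        while self._next_index < len(self.data):
            start = self._next_index
            self._next_index = min(start + self.batch_size, len(self.data))
            yield list(self.data[start:self._next_index])
        self._next_index = 0  # epoch finished; the next epoch starts over

    def state_dict(self) -> Dict[str, Any]:
        # Everything needed to resume iteration where it stopped.
        return {"next_index": self._next_index}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self._next_index = state["next_index"]


# Consume two batches, then checkpoint in the middle of the epoch.
loader = ToyStatefulLoader(range(10), batch_size=2)
it = iter(loader)
seen = [next(it), next(it)]
checkpoint = loader.state_dict()

# A fresh loader restored from the checkpoint continues with the unseen batches.
resumed = ToyStatefulLoader(range(10), batch_size=2)
resumed.load_state_dict(checkpoint)
remaining = list(resumed)
```

The restored loader yields only the batches that the first loader had not yet produced, which is the behavior that mid-epoch checkpointing is meant to provide.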
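As a rough illustration of the streaming model of chained iterators, the sketch below composes a map step and a batch step using plain Python generators. `map_node` and `batch_node` are hypothetical helpers written for this example. They are not part of the torchdata.nodes API, which should be looked up on its main page.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_node(source: Iterator[T], fn: Callable[[T], U]) -> Iterator[U]:
    """Apply fn to each item as it streams through."""
    for item in source:
        yield fn(item)


def batch_node(
    source: Iterator[T], batch_size: int, drop_last: bool = False
) -> Iterator[List[T]]:
    """Group consecutive items into lists of at most batch_size items."""
    batch: List[T] = []
    for item in source:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch and not drop_last:
        yield batch  # final partial batch


# Chain the steps: source -> square each item -> batches of two.
pipeline = batch_node(map_node(iter(range(5)), lambda x: x * x), batch_size=2)
batches = list(pipeline)
```

Because `pipeline` is an iterator rather than an iterable, it is single-pass: iterating over it a second time yields nothing.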
| torch | torchdata | python |
|---|---|---|
| master / nightly | main / nightly | >=3.9,<=3.13 |
| 2.6.0 | 0.11.0 | >=3.9,<=3.13 |
| 2.5.0 | 0.10.0 | >=3.9,<=3.12 |
| 2.5.0 | 0.9.0 | >=3.9,<=3.12 |
| 2.4.0 | 0.8.0 | >=3.8,<=3.12 |
| 2.0.0 | 0.6.0 | >=3.8,<=3.11 |
| 1.13.1 | 0.5.1 | >=3.7,<=3.10 |
| 1.12.1 | 0.4.1 | >=3.7,<=3.10 |
| 1.12.0 | 0.4.0 | >=3.7,<=3.10 |
| 1.11.0 | 0.3.0 | >=3.7,<=3.10 |
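If you want to check an environment against the table programmatically, something like the following sketch could work. The version ranges are copied from the released rows of the table above. `SUPPORTED_PYTHON` and `python_is_supported` are hypothetical names for this example and are not provided by torchdata.

```python
import sys
from typing import Dict, Tuple

Version = Tuple[int, int]

# torchdata release -> (lowest, highest) supported Python version, per the table.
SUPPORTED_PYTHON: Dict[str, Tuple[Version, Version]] = {
    "0.11.0": ((3, 9), (3, 13)),
    "0.10.0": ((3, 9), (3, 12)),
    "0.9.0": ((3, 9), (3, 12)),
    "0.8.0": ((3, 8), (3, 12)),
    "0.6.0": ((3, 8), (3, 11)),
    "0.5.1": ((3, 7), (3, 10)),
    "0.4.1": ((3, 7), (3, 10)),
    "0.4.0": ((3, 7), (3, 10)),
    "0.3.0": ((3, 7), (3, 10)),
}


def python_is_supported(
    torchdata_version: str, python: Version = sys.version_info[:2]
) -> bool:
    """Return True if the given Python (major, minor) is in the listed range."""
    low, high = SUPPORTED_PYTHON[torchdata_version]
    return low <= python <= high


# Example: Python 3.13 is listed for torchdata 0.11.0 but not for 0.10.0.
ok_new = python_is_supported("0.11.0", (3, 13))
ok_old = python_is_supported("0.10.0", (3, 13))
```

Called without the second argument, the function checks the interpreter that is currently running.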
First, set up an environment. We will be installing a PyTorch binary as well as torchdata. If you're using conda, create a conda environment:
conda create --name torchdata
conda activate torchdata
If you wish to use venv instead:
python -m venv torchdata-env
source torchdata-env/bin/activate
Install torchdata:
Using pip:
pip install torchdata
Using conda:
conda install -c pytorch torchdata
From source:
pip install .
In case building TorchData from source fails, install the nightly version of PyTorch following the linked guide on the contributing page.
The nightly version of TorchData is also provided and updated daily from the main branch.
Using pip:
pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
Using conda:
conda install torchdata -c pytorch-nightly
We welcome PRs! See the CONTRIBUTING file.
We'd love to hear from and work with early adopters to shape our designs. Please reach out by raising an issue if you're interested in using this tooling for your project.
TorchData is BSD licensed, as found in the LICENSE file.