A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

meta-pytorch/data

What is TorchData? | Stateful DataLoader | Install guide | Contributing | License

What is TorchData?

The TorchData project is an iterative enhancement to the PyTorch torch.utils.data.DataLoader and torch.utils.data.Dataset/IterableDataset to make them scalable, performant data loading solutions. We will be iterating on the enhancements under the torchdata repo.

Our first change adds checkpointing to torch.utils.data.DataLoader. It lives in stateful_dataloader, a drop-in replacement for torch.utils.data.DataLoader that defines load_state_dict and state_dict methods to enable mid-epoch checkpointing. It also provides an API for users to track custom iteration progress and other custom state from the dataloader workers, such as token buffers and/or RNG states.

Stateful DataLoader

torchdata.stateful_dataloader.StatefulDataLoader is a drop-in replacement for torch.utils.data.DataLoader which provides state_dict and load_state_dict functionality. See the Stateful DataLoader main page for more information and examples. Also check out the examples in this Colab notebook.

torchdata.nodes

torchdata.nodes is a library of composable iterators (not iterables!) that let you chain together common data loading and pre-processing operations. It follows a streaming programming model, although "sampler + Map-style" can still be configured if you desire. See the torchdata.nodes main page for more details. Stay tuned for a tutorial on torchdata.nodes, coming soon!

Installation

Version Compatibility

The following table lists corresponding torch and torchdata versions and the supported Python versions.

| torch            | torchdata      | python        |
| master / nightly | main / nightly | >=3.9, <=3.13 |
| 2.6.0            | 0.11.0         | >=3.9, <=3.13 |
| 2.5.0            | 0.10.0         | >=3.9, <=3.12 |
| 2.5.0            | 0.9.0          | >=3.9, <=3.12 |
| 2.4.0            | 0.8.0          | >=3.8, <=3.12 |
| 2.0.0            | 0.6.0          | >=3.8, <=3.11 |
| 1.13.1           | 0.5.1          | >=3.7, <=3.10 |
| 1.12.1           | 0.4.1          | >=3.7, <=3.10 |
| 1.12.0           | 0.4.0          | >=3.7, <=3.10 |
| 1.11.0           | 0.3.0          | >=3.7, <=3.10 |

Local pip or conda

First, set up an environment. We will be installing a PyTorch binary as well as torchdata. If you're using conda, create a conda environment:

conda create --name torchdata
conda activate torchdata

If you wish to use venv instead:

python -m venv torchdata-env
source torchdata-env/bin/activate

Install torchdata:

Using pip:

pip install torchdata

Using conda:

conda install -c pytorch torchdata
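To confirm what ended up in the environment, a small check like the one below can help. It is a sketch that uses only the Python standard library; the helper name installed_version is hypothetical and not part of torchdata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions that matter for the compatibility table.
for name in ("torch", "torchdata"):
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```

Compare the printed versions against the Version Compatibility table to make sure the torch and torchdata releases match.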

From source

pip install .

In case building TorchData from source fails, install the nightly version of PyTorch following the linked guide on the contributing page.

From nightly

The nightly version of TorchData is also provided and is updated daily from the main branch.

Using pip:

pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu

Using conda:

conda install torchdata -c pytorch-nightly

Contributing

We welcome PRs! See the CONTRIBUTING file.

Beta Usage and Feedback

We'd love to hear from and work with early adopters to shape our designs. Please reach out by raising an issue if you're interested in using this tooling for your project.

License

TorchData is BSD licensed, as found in the LICENSE file.
