classicvalues/audio (Public)

forked from pytorch/audio

Data manipulation and transformation for audio signal processing, powered by PyTorch

License

BSD-2-Clause license

Folders and files

.circleci
.github
cmake
docs
examples
packaging
test
third_party
tools
torchaudio
.clang-format
.clang-tidy
.flake8
.gitattributes
.gitignore
.gitmodules
.pre-commit-config.yaml
CITATION
CMakeLists.txt
CODE_OF_CONDUCT.md
CONTRIBUTING.md
LICENSE
README.md
mypy.ini
pyproject.toml
requirements.txt
setup.cfg
setup.py
version.txt

torchaudio: an audio library for PyTorch

The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, focusing on trainable features through the autograd system, and keeping a consistent style (tensor names and dimension names). It is therefore primarily a machine learning library and not a general signal processing library. Because all computations are expressed as PyTorch operations, torchaudio is easy to use and feels like a natural extension of PyTorch.

Support audio I/O (Load files, Save files)
- Load a variety of audio formats, such as wav, mp3, ogg, flac, opus, sphere, into a torch Tensor using SoX
- Kaldi (ark/scp)
Dataloaders for common audio datasets
Common audio transforms
- Spectrogram, AmplitudeToDB, MelScale, MelSpectrogram, MFCC, MuLawEncoding, MuLawDecoding, Resample
Compliance interfaces: Run code using PyTorch that aligns with other libraries
- Kaldi: spectrogram, fbank, mfcc
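
To give a feel for what one of the listed transforms computes, here is a minimal pure-Python sketch of mu-law companding, the idea behind MuLawEncoding and MuLawDecoding. It is an illustration only, not torchaudio's implementation: the real transforms operate on tensors, and the function names `mu_law_encode` and `mu_law_decode` as well as the default of 256 quantization channels are assumptions made for this example.

```python
import math


def mu_law_encode(x: float, quantization_channels: int = 256) -> int:
    """Map a sample in [-1.0, 1.0] to an integer code in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1
    x = max(-1.0, min(1.0, x))  # clamp to the expected input range
    # Logarithmic compression: small amplitudes get finer resolution.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest integer code.
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(code: int, quantization_channels: int = 256) -> float:
    """Invert mu_law_encode up to quantization error."""
    mu = quantization_channels - 1
    y = code / mu * 2 - 1  # back to [-1, 1]
    # Exponential expansion, the inverse of the logarithmic compression.
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

Encoding followed by decoding does not return the exact input, because the sample is quantized to one of the integer codes; the error is smallest for quiet samples and grows toward full scale.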

Dependencies

PyTorch (See below for the compatible versions)
[optional] vesis84/kaldi-io-for-python commit cb46cb1f44318a5d04d4941cf39084c5b021241e or above

The following are the corresponding torchaudio versions and supported Python versions.

	`torch`	`torchaudio`	`python`
Development	`master` / `nightly`	`main` / `nightly`	`>=3.7`,`<=3.10`
Latest versioned release	`1.12.0`	`0.12.0`	`>=3.7`,`<=3.10`
LTS	`1.8.2`	`0.8.2`	`>=3.6`,`<=3.9`

Previous versions

`torch`	`torchaudio`	`python`
`1.11.0`	`0.11.0`	`>=3.7`,`<=3.9`
`1.10.0`	`0.10.0`	`>=3.6`,`<=3.9`
`1.9.1`	`0.9.1`	`>=3.6`,`<=3.9`
`1.9.0`	`0.9.0`	`>=3.6`,`<=3.9`
`1.8.2`	`0.8.2`	`>=3.6`,`<=3.9`
`1.8.0`	`0.8.0`	`>=3.6`,`<=3.9`
`1.7.1`	`0.7.2`	`>=3.6`,`<=3.9`
`1.7.0`	`0.7.0`	`>=3.6`,`<=3.8`
`1.6.0`	`0.6.0`	`>=3.6`,`<=3.8`
`1.5.0`	`0.5.0`	`>=3.5`,`<=3.8`
`1.4.0`	`0.4.0`	`==2.7`,`>=3.5`,`<=3.8`
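
The tables above can be turned into a small lookup, which may be handy when pinning versions in a script. The following is a sketch only: the `COMPAT` mapping copies the versioned rows from the tables, while the helper names `required_torch` and `supports_python` are hypothetical and not part of torchaudio. The Python 2.7 entry listed for 0.4.0 is left out to keep the range check simple.

```python
import sys

# torchaudio version -> (torch version, lowest Python, highest Python),
# copied from the compatibility tables above.
COMPAT = {
    "0.12.0": ("1.12.0", (3, 7), (3, 10)),
    "0.11.0": ("1.11.0", (3, 7), (3, 9)),
    "0.10.0": ("1.10.0", (3, 6), (3, 9)),
    "0.9.1": ("1.9.1", (3, 6), (3, 9)),
    "0.9.0": ("1.9.0", (3, 6), (3, 9)),
    "0.8.2": ("1.8.2", (3, 6), (3, 9)),
    "0.8.0": ("1.8.0", (3, 6), (3, 9)),
    "0.7.2": ("1.7.1", (3, 6), (3, 9)),
    "0.7.0": ("1.7.0", (3, 6), (3, 8)),
    "0.6.0": ("1.6.0", (3, 6), (3, 8)),
    "0.5.0": ("1.5.0", (3, 5), (3, 8)),
    "0.4.0": ("1.4.0", (3, 5), (3, 8)),  # the table also lists ==2.7
}


def required_torch(torchaudio_version: str) -> str:
    """Return the torch version paired with a torchaudio release."""
    return COMPAT[torchaudio_version][0]


def supports_python(torchaudio_version: str, python_version=None) -> bool:
    """Check a (major, minor) Python version against the table's range."""
    if python_version is None:
        python_version = sys.version_info[:2]  # default: the running interpreter
    _, lowest, highest = COMPAT[torchaudio_version]
    return lowest <= tuple(python_version) <= highest
```

For example, `required_torch("0.7.2")` gives `"1.7.1"`, matching the one row where the patch numbers of the two packages differ.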

Installation

Binary Distributions

torchaudio has binary distributions for PyPI (pip) and Anaconda (conda).

Please refer to https://pytorch.org/get-started/locally/ for the details.

Note: Starting with 0.10, torchaudio has CPU-only and CUDA-enabled binary distributions, each of which requires a matching PyTorch version.

Note: LTS versions are distributed through a different channel than the other versioned releases. Please refer to the above page for details.

Note: This software was compiled against an unmodified copy of FFmpeg (licensed under the LGPLv2.1), with the specific rpath removed so as to enable the use of system libraries. The LGPL source can be downloaded here.

From Source

On non-Windows platforms, the build process builds libsox and the codecs that torchaudio needs to link to. It fetches and builds libmad, lame, flac, vorbis, opus, and libsox before building the extension. This process requires cmake and pkg-config. libsox-based features can be disabled with BUILD_SOX=0.

The build process also builds the RNN transducer loss and the CTC beam search decoder. These can be disabled by setting the environment variables BUILD_RNNT=0 and BUILD_CTC_DECODER=0, respectively.

    # Linux
    python setup.py install

    # OSX
    CC=clang CXX=clang++ python setup.py install

    # Windows
    # We need to use the MSVC x64 toolset for compilation, with Visual Studio's vcvarsall.bat or directly with vcvars64.bat.
    # These batch files are under Visual Studio's installation folder, under 'VC\Auxiliary\Build\'.
    # More information available at:
    #   https://docs.microsoft.com/en-us/cpp/build/how-to-enable-a-64-bit-visual-cpp-toolset-on-the-command-line?view=msvc-160#use-vcvarsallbat-to-set-a-64-bit-hosted-build-architecture
    call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64 && set BUILD_SOX=0 && python setup.py install
    # or
    call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat" && set BUILD_SOX=0 && python setup.py install

This is known to work on Linux distributions such as Ubuntu and CentOS 7, and on macOS. If you try this on a new system and find a solution to make it work, feel free to share it by opening an issue.
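
The build switches described in this section are plain environment variables, so a build script can assemble them before invoking `setup.py`. The sketch below shows one way this might look; `build_env` and `build_command` are hypothetical helpers written for this example, not part of the torchaudio build system. Only the variable names and the value `0` come from the text above, so the helper sets a variable only when a feature is being disabled and otherwise leaves the environment untouched.

```python
import os
import sys


def build_env(sox=True, rnnt=True, ctc_decoder=True, base_env=None):
    """Return a copy of the environment with the requested features disabled."""
    env = dict(os.environ if base_env is None else base_env)
    if not sox:
        env["BUILD_SOX"] = "0"  # skip libsox-based features
    if not rnnt:
        env["BUILD_RNNT"] = "0"  # skip the RNN transducer loss
    if not ctc_decoder:
        env["BUILD_CTC_DECODER"] = "0"  # skip the CTC beam search decoder
    return env


def build_command():
    """Command line for a source install, using the current interpreter."""
    return [sys.executable, "setup.py", "install"]
```

From the repository root, a call such as `subprocess.run(build_command(), env=build_env(sox=False), check=True)` would then run the install with libsox-based features disabled.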

Quick Usage

    import torchaudio

    waveform, sample_rate = torchaudio.load('foo.wav')  # load tensor from file
    torchaudio.save('foo_save.wav', waveform, sample_rate)  # save tensor to file

Backend Dispatch

By default in OSX and Linux, torchaudio uses SoX as a backend to load and save files. The backend can be changed to SoundFile using the following. See SoundFile for installation instructions.

    import torchaudio

    torchaudio.set_audio_backend("soundfile")  # switch backend
    waveform, sample_rate = torchaudio.load('foo.wav')  # load tensor from file, as usual
    torchaudio.save('foo_save.wav', waveform, sample_rate)  # save tensor to file, as usual
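
Because the SoundFile backend does not handle mp3 files, code that switches backends may want to fall back to SoX for that format. The helper below is a hypothetical sketch of such a choice based on the file extension. The name `choose_backend` is invented for this example, and the backend string `"sox_io"` for the SoX backend is an assumption that should be checked against the installed torchaudio version.

```python
from pathlib import Path


def choose_backend(path, preferred="soundfile", fallback="sox_io"):
    """Pick a backend name for a file, avoiding SoundFile for mp3."""
    # "sox_io" is assumed to be the name of the SoX backend.
    if Path(path).suffix.lower() == ".mp3":
        return fallback
    return preferred
```

The result can then be passed to `torchaudio.set_audio_backend` before calling `torchaudio.load`.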

Note

SoundFile currently does not support mp3.
"soundfile" backend is not supported by TorchScript.

API Reference

API Reference is located here: http://pytorch.org/audio/main/

Contributing Guidelines

Please refer to CONTRIBUTING.md

Citation

If you find this package useful, please cite as:

    @article{yang2021torchaudio,
      title={TorchAudio: Building Blocks for Audio and Speech Processing},
      author={Yao-Yuan Yang and Moto Hira and Zhaoheng Ni and Anjali Chourdia and Artyom Astafurov and Caroline Chen and Ching-Feng Yeh and Christian Puhrsch and David Pollack and Dmitriy Genzel and Donny Greenberg and Edward Z. Yang and Jason Lian and Jay Mahadeokar and Jeff Hwang and Ji Chen and Peter Goldsborough and Prabhat Roy and Sean Narenthiran and Shinji Watanabe and Soumith Chintala and Vincent Quenneville-Bélair and Yangyang Shi},
      journal={arXiv preprint arXiv:2110.15018},
      year={2021}
    }

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!
