torch-audiomentations


Audio data augmentation in PyTorch. Inspired by audiomentations.

Supports CPU and GPU (CUDA) - speed is a priority
Supports batches of multichannel (or mono) audio
Transforms extend nn.Module, so they can be integrated as part of a PyTorch neural network model
Most transforms are differentiable
Three modes: per_batch, per_example and per_channel
Cross-platform compatibility
Permissive MIT license
Aiming for high test coverage

Setup

pip install torch-audiomentations

Usage example

import torch
from torch_audiomentations import Compose, Gain, PolarityInversion

# Initialize augmentation callable
apply_augmentation = Compose(
    transforms=[
        Gain(
            min_gain_in_db=-15.0,
            max_gain_in_db=5.0,
            p=0.5,
        ),
        PolarityInversion(p=0.5)
    ]
)

torch_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Make an example tensor with white noise.
# This tensor represents 8 audio snippets with 2 channels (stereo) and 2 s of 16 kHz audio.
audio_samples = torch.rand(size=(8, 2, 32000), dtype=torch.float32, device=torch_device) - 0.5

# Apply augmentation. This varies the gain and polarity of (some of)
# the audio snippets in the batch independently.
perturbed_audio_samples = apply_augmentation(audio_samples, sample_rate=16000)

Known issues

Target data processing is still in an experimental state (#3). Workaround: Use freeze_parameters and unfreeze_parameters for now if the target data is audio with the same shape as the input.
Using torch-audiomentations in a multiprocessing context can lead to memory leaks (#132). Workaround: If using torch-audiomentations in a multiprocessing context, it'll probably work better to run the transforms on CPU.
Multi-GPU / DDP is not officially supported (#136). The author does not have a multi-GPU setup to test & fix this. Get in touch if you want to donate some hardware for this. Workaround: Run the transforms on single GPU instead.
PitchShift does not support small pitch shifts, especially for low sample rates (#151). Workaround: If you need small pitch shifts applied to low sample rates, use PitchShift in audiomentations or torch-pitch-shift directly without the function for calculating efficient pitch-shift targets.

Contribute

Contributors welcome! Join Asteroid's Slack to start discussing torch-audiomentations with us.

Motivation: Speed

We don't want data augmentation to be a bottleneck in model training speed. Here is a comparison of the time it takes to run 1D convolution:

Note: Not all transforms have a speedup this impressive compared to CPU. In general, running audio data augmentation on GPU is not always the best option. For more info, see this article: https://iver56.github.io/audiomentations/guides/cpu_vs_gpu/

Current state

torch-audiomentations is in an early development stage, so the APIs are subject to change.

Waveform transforms

Every transform has mode, p, and p_mode -- the parameters that decide how the augmentation is performed.

mode decides how the randomization of the augmentation is grouped and applied.
p decides the on/off probability of applying the augmentation.
p_mode decides how the on/off of the augmentation is applied.

This visualization shows how different combinations of mode and p_mode would perform an augmentation.
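As a rough illustration of the grouping idea, the sketch below draws a random gain once per batch, once per example, or once per channel, and lets broadcasting spread each draw over its group. It is not the library's implementation: `sample_gain_db` is a hypothetical helper, and the on/off decision is assumed to use the same grouping as the parameters (that is, p_mode equal to mode).

```python
import torch


def sample_gain_db(shape, mode, min_db=-15.0, max_db=5.0):
    """Draw random gains (in dB) whose grouping depends on `mode`.

    Hypothetical helper for illustration; not part of torch-audiomentations.
    """
    batch_size, num_channels, _ = shape
    if mode == "per_batch":
        draw_shape = (1, 1, 1)  # one value shared by the whole batch
    elif mode == "per_example":
        draw_shape = (batch_size, 1, 1)  # one value per example
    elif mode == "per_channel":
        draw_shape = (batch_size, num_channels, 1)  # one value per channel
    else:
        raise ValueError(f"Unknown mode: {mode}")
    return min_db + (max_db - min_db) * torch.rand(draw_shape)


audio = torch.rand(8, 2, 32000) - 0.5
for mode in ("per_batch", "per_example", "per_channel"):
    gain_db = sample_gain_db(audio.shape, mode)
    # p is the on/off probability; here p=0.5, drawn with the same grouping
    # as the parameters (assumption: p_mode == mode).
    apply_mask = torch.rand(gain_db.shape) < 0.5
    factor = torch.where(apply_mask, 10 ** (gain_db / 20), torch.ones_like(gain_db))
    augmented = audio * factor  # broadcasting applies each draw to its group
    print(mode, tuple(gain_db.shape), int(apply_mask.sum()), "group(s) augmented")
```

The shape of the drawn tensor is the whole story: (1, 1, 1) means every snippet gets the same treatment, while (batch, channels, 1) lets each channel differ.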

AddBackgroundNoise

Added in v0.5.0

Add background noise to the input audio.

AddColoredNoise

Added in v0.7.0

Add colored noise to the input audio.

ApplyImpulseResponse

Added in v0.5.0

Convolve the given audio with impulse responses.
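To make the convolution concrete, here is a minimal sketch of FFT-based convolution with a toy impulse response, written in plain PyTorch. The function name `fft_convolve` and the choice to truncate the result to the input length are assumptions made for this example; the transform's own loading, normalization and delay-compensation logic is not reproduced.

```python
import torch


def fft_convolve(audio, impulse_response):
    """Convolve along the last axis using the FFT.

    audio: (batch, channels, samples); impulse_response: (ir_samples,).
    Returns the full convolution, of length samples + ir_samples - 1.
    """
    n = audio.shape[-1] + impulse_response.shape[-1] - 1
    spectrum = torch.fft.rfft(audio, n=n) * torch.fft.rfft(impulse_response, n=n)
    return torch.fft.irfft(spectrum, n=n)


audio = torch.rand(4, 1, 16000) - 0.5

# Toy impulse response: direct sound plus one quieter echo 100 samples later.
ir = torch.zeros(200)
ir[0] = 1.0
ir[100] = 0.5

wet = fft_convolve(audio, ir)
# Keep the original length (a choice made for this sketch).
wet = wet[..., : audio.shape[-1]]
```

With this impulse response the output is the input plus a half-amplitude copy delayed by 100 samples, which is an easy way to sanity-check the result.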

BandPassFilter

Added in v0.9.0

Apply band-pass filtering to the input audio.

BandStopFilter

Added in v0.10.0

Apply band-stop filtering to the input audio. Also known as notch filter.

Gain

Added in v0.1.0

Multiply the audio by a random amplitude factor to reduce or increase the volume. This technique can help a model become somewhat invariant to the overall gain of the input audio.

Warning: This transform can return samples outside the [-1, 1] range, which may lead to clipping or wrap distortion, depending on what you do with the audio in a later stage. See also https://en.wikipedia.org/wiki/Clipping_(audio)#Digital_clipping
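The relationship between a gain in dB and the amplitude factor is the standard 10^(dB/20) conversion. The sketch below, with a hypothetical helper `db_to_amplitude_factor`, shows how a positive gain can push samples outside [-1, 1] and one possible way to handle that afterwards.

```python
import torch


def db_to_amplitude_factor(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


audio = torch.tensor([[[0.0, 0.5, -0.8, 0.25]]])  # (batch, channels, samples)

quieter = audio * db_to_amplitude_factor(-6.0)  # factor is roughly 0.501
louder = audio * db_to_amplitude_factor(5.0)  # factor is roughly 1.778

# A +5 dB gain pushes the -0.8 sample outside [-1, 1].
print(louder.abs().max() > 1.0)  # tensor(True)

# One option is to clamp explicitly (hard clipping) before any later stage
# that would otherwise wrap out-of-range values.
clipped = louder.clamp(-1.0, 1.0)
```

Whether to clamp, normalize, or leave the values as they are depends on what consumes the audio next.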

HighPassFilter

Added in v0.8.0

Apply high-pass filtering to the input audio.

Identity

Added in v0.11.0

This transform returns the input unchanged. It can be used for simplifying the code in cases where data augmentation should be disabled.

LowPassFilter

Added in v0.8.0

Apply low-pass filtering to the input audio.

PeakNormalization

Added in v0.2.0

Apply a constant amount of gain, so that the highest signal level present in each audio snippet in the batch becomes 0 dBFS, i.e. the loudest level allowed if all samples must be between -1 and 1.

This transform has an alternative mode (apply_to="only_too_loud_sounds") where it only applies to audio snippets that have extreme values outside the [-1, 1] range. This is useful for avoiding digital clipping in audio that is too loud, while leaving other audio untouched.
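The two behaviours described above can be sketched in a few lines of plain PyTorch. `peak_normalize` and its `only_too_loud` flag are hypothetical names for this illustration and only approximate what the transform does.

```python
import torch


def peak_normalize(audio, only_too_loud=False):
    """Scale each example so its largest absolute sample becomes 1.0 (0 dBFS).

    audio: (batch, channels, samples). With only_too_loud=True, examples whose
    peak is already within [-1, 1] are left untouched. Silent examples are
    never scaled, which avoids a division by zero.
    """
    peak = audio.abs().amax(dim=(1, 2), keepdim=True)  # one peak per example
    needs_scaling = peak > (1.0 if only_too_loud else 0.0)
    divisor = torch.where(needs_scaling, peak, torch.ones_like(peak))
    return audio / divisor


batch = torch.tensor(
    [
        [[0.1, -0.2, 0.05]],  # quiet example, peak 0.2
        [[1.5, -3.0, 0.75]],  # too loud, peak 3.0
    ]
)

normalized = peak_normalize(batch)  # both examples now peak at 1.0
selective = peak_normalize(batch, only_too_loud=True)  # only the second is scaled
```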

PitchShift

Added in v0.9.0

Pitch-shift sounds up or down without changing the tempo.

PolarityInversion

Added in v0.1.0

Flip the audio samples upside-down, reversing their polarity. In other words, multiply the waveform by -1, so negative values become positive, and vice versa. The result will sound the same compared to the original when played back in isolation. However, when mixed with other audio sources, the result may be different. This waveform inversion technique is sometimes used for audio cancellation or obtaining the difference between two waveforms. However, in the context of audio data augmentation, this transform can be useful when training phase-aware machine learning models.
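A small plain-PyTorch sketch of the cancellation and difference ideas mentioned above (the variable names are made up for this example):

```python
import torch

audio = torch.rand(1, 1, 16000) - 0.5
inverted = -audio  # polarity inversion is a multiplication by -1

# In isolation, the absolute sample values are unchanged ...
assert torch.equal(inverted.abs(), audio.abs())

# ... but mixing a signal with its inverted copy cancels it completely.
mix = audio + inverted
print(mix.abs().max())  # tensor(0.)

# Adding an inverted copy isolates the difference between two waveforms.
other = audio.clone()
other[..., 8000] += 0.1  # introduce a single-sample difference
difference = other + inverted  # zero everywhere except at index 8000
```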

Shift

Added in v0.5.0

Shift the audio forwards or backwards, with or without rollover.
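The difference between the two behaviours is easiest to see on a tiny tensor. The sketch below uses `torch.roll`; the helper name `shift`, its sign convention (positive means later in time) and the zero fill are assumptions for this illustration rather than the transform's exact parameters.

```python
import torch


def shift(audio, num_samples, rollover=True):
    """Shift audio along the time axis by num_samples (negative = backwards).

    With rollover, samples pushed past one end re-enter at the other end.
    Without rollover, the vacated region is filled with silence.
    """
    shifted = torch.roll(audio, shifts=num_samples, dims=-1)  # returns a copy
    if not rollover:
        if num_samples > 0:
            shifted[..., :num_samples] = 0.0
        elif num_samples < 0:
            shifted[..., num_samples:] = 0.0
    return shifted


audio = torch.arange(1.0, 7.0).reshape(1, 1, 6)  # [1, 2, 3, 4, 5, 6]
print(shift(audio, 2).flatten().tolist())  # [5.0, 6.0, 1.0, 2.0, 3.0, 4.0]
print(shift(audio, 2, rollover=False).flatten().tolist())  # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
print(shift(audio, -2, rollover=False).flatten().tolist())  # [3.0, 4.0, 5.0, 6.0, 0.0, 0.0]
```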

ShuffleChannels

Added in v0.6.0

Given multichannel audio input (e.g. stereo), shuffle the channels, e.g. so left can become right and vice versa. This transform can help combat positional bias in machine learning models that input multichannel waveforms.

If the input audio is mono, this transform does nothing except emit a warning.
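Conceptually, shuffling channels amounts to applying a random permutation along the channel axis. A minimal sketch in plain PyTorch follows, using a hypothetical `shuffle_channels` helper that draws an independent permutation for each example; it does not reproduce the transform's mode handling or its mono warning.

```python
import torch


def shuffle_channels(audio, generator=None):
    """Apply an independent random channel permutation to each example.

    audio: (batch, channels, samples).
    """
    batch_size, num_channels, _ = audio.shape
    shuffled = torch.empty_like(audio)
    for i in range(batch_size):
        permutation = torch.randperm(num_channels, generator=generator)
        shuffled[i] = audio[i, permutation]
    return shuffled


# Three stereo examples where left is all 1s and right is all 2s.
stereo = torch.stack([torch.ones(4), 2 * torch.ones(4)])  # (2, 4)
audio = stereo.unsqueeze(0).repeat(3, 1, 1)  # (3, 2, 4)
shuffled = shuffle_channels(audio)
```

Each example ends up with either the original channel order or the swapped one; the content of every channel is untouched.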

TimeInversion

Added in v0.10.0

Reverse (invert) the audio along the time axis similar to random flip of an image in the visual domain. This can be relevant in the context of audio classification. It was successfully applied in the paper AudioCLIP: Extending CLIP to Image, Text and Audio
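In plain PyTorch, reversing along the time axis is a single `torch.flip` call. The sketch below also checks a well-known property of real signals: time reversal changes the temporal order but leaves the magnitude spectrum unchanged. Variable names are made up for this example.

```python
import torch

audio = torch.arange(6.0).reshape(1, 1, 6)  # [0, 1, 2, 3, 4, 5]
reversed_audio = torch.flip(audio, dims=[-1])  # [5, 4, 3, 2, 1, 0]

# Reversing twice recovers the original signal.
assert torch.equal(torch.flip(reversed_audio, dims=[-1]), audio)

# For a real signal, the magnitude spectrum is the same before and after
# reversal; only the phase differs.
noise = torch.rand(1, 1, 1024) - 0.5
magnitude = torch.fft.rfft(noise).abs()
magnitude_reversed = torch.fft.rfft(torch.flip(noise, dims=[-1])).abs()
assert torch.allclose(magnitude, magnitude_reversed, atol=1e-3)
```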

Changelog

Unreleased

Added

Add new transforms: Mix, Padding, RandomCrop and SpliceOut

[v0.12.0] - 2025-01-15

Removed

Remove librosa dependency in favor of torchaudio

[v0.11.2] - 2025-01-09

Fixed

Fix a device-related bug in transform_parameters when training on multiple GPUs
Fix a shape-related edge case bug in AddColoredNoise
Fix a bug where an incompatible Path data type was passed to torchaudio.info

[v0.11.1] - 2024-02-07

Changed

Add support for constant cutoff frequency in LowPassFilter and HighPassFilter
Add support for min_f_decay==max_f_decay in AddColoredNoise
Bump torchaudio dependency from >=0.7.0 to >=0.9.0

Fixed

Fix inaccurate type hints in Shift
Remove set_backend to avoid UserWarning from torchaudio

[v0.11.0] - 2022-06-29

Added

Add new transform: Identity
Add API for processing targets alongside inputs. Some transforms experimentally support this feature already.

Changed

Add ObjectDict output type as alternative to torch.Tensor. This alternative is opt-in for now (for backwards-compatibility), but note that the old output type (torch.Tensor) is deprecated and support for it will be removed in a future version.
Allow specifying a file path, a folder path, a list of files or a list of folders to AddBackgroundNoise and ApplyImpulseResponse
Require newer version of torch-pitch-shift to ensure support for torchaudio 0.11 in PitchShift

Fixed

Fix a bug where BandPassFilter didn't work on GPU

[v0.10.1] - 2022-03-24

Added

Add support for min SNR == max SNR in AddBackgroundNoise
Add support for librosa 0.9.0

Fixed

Fix a bug where loaded audio snippets were sometimes resampled to an incompatible length in AddBackgroundNoise

[v0.10.0] - 2022-02-11

Added

Implement OneOf and SomeOf for applying one or more of a given set of transforms
Implement new transforms: BandStopFilter and TimeInversion

Changed

Put ir_paths in transform_parameters in ApplyImpulseResponse so it is possible to inspect what impulse responses were used. This also gives freeze_parameters() the expected behavior.

Fixed

Fix a bug where the actual bandwidth was twice as large as expected in BandPassFilter. The default values have been updated accordingly. If you were previously specifying min_bandwidth_fraction and/or max_bandwidth_fraction, you now need to double those numbers to get the same behavior as before.

[v0.9.1] - 2021-12-20

Added

Officially mark python>=3.9 as supported

[v0.9.0] - 2021-10-11

Added

Add parameter compensate_for_propagation_delay in ApplyImpulseResponse
Implement BandPassFilter
Implement PitchShift

Removed

Support for torchaudio<=0.6 has been removed

[v0.8.0] - 2021-06-15

Added

Implement HighPassFilter and LowPassFilter

Deprecated

Support for torchaudio<=0.6 is deprecated and will be removed in the future

Removed

Support for pytorch<=1.6 has been removed

[v0.7.0] - 2021-04-16

Added

Implement AddColoredNoise

Deprecated

Support for pytorch<=1.6 is deprecated and will be removed in the future

[v0.6.0] - 2021-02-22

Added

Implement ShuffleChannels

[v0.5.1] - 2020-12-18

Fixed

Fix a bug where AddBackgroundNoise did not work on CUDA
Fix a bug where symlinked audio files/folders were not found when looking for audio files
Use torch.fft.rfft instead of torch.rfft (deprecated in pytorch 1.7) when possible. As a bonus, the change also improves performance in ApplyImpulseResponse.

[v0.5.0] - 2020-12-08

Added

Release AddBackgroundNoise and ApplyImpulseResponse
Implement Shift

Changed

Make sample_rate optional. Allow specifying sample_rate in __init__ instead of forward. This means torchaudio transforms can be used in Compose now.

Removed

Remove support for 1-dimensional and 2-dimensional audio tensors. Only 3-dimensional audio tensors are supported now.

Fixed

Fix a bug where one could not use the parameters method of the nn.Module subclass
Fix a bug where files with uppercase filename extension were not found

[v0.4.0] - 2020-11-10

Added

Implement Compose for applying multiple transforms
Implement utility functions from_dict and from_yaml for loading data augmentation configurations from dict, json or yaml
Officially support differentiability in most transforms