zabir-nabil/torch-speech-dataloader

A ready-to-use PyTorch dataloader for audio classification, speech classification, speaker recognition, etc., with in-GPU augmentations.
- PyTorch speech dataloader in 5 (or fewer) lines of code: `get_torch_speech_dataloader_from_config(config)`
- Batch augmentation on the GPU, powered by torch-audiomentations
- RIR (room impulse response) reverb augmentation with any set of IR file(s) [CPU]
- MUSAN-like noise augmentation with any set of source files; customizable [CPU]
- Written in one night, so it may contain bugs!
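The two CPU-side augmentations listed above (RIR reverb and MUSAN-like noise mixing) can be sketched in plain Python to show what they do to a single clip. This is a minimal illustration under stated assumptions, not the package's implementation: `add_reverb`, `power`, and `mix_at_snr` are hypothetical names, clips are plain lists of floats rather than tensors, the reverb output is trimmed to the input length and rescaled to the input peak, and the noise is repeated or cropped to the clip length before being scaled to a target SNR in dB.

```python
import math
import random
from typing import List

Audio = List[float]  # a mono clip, shape [N,]


def add_reverb(audio: Audio, rir: Audio) -> Audio:
    """Hypothetical RIR reverb: full convolution of `audio` with the impulse
    response `rir`, trimmed to len(audio) and rescaled to the input's peak."""
    out = [0.0] * (len(audio) + len(rir) - 1)
    for i, a in enumerate(audio):
        for j, h in enumerate(rir):
            out[i + j] += a * h
    out = out[: len(audio)]  # keep the input length, drop the reverb tail
    in_peak = max(abs(x) for x in audio)
    out_peak = max(abs(x) for x in out)
    if out_peak == 0.0:
        return out
    return [x * in_peak / out_peak for x in out]


def power(x: Audio) -> float:
    """Mean power of a clip."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(audio: Audio, noise: Audio, snr_db: float) -> Audio:
    """Hypothetical MUSAN-style mixing: repeat/crop `noise` to len(audio),
    then scale it so that 10 * log10(P_audio / P_noise) == snr_db."""
    tiled = [noise[i % len(noise)] for i in range(len(audio))]
    noise_power = power(tiled)
    if noise_power == 0.0:
        return list(audio)
    target_noise_power = power(audio) / 10 ** (snr_db / 10)
    scale = math.sqrt(target_noise_power / noise_power)
    return [a + scale * n for a, n in zip(audio, tiled)]


# Usage: 1 s of a 440 Hz tone at 16 kHz, a toy 5-tap IR, and uniform noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
rir = [1.0, 0.0, 0.5, 0.0, 0.25]
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
# The SNR is drawn from an arbitrary example range (5 to 20 dB).
augmented = mix_at_snr(add_reverb(speech, rir), noise, snr_db=rng.uniform(5.0, 20.0))
```

Drawing the SNR from a range mirrors the `range_for_noise_snr` entries in `musan_config`; the actual ranges and the way the package samples them are not specified here.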
```bash
pip install -U git+https://github.com/zabir-nabil/torch-speech-dataloader.git@main
```
```python
import torch

from torch_speech_dataloader import get_torch_speech_dataloader, get_torch_speech_dataloader_from_config
from torch_speech_dataloader.augmentation_utils import placeholder_gpu_augmentation

config_1 = {
    "filenames": ["../test.wav"] * 5 + ["../test_hindi.wav"] * 5,
    "speech_labels": ["test"] * 5 + ["test2"] * 5,
    "batch_size": 3,
    "num_workers": 5,
    # Second GPU; use torch.device("cpu") if no GPU is available.
    "device": torch.device("cuda:1"),
    "sanity_check_path": "../sanity_test",
    "sanity_check_samples": 2,
    "batch_audio_augmentation": placeholder_gpu_augmentation,
    "rirs_reverb": {"apply": True},
    "musan_augmentation": {"apply": True, "mix_multiples_max_count": -1, "musan_max_len": 1.0},
    "verbose": 0,
}

dummy_tsdl = get_torch_speech_dataloader_from_config(config_1)

# Iterate over batches: d is the batch data, l the corresponding labels.
for d, l in dummy_tsdl.get_batch():
    print(d.shape)
    print(l)
```
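The loop in the example above prints `d.shape` for each batch, and the batch-level augmentation works on tensors of shape `[B x C x N]` (batch, channels, samples). Clips usually differ in length, so they have to be padded or cropped before they can be stacked. The sketch below shows one common choice, zero-padding mono clips to the longest clip in the batch; `pad_batch` is a hypothetical helper that uses nested lists instead of torch tensors, and the package itself may crop or pad differently.

```python
from typing import List, Tuple


def pad_batch(clips: List[List[float]], pad_value: float = 0.0) -> Tuple[List[List[List[float]]], List[int]]:
    """Hypothetical collate step: turn a list of mono clips into a nested
    [B][C][N] list with C == 1, padding every clip to the longest one.
    Also returns the original lengths so padded samples can be masked later."""
    if not clips:
        raise ValueError("need at least one clip")
    max_len = max(len(c) for c in clips)
    batch = [[list(c) + [pad_value] * (max_len - len(c))] for c in clips]
    lengths = [len(c) for c in clips]
    return batch, lengths


batch, lengths = pad_batch([[0.1, 0.2, 0.3], [0.5], [0.4, 0.4]])
print(len(batch), len(batch[0]), len(batch[0][0]))  # 3 1 3  -> [B=3][C=1][N=3]
print(batch[1])  # [[0.5, 0.0, 0.0]]
print(lengths)   # [3, 1, 2]
```

Returning the lengths alongside the padded batch is a design choice that lets later stages ignore the padding; cropping every clip to a fixed length is the usual alternative when a model expects constant-size inputs.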
- `filenames`: A list of file paths for the audio/speech files (usually WAV).
- `speech_labels`: Labels corresponding to `filenames` (the list of audio files).
- `batch_size`: Batch size of the dataloader.
- `num_workers`: Number of dataloader workers.
- `device`: torch device [default: CPU].
- `sanity_check_path`: To inspect the generated samples, specify a path where sample augmented audio files will be saved.
- `sanity_check_samples`: Number of sample audio files to store in the sanity check folder.
- `batch_audio_augmentation`: Any transform (compose)/augmentation that takes a tensor of shape `[B x C x N]`. It usually runs on the GPU batch if a GPU device is specified, otherwise on the CPU batch.
- `rirs_reverb`:
  - `apply`: The augmentation is applied (to each audio individually) only if `apply` is true.
  - `reverb_source_files_path`: A list of IR file paths.
- `musan_augmentation`:
  - `apply`: The augmentation is applied (to each audio individually) only if `apply` is true.
  - `musan_config`: `{ "music": ([list of music file paths], range_for_num_music_files_to_use, range_for_noise_snr), "speech": ([list of speech file paths], range_for_num_speech_files_to_use, range_for_noise_snr), }` [example: `augmentation_utils.placeholder_musan_config`]
  - `mix_multiples_max_count`: Multiple noise types can be mixed (music + noise + ...); this is the maximum number of noise types mixed together.
  - `musan_max_len`: `<= 0`: take the MUSAN noise and crop it to the same length as the input audio; `> 0`: maximum length of the cropped MUSAN noise (in seconds).
- `audio_augmentation`: List of functions that can be applied to a single audio with shape `[N,]`.
- `features`: Feature extraction, `[N,]` -> `[T, F]`.
- `feature_augmentation`: List of functions that can be applied to a single feature with shape `[T, F]`.
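The shapes of the last three options imply an order: `audio_augmentation` functions take and return a `[N,]` clip, `features` maps `[N,]` to `[T, F]`, and `feature_augmentation` functions take and return a `[T, F]` feature. The sketch below chains them under the assumption that each entry is a plain callable applied in list order. `make_random_gain`, `frame_features`, `make_time_mask`, and `process_clip` are hypothetical stand-ins, and the "features" here are Hann-windowed frames rather than a real spectrogram.

```python
import math
import random
from typing import Callable, List, Sequence

Audio = List[float]          # shape [N,]
Feature = List[List[float]]  # shape [T, F]


def make_random_gain(rng: random.Random, low_db: float = -6.0, high_db: float = 6.0) -> Callable[[Audio], Audio]:
    """Hypothetical audio augmentation: scale the clip by a random gain in dB."""
    def apply(audio: Audio) -> Audio:
        gain = 10 ** (rng.uniform(low_db, high_db) / 20)
        return [gain * x for x in audio]
    return apply


def frame_features(audio: Audio, frame_len: int = 400, hop: int = 160) -> Feature:
    """Hypothetical feature extractor, [N,] -> [T, F] with F == frame_len:
    Hann-windowed frames (a real setup might use a log-mel spectrogram)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    frames = []
    for start in range(0, len(audio) - frame_len + 1, hop):
        frames.append([w * x for w, x in zip(window, audio[start:start + frame_len])])
    return frames


def make_time_mask(rng: random.Random, max_width: int = 5) -> Callable[[Feature], Feature]:
    """Hypothetical feature augmentation: zero out a random run of frames."""
    def apply(feat: Feature) -> Feature:
        width = rng.randint(0, min(max_width, len(feat)))
        start = rng.randint(0, len(feat) - width)
        return [
            [0.0] * len(row) if start <= t < start + width else list(row)
            for t, row in enumerate(feat)
        ]
    return apply


def process_clip(
    audio: Audio,
    audio_augmentation: Sequence[Callable[[Audio], Audio]],
    features: Callable[[Audio], Feature],
    feature_augmentation: Sequence[Callable[[Feature], Feature]],
) -> Feature:
    """Apply [N,] augmentations, then feature extraction, then [T, F] augmentations."""
    for fn in audio_augmentation:
        audio = fn(audio)
    feat = features(audio)
    for fn in feature_augmentation:
        feat = fn(feat)
    return feat


rng = random.Random(0)
clip = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]  # 1 s at 16 kHz
feat = process_clip(clip, [make_random_gain(rng)], frame_features, [make_time_mask(rng)])
print(len(feat), len(feat[0]))  # 98 400
```

With a 400-sample window and a 160-sample hop (25 ms and 10 ms at 16 kHz), one second of audio yields 98 frames; none of these values are taken from the package.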