Target speaker automatic speech recognition (TS-ASR)

Folders and files

hparams/LibriSpeechMix
models
tasks
tools
vendor/speechbrain
.gitignore
NOTICE
README.md
librispeechmix_prepare.py
requirements.txt
speechbrain
train_librispeechmix_none.py
train_librispeechmix_pretrained.py
train_librispeechmix_scratch.py
utils.py

Target Speaker Automatic Speech Recognition

This SpeechBrain recipe includes scripts to train end-to-end transducer-based target speaker automatic speech recognition (TS-ASR) systems as proposed in Streaming Target-Speaker ASR with Neural Transducer.

⚡ Datasets

LibriSpeechMix

Generate the LibriSpeechMix data in <path-to-data-folder> following the official readme.

🛠️️ Installation

Clone the repository, navigate to <path-to-repository>, open a terminal and run:

pip install -e vendor/speechbrain
pip install -r requirements.txt

▶️ Quickstart

Navigate to <path-to-repository>, open a terminal and run:

python train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder>
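The placeholders in the command above can also be filled programmatically. The following is a minimal sketch, not part of this repository: the helper name `build_train_command` is hypothetical, and it assumes the naming pattern shown above, where the training script uses the lowercase dataset name (e.g. `train_librispeechmix_scratch.py`) while the `hparams` folder keeps the original casing (e.g. `hparams/LibriSpeechMix`).

```python
import shlex


def build_train_command(dataset, variant, config, data_folder, python="python"):
    """Return the single-process training command as an argument list.

    Hypothetical helper: it only mirrors the command template in this README
    and does not check that the script or the config file actually exist.
    """
    # Script names use the lowercase dataset name (assumption based on the
    # script names listed in this repository).
    script = f"train_{dataset.lower()}_{variant}.py"
    hparams = f"hparams/{dataset}/{config}.yaml"
    return [python, script, hparams, "--data_folder", data_folder]


if __name__ == "__main__":
    cmd = build_train_command(
        "LibriSpeechMix",
        "scratch",
        "conformer-t_scratch",
        "datasets/LibriSpeechMix",
    )
    # Print a shell-quoted version that can be pasted into a terminal.
    print(shlex.join(cmd))
```

Returning a list rather than a string makes it possible to pass the result directly to `subprocess.run` without additional shell quoting.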

To use multiple GPUs on the same node, run:

python -m torch.distributed.launch --nproc_per_node=<num-gpus> \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch

To use multiple GPUs on multiple nodes, run the following on each node, with <node-rank> set to that node's rank (0, ..., <num-nodes> - 1):

python -m torch.distributed.launch --nproc_per_node=<num-gpus-per-node> \
--nnodes=<num-nodes> --node_rank=<node-rank> --master_addr <rank-0-ip-addr> --master_port 5555 \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
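Since each node needs the same command with a different rank, it can help to generate the per-node commands in one place. The sketch below is an illustration under stated assumptions, not part of this repository: `build_launch_command` is a hypothetical helper, the flags are exactly those shown in the commands above, and the IP address in the usage part is a placeholder.

```python
import shlex


def build_launch_command(train_args, num_gpus_per_node, num_nodes=1,
                         node_rank=0, master_addr=None, master_port=5555):
    """Return the torch.distributed.launch command for one node as a list.

    Hypothetical helper that mirrors the single-node and multi-node
    command templates in this README.
    """
    if not 0 <= node_rank < num_nodes:
        raise ValueError("node_rank must be in [0, num_nodes - 1]")
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus_per_node}",
    ]
    if num_nodes > 1:
        # Multi-node runs additionally need the address of the rank-0 node.
        if master_addr is None:
            raise ValueError("master_addr is required for multi-node runs")
        cmd += [
            f"--nnodes={num_nodes}", f"--node_rank={node_rank}",
            "--master_addr", master_addr, "--master_port", str(master_port),
        ]
    return cmd + list(train_args) + ["--distributed_launch"]


if __name__ == "__main__":
    train_args = [
        "train_librispeechmix_scratch.py",
        "hparams/LibriSpeechMix/conformer-t_scratch.yaml",
        "--data_folder", "datasets/LibriSpeechMix",
    ]
    # One command per node; "10.0.0.1" is a placeholder for the rank-0 IP.
    for rank in range(2):
        cmd = build_launch_command(train_args, 8, num_nodes=2,
                                   node_rank=rank, master_addr="10.0.0.1")
        print(shlex.join(cmd))
```

With the default `num_nodes=1` the helper produces the single-node form, so the same function covers both cases described above.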

Helper functions and scripts for plotting and analyzing the results can be found in utils.py and tools.

NOTE: the vendored version of SpeechBrain inside this repository includes several hotfixes (e.g. distributed training, gradient clipping, gradient accumulation, causality) and additional features (e.g. distributed evaluation).
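Because the hotfixes live only in the vendored copy, it can be worth confirming that Python imports SpeechBrain from `vendor/speechbrain` rather than from another installation. The following is a rough sketch: `is_vendored` is a hypothetical helper, and it assumes the editable install from the Installation section exposes the package from a path inside `vendor/speechbrain`.

```python
from pathlib import Path


def is_vendored(module_file, repo_root):
    """Return True if module_file lies under <repo_root>/vendor/speechbrain.

    Hypothetical helper; both paths are resolved first so that symlinks and
    relative paths are compared consistently.
    """
    vendor_dir = (Path(repo_root) / "vendor" / "speechbrain").resolve()
    module_path = Path(module_file).resolve()
    return vendor_dir in module_path.parents


# Usage, run from the repository root after the installation step:
#
#     import speechbrain
#     print(is_vendored(speechbrain.__file__, "."))
```

If the check returns False, a different SpeechBrain installation is probably shadowing the vendored one in the current environment.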

Examples

nohup python -m torch.distributed.launch --nproc_per_node=8 \
train_librispeechmix_scratch.py hparams/LibriSpeechMix/conformer-t_scratch.yaml \
--data_folder datasets/LibriSpeechMix --num_epochs 100 \
--distributed_launch &

📧 Contact

luca.dellalib@gmail.com

lucadellalib/ts-asr
