Target speaker automatic speech recognition (TS-ASR)


lucadellalib/ts-asr


Python version: 3.6 | 3.7 | 3.8 | 3.9 | 3.10 | 3.11

This SpeechBrain recipe includes scripts to train end-to-end transducer-based target speaker automatic speech recognition (TS-ASR) systems as proposed in Streaming Target-Speaker ASR with Neural Transducer.


⚡ Datasets

LibriSpeechMix

Generate the LibriSpeechMix data in <path-to-data-folder> following the official readme.


🛠️️ Installation

Clone the repository, navigate to <path-to-repository>, open a terminal and run:

pip install -e vendor/speechbrain
pip install -r requirements.txt

▶️ Quickstart

Navigate to <path-to-repository>, open a terminal and run:

python train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder>
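
The placeholders in the command above can also be filled in programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_train_command` is hypothetical, and the sample values are taken from the Examples section of this README. In that example the script name uses the lowercase dataset name while the hparams folder is `LibriSpeechMix`, so the sketch takes the two as separate arguments.

```python
import shlex
from pathlib import PurePosixPath


def build_train_command(script_dataset, variant, hparams_dataset, config, data_folder):
    """Return the single-GPU training command as a list of arguments.

    Hypothetical helper: it only assembles strings and does not check
    that the script or the YAML file exist.
    """
    script = "train_{}_{}.py".format(script_dataset, variant)
    hparams = PurePosixPath("hparams") / hparams_dataset / "{}.yaml".format(config)
    return ["python", script, str(hparams), "--data_folder", str(data_folder)]


# Values copied from the Examples section of this README.
cmd = build_train_command(
    script_dataset="librispeechmix",
    variant="scratch",
    hparams_dataset="LibriSpeechMix",
    config="conformer-t_scratch",
    data_folder="datasets/LibriSpeechMix",
)

# Quote each argument so the line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list rather than a string means the result can be passed directly to `subprocess.run(cmd)` without shell quoting issues.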

To use multiple GPUs on the same node, run:

python -m torch.distributed.launch --nproc_per_node=<num-gpus> \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch

To use multiple GPUs on multiple nodes, for each node with rank 0, ..., <num-nodes> - 1, run:

python -m torch.distributed.launch --nproc_per_node=<num-gpus-per-node> \
--nnodes=<num-nodes> --node_rank=<node-rank> --master_addr <rank-0-ip-addr> --master_port 5555 \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
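
Because the multi-node command differs between nodes only in `--node_rank`, it can help to generate one command per node from a single set of settings. The sketch below is an illustration under stated assumptions, not code from this repository: `build_launch_command` is a hypothetical helper, the IP address `10.0.0.1` is a placeholder for the rank-0 node, and the training arguments are copied from the Examples section of this README. It only builds the argument lists; each command still has to be run on its own node.

```python
import shlex


def build_launch_command(train_args, num_gpus_per_node, num_nodes=1,
                         node_rank=0, master_addr=None, master_port=5555):
    """Assemble a torch.distributed.launch command as a list of arguments.

    Hypothetical helper mirroring the single-node and multi-node
    commands shown above; nothing is executed here.
    """
    if not 0 <= node_rank < num_nodes:
        raise ValueError("node_rank must be in 0, ..., num_nodes - 1")
    cmd = [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node={}".format(num_gpus_per_node),
    ]
    if num_nodes > 1:
        if master_addr is None:
            raise ValueError("master_addr is required when num_nodes > 1")
        cmd += [
            "--nnodes={}".format(num_nodes),
            "--node_rank={}".format(node_rank),
            "--master_addr", master_addr,
            "--master_port", str(master_port),
        ]
    # The training script and its arguments come last, followed by the
    # flag that the commands above pass to the recipe.
    return cmd + list(train_args) + ["--distributed_launch"]


# Training arguments copied from the Examples section of this README.
train_args = [
    "train_librispeechmix_scratch.py",
    "hparams/LibriSpeechMix/conformer-t_scratch.yaml",
    "--data_folder", "datasets/LibriSpeechMix",
]

# Two nodes with 8 GPUs each; 10.0.0.1 is a placeholder address.
for rank in range(2):
    node_cmd = build_launch_command(
        train_args, num_gpus_per_node=8, num_nodes=2,
        node_rank=rank, master_addr="10.0.0.1",
    )
    print("node {}: {}".format(rank, " ".join(shlex.quote(a) for a in node_cmd)))
```

With the default `num_nodes=1` the helper omits the multi-node flags, so the same function reproduces the single-node command as well.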

Helper functions and scripts for plotting and analyzing the results can be found in utils.py and tools.

NOTE: the vendored version of SpeechBrain inside this repository includes several hotfixes (e.g. distributed training, gradient clipping, gradient accumulation, causality) and additional features (e.g. distributed evaluation).

Examples

nohup python -m torch.distributed.launch --nproc_per_node=8 \
train_librispeechmix_scratch.py hparams/LibriSpeechMix/conformer-t_scratch.yaml \
--data_folder datasets/LibriSpeechMix --num_epochs 100 \
--distributed_launch &

📧 Contact

luca.dellalib@gmail.com

