Dataset Creation Tool Based on CTC-Segmentation

This tool aligns long audio files with their corresponding transcripts and splits them into shorter fragments that are suitable for training an Automatic Speech Recognition (ASR) model.
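The splitting step can be illustrated with a minimal sketch. This is not the tool's implementation: `Fragment`, `split_by_alignment`, and `max_duration` are hypothetical names introduced here, and the segment timestamps are assumed to come from a prior alignment step.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Fragment:
    """One short training example cut from a long recording (hypothetical type)."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # transcript for this fragment
    samples: list  # audio samples belonging to this fragment


def split_by_alignment(
    samples: Sequence[float],
    sample_rate: int,
    segments: List[Tuple[float, float, str]],
    max_duration: float = 20.0,  # assumed cutoff, not a value from the tool
) -> List[Fragment]:
    """Cut `samples` at aligned boundaries, skipping empty or overly long segments."""
    fragments = []
    for start, end, text in segments:
        if end <= start or end - start > max_duration:
            continue
        lo = int(round(start * sample_rate))
        hi = min(int(round(end * sample_rate)), len(samples))
        if lo >= hi:
            continue  # segment lies beyond the end of the audio
        fragments.append(Fragment(start, end, text, list(samples[lo:hi])))
    return fragments


# Usage: 3 seconds of silence at 16 kHz with two aligned utterances.
audio = [0.0] * (16000 * 3)
segments = [(0.0, 1.0, "hello world"), (1.0, 2.5, "second sentence")]
frags = split_by_alignment(audio, 16000, segments)
```

Each resulting fragment pairs a short audio span with its transcript, which is the shape of data an ASR training set typically needs.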

More details can be found in NeMo/tutorials/tools/CTC_Segmentation_Tutorial.ipynb (can be executed with Google’s Colab).

The tool is based on the CTC-Segmentation package and CTC-Segmentation of Large Corpora for German End-to-end Speech Recognition [TOOLS1].

References

[TOOLS1]

Ludwig Kürzinger, Dominik Winkelbauer, Lujun Li, Tobias Watzel, and Gerhard Rigoll. CTC-Segmentation of Large Corpora for German End-to-end Speech Recognition. In International Conference on Speech and Computer, 267–278. Springer, 2020.
