An end-to-end ASR Transformer model training repo
MegEngine/End-to-end-ASR-Transformer
- This project is an end-to-end speech recognition system built on the basic Transformer architecture with 6 encoder layers and 6 decoder layers.
- 1. Data preparation:
- Download the data yourself and arrange it in the following directory structure:
    ├── data
    │   ├── train
    │   ├── dev
    │   ├── test
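Before running the later steps, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_splits` is hypothetical, and it only checks that the three folders exist, not what they contain.

```python
from pathlib import Path

# Split names taken from the directory tree above.
EXPECTED_SPLITS = ("train", "dev", "test")


def missing_splits(data_root):
    """Return the expected split folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SPLITS if not (root / name).is_dir()]


if __name__ == "__main__":
    # "data" is the top-level folder shown in the tree; adjust the path if yours differs.
    missing = missing_splits("data")
    if missing:
        print("Missing folders under data/:", ", ".join(missing))
    else:
        print("data/ layout looks complete.")
```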
- 2. Data preprocessing:
- Run `prepare_data.py` to preprocess the data. This produces the full vocabulary, a mel-scale spectrogram for each audio sample, and the token-ids of each transcript.
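To illustrate what "vocabulary" and "token-ids" mean here, the sketch below builds a character-level vocabulary from a few transcripts and encodes one of them. It is an assumption-laden example, not the code in `prepare_data.py`: the character-level tokenization, the special tokens, and the function names `build_vocab` and `encode` are all hypothetical.

```python
from collections import Counter

# Hypothetical special tokens; the repository's actual vocabulary may differ.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_vocab(transcripts):
    """Map each non-space character seen in the transcripts to an integer id."""
    counts = Counter(ch for text in transcripts for ch in text if not ch.isspace())
    # Sort by descending frequency, then by character, so ids are reproducible.
    chars = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {token: idx for idx, token in enumerate(SPECIALS + chars)}


def encode(text, vocab):
    """Convert a transcript to token ids, using <unk> for unseen characters."""
    unk = vocab["<unk>"]
    return [vocab.get(ch, unk) for ch in text if not ch.isspace()]


vocab = build_vocab(["你好", "你好吗"])
print(encode("你好吗", vocab))
```

Reserving the first ids for special tokens keeps padding and sentence boundaries stable even if the training transcripts change.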
- 3. Model training:
- Run `train_transformer.py --ngpus 8` to train the Transformer network. The network takes a mel-scale spectrogram as input and outputs token-ids.
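Encoder-decoder Transformers are commonly trained with teacher forcing, where the decoder receives the target sequence shifted right by one position. The sketch below shows how token-id targets could be prepared for a batch under that assumption. It is not taken from `train_transformer.py`: the special-token ids and the helper names `make_decoder_pair` and `pad_batch` are hypothetical.

```python
# Hypothetical special-token ids; the repository's own values may differ.
PAD, SOS, EOS = 0, 1, 2


def make_decoder_pair(token_ids):
    """Return (decoder_input, target) for teacher forcing on one transcript."""
    return [SOS] + list(token_ids), list(token_ids) + [EOS]


def pad_batch(sequences, pad_id=PAD):
    """Right-pad variable-length sequences to a common length."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]


pairs = [make_decoder_pair(ids) for ids in ([4, 5], [4, 5, 6])]
decoder_inputs = pad_batch([inp for inp, _ in pairs])
targets = pad_batch([tgt for _, tgt in pairs])
print(decoder_inputs)
print(targets)
```

Padded positions would normally be masked out of the loss so that they do not affect training.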
- 4. Model inference:
- Run `evlauate.py` to measure accuracy on the dev/test sets.
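The README does not say how accuracy is computed. A common choice for speech recognition is an edit-distance-based error rate over tokens, with accuracy sometimes reported as one minus that rate. The sketch below shows that metric under this assumption; the function names are hypothetical and the script's actual metric may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses):
    """Total edit distance divided by total reference length over a dataset."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_len = sum(len(r) for r in references)
    return total_edits / total_len


# One reference of four token ids; the hypothesis drops one token.
print(error_rate([[4, 5, 6, 7]], [[4, 5, 7]]))
```

Because the functions work on any sequences, the same code gives a character error rate when passed strings and a token error rate when passed lists of token-ids.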
- Ashish Vaswani et al. “Attention Is All You Need” (2017).
- Abdel-rahman Mohamed et al. “Transformers with convolutional context for ASR” arXiv: Computation and Language (2019).
- Albert Zeyer et al. “Improved Training of End-to-end Attention Models for Speech Recognition” Conference of the International Speech Communication Association (2018).