TaoRuijie/ECAPA-TDNN

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 on Vox1_O when trained only on Vox2)

License

MIT license

Folders and files

exps
Deep learning based speaker recognition tutorial_Ruijie.pdf
ECAPAModel.py
LICENSE.md
README.md
dataLoader.py
loss.py
model.py
requirements.txt
tools.py
trainECAPAModel.py

Introduction

This repository contains my unofficial reimplementation of the standard ECAPA-TDNN, a speaker recognition model, trained on the VoxCeleb2 dataset.

This repository is based on voxceleb_trainer.

Best Performance in this project (with AS-norm)

Dataset	Vox1_O	Vox1_E	Vox1_H
EER (%)	0.86	1.18	2.17
minDCF	0.0686	0.0765	0.1295

Note: these results are for the Vox1_O clean list. For the Vox1_O noise list, EER is 1.00 and minDCF is 0.0713.
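For readers new to the metric: the EER (equal error rate) is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how it can be estimated from a list of trial scores. It is not the evaluation code of this repository; the function name `compute_eer` and its arguments are illustrative assumptions, and tied scores are not given special treatment.

```python
def compute_eer(scores, labels):
    """Estimate the equal error rate (as a fraction) from trial scores.

    scores: similarity score per trial (higher means more likely same speaker)
    labels: 1 for target (same-speaker) trials, 0 for non-target trials
    Hypothetical helper for illustration; ties between scores are not handled.
    """
    # Sweep the decision threshold from the highest score downwards.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_target = sum(1 for _, y in pairs if y == 1)
    n_nontarget = len(pairs) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Threshold above every score: nothing accepted, so FAR = 0 and FRR = 1.
    far, frr = 0.0, 1.0
    best_gap, eer = abs(far - frr), (far + frr) / 2.0
    accepted_targets = 0
    accepted_nontargets = 0
    for _, y in pairs:
        if y == 1:
            accepted_targets += 1
        else:
            accepted_nontargets += 1
        far = accepted_nontargets / n_nontarget      # false acceptance rate
        frr = 1.0 - accepted_targets / n_target      # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
    print(f"EER = {100 * compute_eer(toy_scores, toy_labels):.2f}%")
```

Multiplying the returned fraction by 100 gives a percentage comparable to the EER row of the table.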

System Description

I have uploaded the system description; please see Section 3, ECAPA-TDNN SYSTEM.

Dependencies

Note: These settings are based on my device; you can change the torch and torchaudio versions to suit yours.

Start from building the environment

conda create -n ECAPA python=3.7.9 anaconda
conda activate ECAPA
pip install -r requirements.txt

Start from the existing environment

pip install -r requirements.txt

Data preparation

Please follow the official code to prepare your VoxCeleb2 dataset, using the 'Data preparation' part in this repository.

Datasets for training:

VoxCeleb2 training set;
MUSAN dataset;
RIR dataset.

Datasets for evaluation:

VoxCeleb1 test set for Vox1_O
VoxCeleb1 train set for Vox1_E and Vox1_H (Optional)
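Before starting a long training run, it can help to confirm that every dataset listed above is actually where you expect it. The sketch below is only an illustration: the helper `missing_paths` and all directory names in `DATA_PATHS` are hypothetical placeholders, not paths used by this repository.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the names of entries in `paths` whose location does not exist."""
    return [name for name, location in paths.items()
            if not Path(location).exists()]


# Hypothetical locations (assumptions); replace them with the paths on your machine.
DATA_PATHS = {
    "VoxCeleb2 training set": "/data/voxceleb2",
    "MUSAN dataset": "/data/musan",
    "RIR dataset": "/data/rir",
    "VoxCeleb1 test set": "/data/voxceleb1",
}

if __name__ == "__main__":
    for name in missing_paths(DATA_PATHS):
        print(f"Missing: {name} ({DATA_PATHS[name]})")
```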

Training

Then change the data paths in trainECAPAModel.py. Train the ECAPA-TDNN model end-to-end with:

python trainECAPAModel.py --save_path exps/exp1

Every test_step epochs, the system is evaluated on the Vox1_O set and the EER is printed.

The results are saved in exps/exp1/score.txt. The model is saved in exps/exp1/model.

In my case, I trained for 80 epochs on one 3090 GPU. Each epoch takes 37 mins, and the total training time is about 48 hours.
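To budget a run with a different number of epochs or a different GPU, the same arithmetic can be reused. This is a back-of-the-envelope sketch; the function name `estimate_training_hours` is an illustrative assumption, and evaluation time is not modelled separately.

```python
def estimate_training_hours(epochs, minutes_per_epoch):
    """Rough wall-clock estimate: number of epochs times minutes per epoch, in hours."""
    return epochs * minutes_per_epoch / 60


if __name__ == "__main__":
    # Figures reported above: 80 epochs at 37 minutes each.
    print(f"{estimate_training_hours(80, 37):.1f} hours")
```

With the reported figures this gives a little over 49 hours, in line with the roughly two days stated above.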

Pretrained model

Our pretrained model reaches EER: 0.96 on the Vox1_O set without AS-norm. You can check it with:

python trainECAPAModel.py --eval --initial_model exps/pretrain.model

With AS-norm, this system reaches EER: 0.86. We will not update this code in the near future because we do not have enough time for this work. If you want to add AS-norm or other normalization methods, I suggest the following paper:

Matejka, Pavel, et al. "Analysis of Score Normalization in Multilingual Speaker Recognition." INTERSPEECH. 2017.
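Since AS-norm is not included in this code, here is a minimal sketch of one common formulation of adaptive symmetric score normalization, in which each trial score is normalized by the statistics of the top-scoring cohort entries for the enrollment side and for the test side. It is not the implementation behind the 0.86 result; the function name `as_norm`, its arguments, and the default `top_k=300` are assumptions for illustration.

```python
from statistics import mean, pstdev


def as_norm(raw_score, enroll_cohort_scores, test_cohort_scores, top_k=300):
    """Adaptive symmetric normalization of a single trial score (sketch).

    raw_score: score between the enrollment and test embeddings
    enroll_cohort_scores: scores of the enrollment embedding against a cohort
    test_cohort_scores: scores of the test embedding against the same cohort
    top_k: number of highest cohort scores kept per side (assumed default)
    """
    def top_stats(cohort_scores):
        # Adaptive part: keep only the cohort entries closest to this utterance.
        top = sorted(cohort_scores, reverse=True)[:top_k]
        # Guard against a zero standard deviation.
        return mean(top), max(pstdev(top), 1e-8)

    mu_e, sigma_e = top_stats(enroll_cohort_scores)
    mu_t, sigma_t = top_stats(test_cohort_scores)
    # Symmetric part: average the two z-normalized versions of the score.
    return 0.5 * ((raw_score - mu_e) / sigma_e + (raw_score - mu_t) / sigma_t)


if __name__ == "__main__":
    print(as_norm(0.8, [0.1, 0.3, 0.5], [0.2, 0.4, 0.6], top_k=2))
```

In practice the cohort is usually a set of embeddings drawn from the training data, and the cohort scores are computed with the same scoring function as the trial itself.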

We have also uploaded the score file as exps/pretrain_score.txt. It contains the training loss, training accuracy and Vox1_O EER for each epoch, for your reference.

Reference

Original ECAPA-TDNN paper

@inproceedings{desplanques2020ecapa,
  title={{ECAPA-TDNN: Emphasized Channel Attention, propagation and aggregation in TDNN based speaker verification}},
  author={Desplanques, Brecht and Thienpondt, Jenthe and Demuynck, Kris},
  booktitle={Interspeech 2020},
  pages={3830--3834},
  year={2020}
}

Our reimplementation report

@article{das2021hlt,
  title={HLT-NUS SUBMISSION FOR 2020 NIST Conversational Telephone Speech SRE},
  author={Das, Rohan Kumar and Tao, Ruijie and Li, Haizhou},
  journal={arXiv preprint arXiv:2111.06671},
  year={2021}
}

VoxCeleb_trainer paper

@inproceedings{chung2020in,
  title={In defence of metric learning for speaker recognition},
  author={Chung, Joon Son and Huh, Jaesung and Mun, Seongkyu and Lee, Minjae and Heo, Hee Soo and Choe, Soyeon and Ham, Chiheon and Jung, Sunghwan and Lee, Bong-Jin and Han, Icksang},
  booktitle={Interspeech},
  year={2020}
}

Acknowledgements

We studied many useful projects during our coding process, including:

clovaai/voxceleb_trainer

lawlict/ECAPA-TDNN

speechbrain/speechbrain

ranchlai/speaker-verification

Thanks to these authors for open-sourcing their code!

Notes

If you run into problems with this repository, please ask in the 'Issues' section on GitHub (in English) instead of sending me messages on bilibili, so that others can also benefit. Thanks for your understanding!

If you improve the results of this repository with some method, please let me know. Thanks!

Cooperation

If you are interested in working on this topic and have ideas to implement, I am glad to collaborate and contribute my experience and knowledge. Please contact me at ruijie.tao@u.nus.edu.
