# 3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
3D-Speaker is an open-source toolkit for single- and multi-modal speaker verification, speaker recognition, and speaker diarization. All pretrained models are accessible on ModelScope. Furthermore, we present a large-scale speech corpus, also called 3D-Speaker-Dataset, to facilitate research on speech representation disentanglement.
EER results on the VoxCeleb, CN-Celeb and 3D-Speaker datasets for fully supervised speaker verification:

| Model | Params | VoxCeleb1-O | CN-Celeb | 3D-Speaker |
|---|---|---|---|---|
| Res2Net | 4.03 M | 1.56% | 7.96% | 8.03% |
| ResNet34 | 6.34 M | 1.05% | 6.92% | 7.29% |
| ECAPA-TDNN | 20.8 M | 0.86% | 8.01% | 8.87% |
| ERes2Net-base | 6.61 M | 0.84% | 6.69% | 7.21% |
| CAM++ | 7.2 M | 0.65% | 6.78% | 7.75% |
| ERes2NetV2 | 17.8 M | 0.61% | 6.14% | 6.52% |
| ERes2Net-large | 22.46 M | 0.52% | 6.17% | 6.34% |
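The equal error rate (EER) reported above is the operating point where the false-acceptance and false-rejection rates coincide. A minimal pure-Python sketch of the metric, using hypothetical trial scores rather than real toolkit outputs:

```python
def eer(target_scores, nontarget_scores):
    """Compute the equal error rate by sweeping a threshold over all scores.

    At each candidate threshold, FRR is the fraction of target (same-speaker)
    trials rejected and FAR is the fraction of non-target trials accepted;
    the EER is taken where the two rates cross.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

# Toy example: same-speaker trials mostly score high, different-speaker trials low.
target = [0.9, 0.8, 0.75, 0.6]
nontarget = [0.5, 0.4, 0.3, 0.7]
print(eer(target, nontarget))  # one trial misclassified on each side -> 0.25
```

Production scoring tools interpolate between thresholds for a smoother estimate; this discrete sweep is enough to illustrate the definition.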
DER results on public and internal multi-speaker datasets for speaker diarization:

| Test set | 3D-Speaker | pyannote.audio | DiariZen_WavLM |
|---|---|---|---|
| Aishell-4 | 10.30% | 12.2% | 11.7% |
| Alimeeting | 19.73% | 24.4% | 17.6% |
| AMI_SDM | 21.76% | 22.4% | 15.4% |
| VoxConverse | 11.75% | 11.3% | 28.39% |
| Meeting-CN_ZH-1 | 18.91% | 22.37% | 32.66% |
| Meeting-CN_ZH-2 | 12.78% | 17.86% | 18% |
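The diarization error rate (DER) above is the sum of missed speech, false-alarm speech, and speaker-confusion time divided by total reference speech time. A simplified frame-level sketch (real scoring additionally finds the optimal reference-to-hypothesis speaker mapping and may apply a forgiveness collar around boundaries):

```python
def frame_der(ref, hyp):
    """Frame-level diarization error rate.

    ref and hyp are equal-length lists of per-frame speaker labels,
    with None marking non-speech. DER = (miss + false alarm +
    speaker confusion) / total reference speech frames.
    """
    miss = fa = conf = 0
    speech = sum(r is not None for r in ref)
    for r, h in zip(ref, hyp):
        if r is not None and h is None:
            miss += 1          # reference speech, hypothesis silence
        elif r is None and h is not None:
            fa += 1            # hypothesis speech, reference silence
        elif r is not None and r != h:
            conf += 1          # both speak, wrong speaker label
    return (miss + fa + conf) / speech

ref = ["A", "A", "A", "B", "B", None, "B", "A"]   # 7 speech frames
hyp = ["A", "A", "B", "B", None, None, "B", "A"]  # one confusion, one miss
print(frame_der(ref, hyp))  # (1 miss + 0 FA + 1 confusion) / 7 ~= 0.286
```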
```sh
git clone https://github.com/modelscope/3D-Speaker.git && cd 3D-Speaker
conda create -n 3D-Speaker python=3.8
conda activate 3D-Speaker
pip install -r requirements.txt
```
```sh
# Speaker verification: ERes2NetV2 on 3D-Speaker dataset
cd egs/3dspeaker/sv-eres2netv2/
bash run.sh
# Speaker verification: CAM++ on 3D-Speaker dataset
cd egs/3dspeaker/sv-cam++/
bash run.sh
# Speaker verification: ECAPA-TDNN on 3D-Speaker dataset
cd egs/3dspeaker/sv-ecapa/
bash run.sh
# Self-supervised speaker verification: SDPN on VoxCeleb dataset
cd egs/voxceleb/sv-sdpn/
bash run.sh
# Audio and multimodal speaker diarization
cd egs/3dspeaker/speaker-diarization/
bash run_audio.sh
bash run_video.sh
# Language identification
cd egs/3dspeaker/language-idenitfication
bash run.sh
```
All pretrained models are released on ModelScope.
```sh
# Install modelscope
pip install modelscope

# ERes2Net trained on 200k labeled speakers
model_id=iic/speech_eres2net_sv_zh-cn_16k-common
# ERes2NetV2 trained on 200k labeled speakers
model_id=iic/speech_eres2netv2_sv_zh-cn_16k-common
# CAM++ trained on 200k labeled speakers
model_id=iic/speech_campplus_sv_zh-cn_16k-common
# Run CAM++ or ERes2Net inference
python speakerlab/bin/infer_sv.py --model_id $model_id
# Run batch inference
python speakerlab/bin/infer_sv_batch.py --model_id $model_id --wavs $wav_list

# SDPN trained on VoxCeleb
model_id=iic/speech_sdpn_ecapa_tdnn_sv_en_voxceleb_16k
# Run SDPN inference
python speakerlab/bin/infer_sv_ssl.py --model_id $model_id

# Run diarization inference
python speakerlab/bin/infer_diarization.py --wav [wav_list OR wav_path] --out_dir $out_dir
# Enable overlap detection
python speakerlab/bin/infer_diarization.py --wav [wav_list OR wav_path] --out_dir $out_dir --include_overlap --hf_access_token $hf_access_token
```
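Under the hood, a verification trial compares two speaker embeddings, typically with cosine similarity against a decision threshold. A minimal sketch with hypothetical low-dimensional embeddings standing in for the real model outputs that `infer_sv.py` extracts:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dim embeddings standing in for real high-dimensional outputs.
emb_enroll = [0.1, 0.9, 0.2, 0.4]
emb_test = [0.12, 0.85, 0.25, 0.35]

score = cosine_similarity(emb_enroll, emb_test)
same_speaker = score >= 0.6  # threshold is task-dependent; 0.6 is illustrative
print(round(score, 3), same_speaker)
```

In practice the threshold is calibrated on a development set (for example, at the EER operating point).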
Supervised Speaker Verification
- CAM++, ERes2Net, ERes2NetV2, ECAPA-TDNN, ResNet and Res2Net training recipes on 3D-Speaker.
- CAM++, ERes2Net, ERes2NetV2, ECAPA-TDNN, ResNet and Res2Net training recipes on VoxCeleb.
- CAM++, ERes2Net, ERes2NetV2, ECAPA-TDNN, ResNet and Res2Net training recipes on CN-Celeb.
Self-supervised Speaker Verification
Speaker Diarization
- Speaker diarization inference recipes comprising multiple modules: overlap detection (optional), voice activity detection, speech segmentation, speaker embedding extraction, and speaker clustering.
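The final clustering module groups segment embeddings by speaker. A toy greedy single-linkage sketch with a cosine-distance threshold (the toolkit's actual clustering is more sophisticated, e.g. spectral clustering; names and threshold here are illustrative):

```python
import math

def cos_dist(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def cluster(embeddings, threshold=0.3):
    """Greedy single-linkage clustering: merge a segment into the first
    cluster that has a member within `threshold` cosine distance."""
    clusters = []  # each cluster is a list of segment indices
    for i, e in enumerate(embeddings):
        for c in clusters:
            if any(cos_dist(e, embeddings[j]) < threshold for j in c):
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy segment embeddings: segments 0 and 2 point one way (speaker 1),
# segments 1 and 3 another (speaker 2).
segs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.8]]
print(cluster(segs))  # [[0, 2], [1, 3]]
```

Each resulting cluster becomes one speaker, and the segment time spans of its members form that speaker's portion of the diarization output.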
Language Identification
- Language identification training recipes on 3D-Speaker.
3D-Speaker Dataset
- Dataset introduction and download address: 3D-Speaker
- Related paper address: 3D-Speaker
- [2024.12] Update diarization recipes and add results on multiple diarization benchmarks.
- [2024.8] Releasing ERes2NetV2 and ERes2NetV2_w24s4ep4 pretrained models trained on 200k-speaker datasets.
- [2024.5] Releasing SDPN model and X-vector model training and inference recipes for VoxCeleb.
- [2024.5] Releasing visual module and semantic module training recipes.
- [2024.4] Releasing ONNX Runtime support and the relevant inference scripts.
- [2024.4] Releasing the ERes2NetV2 model with fewer parameters and faster inference speed on VoxCeleb datasets.
- [2024.2] Releasing language identification recipes that integrate phonetic information for higher recognition accuracy.
- [2024.2] Releasing multimodal diarization recipes that fuse audio and video input to produce more accurate results.
- [2024.1] Releasing ResNet34 and Res2Net model training and inference recipes for the 3D-Speaker, VoxCeleb and CN-Celeb datasets.
- [2024.1] Releasing large-margin finetune recipes for speaker verification and adding diarization recipes.
- [2023.11] ERes2Net-base pretrained model released, trained on a Mandarin dataset of 200k labeled speakers.
- [2023.10] Releasing ECAPA model training and inference recipes for three datasets.
- [2023.9] Releasing RDINO model training and inference recipes for CN-Celeb.
- [2023.8] Releasing CAM++, ERes2Net-Base and ERes2Net-Large benchmarks on CN-Celeb.
- [2023.8] Releasing ERes2Net and CAM++ in language identification for Mandarin and English.
- [2023.7] Releasing CAM++, ERes2Net-Base and ERes2Net-Large pretrained models trained on 3D-Speaker.
- [2023.7] Releasing Dialogue Detection and Semantic Speaker Change Detection in speaker diarization.
- [2023.7] Releasing CAM++ in language identification for Mandarin and English.
- [2023.6] Releasing the 3D-Speaker dataset and its corresponding benchmarks including ERes2Net, CAM++ and RDINO.
- [2023.5] ERes2Net and CAM++ pretrained models released, trained on a Mandarin dataset of 200k labeled speakers.
If you have any comments or questions about 3D-Speaker, please contact us by
- email: {yfchen97, wanghuii}@mail.ustc.edu.cn, {dengchong.d, zsq174630, shuli.cly}@alibaba-inc.com
3D-Speaker is released under the Apache License 2.0.
3D-Speaker contains third-party components and code modified from some open-source repos, including:
Speechbrain, Wespeaker, D-TDNN, DINO, Vicreg, TalkNet-ASD, Ultra-Light-Fast-Generic-Face-Detector-1MB, pyannote.audio
If you find this repository useful, please consider giving a star ⭐ and citation 🦖:
```bibtex
@inproceedings{chen20243d,
  title={3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization},
  author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and others},
  booktitle={ICASSP},
  year={2025}
}
```