A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

modelscope/3D-Speaker

3D-Speaker is an open-source toolkit for single- and multi-modal speaker verification, speaker recognition, and speaker diarization. All pretrained models are accessible on ModelScope. Furthermore, we present a large-scale speech corpus, also called the 3D-Speaker-Dataset, to facilitate research on speech representation disentanglement.

Benchmark

The EER results on VoxCeleb, CNCeleb and 3D-Speaker datasets for fully-supervised speaker verification.

| Model | Params | VoxCeleb1-O | CNCeleb | 3D-Speaker |
| --- | --- | --- | --- | --- |
| Res2Net | 4.03 M | 1.56% | 7.96% | 8.03% |
| ResNet34 | 6.34 M | 1.05% | 6.92% | 7.29% |
| ECAPA-TDNN | 20.8 M | 0.86% | 8.01% | 8.87% |
| ERes2Net-base | 6.61 M | 0.84% | 6.69% | 7.21% |
| CAM++ | 7.2 M | 0.65% | 6.78% | 7.75% |
| ERes2NetV2 | 17.8 M | 0.61% | 6.14% | 6.52% |
| ERes2Net-large | 22.46 M | 0.52% | 6.17% | 6.34% |
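EER (equal error rate) is the error rate at the operating point where the false-acceptance and false-rejection rates coincide. As a minimal illustration (not the toolkit's scoring script), here is a sketch of how EER can be computed from raw trial scores, using hypothetical toy scores:

```python
# Minimal sketch: computing the equal error rate (EER) from raw
# verification scores. Toy data only; real evaluations use
# thousands of trials and the toolkit's own scoring tools.

def compute_eer(target_scores, nontarget_scores):
    """Return the EER: the error rate at the threshold where the
    false-acceptance rate (FAR) equals the false-rejection rate (FRR)."""
    # Sort all trials by score, tagging each as target (1) or non-target (0).
    scores = sorted(
        [(s, 1) for s in target_scores] + [(s, 0) for s in nontarget_scores]
    )
    n_tar, n_non = len(target_scores), len(nontarget_scores)
    # Sweep the threshold upward past each score; with the threshold below
    # all scores, no target is rejected and every non-target is accepted.
    fr, fa = 0, n_non
    best_gap, eer = float("inf"), 1.0
    for _, is_target in scores:
        if is_target:
            fr += 1   # this target trial now falls below the threshold
        else:
            fa -= 1   # this non-target trial is no longer accepted
        frr, far = fr / n_tar, fa / n_non
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

# Toy example: same-speaker trials tend to score high, different-speaker low.
tar = [0.82, 0.91, 0.75, 0.66, 0.88]
non = [0.30, 0.45, 0.12, 0.70, 0.25]
print(f"EER: {compute_eer(tar, non):.2%}")  # → EER: 20.00%
```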

The DER results on public and internal multi-speaker datasets for speaker diarization.

| Test set | 3D-Speaker | pyannote.audio | DiariZen_WavLM |
| --- | --- | --- | --- |
| Aishell-4 | 10.30% | 12.2% | 11.7% |
| Alimeeting | 19.73% | 24.4% | 17.6% |
| AMI_SDM | 21.76% | 22.4% | 15.4% |
| VoxConverse | 11.75% | 11.3% | 28.39% |
| Meeting-CN_ZH-1 | 18.91% | 22.37% | 32.66% |
| Meeting-CN_ZH-2 | 12.78% | 17.86% | 18% |
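DER (diarization error rate) sums missed speech, false-alarm speech, and speaker confusion, divided by total reference speech time. Standard scoring (e.g. md-eval or pyannote.metrics) works on timed segments with collars and an optimal speaker-label mapping; the toy sketch below instead uses one label per fixed-size frame (with `None` for silence) and assumes labels are already aligned, purely to show how the three error types combine:

```python
# Toy frame-level sketch of the diarization error rate (DER):
# (missed speech + false alarm + speaker confusion) / total reference speech.
# Assumes hypothesis labels are already mapped to reference labels;
# real DER scoring finds the optimal mapping and applies a collar.

def frame_der(reference, hypothesis):
    assert len(reference) == len(hypothesis)
    missed = false_alarm = confusion = 0
    total_speech = sum(1 for r in reference if r is not None)
    for r, h in zip(reference, hypothesis):
        if r is not None and h is None:
            missed += 1        # reference speech, no system output
        elif r is None and h is not None:
            false_alarm += 1   # system output where reference is silent
        elif r is not None and r != h:
            confusion += 1     # speech attributed to the wrong speaker
    return (missed + false_alarm + confusion) / total_speech

ref = ["A", "A", "A", None, "B", "B", "B", "B"]
hyp = ["A", "A", "B", None, "B", "B", None, "B"]
print(f"DER: {frame_der(ref, hyp):.2%}")  # 1 confusion + 1 miss over 7 speech frames
```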

Quickstart

Install 3D-Speaker

```shell
git clone https://github.com/modelscope/3D-Speaker.git && cd 3D-Speaker
conda create -n 3D-Speaker python=3.8
conda activate 3D-Speaker
pip install -r requirements.txt
```

Running experiments

```shell
# Speaker verification: ERes2NetV2 on 3D-Speaker dataset
cd egs/3dspeaker/sv-eres2netv2/
bash run.sh
# Speaker verification: CAM++ on 3D-Speaker dataset
cd egs/3dspeaker/sv-cam++/
bash run.sh
# Speaker verification: ECAPA-TDNN on 3D-Speaker dataset
cd egs/3dspeaker/sv-ecapa/
bash run.sh
# Self-supervised speaker verification: SDPN on VoxCeleb dataset
cd egs/voxceleb/sv-sdpn/
bash run.sh
# Audio and multimodal speaker diarization
cd egs/3dspeaker/speaker-diarization/
bash run_audio.sh
bash run_video.sh
# Language identification
cd egs/3dspeaker/language-idenitfication
bash run.sh
```

Inference using pretrained models from ModelScope

All pretrained models are released on ModelScope.

```shell
# Install modelscope
pip install modelscope
# ERes2Net trained on 200k labeled speakers
model_id=iic/speech_eres2net_sv_zh-cn_16k-common
# ERes2NetV2 trained on 200k labeled speakers
model_id=iic/speech_eres2netv2_sv_zh-cn_16k-common
# CAM++ trained on 200k labeled speakers
model_id=iic/speech_campplus_sv_zh-cn_16k-common
# Run CAM++ or ERes2Net inference
python speakerlab/bin/infer_sv.py --model_id $model_id
# Run batch inference
python speakerlab/bin/infer_sv_batch.py --model_id $model_id --wavs $wav_list
# SDPN trained on VoxCeleb
model_id=iic/speech_sdpn_ecapa_tdnn_sv_en_voxceleb_16k
# Run SDPN inference
python speakerlab/bin/infer_sv_ssl.py --model_id $model_id
# Run diarization inference
python speakerlab/bin/infer_diarization.py --wav [wav_list OR wav_path] --out_dir $out_dir
# Enable overlap detection
python speakerlab/bin/infer_diarization.py --wav [wav_list OR wav_path] --out_dir $out_dir --include_overlap --hf_access_token $hf_access_token
```
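The verification scripts above ultimately compare two speaker embeddings; the usual decision rule in this family of toolkits is cosine similarity against a threshold tuned on a development set. A minimal sketch with hypothetical low-dimensional embeddings (real models emit e.g. 192-dimensional vectors):

```python
import math

def cosine_score(emb1, emb2):
    """Cosine similarity between two speaker embeddings, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(emb1, emb2))
    norm1 = math.sqrt(sum(a * a for a in emb1))
    norm2 = math.sqrt(sum(b * b for b in emb2))
    return dot / (norm1 * norm2)

# Hypothetical 4-dim embeddings standing in for real model output.
emb_a = [0.2, 0.8, -0.1, 0.4]
emb_b = [0.25, 0.75, -0.05, 0.35]
score = cosine_score(emb_a, emb_b)

THRESHOLD = 0.6  # illustrative value; tune on a development set
print("same speaker" if score >= THRESHOLD else "different speakers")
```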


Contact

If you have any comments or questions about 3D-Speaker, please contact us by

  • email: {yfchen97, wanghuii}@mail.ustc.edu.cn, {dengchong.d, zsq174630, shuli.cly}@alibaba-inc.com

License

3D-Speaker is released under the Apache License 2.0.

Acknowledgements

3D-Speaker contains third-party components and code modified from some open-source repos, including:
Speechbrain, Wespeaker, D-TDNN, DINO, Vicreg, TalkNet-ASD, Ultra-Light-Fast-Generic-Face-Detector-1MB, pyannote.audio

Citations

If you find this repository useful, please consider giving a star ⭐ and citation 🦖:

```
@inproceedings{chen20243d,
  title={3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization},
  author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and others},
  booktitle={ICASSP},
  year={2025}
}
```

