# DiariZen

A toolkit for speaker diarization.
DiariZen is a speaker diarization toolkit driven by AudioZen and Pyannote 3.1.
## Installation

```bash
# create virtual python environment
conda create --name diarizen python=3.10
conda activate diarizen

# install diarizen
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt && pip install -e .

# install pyannote-audio
cd pyannote-audio && pip install -e .[dev,testing]

# install dscore
git submodule init
git submodule update
```
## Usage

We use SDM (the first channel from the first far-field microphone array) data from the public AMI, AISHELL-4, and AliMeeting corpora for model training and evaluation. Please download these datasets first. Our data partition is here.
- download the WavLM Base+ model
- download the ResNet34-LM model
- modify the paths of the used dataset and configuration file
```bash
cd recipes/diar_ssl && bash -i run_stage.sh
```
- Our pre-trained checkpoints and the estimated RTTM files can be found here. The local experimental path has been anonymized. To use the pre-trained models, please check `diar_ssl/run_stage.sh`.
- In case you have trouble reproducing our experiments, we also provide the intermediate inference results of `EN2002a`, an AMI test recording, for debugging.
- Our model is also supported on Hugging Face 🤗. Please check `example/run_example.py`.
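The released RTTM files follow the standard NIST RTTM format: one `SPEAKER` line per segment, with the recording ID, onset, and duration in seconds, and the speaker label. A minimal loading sketch (`load_rttm` is a hypothetical helper for illustration, not part of DiariZen or dscore):

```python
from collections import defaultdict

def load_rttm(path):
    """Parse an RTTM file into {recording_id: [(start, end, speaker), ...]}."""
    segments = defaultdict(list)
    with open(path) as f:
        for line in f:
            fields = line.split()
            # RTTM fields: type file chnl onset dur ortho stype name conf slat
            if not fields or fields[0] != "SPEAKER":
                continue
            rec_id = fields[1]
            onset = float(fields[3])
            duration = float(fields[4])
            speaker = fields[7]
            segments[rec_id].append((onset, onset + duration, speaker))
    return dict(segments)
```

This is enough to inspect the provided reference and system outputs side by side before running the full scoring pipeline.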
## Results

We aim to make the whole pipeline as simple as possible. Therefore, for the results below:

- we **did not** use any simulated data
- we **did not** apply advanced learning-scheduler strategies
- we **did not** perform further domain adaptation to each dataset
- all experiments share the **same hyper-parameters** for clustering
collar = 0s:

| System | Features | AMI | AISHELL-4 | AliMeeting |
|---|---|---|---|---|
| Pyannote3 | SincNet | 21.1 | 13.9 | 22.8 |
| Proposed | Fbank | 19.7 | 12.5 | 21.0 |
| Proposed | WavLM-frozen | 17.0 | 11.7 | 19.9 |
| Proposed | WavLM-updated | 15.4 | 11.7 | 17.6 |

collar = 0.25s:

| System | Features | AMI | AISHELL-4 | AliMeeting |
|---|---|---|---|---|
| Pyannote3 | SincNet | 13.7 | 7.7 | 13.6 |
| Proposed | Fbank | 12.9 | 6.9 | 12.6 |
| Proposed | WavLM-frozen | 10.9 | 6.1 | 12.0 |
| Proposed | WavLM-updated | 9.8 | 5.9 | 10.2 |

Note: the results above differ from our ICASSP submission. We made a few updates to the experimental numbers, but the conclusions remain the same as in the paper.
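For intuition about the numbers above: diarization error rate (DER) is the sum of missed speech, false alarm, and speaker confusion time, divided by total reference speech time. A simplified frame-level sketch of that idea (a hypothetical illustration, not the dscore scorer used here, which additionally applies the collar and an optimal reference-to-system speaker mapping):

```python
def frame_der(ref, hyp):
    """Simplified frame-level DER for already-aligned label sequences.

    ref/hyp: per-frame speaker labels, with None marking silence.
    Any mismatched frame counts as an error (miss, false alarm, or
    confusion); the denominator is the number of reference speech frames.
    """
    assert len(ref) == len(hyp)
    errors = sum(1 for r, h in zip(ref, hyp) if r != h)
    speech = sum(1 for r in ref if r is not None)
    return errors / max(speech, 1)

# toy example: 8 reference speech frames, 1 mislabeled frame -> DER = 1/8
ref = [None, "A", "A", "A", "B", "B", "B", "B", "A", None]
hyp = [None, "A", "A", "A", "A", "B", "B", "B", "A", None]
print(frame_der(ref, hyp))  # 0.125
```

Real scoring also has to find the best mapping between reference and system speaker labels (e.g. via the Hungarian algorithm) before counting errors; this sketch assumes the labels are already aligned.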
## Citation

If you found this work helpful, please consider citing: J. Han, F. Landini, J. Rohdin, A. Silnova, M. Diez, and L. Burget, "Leveraging Self-Supervised Learning for Speaker Diarization," in Proc. ICASSP, 2025.
```
@inproceedings{han2025leveraging,
  title={Leveraging self-supervised learning for speaker diarization},
  author={Han, Jiangyu and Landini, Federico and Rohdin, Johan and Silnova, Anna and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
  booktitle={Proc. ICASSP},
  year={2025}
}
```
## License

This repository is under the MIT license.
## Contact

If you have any comments or questions, please contact ihan@fit.vut.cz.