hitachi-speech/EEND


End-to-End Neural Diarization

License

MIT license


EEND (End-to-End Neural Diarization)

EEND (End-to-End Neural Diarization) is a neural-network-based speaker diarization method.

BLSTM EEND (INTERSPEECH 2019)
- https://www.isca-speech.org/archive/Interspeech_2019/abstracts/2899.html
Self-attentive EEND (ASRU 2019)
- https://ieeexplore.ieee.org/abstract/document/9003959/

An extension of EEND to a variable number of speakers is also provided in this repository.

Self-attentive EEND with encoder-decoder based attractors
- https://arxiv.org/abs/2005.09921

Install tools

Requirements

NVIDIA CUDA GPU
CUDA Toolkit (8.0 <= version <= 10.1)

Install kaldi and python environment

    cd tools
    make

This command builds Kaldi at tools/kaldi.
- If you want to use a pre-built Kaldi:
```
cd tools
make KALDI=<existing_kaldi_root>
```
  This option makes a symlink at tools/kaldi.
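
After the build, it can be useful to confirm what actually ended up at tools/kaldi. The following is a minimal sketch, not part of the EEND Makefile; the function name `describe_kaldi_dir` is hypothetical.

```shell
# Hypothetical check (not part of the EEND Makefile): report whether a path
# is a symlink (e.g. created with KALDI=<existing_kaldi_root>), a regular
# directory (e.g. a fresh build), or missing.
describe_kaldi_dir() {
  local path="$1"
  if [ -L "$path" ]; then
    echo "symlink -> $(readlink "$path")"
  elif [ -d "$path" ]; then
    echo "directory"
  else
    echo "missing"
    return 1
  fi
}

# Run from the repository root.
describe_kaldi_dir tools/kaldi || echo "run make in tools/ first"
```
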
This command also extracts miniconda3 at tools/miniconda3 and creates a conda environment named 'eend'.
It then installs Chainer and CuPy into the 'eend' environment.
- By default, it uses the CUDA installation in /usr/local/cuda/.
  - If you need to specify your CUDA path:
```
cd tools
make CUDA_PATH=/your/path/to/cuda-8.0
```
    This command installs cupy-cudaXX according to your CUDA version. See https://docs-cupy.chainer.org/en/stable/install.html#install-cupy
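
To illustrate the cupy-cudaXX naming, here is a minimal sketch that maps a CUDA version string to the corresponding CuPy wheel name. It is not the logic used by the Makefile; the function name `cupy_package_for_cuda` is hypothetical, and the accepted versions are limited to the range listed under Requirements.

```shell
# Hypothetical helper (not part of EEND): map a CUDA version such as "10.1"
# to the matching CuPy wheel name, e.g. "cupy-cuda101".
cupy_package_for_cuda() {
  local version="$1"
  local major="${version%%.*}"   # text before the first dot
  local rest="${version#*.}"     # text after the first dot
  local minor="${rest%%.*}"      # drop any patch component
  # Accept only the range listed under Requirements (8.0 <= version <= 10.1).
  case "${major}.${minor}" in
    8.0|9.0|9.1|9.2|10.0|10.1)
      echo "cupy-cuda${major}${minor}"
      ;;
    *)
      echo "unsupported CUDA version: ${version}" >&2
      return 1
      ;;
  esac
}

cupy_package_for_cuda "10.1"   # prints cupy-cuda101
cupy_package_for_cuda "8.0"    # prints cupy-cuda80
```
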

Test recipe (mini_librispeech)

Configuration

Modify egs/mini_librispeech/v1/cmd.sh according to your job scheduler.
If you use your local machine, use "run.pl".
If you use Grid Engine, use "queue.pl".
If you use SLURM, use "slurm.pl".
For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html.
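
The scheduler-to-script mapping above can be sketched as a small shell function. This is an illustration only: the function name `kaldi_cmd_for_scheduler` and the scheduler labels `local`, `sge`, and `slurm` are hypothetical, and `train_cmd` is the variable name commonly seen in Kaldi recipes, so check the actual cmd.sh for the names this recipe expects.

```shell
# Hypothetical helper: pick the Kaldi job wrapper for a scheduler label.
kaldi_cmd_for_scheduler() {
  case "$1" in
    local) echo "run.pl" ;;     # local machine
    sge)   echo "queue.pl" ;;   # Grid Engine
    slurm) echo "slurm.pl" ;;   # SLURM
    *)
      echo "unknown scheduler: $1" >&2
      return 1
      ;;
  esac
}

# train_cmd is an assumed variable name (common in Kaldi recipes).
export train_cmd="$(kaldi_cmd_for_scheduler local)"
echo "$train_cmd"
```
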

Data preparation

    cd egs/mini_librispeech/v1
    ./run_prepare_shared.sh

Run training, inference, and scoring

    ./run.sh

If you use encoder-decoder based attractors [3], modify run.sh to use config/eda/{train,infer}.yaml.
See RESULT.md and compare with your results.
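
The notation config/eda/{train,infer}.yaml stands for two files. As a minimal sketch, the hypothetical function below lists both paths and checks that they exist before run.sh is edited. It assumes it is run from the recipe directory and does not show how run.sh itself consumes the files.

```shell
# Hypothetical helper: list the two EDA config files named in the text.
eda_config_paths() {
  local dir="${1:-config/eda}"
  local name
  for name in train infer; do
    echo "${dir}/${name}.yaml"
  done
}

# Report which of the two files are present in the current directory.
for path in $(eda_config_paths); do
  if [ -f "$path" ]; then
    echo "found:   $path"
  else
    echo "missing: $path"
  fi
done
```
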

CALLHOME two-speaker experiment

Configuration

Modify egs/callhome/v1/cmd.sh according to your job scheduler.
If you use your local machine, use "run.pl".
If you use Grid Engine, use "queue.pl".
If you use SLURM, use "slurm.pl".
For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html.
Modify egs/callhome/v1/run_prepare_shared.sh according to the storage paths of your corpora.
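
Before starting data preparation, it may help to confirm that the corpus directories you configured actually exist. The following is a minimal sketch with a hypothetical function name and placeholder paths; the real variable names and corpus locations are defined in run_prepare_shared.sh and are not reproduced here.

```shell
# Hypothetical pre-flight check: report corpus directories that do not exist.
check_corpus_dirs() {
  local missing=0
  local dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "missing corpus directory: $dir" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Placeholder paths; replace with the paths set in run_prepare_shared.sh.
if check_corpus_dirs /path/to/corpus_a /path/to/corpus_b; then
  echo "all corpus directories found"
else
  echo "fix the paths before running data preparation"
fi
```
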

Data preparation

    cd egs/callhome/v1
    ./run_prepare_shared.sh
    # If you want to conduct 1-4 speaker experiments, run below.
    # You also have to set paths to your corpora properly.
    ./run_prepare_shared_eda.sh

Self-attention-based model using 2-speaker mixtures

    ./run.sh

BLSTM-based model using 2-speaker mixtures

    local/run_blstm.sh

Self-attention-based model with EDA using 1-4-speaker mixtures

    ./run_eda.sh

References

[1] Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe, "End-to-End Neural Speaker Diarization with Permutation-free Objectives," Proc. Interspeech, pp. 4300-4304, 2019

[2] Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe, "End-to-End Neural Speaker Diarization with Self-attention," Proc. ASRU, pp. 296-303, 2019

[3] Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu, "End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors," Proc. Interspeech, 2020

Citation

@inproceedings{Fujita2019Interspeech,
  author={Yusuke Fujita and Naoyuki Kanda and Shota Horiguchi and Kenji Nagamatsu and Shinji Watanabe},
  title={{End-to-End Neural Speaker Diarization with Permutation-free Objectives}},
  booktitle={Interspeech},
  pages={4300--4304},
  year=2019
}
