End-to-End Neural Diarization

hitachi-speech/EEND

EEND (End-to-End Neural Diarization) is a neural-network-based speaker diarization method.

An extension of EEND that handles a variable number of speakers is also provided in this repository.

Install tools

Requirements

  • NVIDIA CUDA GPU
  • CUDA Toolkit (8.0 <= version <= 10.1)
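The version constraint above can be checked before starting the build. The following is a minimal sketch, not part of the repository; the function names `version_to_int` and `cuda_version_supported` are hypothetical, and only the range 8.0 to 10.1 comes from the requirements list.

```shell
#!/bin/sh
# Sketch: check that a CUDA Toolkit version lies in the range stated in the
# requirements (8.0 <= version <= 10.1). Function names are hypothetical.

# Convert "MAJOR.MINOR[.PATCH]" into one integer (MAJOR * 100 + MINOR)
# so that versions can be compared numerically.
version_to_int() {
    major=${1%%.*}
    case $1 in
        *.*) rest=${1#*.}; minor=${rest%%.*} ;;
        *)   minor=0 ;;
    esac
    echo $((major * 100 + minor))
}

# Succeeds (exit status 0) only if the given version is within 8.0..10.1.
cuda_version_supported() {
    v=$(version_to_int "$1")
    [ "$v" -ge "$(version_to_int 8.0)" ] && [ "$v" -le "$(version_to_int 10.1)" ]
}

# Try a few candidate versions on both sides of the range.
for candidate in 7.5 8.0 10.1 10.2; do
    if cuda_version_supported "$candidate"; then
        echo "$candidate: supported"
    else
        echo "$candidate: not supported"
    fi
done
```

On a machine with the toolkit installed, the version string would typically be taken from the output of `nvcc --version` and passed to `cuda_version_supported`; how that output is parsed is left out here because its exact format may vary between releases.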

Install Kaldi and Python environment

cd tools
make
  • This command builds Kaldi at tools/kaldi
    • If you want to use a pre-built Kaldi:
      cd tools
      make KALDI=<existing_kaldi_root>
      This option makes a symlink at tools/kaldi
  • This command extracts miniconda3 at tools/miniconda3 and creates a conda environment named 'eend'
  • It then installs Chainer and cupy into the 'eend' environment
Test recipe (mini_librispeech)

Configuration

  • Modify egs/mini_librispeech/v1/cmd.sh according to your job scheduler.
    • If you use your local machine, use "run.pl".
    • If you use Grid Engine, use "queue.pl".
    • If you use SLURM, use "slurm.pl".
    • For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html.
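The choice of wrapper script described above can be sketched as a small lookup. This is an illustration only: the function name `scheduler_to_cmd` and the keywords `local`, `sge`, and `slurm` are hypothetical, and the variable name `train_cmd` follows common Kaldi recipes rather than anything confirmed for this repository's cmd.sh.

```shell
#!/bin/sh
# Sketch: map a scheduler keyword to the Kaldi wrapper script named in the
# configuration notes. The function name and keywords are hypothetical;
# only the three script names come from the text.
scheduler_to_cmd() {
    case $1 in
        local) echo "run.pl" ;;     # local machine
        sge)   echo "queue.pl" ;;   # Grid Engine
        slurm) echo "slurm.pl" ;;   # SLURM
        *)     echo "unknown scheduler: $1" >&2; return 1 ;;
    esac
}

# The kind of assignment one might place in cmd.sh.
# The variable name train_cmd is an assumption, not taken from the repository.
train_cmd=$(scheduler_to_cmd local)
export train_cmd
echo "train_cmd=$train_cmd"
```

Check the variables actually defined in the recipe's cmd.sh before editing it, since the recipe may define more than one command variable.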

Data preparation

cd egs/mini_librispeech/v1
./run_prepare_shared.sh

Run training, inference, and scoring

./run.sh
  • If you use encoder-decoder based attractors [3], modify run.sh to use config/eda/{train,infer}.yaml
  • See RESULT.md and compare with your result.
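The notation config/eda/{train,infer}.yaml is shorthand for two files, one per stage. A minimal sketch of that expansion follows; the function name `eda_config_path` is hypothetical, and how run.sh actually refers to its config files is not shown here.

```shell
#!/bin/sh
# Sketch: expand the shorthand config/eda/{train,infer}.yaml into the two
# concrete paths it stands for. The function name is hypothetical.
eda_config_path() {
    case $1 in
        train|infer) echo "config/eda/$1.yaml" ;;
        *) echo "stage must be 'train' or 'infer'" >&2; return 1 ;;
    esac
}

# Print the config path for each stage.
for stage in train infer; do
    echo "$stage -> $(eda_config_path "$stage")"
done
```

Both paths need to replace the corresponding default config references in run.sh, one for training and one for inference.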

CALLHOME two-speaker experiment

Configuration

  • Modify egs/callhome/v1/cmd.sh according to your job scheduler.
    • If you use your local machine, use "run.pl".
    • If you use Grid Engine, use "queue.pl".
    • If you use SLURM, use "slurm.pl".
    • For more information about cmd.sh, see http://kaldi-asr.org/doc/queue.html.
  • Modify egs/callhome/v1/run_prepare_shared.sh according to the storage paths of your corpora.

Data preparation

cd egs/callhome/v1
./run_prepare_shared.sh
# If you want to conduct 1-4 speaker experiments, run below.
# You also have to set paths to your corpora properly.
./run_prepare_shared_eda.sh

Self-attention-based model using 2-speaker mixtures

./run.sh

BLSTM-based model using 2-speaker mixtures

local/run_blstm.sh

Self-attention-based model with EDA using 1-4-speaker mixtures

./run_eda.sh

References

[1] Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe, "End-to-End Neural Speaker Diarization with Permutation-free Objectives," Proc. Interspeech, pp. 4300-4304, 2019

[2] Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe, "End-to-End Neural Speaker Diarization with Self-attention," Proc. ASRU, pp. 296-303, 2019

[3] Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu, "End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors," Proc. Interspeech, 2020

Citation

@inproceedings{Fujita2019Interspeech,
  author={Yusuke Fujita and Naoyuki Kanda and Shota Horiguchi and Kenji Nagamatsu and Shinji Watanabe},
  title={{End-to-End Neural Speaker Diarization with Permutation-free Objectives}},
  booktitle={Interspeech},
  pages={4300--4304},
  year=2019
}
