sarulab-speech/xvector_jtubespeech


xvector model on jtubespeech

License

MIT license


Folders and files

test
xvector_jtubespeech
.gitignore
LICENSE
README.md
hubconf.py
pyproject.toml
requirements.txt
sample.wav
xvector.pth


x-vector extractor for Japanese speech

This repository provides a pre-trained model for extracting the x-vector (a speaker representation vector). The model is trained on the JTubeSpeech corpus, a Japanese speech corpus collected from YouTube.


Quick Usage

The pre-trained model can be loaded through torch.hub, without installing the package explicitly, as follows:

```python
import torch

model = torch.hub.load("sarulab-speech/xvector_jtubespeech", "xvector", trust_repo=True)
```

Then follow the Usage section.

Training configuration

Number of speakers: 1,233
Sampling frequency: 16,000 Hz
Speaker recognition accuracy: 91% (on test data)
Feature: 24-dimensional MFCC
Dimensionality of x-vector: 512
Other settings: follow Kaldi's automatic speaker verification (ASV) recipe for VoxCeleb.
- In the open-sourced model, the parameters of the recognition layers that follow the x-vector layer were randomized to protect data privacy, so only the x-vector output is meaningful.
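The configuration above determines what input the model expects. The sketch below is not part of the package: prepare_waveform and num_kaldi_frames are hypothetical helpers. It checks that a waveform is sampled at 16 kHz, downmixes multi-channel audio to mono while keeping the raw int16 amplitude scale that the usage example passes to the MFCC extractor, and predicts the number of MFCC frames T. It assumes the 25 ms window, 10 ms frame shift, and snip_edges=True that torchaudio.compliance.kaldi.mfcc uses by default; the repository does not state these framing parameters explicitly.

```python
import numpy as np

# Assumption: 16 kHz input (from the configuration above); the 25 ms window and
# 10 ms shift are the torchaudio.compliance.kaldi.mfcc defaults, not repo settings.
SAMPLE_RATE = 16_000
FRAME_LENGTH_MS = 25.0
FRAME_SHIFT_MS = 10.0


def prepare_waveform(wav: np.ndarray, sample_rate: int) -> np.ndarray:
    """Hypothetical helper: return a 1-D float32 mono waveform at the int16 scale."""
    if sample_rate != SAMPLE_RATE:
        raise ValueError(f"expected {SAMPLE_RATE} Hz audio, got {sample_rate} Hz")
    wav = np.asarray(wav)
    if wav.ndim == 2:
        # scipy.io.wavfile.read returns shape (n_samples, n_channels) for multi-channel files
        wav = wav.mean(axis=1)
    # No normalization to [-1, 1]: Kaldi-style features expect raw sample values
    return wav.astype(np.float32)


def num_kaldi_frames(num_samples: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Hypothetical helper: number of MFCC frames T when snip_edges=True."""
    window = int(sample_rate * FRAME_LENGTH_MS / 1000)  # 400 samples at 16 kHz
    shift = int(sample_rate * FRAME_SHIFT_MS / 1000)    # 160 samples at 16 kHz
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift
```

For a 1-second clip (16,000 samples), num_kaldi_frames returns 98, which would be the T in the [1, T, 24] MFCC tensor under these assumptions.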

Installation

pip install xvector-jtubespeech

Usage / 使い方

```python
import numpy as np
from scipy.io import wavfile
import torch
from torchaudio.compliance import kaldi

from xvector_jtubespeech import XVector


def extract_xvector(
    model,  # xvector model
    wav,    # 16kHz mono
):
    # extract mfcc
    wav = torch.from_numpy(wav.astype(np.float32)).unsqueeze(0)  # [1, N]
    mfcc = kaldi.mfcc(wav, num_ceps=24, num_mel_bins=24)  # [T, 24]
    mfcc = mfcc.unsqueeze(0)  # [1, T, 24]

    # extract xvector
    xvector = model.vectorize(mfcc)  # (1, 512)
    xvector = xvector.to("cpu").detach().numpy().copy()[0]

    return xvector


_, wav = wavfile.read("sample.wav")  # 16kHz mono
model = XVector("xvector.pth")
xvector = extract_xvector(model, wav)  # (512, )
```
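The repository stops at extracting x-vectors and does not define how to compare them. A simple, common choice is cosine similarity between two x-vectors; Kaldi's VoxCeleb recipe instead scores with a PLDA back-end, which is not reproduced here. The sketch below is an assumption rather than part of the package: cosine_similarity and enroll (which averages several utterance-level x-vectors of one speaker into a single reference vector) are hypothetical names.

```python
from typing import Sequence

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D x-vectors; 1.0 means the same direction."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cannot compare a zero vector")
    return float(np.dot(a, b) / denom)


def enroll(xvectors: Sequence[np.ndarray]) -> np.ndarray:
    """Hypothetical enrollment helper: mean of several x-vectors from one speaker."""
    return np.mean(np.stack(xvectors), axis=0)


# Usage with extract_xvector from the example above (file names are hypothetical):
# _, wav_a = wavfile.read("speaker_a.wav")
# _, wav_b = wavfile.read("speaker_b.wav")
# score = cosine_similarity(extract_xvector(model, wav_a), extract_xvector(model, wav_b))
```

Any accept/reject threshold on the score would have to be tuned on held-out data; the repository does not provide one.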

Contributors

Takaki Hamada / 濱田誉輝 (The University of Tokyo)
Shinnosuke Takamichi / 高道慎之介 (The University of Tokyo)

License

MIT

Others

The audio sample sample.wav was copied from the PJS corpus.

