xvector model on jtubespeech

sarulab-speech/xvector_jtubespeech

This repository provides a pre-trained model for extracting x-vectors (speaker representation vectors). The model is trained on the JTubeSpeech corpus, a Japanese speech corpus collected from YouTube.

Quick Usage

You can load the pre-trained model through torch.hub without installing the package first:

    import torch

    model = torch.hub.load("sarulab-speech/xvector_jtubespeech", "xvector", trust_repo=True)

Then follow the Usage section.

Training configuration

  • The number of speakers: 1,233
  • Sampling frequency: 16,000 Hz
  • Speaker recognition accuracy: 91% (test data)
  • Feature: 24-dimensional MFCC
  • Dimensionality of x-vector: 512
  • Other settings: follow Kaldi's ASV (automatic speaker verification) recipe for VoxCeleb.
    • In the open-sourced model, the parameters of the recognition layers that follow the x-vector layer were randomized to protect data privacy.
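
The settings above imply constraints on the input audio: it must be sampled at 16 kHz, and the extraction code in the Usage section treats it as a single channel. The following is a minimal sketch of an input pre-check. It assumes scipy.io.wavfile-style input, meaning an integer array of shape (num_samples,) or (num_samples, num_channels). `prepare_wav` and `num_mfcc_frames` are hypothetical helpers, not part of xvector-jtubespeech. The frame count assumes torchaudio's `kaldi.mfcc` framing defaults (25 ms window, 10 ms shift, `snip_edges=True`), which the usage code does not override. The sketch does not resample, and it keeps the raw integer amplitude scale, as the usage code does.

```python
import numpy as np

SAMPLE_RATE = 16_000  # the model was trained on 16 kHz audio
FRAME_LENGTH = 400    # 25 ms window at 16 kHz (assumed torchaudio kaldi.mfcc default)
FRAME_SHIFT = 160     # 10 ms shift at 16 kHz (assumed torchaudio kaldi.mfcc default)


def prepare_wav(sr: int, wav: np.ndarray) -> np.ndarray:
    """Hypothetical helper: check the sample rate and downmix to mono (no resampling)."""
    if sr != SAMPLE_RATE:
        raise ValueError(f"expected {SAMPLE_RATE} Hz audio, got {sr} Hz; resample first")
    if wav.ndim == 2:
        # multi-channel files load as (num_samples, num_channels); average the channels
        wav = wav.mean(axis=1)
    if wav.shape[0] < FRAME_LENGTH:
        raise ValueError("audio is shorter than one 25 ms analysis frame")
    # keep the integer amplitude scale, only change the dtype
    return wav.astype(np.float32)


def num_mfcc_frames(num_samples: int) -> int:
    """Number of MFCC frames T produced with snip_edges=True framing."""
    if num_samples < FRAME_LENGTH:
        return 0
    return 1 + (num_samples - FRAME_LENGTH) // FRAME_SHIFT
```

For example, one second of 16 kHz audio gives `num_mfcc_frames(16000) == 98`, so under these assumptions the MFCC tensor passed to the model has shape [1, 98, 24].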

Installation

pip install xvector-jtubespeech

Usage

    import numpy as np
    from scipy.io import wavfile
    import torch
    from torchaudio.compliance import kaldi

    from xvector_jtubespeech import XVector


    def extract_xvector(
        model,  # xvector model
        wav,    # 16 kHz mono waveform
    ):
        # extract MFCC
        wav = torch.from_numpy(wav.astype(np.float32)).unsqueeze(0)  # [1, N]
        mfcc = kaldi.mfcc(wav, num_ceps=24, num_mel_bins=24)  # [T, 24]
        mfcc = mfcc.unsqueeze(0)  # [1, T, 24]

        # extract x-vector
        xvector = model.vectorize(mfcc)  # [1, 512]
        xvector = xvector.to("cpu").detach().numpy().copy()[0]

        return xvector


    _, wav = wavfile.read("sample.wav")  # 16 kHz mono
    model = XVector("xvector.pth")
    xvector = extract_xvector(model, wav)  # (512,)
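
X-vectors from different utterances are usually compared with a similarity score. Kaldi's VoxCeleb recipe scores trials with PLDA. A simpler, commonly used alternative is cosine similarity, which is sketched below. The sketch assumes the inputs are the (512,) NumPy arrays returned by `extract_xvector`. The name `cosine_similarity` is hypothetical and is not part of this package.

```python
import numpy as np


def cosine_similarity(x1: np.ndarray, x2: np.ndarray) -> float:
    """Cosine similarity between two x-vectors, e.g. (512,) outputs of extract_xvector."""
    x1 = np.asarray(x1, dtype=np.float64)
    x2 = np.asarray(x2, dtype=np.float64)
    denom = np.linalg.norm(x1) * np.linalg.norm(x2)
    if denom == 0.0:
        raise ValueError("cannot compare a zero vector")
    # dot product of the two vectors divided by the product of their lengths
    return float(np.dot(x1, x2) / denom)


# Example (assumes two 16 kHz mono waveforms wav_a and wav_b):
# score = cosine_similarity(extract_xvector(model, wav_a), extract_xvector(model, wav_b))
```

Scores closer to 1.0 indicate more similar speaker characteristics. Any same-speaker/different-speaker threshold has to be tuned on held-out data, and this repository does not provide one.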

Contributors

  • Takaki Hamada / 濱田 誉輝 (The University of Tokyo)
  • Shinnosuke Takamichi / 高道 慎之介 (The University of Tokyo)

License

MIT

Others

  • The audio sample sample.wav was taken from the PJS corpus.
