xvector model on jtubespeech
sarulab-speech/xvector_jtubespeech
This repository provides a pre-trained model for extracting x-vectors (speaker representation vectors). The model was trained on the JTubeSpeech corpus, a Japanese speech corpus collected from YouTube.
To instantiate the pre-trained model without an explicit install, load it through `torch.hub` as follows:
```python
import torch

model = torch.hub.load("sarulab-speech/xvector_jtubespeech", "xvector", trust_repo=True)
```
Then follow the 'Usage / 使い方' section.
- Number of speakers: 1,233
- Sampling frequency: 16,000 Hz
- Speaker recognition accuracy: 91% (test data)
- Feature: 24-dimensional MFCC
- Dimensionality of x-vector: 512
- Other configurations: these follow the ASV recipe for VoxCeleb in Kaldi.
- In the open-sourced model, the parameters of the recognition layers that follow the x-vector layer were randomized to protect data privacy.
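Since the model expects 16 kHz mono input, it can help to check audio before feature extraction. The following is a minimal sketch, not part of this repository: `to_mono_16k` is a hypothetical helper name, and it assumes the audio arrives as a NumPy array shaped the way `scipy.io.wavfile.read` returns it (`(num_samples,)` or `(num_samples, num_channels)`). It does not resample, and it leaves the amplitude scale unchanged because the usage code casts the samples to `float32` without normalizing.

```python
import numpy as np

EXPECTED_SR = 16000  # sampling frequency listed in the model description


def to_mono_16k(wav: np.ndarray, sr: int) -> np.ndarray:
    """Hypothetical helper: validate the sampling rate and downmix to mono.

    Returns a 1-D float32 array on the same amplitude scale as the input.
    """
    if sr != EXPECTED_SR:
        # Resampling is deliberately out of scope for this sketch.
        raise ValueError(f"expected {EXPECTED_SR} Hz audio, got {sr} Hz")

    wav = np.asarray(wav)
    if wav.ndim == 2:
        # Multi-channel audio: average the channels sample by sample.
        return wav.astype(np.float32).mean(axis=1)
    if wav.ndim != 1:
        raise ValueError(f"expected a 1-D or 2-D array, got {wav.ndim} dimensions")
    return wav.astype(np.float32)
```

The returned array can be passed wherever the usage code expects `wav`.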
```shell
pip install xvector-jtubespeech
```
```python
import numpy as np
import torch
from scipy.io import wavfile
from torchaudio.compliance import kaldi

from xvector_jtubespeech import XVector


def extract_xvector(
    model,  # xvector model
    wav,    # 16kHz mono
):
    # extract mfcc
    wav = torch.from_numpy(wav.astype(np.float32)).unsqueeze(0)  # [1, num_samples]
    mfcc = kaldi.mfcc(wav, num_ceps=24, num_mel_bins=24)  # [T, 24]
    mfcc = mfcc.unsqueeze(0)  # [1, T, 24]

    # extract xvector
    xvector = model.vectorize(mfcc)  # (1, 512)
    xvector = xvector.to("cpu").detach().numpy().copy()[0]

    return xvector


_, wav = wavfile.read("sample.wav")  # 16kHz mono
model = XVector("xvector.pth")
xvector = extract_xvector(model, wav)  # (512, )
```
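Once x-vectors have been extracted, a common way to compare two of them is cosine similarity. This README does not prescribe a scoring method, so the sketch below is an assumption for illustration only: `cosine_similarity` is a hypothetical helper, and the vectors are random stand-ins with the model's 512 dimensions rather than real model outputs.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Hypothetical helper: cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError("vectors must have the same length")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)


# Stand-in vectors (NOT real x-vectors): xvec_b is a noisy copy of xvec_a,
# while xvec_c is drawn independently.
rng = np.random.default_rng(0)
xvec_a = rng.standard_normal(512)
xvec_b = xvec_a + 0.1 * rng.standard_normal(512)
xvec_c = rng.standard_normal(512)

print("a vs. b:", cosine_similarity(xvec_a, xvec_b))
print("a vs. c:", cosine_similarity(xvec_a, xvec_c))
```

With real data, `xvec_a` and `xvec_b` would be replaced by the outputs of `extract_xvector` for two utterances. Any decision threshold would need to be tuned on held-out data.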
- Takaki Hamada / 濱田 誉輝 (The University of Tokyo / 東京大学)
- Shinnosuke Takamichi / 高道 慎之介 (The University of Tokyo / 東京大学)
MIT
- The audio sample `sample.wav` was copied from the PJS corpus.