geneing/WaveRNN-Pytorch

Fatcord's Alternative WaveRNN (Faster training)

This repository is a fork of Fatcord's Alternative WaveRNN implementation. The original model has been significantly simplified to allow real-time synthesis of high-fidelity speech. This repository also contains a C++ library that can be used for real-time speech synthesis on a single CPU core.

WaveRNN-Pytorch is a vocoder: it converts speech features (i.e. mel spectrograms) into speech audio. One can build a complete text-to-speech pipeline by using, for example, Tacotron-2 to turn text into speech features, and then using this vocoder to produce a sound file.
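
As an illustration of that two-stage pipeline, here is a minimal Python sketch. The synthesize_mel function is a hypothetical placeholder for whatever TTS front end you use; WaveRNNVocoder and its loadWeights/melToWav methods are the Python bindings described later in this README.

import WaveRNNVocoder

def synthesize_mel(text):
    # Placeholder: a real pipeline would run Tacotron-2 (or similar) inference here,
    # returning a mel spectrogram with hparams.num_mels bands.
    raise NotImplementedError

vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('model_outputs/model.bin')
wav = vocoder.melToWav(synthesize_mel('Hello world.'))  # 1-D array of audio samples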

Highlights

  • 10-bit quantized wav modeling for higher quality (see the quantization sketch after this list)
  • Weight pruning for reducing model complexity
  • Fast, CPU-only C++ inference library that runs faster than real time on a modern CPU
  • Compressed pruned-weight format to keep weight files small
  • Python bindings for the C++ library
  • Can be used with a Tacotron-2 implementation for TTS.
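
On the first highlight: a 10-bit model predicts one of 2^10 = 1024 discrete amplitude levels per sample. The sketch below shows 10-bit mu-law companding, a common quantization choice for WaveRNN variants; the exact scheme used here is selected by input_type in hparams.py, so treat this as an illustration rather than the repository's implementation.

import numpy as np

def mulaw_quantize(x, bits=10):
    # Quantize a waveform x in [-1, 1] to 2**bits discrete levels via mu-law companding.
    mu = 2 ** bits - 1
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)  # compand to [-1, 1]
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)          # integers in [0, mu]

def mulaw_expand(q, bits=10):
    # Invert mulaw_quantize back to a float waveform in [-1, 1].
    mu = 2 ** bits - 1
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu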

Planned

  • Real time inference on modern ARM processors (e.g. inference on smartphone for high quality TTS)

Audio Samples

  • See Wiki

Pretrained Checkpoints

  • See "model_outputs" directory

Requirements

Training:

  • Python 3
  • CUDA >=8.0
  • PyTorch >= v1.0
  • Python requirements:

pip install -r requirements.txt

  • sudo aptitude install libsoundtouch-dev

C++ library

  • cmake, gcc, etc
  • Eigen3 development files

apt-get install libeigen3-dev

Installation

Ensure the above requirements are met.

git clone https://github.com/geneing/WaveRNN-Pytorch.git
cd WaveRNN-Pytorch
pip3 install -r requirements.txt

Build C++ library

cd library
mkdir build
cd build
cmake ../src
make
cp WaveRNNVocoder*.so python_install_directory

Usage

1. Adjusting Hyperparameters

Before running scripts, one can adjust hyperparameters in hparams.py.

Some hyperparameters that you might want to adjust (example settings are sketched after this list):

  • input_type (the best performing ones are currently bits and raw; see hparams.py for more details)
  • batch_size - depends on your GPU memory; for 8 GB of memory, use batch_size=64
  • save_every_step (checkpoint saving frequency)
  • evaluate_every_step (evaluation frequency)
  • seq_len_factor (sequence length of the training audio; longer sequences require more GPU memory)
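
As a concrete illustration, these settings are plain module-level assignments in hparams.py. The values below are examples only, not the repository's defaults; consult the comments in hparams.py for the valid options.

# hparams.py (excerpt) -- example values, not the repository defaults
input_type = 'bits'          # 'bits' or 'raw' currently perform best
batch_size = 64              # sized for roughly 8 GB of GPU memory
save_every_step = 10000      # how often to save a checkpoint
evaluate_every_step = 5000   # how often to run evaluation
seq_len_factor = 5           # controls training sequence length; higher uses more GPU memory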

2. Preprocessing

Using TTS preprocessing

If you are planning to use this vocoder together with a TTS network (e.g. Tacotron-2), you should train on exactly the same data. Each TTS implementation uses a slightly different definition of "mel-spectrogram", so I recommend using the TTS network's own preprocessing.

This code has been tested with Tacotron-2 and the M-AILABS dataset. Example:

cd Tacotron-2
python3 preprocess.py --dataset='M-AILABS' --language='en_US' --voice='female' --reader='mary_ann' --merge_books=True --output training_data

Using WaveRNN-Pytorch preprocessing

If you are using the vocoder as a standalone library, you can use the native preprocessing. This function processes raw wav files into the corresponding mel-spectrogram and wav files according to the audio processing hyperparameters.

Example usage:

python3 preprocess.py --output_dir training_data /path/to/my/wav/files

This will process all the .wav files in the folder /path/to/my/wav/files and save the results to training_data (when --output_dir is omitted, the default local directory data_dir is used).
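
To sanity-check the output before training, you can inspect one of the generated feature files. The sketch below assumes the mel spectrograms are saved as .npy arrays; the file name is a hypothetical example, so adjust the path to whatever the output directory actually contains.

import numpy as np

mel = np.load('training_data/some_utterance_mel.npy')  # hypothetical file name
print(mel.shape)  # the mel-band axis should equal hparams.num_mels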

3. Training

Start the training process. Checkpoints are stored by default in the local directory checkpoints. The script will automatically save a checkpoint when terminated with ctrl + c.

Example 1: starting a new model for training from Tacotron-2 data

python3 train.py --dataset Tacotron training_data

training_data is the directory containing the processed files.

Example 2: starting a new model for training

python3 train.py --dataset Audiobooks training_data

Example 3: Restoring training from checkpoint

python3 train.py training_data --checkpoint=checkpoints/checkpoint0010000.pth

Evaluation .wav files and plots are saved in checkpoints/eval.

4. Converting model for C++ library

First you need to train the model for at least (hparams.start_prune + hparams.prune_steps) steps to ensure that the model is properly pruned.
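
For reference, WaveRNN-style pruning ramps the sparsity up gradually: nothing is pruned before start_prune, then the fraction of zeroed weights follows a cubic schedule over prune_steps steps (Kalchbrenner et al., 2018). A minimal sketch is below, with final_sparsity as an assumed name for the target sparsity hyperparameter.

def target_sparsity(step, start_prune, prune_steps, final_sparsity=0.9):
    # Cubic sparsity ramp: 0 before start_prune, rising to final_sparsity
    # over prune_steps steps, then held constant.
    if step < start_prune:
        return 0.0
    t = min((step - start_prune) / prune_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - t) ** 3)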

In order to use the C++ library, you need to convert the trained network to the compressed model format.

python3 convert_model.py --output-dir model_outputs checkpoints/checkpoint_step000400000.pth

Example: using the Python 3 interface to the C++ library

import WaveRNNVocoder
import numpy as np

vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('model_outputs/model.bin')

fname = 'mel.npy'     # hypothetical path to a saved mel-spectrogram
mel = np.load(fname)  # make sure that mel.shape[0] == hparams.num_mels
wav = vocoder.melToWav(mel)
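
To listen to the result, write the returned array to a wav file. A minimal follow-up sketch, assuming melToWav returns float samples in [-1, 1] and a 22050 Hz training sample rate (check hparams.py for the actual rate used by your model):

import numpy as np
from scipy.io import wavfile

sample_rate = 22050  # assumed; use the sample rate from your training config
wavfile.write('out.wav', sample_rate, (np.clip(wav, -1, 1) * 32767).astype(np.int16))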
