geneing/WaveRNN-Pytorch

Fatcord's Alternative WaveRNN (Faster training)

This repository is a fork of Fatcord's Alternative WaveRNN implementation. The original model has been significantly simplified to allow real-time synthesis of high-fidelity speech. This repository also contains a C++ library that can be used for real-time speech synthesis on a single CPU core.

WaveRNN-Pytorch is a vocoder: it converts speech features (i.e. mel spectrograms) into speech audio. One can build a complete text-to-speech pipeline by using, for example, Tacotron-2 to turn text into speech features, and then using this vocoder to produce a sound file.
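
As an illustration of that two-stage pipeline, here is a minimal Python sketch. The synthesize_mel function is a hypothetical placeholder for whatever TTS front end you use; WaveRNNVocoder and its loadWeights/melToWav methods are the Python bindings described later in this README.

import WaveRNNVocoder

def synthesize_mel(text):
    # Placeholder: a real pipeline would run Tacotron-2 (or similar) inference here,
    # returning a mel spectrogram with hparams.num_mels bands.
    raise NotImplementedError

vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('model_outputs/model.bin')
wav = vocoder.melToWav(synthesize_mel('Hello world.'))  # 1-D array of audio samples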

Highlights

  • 10-bit quantized wav modeling for higher quality (see the quantization sketch after this list)
  • Weight pruning for reducing model complexity
  • Fast, CPU-only C++ inference library that runs faster than real time on a modern CPU
  • Compressed pruned-weight format to keep weight files small
  • Python bindings for the C++ library
  • Can be used with a Tacotron-2 implementation for TTS.
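
On the first highlight: a 10-bit model predicts one of 2^10 = 1024 discrete amplitude levels per sample. The sketch below shows 10-bit mu-law companding, a common quantization choice for WaveRNN variants; the exact scheme used here is selected by input_type in hparams.py, so treat this as an illustration rather than the repository's implementation.

import numpy as np

def mulaw_quantize(x, bits=10):
    # Quantize a waveform x in [-1, 1] to 2**bits discrete levels via mu-law companding.
    mu = 2 ** bits - 1
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)  # compand to [-1, 1]
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)          # integers in [0, mu]

def mulaw_expand(q, bits=10):
    # Invert mulaw_quantize back to a float waveform in [-1, 1].
    mu = 2 ** bits - 1
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu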

Planned

  • Real time inference on modern ARM processors (e.g. inference on smartphone for high quality TTS)

Audio Samples

  • See Wiki

Pretrained Checkpoints

  • See "model_outputs" directory

Requirements

Training:

  • Python 3
  • CUDA >=8.0
  • PyTorch >= v1.0
  • Python requirements:

pip install -r requirements.txt

  • sudo aptitude install libsoundtouch-dev

C++ library

  • cmake, gcc, etc
  • Eigen3 development files

apt-get install libeigen3-dev

Installation

Ensure the above requirements are met.

git clone https://github.com/geneing/WaveRNN-Pytorch.git
cd WaveRNN-Pytorch
pip3 install -r requirements.txt

Build C++ library

cd library
mkdir build
cd build
cmake ../src
make
cp WaveRNNVocoder*.so python_install_directory

Usage

1. Adjusting Hyperparameters

Before running scripts, one can adjust hyperparameters in hparams.py.

Some hyperparameters that you might want to adjust (example settings are sketched after this list):

  • input_type (the best performing ones are currently bits and raw; see hparams.py for more details)
  • batch_size - depends on your GPU memory; for 8 GB of memory, use batch_size=64
  • save_every_step (checkpoint saving frequency)
  • evaluate_every_step (evaluation frequency)
  • seq_len_factor (sequence length of the training audio; longer sequences require more GPU memory)
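
As a concrete illustration, these settings are plain module-level assignments in hparams.py. The values below are examples only, not the repository's defaults; consult the comments in hparams.py for the valid options.

# hparams.py (excerpt) -- example values, not the repository defaults
input_type = 'bits'          # 'bits' or 'raw' currently perform best
batch_size = 64              # sized for roughly 8 GB of GPU memory
save_every_step = 10000      # how often to save a checkpoint
evaluate_every_step = 5000   # how often to run evaluation
seq_len_factor = 5           # controls training sequence length; higher uses more GPU memory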

2. Preprocessing

Using TTS preprocessing

If you are planning to use this vocoder together with a TTS network (e.g. Tacotron-2), you should train on exactly the same data. Each TTS implementation uses a slightly different definition of "mel-spectrogram", so I recommend using the TTS network's own preprocessing.

This code has been tested with Tacotron-2 and the M-AILABS dataset. Example:

cd Tacotron-2
python3 preprocess.py --dataset='M-AILABS' --language='en_US' --voice='female' --reader='mary_ann' --merge_books=True --output training_data

Using WaveRNN-Pytorch preprocessing

If you are using the vocoder as a standalone library, you can use the native preprocessing. This function processes raw wav files into the corresponding mel-spectrogram and wav files according to the audio processing hyperparameters.

Example usage:

python3 preprocess.py --output_dir training_data /path/to/my/wav/files

This will process all the .wav files in the folder /path/to/my/wav/files and save the results to training_data (when --output_dir is omitted, the default local directory data_dir is used).
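
To sanity-check the output before training, you can inspect one of the generated feature files. The sketch below assumes the mel spectrograms are saved as .npy arrays; the file name is a hypothetical example, so adjust the path to whatever the output directory actually contains.

import numpy as np

mel = np.load('training_data/some_utterance_mel.npy')  # hypothetical file name
print(mel.shape)  # the mel-band axis should equal hparams.num_mels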

3. Training

Start the training process. Checkpoints are stored by default in the local directory checkpoints. The script will automatically save a checkpoint when terminated with ctrl + c.

Example 1: starting a new model for training from Tacotron-2 data

python3 train.py --dataset Tacotron training_data

training_data is the directory containing the processed files.

Example 2: starting a new model for training

python3 train.py --dataset Audiobooks training_data

Example 3: Restoring training from checkpoint

python3 train.py training_data --checkpoint=checkpoints/checkpoint0010000.pth

Evaluation .wav files and plots are saved in checkpoints/eval.

4. Converting model for C++ library

First you need to train the model for at least (hparams.start_prune + hparams.prune_steps) steps to ensure that the model is properly pruned.
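
For reference, WaveRNN-style pruning ramps the sparsity up gradually: nothing is pruned before start_prune, then the fraction of zeroed weights follows a cubic schedule over prune_steps steps (Kalchbrenner et al., 2018). A minimal sketch is below, with final_sparsity as an assumed name for the target sparsity hyperparameter.

def target_sparsity(step, start_prune, prune_steps, final_sparsity=0.9):
    # Cubic sparsity ramp: 0 before start_prune, rising to final_sparsity
    # over prune_steps steps, then held constant.
    if step < start_prune:
        return 0.0
    t = min((step - start_prune) / prune_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - t) ** 3)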

In order to use the C++ library, you need to convert the trained network to the compressed model format.

python3 convert_model.py --output-dir model_outputs checkpoints/checkpoint_step000400000.pth

Example: using the Python 3 interface to the C++ library

import WaveRNNVocoder
import numpy as np

vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('model_outputs/model.bin')

fname = 'mel.npy'     # hypothetical path to a saved mel-spectrogram
mel = np.load(fname)  # make sure that mel.shape[0] == hparams.num_mels
wav = vocoder.melToWav(mel)
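
To listen to the result, write the returned array to a wav file. A minimal follow-up sketch, assuming melToWav returns float samples in [-1, 1] and a 22050 Hz training sample rate (check hparams.py for the actual rate used by your model):

import numpy as np
from scipy.io import wavfile

sample_rate = 22050  # assumed; use the sample rate from your training config
wavfile.write('out.wav', sample_rate, (np.clip(wav, -1, 1) * 32767).astype(np.int16))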
