Tacotron 2 - PyTorch implementation with faster-than-realtime inference
NVIDIA/tacotron2
PyTorch implementation of Natural TTS Synthesis By Conditioning Wavenet On Mel Spectrogram Predictions.
This implementation includes distributed and automatic mixed precision support and uses the LJSpeech dataset.
Distributed and Automatic Mixed Precision support relies on NVIDIA's Apex and AMP.
Visit our website for audio samples using our published Tacotron 2 and WaveGlow models.
- NVIDIA GPU + CUDA cuDNN
- Download and extract the LJ Speech dataset
- Clone this repo:
git clone https://github.com/NVIDIA/tacotron2.git
- CD into this repo:
cd tacotron2
- Initialize submodule:
git submodule init; git submodule update
- Update .wav paths:
sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' filelists/*.txt
- Alternatively, set load_mel_from_disk=True in hparams.py and update mel-spectrogram paths
- Install PyTorch 1.0
- Install Apex
- Install python requirements or build docker image
- Install python requirements:
pip install -r requirements.txt
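The "Update .wav paths" step above can also be done without sed, for example on a system where `sed -i` behaves differently. The following is a minimal sketch, not part of this repo: the function name `update_wav_paths` is hypothetical, and it assumes the filelists are UTF-8 text files in which each line starts with the `DUMMY` placeholder.

```python
from pathlib import Path


def update_wav_paths(filelist_dir, wav_dir, placeholder="DUMMY"):
    """Replace every occurrence of the placeholder in each *.txt filelist.

    Mirrors `sed -i -- 's,DUMMY,<wav_dir>,g' filelists/*.txt`.
    Returns the names of the files that were rewritten.
    """
    changed = []
    for path in sorted(Path(filelist_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        if placeholder in text:
            # Rewrite the file in place with the real wav folder
            path.write_text(text.replace(placeholder, str(wav_dir)), encoding="utf-8")
            changed.append(path.name)
    return changed
```

From the repo root, this could be called as `update_wav_paths("filelists", "ljs_dataset_folder/wavs")`. Running it a second time changes nothing, because the placeholder is already gone.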
python train.py --output_directory=outdir --log_directory=logdir
- (OPTIONAL) tensorboard --logdir=outdir/logdir
Training using a pre-trained model can lead to faster convergence.
By default, the dataset-dependent text embedding layers are ignored.
- Download our published Tacotron 2 model
python train.py --output_directory=outdir --log_directory=logdir -c tacotron2_statedict.pt --warm_start
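To illustrate what "ignoring" the text embedding layers means during a warm start, here is a minimal sketch that uses plain dictionaries in place of PyTorch state dicts. The function name `warm_start_state` and the layer names in the usage example are assumptions made for illustration; this is not the code in train.py.

```python
def warm_start_state(checkpoint_state, model_state, ignore_layers):
    """Merge checkpoint weights into a freshly initialized model state.

    Layers listed in ignore_layers keep their fresh values, so they are
    re-learned on the new dataset; all other layers come from the checkpoint.
    """
    kept = {name: value for name, value in checkpoint_state.items()
            if name not in ignore_layers}
    merged = dict(model_state)  # start from the fresh initialization
    merged.update(kept)         # overwrite with pre-trained weights
    return merged


# Hypothetical layer names and values, for illustration only
checkpoint = {"embedding.weight": [0.9, 0.8], "decoder.weight": [0.5]}
fresh = {"embedding.weight": [0.0, 0.0], "decoder.weight": [0.0]}
state = warm_start_state(checkpoint, fresh, ignore_layers=["embedding.weight"])
```

In this sketch the decoder weights are taken from the checkpoint while the embedding stays at its fresh initialization, which is the behavior to expect when the new dataset uses a different symbol set.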
python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True
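The `--hparams` flag above takes comma-separated `name=value` overrides. As a rough illustration of that format, the sketch below parses such a string into a dictionary. It is a simplified stand-in written for this note, not the parser the repo uses, and the helper names `parse_overrides` and `_coerce` are hypothetical.

```python
def _coerce(raw):
    """Turn a raw string into bool, int, or float when possible."""
    if raw in ("True", "False"):
        return raw == "True"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # leave anything else as a string


def parse_overrides(spec):
    """Parse 'a=1,b=True' into {'a': 1, 'b': True}."""
    overrides = {}
    for item in spec.split(","):
        if not item.strip():
            continue  # tolerate empty segments such as a trailing comma
        name, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {item!r}")
        overrides[name.strip()] = _coerce(raw.strip())
    return overrides


flags = parse_overrides("distributed_run=True,fp16_run=True")
```

Here `flags` holds both options as Python booleans, which is how the two switches in the command above would be interpreted under this simplified scheme.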
- Download our published Tacotron 2 model
- Download our published WaveGlow model
jupyter notebook --ip=127.0.0.1 --port=31337
- Load inference.ipynb
N.b. When performing Mel-Spectrogram to Audio synthesis, make sure Tacotron 2 and the Mel decoder were trained on the same mel-spectrogram representation.
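One way to act on this note is to compare the audio settings of the two models before synthesis. The sketch below is an assumption-laden example: the key names in `MEL_KEYS` and the values in the usage example are illustrative, so check them against the configuration actually used to train each model.

```python
# Settings assumed to define the mel-spectrogram representation (illustrative)
MEL_KEYS = (
    "sampling_rate", "filter_length", "hop_length", "win_length",
    "n_mel_channels", "mel_fmin", "mel_fmax",
)


def mel_mismatches(synth_cfg, vocoder_cfg, keys=MEL_KEYS):
    """Return {key: (synth_value, vocoder_value)} for every differing setting."""
    return {
        key: (synth_cfg.get(key), vocoder_cfg.get(key))
        for key in keys
        if synth_cfg.get(key) != vocoder_cfg.get(key)
    }


# Illustrative values only
synth = {"sampling_rate": 22050, "hop_length": 256, "n_mel_channels": 80}
vocoder = {"sampling_rate": 22050, "hop_length": 275, "n_mel_channels": 80}
problems = mel_mismatches(synth, vocoder)
```

An empty result means the listed settings agree. Any entry in `problems` points to a setting that should be reconciled before feeding predicted mel-spectrograms to the decoder.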
WaveGlow: Faster than real time Flow-based Generative Network for Speech Synthesis.
nv-wavenet: Faster than real time WaveNet.
This implementation uses code from the following repos: Keith Ito, Prem Seetharaman, as described in our code.
We are inspired by Ryuichi Yamamoto's Tacotron PyTorch implementation.
We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang and Zongheng Yang.