Tacotron 2 - PyTorch implementation with faster-than-realtime inference

NVIDIA/tacotron2

PyTorch implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.

This implementation includes distributed and automatic mixed precision support and uses the LJSpeech dataset.

Distributed and Automatic Mixed Precision support relies on NVIDIA's Apex and AMP.

Visit our website for audio samples using our published Tacotron 2 and WaveGlow models.

Figure: Alignment, Predicted Mel Spectrogram, Target Mel Spectrogram

Pre-requisites

  1. NVIDIA GPU + CUDA + cuDNN

Setup

  1. Download and extract the LJ Speech dataset
  2. Clone this repo: git clone https://github.com/NVIDIA/tacotron2.git
  3. cd into this repo: cd tacotron2
  4. Initialize submodule: git submodule init; git submodule update
  5. Update .wav paths: sed -i -- 's,DUMMY,ljs_dataset_folder/wavs,g' filelists/*.txt
    • Alternatively, set load_mel_from_disk=True in hparams.py and update mel-spectrogram paths
  6. Install PyTorch 1.0
  7. Install Apex
  8. Install python requirements or build docker image
    • Install python requirements: pip install -r requirements.txt
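The sed command in step 5 rewrites the DUMMY placeholder in each filelist line. The same substitution can be sketched in Python; the helper name and sample line below are illustrative, while the "wav_path|transcript" filelist format and the DUMMY placeholder come from the repo:

```python
# Hypothetical helper mirroring the sed command above: rewrite the DUMMY
# placeholder in a filelist line so it points at the real LJSpeech wavs folder.
# Filelist lines have the form "wav_path|transcript".
def update_wav_path(line: str, wav_dir: str) -> str:
    path, sep, text = line.partition("|")
    return path.replace("DUMMY", wav_dir, 1) + sep + text

line = "DUMMY/LJ001-0001.wav|Printing, in the only sense with which we are at present concerned."
print(update_wav_path(line, "ljs_dataset_folder/wavs"))
# ljs_dataset_folder/wavs/LJ001-0001.wav|Printing, in the only sense with which we are at present concerned.
```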

Training

  1. python train.py --output_directory=outdir --log_directory=logdir
  2. (OPTIONAL) tensorboard --logdir=outdir/logdir

Training using a pre-trained model

Training using a pre-trained model can lead to faster convergence. By default, the dataset-dependent text embedding layers are ignored.

  1. Download our published Tacotron 2 model
  2. python train.py --output_directory=outdir --log_directory=logdir -c tacotron2_statedict.pt --warm_start
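Warm-starting works by dropping the weights of the dataset-dependent layers from the checkpoint so the new model re-initializes them. A minimal sketch of that filtering, with plain dicts standing in for PyTorch state_dicts and illustrative layer names:

```python
# Sketch of warm-start checkpoint filtering: drop weights whose layer names
# are listed in ignore_layers so the new model re-initializes them (e.g. a
# text embedding tied to a previous dataset's symbol set). Plain dicts stand
# in for PyTorch state_dicts; the layer names here are illustrative.
def warm_start_filter(checkpoint_state, ignore_layers):
    return {k: v for k, v in checkpoint_state.items() if k not in ignore_layers}

state = {"embedding.weight": [0.1, 0.2], "decoder.attention_rnn.weight": [0.3]}
filtered = warm_start_filter(state, ignore_layers=["embedding.weight"])
print(sorted(filtered))  # ['decoder.attention_rnn.weight']
```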

Multi-GPU (distributed) and Automatic Mixed Precision Training

  1. python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True
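The --hparams flag takes comma-separated key=value overrides like the ones above. An illustrative parser for that format (not the repo's actual implementation) that coerces each value to the type of the existing default:

```python
# Illustrative parser for "key=value,key=value" hparams overrides; a sketch,
# not the repo's actual hparams implementation.
def parse_overrides(spec, defaults):
    out = dict(defaults)
    for pair in spec.split(","):
        key, value = pair.split("=", 1)
        current = out[key]
        if isinstance(current, bool):  # check bool before int: bool subclasses int
            out[key] = value == "True"
        else:
            out[key] = type(current)(value)
    return out

defaults = {"distributed_run": False, "fp16_run": False, "epochs": 500}
hparams = parse_overrides("distributed_run=True,fp16_run=True", defaults)
print(hparams)  # {'distributed_run': True, 'fp16_run': True, 'epochs': 500}
```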

Inference demo

  1. Download our published Tacotron 2 model
  2. Download our published WaveGlow model
  3. jupyter notebook --ip=127.0.0.1 --port=31337
  4. Load inference.ipynb
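At inference time the model consumes a sequence of symbol IDs rather than raw text. A minimal sketch of that encoding step; the repo's text module defines the real symbol set and cleaners, and the alphabet below is purely illustrative:

```python
# Minimal sketch of text-to-sequence encoding for inference. The symbol set
# here is illustrative; the repo's text module defines the real one.
symbols = list("_-!'(),.:;? abcdefghijklmnopqrstuvwxyz")
symbol_to_id = {s: i for i, s in enumerate(symbols)}

def text_to_sequence(text):
    """Lowercase the text and map each known character to its symbol ID,
    skipping characters outside the symbol set."""
    return [symbol_to_id[c] for c in text.lower() if c in symbol_to_id]

print(text_to_sequence("Hello!"))
```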

N.b. When performing mel-spectrogram to audio synthesis, make sure Tacotron 2 and the mel decoder were trained on the same mel-spectrogram representation.
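One way to sanity-check this before chaining models is to compare the audio parameters of the two configs. A sketch, where the parameter names mirror typical mel-spectrogram settings and the values are examples, not the published models' configs:

```python
# Sketch of a compatibility check between a Tacotron 2 config and a vocoder
# (e.g. WaveGlow) config before chaining them. Parameter names mirror typical
# mel-spectrogram settings; the values below are examples only.
AUDIO_KEYS = ("sampling_rate", "filter_length", "hop_length",
              "win_length", "n_mel_channels", "mel_fmin", "mel_fmax")

def mel_mismatches(tts_cfg, vocoder_cfg):
    """Return the audio parameters on which the two configs disagree;
    an empty list means the mel representations match."""
    return [k for k in AUDIO_KEYS if tts_cfg.get(k) != vocoder_cfg.get(k)]

tts = {"sampling_rate": 22050, "hop_length": 256, "n_mel_channels": 80}
voc = {"sampling_rate": 22050, "hop_length": 275, "n_mel_channels": 80}
print(mel_mismatches(tts, voc))  # ['hop_length']
```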

Related repos

WaveGlow: Faster-than-real-time Flow-based Generative Network for Speech Synthesis.

nv-wavenet: Faster-than-real-time WaveNet.

Acknowledgements

This implementation uses code from the following repos: Keith Ito and Prem Seetharaman, as described in our code.

We are inspired by Ryuichi Yamamoto's Tacotron PyTorch implementation.

We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang, and Zongheng Yang.
