🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time


1008610010/Realtime-Voice-Clone-Chinese

 
 


MIT License

This repository is forked from Real-Time-Voice-Cloning, which only supports English.

English |中文

Features

🌍 Chinese: supports Mandarin; tested with multiple datasets: aidatatang_200zh, magicdata

🤩 PyTorch: works with PyTorch; tested with version 1.9.0 (the latest as of August 2021) on Tesla T4 and GTX 2060 GPUs

🌍 Windows + Linux: tested on both Windows and Linux after fixing nits

🤩 Easy & Awesome: good results with only a newly trained synthesizer, reusing the pretrained encoder/vocoder

Quick Start

1. Install Requirements

Follow the original repo to check that your environment is ready. **Python 3.7 or higher** is needed to run the toolbox.

If you get the error "ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu102 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)", the cause is probably an old Python version. Try Python 3.9 and the install should succeed.

  • Install ffmpeg.
  • Run pip install -r requirements.txt to install the remaining necessary packages.

Note that we use the pretrained encoder and vocoder but not the pretrained synthesizer, since the original synthesizer model is incompatible with Chinese symbols. As a result, demo_cli does not work at the moment.
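The two prerequisites above that can be checked without installing anything (the Python version and ffmpeg) can be verified with a short script. This is a minimal sketch, not part of the repository: the function names `python_ok` and `ffmpeg_on_path` are hypothetical, and it only checks that an `ffmpeg` executable is on PATH, not that it works.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in this README


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def ffmpeg_on_path():
    """Return True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    print("ffmpeg on PATH:", ffmpeg_on_path())
```

If the first check passes but pip still cannot find torch==1.9.0+cu102, the note above about trying Python 3.9 still applies.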

2. Train synthesizer with your dataset

  • Download the aidatatang_200zh or SLR68 dataset and unzip it: make sure you can access all .wav files in the train folder.

  • Preprocess the audio and the mel spectrograms: python synthesizer_preprocess_audio.py <datasets_root>. The --dataset {dataset} parameter selects the dataset; aidatatang_200zh and magicdata are supported.

  • Preprocess the embeddings: python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer

  • Train the synthesizer: python synthesizer_train.py mandarin <datasets_root>/SV2TTS/synthesizer

  • Go to the next step once the attention line shows up and the loss meets your needs. Check the training folder synthesizer/saved_models/.
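The steps above can be sketched as a small helper that checks the unzipped dataset for .wav files and assembles the three commands in order. This is a hedged sketch, not part of the repository: the helper names `count_wavs` and `build_commands` are hypothetical, the script names, the `--dataset` flag, the `mandarin` run name and the `SV2TTS/synthesizer` path are taken from the steps above, and the commands are only built and printed, not executed.

```python
from pathlib import Path


def count_wavs(train_dir):
    """Count .wav files anywhere under train_dir (0 if it does not exist)."""
    root = Path(train_dir)
    if not root.is_dir():
        return 0
    return sum(1 for _ in root.rglob("*.wav"))


def build_commands(datasets_root, dataset="aidatatang_200zh", run_name="mandarin"):
    """Return the preprocessing and training commands as argv lists, in order."""
    synth_dir = str(Path(datasets_root) / "SV2TTS" / "synthesizer")
    return [
        # 1. audio and mel spectrogram preprocessing
        ["python", "synthesizer_preprocess_audio.py", str(datasets_root),
         "--dataset", dataset],
        # 2. embedding preprocessing
        ["python", "synthesizer_preprocess_embeds.py", synth_dir],
        # 3. synthesizer training
        ["python", "synthesizer_train.py", run_name, synth_dir],
    ]


if __name__ == "__main__":
    # "datasets" is a placeholder for your own <datasets_root>
    for argv in build_commands("datasets"):
        print(" ".join(argv))
```

Each argv list can be passed to `subprocess.run(argv, check=True)` from the repository root if you prefer to run the stages from Python rather than typing them by hand.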

FYI, my attention line appeared after 18k steps and the loss dropped below 0.4 after 50k steps.

(Images: attention plot at step 20500, sample 1; mel spectrogram at step 135500, sample 1.)

2.2 Use a pretrained synthesizer model

Thanks to the community, some models will be shared:

| Author | Download link | Preview video |
| --- | --- | --- |
| @miven | https://pan.baidu.com/s/1PI-hM3sn5wbeChRryX-RCQ (code: 2021) | https://www.bilibili.com/video/BV1uh411B7AD/ |

A link to my early trained model: Baidu Yun (code: aid4)

3. Launch the Toolbox

You can then try the toolbox:

python demo_toolbox.py -d <datasets_root>
or
python demo_toolbox.py

Good news 🤩: Chinese characters are supported

TODO

  • Add demo video
  • Add support for more datasets
  • Upload pretrained model
  • Support parallel tacotron
  • Make it service-oriented and dockerize it
  • 🙏 Welcome to add more
