speech-synthesis

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr speech-translation speaker-diariazation generative-ai

Updated Dec 17, 2025
Python

NVIDIA/DeepLearningExamples

Star 14.7k

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

nlp translation computer-vision deep-learning mxnet tensorflow pytorch speech-synthesis speech-recognition forecasting drug-discovery recommender-systems paddlepaddle tensorflow2 large-language-models

Updated Aug 12, 2024
Jupyter Notebook

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

tts speech-synthesis transformer voice-recognition speech-recognition whisper asr vocoder conformer sound-classification kws self-supervised-learning code-switch voice-cloning speech-translation punctuation-restoration wav2vec2 streaming-asr speech-alignment streaming-tts

Updated Oct 20, 2025
Python

rhasspy/piper

Star 10.3k

A fast, local neural text to speech system

text-to-speech tts speech-synthesis

Updated Aug 26, 2025
C++

espnet/espnet

Star 9.6k

End-to-End Speech Processing Toolkit

text-to-speech deep-learning chainer end-to-end machine-translation pytorch speech-synthesis speech-recognition kaldi voice-conversion speaker-diarization speech-separation speech-enhancement spoken-language-understanding speech-translation singing-voice-synthesis

Updated Dec 16, 2025
Python

rany2/edge-tts

Star 9.6k

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

text-to-speech tts speech-synthesis

Updated Dec 12, 2025
Python

open-mmlab/Amphion

Star 9.6k

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

text-to-speech audit speech-synthesis audio-synthesis music-generation voice-conversion vocoder emilia text-to-audio fastspeech2 vits audio-generation singing-voice-conversion vall-e audioldm naturalspeech2 maskgct

Updated May 27, 2025
Python

voicepaw/so-vits-svc-fork

Star 9.2k

so-vits-svc fork with realtime support, improved interface and more features.

lightning deep-learning realtime pytorch speech-synthesis gan voice-conversion voice-changer pytorch-lightning hubert vits sovits so-vits-svc softvc contentvec

Updated Dec 16, 2025
Python

netease-youdao/EmotiVoice

Star 8.4k

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

python text-to-speech ai deep-learning style prompt speech emotion pytorch tts speech-synthesis multi-speaker emotivoice

Updated Aug 13, 2024
Python

jaywalnut310/vits

Star 7.8k

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

text-to-speech deep-learning pytorch tts speech-synthesis

Updated Dec 6, 2023
Python

yl4579/StyleTTS2

Star 6.1k

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

text-to-speech deep-learning pytorch tts speech-synthesis gan speaker-adaptation adversarial-training diffusion-models wavlm latent-diffusion latent-diffusion-models

Updated Aug 10, 2024
Python

espeak-ng/espeak-ng

Star 5.9k

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

android text-to-speech speech-synthesis espeak espeak-ng

Updated Dec 15, 2025
C

snakers4/silero-models

Star 5.7k

Silero Models: pre-trained text-to-speech models made embarrassingly simple

text-to-speech speech pytorch tts speech-synthesis colab armenian russian speech-to-text ukrainian pretrained-models georgian belarus kyrgyz uzbek kazakh azerbaijani tajik tts-models torch-hub

Updated Dec 5, 2025
Jupyter Notebook

abus-aikorea/voice-pro

Star 5.2k

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.