audio-generation
Here are 174 public repositories matching this topic...
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more. Features: Generate Text, MCP, Audio, Video, Images, Voice Cloning, Distributed, P2P and decentralized inference
- Updated Feb 20, 2026 - Go
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
- Updated Feb 11, 2026 - Python
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
- Updated May 27, 2025 - Python
YuE: Open Full-song Music Generation Foundation Model, similar to Suno.ai but open
- Updated Jun 4, 2025 - Python
A single Gradio + React WebUI with extensions for ACE-Step, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, MusicGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and Bark!
- Updated Feb 19, 2026 - TypeScript
AudioLDM: Generate speech, sound effects, music and beyond, with text.
- Updated Jun 25, 2025 - Python
A framework for efficient model inference with omni-modality models
- Updated Feb 20, 2026 - Python
Audio generation using diffusion models, in PyTorch.
- Updated Jun 12, 2023 - Python
A timeline of the latest AI models for audio generation, starting in 2023!
- Updated Jan 4, 2024
A fundamental toolkit designed for music, song, and audio generation
- Updated May 20, 2025 - Python
A family of diffusion models for text-to-audio generation.
- Updated Jul 29, 2025 - Python
Official PyTorch implementation of BigVGAN (ICLR 2023)
- Updated Sep 5, 2024 - Python
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.
- Updated Feb 12, 2026 - Python
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
- Updated Jul 8, 2025
A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools
- Updated Feb 20, 2026 - Python
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
- Updated Jun 5, 2024 - Python
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.
- Updated Jan 25, 2024 - Python
Audio Development Tools (ADT) is a project for advancing sound, speech, and music technologies, featuring components for machine learning, sound synthesis, speech and music generation, signal processing, game audio, digital audio workstations (DAWs), and more.
- Updated Jul 11, 2025
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
- Updated Nov 2, 2025