Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

abus-aikorea/voice-pro

The best AI speech recognition, translation, and multilingual dubbing solution 🚀

🎙️ An AI-powered web application for speech recognition, translation, and dubbing

Voice-Pro is a state-of-the-art web app that transforms multimedia content creation. It integrates YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech into a single, powerful tool for creators, researchers, and multilingual professionals.

  • 🔊 Top-tier speech recognition: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
  • 🎤 Zero-shot voice cloning: F5-TTS, E2-TTS, CosyVoice
  • 📢 Multilingual text-to-speech: Edge-TTS, kokoro (paid version includes Azure TTS)
  • 🎥 YouTube processing & audio extraction: yt-dlp
  • 🌍 Instant translation for 100+ languages: Deep-Translator (paid version includes Azure Translator)

A robust alternative to ElevenLabs, Voice-Pro empowers podcasters, developers, and creators with advanced voice solutions.

⚠️ Please Note

  • Upgrading from v2.x to v3.x: Not possible. We recommend deleting the installer_files folder and running the latest version of start.bat.
  • Upgrading within v3.x: Possible. After downloading the latest code, run update.bat.
  • First-time users: Please refer to the installation instructions below.
  • Troubleshooting: In most cases, issues can be resolved by deleting the installer_files folder and then running configure.bat followed by start.bat.
  • 🎁 Free Activation Key Request: Please fill out this Google Form to receive your activation key. Activation keys are limited to one per email address.
  • 🏆 Request for Additional Activation Keys: Create amazing content using Voice-Pro and share the link to your post in the Discussions. We will gladly reward your contributions.

📰 News & History

version 3.0
  • 🔥 Removed the AI Cover feature.
  • 🚀 Added support for m-bain/whisperX.
version 2.0
  • 🐍 Built with Python 3.10.15, Torch 2.5.1+cu124, and Gradio 5.14.0.
  • 🆓 Free trial supports media up to 60 seconds in length.
  • 🔥 Added the AI Cover feature.
  • 🎤 Introduced support for CosyVoice and kokoro.
  • ⏳ Initial run downloads CosyVoice2-0.5B (9GB), which may take over an hour depending on network speed.
  • 🎧 Voice samples for cloning will be continuously updated.
  • 📝 Added spaCy for natural sentence-by-sentence translation and TTS.
  • ☁️ Subscription version includes Microsoft Azure Translator and TTS.
  • 🏪 Subscription offers unlimited usage (no 60-second limit) during the subscription period, available via Shopify.

▶️ Demos

Dubbing Studio Tab: Transcription, Translation & TTS

demo-short001.mp4

Studio Tab's comprehensive media processing workflow demo: Demonstrates a one-stop media transformation process from YouTube video download to AI-based voice separation, automatic Whisper subtitles, multilingual translation, and professional dubbing using F5-TTS.

F5-TTS-Multi Tab: Podcast Creation

f5-tts-demo-elon-zuckerberg-1115-3.mp4

Demonstration of F5-TTS's innovative AI voice cloning technology: Showcasing advanced voice conversion technology that precisely mimics the actual voices of Mark Zuckerberg and Elon Musk to create entirely new content.

Live Translation Tab: Real-Time Recognition & Translation

voice-pro-demo-v1.5.7-h264-1080p-live.mp4

Demonstration of the real-time multilingual translation feature: Showcasing a workflow that captures BBC news content, generates subtitles in real time, and immediately translates them into other languages.

⭐ Key Features

1. Dubbing Studio

  • YouTube video downloads & audio extraction
  • Voice separation with Demucs
  • Supports 100+ languages for speech recognition & translation

2. Speech Technologies

  • Speech-to-Text: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
  • Text-to-Speech:
    • Edge-TTS: 100+ languages, 400+ voices
    • E2-TTS, F5-TTS, CosyVoice: Zero-shot cloning
    • kokoro: Ranked #2 in HuggingFace TTS Arena

3. Real-Time Translation

  • Instant speech recognition
  • Multilingual translation on the fly
  • Customizable audio inputs

🤖 WebUI

Dubbing Studio Tab

  • All-in-one hub: YouTube downloads, noise removal, subtitles, translation, & TTS
  • Supports all ffmpeg-compatible formats
  • Output options: WAV, FLAC, MP3
  • Subtitles & recognition for 100+ languages
  • TTS with speed, volume, & pitch controls

Multilingual Voice Conversion and Subtitle Generation Web UI Interface

Whisper Caption Tab

  • Subtitle-focused: 90+ languages
  • Video-integrated subtitle display
  • Word-level highlighting & denoise options

Translate Tab

  • Translation for 100+ languages
  • Supports subtitle files (ASS, SSA, SRT, etc.)
  • Real-time voice recognition & translation

WebUI for Real-Time Speech Recognition and Translation

Speech Generation Tab

  • Options: Edge-TTS, F5-TTS, CosyVoice, kokoro
  • Celeb voice podcasts & multilingual support

Podcast Production WebUI Using Voice-Cloning Technology

🎤✨ Reference Voice

  • Please request the voice you want to add on the Issues page.
English

Andrew Bustamante

Andrew Huberman

Avi Loeb

Ben Shapiro

Brett Johnson

Brian Keating

Coffeezilla

Dan Carlin

David Buss

David Fravor

David Kipping

Dennis Whyte

Donald Hoffman

Donald Trump

Douglas Murray

Duncan Trussell

Elon Musk

Garry Nolan

Jack Barsky

James Sexton

Jeff Bezos

Joe Rogan

John Mearsheimer

Jordan Peterson

Kanye 'Ye' West

Mark Zuckerberg

Michael Levin

Michael Saylor

Michio Kaku

MrBeast

Nick Lane

Paul Rosolie

Ryan Graves

Sam Altman

Sam Harris

Stephen Wolfram

Tucker Carlson

Vitalik Buterin

Yuval Harari
Chinese

迪丽热巴 (Dílì Rèbā)

蔡依林 (Cài Yīlín)

吴亦凡 (Wú Yìfán)

李易峰 (Lǐ Yìfēng)

杨幂 (Yáng Mì)

赵丽颖 (Zhào Lìyǐng)
Korean

BTS 진 (Jin)

BTS RM

IU (아이유)

이병헌

이정재

유재석
Japanese

綾瀬はるか (Ayase Haruka)

💻 System Requirements

  • OS: Windows 10/11 (64-bit); Linux and macOS are not supported
  • GPU: NVIDIA with CUDA 12.4 (recommended)
  • VRAM: 4GB+ (8GB+ preferred)
  • RAM: 4GB+
  • Storage: 20GB+ free space
  • Internet: Required
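
The storage requirement above can be checked before installing. The following is a minimal sketch using only the Python standard library; the helper names `free_space_gb` and `has_enough_storage` are hypothetical and are not part of Voice-Pro.

```python
import shutil

# "Storage: 20GB+ free space" from the requirements list above.
REQUIRED_FREE_GB = 20


def free_space_gb(path="."):
    """Return the free disk space on the drive holding `path`, in GiB."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 ** 3)


def has_enough_storage(path=".", required_gb=REQUIRED_FREE_GB):
    """Return True if the drive holding `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb
```

Running `has_enough_storage(".")` from the folder where you plan to unpack Voice-Pro gives a quick yes/no answer for that drive.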

📀 Installation

Install Voice-Pro with ease using configure.bat and start.bat.

1. Get the Package

  • Clone the repository or download the latest release (Source code (zip)) from GitHub Releases:
git clone https://github.com/abus-aikorea/voice-pro.git

2. Install & Run

  1. 🚀 configure.bat
    • Sets up git, ffmpeg, and CUDA (if an NVIDIA GPU is present)
    • Run once; takes an hour or more and requires an internet connection
    • Don't close the command window
  2. 🚀 start.bat
    • Launches the Voice-Pro WebUI
    • First run installs dependencies (1+ hour)
    • If issues arise, delete installer_files and retry
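
The retry procedure (delete installer_files, then run configure.bat followed by start.bat, as described under "Please Note") could be scripted roughly as below. This is a sketch under stated assumptions: `reset_install` is a hypothetical helper, and it only removes the folder and reports which scripts to run next; it does not launch them.

```python
import shutil
from pathlib import Path


def reset_install(repo_dir):
    """Delete `installer_files` under `repo_dir`, if it exists.

    Returns (removed, next_steps): whether the folder was removed, and the
    scripts to run afterwards, in order.
    """
    target = Path(repo_dir) / "installer_files"
    removed = target.is_dir()
    if removed:
        shutil.rmtree(target)
    # Order taken from the troubleshooting note: configure.bat, then start.bat.
    return removed, ["configure.bat", "start.bat"]
```

For example, `reset_install(r"C:\voice-pro")` (an illustrative path) clears the environment folder so that the next run of configure.bat rebuilds it from scratch.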

3. Update

  • 🚀 update.bat: Refreshes the Python environment (faster than reinstalling)

4. Uninstall

  • Run uninstall.bat or delete the folder (portable install)

❓ Tips & Tricks

If the browser does not open automatically

  • Close the Windows command window and run start.bat again.
  • Open the browser yourself and enter the address displayed in the Windows command window (e.g. http://127.0.0.1:7870) in the address bar.
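
The second option can also be done from Python. The sketch below checks whether anything answers at the local address and, if so, opens it in the default browser. The port 7870 is only the example from the text above; use whatever address your command window shows. The helper names are hypothetical.

```python
import urllib.error
import urllib.request
import webbrowser


def local_url(port=7870, host="127.0.0.1"):
    """Build the local WebUI address; 7870 is the example port from the text."""
    return f"http://{host}:{port}"


def is_up(url, timeout=2.0):
    """Return True if an HTTP server answers at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except OSError:
        # Connection refused, timeout, or similar: nothing is listening.
        return False


def open_webui(port=7870):
    """Open the WebUI in the default browser if it is reachable."""
    url = local_url(port)
    if is_up(url):
        return webbrowser.open(url)
    return False
```

If `open_webui()` returns `False`, the server is probably not running yet; check the command window for errors or for a different port number.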

If a CUDA Out-Of-Memory error occurs

  • Check the GPU memory status in Windows Task Manager - Performance tab.
  • Set the Denoise level to 0 or 1. Denoise level 2 requires at least 8GB of GPU memory.
  • Set Compute Type to an int type. Float types give better quality but require more GPU memory.
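
As a rough illustration of the advice above, the hypothetical helper below maps available GPU memory to conservative settings. Only the 8GB threshold for denoise level 2 comes from this document; choosing a float type at 8GB and above is an assumption, and the strings "int8" and "float16" are illustrative labels rather than confirmed Voice-Pro option names.

```python
def suggest_settings(vram_gb):
    """Return (max_denoise_level, compute_type) for a GPU with `vram_gb` GB."""
    # From the tips above: denoise level 2 needs at least 8 GB of GPU memory.
    max_denoise = 2 if vram_gb >= 8 else 1
    # Assumption: with less than 8 GB, prefer an int type to save memory.
    compute_type = "float16" if vram_gb >= 8 else "int8"
    return max_denoise, compute_type
```

For a 4GB card, `suggest_settings(4)` suggests keeping the denoise level at 1 or below and using an int compute type, which matches the out-of-memory guidance above.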

How to improve the quality of subtitles?

  • Subtitle quality tends to improve with larger Whisper models, though not in every case. Model sizes, largest first: large > medium > small > base > tiny
  • Among compute types, float types give better accuracy. The int types use model quantization to reduce GPU memory usage and increase speed, at some cost in accuracy.
  • Increasing the denoise level removes more background sound, so that only the remaining voice is used for recognition. This does not always produce better results.

🚨 Notice

  • This repository offers a free trial of Voice-Pro.
  • The free trial version of Voice-Pro allows you to process up to 60 seconds of media.
  • The subscription version supports Microsoft Azure TTS and Translator. Purchase it on Shopify.

|  | Trial Version | ☕ Contributor Version | Subscription Version |
|---|---|---|---|
| Media Length Limit | 60 seconds | Unlimited | Unlimited |
| Translation Service | Google Translate (Open Source) | Google Translate (Open Source) | Azure Translator (Microsoft) |
| Text-to-Speech Service | Edge TTS (Open Source) | Edge TTS (Open Source) | Azure TTS (Microsoft) |
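
To see in advance whether a clip fits within the trial's 60-second limit, its duration can be measured before loading it. The sketch below handles uncompressed WAV files only, using Python's standard `wave` module. How Voice-Pro itself measures or enforces the limit is not documented here, so treat this as an independent pre-check; the helper names are hypothetical.

```python
import wave

# Media length limit of the trial version, from the table above.
TRIAL_LIMIT_SECONDS = 60


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def fits_trial(path, limit=TRIAL_LIMIT_SECONDS):
    """Return True if the WAV file is no longer than `limit` seconds."""
    return wav_duration_seconds(path) <= limit
```

Other formats (MP3, FLAC, video) would need a tool such as ffprobe instead; that is outside the scope of this sketch.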

☕ Contributions

Hello, I'm David from the Voice-Pro team. Our team discovers the best AI technologies in the industry and provides them for anyone to use easily and conveniently. We are a small startup in Korea that has only been around for a year. We are working hard to help you and other creators produce great content.

Your ⭐⭐⭐⭐⭐ review would be greatly appreciated as it helps our business grow with you. Please help support our small team.

Thank you, ABUS Customer Service

  • If you want to participate in and help us with this project, feel free to open an issue.
  • If something goes wrong, please submit a pull request to improve this project.
  • Any type of contribution is welcome.
  • For inquiries related to purchases, business partnerships, technical tuning, investments, and other matters, please contact us by email (abus.aikorea@gmail.com).
  • If you like this project, please star this repository. We would greatly appreciate it. ⭐⭐⭐
  • You can support Voice-Pro with a donation here:

"Buy Me A Coffee"

📬 Contact

👍 YouTube

🙏 Credits

©️ Copyright

by ABUS
