voicepaw/so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.


简体中文


A fork of so-vits-svc with realtime support and a greatly improved interface. Based on branch 4.0 (v1); models for that branch are compatible. 4.1 models are not supported, nor are other models.

No Longer Maintained

Reasons

  • Within a year, the technology evolved enormously and many better alternatives now exist
  • I hoped to create a more modular, easy-to-install repository, but lacked the skills, time, and money to do so
  • PySimpleGUI is no longer LGPL
  • Typer is becoming more popular than using Click directly

Alternatives

Always be wary of the few influencers who seem overly excited about every new project or technology; take every social-media post with a grain of salt.

The voice changer boom of 2023 has come to an end, and many developers, not just those in this repository, have not been very active for a while.

There are too many alternatives to list here.

Elsewhere, several start-ups have improved and marketed voice changers (probably for profit).

Updates to this repository have been limited to maintenance since Spring 2023. It is difficult to narrow down the list of alternatives, but if you are looking for a voice changer with better performance (especially lower latency, aside from quality), please consider trying other projects. However, this project may still be ideal for those who want to try out voice conversion quickly, because it is easy to install.

Features not available in the original repo

  • Realtime voice conversion (enhanced in v1.1.0)
  • Partially integrates QuickVC
  • Fixed misuse of ContentVec in the original repository.1
  • More accurate pitch estimation using CREPE.
  • GUI and unified CLI available
  • ~2x faster training
  • Ready to use just by installing with pip.
  • Automatically downloads pretrained models. No need to install fairseq.
  • Code completely formatted with black, isort, autoflake, etc.

Installation

Option 1. One-click easy installation

Download .bat

This BAT file will automatically perform the steps described below.

Option 2. Manual installation (using pipx, experimental)

1. Installing pipx

Windows (a development version is required due to pypa/pipx#940):

```shell
py -3 -m pip install --user git+https://github.com/pypa/pipx.git
py -3 -m pipx ensurepath
```

Linux/MacOS:

```shell
python -m pip install --user pipx
python -m pipx ensurepath
```

2. Installing so-vits-svc-fork

```shell
pipx install so-vits-svc-fork --python=3.11
pipx inject so-vits-svc-fork torch torchaudio --pip-args="--upgrade" --index-url=https://download.pytorch.org/whl/cu121
# https://download.pytorch.org/whl/nightly/cu121
```

Option 3. Manual installation

Creating a virtual environment

Windows:

```shell
py -3.11 -m venv venv
venv\Scripts\activate
```

Linux/MacOS:

```shell
python3.11 -m venv venv
source venv/bin/activate
```

Anaconda:

```shell
conda create -n so-vits-svc-fork python=3.11 pip
conda activate so-vits-svc-fork
```

Installing without creating a virtual environment may cause a PermissionError if Python is installed in Program Files, etc.

Install this via pip (or your favourite package manager that uses pip):

```shell
python -m pip install -U pip setuptools wheel
pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu121
# https://download.pytorch.org/whl/nightly/cu121
pip install -U so-vits-svc-fork
```
Notes
  • If no GPU is available or you are using macOS, simply remove pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu121. MPS is probably supported.
  • If you are using an AMD GPU on Linux, replace --index-url https://download.pytorch.org/whl/cu121 with --index-url https://download.pytorch.org/whl/nightly/rocm5.7. AMD GPUs are not supported on Windows (#120).
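The index-URL choice above can be summarized as a small decision helper. A minimal, illustrative sketch, assuming the URLs quoted in the notes; `pick_torch_index` is a hypothetical name, not part of this package:

```python
import platform
from typing import Optional

def pick_torch_index(gpu: Optional[str]) -> Optional[str]:
    """Return the pip --index-url suggested by the notes above.
    gpu is 'nvidia', 'amd', or None; a None result means the default
    PyPI wheels are fine (CPU, or MPS on macOS)."""
    if gpu is None or platform.system() == "Darwin":
        return None  # plain `pip install -U torch torchaudio`
    if gpu == "amd":
        if platform.system() != "Linux":
            raise RuntimeError("AMD GPUs are not supported on Windows (#120)")
        return "https://download.pytorch.org/whl/nightly/rocm5.7"
    return "https://download.pytorch.org/whl/cu121"  # NVIDIA CUDA 12.1 wheels

print(pick_torch_index("nvidia"))
```

Pass the returned URL to `pip install -U torch torchaudio --index-url <url>`, or omit the flag when the helper returns None.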

Update

Please update this package regularly to get the latest features and bug fixes.

```shell
pip install -U so-vits-svc-fork
# pipx upgrade so-vits-svc-fork
```

Usage

Inference

GUI

The GUI launches with the following command:

```shell
svcg
```

CLI

  • Realtime (from microphone):

    ```shell
    svc vc
    ```

  • File:

    ```shell
    svc infer source.wav
    ```

Pretrained models are available on Hugging Face or CIVITAI.

Notes

  • If using WSL, note that WSL requires additional setup to handle audio, and the GUI will not work unless an audio device is found.
  • In realtime inference, if there is noise on the input, the HuBERT model will react to it as well. Consider using a realtime noise-reduction application such as RTX Voice in this case.
  • Models other than those for 4.0v1 or this repository are not supported.
  • GPU inference requires at least 4 GB of VRAM. If it does not work, try CPU inference, which is fast enough.2

Training

Before training

  • If your dataset has BGM, remove it using software such as Ultimate Vocal Remover. 3_HP-Vocal-UVR.pth or UVR-MDX-NET Main is recommended.3
  • If your dataset is a long audio file with a single speaker, use svc pre-split to split it into multiple files (using librosa).
  • If your dataset is a long audio file with multiple speakers, use svc pre-sd to split it into multiple files (using pyannote.audio). Further manual classification may be necessary due to accuracy issues. If speakers use a variety of speech styles, set --min-speakers larger than the actual number of speakers. Due to unresolved dependencies, install pyannote.audio manually: pip install pyannote-audio.
  • To manually classify audio files, svc pre-classify is available. The up and down arrow keys can be used to change the playback speed.
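Conceptually, svc pre-split cuts a long recording at stretches of silence. The following Python sketch illustrates that idea on a raw list of amplitude samples; `split_on_silence` is a toy stand-in written for this README, not the project's actual implementation (which uses librosa):

```python
def split_on_silence(samples, threshold=0.01, min_silence=3):
    """Split samples into voiced segments, cutting wherever at least
    `min_silence` consecutive samples fall below `threshold`.
    Shorter quiet runs are kept inside a segment."""
    segments, current, quiet_run = [], [], []
    for s in samples:
        if abs(s) < threshold:
            quiet_run.append(s)
            if len(quiet_run) >= min_silence and current:
                segments.append(current)  # silence long enough: close segment
                current = []
        else:
            if current and len(quiet_run) < min_silence:
                current.extend(quiet_run)  # short pause stays in the segment
            quiet_run = []
            current.append(s)
    if current:
        segments.append(current)
    return segments

print(split_on_silence([0.5, 0.4, 0.0, 0.0, 0.0, 0.6, 0.0, 0.7]))
# [[0.5, 0.4], [0.6, 0.0, 0.7]]
```

The real command additionally resamples and writes each segment to its own file; this sketch only shows the segmentation logic.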

Cloud

Open In Colab / Open In Paperspace / Paperspace Referral4

If you do not have access to a GPU with more than 10 GB of VRAM, the free plan of Google Colab is recommended for light users and the Pro/Growth plan of Paperspace is recommended for heavy users. Conversely, if you have access to a high-end GPU, the use of cloud services is not recommended.

Local

Place your dataset like dataset_raw/{speaker_id}/**/{wav_file}.{any_format} (subfolders and non-ASCII filenames are acceptable) and run:

```shell
svc pre-resample
svc pre-config
svc pre-hubert
svc train -t
```
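Before running the pipeline, it can help to sanity-check that dataset_raw follows the expected layout. A minimal sketch; `check_dataset` and the speaker name "kiritan" are hypothetical, and svc performs its own file discovery:

```python
import tempfile
from pathlib import Path

def check_dataset(root="dataset_raw"):
    """Count files per speaker under {root}/{speaker_id}/**/ .
    Purely illustrative pre-flight check, not part of svc."""
    counts = {}
    for speaker_dir in sorted(Path(root).iterdir()):
        if speaker_dir.is_dir():
            # subfolders and non-ASCII names are fine; glob recursively
            counts[speaker_dir.name] = sum(1 for p in speaker_dir.rglob("*") if p.is_file())
    return counts

# Demo on a throwaway layout (hypothetical speaker name "kiritan"):
tmp = tempfile.mkdtemp()
(Path(tmp) / "kiritan" / "session1").mkdir(parents=True)
(Path(tmp) / "kiritan" / "session1" / "take1.wav").touch()
print(check_dataset(tmp))  # {'kiritan': 1}
```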

Notes

  • Dataset audio duration per file should be <~ 10s.
  • At least 4 GB of VRAM is needed.5
  • It is recommended to increase batch_size in config.json as much as the VRAM capacity allows before running the train command. Setting batch_size to auto-{init_batch_size}-{max_n_trials} (or simply auto) will automatically increase batch_size until an OOM error occurs, but this may not be useful in some cases.
  • To use CREPE, replace svc pre-hubert with svc pre-hubert -fm crepe.
  • To use ContentVec correctly, replace svc pre-config with svc pre-config -t so-vits-svc-4.0v1. Training may take slightly longer because some weights are reset due to reusing legacy initial generator weights.
  • To use the MS-iSTFT Decoder, replace svc pre-config with svc pre-config -t quickvc.
  • Silence removal and volume normalization are performed automatically (as in the upstream repo) and are not required.
  • If you have trained on a large, copyright-free dataset, consider releasing it as an initial model.
  • For further details (e.g. parameters), see the Wiki or Discussions.
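Editing batch_size can also be done programmatically before training. A minimal sketch, assuming a typical generated config with a "train" section (the file path and the initial value 16 here are stand-ins, not the real configs/44k/config.json):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the generated configs/44k/config.json
cfg_path = Path(tempfile.mkdtemp()) / "config.json"
cfg_path.write_text(json.dumps({"train": {"batch_size": 16}}))

# Switch batch_size to auto-tuning before running `svc train`
cfg = json.loads(cfg_path.read_text())
cfg["train"]["batch_size"] = "auto"  # or e.g. "auto-16-5", or a fixed integer
cfg_path.write_text(json.dumps(cfg, indent=2))

print(json.loads(cfg_path.read_text())["train"]["batch_size"])  # auto
```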

Further help

For more details, run svc -h or svc <subcommand> -h.

```shell
> svc -h
Usage: svc [OPTIONS] COMMAND [ARGS]...

  so-vits-svc allows any folder structure for training data. However, the
  following folder structure is recommended.

      When training: dataset_raw/{speaker_name}/**/{wav_name}.{any_format}
      When inference: configs/44k/config.json, logs/44k/G_XXXX.pth

  If the folder structure is followed, you DO NOT NEED TO SPECIFY model path,
  config path, etc. (The latest model will be automatically loaded.)

  To train a model, run pre-resample, pre-config, pre-hubert, train.
  To infer a model, run infer.

Options:
  -h, --help  Show this message and exit.

Commands:
  clean          Clean up files, only useful if you are using the default file structure
  infer          Inference
  onnx           Export model to onnx (currently not working)
  pre-classify   Classify multiple audio files into multiple files
  pre-config     Preprocessing part 2: config
  pre-hubert     Preprocessing part 3: hubert If the HuBERT model is not found, it will be...
  pre-resample   Preprocessing part 1: resample
  pre-sd         Speech diarization using pyannote.audio
  pre-split      Split audio files into multiple files
  train          Train model If D_0.pth or G_0.pth not found, automatically download from hub.
  train-cluster  Train k-means clustering
  vc             Realtime inference from microphone
```

External Links

Video Tutorial

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • 34j: 💻🤔📖💡🚇🚧👀⚠️📣🐛
  • GarrettConway: 💻🐛📖👀
  • BlueAmulet: 🤔💬💻🚧
  • ThrowawayAccount01: 🐛
  • 緋: 📖🐛
  • Lordmau5: 🐛💻🤔🚧💬📓
  • DL909: 🐛
  • Satisfy256: 🐛
  • Pierluigi Zagaria: 📓
  • ruckusmattster: 🐛
  • Desuka-art: 🐛
  • heyfixit: 📖
  • Nerdy Rodent: 📹
  • 谢宇: 📖
  • ColdCawfee: 🐛
  • sbersier: 🤔📓🐛
  • Meldoner: 🐛🤔💻
  • mmodeusher: 🐛
  • AlonDan: 🐛
  • Likkkez: 🐛
  • Duct Tape Games: 🐛
  • Xianglong He: 🐛
  • 75aosu: 🐛
  • tonyco82: 🐛
  • yxlllc: 🤔💻
  • outhipped: 🐛
  • escoolioinglesias: 🐛📓📹
  • Blacksingh: 🐛
  • Mgs. M. Thoyib Antarnusa: 🐛
  • Exosfeer: 🐛💻
  • guranon: 🐛🤔💻
  • Alexander Koumis: 💻
  • acekagami: 🌍
  • Highupech: 🐛
  • Scorpi: 💻
  • Maximxls: 💻
  • Star3Lord: 🐛💻
  • Forkoz: 🐛💻
  • Zerui Chen: 💻🤔
  • Roee Shenberg: 📓🤔💻
  • Justas: 🐛💻
  • Onako2: 📖
  • 4ll0w3v1l: 💻
  • j5y0V6b: 🛡️
  • marcellocirelli: 🐛
  • Priyanshu Patel: 💻
  • Anna Gorshunova: 🐛💻

This project follows the all-contributors specification. Contributions of any kind are welcome!

Credits

Copier

This package was created with Copier and the browniebroke/pypackage-template project template.

Footnotes

  1. #206

  2. #469

  3. https://ytpmv.info/how-to-use-uvr/

  4. If you register a referral code and then add a payment method, you may save about $5 on your first month's bill. Note that both referral rewards are Paperspace credits, not cash. It was a tough decision, but the referral was included because debugging and training the initial model requires a large amount of computing power and the developer is a student.

  5. #456
