The official repository of the paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis" (CVPR 2024).

A short demo video can be found here.

The proposed SyncTalk synthesizes synchronized talking head videos, employing tri-plane hash representations to maintain subject identity. It can generate synchronized lip movements, facial expressions, and stable head poses, and restores hair details to create high-resolution videos.

🔥 Try using SyncTalk_2D to achieve faster and better visual quality. 🔥
- [2023-11-30] Update arXiv paper.
- [2024-03-04] The code and pre-trained model are released.
- [2024-03-22] The Google Colab notebook is released.
- [2024-04-14] Add Windows support.
- [2024-04-28] The preprocessing code is released.
- [2024-04-29] Fix bugs: audio encoder, blendshape capture, and face tracker.
- [2024-05-24] Introduce torso training to repair double chin.
- [2025-06-25] Update SyncTalk_2D.
Thanks to okgpt, we have launched a Windows integration package: download SyncTalk-Windows.zip, unzip it, and double-click inference.bat to run the demo.
Download link: Hugging Face || Baidu Netdisk
Tested on Ubuntu 18.04, PyTorch 1.12.1, and CUDA 11.3.
```bash
git clone https://github.com/ZiqiaoPeng/SyncTalk.git
cd SyncTalk
conda create -n synctalk python==3.8.8
conda activate synctalk
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
sudo apt-get install portaudio19-dev
pip install -r requirements.txt
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1121/download.html
pip install tensorflow-gpu==2.8.1
pip install ./freqencoder
pip install ./shencoder
pip install ./gridencoder
pip install ./raymarching
```
If you encounter problems installing PyTorch3D, you can use the following command to install it:
```bash
python ./scripts/install_pytorch3d.py
```
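After installation, you can optionally sanity-check that the CUDA build of PyTorch and PyTorch3D import correctly. This is just a quick check, not part of the official setup:

```bash
# Should print 1.12.1+cu113 and True on a working CUDA setup.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Should print the installed PyTorch3D version without errors.
python -c "import pytorch3d; print(pytorch3d.__version__)"
```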
Please place May.zip in the data folder and trial_may.zip in the model folder, and then unzip them (a sketch follows below).
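For reference, a minimal sketch of these steps, assuming both archives were downloaded into the repository root and unpack to May/ and trial_may/ respectively:

```bash
# Unpack the example data and the pre-trained model into the expected folders.
unzip May.zip -d data/          # assumed to produce data/May/
unzip trial_may.zip -d model/   # assumed to produce model/trial_may/
```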
Prepare face-parsing model.
```bash
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth
```
Prepare the 3DMM model for head pose estimation.
```bash
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy
```
Download the 3DMM model from Basel Face Model 2009:
```bash
# 1. copy 01_MorphableModel.mat to data_util/face_tracking/3DMM/
# 2. run the conversion script
cd data_utils/face_tracking
python convert_BFM.py
```
Put your video under data/<ID>/<ID>.mp4, and then run the following command to process the video.

[Note] The video must be 25 FPS, with all frames containing the talking person. The resolution should be about 512x512, and the duration about 4-5 minutes.
```bash
python data_utils/process.py data/<ID>/<ID>.mp4 --asr ave
```
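If your source footage does not already meet the 25 FPS / roughly 512x512 requirement, one possible pre-conversion with ffmpeg is sketched below; the centered square crop, encoder settings, and input filename are illustrative, not part of the official pipeline:

```bash
# Example only: crop to a centered square, scale to 512x512, and re-encode at 25 FPS.
ffmpeg -i raw_video.mp4 -vf "crop='min(iw,ih)':'min(iw,ih)',scale=512:512" -r 25 -c:v libx264 -crf 18 -c:a aac data/<ID>/<ID>.mp4
```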
For the --asr option, you can choose to use AVE, DeepSpeech, or HuBERT. The processed video will be saved in the data folder.
[Optional] Obtain AU45 for eye blinking.

Run FeatureExtraction in OpenFace, then rename and move the output CSV file to data/<ID>/au.csv.

[Note] Since EmoTalk's blendshape capture is not open source, the preprocessing code here uses MediaPipe's blendshape capture instead. According to some feedback it does not work well, so you can choose to replace it with AU45. If you want to compare with SyncTalk, some results from using EmoTalk capture can be obtained here, along with videos from GeneFace.
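For reference, one way to produce the AU45 CSV with OpenFace's FeatureExtraction binary; the binary path and output directory below are assumptions, so adjust them to your OpenFace installation:

```bash
# Extract facial action units (including AU45) from the processed video.
./OpenFace/build/bin/FeatureExtraction -f data/<ID>/<ID>.mp4 -aus -out_dir data/<ID>/openface
# Move the resulting CSV to the location SyncTalk expects.
mv data/<ID>/openface/<ID>.csv data/<ID>/au.csv
```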
```bash
python main.py data/May --workspace model/trial_may -O --test --asr_model ave
python main.py data/May --workspace model/trial_may -O --test --asr_model ave --portrait
```
"ave" refers to our Audio Visual Encoder; "portrait" means pasting the generated face back onto the original image, which yields higher quality.
If it runs correctly, you will get the following results.
| Setting | PSNR | LPIPS | LMD |
|---|---|---|---|
| SyncTalk (w/o Portrait) | 32.201 | 0.0394 | 2.822 |
| SyncTalk (Portrait) | 37.644 | 0.0117 | 2.825 |
This is for a single subject; the paper reports the average results for multiple subjects.
```bash
python main.py data/May --workspace model/trial_may -O --test --test_train --asr_model ave --portrait --aud ./demo/test.wav
```
Please use files with the ".wav" extension for inference; the results will be saved in "model/trial_may/results/". If you do not use the Audio Visual Encoder, replace the wav path with the npy file path (an example follows the feature-extraction commands below).
DeepSpeech
```bash
python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav
# save to data/<name>.npy
```
HuBERT
```bash
# Borrowed from GeneFace. English pre-trained.
python data_utils/hubert.py --wav data/<name>.wav
# save to data/<name>_hu.npy
```
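As an example of the npy route mentioned above, after extracting HuBERT features you could point --aud at the .npy file instead of a .wav; the paths are illustrative and assume a model trained with --asr_model hubert:

```bash
# Same flags as the wav example above, but using the extracted HuBERT features.
python main.py data/May --workspace model/trial_may -O --test --test_train --asr_model hubert --aud ./demo/test_hu.npy
```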
```bash
# by default, we load data from disk on the fly.
# we can also preload all data to CPU/GPU for faster training, but this is very memory-hungry for large datasets.
# `--preload 0`: load from disk (default, slower).
# `--preload 1`: load to CPU (slightly slower).
# `--preload 2`: load to GPU (fast).
python main.py data/May --workspace model/trial_may -O --iters 60000 --asr_model ave
python main.py data/May --workspace model/trial_may -O --iters 100000 --finetune_lips --patch_size 64 --asr_model ave

# or you can use the script to train
sh ./scripts/train_may.sh
```
[Tips] The Audio Visual Encoder (AVE) is suitable for characters with accurate lip sync and large lip movements, such as May and Shaheen. Using AVE in the inference stage achieves more accurate lip sync. If your training results show lip jitter, try using the DeepSpeech or HuBERT model as the audio feature encoder.
```bash
# Use deepspeech model
python main.py data/May --workspace model/trial_may -O --iters 60000 --asr_model deepspeech
python main.py data/May --workspace model/trial_may -O --iters 100000 --finetune_lips --patch_size 64 --asr_model deepspeech

# Use hubert model
python main.py data/May --workspace model/trial_may -O --iters 60000 --asr_model hubert
python main.py data/May --workspace model/trial_may -O --iters 100000 --finetune_lips --patch_size 64 --asr_model hubert
```
If you want to use OpenFace AU45 as the eye parameter, add "--au45" to the command line.
```bash
# Use OpenFace AU45
python main.py data/May --workspace model/trial_may -O --iters 60000 --asr_model ave --au45
python main.py data/May --workspace model/trial_may -O --iters 100000 --finetune_lips --patch_size 64 --asr_model ave --au45
python main.py data/May --workspace model/trial_may -O --test --asr_model ave --portrait
```
If a character trained with only the head shows a double-chin problem, you can introduce torso training. Training the torso solves this problem, but you will not be able to use the "--portrait" mode. If you add "--portrait", the torso model will fail!
```bash
# Train
# <head>.pth should be the latest checkpoint in trial_may
python main.py data/May/ --workspace model/trial_may_torso/ -O --torso --head_ckpt <head>.pth --iters 150000 --asr_model ave

# For example
python main.py data/May/ --workspace model/trial_may_torso/ -O --torso --head_ckpt model/trial_may/ngp_ep0019.pth --iters 150000 --asr_model ave

# Test
python main.py data/May --workspace model/trial_may_torso -O --torso --test --asr_model ave  # not support --portrait

# Inference with target audio
python main.py data/May --workspace model/trial_may_torso -O --torso --test --test_train --asr_model ave --aud ./demo/test.wav  # not support --portrait
```
```
@inproceedings{peng2024synctalk,
  title={Synctalk: The devil is in the synchronization for talking head synthesis},
  author={Peng, Ziqiao and Hu, Wentao and Shi, Yue and Zhu, Xiangyu and Zhang, Xiaomei and Zhao, Hao and He, Jun and Liu, Hongyan and Fan, Zhaoxin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={666--676},
  year={2024}
}
```
This code is developed heavily relying on ER-NeRF, and also RAD-NeRF, GeneFace, DFRF, DFA-NeRF, AD-NeRF, and Deep3DFaceRecon_pytorch.
Thanks for these great projects. Thanks to Tiandishihua for helping us fix the bug where the loss equals NaN.
By using "SyncTalk", users agree to comply with all applicable laws and regulations, and acknowledge that misuse of the software, including the creation or distribution of harmful content, is strictly prohibited. The developers of the software disclaim all liability for any direct, indirect, or consequential damages arising from the use or misuse of the software.