StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pretrained StyleGAN (ECCV 2022)
We incorporate SadTalker into our framework to support audio-driven talking head generation. Thanks for their awesome work!
We add a script for pre-processing checkpoints in `bash/download.sh`.
We investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties. Based on this observation, we propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities, i.e., high-resolution video generation, disentangled control by driving video or audio, and flexible face editing.
```
git clone https://github.com/FeiiYin/StyleHEAT.git
cd StyleHEAT
conda create -n StyleHEAT python=3.7
conda activate StyleHEAT
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
```
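To confirm the pinned PyTorch build installed correctly and can see your GPU, a quick optional check (assumes a CUDA 11.0 machine):

```
# Optional sanity check: verify the pinned builds and CUDA visibility.
import torch
import torchvision

print(torch.__version__)          # expected: 1.7.1+cu110
print(torchvision.__version__)    # expected: 0.8.2+cu110
print(torch.cuda.is_available())  # should print True on a working CUDA 11.0 setup
```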
Please directly run `bash bash/download.sh` to pre-process the checkpoints. Or you can manually download our pre-trained models and put them in `./checkpoints`.
| Model | Description |
|---|---|
| checkpoints/Encoder_e4e.pth | Pre-trained E4E StyleGAN inversion encoder. |
| checkpoints/hfgi.pth | Pre-trained HFGI StyleGAN inversion encoder. |
| checkpoints/StyleGAN_e4e.pth | Pre-trained StyleGAN. |
| checkpoints/ffhq_pca.pt | StyleGAN editing directions. |
| checkpoints/ffhq_PCA.npz | StyleGAN optimization parameters. |
| checkpoints/interfacegan_directions/ | StyleGAN editing directions. |
| checkpoints/stylegan2_d_256.pth | Pre-trained StyleGAN discriminator. |
| checkpoints/model_ir_se50.pth | Pre-trained network for the identity loss. |
| checkpoints/StyleHEAT_visual.pt | Pre-trained StyleHEAT model. |
| checkpoints/BFM | 3DMM library. (Note: the zip file should be unzipped into BFM/.) |
| checkpoints/Deep3D/epoch_20.pth | Pre-trained 3DMM extractor. |
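After downloading, you can verify that everything from the table above is in place with a small helper (a sketch, not part of the repo; it only checks the paths listed in the table):

```
# Sketch: check that every checkpoint from the table above exists on disk.
from pathlib import Path

expected = [
    "checkpoints/Encoder_e4e.pth",
    "checkpoints/hfgi.pth",
    "checkpoints/StyleGAN_e4e.pth",
    "checkpoints/ffhq_pca.pt",
    "checkpoints/ffhq_PCA.npz",
    "checkpoints/interfacegan_directions",
    "checkpoints/stylegan2_d_256.pth",
    "checkpoints/model_ir_se50.pth",
    "checkpoints/StyleHEAT_visual.pt",
    "checkpoints/BFM",
    "checkpoints/Deep3D/epoch_20.pth",
]

missing = [p for p in expected if not Path(p).exists()]
print("All checkpoints found." if not missing else f"Missing: {missing}")
```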
We also provide some example videos along with their corresponding 3DMM parameters in videos.zip. Please unzip and put them in `docs/demo/videos/` for later inference.
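To sanity-check the unpacked files, the per-video 3DMM parameters are plain NumPy files following the `{video_source}/3dmm/3dmm_{video_name}.npy` pattern described in the inference notes below. A minimal sketch for inspecting one of them (the array layout is internal to the repo, so we only print what is there; `allow_pickle=True` is an assumption in case the file stores an object array):

```
# Sketch: peek at one of the provided 3DMM parameter files.
import numpy as np

params = np.load("./docs/demo/videos/3dmm/3dmm_RD_Radio34_003_512.npy", allow_pickle=True)
print(type(params), getattr(params, "shape", None), getattr(params, "dtype", None))
```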
- Same-Identity Reenactment with a video.
```
python inference.py \
  --config configs/inference.yaml \
  --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
  --output_dir=./docs/demo/output --if_extract
```
- Cross-Identity Reenactment with a single image and a video.
```
python inference.py \
  --config configs/inference.yaml \
  --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
  --image_source=./docs/demo/images/100.jpg \
  --cross_id --if_extract \
  --output_dir=./docs/demo/output
```
The `--video_source` and `--image_source` can be specified as either a single file or a folder.
For a better inversion result, at the cost of more time, please specify `--inversion_option=optimize` and we will optimize the feature latent of StyleGAN2. Otherwise, we will use the HFGI encoder to obtain the style code and inversion condition with `--inversion_option=encode`.

If you need to align (crop) images during inference, please specify `--if_align`. Alternatively, you can first align the source images following the FFHQ dataset.

If you need to extract the 3DMM parameters of the target video during inference, please specify `--if_extract`. Alternatively, you can first extract the 3DMM parameters with the script `TODO.sh` and save them as `{video_source}/3dmm/3dmm_{video_name}.npy`.

If you only need to edit the expression without modifying the pose, please specify `--edit_expression_only`.
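Because `--image_source` also accepts a folder, batching is built in. If you prefer explicit per-file runs (e.g., to vary flags per image), here is a minimal driver sketch (not part of the repo), using only the demo paths and flags documented above:

```
# Sketch: one cross-identity reenactment run per image in the demo folder.
import subprocess
from pathlib import Path

video = "./docs/demo/videos/RD_Radio34_003_512.mp4"

for image in sorted(Path("./docs/demo/images").glob("*.jpg")):
    subprocess.run([
        "python", "inference.py",
        "--config", "configs/inference.yaml",
        f"--video_source={video}",
        f"--image_source={image}",
        "--cross_id", "--if_extract",
        "--output_dir=./docs/demo/output",
    ], check=True)
```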
- Intuitive Editing.
```
python inference.py \
  --config configs/inference.yaml \
  --image_source=./docs/demo/images/40.jpg \
  --inversion_option=optimize \
  --intuitive_edit \
  --output_dir=./docs/demo/output \
  --if_extract
```
The 3DMM parameters of the images can either be pre-extracted or extracted on the fly with the `--if_extract` parameter.
- Attribute Editing.
```
python inference.py \
  --config configs/inference.yaml \
  --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
  --image_source=./docs/demo/images/40.jpg \
  --attribute_edit --attribute=young \
  --cross_id \
  --output_dir=./docs/demo/output
```
The supported editable attributes include `young`, `old`, `beard`, and `lip`. Note that to preserve the editing attribute details in W space, the optimization-based inversion method is disabled here.
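To sweep all four supported attributes in one go, a small driver sketch (not part of the repo; writing each attribute to its own sub-folder is an assumption, so results do not overwrite each other):

```
# Sketch: run attribute editing once per supported attribute.
import subprocess

for attribute in ["young", "old", "beard", "lip"]:
    subprocess.run([
        "python", "inference.py",
        "--config", "configs/inference.yaml",
        "--video_source=./docs/demo/videos/RD_Radio34_003_512.mp4",
        "--image_source=./docs/demo/images/40.jpg",
        "--attribute_edit", f"--attribute={attribute}",
        "--cross_id",
        f"--output_dir=./docs/demo/output/{attribute}",
    ], check=True)
```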
- Audio Reenactment.
Please first install SadTalker under the `third_part` folder so that it resides at `third_part/SadTalker`. Download its pre-trained checkpoints according to their instructions, and install the additional libraries with `pip install pydub==0.25.1 yacs==0.1.8 librosa==0.6.0 numba==0.48.0 resampy==0.3.1 imageio-ffmpeg==0.4.7`. Then you can run audio reenactment freely.
```
python inference.py \
  --config configs/inference.yaml \
  --audio_path=./docs/demo/audios/RD_Radio31_000.wav \
  --image_source=./docs/demo/images/100.jpg \
  --cross_id --if_extract \
  --output_dir=./docs/demo/output \
  --inversion_option=optimize
```
- Data preprocessing.
To train the VideoWarper, please follow video-preprocessing to download and pre-process the VoxCeleb dataset.
To train the whole framework, please follow HDTF to download the HDTF dataset and see HDTF-preprocessing to pre-process it.
Please follow PIRenderer to extract the 3DMM parameters and prepare all the data into lmdb files.
Training consists of two stages.
- Train VideoWarper
```
bash bash/train_video_warper.sh
```
- Train Video Calibrator
```
bash bash/train_video_styleheat.sh
```
Note that several dataset path hyper-parameters need to be modified before running the scripts.
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)
- VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)
- DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)
- 3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)
- T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)
If you find this work useful for your research, please cite:
```
@article{2203.04036,
  author  = {Yin, Fei and Zhang, Yong and Cun, Xiaodong and Cao, Mingdeng and Fan, Yanbo and Wang, Xuan and Bai, Qingyan and Wu, Baoyuan and Wang, Jue and Yang, Yujiu},
  title   = {StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN},
  journal = {arxiv:2203.04036},
  year    = {2022}
}
```
Thanks to StyleGAN2, PIRenderer, HFGI, Barbershop, GFP-GAN, Pixel2Style2Pixel, and SadTalker for sharing their code.