# StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)
We incorporate SadTalker into our framework to support audio-driven talking head generation. Thanks for their awesome work!

We add a script for pre-processing checkpoints in `bash/download.sh`.
We investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties. Based on these observations, we propose a novel unified framework built on a pre-trained StyleGAN that enables a set of powerful functionalities, i.e., high-resolution video generation, disentangled control by driving video or audio, and flexible face editing.
## Installation

```bash
git clone https://github.com/FeiiYin/StyleHEAT.git
cd StyleHEAT
conda create -n StyleHEAT python=3.7
conda activate StyleHEAT
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
```
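To confirm the environment works, a quick sanity check (a minimal sketch; assumes an NVIDIA GPU with CUDA 11.0 drivers):

```bash
# Verify PyTorch installed correctly and can see the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```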
Please run `bash bash/download.sh` to download and pre-process the checkpoints.
Alternatively, you can manually download our pre-trained models and put them in `./checkpoints`.
Model | Description
---|---
`checkpoints/Encoder_e4e.pth` | Pre-trained E4E StyleGAN inversion encoder.
`checkpoints/hfgi.pth` | Pre-trained HFGI StyleGAN inversion encoder.
`checkpoints/StyleGAN_e4e.pth` | Pre-trained StyleGAN.
`checkpoints/ffhq_pca.pt` | StyleGAN editing directions.
`checkpoints/ffhq_PCA.npz` | StyleGAN optimization parameters.
`checkpoints/interfacegan_directions/` | StyleGAN editing directions.
`checkpoints/stylegan2_d_256.pth` | Pre-trained StyleGAN discriminator.
`checkpoints/model_ir_se50.pth` | Pre-trained IR-SE50 model for the identity loss.
`checkpoints/StyleHEAT_visual.pt` | Pre-trained StyleHEAT model.
`checkpoints/BFM` | 3DMM library. (Note: the zip file should be unzipped to `BFM/`.)
`checkpoints/Deep3D/epoch_20.pth` | Pre-trained 3DMM extractor.
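After downloading, you can sanity-check that everything landed in the right place (a minimal sketch; the file list mirrors the table above):

```bash
# Report any checkpoint missing from ./checkpoints
for f in Encoder_e4e.pth hfgi.pth StyleGAN_e4e.pth ffhq_pca.pt ffhq_PCA.npz \
         stylegan2_d_256.pth model_ir_se50.pth StyleHEAT_visual.pt \
         BFM Deep3D/epoch_20.pth interfacegan_directions; do
    [ -e "checkpoints/$f" ] || echo "MISSING: checkpoints/$f"
done
```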
We also provide some example videos along with their corresponding 3DMM parameters in videos.zip. Please unzip them and put them in `docs/demo/videos/` for later inference.
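For example (assuming videos.zip was downloaded to the repository root; adjust the target if the archive layout differs):

```bash
unzip videos.zip -d docs/demo/videos/
```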
## Inference

- Same-Identity Reenactment with a video.
```bash
python inference.py \
    --config configs/inference.yaml \
    --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
    --output_dir=./docs/demo/output --if_extract
```
- Cross-Identity Reenactment with a single image and a video.
```bash
python inference.py \
    --config configs/inference.yaml \
    --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
    --image_source=./docs/demo/images/100.jpg \
    --cross_id --if_extract \
    --output_dir=./docs/demo/output
```
The `--video_source` and `--image_source` can be specified as either a single file or a folder.
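For example, to drive every image in a folder with the same video (a sketch using the demo images directory shipped with the repo):

```bash
python inference.py \
    --config configs/inference.yaml \
    --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
    --image_source=./docs/demo/images/ \
    --cross_id --if_extract \
    --output_dir=./docs/demo/output
```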
For a better inversion result at the cost of more time, specify `--inversion_option=optimize` and we will optimize the feature latent of StyleGAN-V2. Otherwise, we will use the HFGI encoder to obtain the style code and inversion condition with `--inversion_option=encode`.
If you need to align (crop) images during inference, specify `--if_align`. Alternatively, you can first align the source images following the FFHQ dataset procedure.
If you need to extract the 3DMM parameters of the target video during inference, specify `--if_extract`. Alternatively, you can first extract the 3DMM parameters with the script `TODO.sh` and save them to `{video_source}/3dmm/3dmm_{video_name}.npy`.
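Following that naming pattern, the pre-extracted parameters for the demo video would live at a path like the one below (a sketch derived from the pattern above, not a file shipped with the repo):

```bash
ls docs/demo/videos/3dmm/3dmm_RD_Radio34_003_512.npy
```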
If you only need to edit the expression without modifying the pose, specify `--edit_expression_only`.
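For example, reusing the cross-identity command above with expression-only editing (a sketch composed purely from the flags documented in this section):

```bash
python inference.py \
    --config configs/inference.yaml \
    --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
    --image_source=./docs/demo/images/100.jpg \
    --cross_id --if_extract --edit_expression_only \
    --output_dir=./docs/demo/output
```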
- Intuitive Editing.
```bash
python inference.py \
    --config configs/inference.yaml \
    --image_source=./docs/demo/images/40.jpg \
    --inversion_option=optimize \
    --intuitive_edit \
    --output_dir=./docs/demo/output \
    --if_extract
```
The 3DMM parameters of the images can either be pre-extracted or extracted on the fly by passing `--if_extract`.
- Attribute Editing.
```bash
python inference.py \
    --config configs/inference.yaml \
    --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
    --image_source=./docs/demo/images/40.jpg \
    --attribute_edit --attribute=young \
    --cross_id \
    --output_dir=./docs/demo/output
```
The supported editable attributes include `young`, `old`, `beard`, and `lip`. Note that, to preserve the attribute-editing details in W space, the optimized inversion method is disabled here.
- Audio Reenactment.
Please first install SadTalker into the `third_part` folder, as `third_part/SadTalker`. Download its pre-trained checkpoints according to their instructions, and install the additional libraries with `pip install pydub==0.25.1 yacs==0.1.8 librosa==0.6.0 numba==0.48.0 resampy==0.3.1 imageio-ffmpeg==0.4.7`. Then you can run audio reenactment freely with the command below.
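For the installation step itself, a minimal sketch (assuming SadTalker's public repository URL; its checkpoints must still be downloaded per their README):

```bash
# Clone SadTalker into the location StyleHEAT expects
git clone https://github.com/OpenTalker/SadTalker.git third_part/SadTalker
```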
```bash
python inference.py \
    --config configs/inference.yaml \
    --audio_path=./docs/demo/audios/RD_Radio31_000.wav \
    --image_source=./docs/demo/images/100.jpg \
    --cross_id --if_extract \
    --output_dir=./docs/demo/output \
    --inversion_option=optimize
```
## Training

- Data preprocessing.
To train the VideoWarper, please follow video-preprocessing to download and pre-process the VoxCelebA dataset.
To train the whole framework, please follow HDTF to download the HDTF dataset and see HDTF-preprocessing to pre-process it.
Please follow PIRenderer to extract the 3DMM parameters and pack all the data into lmdb files.
Training consists of two stages.

- Train the VideoWarper:

```bash
bash bash/train_video_warper.sh
```

- Train the Video Calibrator:

```bash
bash bash/train_video_styleheat.sh
```

Note that several dataset path hyper-parameters need to be modified before running the scripts.
## Related Works

- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)
- VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)
- DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)
- 3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)
- T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)
## Citation

If you find this work useful for your research, please cite:

```bibtex
@article{2203.04036,
  author  = {Yin, Fei and Zhang, Yong and Cun, Xiaodong and Cao, Mingdeng and Fan, Yanbo and Wang, Xuan and Bai, Qingyan and Wu, Baoyuan and Wang, Jue and Yang, Yujiu},
  title   = {StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN},
  journal = {arxiv:2203.04036},
  year    = {2022}
}
```
## Acknowledgements

Thanks to StyleGAN-2, PIRenderer, HFGI, Barbershop, GFP-GAN, Pixel2Style2Pixel, and SadTalker for sharing their code.