StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pretrained StyleGAN (ECCV 2022)
We incorporate SadTalker into our framework to support audio-driven talking head generation. Thanks for their awesome work!
We add a script for pre-processing checkpoints in `bash/download.sh`.
We investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties. Based on this observation, we propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities, i.e., high-resolution video generation, disentangled control by driving video or audio, and flexible face editing.
```
git clone https://github.com/FeiiYin/StyleHEAT.git
cd StyleHEAT
conda create -n StyleHEAT python=3.7
conda activate StyleHEAT
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
```
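To confirm the pinned PyTorch build installed correctly and can see your GPU, a quick optional check (assumes a CUDA 11.0 machine):

```
# Optional sanity check: verify the pinned builds and CUDA visibility.
import torch
import torchvision

print(torch.__version__)          # expected: 1.7.1+cu110
print(torchvision.__version__)    # expected: 0.8.2+cu110
print(torch.cuda.is_available())  # should print True on a working CUDA 11.0 setup
```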
Please directly run `bash bash/download.sh` to pre-process the checkpoints. Or you can manually download our pre-trained models and put them in `./checkpoints`.
| Model | Description |
|---|---|
| checkpoints/Encoder_e4e.pth | Pre-trained E4E StyleGAN inversion encoder. |
| checkpoints/hfgi.pth | Pre-trained HFGI StyleGAN inversion encoder. |
| checkpoints/StyleGAN_e4e.pth | Pre-trained StyleGAN. |
| checkpoints/ffhq_pca.pt | StyleGAN editing directions. |
| checkpoints/ffhq_PCA.npz | StyleGAN optimization parameters. |
| checkpoints/interfacegan_directions/ | StyleGAN editing directions. |
| checkpoints/stylegan2_d_256.pth | Pre-trained StyleGAN discriminator. |
| checkpoints/model_ir_se50.pth | Pre-trained network for the identity loss. |
| checkpoints/StyleHEAT_visual.pt | Pre-trained StyleHEAT model. |
| checkpoints/BFM | 3DMM library. (Note: the zip file should be unzipped into BFM/.) |
| checkpoints/Deep3D/epoch_20.pth | Pre-trained 3DMM extractor. |
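After downloading, you can verify that everything from the table above is in place with a small helper (a sketch, not part of the repo; it only checks the paths listed in the table):

```
# Sketch: check that every checkpoint from the table above exists on disk.
from pathlib import Path

expected = [
    "checkpoints/Encoder_e4e.pth",
    "checkpoints/hfgi.pth",
    "checkpoints/StyleGAN_e4e.pth",
    "checkpoints/ffhq_pca.pt",
    "checkpoints/ffhq_PCA.npz",
    "checkpoints/interfacegan_directions",
    "checkpoints/stylegan2_d_256.pth",
    "checkpoints/model_ir_se50.pth",
    "checkpoints/StyleHEAT_visual.pt",
    "checkpoints/BFM",
    "checkpoints/Deep3D/epoch_20.pth",
]

missing = [p for p in expected if not Path(p).exists()]
print("All checkpoints found." if not missing else f"Missing: {missing}")
```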
We also provide some example videos along with their corresponding 3DMM parameters in videos.zip. Please unzip and put them in `docs/demo/videos/` for later inference.
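To sanity-check the unpacked files, the per-video 3DMM parameters are plain NumPy files following the `{video_source}/3dmm/3dmm_{video_name}.npy` pattern described in the inference notes below. A minimal sketch for inspecting one of them (the array layout is internal to the repo, so we only print what is there; `allow_pickle=True` is an assumption in case the file stores an object array):

```
# Sketch: peek at one of the provided 3DMM parameter files.
import numpy as np

params = np.load("./docs/demo/videos/3dmm/3dmm_RD_Radio34_003_512.npy", allow_pickle=True)
print(type(params), getattr(params, "shape", None), getattr(params, "dtype", None))
```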
- Same-Identity Reenactment with a video.
```
python inference.py \
  --config configs/inference.yaml \
  --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
  --output_dir=./docs/demo/output --if_extract
```
- Cross-Identity Reenactment with a single image and a video.
```
python inference.py \
  --config configs/inference.yaml \
  --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
  --image_source=./docs/demo/images/100.jpg \
  --cross_id --if_extract \
  --output_dir=./docs/demo/output
```
The `--video_source` and `--image_source` can be specified as either a single file or a folder.
For a better inversion result, at the cost of more time, please specify `--inversion_option=optimize` and we will optimize the feature latent of StyleGAN2. Otherwise, we will use the HFGI encoder to obtain the style code and inversion condition with `--inversion_option=encode`.

If you need to align (crop) images during inference, please specify `--if_align`. Alternatively, you can first align the source images following the FFHQ dataset.

If you need to extract the 3DMM parameters of the target video during inference, please specify `--if_extract`. Alternatively, you can first extract the 3DMM parameters with the script `TODO.sh` and save them as `{video_source}/3dmm/3dmm_{video_name}.npy`.

If you only need to edit the expression without modifying the pose, please specify `--edit_expression_only`.
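Because `--image_source` also accepts a folder, batching is built in. If you prefer explicit per-file runs (e.g., to vary flags per image), here is a minimal driver sketch (not part of the repo), using only the demo paths and flags documented above:

```
# Sketch: one cross-identity reenactment run per image in the demo folder.
import subprocess
from pathlib import Path

video = "./docs/demo/videos/RD_Radio34_003_512.mp4"

for image in sorted(Path("./docs/demo/images").glob("*.jpg")):
    subprocess.run([
        "python", "inference.py",
        "--config", "configs/inference.yaml",
        f"--video_source={video}",
        f"--image_source={image}",
        "--cross_id", "--if_extract",
        "--output_dir=./docs/demo/output",
    ], check=True)
```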
- Intuitive Editing.
```
python inference.py \
  --config configs/inference.yaml \
  --image_source=./docs/demo/images/40.jpg \
  --inversion_option=optimize \
  --intuitive_edit \
  --output_dir=./docs/demo/output \
  --if_extract
```
The 3DMM parameters of the images can either be pre-extracted or extracted on the fly with the `--if_extract` parameter.
- Attribute Editing.
```
python inference.py \
  --config configs/inference.yaml \
  --video_source=./docs/demo/videos/RD_Radio34_003_512.mp4 \
  --image_source=./docs/demo/images/40.jpg \
  --attribute_edit --attribute=young \
  --cross_id \
  --output_dir=./docs/demo/output
```
The supported editable attributes include `young`, `old`, `beard`, and `lip`. Note that to preserve the editing attribute details in W space, the optimization-based inversion method is disabled here.
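To sweep all four supported attributes in one go, a small driver sketch (not part of the repo; writing each attribute to its own sub-folder is an assumption, so results do not overwrite each other):

```
# Sketch: run attribute editing once per supported attribute.
import subprocess

for attribute in ["young", "old", "beard", "lip"]:
    subprocess.run([
        "python", "inference.py",
        "--config", "configs/inference.yaml",
        "--video_source=./docs/demo/videos/RD_Radio34_003_512.mp4",
        "--image_source=./docs/demo/images/40.jpg",
        "--attribute_edit", f"--attribute={attribute}",
        "--cross_id",
        f"--output_dir=./docs/demo/output/{attribute}",
    ], check=True)
```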
- Audio Reenactment.
Please first install SadTalker under the `third_part` folder so that it resides at `third_part/SadTalker`. Download its pre-trained checkpoints according to their instructions, and install the additional libraries with `pip install pydub==0.25.1 yacs==0.1.8 librosa==0.6.0 numba==0.48.0 resampy==0.3.1 imageio-ffmpeg==0.4.7`. Then you can run audio reenactment freely.
```
python inference.py \
  --config configs/inference.yaml \
  --audio_path=./docs/demo/audios/RD_Radio31_000.wav \
  --image_source=./docs/demo/images/100.jpg \
  --cross_id --if_extract \
  --output_dir=./docs/demo/output \
  --inversion_option=optimize
```
- Data preprocessing.
To train the VideoWarper, please follow video-preprocessing to download and pre-process the VoxCeleb dataset.
To train the whole framework, please follow HDTF to download the HDTF dataset and see HDTF-preprocessing to pre-process it.
Please follow PIRenderer to extract the 3DMM parameters and prepare all the data into lmdb files.
Training consists of two stages.
- Train VideoWarper
```
bash bash/train_video_warper.sh
```
- Train Video Calibrator
```
bash bash/train_video_styleheat.sh
```
Note that several dataset path hyper-parameters need to be modified before running the scripts.
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)
- VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)
- DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)
- 3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)
- T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)
If you find this work useful for your research, please cite:
```
@article{2203.04036,
  author  = {Yin, Fei and Zhang, Yong and Cun, Xiaodong and Cao, Mingdeng and Fan, Yanbo and Wang, Xuan and Bai, Qingyan and Wu, Baoyuan and Wang, Jue and Yang, Yujiu},
  title   = {StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN},
  journal = {arxiv:2203.04036},
  year    = {2022}
}
```
Thanks to StyleGAN2, PIRenderer, HFGI, Barbershop, GFP-GAN, Pixel2Style2Pixel, and SadTalker for sharing their code.