MCG-NKU/E2FGVI
Official code for "Towards An End-to-End Framework for Flow-Guided Video Inpainting" (CVPR 2022)
This repository contains the official implementation of the following paper:
Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zhen Li#, Cheng-Ze Lu#, Jianhua Qin, Chun-Le Guo*, Ming-Ming Cheng
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[Paper][Demo Video (Youtube)][演示视频 (B站)][MindSpore Implementation][Project Page (TBD)][Poster (TBD)]
You can try our Colab demo here:
2022.05.15: We release E2FGVI-HQ, which can handle videos with arbitrary resolution. This model generalizes well to much higher resolutions, even though it was trained only on 432×240 videos. Besides, it performs better than our original model on both PSNR and SSIM metrics. :link: Download links: [Google Drive] [Baidu Disk] 🎥 Demo video: [Youtube] [B站]
2022.04.06: Our code is publicly available.
- SOTA performance: The proposed E2FGVI achieves significant improvements on all quantitative metrics in comparison with SOTA methods.
- High efficiency: Our method processes 432 × 240 videos at 0.12 seconds per frame on a Titan XP GPU, which is nearly 15× faster than previous flow-based methods. Besides, our method has the lowest FLOPs among all compared SOTA methods.
- Update website page
- Hugging Face demo
- Efficient inference
Clone Repo

```shell
git clone https://github.com/MCG-NKU/E2FGVI.git
```

Create Conda Environment and Install Dependencies

```shell
conda env create -f environment.yml
conda activate e2fgvi
```
- Python >= 3.7
- PyTorch >= 1.5
- CUDA >= 9.2
- mmcv-full (please follow its installation guide)
If the `environment.yml` file does not work for you, please follow this issue to solve the problem.
Before performing the following steps, please download our pretrained model first.
| Model | 🔗 Download Links | Supports Arbitrary Resolution? | PSNR / SSIM / VFID (DAVIS) |
| --- | --- | --- | --- |
| E2FGVI | [Google Drive] [Baidu Disk] | ❌ | 33.01 / 0.9721 / 0.116 |
| E2FGVI-HQ | [Google Drive] [Baidu Disk] | ⭕ | 33.06 / 0.9722 / 0.117 |
Then, unzip the file and place the models in the `release_model` directory.
The directory structure will be arranged as:

```
release_model
   |- E2FGVI-CVPR22.pth
   |- E2FGVI-HQ-CVPR22.pth
   |- i3d_rgb_imagenet.pt (for evaluating VFID metric)
   |- README.md
```
We provide two examples in the `examples` directory.
Run the following command to enjoy them:

```shell
# The first example (using split video frames)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/tennis --mask examples/tennis_mask --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)
# The second example (using mp4 format video)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/schoolgirls.mp4 --mask examples/schoolgirls_mask --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)
```
The inpainting video will be saved in the `results` directory. Please prepare your own mp4 video (or split frames) and frame-wise masks if you want to test more cases.
Note: E2FGVI always rescales the input video to a fixed resolution (432×240), while E2FGVI-HQ does not change the resolution of the input video. If you want to customize the output resolution, please use the `--set_size` flag and set the values of `--width` and `--height`.
Example:

```shell
# Use this command to output a 720p video
python test.py --model e2fgvi_hq --video <video_path> --mask <mask_path> --ckpt release_model/E2FGVI-HQ-CVPR22.pth --set_size --width 1280 --height 720
```
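The `--mask` argument expects one binary mask per frame. As a minimal sketch of what such frame-wise masks look like (the file-name pattern, frame size, and box region here are illustrative assumptions, not the repo's exact requirements), the following builds per-frame uint8 masks where white (255) marks the region to inpaint:

```python
import numpy as np

def make_box_mask(height, width, box):
    """Return a uint8 mask: 255 inside `box` (region to inpaint), 0 elsewhere."""
    y0, x0, y1, x1 = box
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[y0:y1, x0:x1] = 255
    return mask

# One binary mask per frame, named like 00000.png, 00001.png, ... (assumed convention)
masks = {f"{i:05d}.png": make_box_mask(240, 432, (60, 100, 180, 300)) for i in range(5)}
for name, mask in masks.items():
    # In practice you would save each array as a PNG into your mask directory,
    # e.g. with PIL: Image.fromarray(mask).save(f"my_mask_dir/{name}")
    assert set(np.unique(mask).tolist()) <= {0, 255}  # masks must be strictly binary
```

Static masks like the one above are fine for object-removal tests; for moving objects, each frame's mask should follow the object.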
| Dataset | YouTube-VOS | DAVIS |
| --- | --- | --- |
| Details | For training (3,471) and evaluation (508) | For evaluation (50 in 90) |
| Images | [Official Link] (Download train and test all frames) | [Official Link] (2017, 480p, TrainVal) |
| Masks | [Google Drive] [Baidu Disk] (For reproducing paper results) | |
The training and test split files are provided in `datasets/<dataset_name>`.
For each dataset, you should place `JPEGImages` in `datasets/<dataset_name>`.
Then, run `sh datasets/zip_dir.sh` (note: please edit the folder path accordingly) to compress each video in `datasets/<dataset_name>/JPEGImages`.
Unzip the downloaded mask files into `datasets`.
The `datasets` directory structure will be arranged as follows (note: please check it carefully):
```
datasets
   |- davis
      |- JPEGImages
         |- <video_name>.zip
         |- <video_name>.zip
      |- test_masks
         |- <video_name>
            |- 00000.png
            |- 00001.png
      |- train.json
      |- test.json
   |- youtube-vos
      |- JPEGImages
         |- <video_id>.zip
         |- <video_id>.zip
      |- test_masks
         |- <video_id>
            |- 00000.png
            |- 00001.png
      |- train.json
      |- test.json
   |- zip_file.sh
```
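Each video's frames end up inside a single zip archive. As a self-contained sketch of how such an archive can be consumed (the archive built here is a synthetic stand-in; real ones are produced by the `zip_dir.sh` step above), frames can be listed and read back in order with Python's standard `zipfile` module:

```python
import io
import zipfile

# Build a tiny in-memory stand-in for datasets/<dataset_name>/JPEGImages/<video_name>.zip
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for i in range(3):
        zf.writestr(f"{i:05d}.jpg", b"fake-jpeg-bytes")  # placeholder frame data

# Read the frames back in temporal order, as a dataloader might
with zipfile.ZipFile(buf, "r") as zf:
    names = sorted(zf.namelist())
    frames = [zf.read(n) for n in names]

print(names)  # ['00000.jpg', '00001.jpg', '00002.jpg']
```

Zero-padded names are what make the lexicographic `sorted()` match temporal order, which is why the frame files are numbered `00000`, `00001`, and so on.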
Run one of the following commands for evaluation:
```shell
# For evaluating E2FGVI model
python evaluate.py --model e2fgvi --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-CVPR22.pth
# For evaluating E2FGVI-HQ model
python evaluate.py --model e2fgvi_hq --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-HQ-CVPR22.pth
```
You will get the scores reported in the paper if you evaluate E2FGVI. The scores of E2FGVI-HQ can be found in [Prepare pretrained models].
The scores will also be saved in the `results/<model_name>_<dataset_name>` directory.
Please add `--save_results` for further evaluating temporal warping error.
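For intuition about the PSNR scores reported above, here is a minimal NumPy sketch of PSNR for 8-bit frames (the repo's actual evaluation code may differ in details such as color handling and per-video averaging):

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio between two 8-bit frames, in dB."""
    pred = pred.astype(np.float64)
    target = target.astype(np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((240, 432), dtype=np.uint8)
b = a.copy()
b[:10, :10] = 16  # small corrupted patch
print(psnr(a, b))  # a small localized error still yields a high PSNR
```

Because PSNR is a log-scale measure, the 0.05 dB gap between E2FGVI and E2FGVI-HQ in the table corresponds to a very small pixel-wise difference.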
Our training configurations are provided in `train_e2fgvi.json` (for E2FGVI) and `train_e2fgvi_hq.json` (for E2FGVI-HQ).
Run one of the following commands for training:
```shell
# For training E2FGVI
python train.py -c configs/train_e2fgvi.json
# For training E2FGVI-HQ
python train.py -c configs/train_e2fgvi_hq.json
```
You could run the same command if you want to resume your training.
The training loss can be monitored by running:
```shell
tensorboard --logdir release_model
```
You could follow this pipeline to evaluate your model.
If you find our repo useful for your research, please consider citing our paper:
```
@inproceedings{liCvpr22vInpainting,
  title     = {Towards An End-to-End Framework for Flow-Guided Video Inpainting},
  author    = {Li, Zhen and Lu, Cheng-Ze and Qin, Jianhua and Guo, Chun-Le and Cheng, Ming-Ming},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022}
}
```
If you have any questions, please feel free to contact us via zhenli1031ATgmail.com or czlu919AToutlook.com.
Licensed under a Creative Commons Attribution-NonCommercial 4.0 International license for non-commercial use only. Any commercial use requires formal permission first.
This repository is maintained by Zhen Li and Cheng-Ze Lu.
This code is based on STTN, FuseFormer, Focal-Transformer, and MMEditing.