FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
Shilong Zhang, Wenbo Li, Shoufa Chen, Chongjian Ge, Peize Sun, Yida Zhang, Yi Jiang, Zehuan Yuan, Bingyue Peng, Ping Luo
HKU, CUHK, ByteDance
- [2025.02.10] 🔥 🔥 🔥 Inference code and model weights for both stages have been released.
In this repository, we provide:
- The stage-I weights for 270P video generation.
- The stage-II weights for enhancing 270P video to 1080P.
- Inference code for both stages.
- Training code and related augmentation. Work in progress (PR#12).
  - Loss function
  - Dataset and augmentation
  - Configuration and training script
- Implementation with diffusers.
- Gradio.
This repository is tested with PyTorch 2.4.0+cu121 and Python 3.11.11. You can install the necessary dependencies using the following command:
pip install -r requirements.txt
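After installing, it can help to confirm that the local environment is close to the tested one. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `environment_report` are illustrative and not part of this repository, and a mismatch is only a hint, not proof that the code will fail.

```python
import sys
from importlib import metadata
from typing import List, Optional

# Versions this README reports as tested.
TESTED_PYTHON = (3, 11)
TESTED_TORCH = "2.4.0"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report() -> List[str]:
    """Collect warnings; an empty list means no mismatch was detected."""
    warnings = []
    current = sys.version_info[:2]
    if current != TESTED_PYTHON:
        warnings.append(
            f"Python {current[0]}.{current[1]} differs from the tested 3.11"
        )
    torch_version = installed_version("torch")
    if torch_version is None:
        warnings.append("PyTorch is not installed")
    elif not torch_version.startswith(TESTED_TORCH):
        warnings.append(
            f"PyTorch {torch_version} differs from the tested {TESTED_TORCH}"
        )
    return warnings


for message in environment_report():
    print("warning:", message)
```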
To get the 3D VAE (identical to CogVideoX), along with Stage-I and Stage-II weights, set them up as follows:
cd FlashVideo
mkdir -p ./checkpoints
huggingface-cli download --local-dir ./checkpoints FoundationVision/FlashVideo
The checkpoints should be organized as shown below:
├── 3d-vae.pt
├── stage1.pt
└── stage2.pt
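A quick way to confirm the download finished is to check that the three files listed above are present. This is a small sketch, assuming the checkpoints live in `./checkpoints` as in the commands above; the function name `missing_checkpoints` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the checkpoint layout shown in this README.
EXPECTED_FILES = ("3d-vae.pt", "stage1.pt", "stage2.pt")


def missing_checkpoints(checkpoint_dir="./checkpoints"):
    """Return the expected checkpoint files that are not present in the directory."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_checkpoints()
if missing:
    print("Missing checkpoint files:", ", ".join(missing))
else:
    print("All checkpoint files found.")
```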
⚠️ IMPORTANT NOTICE ⚠️: Both stage-I and stage-II are trained with long prompts only. For the best results, write comprehensive and detailed descriptions in your prompts, like the example provided in example.txt.
You can conveniently provide user prompts in our Jupyter notebook. The default configuration for spatial and temporal slices in the VAE decoder is tailored for an 80 GB GPU. For GPUs with less memory, consider increasing the spatial and temporal slice settings.
flashvideo/demo.ipynb
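To illustrate why the slice settings matter for memory, the sketch below splits one dimension (frames, height, or width) into contiguous chunks and reports the largest chunk, which is what a sliced decoder would hold at once. It assumes that a higher slice setting means more, smaller pieces. This is a generic illustration, not the decoder in this repository: `slice_bounds` and `largest_chunk` are hypothetical names, and a real sliced decoder typically also handles overlap or context between neighbouring pieces, which is ignored here.

```python
def slice_bounds(length, num_slices):
    """Split range(length) into contiguous chunks of near-equal size."""
    if length < 1 or num_slices < 1:
        raise ValueError("length and num_slices must be positive")
    num_slices = min(num_slices, length)  # never create empty chunks
    base, extra = divmod(length, num_slices)
    bounds = []
    start = 0
    for index in range(num_slices):
        # The first `extra` chunks take one additional element each.
        end = start + base + (1 if index < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def largest_chunk(length, num_slices):
    """Size of the biggest chunk, a rough proxy for peak per-step memory."""
    return max(end - start for start, end in slice_bounds(length, num_slices))


# Doubling the slice count halves the largest piece along that dimension.
print(largest_chunk(1080, 2))  # 540
print(largest_chunk(1080, 4))  # 270
```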
You can conveniently provide the user prompts in a text file and generate videos with multiple GPUs.
bash inf_270_1080p.sh
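The general idea of splitting a prompt file across several GPUs can be sketched as follows. This assumes one prompt per non-empty line and a round-robin split by process rank; both are assumptions made for illustration, and the script above may organise the work differently. The names `load_prompts` and `shard_for_rank` are hypothetical.

```python
from pathlib import Path


def load_prompts(path):
    """Read prompts from a text file, one per non-empty line (assumed format)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def shard_for_rank(prompts, rank, world_size):
    """Return the prompts assigned to one worker using a round-robin split."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return prompts[rank::world_size]
```

With three prompts and two workers, worker 0 would receive the first and third prompts and worker 1 the second, so every prompt is generated exactly once.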
This project is developed based on CogVideoX. Please refer to their original license for usage details.
@article{zhang2025flashvideo,
  title={FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation},
  author={Zhang, Shilong and Li, Wenbo and Chen, Shoufa and Ge, Chongjian and Sun, Peize and Zhang, Yida and Jiang, Yi and Yuan, Zehuan and Peng, Binyue and Luo, Ping},
  journal={arXiv preprint arXiv:2502.05179},
  year={2025}
}