
Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

arXiv | project page

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
Shilong Zhang, Wenbo Li, Shoufa Chen, Chongjian Ge, Peize Sun,
Yida Zhang, Yi Jiang, Zehuan Yuan, Bingyue Peng, Ping Luo
HKU, CUHK, ByteDance

🤗 More video examples 👀 can be accessed at the project page

⚡⚡ User Prompt to 270p, NFE = 50, Takes ~30s ⚡⚡

⚡⚡ 270p to 1080p, NFE = 4, Takes ~72s ⚡⚡


🔥 Update

  • [2025.02.10] 🔥 🔥 🔥 Inference code and model weights for both stages have been released.

🌿 Introduction

In this repository, we provide:

  • The stage-I weights for 270P video generation.
  • The stage-II weights for enhancing 270P videos to 1080P.
  • Inference code for both stages.
  • Training code and related augmentation. Work in progress (PR #12):
    • Loss function
    • Dataset and augmentation
    • Configuration and training script
  • Implementation with diffusers.
  • Gradio demo.

Install

1. Environment Setup

This repository is tested with PyTorch 2.4.0+cu121 and Python 3.11.11. You can install the necessary dependencies with the following command:

pip install -r requirements.txt
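Since the repository pins a specific Python version, a small standalone check like the one below (not part of the repo; the function name is our own) can fail fast before installing dependencies:

```python
import sys

def python_version_ok(required=(3, 11)):
    """Return True if the running interpreter meets the tested Python version."""
    return sys.version_info[:2] >= required

# Example: warn before attempting the install.
if not python_version_ok():
    print("FlashVideo is tested with Python 3.11; found "
          + ".".join(map(str, sys.version_info[:2])))
```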

2. Preparing the Checkpoints

Download the 3D VAE (identical to CogVideoX's), along with the Stage-I and Stage-II weights, as follows:

cd FlashVideo
mkdir -p ./checkpoints
huggingface-cli download --local-dir ./checkpoints FoundationVision/FlashVideo

The checkpoints should be organized as shown below:

├── 3d-vae.pt
├── stage1.pt
└── stage2.pt
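A quick sanity check that the download completed can look like this (a hedged helper of our own, using the file names from the layout above):

```python
from pathlib import Path

# Expected checkpoint files, taken from the layout shown above.
EXPECTED = ("3d-vae.pt", "stage1.pt", "stage2.pt")

def missing_checkpoints(ckpt_dir="./checkpoints"):
    """Return the expected checkpoint files not yet present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]

# Example: report anything that still needs downloading.
for name in missing_checkpoints():
    print(f"missing: {name}")
```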

🚀 Text to Video Generation

⚠️ IMPORTANT NOTICE ⚠️: Both stage-I and stage-II are trained with long prompts only. To achieve the best results, include comprehensive and detailed descriptions in your prompts, akin to the example provided in example.txt.

Jupyter Notebook

You can conveniently provide user prompts in our Jupyter notebook. The default configuration for spatial and temporal slices in the VAE decoder is tailored for an 80G GPU. For GPUs with less memory, consider increasing the spatial and temporal slice counts.

flashvideo/demo.ipynb
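The memory/speed trade-off behind those slice settings can be illustrated with a generic sketch (here `decode_fn` stands in for the real VAE decoder, and the function is illustrative only; the actual knobs live in the notebook's configuration):

```python
import numpy as np

def decode_in_temporal_slices(latent, decode_fn, t_slice=4):
    """Decode a (T, C, H, W) latent in chunks along the time axis.

    Only one chunk's activations are resident at a time, so a smaller
    t_slice lowers peak memory at the cost of more decoder calls.
    """
    parts = [decode_fn(latent[i:i + t_slice])
             for i in range(0, latent.shape[0], t_slice)]
    return np.concatenate(parts, axis=0)
```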

Inferring from a Text File Containing Prompts

You can conveniently provide user prompts in a text file and generate videos with multiple GPUs.

bash inf_270_1080p.sh
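For example, a prompt file (one long, detailed prompt per line, in the style of example.txt; the file name below is hypothetical, and the path the script actually reads is defined inside it) could be prepared like this:

```shell
# Hypothetical prompt file; one comprehensive, detailed prompt per line.
cat > my_prompts.txt <<'EOF'
A slow dolly shot of a snow-covered pine forest at dawn, soft golden light filtering through the branches, mist rising from the ground, highly detailed, cinematic.
EOF
# Then launch the multi-GPU inference script:
# bash inf_270_1080p.sh
```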

License

This project is developed based on CogVideoX. Please refer to their original license for usage details.

BibTeX

@article{zhang2025flashvideo,
  title={FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation},
  author={Zhang, Shilong and Li, Wenbo and Chen, Shoufa and Ge, Chongjian and Sun, Peize and Zhang, Yida and Jiang, Yi and Yuan, Zehuan and Peng, Bingyue and Luo, Ping},
  journal={arXiv preprint arXiv:2502.05179},
  year={2025}
}
