
✨✨ Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

VITA-MLLM/Long-VITA


🔥 News

  • 2025.02.27 🌟 We now have an Online Demo.
  • 2025.02.27 🌟 VLMEvalKit of OpenCompass now supports Long-VITA.
  • 2025.02.17 🌟 We support training on Nvidia GPU with DeepSpeed and inference on Nvidia GPU with Transformers.
  • 2025.02.09 🌟 We support training and inference on Nvidia GPU with Megatron.
  • 2025.02.05 🌟 We release the training code, training log, deployment code, and model weights, which support Ascend NPU with MindSpeed.
  • 2025.02.05 🌟 We are proud to launch Long-VITA, a strong long-context visual language model supporting over one million tokens.


✨ Highlights

  • Long Context. Long-VITA can process more than 4K frames or over 1M visual tokens. It achieves state-of-the-art performance on Video-MME among models under 20B parameters.
  • Open Source. Long-VITA is trained on open-source data only: a mix of 17M publicly available samples.
  • Strong Performance. Long-VITA achieves competitive results on image and video understanding benchmarks among cutting-edge models under 20B parameters.

📈 Experimental Results

  • Comparison of image understanding.


  • Comparison of video understanding.


  • Effectiveness of Logits-Masked LM Head.


🐍 Models

| Model | LLM Size | Training Context | Training Frames | MindSpeed Weights | Megatron Weights | Huggingface Weights |
|---|---|---|---|---|---|---|
| Long-VITA-16K | 14B | 16,384 | 64 | https://huggingface.co/VITA-MLLM/Long-VITA-16K | https://huggingface.co/VITA-MLLM/Long-VITA-16K_MG | https://huggingface.co/VITA-MLLM/Long-VITA-16K_HF |
| Long-VITA-128K | 14B | 131,072 | 512 | https://huggingface.co/VITA-MLLM/Long-VITA-128K | https://huggingface.co/VITA-MLLM/Long-VITA-128K_MG | https://huggingface.co/VITA-MLLM/Long-VITA-128K_HF |
| Long-VITA-1M | 14B | 1,048,576 | 4,096 | https://huggingface.co/VITA-MLLM/Long-VITA-1M | https://huggingface.co/VITA-MLLM/Long-VITA-1M_MG | https://huggingface.co/VITA-MLLM/Long-VITA-1M_HF |
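The training context in the table appears to scale linearly with the frame count at 256 visual tokens per frame (64 × 256 = 16,384; 4,096 × 256 = 1,048,576). A minimal sketch checking that relationship; the 256-tokens-per-frame figure is inferred from the table, not stated explicitly in this README:

```python
# Assumption: each sampled video frame contributes 256 visual tokens,
# inferred from the Model table above (context = frames * 256 for all three variants).
TOKENS_PER_FRAME = 256

variants = {
    "Long-VITA-16K": (16_384, 64),
    "Long-VITA-128K": (131_072, 512),
    "Long-VITA-1M": (1_048_576, 4_096),
}

for name, (context, frames) in variants.items():
    # Verify the linear frames-to-tokens relationship for each variant.
    assert context == frames * TOKENS_PER_FRAME
    print(f"{name}: {frames} frames -> {context} tokens")
```

Under this reading, extending from Long-VITA-128K to Long-VITA-1M is purely a matter of scaling the number of frames (512 → 4,096) at a fixed per-frame token budget.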

⭐ Training, Inference and Evaluation

We originally implemented Long-VITA on Ascend NPU and have since adapted it to Nvidia GPU.

