✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
- 2025.02.27 🌟 We have an Online Demo now.
- 2025.02.27 🌟 VLMEvalKit of OpenCompass now supports Long-VITA.
- 2025.02.17 🌟 We support training on Nvidia GPU with DeepSpeed and inference on Nvidia GPU with Transformers.
- 2025.02.09 🌟 We support training and inference on Nvidia GPU with Megatron.
- 2025.02.05 🌟 We release the training code, training log, deployment code, and model weights, which support Ascend NPU with MindSpeed.
- 2025.02.05 🌟 We are proud to launch Long-VITA, a strong long-context visual language model supporting over one million tokens.
- Long Context. Long-VITA can process more than 4K frames or over 1M visual tokens. It achieves state-of-the-art performance on Video-MME among models under 20B parameters.
- Open Source. Long-VITA is trained on open-source data only, a mix of 17M publicly available samples.
- Strong Performance. Long-VITA achieves competitive results on image and video understanding benchmarks among cutting-edge models under 20B parameters.
- Comparison of image understanding.
- Comparison of video understanding.
- Effectiveness of Logits-Masked LM Head.
| Model | LLM Size | Training Context (tokens) | Training Frames | MindSpeed Weights | Megatron Weights | Huggingface Weights |
|---|---|---|---|---|---|---|
| Long-VITA-16K | 14B | 16,384 | 64 | https://huggingface.co/VITA-MLLM/Long-VITA-16K | https://huggingface.co/VITA-MLLM/Long-VITA-16K_MG | https://huggingface.co/VITA-MLLM/Long-VITA-16K_HF |
| Long-VITA-128K | 14B | 131,072 | 512 | https://huggingface.co/VITA-MLLM/Long-VITA-128K | https://huggingface.co/VITA-MLLM/Long-VITA-128K_MG | https://huggingface.co/VITA-MLLM/Long-VITA-128K_HF |
| Long-VITA-1M | 14B | 1,048,576 | 4,096 | https://huggingface.co/VITA-MLLM/Long-VITA-1M | https://huggingface.co/VITA-MLLM/Long-VITA-1M_MG | https://huggingface.co/VITA-MLLM/Long-VITA-1M_HF |
We originally implemented Long-VITA on Ascend NPU with MindSpeed and have since adapted it to Nvidia GPU (Megatron, DeepSpeed, and Transformers inference).
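The Huggingface checkpoints listed above can be pulled and loaded locally. The snippet below is only a minimal sketch, assuming the `*_HF` repos are public and ship modeling code loadable through Transformers with `trust_remote_code`; the repository's own deployment scripts remain the authoritative entry point, and the exact image/video processor and prompt format should be taken from the model card.

```python
# Minimal sketch (assumption: generic Transformers loading works for the *_HF
# checkpoints; the repo's deployment code may expose a different entry point).
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the 16K-context checkpoint to the local HF cache.
local_dir = snapshot_download(repo_id="VITA-MLLM/Long-VITA-16K_HF")

# Load tokenizer and model; trust_remote_code is assumed to be required
# because the checkpoint likely bundles custom multimodal modeling code.
tokenizer = AutoTokenizer.from_pretrained(local_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    local_dir,
    trust_remote_code=True,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard across available GPUs (requires accelerate)
)
```

For multimodal inputs (frames or interleaved image-text), follow the preprocessing and conversation template documented with the weights rather than the plain-text call shown here.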