
✨✨ Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

VITA-MLLM/Long-VITA


🔥 News

  • 2025.02.27 🌟 We now have an Online Demo.
  • 2025.02.27 🌟 VLMEvalKit of OpenCompass now supports Long-VITA.
  • 2025.02.17 🌟 We support training on Nvidia GPU with DeepSpeed and inference on Nvidia GPU with Transformers.
  • 2025.02.09 🌟 We support training and inference on Nvidia GPU with Megatron.
  • 2025.02.05 🌟 We release the training code, training log, deployment code, and model weights, which support Ascend NPU with MindSpeed.
  • 2025.02.05 🌟 We are proud to launch Long-VITA, a strong long-context visual language model supporting over one million tokens.


✨ Highlights

  • Long Context. Long-VITA can process more than 4K frames or over 1M visual tokens. It achieves state-of-the-art performance on Video-MME among models under 20B parameters.
  • Open Source. Long-VITA is trained on open-source data only: a mix of 17M publicly available samples.
  • Strong Performance. Long-VITA achieves competitive results on image and video understanding benchmarks among cutting-edge models under 20B parameters.

📈 Experimental Results

  • Comparison of image understanding.


  • Comparison of video understanding.


  • Effectiveness of Logits-Masked LM Head.


🐍 Models

| Model | LLM Size | Training Context | Training Frames | MindSpeed Weights | Megatron Weights | Huggingface Weights |
|---|---|---|---|---|---|---|
| Long-VITA-16K | 14B | 16,384 | 64 | https://huggingface.co/VITA-MLLM/Long-VITA-16K | https://huggingface.co/VITA-MLLM/Long-VITA-16K_MG | https://huggingface.co/VITA-MLLM/Long-VITA-16K_HF |
| Long-VITA-128K | 14B | 131,072 | 512 | https://huggingface.co/VITA-MLLM/Long-VITA-128K | https://huggingface.co/VITA-MLLM/Long-VITA-128K_MG | https://huggingface.co/VITA-MLLM/Long-VITA-128K_HF |
| Long-VITA-1M | 14B | 1,048,576 | 4,096 | https://huggingface.co/VITA-MLLM/Long-VITA-1M | https://huggingface.co/VITA-MLLM/Long-VITA-1M_MG | https://huggingface.co/VITA-MLLM/Long-VITA-1M_HF |
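The training context in the table appears to scale linearly with the frame count at 256 visual tokens per frame (64 × 256 = 16,384; 4,096 × 256 = 1,048,576). A minimal sketch checking that relationship; the 256-tokens-per-frame figure is inferred from the table, not stated explicitly in this README:

```python
# Assumption: each sampled video frame contributes 256 visual tokens,
# inferred from the Model table above (context = frames * 256 for all three variants).
TOKENS_PER_FRAME = 256

variants = {
    "Long-VITA-16K": (16_384, 64),
    "Long-VITA-128K": (131_072, 512),
    "Long-VITA-1M": (1_048_576, 4_096),
}

for name, (context, frames) in variants.items():
    # Verify the linear frames-to-tokens relationship for each variant.
    assert context == frames * TOKENS_PER_FRAME
    print(f"{name}: {frames} frames -> {context} tokens")
```

Under this reading, extending from Long-VITA-128K to Long-VITA-1M is purely a matter of scaling the number of frames (512 → 4,096) at a fixed per-frame token budget.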

⭐ Training, Inference and Evaluation

We originally implemented Long-VITA on Ascend NPU and have since adapted it to Nvidia GPU.

