OpenGVLab
We are a research group from Shanghai AI Lab focused on vision-centric AI research. The "GV" in our name, OpenGVLab, stands for general vision: a general understanding of vision, so that little effort is needed to adapt to new vision-based tasks.
We develop model architectures and release pre-trained foundation models to the community to motivate further research in this area. We have made promising progress in general vision AI, with 109 SOTA results 🚀. In 2022, our open-sourced foundation models achieved 65.5 mAP on the COCO object detection benchmark and 91.1% Top-1 accuracy on Kinetics-400, landmark results for AI vision 👀 tasks in image 🖼️ and video 📹 understanding. In 2023, we created VideoChat 🦜, LLaMA-Adapter 🦙, the 3D foundation model PonderV2 🧊, and many more wonderful works! At CVPR 2023, our vision foundation model InternImage was listed as one of the most influential papers, and together with our partner OpenDriveLab we won the Best Paper Award 🎉.
In 2024, we released the best open-source VLM, InternVL, and the video understanding foundation model InternVideo2, which won 7 championships in the EgoVis challenges 🥇. To date, our brilliant team has open-sourced more than 70 works; please find them here 😃.
Based on these solid vision foundations, we have expanded to multi-modality models. We aim to empower individuals and businesses by offering a higher starting point for developing vision-based AI products and lessening the burden of building an AI model from scratch.
Branches: Alpha (exploring the latest advances in vision+language research), uni-medical (focusing on medical AI), Vchitect (Generative AI)
Follow us: Twitter · 🤗 Hugging Face · Medium · WeChat · Zhihu
Pinned
- InternVideo: [ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
- Ask-Anything: [CVPR 2024 Highlight] [VideoChatGPT] ChatGPT with video understanding! It also supports many more LMs, such as miniGPT4, StableLM, and MOSS.
- VideoMamba: [ECCV 2024] VideoMamba: State Space Model for Efficient Video Understanding
- LLaMA-Adapter: [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
Repositories
- UMMEvalKit: A unified, efficient, and extensible evaluation toolkit for unified multimodal models
- MetaCaptioner
- GUI-Odyssey: [ICCV 2025] GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. It consists of 8,834 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 212 apps, and 1.4K app combos.
- SDLM: The Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation length and maintaining KV-cache compatibility, achieving high efficiency and throughput.