vision-language-model
Here are 601 public repositories matching this topic...
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
- Updated Aug 12, 2024 - Python
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
- Updated Sep 22, 2025 - Python
Effortless data labeling with AI support from Segment Anything and other awesome models.
- Updated Nov 5, 2025 - Python
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
- Updated Aug 7, 2024 - Python
🚀 Train a 26M-parameter multimodal vision-language model (VLM) from scratch in just 1 hour! 🌏
- Updated Oct 30, 2025 - Python
Align Anything: Training All-modality Model with Feedback
- Updated Aug 25, 2025 - Jupyter Notebook
DeepSeek-VL: Towards Real-World Vision-Language Understanding
- Updated Apr 24, 2024 - Python
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
- Updated May 4, 2024 - Python
MineContext is your proactive, context-aware AI partner (Context-Engineering + ChatGPT Pulse).
- Updated Nov 5, 2025 - Python
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention.
- Updated Jul 7, 2025 - Python
Collection of AWESOME vision-language models for vision tasks
- Updated Oct 14, 2025
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
- Updated May 26, 2025 - Python
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
- Updated Nov 7, 2024 - Python
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
- Updated Nov 1, 2025 - Python
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
- Updated Nov 5, 2025 - Python
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
- Updated Apr 9, 2025 - C++
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
- Updated May 29, 2025 - Python
Seed1.5-VL is a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
- Updated Jun 14, 2025 - Jupyter Notebook
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
- Updated Jun 26, 2025 - Python
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
- Updated Sep 22, 2025 - Python