# vision-language-model

Here are 601 public repositories matching this topic...

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

  • Updated Aug 12, 2024
  • Python

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o's performance.

  • Updated Sep 22, 2025
  • Python

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

  • Updated Aug 7, 2024
  • Python

🚀 Train a 26M-parameter multimodal VLM from scratch in just 1 hour! 🌏

  • Updated Oct 30, 2025
  • Python

Align Anything: Training All-modality Model with Feedback

  • Updated Aug 25, 2025
  • Jupyter Notebook

DeepSeek-VL: Towards Real-World Vision-Language Understanding

  • Updated Apr 24, 2024
  • Python

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

  • Updated May 4, 2024
  • Python

MineContext is your proactive context-aware AI partner (Context-Engineering + ChatGPT Pulse)

  • Updated Nov 5, 2025
  • Python

The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and vision-language model based on Linear Attention

  • Updated Jul 7, 2025
  • Python

Collection of AWESOME vision-language models for vision tasks

  • Updated Oct 14, 2025

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

  • Updated Nov 7, 2024
  • Python

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

  • Updated Nov 1, 2025
  • Python

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

  • Updated Nov 5, 2025
  • Python

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

  • Updated May 29, 2025
  • Python

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

  • Updated Jun 14, 2025
  • Jupyter Notebook

[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning

  • Updated Jun 26, 2025
  • Python

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

  • Updated Sep 22, 2025
  • Python
