vision-language-model
Here are 601 public repositories matching this topic...
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
- Updated Aug 12, 2024 - Python
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
- Updated Sep 22, 2025 - Python
Effortless data labeling with AI support from Segment Anything and other awesome models.
- Updated Nov 5, 2025 - Python
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
- Updated Aug 7, 2024 - Python
🚀 Train a 26M-parameter multimodal vision-language model (VLM) from scratch in just 1 hour! 🌏
- Updated Oct 30, 2025 - Python
Align Anything: Training All-modality Model with Feedback
- Updated Aug 25, 2025 - Jupyter Notebook
DeepSeek-VL: Towards Real-World Vision-Language Understanding
- Updated Apr 24, 2024 - Python
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
- Updated May 4, 2024 - Python
MineContext is your proactive, context-aware AI partner (Context-Engineering + ChatGPT Pulse).
- Updated Nov 5, 2025 - Python
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention.
- Updated Jul 7, 2025 - Python
Collection of AWESOME vision-language models for vision tasks
- Updated Oct 14, 2025
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
- Updated May 26, 2025 - Python
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
- Updated Nov 7, 2024 - Python
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
- Updated Nov 1, 2025 - Python
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
- Updated Nov 5, 2025 - Python
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
- Updated Apr 9, 2025 - C++
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
- Updated May 29, 2025 - Python
Seed1.5-VL is a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
- Updated Jun 14, 2025 - Jupyter Notebook
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
- Updated Jun 26, 2025 - Python
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
- Updated Sep 22, 2025 - Python