vision-language-models


Here are 132 public repositories matching this topic...

Languages: All 132, Python 94, Jupyter Notebook 14, HTML 4, JavaScript 3, Go 1, TypeScript 1

Sort: Most stars


LYL1015/JarvisArt

Stars: 785

[NeurIPS 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

agent image-processing lightroom large-language-models mllm vision-language-models

Updated Feb 8, 2026
Python

taco-group/4KAgent

Stars: 745

[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that can magically restore any image to perfect-4K!

agent workflow computer-vision image-processing low-level super-resolution image-restoration image-enhancement neurips large-language-models llm mllm vision-language-models agentic-ai neurips-2025

Updated Sep 24, 2025
Python

zli12321/Vision-Language-Models-Overview

Stars: 526

A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.

reinforcement-learning clip claude world-models multimodal-models sota-model llava blip2 gpt-4v gemini-pro deepseek vision-language-models qwen-vl llama-vision-model multimodal-benchmarks vision-language-model-applications finevision-pretrain-dataset

Updated Feb 5, 2026

xiaomi-research/recogdrive

Stars: 447

[ICLR 2026] ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

reinforcement-learning autonomous-driving diffusion-policy vision-language-action vision-language-models navsim end-to-end-a

Updated Jan 26, 2026
Python

baaivision/EVE

Stars: 368

EVE Series: Encoder-Free Vision-Language Models from BAAI

clip vlm instruction-following large-language-models llm mllm multimodal-large-language-models vision-language-models encoder-free-vlm

Updated Jul 24, 2025
Python

worldbench/awesome-vla-for-ad

Stars: 295

🌐 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

awesome-list autonomous-driving multi-modal 3d vla self-driving vlm vision-language embodied-ai large-language-models llm multimodal-large-language-models vision-language-action vision-language-models

Updated Feb 10, 2026
HTML

NishilBalar/Awesome-LVLM-Hallucination

Stars: 269

Up-to-date curated list of state-of-the-art research, papers, and resources on hallucination in large vision-language models

mlm hallucination large-language-models llm mllm large-vision-language-models multimodal-large-language-models hallucination-evaluation hallucination-detection vision-language-models lvlm hallucination-mitigation hallucination-survey hallucination-research hallucination-benchmark multimodal-language-model

Updated Feb 8, 2026

worldbench/DriveBench

Stars: 231

[ICCV 2025] Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

autonomous-driving chatgpt vision-language-models phi-3 internvl qwen2-vl driving-with-language

Updated Dec 12, 2025
Python

Y-Research-SBU/PosterGen

Stars: 214

Official Repository for PosterGen

agent large-language-model vision-language-models

Updated Feb 5, 2026
Python

snap-research/MyVLM

Stars: 186

Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)

personalization vision-language-models

Updated Jul 5, 2024
Python

zli12321/FFGO-Video-Customization

Stars: 168

Video Content Customization Using First Frame

autonomous-driving game-simulation image-to-video diffusion-models vision-language-models lora-fine-tuning video-content-custoization subject-mixing product-selling

Updated Jan 8, 2026
Python

BAAI-Agents/GPA-LM

Stars: 162

This repo is a live list of papers on game playing and large multimodal models: "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".

games ai gcc planning gameplay awesome-list agents gameai vlm multimodal agent-framework large-language-models llm generative-ai vision-language-models general-computer-control

Updated Sep 3, 2024

zli12321/Vision-SR1

Stars: 160

Reinforcement Learning of Vision Language Models with Self Visual Perception Reward

reinforcement-learning self-improvement self-rewarding vision-language-models qwen-vl grpo self-evolving-ai visual-perception-reward

Updated Sep 23, 2025
Python

baaivision/DenseFusion

Stars: 159

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

vlm image-descriptions visual-perception mllm multimodal-large-language-models vision-language-models

Updated Dec 6, 2024
Python

mala-lab/Awesome-Anomaly-Detection-Foundation-Models

Stars: 150

A curated list of papers & resources on anomaly detection foundation models using large language models, vision-language models, graph foundation models, time series foundation models, etc.

anomaly-detection foundation-models large-language-models multimodal-large-language-models vision-language-models time-series-foundation-models multimodal-foundation-models

Updated Feb 10, 2026

GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities.

remote-sensing segmentation-models foundation-models large-vision-language-models large-multimodal-models vision-language-models grounding-llms