vision-language-models
Here are 132 public repositories matching this topic...
[NeurIPS 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
- Updated
Feb 8, 2026 - Python
[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that can magically restore any image to perfect-4K!
- Updated
Sep 24, 2025 - Python
A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.
- Updated
Feb 5, 2026
[ICLR 2026] ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
- Updated
Jan 26, 2026 - Python
EVE Series: Encoder-Free Vision-Language Models from BAAI
- Updated
Jul 24, 2025 - Python
🌐 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
- Updated
Feb 10, 2026 - HTML
An up-to-date curated list of state-of-the-art research work, papers & resources on hallucination in large vision-language models
- Updated
Feb 8, 2026
[ICCV 2025] Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
- Updated
Dec 12, 2025 - Python
Official Repository for PosterGen
- Updated
Feb 5, 2026 - Python
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
- Updated
Jul 5, 2024 - Python
Video Content Customization Using First Frame
- Updated
Jan 8, 2026 - Python
This repo is a live list of papers on game playing and large multimodal models - "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".
- Updated
Sep 3, 2024
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
- Updated
Sep 23, 2025 - Python
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
- Updated
Dec 6, 2024 - Python
A curated list of papers & resources on anomaly detection foundation models using large language models, vision-language models, graph foundation models, time series foundation models, etc.
- Updated
Feb 10, 2026
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing. Developed specifically for high-resolution remote sensing image analysis, it offers advanced multi-target pixel grounding capabilities.
- Updated
May 28, 2025 - Python
🌐 Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
- Updated
Feb 1, 2026 - HTML
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
- Updated
Oct 10, 2024 - Python
[NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)
- Updated
Aug 5, 2025 - Python
[AAAI-2026] Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
- Updated
Nov 17, 2025 - Python