large-vision-language-models
Here are 48 public repositories matching this topic...
✨✨Latest Advances on Multimodal Large Language Models
- Updated Jul 11, 2025
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
- Updated Oct 9, 2024 - Python
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
- Updated Oct 1, 2024 - Python
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
- Updated Jul 4, 2025
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
- Updated May 8, 2025
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
- Updated Apr 4, 2025 - HTML
A paper list on large multi-modality models (perception, generation, unification), parameter-efficient fine-tuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
- Updated Dec 15, 2024
Curated papers on large language models in the healthcare and medical domain.
- Updated May 29, 2025
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
- Updated Nov 13, 2024 - Python
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
- Updated Jul 1, 2024 - Python
A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.
- Updated Jun 17, 2025
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
- Updated Sep 26, 2024 - Python
An up-to-date curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models.
- Updated May 10, 2025
Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)
- Updated Nov 4, 2024 - Python
A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository aggregates surveys, blog posts, and research papers that explore how LMMs represent, transform, and align multimodal information internally.
- Updated Jun 19, 2025
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing, developed specifically for high-resolution remote sensing image analysis and offering advanced multi-target pixel grounding capabilities.
- Updated May 28, 2025 - Python
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
- Updated Oct 10, 2024 - Python
The official repo for Debiasing Large Visual Language Models, including a post-hoc debiasing method and a Visual Debias Decoding strategy.
- Updated Feb 22, 2025 - Python
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
- Updated Jan 19, 2025 - Python
[CVPR 2025 🔥] EarthDial: Turning Multi-Sensory Earth Observations to Interactive Dialogues.
- Updated Jun 20, 2025 - Python