large-vision-language-model
Here are 21 public repositories matching this topic...
✨✨Latest Advances on Multimodal Large Language Models
- Updated Dec 12, 2025
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
- Updated Dec 3, 2024 - Python
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
- Updated May 26, 2025 - Python
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
- Updated Jul 15, 2025 - Python
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
- Updated Nov 14, 2025
[NeurIPS 2024] Official Implementation of Hawk: Learning to Understand Open-World Video Anomalies
- Updated Apr 14, 2025 - Python
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
- Updated Sep 26, 2024 - Python
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
- Updated Oct 10, 2024 - Python
✨✨Latest advancements in VLA (Vision-Language-Action) models
- Updated Apr 14, 2025
[NeurIPS'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
- Updated Dec 4, 2024 - Python
🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".
- Updated Mar 18, 2025 - Python
Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Model
- Updated Jul 23, 2025
This is the official repository of LLAVIDAL
- Updated Oct 4, 2025 - Python
[CVPR 2024 Highlight] The first benchmark for lithic use-wear analysis leveraging SOTA vision and vision-language models (DINOv2, GPT-4V), demonstrating AI performance surpassing that of expert archaeologists.
- Updated Mar 24, 2025 - Jupyter Notebook
Source code of our paper "Transferring Textual Preferences to Vision-Language Understanding through Model Merging", ACL 2025
- Updated Apr 25, 2025 - Python
Easy-to-use large vision language model pipeline for quantitative analysis
- Updated Apr 26, 2025 - Python
Code release for THRONE, a CVPR 2024 paper on measuring object hallucinations in LVLM-generated text.
- Updated Aug 6, 2025 - Python
Official implementation of TCSVT 2025 paper: DiViCo: Disentangled Visual Token Compression For Efficient Large Vision-Language Model
- Updated May 13, 2025 - Python
🎤 Transform speech and text with this lightweight Python toolkit for transcription, analysis, and audio conversion tasks.
- Updated Dec 17, 2025 - Jupyter Notebook
🔍 Experiment with neural networks for binary classification on multimodal data using this extensible PyTorch framework.
- Updated Dec 17, 2025 - Python