# large-vision-language-models

Here are 48 public repositories matching this topic...

[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

  • Updated Oct 9, 2024
  • Python

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

  • Updated Jul 4, 2025

✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

  • Updated May 8, 2025

Curated papers on Large Language Models in Healthcare and Medical domain

  • Updated May 29, 2025

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

  • Updated Nov 13, 2024
  • Python

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

  • Updated Jul 1, 2024
  • Python

A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.

  • Updated Jun 17, 2025

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

  • Updated Sep 26, 2024
  • Python

Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)

  • Updated Nov 4, 2024
  • Python

A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository aggregates surveys, blog posts, and research papers that explore how LMMs represent, transform, and align multimodal information internally.

  • Updated Jun 19, 2025

GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing, developed specifically for high-resolution remote sensing image analysis and offering advanced multi-target pixel grounding capabilities.

  • Updated May 28, 2025
  • Python

[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

  • Updated Oct 10, 2024
  • Python

This is the official repo for Debiasing Large Visual Language Models, including a post-hoc debiasing method and a Visual Debias Decoding strategy.

  • Updated Feb 22, 2025
  • Python

[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.

  • Updated Jan 19, 2025
  • Python

[CVPR 2025 🔥] EarthDial: Turning Multi-Sensory Earth Observations into Interactive Dialogues.

  • Updated Jun 20, 2025
  • Python


