#large-multimodal-models

Here are 78 public repositories matching this topic...

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

  • Updated Mar 28, 2025
  • Python
OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

  • Updated Mar 16, 2025
  • Python

[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning

  • Updated Jun 26, 2025
  • Python

[NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions"

  • Updated Oct 9, 2024
  • Python

A Framework of Small-scale Large Multimodal Models

  • Updated Apr 26, 2025
  • Python

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

  • Updated Feb 1, 2024
  • Python

LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.

  • Updated Jun 29, 2025
  • Python

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

  • Updated Aug 24, 2024
  • Python

Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

  • Updated Jun 17, 2025
  • Python

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.

  • Updated Oct 28, 2025
  • Python

Open Platform for Embodied Agents

  • Updated Jan 12, 2025
  • Python

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

  • Updated Jul 1, 2024
  • Python

Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents

  • Updated May 5, 2025
  • Python

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

  • Updated Sep 26, 2024
  • Python
