# visual-instruction-tuning

Here are 15 public repositories matching this topic...

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

  • Updated Aug 19, 2025
  • Python

LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.

  • Updated Jun 29, 2025
  • Python

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc. (a minimal llava-1.5 loading sketch follows this list).

  • Updated Oct 28, 2025
  • Python

A collection of visual instruction tuning datasets.

  • Updated Mar 14, 2024
  • Python

🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)

  • Updated Dec 9, 2023
  • Python

Gamified Adversarial Prompting (GAP): crowdsourcing AI-weakness-targeting data through gamification. Boost model performance with community-driven, strategic data collection.

  • Updated Oct 10, 2024
  • Python

[EMNLP 2024] A Video Chat Agent with Temporal Prior

  • Updated Mar 2, 2025
  • Python

Vistral-V: Visual Instruction Tuning for Vistral - Vietnamese Large Vision-Language Model.

  • Updated Jul 1, 2024
  • Python

[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

  • Updated Jul 17, 2024
  • Python

Collections of multimodal search libraries, services, and research papers.

  • Updated Apr 18, 2025

Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey

  • Updated Feb 16, 2024

🛠️ Build and train multimodal models easily with LLaVA-OneVision 1.5, an open framework designed for seamless integration of vision and language tasks.

  • Updated Nov 12, 2025
  • Python

Mistral-assisted visual instruction data generation following the LLaVA recipe (a sketch of the idea follows this list).

  • Updated Feb 16, 2025
  • Python
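
The finetuning codebase above targets llava-1.5/1.6-style checkpoints. As a rough orientation (not code from that repository), here is a minimal inference sketch using the Hugging Face transformers API with the community llava-hf checkpoint; the model id, example image URL, and USER/ASSISTANT prompt template are assumptions based on the public llava-hf model cards.

```python
# Minimal sketch: load a llava-1.5 checkpoint with Hugging Face transformers
# and run one round of image-grounded generation. Assumes `transformers`,
# `torch`, `Pillow`, and `requests` are installed; the model id and prompt
# template follow the public llava-hf model card, not any repo listed here.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; this URL is the LLaVA project's demo image.
image = Image.open(
    requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw
)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```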
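
The data-generation entry follows the LLaVA recipe, in which a text-only LLM is prompted with textual image annotations (captions, optionally box coordinates) and asked to produce instruction-response pairs, so no image encoder is needed at generation time. The sketch below illustrates the idea only: `call_llm` is a hypothetical stand-in for a Mistral client, and the prompt wording is paraphrased rather than taken from either repository.

```python
# Sketch of LLaVA-style visual instruction data generation: the LLM never
# sees pixels, only text annotations of the image.
import json

SYSTEM = (
    "You are an AI visual assistant. You are given a few sentences that "
    "describe an image. Generate one question a user could ask about the "
    "image and answer it as if you could see the image. Respond with JSON "
    'of the form {"question": "...", "answer": "..."}.'
)

def build_prompt(captions: list[str]) -> str:
    """Pack the textual image annotations into a single generation prompt."""
    return SYSTEM + "\n\nImage annotations:\n" + "\n".join(captions)

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in: plug in a Mistral client (API or local) here."""
    raise NotImplementedError

def generate_pair(captions: list[str]) -> dict:
    """One caption set in, one instruction-response training pair out."""
    return json.loads(call_llm(build_prompt(captions)))
```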

Improve this page

Add a description, image, and links to the visual-instruction-tuning topic page so that developers can more easily learn about it.


Add this topic to your repo

To associate your repository with the visual-instruction-tuning topic, visit your repo's landing page and select "manage topics."

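As an alternative to the web UI, topics can also be set through the GitHub REST API's repository-topics endpoint. A minimal sketch, assuming a personal access token with repo scope; OWNER, REPO, and TOKEN are placeholders, and note that this endpoint replaces the repository's entire topic list.

```python
# Minimal sketch: tag a repository with the visual-instruction-tuning topic
# via the GitHub REST API. PUT /repos/{owner}/{repo}/topics replaces ALL
# topics, so include any existing ones in "names". OWNER, REPO, and TOKEN
# are placeholders.
import requests

resp = requests.put(
    "https://api.github.com/repos/OWNER/REPO/topics",
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": "Bearer TOKEN",
    },
    json={"names": ["visual-instruction-tuning"]},
)
resp.raise_for_status()
print(resp.json()["names"])
```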

