# visual-instruction-tuning

Here are 15 public repositories matching this topic...

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

  • Updated Aug 19, 2025
  • Python

LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.

  • Updated Jun 29, 2025
  • Python

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc. (a minimal llava-1.5 loading sketch follows this list).

  • Updated Oct 28, 2025
  • Python

A collection of visual instruction tuning datasets.

  • Updated Mar 14, 2024
  • Python

🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)

  • Updated Dec 9, 2023
  • Python

Gamified Adversarial Prompting (GAP): crowdsourcing AI-weakness-targeting data through gamification. Boost model performance with community-driven, strategic data collection.

  • Updated Oct 10, 2024
  • Python

[EMNLP 2024] A Video Chat Agent with Temporal Prior

  • Updated Mar 2, 2025
  • Python

Vistral-V: Visual Instruction Tuning for Vistral - Vietnamese Large Vision-Language Model.

  • Updated Jul 1, 2024
  • Python

[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

  • Updated Jul 17, 2024
  • Python

Collections of multimodal search libraries, services, and research papers.

  • Updated Apr 18, 2025

Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey

  • Updated Feb 16, 2024

🛠️ Build and train multimodal models easily with LLaVA-OneVision 1.5, an open framework designed for seamless integration of vision and language tasks.

  • Updated Nov 12, 2025
  • Python

Mistral-assisted visual instruction data generation following the LLaVA recipe (a sketch of the idea follows this list).

  • Updated Feb 16, 2025
  • Python
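
The finetuning codebase above targets llava-1.5/1.6-style checkpoints. As a rough orientation (not code from that repository), here is a minimal inference sketch using the Hugging Face transformers API with the community llava-hf checkpoint; the model id, example image URL, and USER/ASSISTANT prompt template are assumptions based on the public llava-hf model cards.

```python
# Minimal sketch: load a llava-1.5 checkpoint with Hugging Face transformers
# and run one round of image-grounded generation. Assumes `transformers`,
# `torch`, `Pillow`, and `requests` are installed; the model id and prompt
# template follow the public llava-hf model card, not any repo listed here.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; this URL is the LLaVA project's demo image.
image = Image.open(
    requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw
)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```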
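
The data-generation entry follows the LLaVA recipe, in which a text-only LLM is prompted with textual image annotations (captions, optionally box coordinates) and asked to produce instruction-response pairs, so no image encoder is needed at generation time. The sketch below illustrates the idea only: `call_llm` is a hypothetical stand-in for a Mistral client, and the prompt wording is paraphrased rather than taken from either repository.

```python
# Sketch of LLaVA-style visual instruction data generation: the LLM never
# sees pixels, only text annotations of the image.
import json

SYSTEM = (
    "You are an AI visual assistant. You are given a few sentences that "
    "describe an image. Generate one question a user could ask about the "
    "image and answer it as if you could see the image. Respond with JSON "
    'of the form {"question": "...", "answer": "..."}.'
)

def build_prompt(captions: list[str]) -> str:
    """Pack the textual image annotations into a single generation prompt."""
    return SYSTEM + "\n\nImage annotations:\n" + "\n".join(captions)

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in: plug in a Mistral client (API or local) here."""
    raise NotImplementedError

def generate_pair(captions: list[str]) -> dict:
    """One caption set in, one instruction-response training pair out."""
    return json.loads(call_llm(build_prompt(captions)))
```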

Improve this page

Add a description, image, and links to the visual-instruction-tuning topic page so that developers can more easily learn about it.


Add this topic to your repo

To associate your repository with the visual-instruction-tuning topic, visit your repo's landing page and select "manage topics."

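As an alternative to the web UI, topics can also be set through the GitHub REST API's repository-topics endpoint. A minimal sketch, assuming a personal access token with repo scope; OWNER, REPO, and TOKEN are placeholders, and note that this endpoint replaces the repository's entire topic list.

```python
# Minimal sketch: tag a repository with the visual-instruction-tuning topic
# via the GitHub REST API. PUT /repos/{owner}/{repo}/topics replaces ALL
# topics, so include any existing ones in "names". OWNER, REPO, and TOKEN
# are placeholders.
import requests

resp = requests.put(
    "https://api.github.com/repos/OWNER/REPO/topics",
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": "Bearer TOKEN",
    },
    json={"names": ["visual-instruction-tuning"]},
)
resp.raise_for_status()
print(resp.json()["names"])
```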

