llava

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

chatbot llama clip mulit-modal vision-language vicuna gpt-4 vision-language-pretraining llava video-chatboat video-conversation

UpdatedAug 5, 2025
Python

jhc13 /taggui

Star1.3k

Tag manager and captioner for image datasets

image-captioning image-tagging tag-manager pyside6 stable-diffusion llava cogvlm florence-2

UpdatedOct 11, 2025
Python

unum-cloud /UForm

Star1.2k

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

neural-network clustering multi-lingual pytorch transformer openai image-search pretrained-models representation-learning clip semantic-search bert multimodal vector-search language-vision contrastive-learning huggingface-transformers cross-attention openclip llava

UpdatedOct 30, 2025
Python

gokayfem /awesome-vlm-architectures

Star1.2k

Famous Vision Language Models and Their Architectures

awesome awesome-list kosmos clip image-encoder vlm blip multimodal text-encoder vision-language-model llava internlm cogvlm qwen-vl

UpdatedJan 11, 2026
Markdown

TinyLLaVA /TinyLLaVA_Factory

Star962

A Framework of Small-scale Large Multimodal Models

nlp transformers llama vision-language llava large-multimodal-models tinyllama

UpdatedFeb 7, 2026
Python

NVlabs /Eagle

Star929

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

demo eagle llama lmm nvdia huggingface gpt4 large-language-models llm mllm llava lvlm llama3

UpdatedOct 25, 2025
Python

mbzuai-oryx /LLaVA-pp

Star848

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

conversation lmms vision-language llm llava llama3 phi3 llava-llama3 llava-phi3 llama3-llava phi3-llava llama-3-vision phi3-vision llama-3-llava phi-3-llava llama3-vision phi-3-vision

UpdatedAug 5, 2025
Python

PsyChip /machina

Star784

OpenCV+YOLO+LLAVA powered video surveillance system

python opencv camera rtsp yolo llava ollama-api

UpdatedOct 21, 2025
Python

EvolvingLMMs-Lab /LLaVA-OneVision-1.5

Star747

Fully Open Framework for Democratized Multimodal Training

llm mllm vision-language-model llava qwen3

UpdatedDec 27, 2025
Python

PaddlePaddle /PaddleMIX

Star714

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

image-to-text clip text-to-image dit multimodal sora text-to-video aigc stable-diffusion controlnet llava sd-xl ppdiffusers eva-clip stablevideodiffusion minicpm-v internvl2 qwen2-vl got-ocr20 deepseek-vl

UpdatedFeb 3, 2026
Python

SkalskiP /awesome-foundation-and-multimodal-models

Star637

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

nlp computer-vision image-captioning clip blip multimodal zero-shot-detection foundational-models llava segment-anything open-vocabulary-detection open-vocabulary-segmentation grounding-dino

UpdatedFeb 29, 2024
Python

ictnlp /LLaVA-Mini

Star562

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

video efficient vision llama multimodal large-language-models vision-language-model llava visual-instruction-tuning multimodal-large-language-models gpt4v large-multimodal-models gpt4o

UpdatedJun 29, 2025
Python

Improve this page

Add a description, image, and links to thellava topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with thellava topic, visit your repo's landing page and select "manage topics."

Learn more

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llava

Here are 249 public repositories matching this topic...

haotian-liu /LLaVA

Fanghua-Yu /SUPIR

open-compass /VLMEvalKit

yuanzhoulvpi2017 /zero_nlp

SciSharp /LLamaSharp

om-ai-lab /OmAgent

Blaizzy /mlx-vlm

chenking2020 /FindTheChatGPTer

mbzuai-oryx /Video-ChatGPT