vlm

Here are 759 public repositories matching this topic...

Languages: All 759, Python 452, Jupyter Notebook 115, TypeScript 31, C++ 28, JavaScript 27, HTML 11, Rust 5, MATLAB 4, Swift 4, Go 3

Sort: Most stars

huggingface/transformers

Star 157k

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

audio python nlp machine-learning natural-language-processing deep-learning pytorch transformer speech-recognition glm pretrained-models hacktoberfest gemma vlm pytorch-transformers model-hub llm qwen deepseek

Updated Feb 20, 2026
Python

bytedance/UI-TARS-desktop

Star 28.1k

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

agent mcp vision vlm tars multimodal cowork computer-use mcp-server gui-agent browser-use gui-operator ui-tars agent-tars

Updated Jan 14, 2026
TypeScript

sgl-project/sglang

Star 23.6k

SGLang is a high-performance serving framework for large language models and multimodal models.

reinforcement-learning cuda inference transformer moe attention llama glm minimax wan diffusion vlm blackwell llm qwen deepseek gpt-oss qwen-image

Updated Feb 20, 2026
Python

A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL.

machine-learning tutorial deep-neural-networks computer-vision deep-learning pytorch image-classification object-detection image-segmentation vlm google-colab zero-shot-detection yolov5 zero-shot-classification yolov8 open-vocabulary-detection open-vocabulary-segmentation automatic-labeling-system qwen paligemma

Updated Feb 19, 2026
Jupyter Notebook

yzhao062/anomaly-detection-resources

Star 9.2k

Anomaly detection related books, papers, videos, and toolboxes. Last update late 2025 for LLM and VLM works!

machine-learning data-mining awesome awesome-list outlier-detection unsupervised-learning fraud-detection time-series-analysis vlm anomaly-detection fraud outlier outlier-ensembles graph-neural-networks large-language-models llm vlms

Updated Nov 25, 2025
Python

RunanywhereAI/runanywhere-sdks

Star 8.9k

Production ready toolkit to run AI locally

android kotlin swift ios react-native web cpp inference edge flutter websdk vlm multimodal diffusion-models on-device-ai voice-ai llm llamacpp ollama apple-intelligence

Updated Feb 20, 2026
C++

NexaAI/nexa-sdk

Star 7.7k

Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more.

go sdk llama vlm on-device-ai llm stable-diffusion llama3 phi3 gemma3 qwen3 gpt-oss granite4 qwen3vl

Updated Feb 20, 2026
Kotlin

PaddlePaddle/ERNIE

Star 7.7k

The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.

vlm ernie llm erniekit ernie-45 ernie-45-vl

Updated Jan 4, 2026
Python

om-ai-lab/VLM-R1

Star 5.8k

Solve Visual Understanding with Reinforced VLMs

reinforcement-learning vlm multimodal llm qwen deepseek-r1 grpo r1-zero vlm-r1 multimodal-r1

Updated Oct 21, 2025
Python

OpenBMB/UltraRAG

Star 5.2k

A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines

flask demo ui mcp openai easy gpt embedding vlm multimodal rag sentence-transformers huggingface-transformers llm vllm qwen deepseek

Updated Feb 20, 2026
Python

joanrod/star-vector

Star 4.2k

StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.

svg vlm llm multimodal-large-language-models

Updated Nov 7, 2025
Python

EvolvingLMMs-Lab/lmms-eval

Star 3.7k

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

benchmark evaluation agi video-understanding vlm multimodal large-language-models vision-language-model llm-evaluation audio-evaluation multimodal-evaluation

Updated Feb 20, 2026
Python

changyeyu/LLM-RL-Visualized

Star 3.6k

🌟 100+ original LLM / RL schematic diagrams 📚, presented by the author of 《大模型算法》 ("Large Model Algorithms")! 💥 (100+ LLM/RL Algorithm Maps)

machine-learning natural-language-processing algorithm reinforcement-learning ai deep-learning transformers nlp-machine-learning vlm llm

Updated Feb 18, 2026
Python

Hunyuan-PromptEnhancer/PromptEnhancer

Star 3.4k

PromptEnhancer is a prompt-rewriting tool, refining prompts into clearer, structured versions for better image generation.

prompt image-editing text-to-image image-to-image vlm prompt-engineering hunyuan prompt-enhancer hunyuan-image

Updated Jan 26, 2026
Python

MiniMax-AI/MiniMax-01

Star 3.3k

The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention

vlm large-language-models llm llms vision-language-model minimax-text-01 minimax-vl-01

Updated Jul 7, 2025
Python

SkyworkAI/Skywork-R1V

Star 3.2k

Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.

reinforcement-learning reasoning vlm llm multimodal-understanding deepseek-r1 grpo vlm-r1 multimodal-r1 r1v skywork-r1v

Updated Dec 15, 2025
Python

QiuYannnn/Local-File-Organizer

Star 3.1k

An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.

vlm file-organizer on-device-ai llm llama3

Updated Oct 21, 2024
Python

om-ai-lab/OmAgent

Star 2.6k

[EMNLP-2024] Build multimodal language agents for fast prototype and production

python agent workflow chatbot gemini openai llama gpt gradio vlm multimodal vision-and-language rag gpt4 large-language-models llm llava smart-hardware language-agent multimodal-agent

Updated Mar 19, 2025
Python

xlang-ai/OSWorld

Star 2.6k

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

agent cli benchmark natural-language-processing gui reinforcement-learning artificial-intelligence code-generation language-model vlm rpa multimodal llm large-action-model

Updated Feb 20, 2026
Python

BAAI-Agents/Cradle

Star 2.5k

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

ai gcc multimodality vlm cradle computer-control lmm grounding ai-agent large-language-models llm generative-ai vision-language-model ai-agents-framework general-computer-control personoid foundation-agent

Updated Nov 7, 2024
Python
