vlm
Here are 759 public repositories matching this topic...
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
- Updated Feb 20, 2026 - Python
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
- Updated Jan 14, 2026 - TypeScript
A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL.
- Updated Feb 19, 2026 - Jupyter Notebook
Anomaly detection books, papers, videos, and toolboxes. Last updated in late 2025 with LLM and VLM works!
- Updated Nov 25, 2025 - Python
Production-ready toolkit to run AI locally
- Updated Feb 20, 2026 - C++
Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more.
- Updated Feb 20, 2026 - Kotlin
Solve Visual Understanding with Reinforced VLMs
- Updated Oct 21, 2025 - Python
A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines
- Updated Feb 20, 2026 - Python
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.
- Updated Nov 7, 2025 - Python
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
- Updated Feb 20, 2026 - Python
🌟 100+ original LLM / RL principle diagrams 📚, presented by the author of 《大模型算法》 (Large Model Algorithms)! 💥 (100+ LLM/RL Algorithm Maps)
- Updated Feb 18, 2026 - Python
PromptEnhancer is a prompt-rewriting tool that refines prompts into clearer, structured versions for better image generation.
- Updated Jan 26, 2026 - Python
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention
- Updated Jul 7, 2025 - Python
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.
- Updated Dec 15, 2025 - Python
An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.
- Updated Oct 21, 2024 - Python
[EMNLP-2024] Build multimodal language agents for fast prototyping and production
- Updated Mar 19, 2025 - Python
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
- Updated Feb 20, 2026 - Python
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation in a standardized general environment with minimal requirements.
- Updated Nov 7, 2024 - Python