vlm
Here are 259 public repositories matching this topic...
SGLang is a fast serving framework for large language models and vision language models.
- Updated Mar 18, 2025 - Python
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM 2, Florence-2, PaliGemma 2, and Qwen2.5VL.
- Updated Mar 12, 2025 - Jupyter Notebook
Effortless data labeling with AI support from Segment Anything and other awesome models.
- Updated Feb 26, 2025 - Python
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.
- Updated Mar 6, 2025 - Python
Solve Visual Understanding with Reinforced VLMs
- Updated Mar 18, 2025 - Python
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
- Updated Mar 18, 2025 - TypeScript
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
- Updated Mar 18, 2025 - Python
Build multimodal language agents for fast prototyping and production
- Updated Mar 18, 2025 - Python
An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.
- Updated Oct 21, 2024 - Python
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
- Updated Nov 7, 2024 - Python
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
- Updated Mar 6, 2025 - Python
LLM Agent Framework in ComfyUI. Includes MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; access to Feishu and Discord; and adapts to all LLMs with similar openai / aisuite interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, kimi, doubao. Adapted to local LLMs, VLMs, and GGUF models such as llama-3.3, Janus-Pro. Linkage graphRAG
- Updated Mar 14, 2025 - Python
🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.
- Updated Mar 13, 2025
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
- Updated Mar 17, 2025
An open-sourced end-to-end VLM-based GUI Agent
- Updated Feb 19, 2025 - Python
Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.
- Updated Feb 17, 2025 - Jupyter Notebook
Famous Vision Language Models and Their Architectures
- Updated Feb 24, 2025 - Markdown
A curated list of 3D Vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
- Updated Nov 4, 2024
🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applications.
- Updated Mar 16, 2025