vlm

Star

Here are 259 public repositories matching this topic...

Languages: All (259), Python (148), Jupyter Notebook (48), TypeScript (7), JavaScript (6), C++ (5), Java (3), MATLAB (3), HTML (2), Julia (2), Vue (2)

Sort: Most stars

sgl-project/sglang

12.1k stars

SGLang is a fast serving framework for large language models and vision language models.

cuda inference pytorch transformer moe llama vlm llm llm-serving llava deepseek-llm deepseek llama3 llama3-1 deepseek-v3 deepseek-r1 deepseek-r1-zero

Updated Mar 18, 2025
Python

This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM 2, Florence-2, PaliGemma 2, and Qwen2.5VL.

machine-learning tutorial deep-neural-networks computer-vision deep-learning pytorch image-classification object-detection image-segmentation vlm google-colab zero-shot-detection yolov5 zero-shot-classification yolov8 open-vocabulary-detection open-vocabulary-segmentation automatic-labeling-system qwen paligemma

Updated Mar 12, 2025
Jupyter Notebook

CVHub520/X-AnyLabeling


5.1k stars

Effortless data labeling with AI support from Segment Anything and other awesome models.

deep-learning sam pytorch yolo classification resnet deeplearning object-detection image-segmentation clip annotation-tool paddle pose-estimation depth-estimation matting vlm labeling-tool onnx llm grounding-dino

Updated Feb 26, 2025
Python

NexaAI/nexa-sdk

4.4k stars

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLMs), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS).

audio sdk transformers tts language-model whisper asr vlm sdk-python edge-computing on-device-ml on-device-ai llm stable-diffusion

Updated Mar 6, 2025
Python

om-ai-lab/VLM-R1

4.2k stars

Solve Visual Understanding with Reinforced VLMs

vlm multimodal llm qwen deepseek-r1 grpo vlm-r1

Updated Mar 18, 2025
Python

bytedance/UI-TARS-desktop

3.1k stars

A GUI agent application based on UI-TARS (a vision-language model) that allows you to control your computer using natural language.

electron agent vision vlm vite gui-agents computer-use browser-use

Updated Mar 18, 2025
TypeScript

MiniMax-AI/MiniMax-01

2.3k stars

The official repository of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model, respectively, both based on linear attention.

vlm large-language-models llm llms vision-language-model minimax-text-01 minimax-vl-01

Updated Mar 18, 2025
Python

om-ai-lab/OmAgent

2.3k stars

Build multimodal language agents for fast prototyping and production.

python agent workflow chatbot gemini openai llama gpt gradio vlm multimodal vision-and-language rag gpt4 large-language-models llm llava smart-hardware language-agent multimodal-agent

Updated Mar 18, 2025
Python

QiuYannnn/Local-File-Organizer

2.1k stars

An AI-powered file management tool that ensures privacy by organizing local text files and images. Using the Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.

vlm file-organizer on-device-ai llm llama3

Updated Oct 21, 2024
Python

BAAI-Agents/Cradle

2k stars

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

ai gcc multimodality vlm cradle computer-control lmm grounding ai-agent large-language-models llm generative-ai vision-language-model ai-agents-framework general-computer-control personoid foundation-agent

Updated Nov 7, 2024
Python

xlang-ai/OSWorld

1.7k stars

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

agent cli benchmark natural-language-processing gui reinforcement-learning artificial-intelligence code-generation language-model vlm rpa multimodal llm large-action-model

Updated Mar 6, 2025
Python

heshengtao/comfyui_LLM_party

1.5k stars

An LLM agent framework in ComfyUI that includes MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; provides access to Feishu and Discord; and adapts to any LLM with an OpenAI- or aisuite-compatible interface, such as o1, Ollama, Gemini, Grok, Qwen, GLM, DeepSeek, Kimi, and Doubao. It is also adapted to local LLMs, VLMs, and GGUF models such as llama-3.3 and Janus-Pro, as well as Linkage graphRAG.

macos linux agent flux workflow ocr mcp gemini openai llama vlm dify o1 comfyui ollama gguf gpt-sovits graphrag omost janus-pro

Updated Mar 14, 2025
Python

coderonion/awesome-yolo-object-detection

1.4k stars

🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.

gui cuda yolo awesome-list llama object-detection datasets autonomous-driving vlm tensorrt snn spiking-neural-network yolov5 ultralytics rknn llm yolov8 deepseek yolov12

Updated Mar 13, 2025

ThuCCSLab/Awesome-LM-SSP

1.3k stars

A reading list on large-model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).

nlp security privacy jailbreak safety awesome-list language-model vlm adversarial-attacks diffusion-models llm

Updated Mar 17, 2025

BAAI-DCAI/Bunny

1k stars

A family of lightweight multimodal models.

english chinese vlm gpt-4 chatgpt mllm multimodal-large-language-models

Updated Nov 18, 2024
Python

THUDM/CogAgent

834 stars

An open-source, end-to-end VLM-based GUI agent.

agent glm vlm computer-use gui-agent

Updated Feb 19, 2025
Python

peterdsharpe/AeroSandbox


816 stars

Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

python analysis simulation optimization aerospace automatic-differentiation airplane cfd aircraft aerodynamics vlm xfoil aerospace-engineering aircraft-design mdo mdao aerodynamic-analysis 3d-panel

Updated Feb 17, 2025
Jupyter Notebook

gokayfem/awesome-vlm-architectures

721 stars

Famous Vision Language Models and Their Architectures

awesome awesome-list kosmos clip image-encoder vlm blip multimodal text-encoder vision-language-model llava internlm cogvlm qwen-vl

Updated Feb 24, 2025
Markdown

zubair-irshad/Awesome-Robotics-3D

660 stars

A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites.

computer-vision robotics navigation benchmarks simulations manipulation scene-graph grasping nerf 3d pointclouds vlm diffusion-models pretraining policy-learning foundation-models llm vision-language-model gaussian-splatting

Updated Nov 4, 2024

coderonion/awesome-llm-and-aigc

628 stars

🚀🚀🚀 A collection of awesome public projects about Large Language Models (LLMs), Vision Language Models (VLMs), Vision Language Action (VLA), AI Generated Content (AIGC), and the related datasets and applications.

computer-vision cuda openai yolo triton awesome-list llama gpt datasets vla vlm sora hugging-face aigc large-language-models llm chatgpt langchain qwen deepseek

Updated Mar 16, 2025
