
vlm

Here are 759 public repositories matching this topic...

transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.

  • Updated Feb 20, 2026
  • Python

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

  • Updated Jan 14, 2026
  • TypeScript

SGLang is a high-performance serving framework for large language models and multimodal models.

  • Updated Feb 20, 2026
  • Python

A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL.

  • Updated Feb 19, 2026
  • Jupyter Notebook
runanywhere-sdks

Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more.

  • Updated Feb 20, 2026
  • Kotlin

The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.

  • Updated Jan 4, 2026
  • Python

Solve Visual Understanding with Reinforced VLMs

  • Updated Oct 21, 2025
  • Python
UltraRAG

A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines

  • Updated Feb 20, 2026
  • Python

StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.

  • Updated Nov 7, 2025
  • Python

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

  • Updated Feb 20, 2026
  • Python

🌟 100+ original LLM / RL concept diagrams 📚, a major contribution from the author of 《大模型算法》 (Large Model Algorithms)! 💥 (100+ LLM/RL Algorithm Maps)

  • Updated Feb 18, 2026
  • Python

PromptEnhancer is a prompt-rewriting tool that refines prompts into clearer, more structured versions for better image generation.

  • Updated Jan 26, 2026
  • Python

The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model built on linear attention.

  • Updated Jul 7, 2025
  • Python

Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.

  • Updated Dec 15, 2025
  • Python

An AI-powered file management tool that preserves privacy by organizing local text and image files. Using the Llama3.2 3B and LLaVA v1.6 models with the Nexa SDK, it scans, restructures, and organizes files for quick, seamless access and easy retrieval.

  • Updated Oct 21, 2024
  • Python

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

  • Updated Feb 20, 2026
  • Python

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to master any computer task through strong reasoning, self-improvement, and skill curation, all within a standardized general environment with minimal requirements.

  • Updated Nov 7, 2024
  • Python


