#vlm

Here are 259 public repositories matching this topic...

SGLang is a fast serving framework for large language models and vision language models.

  • Updated Mar 18, 2025
  • Python

This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM 2, Florence-2, PaliGemma 2, and Qwen2.5VL.

  • Updated Mar 12, 2025
  • Jupyter Notebook

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio-language models, automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.

  • Updated Mar 6, 2025
  • Python

Solve Visual Understanding with Reinforced VLMs

  • Updated Mar 18, 2025
  • Python

A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

  • Updated Mar 18, 2025
  • TypeScript

The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention

  • Updated Mar 18, 2025
  • Python

An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.

  • Updated Oct 21, 2024
  • Python

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

  • Updated Nov 7, 2024
  • Python

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

  • Updated Mar 6, 2025
  • Python

LLM Agent Framework in ComfyUI includes MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, access to Feishu and Discord, and adapts to all LLMs with similar openai / aisuite interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, kimi, doubao. Adapted to local llms, vlm, gguf such as llama-3.3 Janus-Pro, Linkage graphRAG

  • Updated Mar 14, 2025
  • Python

🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.

  • Updated Mar 13, 2025

A reading list for large-model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).

  • Updated Mar 17, 2025

A family of lightweight multimodal models.

  • Updated Nov 18, 2024
  • Python

An open-sourced end-to-end VLM-based GUI Agent

  • Updated Feb 19, 2025
  • Python
AeroSandbox

Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

  • Updated Feb 17, 2025
  • Jupyter Notebook

Famous Vision Language Models and Their Architectures

  • Updated Feb 24, 2025
  • Markdown
Awesome-Robotics-3D

A curated list of 3D Vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites

  • Updated Nov 4, 2024

🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applications.

  • Updated Mar 16, 2025
