multimodal
Here are 1,262 public repositories matching this topic...
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
- Updated Jul 18, 2025 - JavaScript
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
- Updated Aug 12, 2024 - Python
☁️ Build multimodal AI applications with cloud-native stack
- Updated Mar 24, 2025 - Python
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
- Updated Jul 3, 2025 - Python
Janus-Series: Unified Multimodal Understanding and Generation Models
- Updated Feb 1, 2025 - Python
AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording
- Updated Jul 16, 2025 - TypeScript
The Open-sourced Multimodal AI Agent Stack connecting Cutting-edge AI Models and Agent Infra.
- Updated Jul 18, 2025 - TypeScript
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
- Updated Jul 18, 2025 - Python
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
- Updated Jul 18, 2025 - Rust
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4v, Phi4, ...) (AAAI 2025).
- Updated Jul 18, 2025 - Python
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
- Updated Jul 18, 2025 - Python
AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
- Updated Jul 17, 2025 - TypeScript
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
- Updated Apr 22, 2024 - Python
notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under the /Resources folder.
- Updated Jun 27, 2025 - HTML
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
- Updated Apr 24, 2025 - Python
Solve Visual Understanding with Reinforced VLMs
- Updated Jun 26, 2025 - Python
A visual playground for agentic workflows: Iterate over your agents 10x faster
- Updated Jul 6, 2025 - TypeScript
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
- Updated Jul 18, 2025 - Python
Plug in and Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning by at least 70%
- Updated Oct 29, 2024 - Python
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
- Updated Jul 3, 2025 - Python