multi-modality
Here are 85 public repositories matching this topic, sorted by most stars:
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V-level capabilities and beyond.
- Updated Aug 12, 2024 - Python
✨✨Latest Advances on Multimodal Large Language Models
- Updated Mar 21, 2025
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
- Updated Jan 23, 2024 - Python
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
- Updated Mar 18, 2025 - Python
Simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun
- Updated Mar 13, 2022 - Python
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
- Updated Mar 5, 2024 - Python
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
- Updated Jan 22, 2025 - Python
Algorithms and Publications on 3D Object Tracking
- Updated May 16, 2024 - C++
Parsing-free RAG supported by VLMs
- Updated Feb 19, 2025 - Python
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
- Updated Apr 21, 2024 - Python
The open-source implementation of Google's Gemini, the model that will "eclipse ChatGPT"
- Updated Mar 17, 2025 - Python
[CVPR 2023] Collaborative Diffusion
- Updated Nov 28, 2023 - Python
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
- Updated Jun 5, 2024 - Python
An open-source implementation for training LLaVA-NeXT.
- Updated Oct 23, 2024 - Python
Effortless plug-and-play optimizer to cut model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.
- Updated Jun 4, 2024 - Python
[CVPR 2025] MINIMA: Modality Invariant Image Matching
- Updated Mar 14, 2025 - Python
An official PyTorch implementation of the CRIS paper
- Updated Jun 9, 2024 - Python
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
- Updated Sep 11, 2024 - Python
Official repository for VisionZip (CVPR 2025)
- Updated Feb 27, 2025 - Python
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
- Updated Dec 7, 2019 - Python