multi-modal-learning
Here are 134 public repositories matching this topic...
An open source implementation of CLIP.
- Updated Nov 4, 2025 - Python
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
- Updated Aug 29, 2025 - Jupyter Notebook
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
- Updated Jan 1, 2025 - Python
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
- Updated Jan 17, 2024 - Python
A concise but complete implementation of CLIP with various experimental improvements from recent papers.
- Updated Oct 16, 2023 - Python
A curated list of resources on Visual Question Answering (VQA, covering image and video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
- Updated Jul 6, 2023
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
- Updated Jun 13, 2025 - Python
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
- Updated Dec 10, 2024
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!
- Updated Jul 15, 2024 - Python
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
- Updated Dec 14, 2025 - Python
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
- Updated Mar 18, 2024 - Python
[CVPR 2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
- Updated Aug 2, 2020 - Python
The official repository of Achelous and Achelous++.
- Updated Jul 8, 2024 - Python
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
- Updated Apr 3, 2024 - Python
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
- Updated Jul 21, 2024 - Python
Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding".
- Updated Aug 21, 2024 - Python
A detection/segmentation dataset whose labels are intricate, flexible language expressions. From "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
- Updated Mar 20, 2024 - Python
[ICCV 2023] The official code of "Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation"
- Updated Jun 26, 2025 - Python
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
- Updated Sep 1, 2025 - Python
Official PyTorch Code for Anchor Token Guided Prompt Learning Methods: [ICCV 2025] ATPrompt and [arXiv 2511.21188] AnchorOPT
- Updated Dec 17, 2025 - Python