multi-modal-learning

Here are 134 public repositories matching this topic...

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

  • Updated Jan 1, 2025
  • Python

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

  • Updated Jan 17, 2024
  • Python

A concise but complete implementation of CLIP with various experimental improvements from recent papers

  • Updated Oct 16, 2023
  • Python

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

  • Updated Jul 6, 2023

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

  • Updated Jun 13, 2025
  • Python

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!

  • Updated Jul 15, 2024
  • Python

[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"

  • Updated Dec 14, 2025
  • Python

[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement

  • Updated Mar 18, 2024
  • Python

Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

  • Updated Apr 3, 2024
  • Python

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

  • Updated Jul 21, 2024
  • Python

Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

  • Updated Aug 21, 2024
  • Python

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

  • Updated Mar 20, 2024
  • Python

[ICCV 2023] The official code for "Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation"

  • Updated Jun 26, 2025
  • Python

[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations

  • Updated Sep 1, 2025
  • Python

Official PyTorch Code for Anchor Token Guided Prompt Learning Methods: [ICCV 2025] ATPrompt and [arXiv 2511.21188] AnchorOPT

  • Updated Dec 17, 2025
  • Python
