multimodal-deep-learning
Here are 469 public repositories matching this topic...
LAVIS - A One-stop Library for Language-Vision Intelligence
- Updated Nov 18, 2024 - Jupyter Notebook
FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀
- Updated Nov 17, 2024 - Jupyter Notebook
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
- Updated Jul 12, 2025
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
- Updated Nov 3, 2024 - Python
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
- Updated Apr 4, 2025 - Python
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
- Updated Apr 9, 2025 - C++
Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!
- Updated Apr 25, 2024
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch
- Updated Jul 9, 2025 - Python
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
- Updated Aug 19, 2022
awesome grounding: A curated list of research papers in visual grounding
- Updated Apr 9, 2023
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
- Updated Mar 15, 2023 - OpenEdge ABL
A collection of resources on applications of multi-modal learning in medical imaging.
- Updated Jun 5, 2025
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
- Updated Jun 4, 2024 - Jupyter Notebook
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
- Updated May 19, 2025 - Python
Compose multimodal datasets 🎹
- Updated Jun 10, 2025 - Python
Towards Generalist Biomedical AI
- Updated Feb 17, 2024 - Python
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
- Updated Sep 26, 2024
Reference mapping for single-cell genomics
- Updated May 22, 2025 - Jupyter Notebook
Paper List of Pre-trained Foundation Recommender Models
- Updated Aug 12, 2024
Deep learning based content moderation from text, audio, video & image input modalities.
- Updated Jul 5, 2025