multimodality
Here are 176 public repositories matching this topic...
A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun
- Updated
Feb 6, 2022 - Python
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
- Updated
Nov 7, 2024 - Python
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
- Updated
Aug 20, 2024
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
- Updated
Sep 22, 2025 - Python
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
- Updated
Apr 12, 2024 - Python
A Comparative Framework for Multimodal Recommender Systems
- Updated
Oct 25, 2025 - Python
[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era
- Updated
Nov 21, 2023 - TeX
Automated modeling and machine learning framework FEDOT
- Updated
Oct 14, 2025 - Python
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
- Updated
Dec 23, 2024 - Python
LLM2CLIP makes a SOTA pretrained CLIP model more SOTA than ever.
- Updated
Jul 1, 2025 - Python
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
- Updated
Jun 3, 2025 - Python
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
- Updated
Apr 4, 2025 - HTML
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
- Updated
May 19, 2025 - Python
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
- Updated
Nov 25, 2022 - Python
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
- Updated
Feb 8, 2022 - Python
Towards Generalist Biomedical AI
- Updated
Feb 17, 2024 - Python
A knowledge base construction engine for richly formatted data
- Updated
Jun 23, 2021 - Python
Sequence-to-Sequence Framework in PyTorch
- Updated
Jan 5, 2023 - Jupyter Notebook
DANCE: a deep learning library and benchmark platform for single-cell analysis
- Updated
Oct 20, 2025 - Python
[ICLR 2025] This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"
- Updated
Jul 11, 2025 - Python