cross-modal

Analyze unstructured data with Towhee: reverse image search, reverse video search, audio classification, question-answering systems, molecular search, and more.

nlp machine-learning embeddings image-classification cross-modal audio-classification video-tagging

Updated Feb 9, 2024
Jupyter Notebook

yisun98/SOLC

Star 260

Remote Sensing SAR-Optical Land-use Classification in PyTorch: high-resolution remote sensing semantic segmentation, land-cover segmentation, and land-cover classification

pytorch remote-sensing segmentation cross-modal multi-modal multi-source deeplabv3 land-use-classification oa-kappa sar-optical

Updated May 6, 2024
Python

QiangChunyu/SecoustiCodec

Star 217

Ultra-low bitrate speech codec (0.27-1 kbps) with cross-modal alignment and real-time capabilities

semantic speech vae codec cross-modal speaker fsq speech-codec speech-representation contrastive-learning single-codebook

Updated Aug 27, 2025
Python

JizhiziLi/RIM

Star 208

[CVPR 2023] Referring Image Matting

image-segmentation cross-modal matting multimodal image-matting

Updated Apr 17, 2023

DRSY/MoTIS

Star 126

[NAACL 2022] Mobile Text-to-Image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)

naacl ai retrieval lsh ios-swift image-search k-means cross-modal clip knn semantic-search knowledge-distillation k-means-clustering random-projection vector-search

Updated May 11, 2023
Swift

QizhiPei/BioT5

Star 123

BioT5 (EMNLP 2023) and BioT5+ (ACL 2024 Findings)

nlp machine-learning bioinformatics computational-biology cross-modal nlp-applications

Updated Sep 14, 2024
Python

qcraftai/distill-bev

Star 110

DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)

point-cloud lidar cross-modal autonomous-driving multi-modal knowledge-distillation self-driving bev distillation multi-camera 3d-object-detection nuscenes

Updated Nov 24, 2023
Python

Zengyi-Qin/Weakly-Supervised-3D-Object-Detection

Star 108

Weakly Supervised 3D Object Detection from Point Clouds (VS3D), ACM MM 2020

tensorflow point-cloud lidar stereo transfer-learning cross-modal unsupervised-learning object-proposals kitti monocular 3d-object-detection weakly-supervised-detection ws3d vs3d acm-mm-2020 unsupervised-object-detection

Updated Mar 24, 2023
Jupyter Notebook

yangli18/VLTVG

Star 96

Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022

cross-modal visual-grounding vision-language visual-linguistic

Updated Dec 2, 2022
Python

marslanm/Multimodality-Representation-Learning

Star 84

This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the recently accepted survey (https://dl.acm.org/doi/abs/10.1145/3617833).

cross-modal multimodal-deep-learning multimodal-datasets transformer-models multimodal-pre-trained-model vision-language-pretraining multimodal-applications multimodal-pretext

Updated Jun 16, 2025

rohitrango/objects-that-sound

Star 83

Unofficial implementation of Google DeepMind's paper `Objects that Sound`

machine-learning deep-neural-networks deep-learning embeddings deeplearning deepmind cross-modal audio-video audioset

Updated May 7, 2018
Python

kywen1119/DSRAN

Star 74

Code for the journal paper "Learning Dual Semantic Relations with Graph Attention for Image-Text Matching", TCSVT, 2020.

computer-vision pytorch cross-modal tcsvt image-text-matching

Updated Oct 25, 2022
Python

Paranioar/UniPT

Star 69

[CVPR 2024] Code for "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"

transfer-learning cross-modal parameter-efficient-learning parameter-efficient-tuning memory-efficient-tuning memory-efficient-learning parameter-efficient-fine-tuning

Updated Oct 15, 2024
Python

Eaphan/UPIDet

Star 65

Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [NeurIPS 2023]

cross-modal multi-modal 3d-object-detection

Updated Jun 4, 2024
Python

GT-RIPL/Xmodal-Ctx

Star 61

Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

image-captioning cross-modal clip vision-and-language

Updated Oct 21, 2022
Python
