MAGB: A Comprehensive Benchmark for Multimodal Attributed Graphs
In many real-world scenarios, graph nodes are associated with multimodal attributes, such as texts and images, resulting in Multimodal Attributed Graphs (MAGs).
MAGB provides five datasets from e-commerce and social networks, and evaluates two major paradigms: GNN-as-Predictor and VLM-as-Predictor. The datasets are publicly available:
🤗 Hugging Face | 📑 Paper
Multimodal attributed graphs (MAGs) incorporate multiple data types (e.g., text, images, numerical features) into graph structures, enabling more powerful learning and inference capabilities.
This benchmark provides:
✅ Standardized datasets with multimodal attributes.
✅ Feature extraction pipelines for different modalities.
✅ Evaluation metrics to compare different models.
✅ Baselines and benchmarks to accelerate research.
Ensure you have the required dependencies installed before running the benchmark.
```sh
# Clone the repository
git clone https://github.com/sktsherlock/MAGB.git
cd MAGB
# Install dependencies
pip install -r requirements.txt
```
1. Download the datasets from MAGB. 👐
```sh
cd Data/
sudo apt-get update && sudo apt-get install git-lfs
git clone https://huggingface.co/datasets/Sherirto/MAGB
ls
```
Now you can see the Movies, Toys, Grocery, Reddit-S, and Reddit-M datasets under the `Data` folder.
Each dataset consists of several parts, including:
- Graph Data (*.pt): Stores the graph structure, including adjacency information and node labels. It can be loaded using DGL.
- Node Textual Metadata (*.csv): Contains node textual descriptions, neighborhood relationships, and category labels.
- Text, Image, and Multimodal Features (TextFeature/, ImageFeature/, MMFeature/): Pre-extracted embeddings from the MAGB paper for different modalities.
- Raw Images (*.tar.gz): A compressed folder containing images named by node IDs. It needs to be extracted before use.
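The parts above can be gathered with a short helper. This is a minimal sketch, not the benchmark's own loader: it assumes the graph file is the only `*.pt` file in the dataset folder and that the feature folders hold one `.npy` array per encoder, with one row per node (the exact file names inside `TextFeature/` etc. are not specified here).

```python
import os

import numpy as np
import torch


def load_dataset(root):
    """Collect the parts of one MAGB dataset folder.

    Assumptions (not confirmed by this README): the graph is stored as a
    *.pt file directly under `root`, and each *Feature/ subfolder holds
    *.npy feature matrices with one row per node.
    """
    out = {"graph": None, "features": {}}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if name.endswith(".pt"):
            # The README states this file stores the graph structure
            # (adjacency information and node labels) loadable via DGL.
            out["graph"] = torch.load(path)
        elif os.path.isdir(path) and name.endswith("Feature"):
            # Pre-extracted embeddings for one modality.
            for f in sorted(os.listdir(path)):
                if f.endswith(".npy"):
                    out["features"][f] = np.load(os.path.join(path, f))
    return out
```

Adjust the file-matching rules to the actual folder contents once you have downloaded a dataset.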
Because the Reddit-M image archive is too large for a single file, it is split into parts. Run the script below to reassemble and extract it:
```sh
cd MAGB/Data/
cat RedditMImages_parta RedditMImages_partb RedditMImages_partc > RedditMImages.tar.gz
tar -xvzf RedditMImages.tar.gz
```
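Since the extracted images are named by node ID, a small lookup helper can map graph nodes to their image files. This is a sketch under assumptions: a flat image folder and a `.jpg` extension, neither of which is stated precisely above, so adjust both to the actual archive contents.

```python
import os


def image_path_for_node(image_dir, node_id, ext=".jpg"):
    """Return the image file for a node, or None if it is missing.

    Assumes a flat folder of files named '<node_id><ext>' (the naming
    scheme is an assumption; the README only says images are named by
    node ID).
    """
    path = os.path.join(image_dir, f"{node_id}{ext}")
    return path if os.path.exists(path) else None
```

Checking for missing files explicitly is useful here, since some nodes in web-crawled datasets may lack an image.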