
Benchmarking for the attributed graphs


MAGB

MAGB: A Comprehensive Benchmark for Multimodal Attributed Graphs

In many real-world scenarios, graph nodes are associated with multimodal attributes, such as texts and images, resulting in Multimodal Attributed Graphs (MAGs).

MAGB provides 5 datasets from E-Commerce and Social Networks and evaluates two major paradigms: GNN-as-Predictor and VLM-as-Predictor. The datasets are publicly available:

🤗Hugging Face | 📑Paper

📖 Introduction

Multimodal attributed graphs (MAGs) incorporate multiple data types (e.g., text, images, numerical features) into graph structures, enabling more powerful learning and inference capabilities.
This benchmark provides:
✅ Standardized datasets with multimodal attributes.
✅ Feature extraction pipelines for different modalities.
✅ Evaluation metrics to compare different models.
✅ Baselines and benchmarks to accelerate research.

💻 Installation

Ensure you have the required dependencies installed before running the benchmark.

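    # Clone the repository
    git clone https://github.com/sktsherlock/MAGB.git
    cd MAGB
    # Install dependencies
    pip install -r requirements.txt

Note: the repository root lists `requirements.yaml` rather than `requirements.txt`. If the `pip` command above cannot find its file, the dependencies are presumably specified in `requirements.yaml` instead.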

🚀 Usage

1. Download the datasets from MAGB. 👐

    cd Data/
    sudo apt-get update && sudo apt-get install git-lfs && git clone https://huggingface.co/datasets/Sherirto/MAGB .
    ls

Now you can see the Movies, Toys, Grocery, Reddit-S, and Reddit-M datasets under the `Data` folder.

Each dataset consists of several parts, including:

Graph Data (*.pt): Stores the graph structure, including adjacency information and node labels. It can be loaded using DGL.
Node Textual Metadata (*.csv): Contains node textual descriptions, neighborhood relationships, and category labels.
Text, Image, and Multimodal Features (TextFeature/, ImageFeature/, MMFeature/): Pre-extracted embeddings from the MAGB paper for different modalities.
Raw Images (*.tar.gz): A compressed folder containing images named by node IDs. It needs to be extracted before use.
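
As a rough sketch of how these parts sit on disk, the helper below groups the entries of one dataset folder by role. The function name `summarize_dataset_dir` and the dictionary keys are illustrative and not part of MAGB; only the file patterns (`*.pt`, `*.csv`, `*.tar.gz`, `.npy` feature files, and the three feature folders) come from this README.

```python
from pathlib import Path

# Feature sub-folders named in the dataset description above.
FEATURE_DIRS = ("TextFeature", "ImageFeature", "MMFeature")


def summarize_dataset_dir(dataset_dir):
    """Group the entries of one dataset folder by the role described above."""
    root = Path(dataset_dir)
    summary = {"graph": [], "metadata": [], "images": [], "features": {}}
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and entry.name in FEATURE_DIRS:
            # Pre-extracted embeddings are stored as .npy files.
            summary["features"][entry.name] = sorted(
                p.name for p in entry.glob("*.npy")
            )
        elif entry.name.endswith(".tar.gz"):
            summary["images"].append(entry.name)
        elif entry.suffix == ".pt":
            summary["graph"].append(entry.name)
        elif entry.suffix == ".csv":
            summary["metadata"].append(entry.name)
    return summary
```

Calling `summarize_dataset_dir("Data/Movies")` after the download is a quick way to confirm that the graph file, the metadata, and the feature folders are all present before starting an experiment.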

Because the Reddit-M image archive is too large, it is split into several parts. Use the following commands to merge the parts and extract the images:

    cd MAGB/Data/
    cat RedditMImages_parta RedditMImages_partb RedditMImages_partc > RedditMImages.tar.gz
    tar -xvzf RedditMImages.tar.gz
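
On systems without `cat` and `tar`, the same merge-then-extract step can be approximated in Python. This is a minimal sketch using only the standard library; the function name `join_and_extract` and the output directory are assumptions, and it should only be run on archives you trust.

```python
import shutil
import tarfile
from pathlib import Path


def join_and_extract(parts, archive_path, out_dir):
    """Concatenate split archive parts in order, then extract the .tar.gz."""
    archive_path = Path(archive_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with archive_path.open("wb") as joined:
        for part in parts:
            with Path(part).open("rb") as chunk:
                # Stream each part so the whole archive never sits in memory.
                shutil.copyfileobj(chunk, joined)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(out_dir)
    # Report the extracted file names for a quick sanity check.
    return sorted(p.name for p in out_dir.rglob("*") if p.is_file())
```

For the Reddit-M images, `parts` would be the three files `RedditMImages_parta`, `RedditMImages_partb`, and `RedditMImages_partc`, passed in that order.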

2. Experiments

In this section, we demonstrate the execution code for both GNN-as-Predictor and VLM-as-Predictor.

GNN-as-Predictor

🧩 Node Classification

In the GNN/Library directory, we provide the code for the models evaluated in the paper, including GCN, GraphSAGE, GAT, RevGAT, and MLP. Additionally, we have added graph learning models such as APPNP, SGC, Node2Vec, and DeepWalk for your use. Below, we show the code for node classification using GCN on the Movies dataset in two scenarios: 3-shot learning and supervised learning.

    # 3-shot learning
    python GNN/Library/GCN.py --graph_path Data/Movies/MoviesGraph.pt --feature Data/Movies/TextFeature/Movies_roberta_base_512_mean.npy --fewshots 3

    # Supervised learning
    python GNN/Library/GCN.py --graph_path Data/Movies/MoviesGraph.pt --feature Data/Movies/TextFeature/Movies_roberta_base_512_mean.npy --train_ratio 0.6 --val_ratio 0.2

Note: The file Movies_roberta_base_512_mean.npy contains the textual features of the Movies dataset extracted using the RoBERTa-Base model. 512 indicates the maximum text length used, and mean indicates that mean pooling was applied to extract the features. You can use the features we provide or extract your own.
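
The naming pattern described in this note can be read programmatically. The sketch below assumes feature files follow `<dataset>_<model>_<max_length>_<pooling>.npy` and that the dataset name contains no underscore; both are inferred from the two file names shown in this README rather than documented guarantees, and `parse_feature_name` is a hypothetical helper.

```python
from pathlib import Path


def parse_feature_name(path):
    """Split '<dataset>_<model>_<max_length>_<pooling>.npy' into its fields."""
    name = Path(path)
    if name.suffix != ".npy":
        raise ValueError(f"expected a .npy feature file, got {name.name!r}")
    # The dataset is the first token; length and pooling are the last two.
    dataset, rest = name.stem.split("_", 1)
    model, max_length, pooling = rest.rsplit("_", 2)
    return {
        "dataset": dataset,
        "model": model,
        "max_length": int(max_length),
        "pooling": pooling,
    }


info = parse_feature_name("Data/Movies/TextFeature/Movies_roberta_base_512_mean.npy")
print(info)
```

Splitting from both ends keeps model names that themselves contain underscores or dots, such as `Llama_3.2_1B_Instruct`, in one piece.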

Similarly, you can replace GCN.py with the corresponding code for other models, such as GraphSAGE.py, GAT.py, etc. All node classification training scripts require the graph data path and the corresponding feature file. Other basic parameters can be found in the GNN/Utils/model_config.py file.

Below are the key parameters related to model training, along with their default values and descriptions:

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| `--n-runs` | `int` | `3` | Number of runs for averaging results. |
| `--lr` | `float` | `0.005` | Learning rate for model optimization. |
| `--n-epochs` | `int` | `1000` | Total number of training epochs. |
| `--n-layers` | `int` | `3` | Number of layers in the model. |
| `--n-hidden` | `int` | `256` | Number of hidden units per layer. |
| `--dropout` | `float` | `0.5` | Dropout rate to prevent overfitting. |
| `--label-smoothing` | `float` | `0.1` | Smoothing factor for label smoothing to reduce overfitting. |
| `--train_ratio` | `float` | `0.6` | Proportion of the dataset used for training. |
| `--val_ratio` | `float` | `0.2` | Proportion of the dataset used for validation. |
| `--fewshots` | `int` | `None` | Number of samples for few-shot learning. |
| `--metric` | `str` | `'accuracy'` | Evaluation metric (e.g., accuracy, precision, recall, f1). |
| `--average` | `str` | `'macro'` | Averaging method (e.g., weighted, micro, macro). |
| `--graph_path` | `str` | `None` | Path to the graph dataset file (e.g., `.pt` file). |
| `--feature` | `str` | `None` | Specifies the unimodal feature embedding to use as input. |
| `--undirected` | `bool` | `True` | Whether to treat the graph as undirected. |
| `--selfloop` | `bool` | `True` | Whether to add self-loops to the graph. |

Note: Some models may have their own unique parameters, such as 'edge-drop' for RevGAT and GAT. For these parameters, please refer to the respective code for details.
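
To make the interplay of `--train_ratio`, `--val_ratio`, and `--fewshots` concrete, here is a minimal sketch of one way such a split could work. It assumes that the test set is whatever remains after the train and validation shares, and that few-shot mode keeps at most `fewshots` labelled training nodes per class. The repository's own splitting code may differ, and `split_nodes` is a hypothetical name.

```python
import random
from collections import defaultdict


def split_nodes(labels, train_ratio=0.6, val_ratio=0.2, fewshots=None, seed=0):
    """Return (train, val, test) node-index lists for one random split."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)
    n_train = int(len(order) * train_ratio)
    n_val = int(len(order) * val_ratio)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # remaining nodes form the test set
    if fewshots is not None:
        # Few-shot mode: keep at most `fewshots` training nodes per class.
        kept, per_class = [], defaultdict(int)
        for idx in train:
            if per_class[labels[idx]] < fewshots:
                per_class[labels[idx]] += 1
                kept.append(idx)
        train = kept
    return train, val, test
```

With the defaults from the table, 60% of the nodes go to training, 20% to validation, and the remaining 20% to testing; passing `fewshots=3` shrinks only the training set.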

🔗 Link Prediction

In the GNN/LinkPrediction directory, we provide the code for link prediction experiments using three backbone models: GCN, GraphSAGE, and MLP. Below, we demonstrate the code for running link prediction using GCN on the Movies dataset. The parameters for GraphSAGE and MLP are similar, and you can replace GCN.py with SAGE.py or MLP.py to run experiments with those models.

    python GNN/LinkPrediction/GCN.py \
        --n-hidden 256 \
        --n-layers 3 \
        --n-runs 5 \
        --lr 0.001 \
        --neg_len 5000 \
        --dropout 0.2 \
        --batch_size 2048 \
        --graph_path Data/Movies/MoviesGraph.pt \
        --feature Data/Movies/TextFeature/Movies_Llama_3.2_1B_Instruct_512_mean.npy \
        --link_path Data/LinkPrediction/Movies/

Below are the unique parameters specifically used for link prediction tasks:

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| `--neg_len` | `int` | `5000` | Number of negative samples used for training. |
| `--batch_size` | `int` | `2048` | Batch size for training. |
| `--link_path` | `str` | `None` | Path to the directory containing link prediction data (e.g., positive and negative edges). |

These parameters are critical for handling the unique requirements of link prediction tasks, such as generating and managing negative samples, processing large datasets efficiently, and specifying the location of link prediction data.
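
For readers unfamiliar with negative sampling, the sketch below shows the general idea behind a parameter like `--neg_len`: drawing node pairs that are not connected in the graph. It is an illustration under simple assumptions (an undirected graph, uniform sampling) and not the procedure MAGB uses to produce the files under `--link_path`; `sample_negative_edges` is a hypothetical helper.

```python
import random


def sample_negative_edges(num_nodes, positive_edges, neg_len=5000, seed=0):
    """Draw `neg_len` distinct node pairs that are not positive edges."""
    rng = random.Random(seed)
    # Treat edges as undirected so (u, v) and (v, u) count as the same pair.
    known = {tuple(sorted(e)) for e in positive_edges if e[0] != e[1]}
    max_pairs = num_nodes * (num_nodes - 1) // 2 - len(known)
    if neg_len > max_pairs:
        raise ValueError("not enough non-edges to sample from")
    negatives = set()
    while len(negatives) < neg_len:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = (min(u, v), max(u, v))
        # Skip self-loops and pairs that are real edges.
        if u != v and pair not in known:
            negatives.add(pair)
    return sorted(negatives)
```

Rejection sampling like this is adequate for sparse graphs, where most random pairs are non-edges; dense graphs would call for enumerating the non-edges instead.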

VLM-as-Predictor

The MLLM/Zero-shot.py script is designed for zero-shot node classification tasks using multimodal large language models (MLLMs). Below are the key command-line arguments for this script:

| Parameter | Type | Default Value | Description |
| --- | --- | --- | --- |
| `--model_name` | `str` | `'meta-llama/Llama-3.2-11B-Vision-Instruct'` | HuggingFace model name or path. |
| `--dataset_name` | `str` | `'Movies'` | Name of the dataset (corresponds to a subdirectory in the `Data` folder). |
| `--base_dir` | `str` | `Project root directory` | Path to the root directory of the project. |
| `--max_new_tokens` | `int` | `15` | Maximum number of tokens to generate. |
| `--neighbor_mode` | `str` | `'both'` | Mode for using neighbor information (`text`, `image`, or `both`). |
| `--use_center_text` | `str` | `'True'` | Whether to use the center node's text. |
| `--use_center_image` | `str` | `'True'` | Whether to use the center node's image. |
| `--add_CoT` | `str` | `'False'` | Whether to add Chain of Thought (CoT) reasoning. |
| `--num_samples` | `int` | `5` | Number of test samples to evaluate. |
| `--num_neighbours` | `int` | `0` | Number of neighbors to consider for each node. |

Below, we present the code for performing zero-shot node classification on the Movies dataset using the LLaMA-3.2-11B Vision Instruct model with different strategies. This is provided to help researchers reproduce the experimental results presented in our paper.

$\text{Center-only}$

    python MLLM/Zero-shot.py --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --num_samples 300 --max_new_tokens 30 --dataset_name Movies

$\text{GRE-T}_{k=1}$

    python MLLM/Zero-shot.py --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --num_neighbours 1 --neighbor_mode text --num_samples 300 --max_new_tokens 30 --dataset_name Movies

$\text{GRE-V}_{k=1}$

    python MLLM/Zero-shot.py --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --num_neighbours 1 --neighbor_mode image --num_samples 300 --max_new_tokens 30 --dataset_name Movies

$\text{GRE-M}_{k=1}$

    python MLLM/Zero-shot.py --model_name meta-llama/Llama-3.2-11B-Vision-Instruct --num_neighbours 1 --neighbor_mode both --num_samples 300 --max_new_tokens 30 --dataset_name Movies
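
Since the four strategies differ only in `--num_neighbours` and `--neighbor_mode`, they can be generated from one shared set of arguments. The following is a small sketch of such a driver; the `BASE_ARGS` and `STRATEGIES` dictionaries and the `build_command` helper are illustrative names, while the flags and values are taken from the commands in this section.

```python
import shlex

# Arguments shared by all four strategies.
BASE_ARGS = {
    "model_name": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "num_samples": 300,
    "max_new_tokens": 30,
    "dataset_name": "Movies",
}

# Extra flags per strategy (k = 1 neighbour for the GRE variants).
STRATEGIES = {
    "Center-only": {},
    "GRE-T": {"num_neighbours": 1, "neighbor_mode": "text"},
    "GRE-V": {"num_neighbours": 1, "neighbor_mode": "image"},
    "GRE-M": {"num_neighbours": 1, "neighbor_mode": "both"},
}


def build_command(strategy, script="MLLM/Zero-shot.py"):
    """Return the argv list for one zero-shot strategy."""
    args = {**BASE_ARGS, **STRATEGIES[strategy]}
    argv = ["python", script]
    for flag, value in args.items():
        argv += [f"--{flag}", str(value)]
    return argv


for name in STRATEGIES:
    print(f"{name}: {shlex.join(build_command(name))}")
```

Each returned list can be passed to `subprocess.run(argv, check=True)` from the repository root to launch the corresponding run.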

Please note that both the VLMs and GNNs used the same original test set for the node classification task. However, for efficiency during VLM testing, we randomly selected 300 samples from this original test set. We observed that the experimental results obtained on this subset did not deviate significantly from those obtained on the complete test set.
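
A fixed-size evaluation subset of this kind can be drawn reproducibly with a seeded generator. The sketch below is only an illustration: the seed value and the helper name `sample_test_subset` are assumptions, and the sampling logic inside `MLLM/Zero-shot.py` may differ.

```python
import random


def sample_test_subset(test_ids, num_samples=300, seed=42):
    """Pick a reproducible random subset of the test nodes."""
    test_ids = list(test_ids)
    if num_samples >= len(test_ids):
        # Nothing to subsample: evaluate on the full test set.
        return sorted(test_ids)
    rng = random.Random(seed)
    return sorted(rng.sample(test_ids, num_samples))
```

Fixing the seed means repeated runs score the same 300 nodes, which keeps results comparable across strategies.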

🔧 Customizing `load_model_and_processor` for Unsupported VLMs

The load_model_and_processor function in MLLM/Library.py is designed to load specific models and their corresponding processors from the Hugging Face library. If you want to use a model that is not currently supported, you can modify this function to include your custom model. Below is an example to guide you through the process.

Example: Adding Support for a Custom Model

Suppose you want to add support for a new model, custom-org/custom-model-7B, which uses the AutoModelForCausalLM class and AutoProcessor. Here's how you can modify the load_model_and_processor function:

Open the MLLM/Library.py file.
Locate the model_mapping dictionary inside the load_model_and_processor function.
Add a new entry for your custom model.

Here is the modified code:

    # Classes referenced in the mapping below (all provided by transformers)
    from transformers import (
        AutoModelForCausalLM,
        AutoProcessor,
        MllamaForConditionalGeneration,
    )


    def load_model_and_processor(model_name: str):
        """
        Load the model and processor based on the Hugging Face model name.
        """
        model_mapping = {
            "meta-llama/Llama-3.2-11B-Vision-Instruct": {
                "model_cls": MllamaForConditionalGeneration,
                "processor_cls": AutoProcessor,
            },
            "custom-org/custom-model-7B": {  # Add your custom model here
                "model_cls": AutoModelForCausalLM,  # Replace with the correct model class
                "processor_cls": AutoProcessor,  # Replace with the correct processor class
            },
            # Other existing models...
        }
        # Other existing codes...
        return model, processor
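
The part of the function elided above is not shown in this README. As one plausible shape for it, the sketch below resolves both classes from a mapping and calls `from_pretrained` on each, raising a clear error for unsupported names. `DummyModel` and `DummyProcessor` are hypothetical stand-ins so the example runs without downloading weights, and `load_from_mapping` is not a function in MAGB.

```python
class DummyModel:
    """Stand-in for a Hugging Face model class (hypothetical)."""

    @classmethod
    def from_pretrained(cls, name):
        return f"model<{name}>"


class DummyProcessor:
    """Stand-in for a Hugging Face processor class (hypothetical)."""

    @classmethod
    def from_pretrained(cls, name):
        return f"processor<{name}>"


MODEL_MAPPING = {
    "custom-org/custom-model-7B": {
        "model_cls": DummyModel,
        "processor_cls": DummyProcessor,
    },
}


def load_from_mapping(model_name, mapping=MODEL_MAPPING):
    """Resolve both classes from the mapping, failing clearly if unsupported."""
    if model_name not in mapping:
        supported = ", ".join(sorted(mapping))
        raise ValueError(f"Unsupported model {model_name!r}; supported: {supported}")
    entry = mapping[model_name]
    model = entry["model_cls"].from_pretrained(model_name)
    processor = entry["processor_cls"].from_pretrained(model_name)
    return model, processor
```

Keeping the model-to-class association in a single dictionary means adding a new VLM touches one entry rather than the loading logic itself.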

🤝 Contributing

We welcome contributions to MAGB. To contribute:

Fork the repository.
Create a new branch for your feature or bug fix.
Submit a pull request with a detailed description of your changes.

For major changes, please open an issue first to discuss what you would like to change.

📚 Citation

If you use MAGB in your research, please cite our paper:

    @misc{yan2025graphmeetsmultimodalbenchmarking,
      title={When Graph meets Multimodal: Benchmarking and Meditating on Multimodal Attributed Graphs Learning},
      author={Hao Yan and Chaozhuo Li and Jun Yin and Zhigang Yu and Weihao Han and Mingzheng Li and Zhengxin Zeng and Hao Sun and Senzhang Wang},
      year={2025},
      eprint={2410.09132},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2410.09132},
    }

Release: v1.0.0 (latest), May 30, 2025
