RUCAIBox/RecBole


A unified, comprehensive and efficient recommendation library

License

MIT license


RecBole (伯乐)

"Only when there is a Bole can there be thousand-li horses. Thousand-li horses are common, but a Bole is rare." (Han Yu, "On Horses")

RecBole is built on Python and PyTorch for reproducing and developing recommendation algorithms in a unified, comprehensive and efficient framework for research purposes. Our library includes 94 recommendation algorithms, covering four major categories:

General Recommendation
Sequential Recommendation
Context-aware Recommendation
Knowledge-based Recommendation

We design a unified and flexible data file format, and provide support for 44 benchmark recommendation datasets. A user can apply the provided script to process a copy of the original data, or simply download the datasets already processed by our team.

Figure: RecBole Overall Architecture

In order to support the study of recent advances in recommender systems, we construct an extended recommendation library, RecBole2.0, consisting of 8 packages for up-to-date topics and architectures (e.g., debiased, fairness and GNNs).

Feature

General and extensible data structure. We design general and extensible data structures to unify the formatting and usage of various recommendation datasets.
Comprehensive benchmark models and datasets. We implement 94 commonly used recommendation algorithms, and provide the formatted copies of 44 recommendation datasets.
Efficient GPU-accelerated execution. We optimize the efficiency of our library with a number of improved techniques oriented to the GPU environment.
Extensive and standard evaluation protocols. We support a series of widely adopted evaluation protocols or settings for testing and comparing recommendation algorithms.

RecBole News

02/23/2025: We release RecBole v1.2.1.

11/01/2023: We release RecBole v1.2.0.

11/06/2022: We release the optimal hyperparameters of the models and their tuning ranges.

10/05/2022: We release RecBole v1.1.1.

06/28/2022: We release RecBole2.0 with 8 packages consisting of 65 newly implemented models.

02/25/2022: We release RecBole v1.0.1.

09/17/2021: We release RecBole v1.0.0.

03/22/2021: We release RecBole v0.2.1.

01/15/2021: We release RecBole v0.2.0.

12/10/2020: We released a series of Chinese-language introductory blog posts for RecBole beginners (continuously updated).

12/06/2020: We release RecBole v0.1.2.

11/29/2020: We constructed preliminary experiments to test the time and memory cost on three different-sized datasets and provided the test result for reference.

11/03/2020: We release the first version of RecBole, v0.1.1.

Latest Update for SIGIR 2023 Submission

To better meet user requirements and contribute to the research community, we present a significant update of RecBole in the latest version, making it more user-friendly and easier to use as a comprehensive benchmark library for recommendation. We summarize these updates in "Towards a More User-Friendly and Easy-to-Use Benchmark Library for Recommender Systems" and submit the paper to SIGIR 2023. The main contributions of this update are introduced below.

Our extensions are made in three major aspects, namely the models/datasets, the framework, and the configurations. Furthermore, we provide more comprehensive documentation and well-organized FAQ for the usage of our library, which largely improves the user experience. More specifically, the highlights of this update are summarized as:

We introduce more operations and settings to help benchmark the recommendation domain.
We improve the user friendliness of our library by providing more detailed documentation and well-organized frequently asked questions.
We point out several development guidelines for the open-source library developers.

These extensions make it much easier to reproduce the benchmark results and stay up-to-date with recent advances in recommender systems. The detailed comparison between this update and previous versions is listed below.

Aspect	RecBole 1.0	RecBole 2.0	This update
Recommendation tasks	4 categories	3 topics and 5 packages	4 categories
Models and datasets	73 models and 28 datasets	65 models and 8 new datasets	94 models and 43 datasets
Data structure	Implemented Dataset and Dataloader	Task-oriented	Compatible data module inherited from PyTorch
Continuous features	Field embedding	Field embedding	Field embedding and discretization
GPU-accelerated execution	Single-GPU utilization	Single-GPU utilization	Multi-GPU and mixed precision training
Hyper-parameter tuning	Serial gradient search	Serial gradient search	Three search methods in both serial and parallel
Significance test	-	-	Available interface
Benchmark results	-	Partially public (GNN and CDR)	Benchmark configurations on 94 models
Friendly usage	Documentation	Documentation	Improved documentation and FAQ page

Installation

RecBole works with the following operating systems:

Linux
Windows 10
macOS X

RecBole requires Python version 3.7 or later.

RecBole requires torch version 1.7.0 or later. If you want to use RecBole with GPU, please ensure that the CUDA or cudatoolkit version is 9.2 or later. This requires NVIDIA driver version >= 396.26 (for Linux) or >= 397.44 (for Windows 10).
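The version requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `meets_minimum` are hypothetical and not part of RecBole, and the version string passed in is a made-up example rather than one read from a real installation.

```python
import re
import sys


def parse_version(text):
    """Turn a string such as '1.13.1+cu117' into (1, 13, 1).

    Any local or build suffix after the numeric part is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version_text, minimum):
    """Return True if version_text is at least the given (major, minor, patch)."""
    return parse_version(version_text) >= tuple(minimum)


# Python 3.7 or later is required.
python_ok = sys.version_info[:2] >= (3, 7)
print("Python version OK:", python_ok)

# torch 1.7.0 or later is required. The string below is an illustrative value;
# with PyTorch installed, torch.__version__ could be passed instead.
print("torch version OK:", meets_minimum("1.13.1+cu117", (1, 7, 0)))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexicographically, where "1.10.0" would sort before "1.7.0".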

Install from conda

conda install -c aibox recbole

Install from pip

pip install recbole

Install from source

git clone https://github.com/RUCAIBox/RecBole.git && cd RecBole
pip install -e . --verbose

Quick-Start

With the source code, you can use the provided script for initial usage of our library:

python run_recbole.py

This script will run the BPR model on the ml-100k dataset.

Typically, this example takes less than one minute. We will obtain some output like:

INFO ml-100k
The number of users: 944
Average actions of users: 106.04453870625663
The number of items: 1683
Average actions of items: 59.45303210463734
The number of inters: 100000
The sparsity of the dataset: 93.70575143257098%

INFO Evaluation Settings:
Group by user_id
Ordering: {'strategy': 'shuffle'}
Splitting: {'strategy': 'by_ratio', 'ratios': [0.8, 0.1, 0.1]}
Negative Sampling: {'strategy': 'full', 'distribution': 'uniform'}

INFO BPRMF(
    (user_embedding): Embedding(944, 64)
    (item_embedding): Embedding(1683, 64)
    (loss): BPRLoss()
)
Trainable parameters: 168128

INFO epoch 0 training [time: 0.27s, train loss: 27.7231]
INFO epoch 0 evaluating [time: 0.12s, valid_score: 0.021900]
INFO valid result:
recall@10: 0.0073  mrr@10: 0.0219  ndcg@10: 0.0093  hit@10: 0.0795  precision@10: 0.0088

...

INFO epoch 63 training [time: 0.19s, train loss: 4.7660]
INFO epoch 63 evaluating [time: 0.08s, valid_score: 0.394500]
INFO valid result:
recall@10: 0.2156  mrr@10: 0.3945  ndcg@10: 0.2332  hit@10: 0.7593  precision@10: 0.1591

INFO Finished training, best eval result in epoch 52
INFO Loading model structure and parameters from saved/***.pth
INFO best valid result:
recall@10: 0.2169  mrr@10: 0.4005  ndcg@10: 0.235  hit@10: 0.7582  precision@10: 0.1598
INFO test result:
recall@10: 0.2368  mrr@10: 0.4519  ndcg@10: 0.2768  hit@10: 0.7614  precision@10: 0.1901
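The dataset statistics and the parameter count in this log can be reproduced with a few lines of arithmetic. The sketch below is not RecBole code. It assumes that the reported user and item counts each include one reserved entry (for example, a padding id), which is an inference from the numbers rather than something stated above: dividing by 943 and 1682 reproduces the logged averages.

```python
# Values copied from the log output above.
n_users_reported = 944
n_items_reported = 1683
n_inters = 100000
embedding_size = 64

# Sparsity: fraction of the user-item matrix with no interaction.
sparsity = 1 - n_inters / (n_users_reported * n_items_reported)

# Assumption: one id per field is reserved, so the averages use count - 1.
avg_actions_of_users = n_inters / (n_users_reported - 1)
avg_actions_of_items = n_inters / (n_items_reported - 1)

# BPR here has two embedding tables and no other weights,
# so the trainable parameters are (users + items) * embedding_size.
trainable_parameters = (n_users_reported + n_items_reported) * embedding_size

print(f"sparsity: {sparsity * 100:.6f}%")
print(f"average actions of users: {avg_actions_of_users:.6f}")
print(f"average actions of items: {avg_actions_of_items:.6f}")
print(f"trainable parameters: {trainable_parameters}")
```

The same arithmetic shows how model size scales: doubling `embedding_size` to 128 would double the parameter count of this model.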

If you want to change the parameters, such as learning_rate and embedding_size, just set the additional command parameters as you need:

python run_recbole.py --learning_rate=0.0001 --embedding_size=128

If you want to change the models, just run the script by setting additional command parameters:

python run_recbole.py --model=[model_name]
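When several runs need to be launched with different settings, the command lines above can be assembled programmatically. This is a minimal sketch, not part of RecBole: `build_command` is a hypothetical helper, and it only uses the `--model`, `--learning_rate` and `--embedding_size` parameters shown above. It assumes the script is launched from the repository root.

```python
import shlex
import sys


def build_command(script="run_recbole.py", **overrides):
    """Build an argument list such as [python, run_recbole.py, --model=BPR].

    Each keyword argument becomes one --name=value command parameter.
    """
    args = [sys.executable, script]
    for name, value in overrides.items():
        args.append(f"--{name}={value}")
    return args


cmd = build_command(model="BPR", learning_rate=0.0001, embedding_size=128)

# Print the command in a shell-safe form for inspection.
print(shlex.join(cmd))

# To actually launch the run, one could use:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues when values contain spaces or brackets.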

Auto-tuning Hyperparameter

Open RecBole/hyper.test and list the hyperparameters to be searched automatically. Each hyperparameter uses one of the following two search types:

loguniform: indicates that the logarithm of the parameter follows a uniform distribution; with the bounds -8 and 0, values are sampled randomly between e^{-8} and e^{0}.
choice: indicates that the parameter takes discrete values from the given list.

Here is an example for hyper.test:

learning_rate loguniform -8, 0
embedding_size choice [64, 96 , 128]
train_batch_size choice [512, 1024, 2048]
mlp_hidden_size choice ['[64, 64, 64]','[128, 128]']
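To make the meaning of the two search types concrete, the sketch below parses the example file above and draws one random configuration from it using only the Python standard library. It illustrates the semantics only and is not how `run_hyper.py` is implemented; `parse_params` and `sample` are hypothetical helpers, and the parsing rule (name, type, then a Python-literal range) is an assumption based on the example.

```python
import ast
import math
import random

HYPER_TEST = """\
learning_rate loguniform -8, 0
embedding_size choice [64, 96 , 128]
train_batch_size choice [512, 1024, 2048]
mlp_hidden_size choice ['[64, 64, 64]','[128, 128]']
"""


def parse_params(text):
    """Map each parameter name to (search_type, range) from hyper.test-style text."""
    space = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split into name, search type, and the remaining range specification.
        name, kind, spec = line.split(maxsplit=2)
        space[name] = (kind, ast.literal_eval(spec.strip()))
    return space


def sample(space, rng):
    """Draw one configuration from the parsed search space."""
    config = {}
    for name, (kind, spec) in space.items():
        if kind == "loguniform":
            low, high = spec
            # Uniform in log space, then exponentiate: a value in [e^low, e^high].
            config[name] = math.exp(rng.uniform(low, high))
        elif kind == "choice":
            config[name] = rng.choice(spec)
        else:
            raise ValueError(f"unsupported search type: {kind}")
    return config


space = parse_params(HYPER_TEST)
print(sample(space, random.Random(0)))
```

Sampling in log space spreads trials evenly across orders of magnitude, which suits parameters such as the learning rate, where 0.0001 and 0.001 differ as much in effect as 0.01 and 0.1.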

Set training command parameters as you need to run:

python run_hyper.py --model=[model_name] --dataset=[data_name] --config_files=xxxx.yaml --params_file=hyper.test

e.g.

python run_hyper.py --model=BPR --dataset=ml-100k --config_files=test.yaml --params_file=hyper.test

Note that --config_files=test.yaml is optional. If you don't have any customized config settings, this parameter can be left empty.

This process may take a long time before it outputs the best hyperparameters and result:

running parameters:
{'embedding_size': 64, 'learning_rate': 0.005947474154838498, 'mlp_hidden_size': '[64,64,64]', 'train_batch_size': 512}
  0%|                                        | 0/18 [00:00<?, ?trial/s, best loss=?]

More information about parameter tuning can be found in our docs.

Time and Memory Costs

We constructed preliminary experiments to test the time and memory cost on three different-sized datasets (small, medium and large). For detailed information, you can click the following links.

NOTE: Our test results only give the approximate time and memory cost of our implementations in the RecBole library (based on our machine server). Any feedback or suggestions about the implementations and tests are welcome. We will keep improving our implementations and update these test results.

RecBole Major Releases

Releases	Date
v1.2.1	02/23/2025
v1.2.0	11/01/2023
v1.1.1	10/05/2022
v1.0.0	09/17/2021
v0.2.0	01/15/2021
v0.1.1	11/03/2020

Open Source Contributions

As a one-stop framework from data processing, model development, algorithm training to scientific evaluation, RecBole has a total of 11 related GitHub projects including

two versions of RecBole (RecBole 1.0 and RecBole 2.0);
8 benchmarking packages (RecBole-MetaRec, RecBole-DA, RecBole-Debias, RecBole-FairRec, RecBole-CDR, RecBole-TRM, RecBole-GNN and RecBole-PJF);
dataset repository (RecSysDatasets).

In the following table, we summarize the open source contributions of GitHub projects based on RecBole.

Projects	Stars	Forks	Issues	Pull requests
RecBole
RecBole2.0
RecBole-DA
RecBole-MetaRec
RecBole-Debias
RecBole-FairRec
RecBole-CDR
RecBole-GNN
RecBole-TRM
RecBole-PJF
RecSysDatasets

Contributing

Please let us know if you encounter a bug or have any suggestions by filing an issue.

We welcome all contributions from bug fixes to new features and extensions.

We expect all contributions to be discussed in the issue tracker and to go through PRs.

We thank @tszumowski, @rowedenny, @deklanw et al. for their insightful suggestions.

We thank @rowedenny, @deklanw et al. for their nice contributions through PRs.

Cite

If you find RecBole useful for your research or development, please cite the following papers: RecBole[1.0], RecBole[2.0] and RecBole[1.2.1].

@inproceedings{recbole[1.0],
  author    = {Wayne Xin Zhao and Shanlei Mu and Yupeng Hou and Zihan Lin and Yushuo Chen and Xingyu Pan and Kaiyuan Li and Yujie Lu and Hui Wang and Changxin Tian and Yingqian Min and Zhichao Feng and Xinyan Fan and Xu Chen and Pengfei Wang and Wendi Ji and Yaliang Li and Xiaoling Wang and Ji{-}Rong Wen},
  title     = {RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms},
  booktitle = {{CIKM}},
  pages     = {4653--4664},
  publisher = {{ACM}},
  year      = {2021}
}

@inproceedings{recbole[2.0],
  author    = {Wayne Xin Zhao and Yupeng Hou and Xingyu Pan and Chen Yang and Zeyu Zhang and Zihan Lin and Jingsen Zhang and Shuqing Bian and Jiakai Tang and Wenqi Sun and Yushuo Chen and Lanling Xu and Gaowei Zhang and Zhen Tian and Changxin Tian and Shanlei Mu and Xinyan Fan and Xu Chen and Ji{-}Rong Wen},
  title     = {RecBole 2.0: Towards a More Up-to-Date Recommendation Library},
  booktitle = {{CIKM}},
  pages     = {4722--4726},
  publisher = {{ACM}},
  year      = {2022}
}

@inproceedings{recbole[1.2.1],
  author    = {Lanling Xu and Zhen Tian and Gaowei Zhang and Junjie Zhang and Lei Wang and Bowen Zheng and Yifan Li and Jiakai Tang and Zeyu Zhang and Yupeng Hou and Xingyu Pan and Wayne Xin Zhao and Xu Chen and Ji{-}Rong Wen},
  title     = {Towards a More User-Friendly and Easy-to-Use Benchmark Library for Recommender Systems},
  booktitle = {{SIGIR}},
  pages     = {2837--2847},
  publisher = {{ACM}},
  year      = {2023}
}

The Team

RecBole is developed by RUC, BUPT, ECNU, and maintained by RUC.

Here is the list of our lead developers in each development phase. They are the souls of RecBole and have made outstanding contributions.

Time	Version	Lead Developers	Paper
June 2020 ~ Nov. 2020	v0.1.1	Shanlei Mu (@ShanleiMu), Yupeng Hou (@hyp1231), Zihan Lin (@linzihan-backforward), Kaiyuan Li (@tsotfsk)	PDF
Nov. 2020 ~ Jul. 2022	v0.1.2 ~ v1.0.1	Yushuo Chen (@chenyushuo), Xingyu Pan (@2017pxy)	PDF
Jul. 2022 ~ Nov. 2023	v1.1.0 ~ v1.1.1	Lanling Xu (@Sherry-XLL), Zhen Tian (@chenyuwuxin), Gaowei Zhang (@Wicknight), Lei Wang (@Paitesanshi), Junjie Zhang (@leoleojie)	PDF
Nov. 2023 ~ Feb. 2025	v1.2.0	Bowen Zheng (@zhengbw0324), Chen Ma (@Yilu114)	PDF
Feb. 2025 ~ now	v1.2.1	Enze Liu (@BishopLiu), Kesha Ou (@TayTroye), Bingqian Li (@Fotiligner)	PDF