RecList

Overview

RecList is an open source library providing behavioral, "black-box" testing for recommender systems. Inspired by the pioneering work of Ribeiro et al. 2020 in NLP, we introduce a general plug-and-play procedure to scale up behavioral testing, with an easy-to-extend interface for custom use cases.

While quantitative metrics over held-out data points are important, many more tests are needed for recommenders to function properly in the wild and not erode our confidence in them: for example, a model may boast an accuracy improvement over the entire dataset, but actually be significantly worse than another on rare items or new users; or again, a model that correctly recommends HDMI cables as an add-on for shoppers buying a TV may also wrongly recommend TVs to shoppers just buying a cable.

RecList's goal is to operationalize these important intuitions into a practical package for testing research and production models in a more nuanced way, without requiring unnecessary custom code and ad hoc procedures.
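To make the first scenario concrete, here is a minimal sketch in plain Python (it does not use RecList). The two "models", the segment names and all the numbers are made up for illustration; the only point is that an aggregate metric can hide a regression on a slice.

```python
from collections import defaultdict

# Toy evaluation records: (user_segment, was_the_prediction_a_hit).
# All values are invented for illustration.
MODEL_A = (
    [("returning", True)] * 80 + [("returning", False)] * 10
    + [("new", True)] * 2 + [("new", False)] * 8
)
MODEL_B = (
    [("returning", True)] * 75 + [("returning", False)] * 15
    + [("new", True)] * 6 + [("new", False)] * 4
)


def hit_rate(records):
    """Fraction of records whose prediction was a hit."""
    return sum(hit for _, hit in records) / len(records)


def hit_rate_by_slice(records):
    """Hit rate computed separately for each user segment."""
    groups = defaultdict(list)
    for segment, hit in records:
        groups[segment].append((segment, hit))
    return {segment: hit_rate(rows) for segment, rows in groups.items()}
```

On this toy data, model A wins on the whole dataset (0.82 against 0.81) but is much worse on new users (0.2 against 0.6), which is exactly the kind of difference a slice-based test is meant to surface.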

If you are not familiar with the library, we suggest first taking our small tour to get acquainted with the main abstractions through ready-made models and tests.

Colab Tutorials

Tutorial 101 - Introduction to RecList
Tutorial - RecList at EvalRS2023 (KDD)
Tutorial - FashionCLIP Evaluation with RecList

Quick Links

The original paper (arxiv), initial release post and beta announcement.
EvalRS22@CIKM and EvalRS23@KDD, for music recommendations with RecList.
A colab notebook, for a quick interactive tour.
Our website for past talks and presentations.

Status

RecList is free software released under the MIT license, and it has been adopted by popular open-source data challenges.
After a major API re-factoring, RecList is now in beta.

Summary

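This doc is structured as follows:

Quick Start
A Guided Tour
Capabilities
Using Third-Party Tracking Tools
Acknowledgments
License and Citation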

Quick Start

You can take a quick tour online using our colab notebook. If you want to use RecList locally, clone the repository, create and activate a virtual env, and install the required packages from pip (you can also install from the repository root).

    git clone https://github.com/jacopotagliabue/reclist
    cd reclist
    python3 -m venv venv
    source venv/bin/activate
    pip install reclist
    cd examples
    python dummy.py

The sample script will run a suite of tests on a dummy dataset and model, showcasing a typical workflow with the library. Note the commented arguments in the script, which you can use to customize the behavior of the library once you familiarize yourself with the basic patterns (e.g. using S3 to store the plots, leveraging a third-party tool to track experiments).

Once your development setup is working as expected, you can run

python evalrs_2023.py

to explore tests on a real-world dataset (make sure the files are available in the examples folder before you run the script). Finally, once you have run the sample scripts successfully, take the guided tour below to learn more about the abstractions and the full capabilities of RecList.

A Guided Tour

An instance of RecList represents a suite of tests for recommender systems.

As evalrs_2023.py shows, we leave users a wide range of options: we provide standard metrics out of the box in case your dataset is DataFrame-shaped (or you can, and wish to, turn it into such a shape), but we don't force any pattern on you if you just want to use RecList for the scaffolding it provides.

For example, the following code only assumes you have a dataset with golden labels, predictions, and metadata (e.g. item features) in the shape of a DataFrame:

    cdf = DFSessionRecList(
        dataset=df_events,
        model_name="myDataFrameRandomModel",
        predictions=df_predictions,
        y_test=df_dataset,
        logger=LOGGER.LOCAL,
        metadata_store=METADATA_STORE.LOCAL,
        similarity_model=my_sim_model,
    )

    cdf(verbose=True)

Our library pre-packages standard recSys metrics and important behavioral tests, but it is built with extensibility in mind: you can re-use tests in new suites, or you can write new domain-specific suites and tests. Any suite must inherit from the main interface, and then declare its tests as functions decorated with @rec_test.

In the example, an instance is created with one slice-based test: the decorator and return type are used to automatically generate a chart.

    class MyRecList(RecList):

        @rec_test(test_type="AccuracyByCountry", display_type=CHART_TYPE.BARS)
        def accuracy_by_country(self):
            """
            Compute the accuracy by country

            NOTE: the accuracy here is just a random number.
            """
            from random import randint
            return {"US": randint(0, 100), "CA": randint(0, 100), "FR": randint(0, 100)}

Inheritance is powerful, as we can build new suites by re-using existing ones. Here, we inherit the tests from an existing "parent" list and just add one more to create a new suite:

    class ChildRecList(MyParentRecList):

        @rec_test(test_type='custom_test', display_type=CHART_TYPE.SCALAR)
        def my_test(self):
            """
            Custom test, returning my lucky number as an example
            """
            from random import randint
            return {"luck_number": randint(0, 100)}
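For readers curious about why inheritance composes so naturally with decorated tests, the following is a self-contained sketch of one way a decorator-based registry could work. It is not RecList's implementation: toy_rec_test, ToySuite and the two suites are hypothetical names invented here, and the real library may collect and run tests differently.

```python
import inspect


def toy_rec_test(test_type):
    """Hypothetical stand-in for a test decorator: tag the function with metadata."""
    def decorator(func):
        func.is_test = True
        func.test_type = test_type
        return func
    return decorator


class ToySuite:
    """Hypothetical base class: runs every tagged method found on the instance."""

    def run(self):
        results = {}
        # getmembers also returns methods defined on parent classes,
        # so a child suite picks up inherited tests for free.
        for _, method in inspect.getmembers(self, predicate=inspect.ismethod):
            if getattr(method, "is_test", False):
                results[method.test_type] = method()
        return results


class ParentSuite(ToySuite):
    @toy_rec_test(test_type="coverage")
    def coverage(self):
        return 0.5


class ChildSuite(ParentSuite):
    @toy_rec_test(test_type="custom_test")
    def my_test(self):
        return {"lucky_number": 7}
```

Calling ChildSuite().run() executes both the inherited coverage test and the new my_test, while ParentSuite().run() executes only the former.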

Any model can be tested, as no assumption is made on the model's structure, but only on the availability of predictions and ground truth. Once again, while our example leverages a DataFrame-shaped dataset for these entities, you are free to build your own RecList instance with any shape you prefer, provided you implement the metrics accordingly (see dummy.py for an example with different input types).

Once you run a suite of tests, results are dumped automatically and versioned in a folder (local or on S3), structured as follows (name of the suite, name of the model, run timestamp):

    .reclist/
      myList/
        myModel/
          1637357392/
          1637357404/
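The layout above (suite, model, run timestamp) is easy to navigate programmatically. As an illustration only, the sketch below creates and queries a folder tree of the same shape with pathlib; make_run_folder and latest_run are hypothetical helpers written for this example and are not part of RecList, which creates these folders for you.

```python
import time
from pathlib import Path


def make_run_folder(root, suite_name, model_name, timestamp=None):
    """Create root/suite/model/<unix timestamp>/ and return its path."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    run_dir = Path(root) / suite_name / model_name / str(ts)
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def latest_run(root, suite_name, model_name):
    """Return the most recent run folder for a suite/model pair, or None."""
    model_dir = Path(root) / suite_name / model_name
    if not model_dir.is_dir():
        return None
    runs = [p for p in model_dir.iterdir() if p.is_dir() and p.name.isdigit()]
    # Folder names are Unix timestamps, so the largest number is the newest run.
    return max(runs, key=lambda p: int(p.name)) if runs else None
```

Because each run gets its own timestamped folder, earlier results are never overwritten and two runs of the same suite on the same model can be compared side by side.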

If you use RecList as part of your standard testing, either for research or production purposes, you can use the JSON report for machine-to-machine communication with downstream systems (e.g. you may want to automatically fail the pipeline if tests are not passed).
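As a rough sketch of such a pipeline gate, the snippet below parses a report and turns it into a process exit code. The report layout shown here (a "data" list of "test_name" and "test_result" pairs) is an assumption made for this example, not RecList's documented schema, so inspect a real report and adapt the field names before using anything like this.

```python
import json

# Hypothetical report layout, NOT RecList's actual schema.
SAMPLE_REPORT = """
{
  "model_name": "myModel",
  "data": [
    {"test_name": "HIT_RATE", "test_result": 0.31},
    {"test_name": "MRR", "test_result": 0.12}
  ]
}
"""


def failing_tests(report, minimums):
    """Names of tests whose score is below its minimum (missing tests also fail)."""
    scores = {row["test_name"]: row["test_result"] for row in report["data"]}
    return sorted(
        name
        for name, floor in minimums.items()
        if scores.get(name, float("-inf")) < floor
    )


def gate(report_text, minimums):
    """Return an exit code: 0 if every threshold is met, 1 otherwise."""
    return 1 if failing_tests(json.loads(report_text), minimums) else 0
```

In a CI job you could end the script with sys.exit(gate(report_text, minimums)), so that a score below its threshold stops the pipeline.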

Capabilities

RecList provides a dataset- and model-agnostic framework to scale up behavioral tests. We provide some suggested abstractions based on DataFrames to make existing tests and metrics fully re-usable, but we don't force any pattern on the user. As out-of-the-box functionality, the package provides:

tests and metrics to be used on your own datasets and models;
automated storage of results, with versioning, either in a local folder or on S3;
a flexible Python interface to declare tests-as-functions, and annotate them with display_type for automated charts;
pre-built connectors with popular experiment trackers (e.g. Neptune, Comet), and an extensible interface to add your own (see below);
reference implementations based on popular data challenges that used RecList: for an example of the "less wrong" latent space metric you can check the song2vec implementation here.

Using Third-Party Tracking Tools

RecList supports streaming the results of your tests directly to your cloud platform of choice, both as metrics and charts.

If you have the Python client installed, you can use the Neptune logger by simply specifying it at init time, and either passing NEPTUNE_KEY and NEPTUNE_PROJECT_NAME as kwargs, or setting them as environment variables.

    cdf = DFSessionRecList(
        dataset=df_events,
        model_name="myDataFrameRandomModel",
        predictions=df_predictions,
        y_test=df_dataset,
        logger=LOGGER.NEPTUNE,
        metadata_store=METADATA_STORE.LOCAL,
        similarity_model=my_sim_model
    )

    cdf(verbose=True)

If you have the Python client installed, you can use the Comet logger by simply specifying it at init time, and either passing COMET_KEY, COMET_PROJECT_NAME, COMET_WORKSPACE as kwargs, or setting them as environment variables.

    cdf = DFSessionRecList(
        dataset=df_events,
        model_name="myDataFrameRandomModel",
        predictions=df_predictions,
        y_test=df_dataset,
        logger=LOGGER.COMET,
        metadata_store=METADATA_STORE.LOCAL,
        similarity_model=my_sim_model
    )

    cdf(verbose=True)
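The "kwargs or environment variables" convention used for the Neptune and Comet credentials can be pictured with a small helper like the one below. This is only a sketch: resolve_setting is a hypothetical function, and the precedence it applies (keyword argument first, then environment) is an assumption for illustration, not a statement about how RecList resolves these values.

```python
import os


def resolve_setting(name, kwargs, environ=None):
    """Return kwargs[name] if provided, else the environment variable called name."""
    environ = os.environ if environ is None else environ
    # Assumed precedence: an explicit keyword argument wins over the environment.
    if kwargs.get(name) is not None:
        return kwargs[name]
    if name in environ:
        return environ[name]
    raise ValueError(
        f"{name} must be passed as a keyword argument or set in the environment"
    )
```

For example, resolve_setting("NEPTUNE_KEY", {}) reads the NEPTUNE_KEY environment variable and raises a ValueError if it is not set, which gives an early and readable failure before any test is run.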

If you wish to add a new platform, you can do so by simply implementing a new class inheriting from RecLogger.
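The general shape of such an extension is sketched below. Since the abstract methods of RecLogger are not described in this document, the example defines its own stand-in base class: BaseLoggerSketch, its write method and JsonLinesLogger are all hypothetical, so check the RecLogger source for the methods you actually need to implement.

```python
import json
from abc import ABC, abstractmethod


class BaseLoggerSketch(ABC):
    """Stand-in for the library's logger base class (hypothetical interface)."""

    @abstractmethod
    def write(self, label, value):
        """Record one named result."""


class JsonLinesLogger(BaseLoggerSketch):
    """Example platform: append each result as one JSON object per line."""

    def __init__(self, path):
        self.path = path

    def write(self, label, value):
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps({"label": label, "value": value}) + "\n")
```

A logger for a real platform would follow the same pattern, replacing the file write with a call to that platform's client.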

Acknowledgments

The original authors are:

Patrick John Chia - LinkedIn, GitHub
Jacopo Tagliabue - LinkedIn, GitHub
Federico Bianchi - LinkedIn, GitHub
Chloe He - LinkedIn, GitHub
Brian Ko - LinkedIn, GitHub

RecList is a community project made possible by the generous support of awesome folks. Between June and December 2022, the development of our beta was supported by Comet, Neptune and Gantry.

Our beta has been developed with the help of:

Unnati Patel - LinkedIn
Ciro Greco - LinkedIn

If you have questions or feedback, please reach out to: jacopo dot tagliabue at nyu dot edu.

License and Citation

All the code is released under an open MIT license. If you found RecList useful, please cite our WWW paper:

    @inproceedings{10.1145/3487553.3524215,
        author = {Chia, Patrick John and Tagliabue, Jacopo and Bianchi, Federico and He, Chloe and Ko, Brian},
        title = {Beyond NDCG: Behavioral Testing of Recommender Systems with RecList},
        year = {2022},
        isbn = {9781450391306},
        publisher = {Association for Computing Machinery},
        address = {New York, NY, USA},
        url = {https://doi.org/10.1145/3487553.3524215},
        doi = {10.1145/3487553.3524215},
        pages = {99–104},
        numpages = {6},
        keywords = {recommender systems, open source, behavioral testing},
        location = {Virtual Event, Lyon, France},
        series = {WWW'22 Companion}
    }

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

About

Behavioral "black-box" testing for recommender systems

reclist.io
