allenai/scirepeval

SciRepEval benchmark training and evaluation scripts

License

Apache-2.0 license

Folders and files

evaluation
examples
training
.gitignore
BENCHMARKING.md
LICENSE.md
README.md
adapter_fusion.py
bert_pals.py
mdcr.py
requirements.txt
reviewer_matching.py
s2and_embeddings.py
scirepeval.py
scirepeval_ensemble.py
scirepeval_tasks.jsonl
update_triplets.py

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

This repo contains the code to train, evaluate and reproduce the representation learning models and results on the benchmark introduced in SciRepEval.

Quick Setup

Clone the repo and set up the environment as follows:

    git clone git@github.com:allenai/scirepeval.git
    cd scirepeval
    conda create -n scirepeval python=3.8
    conda activate scirepeval
    pip install -r requirements.txt

Usage

Please refer to the following for further usage:

Training - Train multi-task/multi-format transformer models or adapter modules

Inference - Use the trained SciRepEval models to generate embeddings

Evaluation - Evaluate trained models on custom tasks OR customize existing evaluation config for SciRepEval benchmark tasks

Benchmarking - Evaluate models (pretrained from HuggingFace or local checkpoints) on SciRepEval and generate a report

Benchmark Details

SciRepEval consists of 24 scientific document tasks for training and evaluating scientific document representation models. The tasks are divided across 4 task formats: classification (CLF), regression (RGN), proximity (nearest neighbors) retrieval (PRX) and ad-hoc search (SRCH). The table below gives a brief overview of the tasks with their HuggingFace datasets config names, if applicable. The benchmark dataset can be downloaded from AWS S3 or HuggingFace as follows:

AWS S3 via CLI

    mkdir scirepeval_data && mkdir scirepeval_data/train && mkdir scirepeval_data/test && cd scirepeval_data
    aws s3 --no-sign-request sync s3://ai2-s2-research-public/scirepeval/train train
    aws s3 --no-sign-request sync s3://ai2-s2-research-public/scirepeval/test test

The AWS CLI commands can be run with the --dryrun flag to list the files being copied. The entire dataset is ~24 GB in size.
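For scripted downloads, the same commands can be assembled in Python. The following is only a sketch: `build_sync_command` and `S3_PREFIX` are hypothetical names introduced here for illustration and are not part of this repo. It only builds the argument lists and prints them, so nothing is downloaded until a list is passed to `subprocess.run`.

```python
from typing import List

# Bucket prefix copied from the AWS CLI commands above.
S3_PREFIX = "s3://ai2-s2-research-public/scirepeval"


def build_sync_command(split: str, dest: str, dryrun: bool = False) -> List[str]:
    """Return the `aws s3 sync` argument list for one split ("train" or "test")."""
    if split not in ("train", "test"):
        raise ValueError(f"unknown split: {split!r}")
    cmd = ["aws", "s3", "--no-sign-request", "sync", f"{S3_PREFIX}/{split}", dest]
    if dryrun:
        # --dryrun lists the files that would be copied without copying them.
        cmd.append("--dryrun")
    return cmd


# Preview both commands; to execute one, pass it to subprocess.run(cmd, check=True)
# from inside the scirepeval_data directory.
for split in ("train", "test"):
    print(" ".join(build_sync_command(split, split, dryrun=True)))
```

Running with `dryrun=True` first is a cheap way to confirm the file list before committing to the ~24 GB transfer.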

HuggingFace Datasets

The training, validation and raw evaluation data are available at allenai/scirepeval, while the labelled test examples are available at allenai/scirepeval_test.

    import datasets

    # training/validation/eval metadata
    dataset = datasets.load_dataset("allenai/scirepeval", "<hf config name>")

    # labelled test examples
    dataset = datasets.load_dataset("allenai/scirepeval_test", "<hf config name>")

Since we want to evaluate document representations, every dataset consists of two parts: test metadata (text for representation generation, available under allenai/scirepeval) and labelled examples (available under allenai/scirepeval_test).
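A minimal sketch of fetching both parts for one task is shown below. `load_task_parts` is a hypothetical helper written for this example, not a function from this repo. The optional `loader` argument is also an assumption of the sketch: it defaults to `datasets.load_dataset` and exists only so the pairing logic can be exercised without network access.

```python
from typing import Any, Callable, Optional, Tuple

METADATA_REPO = "allenai/scirepeval"    # text used to generate representations
TEST_REPO = "allenai/scirepeval_test"   # labelled test examples


def load_task_parts(
    hf_config: str,
    hf_test_config: str,
    loader: Optional[Callable[[str, str], Any]] = None,
) -> Tuple[Any, Any]:
    """Return (metadata, labelled examples) for one task."""
    if loader is None:
        # Imported lazily so the helper can be defined without the package installed.
        import datasets
        loader = datasets.load_dataset
    metadata = loader(METADATA_REPO, hf_config)
    labelled = loader(TEST_REPO, hf_test_config)
    return metadata, labelled


# Example call (downloads data), using the SciDocs-MAG config names from the
# table in this README:
# metadata, labelled = load_task_parts("scidocs_mag_mesh", "scidocs_mag")
```

The two config names can differ for the same task, which is why the helper takes them separately.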

Format	Name	Train	Metric	HF Config	HF Test Config
CLF	MeSH Descriptors	Y	F1 Macro	mesh_descriptors	mesh_descriptors
CLF	Fields of study	Y	F1 Macro	fos	fos
CLF	Biomimicry	N	F1 Binary	biomimicry	biomimicry
CLF	DRSM	N	F1 Macro	drsm	drsm
CLF	SciDocs-MAG	N	F1 Macro	scidocs_mag_mesh	scidocs_mag
CLF	SciDocs-Mesh Diseases	N	F1 Macro	scidocs_mag_mesh	scidocs_mesh
RGN	Citation Count	Y	Kendall's Tau	cite_count	cite_count
RGN	Year of Publication	Y	Kendall's Tau	pub_year	pub_year
RGN	Peer Review Score	N	Kendall's Tau	peer_review_score_hIndex	peer_review_score
RGN	Max Author hIndex	N	Kendall's Tau	peer_review_score_hIndex	hIndex
RGN	Tweet Mentions	N	Kendall's Tau	tweet_mentions	tweet_mentions
PRX	Same Author Detection	Y	MAP	same_author	same_author
PRX	Highly Influential Citations	Y	MAP	high_influence_cite	high_influence_cite
PRX	Citation Prediction	Y	-	cite_prediction	-
PRX	S2AND*	N	B^3 F1	-	-
PRX	Paper-Reviewer Matching**	N	Precision@5,10	paper_reviewer_matching	paper_reviewer_matching,reviewers
PRX	RELISH	N	NDCG	relish	relish
PRX	SciDocs-Cite	N	MAP, NDCG	scidocs_view_cite_read	scidocs_cite
PRX	SciDocs-CoCite	N	MAP, NDCG	scidocs_view_cite_read	scidocs_cocite
PRX	SciDocs-CoView	N	MAP, NDCG	scidocs_view_cite_read	scidocs_view
PRX	SciDocs-CoRead	N	MAP, NDCG	scidocs_view_cite_read	scidocs_read
SRCH	Search	Y	NDCG	search	search
SRCH	NFCorpus	N	NDCG	nfcorpus	nfcorpus
SRCH	TREC-CoVID	N	NDCG	trec_covid	trec_covid

*S2AND requires the evaluation dataset in a specific format, so to evaluate your model on the task please follow these instructions.

**Combination of multiple datasets (1, 2, 3). A dataset of papers authored by potential reviewers is also required for evaluation, hence the multiple dataset configs.
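The table above can also be kept as plain data when scripting over tasks. The sketch below transcribes it into a list. The `Task` tuple and both helper functions are hypothetical names for this example; the repo's own task definitions live in scirepeval_tasks.jsonl, whose schema is not assumed here. A "-" in the table becomes `None` or an empty tuple.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Task(NamedTuple):
    fmt: str                          # CLF, RGN, PRX or SRCH
    name: str
    in_train: bool                    # "Train" column
    hf_config: Optional[str]          # "HF Config" column
    hf_test_configs: Tuple[str, ...]  # "HF Test Config" column


TASKS: List[Task] = [
    Task("CLF", "MeSH Descriptors", True, "mesh_descriptors", ("mesh_descriptors",)),
    Task("CLF", "Fields of study", True, "fos", ("fos",)),
    Task("CLF", "Biomimicry", False, "biomimicry", ("biomimicry",)),
    Task("CLF", "DRSM", False, "drsm", ("drsm",)),
    Task("CLF", "SciDocs-MAG", False, "scidocs_mag_mesh", ("scidocs_mag",)),
    Task("CLF", "SciDocs-Mesh Diseases", False, "scidocs_mag_mesh", ("scidocs_mesh",)),
    Task("RGN", "Citation Count", True, "cite_count", ("cite_count",)),
    Task("RGN", "Year of Publication", True, "pub_year", ("pub_year",)),
    Task("RGN", "Peer Review Score", False, "peer_review_score_hIndex", ("peer_review_score",)),
    Task("RGN", "Max Author hIndex", False, "peer_review_score_hIndex", ("hIndex",)),
    Task("RGN", "Tweet Mentions", False, "tweet_mentions", ("tweet_mentions",)),
    Task("PRX", "Same Author Detection", True, "same_author", ("same_author",)),
    Task("PRX", "Highly Influential Citations", True, "high_influence_cite", ("high_influence_cite",)),
    Task("PRX", "Citation Prediction", True, "cite_prediction", ()),
    Task("PRX", "S2AND", False, None, ()),
    Task("PRX", "Paper-Reviewer Matching", False, "paper_reviewer_matching",
         ("paper_reviewer_matching", "reviewers")),
    Task("PRX", "RELISH", False, "relish", ("relish",)),
    Task("PRX", "SciDocs-Cite", False, "scidocs_view_cite_read", ("scidocs_cite",)),
    Task("PRX", "SciDocs-CoCite", False, "scidocs_view_cite_read", ("scidocs_cocite",)),
    Task("PRX", "SciDocs-CoView", False, "scidocs_view_cite_read", ("scidocs_view",)),
    Task("PRX", "SciDocs-CoRead", False, "scidocs_view_cite_read", ("scidocs_read",)),
    Task("SRCH", "Search", True, "search", ("search",)),
    Task("SRCH", "NFCorpus", False, "nfcorpus", ("nfcorpus",)),
    Task("SRCH", "TREC-CoVID", False, "trec_covid", ("trec_covid",)),
]


def count_by_format() -> Counter:
    """Number of tasks per task format."""
    return Counter(task.fmt for task in TASKS)


def tasks_sharing_config() -> Dict[str, List[str]]:
    """Map each HF config used by more than one task to those task names."""
    by_config: Dict[str, List[str]] = defaultdict(list)
    for task in TASKS:
        if task.hf_config is not None:
            by_config[task.hf_config].append(task.name)
    return {cfg: names for cfg, names in by_config.items() if len(names) > 1}


print(len(TASKS), dict(count_by_format()))
print(tasks_sharing_config())
```

Grouping by `hf_config` makes it visible that several tasks share one metadata config, so that metadata only needs to be downloaded and embedded once per group.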

License

The aggregate benchmark is released under the ODC-BY license. By downloading this data you acknowledge that you have read and agreed to all the terms in this license. For constituent datasets, also go through the individual licensing requirements, as applicable.

Citation

Please cite the SciRepEval work as:

@article{Singh2022SciRepEvalAM,
  title={SciRepEval: A Multi-Format Benchmark for Scientific Document Representations},
  author={Amanpreet Singh and Mike D'Arcy and Arman Cohan and Doug Downey and Sergey Feldman},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.13308}
}
