SciRepEval benchmark training and evaluation scripts

This repo contains the code to train, evaluate, and reproduce the representation learning models and results on the benchmark introduced in SciRepEval.

Quick Setup

Clone the repo and set up the environment as follows:

git clone git@github.com:allenai/scirepeval.git
cd scirepeval
conda create -n scirepeval python=3.8
conda activate scirepeval
pip install -r requirements.txt
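
After installing, a quick sanity check confirms the environment resolves the core dependencies. This is a minimal sketch; it assumes torch, transformers, and datasets are among the packages pinned in requirements.txt.

# Minimal post-install sanity check; assumes torch, transformers and datasets
# are pulled in by requirements.txt.
import datasets
import torch
import transformers

print(torch.__version__, transformers.__version__, datasets.__version__)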

Usage

Please refer to the following for further usage:

Training - Train multi-task/multi-format transformer models or adapter modules.

Inference - Use the trained SciRepEval models to generate embeddings (a minimal sketch follows this list).

Evaluation - Evaluate trained models on custom tasks, or customize the existing evaluation config for the SciRepEval benchmark tasks.

Benchmarking - Evaluate models (pretrained from HuggingFace or local checkpoints) on SciRepEval and generate a report.
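
As a rough illustration of the inference step, document embeddings can be pulled from a HuggingFace checkpoint as shown below. This is not the repo's inference module: the allenai/specter checkpoint and the title-[SEP]-abstract input format are assumptions used for the sketch.

# Illustrative embedding generation with a HuggingFace checkpoint; not the
# repo's inference module. Checkpoint name and input format are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
model = AutoModel.from_pretrained("allenai/specter")

papers = [
    {"title": "SciRepEval: A Multi-Format Benchmark for Scientific Document Representations",
     "abstract": "Learned representations of scientific documents can serve as inputs to downstream tasks."},
]
# Concatenate title and abstract, separated by the tokenizer's SEP token
texts = [p["title"] + tokenizer.sep_token + (p.get("abstract") or "") for p in papers]

inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state[:, 0, :]  # CLS token embedding per document
print(embeddings.shape)  # (num_papers, hidden_size)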

Benchmark Details

SciRepEval consists of 24 scientific document tasks for training and evaluating scientific document representation models. The tasks are divided across 4 task formats: classification (CLF), regression (RGN), proximity (nearest neighbors) retrieval (PRX), and ad-hoc search (SRCH). The table below gives a brief overview of the tasks along with their HuggingFace dataset config names, where applicable. The benchmark dataset can be downloaded from AWS S3 or HuggingFace as follows:

AWS S3 via CLI

mkdir scirepeval_data && mkdir scirepeval_data/train && mkdir scirepeval_data/test && cd scirepeval_data
aws s3 --no-sign-request sync s3://ai2-s2-research-public/scirepeval/train train
aws s3 --no-sign-request sync s3://ai2-s2-research-public/scirepeval/test test

The AWS CLI commands can be run with the --dryrun flag to list the files that would be copied. The entire dataset is ~24 GB in size.

HuggingFace Datasets

The training, validation, and raw evaluation data are available at allenai/scirepeval, while the labelled test examples are available at allenai/scirepeval_test.

import datasets

# training/validation/eval metadata
dataset = datasets.load_dataset("allenai/scirepeval", "<hf config name>")

# labelled test examples
dataset = datasets.load_dataset("allenai/scirepeval_test", "<hf config name>")

Since we want to evaluate document representations, every task dataset consists of two parts: test metadata (the text used for representation generation, available under allenai/scirepeval) and labelled examples (available under allenai/scirepeval_test).
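
For example, both parts of a task can be pulled and inspected as below, using the fos config from the table that follows; split and column names vary per config, so the snippet simply prints what each dataset exposes.

# Small sketch: load both parts of one task ("fos" config) and inspect them.
import datasets

metadata = datasets.load_dataset("allenai/scirepeval", "fos")       # text for embedding generation
labels = datasets.load_dataset("allenai/scirepeval_test", "fos")    # labelled test examples

print(metadata)  # available splits and row counts
print(labels)

first_split = next(iter(metadata))
print(metadata[first_split][0])  # first record of the first split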

| Format | Name | Train | Metric | HF Config | HF Test Config |
|--------|------|-------|--------|-----------|----------------|
| CLF | MeSH Descriptors | Y | F1 Macro | mesh_descriptors | mesh_descriptors |
| CLF | Fields of study | Y | F1 Macro | fos | fos |
| CLF | Biomimicry | N | F1 Binary | biomimicry | biomimicry |
| CLF | DRSM | N | F1 Macro | drsm | drsm |
| CLF | SciDocs-MAG | N | F1 Macro | scidocs_mag_mesh | scidocs_mag |
| CLF | SciDocs-Mesh Diseases | N | F1 Macro | scidocs_mag_mesh | scidocs_mesh |
| RGN | Citation Count | Y | Kendall's Tau | cite_count | cite_count |
| RGN | Year of Publication | Y | Kendall's Tau | pub_year | pub_year |
| RGN | Peer Review Score | N | Kendall's Tau | peer_review_score_hIndex | peer_review_score |
| RGN | Max Author hIndex | N | Kendall's Tau | peer_review_score_hIndex | hIndex |
| RGN | Tweet Mentions | N | Kendall's Tau | tweet_mentions | tweet_mentions |
| PRX | Same Author Detection | Y | MAP | same_author | same_author |
| PRX | Highly Influential Citations | Y | MAP | high_influence_cite | high_influence_cite |
| PRX | Citation Prediction | Y | - | cite_prediction | - |
| PRX | S2AND* | N | B^3 F1 | - | - |
| PRX | Paper-Reviewer Matching** | N | Precision@5,10 | paper_reviewer_matching | paper_reviewer_matching, reviewers |
| PRX | RELISH | N | NDCG | relish | relish |
| PRX | SciDocs-Cite | N | MAP, NDCG | scidocs_view_cite_read | scidocs_cite |
| PRX | SciDocs-CoCite | N | MAP, NDCG | scidocs_view_cite_read | scidocs_cocite |
| PRX | SciDocs-CoView | N | MAP, NDCG | scidocs_view_cite_read | scidocs_view |
| PRX | SciDocs-CoRead | N | MAP, NDCG | scidocs_view_cite_read | scidocs_read |
| SRCH | Search | Y | NDCG | search | search |
| SRCH | NFCorpus | N | NDCG | nfcorpus | nfcorpus |
| SRCH | TREC-CoVID | N | NDCG | trec_covid | trec_covid |

*S2AND requires the evaluation dataset in a specific format, so to evaluate your model on this task please follow these instructions.

**A combination of multiple datasets - 1, 2, 3; a dataset of papers authored by the potential reviewers is also required for evaluation, hence the multiple dataset configs.
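
As a rough reminder of what the metrics in the table measure, macro F1, Kendall's Tau, and NDCG can be computed with scikit-learn and scipy as below. This is illustrative toy data only, not the repo's evaluation code.

# Illustrative computation of three of the benchmark's metrics on toy data;
# not the repo's evaluation code.
from scipy.stats import kendalltau
from sklearn.metrics import f1_score, ndcg_score

# CLF tasks: macro-averaged F1 over predicted class labels
y_true_clf = [0, 1, 2, 1, 0]
y_pred_clf = [0, 2, 2, 1, 0]
print("F1 Macro:", f1_score(y_true_clf, y_pred_clf, average="macro"))

# RGN tasks: Kendall's Tau rank correlation between predicted and true scores
true_scores = [12, 5, 40, 3, 7]
pred_scores = [10, 6, 35, 2, 9]
print("Kendall's Tau:", kendalltau(true_scores, pred_scores).correlation)

# SRCH/PRX tasks: NDCG over ranked candidates for one query
relevance = [[3, 2, 0, 1]]             # gold relevance of candidates
model_scores = [[0.9, 0.7, 0.4, 0.2]]  # similarity scores used to rank them
print("NDCG:", ndcg_score(relevance, model_scores))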

License

The aggregate benchmark is released under the ODC-BY license. By downloading this data you acknowledge that you have read and agreed to all the terms in this license. For constituent datasets, also go through the individual licensing requirements, as applicable.

Citation

Please cite the SciRepEval work as:

@article{Singh2022SciRepEvalAM,
  title={SciRepEval: A Multi-Format Benchmark for Scientific Document Representations},
  author={Amanpreet Singh and Mike D'Arcy and Arman Cohan and Doug Downey and Sergey Feldman},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.13308}
}
