sebastianruder/NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

License

MIT license

Folders and files

_includes
arabic
bengali
chinese
english
french
german
hindi
img
korean
nepali
persian
portuguese
russian
spanish
structured
turkish
vietnamese
.gitignore
CITATION.cff
CNAME
Gemfile
LICENSE
README.md
_config.yml
jekyll_instructions.md

Tracking Progress in Natural Language Processing

English

Vietnamese

Hindi

Chinese

For more tasks, datasets and results in Chinese, check out the Chinese NLP website.

French

Russian

Spanish

Bengali

Persian

Turkish

Summarization

German

Arabic

Language modeling

This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.

It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.

If you want to find this document again in the future, just go to nlpprogress.com or nlpsota.com in your browser.

Contributing

Guidelines

Results: Results reported in published papers are preferred; an exception may be made for influential preprints.

Datasets: Datasets should have been used for evaluation in at least one published paper besides the one that introduced the dataset.

Code: We recommend adding a link to an implementation if available. You can add a Code column (see below) to the table if it does not exist. In the Code column, indicate an official implementation with Official. If an unofficial implementation is available, use Link (see below). If no implementation is available, you can leave the cell empty.

Adding a new result

If you would like to add a new result, you can just click on the small edit button in the top-right corner of the file for the respective task (see below).

This allows you to edit the file in Markdown. Simply add a row to the corresponding table in the same format. Make sure that the table stays sorted (with the best result on top). After you've made your change, make sure that the table still looks ok by clicking on the "Preview changes" tab at the top of the page. If everything looks good, go to the bottom of the page, where you see the below form.

Add a name for your proposed change, an optional description, indicate that you would like to "Create a new branch for this commit and start a pull request", and click on "Propose file change".
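
The "keep the table sorted" rule above can be sketched in a few lines of Python. This is only an illustration, assuming a Markdown pipe table whose second column holds a single numeric score where higher is better. The function name insert_sorted_row and the model names and scores in the example are hypothetical and are not real results.

```python
def insert_sorted_row(table_lines, new_row, score_col=1):
    """Return a copy of a Markdown table with new_row placed so that
    scores stay in descending order (best result on top).

    table_lines: list of strings; the first two are the header and the
    |---| separator row, the rest are result rows.
    """
    def score(row):
        # Split "| a | b |" into cells and read the score cell as a float.
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        return float(cells[score_col])

    header, rows = table_lines[:2], table_lines[2:]
    # Find the first existing row whose score is lower than the new one.
    position = len(rows)
    for i, row in enumerate(rows):
        if score(row) < score(new_row):
            position = i
            break
    return header + rows[:position] + [new_row] + rows[position:]


# Hypothetical table and scores, for illustration only.
table = [
    "| Model | Score | Paper / Source | Code |",
    "| ----- | ----- | -------------- | ---- |",
    "| Model A | 92.1 | Paper A | Official |",
    "| Model B | 90.4 | Paper B | |",
]
updated = insert_sorted_row(table, "| Model C | 91.0 | Paper C | Link |")
print("\n".join(updated))
```

For metrics where lower is better (for example perplexity), the comparison would need to be flipped. Score cells that contain anything other than a plain number would also need extra handling.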

Adding a new dataset or task

For adding a new dataset or task, you can also follow the steps above. Alternatively, you can fork the repository. In both cases, follow the steps below:

If your task is completely new, create a new file and link to it in the table of contents above.
If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order).
Briefly describe the dataset/task and include relevant references.
Describe the evaluation setting and evaluation metric.
Show what an annotated example of the dataset/task looks like.
Add a download link if available.
Copy the below table and fill in at least two results (including the state-of-the-art) for your dataset/task (change Score to the metric of your dataset). If your dataset/task has multiple metrics, add them to the right of Score.
Submit your change as a pull request.

Model	Score	Paper / Source	Code
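
As a rough sketch of how the template above adapts to a dataset with several metrics, the following Python helper builds the header and separator rows of a Markdown pipe table. The function name table_skeleton and the metric names F1 and Accuracy are hypothetical and chosen only for illustration. The column order follows the guideline that additional metrics go to the right of Score.

```python
def table_skeleton(metrics=("Score",), include_code=True):
    """Build the header and separator rows of a results table.

    metrics: metric column names, placed between Model and Paper / Source.
    include_code: whether to append the optional Code column.
    """
    columns = ["Model", *metrics, "Paper / Source"]
    if include_code:
        columns.append("Code")
    header = "| " + " | ".join(columns) + " |"
    separator = "| " + " | ".join("---" for _ in columns) + " |"
    return [header, separator]


# Hypothetical metric names, for illustration only.
print("\n".join(table_skeleton(metrics=("F1", "Accuracy"))))
```

Result rows can then be added below the separator, one row per model.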

Wish list

These are tasks and datasets that are still missing:

Bilingual dictionary induction
Discourse parsing
Keyphrase extraction
Knowledge base population (KBP)
More dialogue tasks
Semi-supervised learning
Frame-semantic parsing (FrameNet full-sentence analysis)

Exporting into a structured format

You can extract all the data into a structured, machine-readable JSON format with parsed tasks, descriptions and SOTA tables.

The instructions are in structured/README.md.
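
To give a feel for what a parsed SOTA table might look like as JSON, here is a minimal Python sketch. It is not the repository's export script, whose actual usage and output format are described in structured/README.md. The function name parse_markdown_table, the sample rows and the resulting record layout are all assumptions made for illustration.

```python
import json


def parse_markdown_table(lines):
    """Turn a Markdown pipe table into a list of dicts keyed by column name."""
    def cells(line):
        # "| a | b |" -> ["a", "b"]
        return [c.strip() for c in line.strip().strip("|").split("|")]

    header = cells(lines[0])
    # lines[1] is the |---| separator row and carries no data.
    return [dict(zip(header, cells(line))) for line in lines[2:] if line.strip()]


# Hypothetical table contents, for illustration only.
sample = [
    "| Model | Score | Paper / Source | Code |",
    "| ----- | ----- | -------------- | ---- |",
    "| Model A | 92.1 | Paper A | Official |",
    "| Model B | 90.4 | Paper B | |",
]
records = parse_markdown_table(sample)
print(json.dumps(records, indent=2))
```

Scores are kept as strings here because table cells may contain formatting or notes alongside the number.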

Instructions for building the site locally

Instructions for building the website locally using Jekyll can be found in jekyll_instructions.md.
