The prime repository for state-of-the-art Multilingual Question Answering research and development.
PrimeQA is a public open source repository that enables researchers and developers to train state-of-the-art models for question answering (QA). By using PrimeQA, a researcher can replicate experiments from papers published at recent NLP conferences while also being able to download pre-trained models (from an online repository) and run them on their own custom data. PrimeQA is built on top of the Transformers toolkit and uses datasets and models that are directly downloadable.
The models within PrimeQA support end-to-end question answering. PrimeQA answers questions via:
- Information Retrieval: Retrieving documents and passages using both traditional (e.g. BM25) and neural (e.g. ColBERT) models
- Multilingual Machine Reading Comprehension: Extracting and/or generating answers given the source document or passage (see the sketch below)
- Multilingual Question Generation: Supporting generation of questions for effective domain adaptation over tables and multilingual text
- Retrieval Augmented Generation: Generating answers using the GPT-3/ChatGPT pretrained models, conditioned on retrieved passages
Some examples of supported models (applicable to benchmark datasets) are:
- Traditional IR with BM25 via Pyserini
- Neural IR with ColBERT and DPR (a collaboration with Stanford NLP IR, led by Chris Potts & Matei Zaharia). Replicate the experiments that Dr. DECR (Li et al., 2022) performed to reach the top of the XOR-TyDi leaderboard.
- Machine Reading Comprehension with XLM-R: replicate the experiments that reached the top of the TyDI leaderboard, with performance similar to the IBM GAAMA system. Coming soon: code to replicate GAAMA's performance on Natural Questions.
PrimeQA is at the top of several leaderboards: XOR-TyDi, TyDiQA-main, OTT-QA and HybridQA.
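Since PrimeQA is built on top of the Transformers toolkit, the extractive reading comprehension task it targets can be illustrated with the plain Hugging Face pipeline API. The snippet below is only a sketch of the task, not PrimeQA's own entry points, and the model checkpoint named here is an arbitrary public example (a multilingual XLM-R checkpoint could be substituted).

```python
# Illustrative extractive QA with the Hugging Face pipeline API that PrimeQA builds on.
# Not a PrimeQA entry point; the checkpoint below is just a public example model.
from transformers import pipeline

reader = pipeline("question-answering", model="deepset/roberta-base-squad2")
result = reader(
    question="Where is the Eiffel Tower located?",
    context="The Eiffel Tower is a wrought-iron lattice tower located in Paris, France.",
)
print(result["answer"], result["score"])  # extracted answer span and its confidence
```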
```bash
# cd to project root
# If you want to run on GPU make sure to install torch appropriately
# E.g. for torch 1.11 + CUDA 11.3:
pip install 'torch~=1.11.0' --extra-index-url https://download.pytorch.org/whl/cu113

# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired
# Example installation commands:

# Minimal install (non-editable)
pip install .

# GPU support
pip install .[gpu]

# Full install (editable)
pip install -e .[all]
```
Please note that dependencies (specified in `setup.py`) are pinned to provide a stable experience. When installing from source these can be modified; however, this is not officially supported.
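After installation, a minimal import check confirms the package is visible in the active environment. This assumes the standard package name `primeqa` installed by the pip commands above.

```python
# Minimal post-install check: a successful import confirms PrimeQA is available
# in the active environment (package name assumed to be "primeqa").
import primeqa

print("PrimeQA imported from:", primeqa.__file__)
```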
Note: in many environments, conda-forge based faiss libraries perform substantially better than the default ones installed with pip. To install faiss libraries from conda-forge, use the following steps:
- Create and activate a conda environment
- Install the faiss libraries using the command `conda install -c conda-forge faiss=1.7.0 faiss-gpu=1.7.0`
- In `setup.py`, remove the faiss-related lines: `"faiss-cpu~=1.7.2": ["install", "gpu"],` and `"faiss-gpu~=1.7.2": ["gpu"],`
- Continue with the `pip install` commands as described above.
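As a quick sanity check that the conda-forge faiss build works in your environment, a toy index like the one below should run without errors. The dimensions and random vectors are purely illustrative and unrelated to PrimeQA's own indexes.

```python
# Sanity check for the conda-forge faiss build: exact L2 search over random vectors.
import numpy as np
import faiss

d = 128                                              # vector dimensionality
xb = np.random.random((1000, d)).astype("float32")   # database vectors
xq = np.random.random((5, d)).astype("float32")      # query vectors

index = faiss.IndexFlatL2(d)          # exact (brute-force) L2 index
index.add(xb)                         # add database vectors
distances, ids = index.search(xq, 4)  # retrieve 4 nearest neighbours per query
print(ids.shape)                      # (5, 4)
```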
Java 11 is required for BM25 retrieval. Install Java as follows:
```bash
conda install -c conda-forge openjdk=11
```
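Once Java and Pyserini are available, BM25 retrieval can be exercised with a few lines. The snippet below is a hedged sketch: the prebuilt index name is just an example from Pyserini's catalogue (not something PrimeQA ships), and older Pyserini versions expose the same functionality as `SimpleSearcher` under `pyserini.search`.

```python
# Illustrative BM25 retrieval with Pyserini; downloads a prebuilt example index on first use.
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher.from_prebuilt_index("msmarco-v1-passage")
hits = searcher.search("what is question answering", k=10)

for hit in hits[:3]:
    print(hit.docid, hit.score)  # document id and BM25 score
```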
There are several blog posts by members of the open source community on how they have been using PrimeQA for their needs. Read some of them:
To run the unit tests you first need to install PrimeQA. Make sure to install with the `[tests]` or `[all]` extras from pip.
From there you can run the tests via pytest, for example:
```bash
pytest --cov PrimeQA --cov-config .coveragerc tests/
```
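If you prefer to drive the suite from Python (for example inside another script), the same invocation can go through `pytest.main`; the arguments below simply mirror the shell command above.

```python
# Programmatic equivalent of the shell command above; arguments are passed straight to pytest.
import sys
import pytest

exit_code = pytest.main(["--cov", "PrimeQA", "--cov-config", ".coveragerc", "tests/"])
sys.exit(exit_code)
```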
For more information, see:
Section | Description
---|---
📒 Documentation | Full API documentation and tutorials
🏁 Quick tour: Entry Points for PrimeQA | Different entry points for PrimeQA: Information Retrieval, Reading Comprehension, TableQA and Question Generation
📓 Tutorials: Jupyter Notebooks | Notebooks to get started on QA tasks
📓 GPT-3/ChatGPT Reader Notebooks | Notebooks to get started with the GPT-3/ChatGPT reader components
💻 Examples: Applying PrimeQA on various QA tasks | Example scripts for fine-tuning PrimeQA models on a range of QA tasks
🤗 Model sharing and uploading | Upload and share your fine-tuned models with the community
✅ Pull Request | PrimeQA Pull Request
📄 Generate Documentation | How Documentation works
🛠 Orchestrator Service REST Microservice | Proof-of-concept code for PrimeQA Orchestrator microservice
📖 Tooling UI | Demo UI
Collaborating institutions: Stanford NLP, University of Illinois, University of Stuttgart, University of Notre Dame, Ohio State University, Carnegie Mellon University, University of Massachusetts, IBM Research