SumitM0432/XLM-RoBERTa-for-Textual-Entailment


A multilingual XLM-RoBERTa model for textual entailment on sequence pairs (premise and hypothesis) in 15 different languages, using the MNLI and XNLI corpora.

License

MIT license


Folders and files

__pycache__
LICENSE
README.md
config.py
data_augmentation.py
dataset.py
engine.py
exploratory-data-analysis-mnli-and-xnli.ipynb
model.py
predict.py
train.py


XLM-RoBERTa-for-Textual-Entailment

XLM-RoBERTa

The XLM-RoBERTa model was proposed in "Unsupervised Cross-lingual Representation Learning at Scale" by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model, released in 2019. It is a large multilingual language model trained on 2.5TB of filtered CommonCrawl data.

https://huggingface.co/transformers/model_doc/xlmroberta.html

https://huggingface.co/transformers/model_doc/roberta.html
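As a minimal sketch of how a premise and hypothesis pair could be fed to XLM-RoBERTa through the Hugging Face `transformers` library: the checkpoint name `xlm-roberta-base`, the `max_length` of 128, and the helper names `pair_layout`, `load_model` and `predict_class_id` are assumptions for illustration and are not taken from `model.py` or `config.py` in this repository. The first helper only shows the special-token layout the tokenizer builds for a sequence pair.

```python
def pair_layout(premise: str, hypothesis: str) -> str:
    """Show the special-token layout XLM-RoBERTa uses for a sequence pair."""
    return f"<s> {premise} </s></s> {hypothesis} </s>"


def load_model(model_name: str = "xlm-roberta-base", num_labels: int = 3):
    """Load a tokenizer and a 3-way sequence classifier.

    Requires the transformers and torch packages; the checkpoint name is an
    assumption, not necessarily the one used in this repository.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels
    )
    return tokenizer, model


def predict_class_id(tokenizer, model, premise: str, hypothesis: str,
                     max_length: int = 128) -> int:
    """Return the index of the highest-scoring class for one pair."""
    import torch

    # Passing two strings makes the tokenizer encode them as a sequence pair.
    encoded = tokenizer(premise, hypothesis, truncation=True,
                        max_length=max_length, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(**encoded).logits
    return int(logits.argmax(dim=-1).item())


print(pair_layout("A man is eating.", "Someone is eating."))
```

Note that a freshly loaded `xlm-roberta-base` checkpoint has an untrained classification head, so `predict_class_id` only gives meaningful results after fine-tuning on entailment data.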

Textual entailment

Textual entailment (TE) in natural language processing is a directional relation between text fragments. The relation holds whenever the truth of one text fragment follows from another. In the TE framework, the entailing and entailed texts are termed premise (p) and hypothesis (h), respectively. The relation between premise and hypothesis can be entailment, contradiction, or neutral (neither entailment nor contradiction).
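To make the three relations concrete, here is a small sketch of how a premise and hypothesis pair might be represented. The `NLIExample` class, the example sentences, and the integer encoding (0 = entailment, 1 = neutral, 2 = contradiction) are assumptions for illustration; check the label encoding of each dataset before relying on it.

```python
from dataclasses import dataclass

# Assumed integer encoding of the three relations (verify per dataset).
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}


@dataclass(frozen=True)
class NLIExample:
    """One premise/hypothesis pair with its relation label."""
    premise: str
    hypothesis: str
    label: int
    language: str = "en"

    @property
    def label_name(self) -> str:
        return LABEL_NAMES[self.label]


premise = "A man is playing a guitar on stage."
examples = [
    NLIExample(premise, "A person is performing music.", 0),
    NLIExample(premise, "The concert is sold out.", 1),
    NLIExample(premise, "The man is asleep at home.", 2),
]

for ex in examples:
    print(f"{ex.label_name:>13}: {ex.hypothesis}")
```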

Dataset

I have used three separate datasets: the Multi-Genre NLI Corpus (MNLI), the Cross-Lingual NLI Corpus (XNLI), and the Kaggle dataset (Contradictory, My Dear Watson). I have combined them into one, so the resulting dataset is multilingual and covers 15 different languages. For data augmentation, I have used back translation on the Kaggle dataset and used all the premises and hypotheses in the 15 languages of the XNLI corpus.
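The merging and back translation described above can be sketched as follows. This is an illustrative outline, not the code in `data_augmentation.py`: the row fields (`premise`, `hypothesis`, `label`, `lang`), the pivot language, and the `translate(text, src, dest)` callable are all assumptions. A real translator (for example a wrapper around googletrans) would be passed in place of the toy stand-in used here.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Translate = Callable[[str, str, str], str]  # (text, src_lang, dest_lang) -> text


def merge_datasets(*sources: Iterable[Row]) -> List[Row]:
    """Concatenate sources, dropping exact duplicate (premise, hypothesis, label) rows."""
    seen = set()
    merged: List[Row] = []
    for source in sources:
        for row in source:
            key = (row["premise"], row["hypothesis"], row["label"])
            if key not in seen:
                seen.add(key)
                merged.append(dict(row))
    return merged


def back_translate(text: str, translate: Translate, src: str, pivot: str) -> str:
    """Translate src -> pivot -> src to obtain a paraphrase of the text."""
    return translate(translate(text, src, pivot), pivot, src)


def augment(rows: Iterable[Row], translate: Translate, pivot: str = "fr") -> List[Row]:
    """Back-translate premise and hypothesis of each row; the label is kept."""
    augmented: List[Row] = []
    for row in rows:
        if row["lang"] == pivot:
            continue  # a round trip through the same language changes nothing
        new_row = dict(row)
        for field in ("premise", "hypothesis"):
            new_row[field] = back_translate(row[field], translate, row["lang"], pivot)
        augmented.append(new_row)
    return augmented


# Toy stand-in for a real translation service (hypothetical data).
_TOY = {
    ("en", "fr"): {"hello": "bonjour", "it rains": "il pleut"},
    ("fr", "en"): {"bonjour": "hi", "il pleut": "it is raining"},
}


def toy_translate(text: str, src: str, dest: str) -> str:
    return _TOY.get((src, dest), {}).get(text, text)


kaggle_rows = [{"premise": "it rains", "hypothesis": "hello", "label": 1, "lang": "en"}]
xnli_rows = [{"premise": "il pleut", "hypothesis": "bonjour", "label": 1, "lang": "fr"}]

combined = merge_datasets(kaggle_rows, xnli_rows, augment(kaggle_rows, toy_translate))
print(len(combined), combined[-1]["premise"])
```

Deduplicating on the (premise, hypothesis, label) triple keeps a back-translated row only when the round trip actually changed the text.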

MNLI - https://cims.nyu.edu/~sbowman/multinli/

XNLI - https://cims.nyu.edu/~sbowman/xnli/

Kaggle dataset - https://www.kaggle.com/c/contradictory-my-dear-watson/data

Kaggle Exploratory Notebook - https://www.kaggle.com/code/sumitm004/exploratory-data-analysis-mnli-and-xnli

Requirements

Libraries - pandas, numpy, transformers, nlp, PyTorch (torch), tqdm, googletrans and scikit-learn (sklearn).
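A quick way to see which of these libraries are missing from the current environment is sketched below. The helper name `missing_packages` and the mapping from import names to pip package names are my own additions; no versions are pinned because the repository does not state any.

```python
import importlib.util

# Import name -> pip package name (the two differ for scikit-learn).
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "transformers": "transformers",
    "nlp": "nlp",
    "torch": "torch",
    "tqdm": "tqdm",
    "googletrans": "googletrans",
    "sklearn": "scikit-learn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required modules that cannot be imported."""
    return sorted(
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    )


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```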
