MPNet: Masked and Permuted Pre-training for Language Understanding
Abstract
MPNet is a pre-training method that combines the advantages of BERT and XLNet by using permuted language modeling together with auxiliary position information as input, and it outperforms previous state-of-the-art pre-trained models such as BERT, XLNet, and RoBERTa on downstream tasks including GLUE and SQuAD.
BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. However, XLNet does not leverage the full position information of a sentence and thus suffers from position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input so that the model sees a full sentence, thus reducing the position discrepancy (vs. PLM in XLNet). We pre-train MPNet on a large-scale dataset (over 160GB of text corpora) and fine-tune it on a variety of downstream tasks (GLUE, SQuAD, etc.). Experimental results show that MPNet outperforms MLM and PLM by a large margin, and achieves better results on these tasks compared with previous state-of-the-art pre-training methods (e.g., BERT, XLNet, RoBERTa) under the same model setting. The code and the pre-trained models are available at: https://github.com/microsoft/MPNet.
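To make the combination described above concrete, the sketch below illustrates how a permuted pre-training input with full position information might be constructed. This is a minimal illustration, not the authors' implementation: the function name `build_mpnet_inputs`, the `predict_ratio` parameter, and the `[MASK]` string placeholder are assumptions for exposition only.

```python
import random

MASK = "[MASK]"

def build_mpnet_inputs(tokens, predict_ratio=0.15, seed=None):
    """Illustrative sketch: permute the positions, treat the last `c` tokens
    of the permutation as prediction targets (as in PLM), but keep [MASK]
    placeholders carrying the original position ids for those targets, so
    the model still sees position information for the full sentence."""
    rng = random.Random(seed)
    n = len(tokens)
    c = max(1, int(n * predict_ratio))           # number of tokens to predict
    perm = list(range(n))
    rng.shuffle(perm)                            # random permutation z of positions
    non_pred, pred = perm[:n - c], perm[n - c:]  # kept part / predicted part

    # Input: kept tokens in permuted order, then [MASK] placeholders for the
    # predicted positions; position ids are the ORIGINAL positions.
    input_tokens = [tokens[i] for i in non_pred] + [MASK] * c
    position_ids = non_pred + pred

    # Targets: predicted tokens, generated left-to-right in permuted order,
    # each conditioned on the kept part plus the remaining mask placeholders.
    targets = [(i, tokens[i]) for i in pred]
    return input_tokens, position_ids, targets


if __name__ == "__main__":
    toks = "the task is sentence classification".split()
    inp, pos, tgt = build_mpnet_inputs(toks, predict_ratio=0.4, seed=0)
    print("input tokens :", inp)
    print("position ids :", pos)
    print("targets      :", tgt)
```

In this sketch, the kept tokens condition the predictions (capturing dependency among predicted tokens, as in PLM), while the mask placeholders with original position ids preserve the full-sentence position information that plain PLM lacks.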