JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation (LREC2020) & Linguistically Driven Multi-Task Pre-Training for Low-Resource Neural Machine Translation (ACM TALLIP)
This is the repository for the paper [1].
Find extensions of this work and new pre-trained models here: code, paper.
Install OpenNMT-py (1.0) and subword-nmt.
pip install OpenNMT-py==1.0.0
pip install subword-nmt
We release JASS models for two language pairs: Japanese-English (ja-en) and Japanese-Russian (ja-ru). For Japanese seq2seq pre-training we use our proposed JASS method, while MASS is used for English and Russian.
Model | Vocabulary | BPE codes
---|---|---
JASS-jaen | ja-en | ja-en.bpe.codes
JASS-jaru | ja-ru | ja-ru.bpe.codes
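As a minimal sketch of using the downloaded BPE codes with subword-nmt (the file names `train.ja`, `train.en`, and `ja-en.bpe.codes` are placeholders for your corpus and the codes downloaded above):

```bash
# Apply the released BPE codes to each side of the corpus.
# File names are hypothetical; substitute your own data.
subword-nmt apply-bpe -c ja-en.bpe.codes < train.ja > train.bpe.ja
subword-nmt apply-bpe -c ja-en.bpe.codes < train.en > train.bpe.en
```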
Run the BPE preprocessing on the dataset to be fine-tuned. After setting the downloaded vocabulary as the source and target vocabulary during the preprocessing phase with `preprocess.py` of OpenNMT, use the `train_from` argument of `train.py` in OpenNMT to fine-tune from the pre-trained model, as sketched below.
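A minimal sketch of this recipe, assuming the OpenNMT-py 1.0 `preprocess.py`/`train.py` CLIs; the vocabulary, data, and checkpoint names (`ja-en.vocab`, `data/jaen`, `jass-jaen.pt`) are placeholders for the downloaded artifacts:

```bash
# Build OpenNMT data shards, forcing the released vocabulary for both sides.
# -share_vocab is used here on the assumption that the ja-en vocabulary is joint.
python preprocess.py \
    -train_src train.bpe.ja -train_tgt train.bpe.en \
    -valid_src valid.bpe.ja -valid_tgt valid.bpe.en \
    -src_vocab ja-en.vocab -tgt_vocab ja-en.vocab -share_vocab \
    -save_data data/jaen

# Fine-tune: -train_from resumes from the pre-trained JASS checkpoint,
# reading the model architecture from the checkpoint itself.
python train.py -data data/jaen -train_from jass-jaen.pt -save_model finetuned
```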
We will update the current Japanese-English pre-trained model and release pre-trained models for Japanese-Chinese and Japanese-Korean. We have released new models here: code.
[1] Zhuoyuan Mao, Fabien Cromieres, Raj Dabre, Haiyue Song, Sadao Kurohashi. JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation. LREC 2020.
@inproceedings{mao-etal-2020-jass,
    title = "{JASS}: {J}apanese-specific Sequence to Sequence Pre-training for Neural Machine Translation",
    author = "Mao, Zhuoyuan and Cromieres, Fabien and Dabre, Raj and Song, Haiyue and Kurohashi, Sadao",
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.454",
    pages = "3683--3691",
    language = "English",
    ISBN = "979-10-95546-34-4",
}