cl-tohoku/PheMT
PheMT is a phenomenon-wise dataset designed for evaluating the robustness of Japanese-English machine translation systems. The dataset is based on the MTNT dataset [1], with additional annotations of four linguistic phenomena common in UGC: Proper Noun, Abbreviated Noun, Colloquial Expression, and Variant. (COLING 2020)
See the paper for more information.
New!! Ready-to-use evaluation tools are now available! (Feb. 2021)
This repository contains the following.
```
.
├── README.md
├── mtnt_approp_annotated.tsv   # pre-filtered MTNT dataset with annotated appropriateness (See Appendix A)
├── proper
│   ├── proper.alignment        # translations of targeted expressions
│   ├── proper.en               # references
│   ├── proper.ja               # source sentences
│   └── proper.tsv
├── abbrev
│   ├── abbrev.alignment
│   ├── abbrev.en
│   ├── abbrev.norm.ja          # normalized source sentences
│   ├── abbrev.orig.ja          # original source sentences
│   └── abbrev.tsv
├── colloq
│   ├── colloq.alignment
│   ├── colloq.en
│   ├── colloq.norm.ja
│   ├── colloq.orig.ja
│   └── colloq.tsv
├── variant
│   ├── variant.alignment
│   ├── variant.en
│   ├── variant.norm.ja
│   ├── variant.orig.ja
│   └── variant.tsv
└── src
    └── calc_acc.py             # script for calculating translation accuracy
```
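For orientation, the files within each subset are expected to be line-aligned, so one evaluation instance can be assembled by reading the corresponding lines together. Below is a minimal sketch under that assumption; the `load_subset` helper is ours and not part of the repository.

```python
# Sketch (not part of the official tooling): read one phenomenon subset as
# line-aligned records. Assumes the .orig.ja / .norm.ja / .en / .alignment files
# of a subset are parallel line by line (the proper subset has a single proper.ja).
from pathlib import Path

def load_subset(directory: str, name: str):
    base = Path(directory)
    def lines(suffix: str):
        return (base / f"{name}{suffix}").read_text(encoding="utf-8").splitlines()
    return [
        {"orig": o, "norm": n, "reference": r, "alignment": a}
        for o, n, r, a in zip(lines(".orig.ja"), lines(".norm.ja"),
                              lines(".en"), lines(".alignment"))
    ]

examples = load_subset("colloq", "colloq")
print(examples[0])
```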
Please feed both the original and normalized versions of the source sentences to your model; the difference in an arbitrary metric between the two conditions serves as a robustness measure (a BLEU-based sketch follows below).
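For illustration, here is a minimal sketch of that difference-based measure using BLEU, assuming sacreBLEU is installed; the system-output file names are hypothetical.

```python
# Sketch of the metric-difference robustness measure, using BLEU via sacreBLEU.
# "colloq.orig.out" / "colloq.norm.out" are hypothetical system translations of
# colloq.orig.ja and colloq.norm.ja, respectively.
import sacrebleu

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

refs = read_lines("colloq/colloq.en")
hyp_orig = read_lines("colloq.orig.out")
hyp_norm = read_lines("colloq.norm.out")

bleu_orig = sacrebleu.corpus_bleu(hyp_orig, [refs]).score
bleu_norm = sacrebleu.corpus_bleu(hyp_norm, [refs]).score

# A larger drop on the original (noisy) input indicates lower robustness
# to the targeted phenomenon.
print(f"BLEU(norm) = {bleu_norm:.2f}, BLEU(orig) = {bleu_orig:.2f}, "
      f"drop = {bleu_norm - bleu_orig:.2f}")
```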
We also extracted the translations of the expressions presenting the targeted phenomena. We recommend using src/calc_acc.py to measure the effect of each phenomenon more directly, with the help of translation accuracy.
USAGE: `python calc_acc.py system_output {proper, abbrev, colloq, variant}.alignment`
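For intuition, accuracy here can be thought of as checking whether the aligned translation of the targeted expression appears in the corresponding output line. The simplified sketch below follows that idea under the assumption that the .alignment file holds one translated expression per line; it may differ from the bundled src/calc_acc.py in details such as tokenization and case handling.

```python
# Simplified sketch of phenomenon-wise translation accuracy: count how often the
# aligned translation of the targeted expression appears in the system output.
# Assumption: the .alignment file is line-aligned with the system output.
import sys

def calc_accuracy(output_path: str, alignment_path: str) -> float:
    with open(output_path, encoding="utf-8") as f_out, \
         open(alignment_path, encoding="utf-8") as f_align:
        hits = total = 0
        for hyp, gold in zip(f_out, f_align):
            total += 1
            if gold.strip().lower() in hyp.strip().lower():
                hits += 1
    return hits / total if total else 0.0

if __name__ == "__main__":
    # e.g. python accuracy_sketch.py system_output colloq/colloq.alignment
    print(f"accuracy: {calc_accuracy(sys.argv[1], sys.argv[2]):.4f}")
```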
- Statistics
Dataset | # sent. | # unique expressions (ratio) | average edit distance
---|---|---|---
Proper Noun | 943 | 747 (79.2%) | (no normalized version)
Abbreviated Noun | 348 | 234 (67.2%) | 5.04
Colloquial Expression | 172 | 153 (89.0%) | 1.77
Variant | 103 | 97 (94.2%) | 3.42
- Examples
- Abbreviated Noun
  - original source: 地味なアプデ (apude, meaning update) だが
  - normalized source: 地味なアップデート (update) だが
  - reference: That’s a plain update though
  - alignment: update
- Colloquial Expression
  - original source: ここまで描いて飽きた、かなちい (kanachii, meaning sad)
  - normalized source: ここまで描いて飽きた、かなしい (kanashii)
  - reference: Drawing this much then getting bored, how sad.
  - alignment: sad
If you use our dataset for your research, please cite the following paper:
```bibtex
@inproceedings{fujii-etal-2020-phemt,
    title = "{P}he{MT}: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents",
    author = "Fujii, Ryo and Mita, Masato and Abe, Kaori and Hanawa, Kazuaki and Morishita, Makoto and Suzuki, Jun and Inui, Kentaro",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.521",
    pages = "5929--5943",
}
```
[1] Michel and Neubig (2018), MTNT: A Testbed for Machine Translation of Noisy Text.