Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English and covering six text domains. Multi-CrossRE is a machine-translated version of CrossRE (Bassignana and Plank, 2022), with a sub-portion of more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and, as a sanity check, over the 26 back-translations to English. Results on the back-translated data are consistent with those on the original English CrossRE, indicating high quality of the translation and the resulting dataset.
Elisa Bassignana, Filip Ginter, Sampo Pyysalo, Rob van der Goot, and Barbara Plank. 2023. Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 80–85, Tórshavn, Faroe Islands. University of Tartu Library.
@inproceedings{bassignana-etal-2023-multi,
    title = "Multi-{C}ross{RE} A Multi-Lingual Multi-Domain Dataset for Relation Extraction",
    author = "Bassignana, Elisa and
      Ginter, Filip and
      Pyysalo, Sampo and
      van der Goot, Rob and
      Plank, Barbara",
    editor = {Alum{\"a}e, Tanel and
      Fishel, Mark},
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.9/",
    pages = "80--85",
    abstract = "Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. Multi-CrossRE is a machine translated version of CrossRE (Bassignana and Plank, 2022), with a sub-portion including more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and{--}as sanity check{--}over the 26 back-translations to English. Results on the back-translated data are consistent with the ones on the original English CrossRE, indicating high quality of the translation and the resulting dataset."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="bassignana-etal-2023-multi">
    <titleInfo>
        <title>Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Elisa</namePart>
        <namePart type="family">Bassignana</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Filip</namePart>
        <namePart type="family">Ginter</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Sampo</namePart>
        <namePart type="family">Pyysalo</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Rob</namePart>
        <namePart type="family">van der Goot</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Barbara</namePart>
        <namePart type="family">Plank</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2023-05</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Tanel</namePart>
            <namePart type="family">Alumäe</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Mark</namePart>
            <namePart type="family">Fishel</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>University of Tartu Library</publisher>
            <place>
                <placeTerm type="text">Tórshavn, Faroe Islands</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. Multi-CrossRE is a machine translated version of CrossRE (Bassignana and Plank, 2022), with a sub-portion including more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and–as sanity check–over the 26 back-translations to English. Results on the back-translated data are consistent with the ones on the original English CrossRE, indicating high quality of the translation and the resulting dataset.</abstract>
    <identifier type="citekey">bassignana-etal-2023-multi</identifier>
    <location>
        <url>https://aclanthology.org/2023.nodalida-1.9/</url>
    </location>
    <part>
        <date>2023-05</date>
        <extent unit="page">
            <start>80</start>
            <end>85</end>
        </extent>
    </part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction
%A Bassignana, Elisa
%A Ginter, Filip
%A Pyysalo, Sampo
%A van der Goot, Rob
%A Plank, Barbara
%Y Alumäe, Tanel
%Y Fishel, Mark
%S Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
%D 2023
%8 May
%I University of Tartu Library
%C Tórshavn, Faroe Islands
%F bassignana-etal-2023-multi
%X Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose Multi-CrossRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. Multi-CrossRE is a machine translated version of CrossRE (Bassignana and Plank, 2022), with a sub-portion including more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and–as sanity check–over the 26 back-translations to English. Results on the back-translated data are consistent with the ones on the original English CrossRE, indicating high quality of the translation and the resulting dataset.
%U https://aclanthology.org/2023.nodalida-1.9/
%P 80-85
[Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction](https://aclanthology.org/2023.nodalida-1.9/) (Bassignana et al., NoDaLiDa 2023)