High-quality parallel resource on sentiment analysis for 10 low-resource Indonesian languages, English, and Indonesian (Outstanding Paper at EACL 2023)
IndoNLP/nusax
NusaX is a high-quality multilingual parallel corpus that covers 12 languages: Indonesian, English, and 10 Indonesian local languages, namely Acehnese, Balinese, Banjarese, Buginese, Madurese, Minangkabau, Javanese, Ngaju, Sundanese, and Toba Batak.
NusaX is created by translating an existing sentiment analysis dataset into local languages. Our translations are written and verified by local native speakers. Because the corpus is both parallel and sentiment-labeled, NusaX can be broken down into 2 separate tasks: machine translation and sentiment analysis.
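As a minimal sketch of how one parallel, sentiment-labeled corpus can serve both a sentiment analysis task and a machine translation task, the snippet below derives both views from the same rows. The row layout, field names, and placeholder texts are hypothetical and chosen only for illustration; they are not the actual NusaX file format.

```python
from itertools import permutations

# Hypothetical toy rows: each row holds the same sentence in several
# languages plus one shared sentiment label. Field names and placeholder
# texts are assumptions for illustration, not the real NusaX schema.
rows = [
    {
        "id": 0,
        "label": "positive",
        "text": {
            "indonesian": "<sentence 0 in Indonesian>",
            "english": "<sentence 0 in English>",
            "javanese": "<sentence 0 in Javanese>",
        },
    },
    {
        "id": 1,
        "label": "negative",
        "text": {
            "indonesian": "<sentence 1 in Indonesian>",
            "english": "<sentence 1 in English>",
            "javanese": "<sentence 1 in Javanese>",
        },
    },
]


def sentiment_examples(rows, language):
    """Sentiment view: (text, label) pairs for a single language."""
    return [(row["text"][language], row["label"]) for row in rows]


def translation_pairs(rows, source, target):
    """Translation view: (source text, target text) pairs, aligned by row."""
    return [(row["text"][source], row["text"][target]) for row in rows]


def all_directions(languages):
    """Every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


if __name__ == "__main__":
    print(sentiment_examples(rows, "javanese"))
    print(translation_pairs(rows, "english", "javanese"))
    print(all_directions(["indonesian", "english", "javanese"]))
```

Because every row is aligned across languages, the same label applies to each translation, and any two languages form a translation pair. With 12 languages, enumerating ordered pairs this way gives 12 × 11 = 132 possible directions; which of those directions are actually evaluated is a separate choice.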
Additionally, we release the NusaX-Lexicon, a parallel sentiment lexicon for 10 Indonesian local languages.
You can find the details in our paper. The paper was awarded an Outstanding Paper at EACL 2023.
If you use our dataset or any code from this repository, please cite the following:
@inproceedings{winata-etal-2023-nusax,
    title = "NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages",
    author = "Winata, Genta Indra and Aji, Alham Fikri and Cahyawijaya, Samuel and Mahendra, Rahmad and Koto, Fajri and Romadhony, Ade and Kurniawan, Kemal and Moeljadi, David and Prasojo, Radityo Eko and Fung, Pascale and Baldwin, Timothy and Lau, Jey Han and Sennrich, Rico and Ruder, Sebastian",
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.eacl-main.57",
    pages = "815--834"
}

The dataset is licensed under CC-BY-SA, and the code is licensed under Apache-2.0.