IndoNLP/nusax

High-quality parallel resource on sentiment analysis for 10 low-resource Indonesian languages, English, and Indonesian (Outstanding Paper at EACL 2023)

NusaX is a high-quality multilingual parallel corpus that covers 12 languages: Indonesian, English, and 10 Indonesian local languages, namely Acehnese, Balinese, Banjarese, Buginese, Madurese, Minangkabau, Javanese, Ngaju, Sundanese, and Toba Batak.

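NusaX was created by translating an existing sentiment analysis dataset into the local languages. The translations were written and verified by local native speakers. Because every sentence is available in all languages and keeps its original sentiment label, NusaX can be broken down into 2 separate tasks: machine translation and sentiment analysis.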
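Because each sentence is translated while keeping its sentiment label, the same parallel records can be viewed either as machine translation pairs or as per-language sentiment examples. The sketch below illustrates that split. It is a minimal sketch, not the repository's loader: the `ParallelRecord` layout, the lowercase language keys, the label names, and the two toy Indonesian/English sentences are all hypothetical and do not reflect the actual file format or contents of the released data.

```python
from dataclasses import dataclass


# Hypothetical record layout (not the repository's actual file format):
# one sentence in several languages plus the label inherited from the
# original sentiment analysis dataset.
@dataclass
class ParallelRecord:
    texts: dict  # language name -> sentence in that language
    label: str   # assumed label set, e.g. "positive", "neutral", "negative"


def mt_pairs(records, src, tgt):
    """Machine translation view: aligned (source, target) sentence pairs."""
    return [
        (r.texts[src], r.texts[tgt])
        for r in records
        if src in r.texts and tgt in r.texts
    ]


def sentiment_examples(records, lang):
    """Sentiment analysis view: (sentence, label) pairs in one language."""
    return [(r.texts[lang], r.label) for r in records if lang in r.texts]


# Toy records for illustration only; they are not taken from NusaX.
records = [
    ParallelRecord(
        {"indonesian": "Makanannya enak sekali.", "english": "The food is very tasty."},
        "positive",
    ),
    ParallelRecord(
        {"indonesian": "Pelayanannya lambat.", "english": "The service is slow."},
        "negative",
    ),
]

print(mt_pairs(records, "indonesian", "english"))
print(sentiment_examples(records, "indonesian"))
```

Keeping the label on the shared record, rather than copying it into each language, reflects the idea that a translation preserves the sentiment of its source sentence, so one set of labels serves every language.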

We also release the NusaX-Lexicon, a parallel sentiment lexicon for the 10 Indonesian local languages.

Research Paper

You can find the details in our paper, which received an Outstanding Paper award at EACL 2023.

If you use our dataset or any code from this repository, please cite the following:

@inproceedings{winata-etal-2023-nusax,
    title = "NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages",
    author = "Winata, Genta Indra and
      Aji, Alham Fikri and
      Cahyawijaya, Samuel and
      Mahendra, Rahmad and
      Koto, Fajri and
      Romadhony, Ade and
      Kurniawan, Kemal and
      Moeljadi, David and
      Prasojo, Radityo Eko and
      Fung, Pascale and
      Baldwin, Timothy and
      Lau, Jey Han and
      Sennrich, Rico and
      Ruder, Sebastian",
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.eacl-main.57",
    pages = "815--834"
}

License

The dataset is licensed under CC-BY-SA, and the code is licensed under Apache-2.0.
