We propose an LSTM-based model with a hierarchical architecture for named entity recognition on code-switching Twitter data. Our model uses a bilingual character representation and transfer learning to address out-of-vocabulary words. To mitigate noise in the data, we apply token replacement and normalization. In the shared task of the 3rd Workshop on Computational Approaches to Linguistic Code-Switching, we achieved second place with a 62.76% harmonic mean F1-score on the English-Spanish language pair, without using any gazetteers or knowledge-based information.
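The abstract names token replacement and normalization as the noise-reduction step but does not spell out the rules. A minimal sketch, assuming a few common tweet-cleaning rules (placeholder tokens for URLs, user mentions, and numbers, plus collapsing of elongated words), could look like the following. The placeholder strings, regular expressions, and function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical placeholder tokens; the paper's actual replacement vocabulary may differ.
URL_TOKEN = "<URL>"
USER_TOKEN = "<USR>"
NUM_TOKEN = "<NUM>"

# Illustrative patterns for whole tokens (input is assumed to be pre-tokenized).
_URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
_USER_RE = re.compile(r"^@\w+$")
_NUM_RE = re.compile(r"^\d+([.,]\d+)*$")
# Three or more repeats of the same character, e.g. the "aaaa" in "holaaaa".
_REPEAT_RE = re.compile(r"(.)\1{2,}")


def normalize_token(token: str) -> str:
    """Map one tweet token to a less noisy form (illustrative rules only)."""
    if _URL_RE.match(token):
        return URL_TOKEN
    if _USER_RE.match(token):
        return USER_TOKEN
    if _NUM_RE.match(token):
        return NUM_TOKEN
    # Collapse runs of 3+ identical characters to exactly 2: "holaaaa" -> "holaa".
    return _REPEAT_RE.sub(r"\1\1", token)


def normalize_tweet(tokens: list[str]) -> list[str]:
    """Normalize a pre-tokenized tweet, one output token per input token."""
    return [normalize_token(t) for t in tokens]


if __name__ == "__main__":
    print(normalize_tweet(["@maria", "holaaaa", "check", "https://t.co/abc", "2018"]))
```

Because the mapping is strictly one token in, one token out, NER tags stay aligned with the input sequence. Whether to replace mentions at all is a design choice in this sketch: in NER data a mention can itself be (part of) an entity, so collapsing every mention to one placeholder may discard useful signal.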
@inproceedings{winata-etal-2018-bilingual,
    title = "Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition",
    author = "Winata, Genta Indra and Wu, Chien-Sheng and Madotto, Andrea and Fung, Pascale",
    editor = "Aguilar, Gustavo and AlGhamdi, Fahad and Soto, Victor and Solorio, Thamar and Diab, Mona and Hirschberg, Julia",
    booktitle = "Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching",
    month = jul,
    year = "2018",
    address = "Melbourne, Australia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W18-3214/",
    doi = "10.18653/v1/W18-3214",
    pages = "110--114",
    abstract = "We propose an LSTM-based model with hierarchical architecture on named entity recognition from code-switching Twitter data. Our model uses bilingual character representation and transfer learning to address out-of-vocabulary words. In order to mitigate data noise, we propose to use token replacement and normalization. In the 3rd Workshop on Computational Approaches to Linguistic Code-Switching Shared Task, we achieved second place with 62.76{\%} harmonic mean F1-score for English-Spanish language pair without using any gazetteer and knowledge-based information."
}
%0 Conference Proceedings
%T Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition
%A Winata, Genta Indra
%A Wu, Chien-Sheng
%A Madotto, Andrea
%A Fung, Pascale
%Y Aguilar, Gustavo
%Y AlGhamdi, Fahad
%Y Soto, Victor
%Y Solorio, Thamar
%Y Diab, Mona
%Y Hirschberg, Julia
%S Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching
%D 2018
%8 July
%I Association for Computational Linguistics
%C Melbourne, Australia
%F winata-etal-2018-bilingual
%X We propose an LSTM-based model with hierarchical architecture on named entity recognition from code-switching Twitter data. Our model uses bilingual character representation and transfer learning to address out-of-vocabulary words. In order to mitigate data noise, we propose to use token replacement and normalization. In the 3rd Workshop on Computational Approaches to Linguistic Code-Switching Shared Task, we achieved second place with 62.76% harmonic mean F1-score for English-Spanish language pair without using any gazetteer and knowledge-based information.
%R 10.18653/v1/W18-3214
%U https://aclanthology.org/W18-3214/
%U https://doi.org/10.18653/v1/W18-3214
%P 110-114
[Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition](https://aclanthology.org/W18-3214/) (Winata et al., ACL 2018)