Noisy labels in training data present a challenging issue in classification tasks, misleading a model towards incorrect decisions during training. In this paper, we propose the use of a linear noise model to augment pre-trained language models to account for label noise in fine-tuning. We test our approach in a paraphrase detection task with various levels of noise and five different languages. Our experiments demonstrate the effectiveness of the additional noise model in making the training procedures more robust and stable. Furthermore, we show that this model can be applied without further knowledge about annotation confidence and reliability of individual training examples and we analyse our results in light of data selection and sampling strategies.
Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann, and Mathias Creutz. 2022. Modeling Noise in Paraphrase Detection. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4324–4332, Marseille, France. European Language Resources Association.
@inproceedings{vahtola-etal-2022-modeling,
    title = "Modeling Noise in Paraphrase Detection",
    author = {Vahtola, Teemu and
      Sj{\"o}blom, Eetu and
      Tiedemann, J{\"o}rg and
      Creutz, Mathias},
    editor = "Calzolari, Nicoletta and
      B{\'e}chet, Fr{\'e}d{\'e}ric and
      Blache, Philippe and
      Choukri, Khalid and
      Cieri, Christopher and
      Declerck, Thierry and
      Goggi, Sara and
      Isahara, Hitoshi and
      Maegaard, Bente and
      Mariani, Joseph and
      Mazo, H{\'e}l{\`e}ne and
      Odijk, Jan and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.461/",
    pages = "4324--4332",
    abstract = "Noisy labels in training data present a challenging issue in classification tasks, misleading a model towards incorrect decisions during training. In this paper, we propose the use of a linear noise model to augment pre-trained language models to account for label noise in fine-tuning. We test our approach in a paraphrase detection task with various levels of noise and five different languages. Our experiments demonstrate the effectiveness of the additional noise model in making the training procedures more robust and stable. Furthermore, we show that this model can be applied without further knowledge about annotation confidence and reliability of individual training examples and we analyse our results in light of data selection and sampling strategies."
}
%0 Conference Proceedings
%T Modeling Noise in Paraphrase Detection
%A Vahtola, Teemu
%A Sjöblom, Eetu
%A Tiedemann, Jörg
%A Creutz, Mathias
%Y Calzolari, Nicoletta
%Y Béchet, Frédéric
%Y Blache, Philippe
%Y Choukri, Khalid
%Y Cieri, Christopher
%Y Declerck, Thierry
%Y Goggi, Sara
%Y Isahara, Hitoshi
%Y Maegaard, Bente
%Y Mariani, Joseph
%Y Mazo, Hélène
%Y Odijk, Jan
%Y Piperidis, Stelios
%S Proceedings of the Thirteenth Language Resources and Evaluation Conference
%D 2022
%8 June
%I European Language Resources Association
%C Marseille, France
%F vahtola-etal-2022-modeling
%X Noisy labels in training data present a challenging issue in classification tasks, misleading a model towards incorrect decisions during training. In this paper, we propose the use of a linear noise model to augment pre-trained language models to account for label noise in fine-tuning. We test our approach in a paraphrase detection task with various levels of noise and five different languages. Our experiments demonstrate the effectiveness of the additional noise model in making the training procedures more robust and stable. Furthermore, we show that this model can be applied without further knowledge about annotation confidence and reliability of individual training examples and we analyse our results in light of data selection and sampling strategies.
%U https://aclanthology.org/2022.lrec-1.461/
%P 4324-4332