Modeling Noise in Paraphrase Detection

Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann, Mathias Creutz

Abstract

Noisy labels in training data present a challenging issue in classification tasks, misleading a model towards incorrect decisions during training. In this paper, we propose the use of a linear noise model to augment pre-trained language models to account for label noise in fine-tuning. We test our approach in a paraphrase detection task with various levels of noise and five different languages. Our experiments demonstrate the effectiveness of the additional noise model in making the training procedures more robust and stable. Furthermore, we show that this model can be applied without further knowledge about annotation confidence and reliability of individual training examples and we analyse our results in light of data selection and sampling strategies.
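
As a rough illustration of what a linear noise layer on top of a classifier can look like, the sketch below maps a classifier's clean-label probabilities through a row-stochastic transition matrix and computes the loss against the observed (possibly noisy) label. It is not taken from the paper: the two-class setup, the matrix values, and the names `softmax`, `apply_noise_model`, and `noisy_nll` are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Turn raw classifier scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_noise_model(clean_probs, transition):
    """Linear noise layer (assumed form).

    transition[i][j] is the assumed probability of observing label j
    when the true label is i; every row must sum to 1.
    """
    num_observed = len(transition[0])
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(len(clean_probs)))
        for j in range(num_observed)
    ]


def noisy_nll(logits, observed_label, transition):
    """Negative log-likelihood of the observed label under the noise model."""
    clean = softmax(logits)
    noisy = apply_noise_model(clean, transition)
    return -math.log(noisy[observed_label])


# Hypothetical values: class 0 = "not paraphrase", class 1 = "paraphrase".
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]   # no label noise assumed
TRANSITION = [[0.9, 0.1], [0.2, 0.8]]  # illustrative noise rates
LOGITS = [2.0, 0.0]                    # classifier favours class 0

loss_without_noise_model = noisy_nll(LOGITS, 1, IDENTITY)
loss_with_noise_model = noisy_nll(LOGITS, 1, TRANSITION)
```

In this toy setting the loss for a label that disagrees with the classifier is smaller when the transition matrix admits some label noise, which is the intuition behind letting a noise layer absorb part of the penalty for mislabelled examples. In practice the matrix would typically be learned during fine-tuning rather than fixed by hand.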

Anthology ID: 2022.lrec-1.461
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 4324–4332
URL: https://aclanthology.org/2022.lrec-1.461/
Cite (ACL): Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann, and Mathias Creutz. 2022. Modeling Noise in Paraphrase Detection. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4324–4332, Marseille, France. European Language Resources Association.
Cite (Informal): Modeling Noise in Paraphrase Detection (Vahtola et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.461.pdf
Data: Opusparcus

BibTeX

@inproceedings{vahtola-etal-2022-modeling,
    title = "Modeling Noise in Paraphrase Detection",
    author = {Vahtola, Teemu  and
      Sj{\"o}blom, Eetu  and
      Tiedemann, J{\"o}rg  and
      Creutz, Mathias},
    editor = "Calzolari, Nicoletta  and
      B{\'e}chet, Fr{\'e}d{\'e}ric  and
      Blache, Philippe  and
      Choukri, Khalid  and
      Cieri, Christopher  and
      Declerck, Thierry  and
      Goggi, Sara  and
      Isahara, Hitoshi  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Mazo, H{\'e}l{\`e}ne  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.461/",
    pages = "4324--4332",
    abstract = "Noisy labels in training data present a challenging issue in classification tasks, misleading a model towards incorrect decisions during training. In this paper, we propose the use of a linear noise model to augment pre-trained language models to account for label noise in fine-tuning. We test our approach in a paraphrase detection task with various levels of noise and five different languages. Our experiments demonstrate the effectiveness of the additional noise model in making the training procedures more robust and stable. Furthermore, we show that this model can be applied without further knowledge about annotation confidence and reliability of individual training examples and we analyse our results in light of data selection and sampling strategies."
}

Markdown (Informal)

[Modeling Noise in Paraphrase Detection](https://aclanthology.org/2022.lrec-1.461/) (Vahtola et al., LREC 2022)