The process of obtaining high quality labeled data for natural language understanding tasks is often slow, error-prone, complicated and expensive. With the vast usage of neural networks, this issue becomes more notorious since these networks require a large amount of labeled data to produce satisfactory results. We propose a methodology to blend high quality but scarce strong labeled data with noisy but abundant weak labeled data during the training of neural networks. Experiments in the context of topic-dependent evidence detection with two forms of weak labeled data show the advantages of the blending scheme. In addition, we provide a manually annotated data set for the task of topic-dependent evidence detection. We believe that blending weak and strong labeled data is a general notion that may be applicable to many language understanding tasks, and can especially assist researchers who wish to train a network but have a small amount of high quality labeled data for their task of interest.
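The abstract does not say how the blending is scheduled. As a rough illustration only, the sketch below assumes one simple schedule. Every epoch trains on all strong labeled examples plus a random subset of the weak labeled examples. The size of that subset shrinks geometrically by a hypothetical blending factor `alpha`, so training moves from weak-heavy toward mostly strong data. The function name `blended_epochs`, the parameter `alpha`, and the schedule itself are assumptions for illustration, not the authors' exact procedure.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def blended_epochs(
    strong: Sequence[T],
    weak: Sequence[T],
    num_epochs: int,
    alpha: float,
    seed: int = 0,
) -> List[List[T]]:
    """Build one shuffled training set per epoch (hypothetical schedule).

    Assumed schedule, not taken from the paper: every epoch keeps all strong
    examples, and epoch e adds floor(len(weak) * alpha**e) weak examples,
    sampled without replacement, so the weak share decays over epochs.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible sampling and shuffling
    epochs: List[List[T]] = []
    for e in range(num_epochs):
        n_weak = int(len(weak) * alpha ** e)  # floor, since the value is >= 0
        weak_subset = rng.sample(list(weak), n_weak)  # fresh weak subset each epoch
        training_set = list(strong) + weak_subset
        rng.shuffle(training_set)  # interleave strong and weak examples
        epochs.append(training_set)
    return epochs
```

With 10 strong and 100 weak examples, `blended_epochs(strong, weak, num_epochs=4, alpha=0.5)` yields per-epoch training sets of 110, 60, 35, and 22 examples. Each set would then be passed to an ordinary training loop. Decaying the weak share lets the abundant noisy labels shape early training while the scarce clean labels dominate later epochs.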
Eyal Shnarch, Carlos Alzate, Lena Dankin, Martin Gleize, Yufang Hou, Leshem Choshen, Ranit Aharonov, and Noam Slonim. 2018. Will it Blend? Blending Weak and Strong Labeled Data in a Neural Network for Argumentation Mining. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 599–605, Melbourne, Australia. Association for Computational Linguistics.
@inproceedings{shnarch-etal-2018-will,
    title = "Will it Blend? Blending Weak and Strong Labeled Data in a Neural Network for Argumentation Mining",
    author = "Shnarch, Eyal and Alzate, Carlos and Dankin, Lena and Gleize, Martin and Hou, Yufang and Choshen, Leshem and Aharonov, Ranit and Slonim, Noam",
    editor = "Gurevych, Iryna and Miyao, Yusuke",
    booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2018",
    address = "Melbourne, Australia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P18-2095/",
    doi = "10.18653/v1/P18-2095",
    pages = "599--605",
    abstract = "The process of obtaining high quality labeled data for natural language understanding tasks is often slow, error-prone, complicated and expensive. With the vast usage of neural networks, this issue becomes more notorious since these networks require a large amount of labeled data to produce satisfactory results. We propose a methodology to blend high quality but scarce strong labeled data with noisy but abundant weak labeled data during the training of neural networks. Experiments in the context of topic-dependent evidence detection with two forms of weak labeled data show the advantages of the blending scheme. In addition, we provide a manually annotated data set for the task of topic-dependent evidence detection. We believe that blending weak and strong labeled data is a general notion that may be applicable to many language understanding tasks, and can especially assist researchers who wish to train a network but have a small amount of high quality labeled data for their task of interest."
}
<?xml version="1.0" encoding="UTF-8"?><modsCollection xmlns="http://www.loc.gov/mods/v3"><mods ID="shnarch-etal-2018-will"> <titleInfo> <title>Will it Blend? Blending Weak and Strong Labeled Data in a Neural Network for Argumentation Mining</title> </titleInfo> <name type="personal"> <namePart type="given">Eyal</namePart> <namePart type="family">Shnarch</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Carlos</namePart> <namePart type="family">Alzate</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Lena</namePart> <namePart type="family">Dankin</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Martin</namePart> <namePart type="family">Gleize</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Yufang</namePart> <namePart type="family">Hou</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Leshem</namePart> <namePart type="family">Choshen</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Ranit</namePart> <namePart type="family">Aharonov</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Noam</namePart> <namePart type="family">Slonim</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2018-07</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</title> </titleInfo> <name type="personal"> <namePart type="given">Iryna</namePart> <namePart type="family">Gurevych</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Yusuke</namePart> <namePart type="family">Miyao</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>Association for Computational Linguistics</publisher> <place> <placeTerm type="text">Melbourne, Australia</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>The process of obtaining high quality labeled data for natural language understanding tasks is often slow, error-prone, complicated and expensive. With the vast usage of neural networks, this issue becomes more notorious since these networks require a large amount of labeled data to produce satisfactory results. We propose a methodology to blend high quality but scarce strong labeled data with noisy but abundant weak labeled data during the training of neural networks. Experiments in the context of topic-dependent evidence detection with two forms of weak labeled data show the advantages of the blending scheme. In addition, we provide a manually annotated data set for the task of topic-dependent evidence detection. We believe that blending weak and strong labeled data is a general notion that may be applicable to many language understanding tasks, and can especially assist researchers who wish to train a network but have a small amount of high quality labeled data for their task of interest.</abstract> <identifier type="citekey">shnarch-etal-2018-will</identifier> <identifier type="doi">10.18653/v1/P18-2095</identifier> <location> <url>https://aclanthology.org/P18-2095/</url> </location> <part> <date>2018-07</date> <extent unit="page"> <start>599</start> <end>605</end> </extent> </part></mods></modsCollection>
%0 Conference Proceedings
%T Will it Blend? Blending Weak and Strong Labeled Data in a Neural Network for Argumentation Mining
%A Shnarch, Eyal
%A Alzate, Carlos
%A Dankin, Lena
%A Gleize, Martin
%A Hou, Yufang
%A Choshen, Leshem
%A Aharonov, Ranit
%A Slonim, Noam
%Y Gurevych, Iryna
%Y Miyao, Yusuke
%S Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
%D 2018
%8 July
%I Association for Computational Linguistics
%C Melbourne, Australia
%F shnarch-etal-2018-will
%X The process of obtaining high quality labeled data for natural language understanding tasks is often slow, error-prone, complicated and expensive. With the vast usage of neural networks, this issue becomes more notorious since these networks require a large amount of labeled data to produce satisfactory results. We propose a methodology to blend high quality but scarce strong labeled data with noisy but abundant weak labeled data during the training of neural networks. Experiments in the context of topic-dependent evidence detection with two forms of weak labeled data show the advantages of the blending scheme. In addition, we provide a manually annotated data set for the task of topic-dependent evidence detection. We believe that blending weak and strong labeled data is a general notion that may be applicable to many language understanding tasks, and can especially assist researchers who wish to train a network but have a small amount of high quality labeled data for their task of interest.
%R 10.18653/v1/P18-2095
%U https://aclanthology.org/P18-2095/
%U https://doi.org/10.18653/v1/P18-2095
%P 599-605
[Will it Blend? Blending Weak and Strong Labeled Data in a Neural Network for Argumentation Mining](https://aclanthology.org/P18-2095/) (Shnarch et al., ACL 2018)