Many tasks in text-based computational social science (CSS) involve the classification of political statements into categories based on a domain-specific codebook. In order to be useful for CSS analysis, these categories must be fine-grained. The typically skewed distribution of fine-grained categories, however, results in a challenging classification problem on the NLP side. This paper proposes to make use of the hierarchical relations among categories typically present in such codebooks: e.g., markets and taxation are both subcategories of economy, while borders is a subcategory of security. We use these ontological relations as prior knowledge to establish additional constraints on the learned model, thus improving performance overall and in particular for infrequent categories. We evaluate several lightweight variants of this intuition by extending state-of-the-art transformer-based text classifiers on two datasets and multiple languages. We find the most consistent improvement for an approach based on regularization.
Erenay Dayanik, Andre Blessing, Nico Blokker, Sebastian Haunss, Jonas Kuhn, Gabriella Lapesa, and Sebastian Pado. 2022. Improving Neural Political Statement Classification with Class Hierarchical Information. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2367–2382, Dublin, Ireland. Association for Computational Linguistics.
@inproceedings{dayanik-etal-2022-improving,
    title = "Improving Neural Political Statement Classification with Class Hierarchical Information",
    author = "Dayanik, Erenay and Blessing, Andre and Blokker, Nico and Haunss, Sebastian and Kuhn, Jonas and Lapesa, Gabriella and Pado, Sebastian",
    editor = "Muresan, Smaranda and Nakov, Preslav and Villavicencio, Aline",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-acl.186/",
    doi = "10.18653/v1/2022.findings-acl.186",
    pages = "2367--2382",
    abstract = "Many tasks in text-based computational social science (CSS) involve the classification of political statements into categories based on a domain-specific codebook. In order to be useful for CSS analysis, these categories must be fine-grained. The typically skewed distribution of fine-grained categories, however, results in a challenging classification problem on the NLP side. This paper proposes to make use of the hierarchical relations among categories typically present in such codebooks: e.g., markets and taxation are both subcategories of economy, while borders is a subcategory of security. We use these ontological relations as prior knowledge to establish additional constraints on the learned model, thus improving performance overall and in particular for infrequent categories. We evaluate several lightweight variants of this intuition by extending state-of-the-art transformer-based text classifiers on two datasets and multiple languages. We find the most consistent improvement for an approach based on regularization."
}
<?xml version="1.0" encoding="UTF-8"?><modsCollection xmlns="http://www.loc.gov/mods/v3"><mods ID="dayanik-etal-2022-improving"> <titleInfo> <title>Improving Neural Political Statement Classification with Class Hierarchical Information</title> </titleInfo> <name type="personal"> <namePart type="given">Erenay</namePart> <namePart type="family">Dayanik</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Andre</namePart> <namePart type="family">Blessing</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Nico</namePart> <namePart type="family">Blokker</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Sebastian</namePart> <namePart type="family">Haunss</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Jonas</namePart> <namePart type="family">Kuhn</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Gabriella</namePart> <namePart type="family">Lapesa</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Sebastian</namePart> <namePart type="family">Pado</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2022-05</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Findings of the Association for Computational Linguistics: ACL 2022</title> </titleInfo> <name type="personal"> <namePart type="given">Smaranda</namePart> <namePart type="family">Muresan</namePart> <role> <roleTerm authority="marcrelator" 
type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Preslav</namePart> <namePart type="family">Nakov</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Aline</namePart> <namePart type="family">Villavicencio</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>Association for Computational Linguistics</publisher> <place> <placeTerm type="text">Dublin, Ireland</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>Many tasks in text-based computational social science (CSS) involve the classification of political statements into categories based on a domain-specific codebook. In order to be useful for CSS analysis, these categories must be fine-grained. The typically skewed distribution of fine-grained categories, however, results in a challenging classification problem on the NLP side. This paper proposes to make use of the hierarchical relations among categories typically present in such codebooks: e.g., markets and taxation are both subcategories of economy, while borders is a subcategory of security. We use these ontological relations as prior knowledge to establish additional constraints on the learned model, thus improving performance overall and in particular for infrequent categories. We evaluate several lightweight variants of this intuition by extending state-of-the-art transformer-based text classifiers on two datasets and multiple languages. 
We find the most consistent improvement for an approach based on regularization.</abstract> <identifier type="citekey">dayanik-etal-2022-improving</identifier> <identifier type="doi">10.18653/v1/2022.findings-acl.186</identifier> <location> <url>https://aclanthology.org/2022.findings-acl.186/</url> </location> <part> <date>2022-05</date> <extent unit="page"> <start>2367</start> <end>2382</end> </extent> </part></mods></modsCollection>
%0 Conference Proceedings
%T Improving Neural Political Statement Classification with Class Hierarchical Information
%A Dayanik, Erenay
%A Blessing, Andre
%A Blokker, Nico
%A Haunss, Sebastian
%A Kuhn, Jonas
%A Lapesa, Gabriella
%A Pado, Sebastian
%Y Muresan, Smaranda
%Y Nakov, Preslav
%Y Villavicencio, Aline
%S Findings of the Association for Computational Linguistics: ACL 2022
%D 2022
%8 May
%I Association for Computational Linguistics
%C Dublin, Ireland
%F dayanik-etal-2022-improving
%X Many tasks in text-based computational social science (CSS) involve the classification of political statements into categories based on a domain-specific codebook. In order to be useful for CSS analysis, these categories must be fine-grained. The typically skewed distribution of fine-grained categories, however, results in a challenging classification problem on the NLP side. This paper proposes to make use of the hierarchical relations among categories typically present in such codebooks: e.g., markets and taxation are both subcategories of economy, while borders is a subcategory of security. We use these ontological relations as prior knowledge to establish additional constraints on the learned model, thus improving performance overall and in particular for infrequent categories. We evaluate several lightweight variants of this intuition by extending state-of-the-art transformer-based text classifiers on two datasets and multiple languages. We find the most consistent improvement for an approach based on regularization.
%R 10.18653/v1/2022.findings-acl.186
%U https://aclanthology.org/2022.findings-acl.186/
%U https://doi.org/10.18653/v1/2022.findings-acl.186
%P 2367-2382
[Improving Neural Political Statement Classification with Class Hierarchical Information](https://aclanthology.org/2022.findings-acl.186/) (Dayanik et al., Findings 2022)