We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier. We find that only a few manipulations are needed to greatly decrease the accuracy. Our method relies on an atomic flip operation, which swaps one token for another, based on the gradients of the one-hot input vectors. Due to the efficiency of our method, we can perform adversarial training, which makes the model more robust to attacks at test time. With the use of a few semantics-preserving constraints, we demonstrate that HotFlip can be adapted to attack a word-level classifier as well.
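The flip operation described in the abstract can be illustrated with a short sketch. The code below is not the authors' released implementation; it is a minimal first-order approximation of a single gradient-based character flip, assuming a PyTorch classifier `model` that consumes a float one-hot character tensor of shape (seq_len, vocab_size) and a class-index `target`. The function name `best_flip` and the interface are illustrative only.

import torch

def best_flip(model, one_hot, target, loss_fn=torch.nn.functional.cross_entropy):
    """Estimate the single character swap that most increases the loss.

    First-order idea: the effect of replacing the character a at position i
    with character b is approximated by the directional derivative
    grad_i . (e_b - e_a) = grad[i, b] - grad[i, a].
    """
    # one_hot: float tensor (seq_len, vocab_size); make it a leaf with gradients.
    one_hot = one_hot.clone().detach().requires_grad_(True)
    logits = model(one_hot.unsqueeze(0))   # assumed output shape (1, num_classes)
    loss = loss_fn(logits, target)
    loss.backward()
    grad = one_hot.grad                    # (seq_len, vocab_size)

    with torch.no_grad():
        # grad value at the current character of each position: grad[i, a]
        current = (grad * one_hot).sum(dim=1, keepdim=True)
        # gain[i, b] approximates the loss increase from flipping position i to b
        gain = grad - current - one_hot * 1e9   # forbid "flipping" to the same char

    flat = int(gain.argmax())              # best (position, new character) pair
    pos, new_char = divmod(flat, gain.size(1))
    return pos, new_char, gain[pos, new_char].item()

In use, one would apply the returned flip to the one-hot input (or, as in the paper, search over several candidate flips with beam search) and re-evaluate the classifier on the modified text.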
Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018. HotFlip: White-Box Adversarial Examples for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36, Melbourne, Australia. Association for Computational Linguistics.
@inproceedings{ebrahimi-etal-2018-hotflip,
    title = "{H}ot{F}lip: White-Box Adversarial Examples for Text Classification",
    author = "Ebrahimi, Javid and
      Rao, Anyi and
      Lowd, Daniel and
      Dou, Dejing",
    editor = "Gurevych, Iryna and
      Miyao, Yusuke",
    booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2018",
    address = "Melbourne, Australia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P18-2006/",
    doi = "10.18653/v1/P18-2006",
    pages = "31--36",
    abstract = "We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier. We find that only a few manipulations are needed to greatly decrease the accuracy. Our method relies on an atomic flip operation, which swaps one token for another, based on the gradients of the one-hot input vectors. Due to efficiency of our method, we can perform adversarial training which makes the model more robust to attacks at test time. With the use of a few semantics-preserving constraints, we demonstrate that HotFlip can be adapted to attack a word-level classifier as well."
}