Side effects during neural network tuning are typically measured by overall accuracy changes. However, we find that even with similar overall accuracy, existing tuning methods result in non-negligible instance-wise side effects. Motivated by neuroscientific evidence and theoretical results, we demonstrate that side effects can be controlled by the number of changed parameters and thus, we propose to conductneural network surgery by only modifying a limited number of parameters. Neural network surgery can be realized using diverse techniques and we investigate three lines of methods. Experimental results on representative tuning problems validate the effectiveness of the surgery approach. The dynamic selecting method achieves the best overall performance that not only satisfies the tuning goal but also induces fewer instance-wise side effects by changing only10-5 of the parameters.
@inproceedings{zhang-etal-2021-neural, title = "Neural Network Surgery: Injecting Data Patterns into Pre-trained Models with Minimal Instance-wise Side Effects", author = "Zhang, Zhiyuan and Ren, Xuancheng and Su, Qi and Sun, Xu and He, Bin", editor = "Toutanova, Kristina and Rumshisky, Anna and Zettlemoyer, Luke and Hakkani-Tur, Dilek and Beltagy, Iz and Bethard, Steven and Cotterell, Ryan and Chakraborty, Tanmoy and Zhou, Yichao", booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies", month = jun, year = "2021", address = "Online", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.naacl-main.430/", doi = "10.18653/v1/2021.naacl-main.430", pages = "5453--5466", abstract = "Side effects during neural network tuning are typically measured by overall accuracy changes. However, we find that even with similar overall accuracy, existing tuning methods result in non-negligible instance-wise side effects. Motivated by neuroscientific evidence and theoretical results, we demonstrate that side effects can be controlled by the number of changed parameters and thus, we propose to conduct \textit{neural network surgery} by only modifying a limited number of parameters. Neural network surgery can be realized using diverse techniques and we investigate three lines of methods. Experimental results on representative tuning problems validate the effectiveness of the surgery approach. The dynamic selecting method achieves the best overall performance that not only satisfies the tuning goal but also induces fewer instance-wise side effects by changing only $10^{-5}$ of the parameters."}
<?xml version="1.0" encoding="UTF-8"?><modsCollection xmlns="http://www.loc.gov/mods/v3"><mods ID="zhang-etal-2021-neural"> <titleInfo> <title>Neural Network Surgery: Injecting Data Patterns into Pre-trained Models with Minimal Instance-wise Side Effects</title> </titleInfo> <name type="personal"> <namePart type="given">Zhiyuan</namePart> <namePart type="family">Zhang</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Xuancheng</namePart> <namePart type="family">Ren</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Qi</namePart> <namePart type="family">Su</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Xu</namePart> <namePart type="family">Sun</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Bin</namePart> <namePart type="family">He</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2021-06</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title> </titleInfo> <name type="personal"> <namePart type="given">Kristina</namePart> <namePart type="family">Toutanova</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Anna</namePart> <namePart type="family">Rumshisky</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Luke</namePart> <namePart type="family">Zettlemoyer</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Dilek</namePart> <namePart type="family">Hakkani-Tur</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Iz</namePart> <namePart type="family">Beltagy</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Steven</namePart> <namePart type="family">Bethard</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Ryan</namePart> <namePart type="family">Cotterell</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Tanmoy</namePart> <namePart type="family">Chakraborty</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Yichao</namePart> <namePart type="family">Zhou</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>Association for Computational Linguistics</publisher> <place> <placeTerm type="text">Online</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>Side effects during neural network tuning are typically measured by overall accuracy changes. However, we find that even with similar overall accuracy, existing tuning methods result in non-negligible instance-wise side effects. Motivated by neuroscientific evidence and theoretical results, we demonstrate that side effects can be controlled by the number of changed parameters and thus, we propose to conduct neural network surgery by only modifying a limited number of parameters. Neural network surgery can be realized using diverse techniques and we investigate three lines of methods. Experimental results on representative tuning problems validate the effectiveness of the surgery approach. The dynamic selecting method achieves the best overall performance that not only satisfies the tuning goal but also induces fewer instance-wise side effects by changing only 10⁻5 of the parameters.</abstract> <identifier type="citekey">zhang-etal-2021-neural</identifier> <identifier type="doi">10.18653/v1/2021.naacl-main.430</identifier> <location> <url>https://aclanthology.org/2021.naacl-main.430/</url> </location> <part> <date>2021-06</date> <extent unit="page"> <start>5453</start> <end>5466</end> </extent> </part></mods></modsCollection>
%0 Conference Proceedings%T Neural Network Surgery: Injecting Data Patterns into Pre-trained Models with Minimal Instance-wise Side Effects%A Zhang, Zhiyuan%A Ren, Xuancheng%A Su, Qi%A Sun, Xu%A He, Bin%Y Toutanova, Kristina%Y Rumshisky, Anna%Y Zettlemoyer, Luke%Y Hakkani-Tur, Dilek%Y Beltagy, Iz%Y Bethard, Steven%Y Cotterell, Ryan%Y Chakraborty, Tanmoy%Y Zhou, Yichao%S Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies%D 2021%8 June%I Association for Computational Linguistics%C Online%F zhang-etal-2021-neural%X Side effects during neural network tuning are typically measured by overall accuracy changes. However, we find that even with similar overall accuracy, existing tuning methods result in non-negligible instance-wise side effects. Motivated by neuroscientific evidence and theoretical results, we demonstrate that side effects can be controlled by the number of changed parameters and thus, we propose to conduct neural network surgery by only modifying a limited number of parameters. Neural network surgery can be realized using diverse techniques and we investigate three lines of methods. Experimental results on representative tuning problems validate the effectiveness of the surgery approach. The dynamic selecting method achieves the best overall performance that not only satisfies the tuning goal but also induces fewer instance-wise side effects by changing only 10⁻5 of the parameters.%R 10.18653/v1/2021.naacl-main.430%U https://aclanthology.org/2021.naacl-main.430/%U https://doi.org/10.18653/v1/2021.naacl-main.430%P 5453-5466
[Neural Network Surgery: Injecting Data Patterns into Pre-trained Models with Minimal Instance-wise Side Effects](https://aclanthology.org/2021.naacl-main.430/) (Zhang et al., NAACL 2021)