We investigate how sentence-level transformers can be modified into effective sequence labelers at the token level without any direct supervision. Existing approaches to zero-shot sequence labeling do not perform well when applied to transformer-based architectures. As transformers contain multiple layers of multi-head self-attention, information in the sentence gets distributed across many tokens, negatively affecting zero-shot token-level performance. We find that a soft attention module that explicitly encourages sharpness of attention weights can significantly outperform existing methods.
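The abstract describes the approach only at a high level, so the PyTorch sketch below shows one plausible way to attach a soft attention module of this kind to a transformer sentence classifier. The class and function names, the sigmoid-based per-token evidence, and the min/max auxiliary loss terms are illustrative assumptions rather than the paper's exact formulation.

import torch
import torch.nn as nn


class SoftAttentionLabeler(nn.Module):
    """Minimal sketch: a soft attention module over transformer token states.

    The module is trained with a sentence-level objective only; at test time
    the per-token evidence values are read off as zero-shot token labels.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.token_scorer = nn.Linear(hidden_size, 1)    # unnormalised evidence per token
        self.sentence_head = nn.Linear(hidden_size, 1)   # binary sentence classifier

    def forward(self, token_states: torch.Tensor, attention_mask: torch.Tensor):
        # token_states: (batch, seq_len, hidden) from any transformer encoder
        scores = self.token_scorer(token_states).squeeze(-1)               # (batch, seq_len)
        evidence = torch.sigmoid(scores) * attention_mask.float()          # per-token relevance in [0, 1]
        weights = evidence / evidence.sum(dim=-1, keepdim=True).clamp(min=1e-8)
        pooled = torch.bmm(weights.unsqueeze(1), token_states).squeeze(1)  # attention-weighted sentence vector
        sentence_logit = self.sentence_head(pooled).squeeze(-1)
        return sentence_logit, evidence


def sharpness_losses(evidence, sentence_labels, attention_mask):
    """Assumed auxiliary terms that encourage sharp, label-like attention:
    the smallest token score is pushed towards 0 and the largest towards the
    (binary) sentence label, so most tokens end up with near-zero weight."""
    padded_high = evidence.masked_fill(attention_mask == 0, 1.0)
    min_term = padded_high.min(dim=-1).values ** 2
    max_term = (evidence.max(dim=-1).values - sentence_labels.float()) ** 2
    return (min_term + max_term).mean()

Because the sentence prediction is routed through these token-level weights, the classifier can only succeed by concentrating attention on the tokens that actually carry the label, which is what makes the weights usable as zero-shot token predictions.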
@inproceedings{bujel-etal-2021-zero,
    title = "Zero-shot Sequence Labeling for Transformer-based Sentence Classifiers",
    author = "Bujel, Kamil  and
      Yannakoudakis, Helen  and
      Rei, Marek",
    editor = "Rogers, Anna  and
      Calixto, Iacer  and
      Vuli{\'c}, Ivan  and
      Saphra, Naomi  and
      Kassner, Nora  and
      Camburu, Oana-Maria  and
      Bansal, Trapit  and
      Shwartz, Vered",
    booktitle = "Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.repl4nlp-1.20/",
    doi = "10.18653/v1/2021.repl4nlp-1.20",
    pages = "195--205",
    abstract = "We investigate how sentence-level transformers can be modified into effective sequence labelers at the token level without any direct supervision. Existing approaches to zero-shot sequence labeling do not perform well when applied on transformer-based architectures. As transformers contain multiple layers of multi-head self-attention, information in the sentence gets distributed between many tokens, negatively affecting zero-shot token-level performance. We find that a soft attention module which explicitly encourages sharpness of attention weights can significantly outperform existing methods."
}