We present an efficient training approach to text retrieval with dense representations that applies knowledge distillation using the ColBERT late-interaction ranking model. Specifically, we propose to transfer the knowledge from a bi-encoder teacher to a student by distilling knowledge from ColBERT’s expressive MaxSim operator into a simple dot product. The advantage of the bi-encoder teacher–student setup is that we can efficiently add in-batch negatives during knowledge distillation, enabling richer interactions between teacher and student models. In addition, using ColBERT as the teacher reduces training cost compared to a full cross-encoder. Experiments on the MS MARCO passage and document ranking tasks and data from the TREC 2019 Deep Learning Track demonstrate that our approach helps models learn robust representations for dense retrieval effectively and efficiently.
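The abstract's central idea can be sketched in plain Python. The teacher scores each query against every passage in the batch with ColBERT's MaxSim, the student scores the same pairs with a dot product of single vectors, and the two score distributions are matched. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the embeddings are already computed and L2-normalized, and that the distillation loss is a KL divergence between softmax distributions over the in-batch passages. The function names `maxsim`, `in_batch_kd_loss`, and `log_softmax` and the `temperature` parameter are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per token embedding (teacher side)


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product; used by the student on pooled vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens: Matrix, doc_tokens: Matrix) -> float:
    """ColBERT-style late interaction: for each query token, take the best
    match over all document tokens, then sum over the query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def log_softmax(scores: Vector, temperature: float = 1.0) -> Vector:
    """Numerically stable log-softmax (hypothetical temperature knob)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def in_batch_kd_loss(
    teacher_q: List[Matrix],
    teacher_p: List[Matrix],
    student_q: List[Vector],
    student_p: List[Vector],
    temperature: float = 1.0,
) -> float:
    """Query i is scored against every passage j in the batch: passage i is
    its positive and the other n - 1 passages act as in-batch negatives.
    Returns the mean KL(teacher || student) over the n queries (an assumed
    loss form, for illustration only)."""
    n = len(student_q)
    total = 0.0
    for i in range(n):
        t_scores = [maxsim(teacher_q[i], teacher_p[j]) for j in range(n)]
        s_scores = [dot(student_q[i], student_p[j]) for j in range(n)]
        t_log = log_softmax(t_scores, temperature)
        s_log = log_softmax(s_scores, temperature)
        # KL(p_t || p_s) = sum_j p_t(j) * (log p_t(j) - log p_s(j))
        total += sum(math.exp(t) * (t - s) for t, s in zip(t_log, s_log))
    return total / n


if __name__ == "__main__":
    # Toy batch of two query/passage pairs (made-up numbers).
    teacher_q = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]]
    teacher_p = [[[1.0, 0.0]], [[0.0, 1.0]]]
    student_q = [[1.0, 0.0], [0.0, 1.0]]
    student_p = [[0.1, 0.9], [0.9, 0.1]]
    print(in_batch_kd_loss(teacher_q, teacher_p, student_q, student_p))
```

Because both teacher and student encode queries and passages independently, n encoder passes per side yield all n × n scores, so each query gets n − 1 negatives without extra encoding. A cross-encoder teacher would instead need a separate forward pass for every query/passage pair.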
@inproceedings{lin-etal-2021-batch,
    title = "In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval",
    author = "Lin, Sheng-Chieh and
      Yang, Jheng-Hong and
      Lin, Jimmy",
    editor = "Rogers, Anna and
      Calixto, Iacer and
      Vuli{\'c}, Ivan and
      Saphra, Naomi and
      Kassner, Nora and
      Camburu, Oana-Maria and
      Bansal, Trapit and
      Shwartz, Vered",
    booktitle = "Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.repl4nlp-1.17/",
    doi = "10.18653/v1/2021.repl4nlp-1.17",
    pages = "163--173",
    abstract = "We present an efficient training approach to text retrieval with dense representations that applies knowledge distillation using the ColBERT late-interaction ranking model. Specifically, we propose to transfer the knowledge from a bi-encoder teacher to a student by distilling knowledge from ColBERT{'}s expressive MaxSim operator into a simple dot product. The advantage of the bi-encoder teacher{--}student setup is that we can efficiently add in-batch negatives during knowledge distillation, enabling richer interactions between teacher and student models. In addition, using ColBERT as the teacher reduces training cost compared to a full cross-encoder. Experiments on the MS MARCO passage and document ranking tasks and data from the TREC 2019 Deep Learning Track demonstrate that our approach helps models learn robust representations for dense retrieval effectively and efficiently."
}
%0 Conference Proceedings
%T In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval
%A Lin, Sheng-Chieh
%A Yang, Jheng-Hong
%A Lin, Jimmy
%Y Rogers, Anna
%Y Calixto, Iacer
%Y Vulić, Ivan
%Y Saphra, Naomi
%Y Kassner, Nora
%Y Camburu, Oana-Maria
%Y Bansal, Trapit
%Y Shwartz, Vered
%S Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
%D 2021
%8 August
%I Association for Computational Linguistics
%C Online
%F lin-etal-2021-batch
%X We present an efficient training approach to text retrieval with dense representations that applies knowledge distillation using the ColBERT late-interaction ranking model. Specifically, we propose to transfer the knowledge from a bi-encoder teacher to a student by distilling knowledge from ColBERT’s expressive MaxSim operator into a simple dot product. The advantage of the bi-encoder teacher–student setup is that we can efficiently add in-batch negatives during knowledge distillation, enabling richer interactions between teacher and student models. In addition, using ColBERT as the teacher reduces training cost compared to a full cross-encoder. Experiments on the MS MARCO passage and document ranking tasks and data from the TREC 2019 Deep Learning Track demonstrate that our approach helps models learn robust representations for dense retrieval effectively and efficiently.
%R 10.18653/v1/2021.repl4nlp-1.17
%U https://aclanthology.org/2021.repl4nlp-1.17/
%U https://doi.org/10.18653/v1/2021.repl4nlp-1.17
%P 163-173
[In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval](https://aclanthology.org/2021.repl4nlp-1.17/) (Lin et al., RepL4NLP 2021)