- Yubao Tang (ORCID: orcid.org/0009-0003-8010-3404)
- Ruqing Zhang (ORCID: orcid.org/0000-0003-4294-2541)
- Zhaochun Ren (ORCID: orcid.org/0000-0002-9076-6565)
- Jiafeng Guo (ORCID: orcid.org/0000-0002-9509-8674)
- Maarten de Rijke (ORCID: orcid.org/0000-0002-1086-0202)
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14612)
Included in the following conference series: European Conference on Information Retrieval (ECIR)
Abstract
Generative retrieval (GR) has become a highly active area of information retrieval and has grown rapidly in recent years. In contrast to the traditional “index-retrieve-then-rank” pipeline, the GR paradigm aims to consolidate all information within a corpus into a single model. Typically, a sequence-to-sequence model is trained to map a query directly to the identifiers of its relevant documents (i.e., docids). This tutorial introduces the core concepts of the GR paradigm and gives a comprehensive overview of recent advances in its foundations and applications. We start with preliminaries covering the foundational aspects and problem formulations of GR. We then turn to recent progress in docid design, training approaches, inference strategies, and applications of GR. We end by outlining remaining challenges and issuing a call for future GR research. This tutorial is intended for both researchers and industry practitioners interested in developing novel GR solutions or applying them in real-world scenarios.
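The following sketch illustrates how a model can map a query directly to a docid. It is a minimal toy example and is not the method of this tutorial or of any specific GR system. It shows prefix-trie constrained beam search, one common inference strategy for GR. Decoding produces a docid token by token and is only allowed to follow prefixes of identifiers that exist in the corpus.

Everything here is a hypothetical assumption:

- the tiny corpus,
- the two-level docid scheme,
- the beam settings,
- `toy_next_token_logprobs`, a term-overlap heuristic that stands in for one decoding step of a trained sequence-to-sequence model.

```python
import math

# Hypothetical toy corpus: each document has a structured docid (a tuple of
# tokens), loosely in the spirit of hierarchical docids.
CORPUS = {
    ("1", "1"): "generative retrieval maps queries to document identifiers",
    ("1", "2"): "dense retrieval encodes queries and documents as vectors",
    ("2", "1"): "cascade ranking for e-commerce search",
}
END = "<eos>"  # end-of-docid token


def build_trie(docids):
    """Map every docid prefix to the set of tokens allowed to follow it."""
    trie = {}
    for docid in docids:
        seq = tuple(docid) + (END,)
        for i in range(len(seq)):
            trie.setdefault(seq[:i], set()).add(seq[i])
    return trie


def toy_next_token_logprobs(query, prefix, allowed):
    """Hypothetical stand-in for one decoder step of a trained seq2seq model.

    Scores each allowed token by the best term overlap between the query and
    any document whose docid starts with prefix + token (or equals prefix,
    for END), then applies a softmax restricted to the allowed tokens.
    """
    q_terms = set(query.lower().split())
    scores = {}
    for tok in allowed:
        ext = prefix if tok == END else prefix + (tok,)
        overlaps = [
            len(q_terms & set(text.split()))
            for docid, text in CORPUS.items()
            if docid[: len(ext)] == ext
        ]
        scores[tok] = max(overlaps, default=0)
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {tok: s - log_z for tok, s in scores.items()}


def constrained_beam_search(query, trie, beam_size=2, max_len=8):
    """Decode docids token by token, only ever emitting valid continuations."""
    beams = [((), 0.0)]  # (docid prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            allowed = sorted(trie.get(prefix, set()))
            logprobs = toy_next_token_logprobs(query, prefix, allowed)
            for tok in allowed:
                new_score = score + logprobs[tok]
                if tok == END:
                    finished.append((prefix, new_score))  # complete docid
                else:
                    candidates.append((prefix + (tok,), new_score))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]  # keep only the best partial docids
    return sorted(finished, key=lambda c: c[1], reverse=True)


trie = build_trie(CORPUS)
for docid, logp in constrained_beam_search("generative retrieval queries", trie):
    print("-".join(docid), round(logp, 3))
```

Decoding is restricted to trie paths, so every returned identifier exists in the corpus. For this query, docid `1-1` ranks first. With `beam_size=2`, the low-scoring branch under `2` is pruned at the second step, so `2-1` is never returned. In an actual GR model, the heuristic would be replaced by the decoder's output distribution, masked to the allowed tokens at each step.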
Author information
Authors and Affiliations
CAS Key Lab of Network Data Science and Technology, ICT, CAS, Beijing, China
Yubao Tang, Ruqing Zhang & Jiafeng Guo
University of Chinese Academy of Sciences, Beijing, China
Yubao Tang, Ruqing Zhang & Jiafeng Guo
Leiden University, Leiden, The Netherlands
Zhaochun Ren
University of Amsterdam, Amsterdam, The Netherlands
Maarten de Rijke
Corresponding author
Correspondence to Yubao Tang.
Editor information
Editors and Affiliations
Georgetown University, Washington, DC, USA
Nazli Goharian
University of Pisa, Pisa, Italy
Nicola Tonellotto
King's College London, London, UK
Yulan He
University College London, London, UK
Aldo Lipani
University of Glasgow, Glasgow, UK
Graham McDonald
University of Glasgow, Glasgow, UK
Craig Macdonald
University of Glasgow, Glasgow, UK
Iadh Ounis
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tang, Y., Zhang, R., Ren, Z., Guo, J., de Rijke, M. (2024). Recent Advances in Generative Information Retrieval. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14612. Springer, Cham. https://doi.org/10.1007/978-3-031-56069-9_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56068-2
Online ISBN: 978-3-031-56069-9
eBook Packages: Computer Science, Computer Science (R0)