Recent Advances in Generative Information Retrieval

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14612)

Included in the following conference series: European Conference on Information Retrieval

Abstract

Generative retrieval (GR) has become a highly active area of information retrieval and has grown significantly in recent years. In contrast to the traditional “index-retrieve-then-rank” pipeline, the GR paradigm aims to consolidate all information within a corpus into a single model. Typically, a sequence-to-sequence model is trained to map a query directly to the identifiers of its relevant documents (docids). This tutorial introduces the core concepts of the GR paradigm and gives a comprehensive overview of recent advances in its foundations and applications. We start with preliminaries covering the foundational aspects and problem formulations of GR. We then turn to recent progress in docid design, training approaches, inference strategies, and applications of GR. We end by outlining remaining challenges and issuing a call for future GR research. The tutorial is intended for both researchers and industry practitioners interested in developing novel GR solutions or applying them in real-world scenarios.
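To make the query-to-docid mapping concrete, the following is a minimal sketch of one common way to run inference in a GR system: beam search constrained by a prefix trie, so that only docids that exist in the corpus can be generated. The abstract does not prescribe this procedure, so everything here is an assumption for illustration. The names `build_trie`, `allowed_tokens`, `constrained_beam_search`, and `toy_score` are hypothetical, the three docids are made up, and `toy_score` is a hand-written stand-in for the per-token log-probabilities that a trained sequence-to-sequence model would supply.

```python
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "<eos>"  # hypothetical end-of-docid marker
Trie = Dict[str, "Trie"]


def build_trie(docids: Sequence[Sequence[str]]) -> Trie:
    """Store every valid docid (a token sequence) in a nested-dict prefix trie."""
    root: Trie = {}
    for docid in docids:
        node = root
        for tok in list(docid) + [EOS]:
            node = node.setdefault(tok, {})
    return root


def allowed_tokens(trie: Trie, prefix: Sequence[str]) -> List[str]:
    """Return the tokens that may follow `prefix` without leaving the trie."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return []
    return sorted(node)


def constrained_beam_search(
    query: str,
    trie: Trie,
    score_fn: Callable[[str, Tuple[str, ...], str], float],
    beam_size: int = 2,
    max_len: int = 8,
) -> List[Tuple[Tuple[str, ...], float]]:
    """Beam search over docid tokens, expanding only trie-valid continuations."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok in allowed_tokens(trie, prefix):
                new_score = score + score_fn(query, prefix, tok)
                if tok == EOS:
                    finished.append((prefix, new_score))
                else:
                    candidates.append((prefix + (tok,), new_score))
        if not candidates:
            break
        # Higher score first; ties broken by token order for determinism.
        candidates.sort(key=lambda c: (-c[1], c[0]))
        beams = candidates[:beam_size]
    finished.sort(key=lambda c: (-c[1], c[0]))
    return finished[:beam_size]


def toy_score(query: str, prefix: Tuple[str, ...], tok: str) -> float:
    """Stand-in for a model's log-probability: reward tokens found in the query."""
    if tok == EOS:
        return 0.0
    return 0.0 if tok in query.lower().split() else -1.0


# Made-up docids written as short semantic paths ending in a document label.
docids = [
    ("sports", "tennis", "d1"),
    ("sports", "golf", "d2"),
    ("science", "physics", "d3"),
]
trie = build_trie(docids)
results = constrained_beam_search("tennis rules", trie, toy_score)
print(results)
```

Because every expansion step consults the trie, each returned sequence is guaranteed to be one of the stored docids. In a real system the scoring function would come from the trained model, and the docid vocabulary would follow whichever docid design the system adopts.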


Notes

1. https://ecir2024-generative-ir.github.io


Author information

Authors and Affiliations

CAS Key Lab of Network Data Science and Technology, ICT, CAS, Beijing, China
Yubao Tang, Ruqing Zhang & Jiafeng Guo
University of Chinese Academy of Sciences, Beijing, China
Yubao Tang, Ruqing Zhang & Jiafeng Guo
Leiden University, Leiden, The Netherlands
Zhaochun Ren
University of Amsterdam, Amsterdam, The Netherlands
Maarten de Rijke

Corresponding author

Correspondence to Yubao Tang.

Editor information

Editors and Affiliations

Georgetown University, Washington, DC, USA
Nazli Goharian
University of Pisa, Pisa, Italy
Nicola Tonellotto
King's College London, London, UK
Yulan He
University College London, London, UK
Aldo Lipani
University of Glasgow, Glasgow, UK
Graham McDonald
University of Glasgow, Glasgow, UK
Craig Macdonald
University of Glasgow, Glasgow, UK
Iadh Ounis


Cite this paper

Tang, Y., Zhang, R., Ren, Z., Guo, J., de Rijke, M. (2024). Recent Advances in Generative Information Retrieval. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14612. Springer, Cham. https://doi.org/10.1007/978-3-031-56069-9_48

DOI: https://doi.org/10.1007/978-3-031-56069-9_48
Published: 23 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56068-2
Online ISBN: 978-3-031-56069-9
eBook Packages: Computer Science, Computer Science (R0)
