Recent Advances in Generative Information Retrieval

  • Conference paper

Abstract

Generative retrieval (GR) has become a highly active area of information retrieval that has witnessed significant growth recently. Compared to the traditional “index-retrieve-then-rank” pipeline, the GR paradigm aims to consolidate all information within a corpus into a single model. Typically, a sequence-to-sequence model is trained to directly map a query to its relevant document identifiers (i.e., docids). This tutorial offers an introduction to the core concepts of the novel GR paradigm and a comprehensive overview of recent advances in its foundations and applications. We start by providing preliminary information covering foundational aspects and problem formulations of GR. Then, our focus shifts towards recent progress in docid design, training approaches, inference strategies, and applications of GR. We end by outlining remaining challenges and issuing a call for future GR research. This tutorial is intended to be beneficial to both researchers and industry practitioners interested in developing novel GR solutions or applying them in real-world scenarios.
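The query-to-docid mapping described above can be illustrated with a minimal sketch. It assumes a toy corpus, fixed-length docids, and a word-overlap scorer standing in for a trained sequence-to-sequence decoder; all names (CORPUS, build_trie, toy_next_token_scores, constrained_beam_search) are hypothetical and not taken from the tutorial. Decoding is constrained by a prefix trie so that only docids present in the corpus can be generated, which is one common inference strategy and not necessarily the one the authors cover.

```python
from math import log

# Hypothetical toy corpus: docid token sequence -> document text.
CORPUS = {
    ("3", "1", "4"): "doc about neural ranking",
    ("3", "1", "5"): "doc about dense retrieval",
    ("2", "7", "1"): "doc about query logs",
}


def build_trie(docids):
    """Build a nested-dict prefix trie over docid token sequences."""
    root = {}
    for docid in docids:
        node = root
        for tok in docid:
            node = node.setdefault(tok, {})
    return root


def toy_next_token_scores(query, prefix, candidates, corpus):
    """Stand-in for a trained decoder (an assumption, not a real model).

    A candidate token is weighted by the best word overlap between the
    query and any document whose docid starts with prefix + (token,).
    Weights are normalised and returned as log-probabilities.
    """
    query_words = set(query.lower().split())
    weights = {}
    for tok in candidates:
        new_prefix = prefix + (tok,)
        overlap = max(
            len(query_words & set(text.lower().split()))
            for docid, text in corpus.items()
            if docid[: len(new_prefix)] == new_prefix
        )
        weights[tok] = 1 + overlap
    total = sum(weights.values())
    return {tok: log(w / total) for tok, w in weights.items()}


def constrained_beam_search(query, corpus, docid_len, beam_size=2):
    """Generate docids token by token, restricted to trie-valid prefixes."""
    trie = build_trie(corpus)
    beams = [((), 0.0, trie)]  # (prefix, cumulative log-prob, trie node)
    for _ in range(docid_len):
        expanded = []
        for prefix, score, node in beams:
            if not node:  # no valid continuation under this prefix
                continue
            scores = toy_next_token_scores(query, prefix, list(node), corpus)
            for tok, logp in scores.items():
                expanded.append((prefix + (tok,), score + logp, node[tok]))
        expanded.sort(key=lambda beam: beam[1], reverse=True)
        beams = expanded[:beam_size]
    return [(prefix, score) for prefix, score, _ in beams]


results = constrained_beam_search("dense retrieval", CORPUS, docid_len=3)
for docid, score in results:
    print(docid, round(score, 3), CORPUS[docid])
```

With this toy scorer, the top-ranked docid for the query "dense retrieval" is ("3", "1", "5"), and every returned sequence is a valid docid because the trie blocks all other continuations. A real GR system would replace the overlap heuristic with the decoder's learned next-token distribution.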

Author information

Authors and Affiliations

  1. CAS Key Lab of Network Data Science and Technology, ICT, CAS, Beijing, China

    Yubao Tang, Ruqing Zhang & Jiafeng Guo

  2. University of Chinese Academy of Sciences, Beijing, China

    Yubao Tang, Ruqing Zhang & Jiafeng Guo

  3. Leiden University, Leiden, The Netherlands

    Zhaochun Ren

  4. University of Amsterdam, Amsterdam, The Netherlands

    Maarten de Rijke

Authors
  1. Yubao Tang
  2. Ruqing Zhang
  3. Zhaochun Ren
  4. Jiafeng Guo
  5. Maarten de Rijke

Corresponding author

Correspondence to Yubao Tang.

Editor information

Editors and Affiliations

  1. Georgetown University, Washington, DC, USA

    Nazli Goharian

  2. University of Pisa, Pisa, Italy

    Nicola Tonellotto

  3. King's College London, London, UK

    Yulan He

  4. University College London, London, UK

    Aldo Lipani

  5. University of Glasgow, Glasgow, UK

    Graham McDonald

  6. University of Glasgow, Glasgow, UK

    Craig Macdonald

  7. University of Glasgow, Glasgow, UK

    Iadh Ounis

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tang, Y., Zhang, R., Ren, Z., Guo, J., de Rijke, M. (2024). Recent Advances in Generative Information Retrieval. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14612. Springer, Cham. https://doi.org/10.1007/978-3-031-56069-9_48
