HyDE: Hypothetical Document Embeddings 🤹‍♂️

HyDE, which stands for Hypothetical Document Embeddings, is an approach to precise zero-shot dense retrieval without relevance labels. It aims to improve similarity search, the core operation behind retrieval from vector stores. For each incoming query, the method generates a hypothetical document, embeds it, and uses that embedding to look up real documents that are similar to the hypothetical one.
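The sketch below illustrates the idea in plain, self-contained Python. It is not LangChain's or the paper's implementation. The corpus, the `fake_llm` stand-in for a real LLM, and the bag-of-words `embed` function (standing in for a dense encoder) are all hypothetical. Roughly following the averaging step described in the paper, `hyde_embed` averages the embeddings of one or more generated documents together with the query's own embedding.

```python
import math
import re
from collections import Counter

# Toy corpus of "real" documents (made-up data for illustration only).
CORPUS = [
    "Aspirin reduces fever and relieves mild pain by inhibiting prostaglandin synthesis.",
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "Photosynthesis converts sunlight, water and carbon dioxide into glucose and oxygen.",
]


def embed(text):
    """Toy embedding: a sparse term-count vector standing in for a dense encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fake_llm(query):
    """Hypothetical stand-in for an LLM asked to write a passage answering the query.
    It ignores the query and returns a canned answer; only its wording matters here."""
    return ("Aspirin (acetylsalicylic acid) relieves headache pain and reduces fever "
            "by inhibiting prostaglandin synthesis.")


def hyde_embed(query, generate, n=1):
    """Average the embeddings of n generated hypothetical documents and of the query."""
    vectors = [embed(generate(query)) for _ in range(n)] + [embed(query)]
    total = Counter()
    for vec in vectors:
        total.update(vec)  # element-wise sum of term counts
    return {term: count / len(vectors) for term, count in total.items()}


def search(query_vec, corpus, k=1):
    """Return the k corpus documents most similar to query_vec."""
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]


query = "What does acetylsalicylic acid do for a headache?"
print(search(embed(query), CORPUS))                 # plain query embedding: Eiffel Tower sentence
print(search(hyde_embed(query, fake_llm), CORPUS))  # HyDE embedding: aspirin sentence
```

Here the raw query shares only the word "for" with the corpus, so plain search surfaces an irrelevant document, while the hypothetical answer shares most of its vocabulary with the aspirin passage and pulls it to the top. A real setup would sample the LLM at a nonzero temperature so that the `n` hypothetical documents actually differ.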

Official paper: Precise Zero-Shot Dense Retrieval without Relevance Labels (Gao et al., 2022)

Here’s a code snippet for using HyDE with LangChain:

```python
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, HypotheticalDocumentEmbedder
from langchain.vectorstores import LanceDB

# set OPENAI_API_KEY as an environment variable before this step

# initialize the LLM and the base embedding function
llm = OpenAI()
base_embeddings = OpenAIEmbeddings()

# prompt template
prompt_template = """As a knowledgeable and helpful research assistant, your task is to provide informative answers based on the given context. Use your extensive knowledge base to offer clear, concise, and accurate responses to the user's inquiries.
If the question is not related to the documents, simply say you don't know.
Question: {question}
Answer:"""
prompt = PromptTemplate(input_variables=["question"], template=prompt_template)

# LLM chain (it must be defined before the HyDE embedder that uses it)
llm_chain = LLMChain(llm=llm, prompt=prompt)

# HyDE embedding: llm_chain writes a hypothetical answer, which base_embeddings embeds
embeddings = HypotheticalDocumentEmbedder(llm_chain=llm_chain, base_embeddings=base_embeddings)

# load dataset
# (`documents` is the list of documents to index and `table` is the LanceDB table to store them in)

# LanceDB vector store backed by the HyDE embeddings
retriever = LanceDB.from_documents(documents, embeddings, connection=table)

# vector search, then answer the user's question (`query`) with the LLM chain
docs = retriever.similarity_search(query)
llm_chain.run(query)
```

