
Vanilla RAG 🌱

RAG (Retrieval-Augmented Generation) works by retrieving documents related to the user's question, combining them with the question in a prompt for a large language model (LLM), and then using the LLM to generate a more accurate and relevant answer.

Here’s a simple guide to building a RAG pipeline from scratch:

  1. Data Loading: Gather and load the documents you want to use for answering questions.

  2. Chunking and Embedding: Split the documents into smaller chunks and convert them into numerical vectors (embeddings) that capture their meaning (see the loading-and-chunking sketch after this list).

  3. Vector Store: Create a LanceDB table to store and manage these vectors for quick access during retrieval.

  4. Retrieval & Prompt Preparation: When a question is asked, find the most relevant document chunks from the table and prepare a prompt combining these chunks with the question.

  5. Answer Generation: Send the prepared prompt to an LLM to generate a detailed and accurate answer (steps 4 and 5 are sketched after the table snippet below).
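
Steps 1 and 2 can be as simple as reading files and splitting them into overlapping character windows. The sketch below is a minimal, illustrative version: the `chunk_text` helper, the window sizes, and the `lease.txt` path are assumptions rather than part of the original tutorial. Its output is the `chunks` list consumed by the table snippet further down.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size character windows with overlap,
    so neighboring context is preserved across chunk boundaries."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Step 1: load the raw documents (the file path here is illustrative)
documents = [open("lease.txt").read()]

# Step 2: flatten all documents into one list of chunks; the embedding
# itself is handled by the table schema in the next snippet
chunks = [c for doc in documents for c in chunk_text(doc)]
```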



Here’s a code snippet for defining a table with the Embedding API, which simplifies the process by handling embedding extraction and querying in one step.

```python
import pandas as pd
import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

# Connect to a local LanceDB instance
db = lancedb.connect("/tmp/db")

# Load the sentence-transformers embedding model from the registry
model = get_registry().get("sentence-transformers").create(
    name="BAAI/bge-small-en-v1.5", device="cpu"
)

# The Embedding API computes the vector from the text field automatically
class Docs(LanceModel):
    text: str = model.SourceField()
    vector: Vector(model.ndims()) = model.VectorField()

table = db.create_table("docs", schema=Docs)

# considering chunks are in list format
df = pd.DataFrame({"text": chunks})
table.add(data=df)

query = "What is issue date of lease?"
# to_list() returns plain dicts, so index the text field by key
actual = table.search(query).limit(1).to_list()[0]
print(actual["text"])
```
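
With the table in place, steps 4 and 5 reduce to retrieving the top chunks and handing them to an LLM. Below is a minimal sketch assuming the `table` and `query` from the snippet above and an OpenAI chat model; the model name and prompt template are placeholders, and any chat-capable LLM works the same way.

```python
from openai import OpenAI

# Step 4: retrieve the top-k chunks and fold them into a grounded prompt
context = "\n\n".join(
    row["text"] for row in table.search(query).limit(3).to_list()
)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)

# Step 5: send the prompt to an LLM (the model name is a placeholder)
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```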

Check the Colab notebook for the complete code.


