
    Build a RAG agent with LangChain

    Overview

    One of the most powerful applications enabled by LLMs is the sophisticated question-answering (Q&A) chatbot: an application that can answer questions about specific source information using a technique known as Retrieval Augmented Generation, or RAG. This tutorial shows how to build a simple Q&A application over an unstructured text data source. We will demonstrate:
    1. A RAG agent that executes searches with a simple tool. This is a good general-purpose implementation.
    2. A two-step RAG chain that uses just a single LLM call per query. This is a fast and effective method for simple queries.

    Concepts

    We will cover the following concepts:
    • Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens in a separate process.
    • Retrieval and generation: the actual RAG process, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.
    Once we’ve indexed our data, we will use an agent as our orchestration framework to implement the retrieval and generation steps.
    The indexing portion of this tutorial will largely follow the semantic search tutorial. If your data is already available for search (i.e., you have a function to execute a search), or you’re comfortable with the content from that tutorial, feel free to skip to the section on retrieval and generation.

    Preview

    In this guide we’ll build an app that answers questions about a website’s content. The specific website we will use is the LLM Powered Autonomous Agents blog post by Lilian Weng, which allows us to ask questions about the contents of the post. We can create a simple indexing pipeline and RAG chain to do this in ~40 lines of code. See below for the full code snippet:
    import bs4
    from langchain.agents import create_agent
    from langchain.tools import tool
    from langchain_community.document_loaders import WebBaseLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Load and chunk contents of the blog
    loader = WebBaseLoader(
        web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
        bs_kwargs=dict(
            parse_only=bs4.SoupStrainer(
                class_=("post-content", "post-title", "post-header")
            )
        ),
    )
    docs = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    all_splits = text_splitter.split_documents(docs)

    # Index chunks
    _ = vector_store.add_documents(documents=all_splits)

    # Construct a tool for retrieving context
    @tool(response_format="content_and_artifact")
    def retrieve_context(query: str):
        """Retrieve information to help answer a query."""
        retrieved_docs = vector_store.similarity_search(query, k=2)
        serialized = "\n\n".join(
            f"Source: {doc.metadata}\nContent: {doc.page_content}"
            for doc in retrieved_docs
        )
        return serialized, retrieved_docs

    tools = [retrieve_context]

    # If desired, specify custom instructions
    prompt = (
        "You have access to a tool that retrieves context from a blog post. "
        "Use the tool to help answer user queries."
    )
    agent = create_agent(model, tools, system_prompt=prompt)
    query= "What is task decomposition?"for stepin agent.stream(    {"messages": [{"role":"user","content": query}]},    stream_mode="values",):    step["messages"][-1].pretty_print()
    ================================ Human Message =================================

    What is task decomposition?
    ================================== Ai Message ==================================
    Tool Calls:
      retrieve_context (call_xTkJr8njRY0geNz43ZvGkX0R)
     Call ID: call_xTkJr8njRY0geNz43ZvGkX0R
      Args:
        query: task decomposition
    ================================= Tool Message =================================
    Name: retrieve_context

    Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
    Content: Task decomposition can be done by...

    Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
    Content: Component One: Planning...
    ================================== Ai Message ==================================

    Task decomposition refers to...
    Check out the LangSmith trace.

    Setup

    Installation

    This tutorial requires these langchain dependencies:
    pip install langchain langchain-text-splitters langchain-community bs4
    For more details, see our Installation guide.

    LangSmith

    Many of the applications you build with LangChain will contain multiple steps with multiple LLM calls. As these applications get more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith. After you sign up at the link above, make sure to set your environment variables to start logging traces:
    export LANGSMITH_TRACING="true"
    export LANGSMITH_API_KEY="..."
    Or, set them in Python:
    import getpass
    import os

    os.environ["LANGSMITH_TRACING"] = "true"
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

    Components

    We will need to select three components from LangChain’s suite of integrations. Select a chat model:
    • OpenAI
    • Anthropic
    • Azure
    • Google Gemini
    • AWS Bedrock
    👉 Read the OpenAI chat model integration docs
    pip install -U "langchain[openai]"
    import os
    from langchain.chat_models import init_chat_model

    os.environ["OPENAI_API_KEY"] = "sk-..."

    model = init_chat_model("gpt-4.1")
    Select an embeddings model:
    • OpenAI
    • Azure
    • Google Gemini
    • Google Vertex
    • AWS
    • HuggingFace
    • Ollama
    • Cohere
    • MistralAI
    • Nomic
    • NVIDIA
    • Voyage AI
    • IBM watsonx
    • Fake
    pip install -U "langchain-openai"
    import getpass
    import os

    if not os.environ.get("OPENAI_API_KEY"):
        os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

    from langchain_openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
    Select a vector store:
    • In-memory
    • AstraDB
    • Chroma
    • FAISS
    • Milvus
    • MongoDB
    • PGVector
    • PGVectorStore
    • Pinecone
    • Qdrant
    pip install -U "langchain-core"
    from langchain_core.vectorstores import InMemoryVectorStore

    vector_store = InMemoryVectorStore(embeddings)

    1. Indexing

    This section is an abbreviated version of the content in the semantic search tutorial. If your data is already indexed and available for search (i.e., you have a function to execute a search), or if you’re comfortable with document loaders, embeddings, and vector stores, feel free to skip to the next section on retrieval and generation.
    Indexing commonly works as follows:
    1. Load: First we need to load our data. This is done with Document Loaders.
    2. Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it into a model, since large chunks are harder to search over and won’t fit in a model’s finite context window.
    3. Store: We need somewhere to store and index our splits so that they can be searched over later. This is often done using a VectorStore and an Embeddings model.
    (Diagram of the indexing pipeline: load, split, store.)

    Loading documents

    We first need to load the blog post contents. We can use DocumentLoaders for this, which are objects that load in data from a source and return a list of Document objects. In this case we’ll use the WebBaseLoader, which uses urllib to load HTML from web URLs and BeautifulSoup to parse it to text. We can customize the HTML -> text parsing by passing parameters into the BeautifulSoup parser via bs_kwargs (see the BeautifulSoup docs). In this case only HTML tags with class “post-content”, “post-title”, or “post-header” are relevant, so we’ll remove all others.
    import bs4
    from langchain_community.document_loaders import WebBaseLoader

    # Only keep post title, headers, and content from the full HTML.
    bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
    loader = WebBaseLoader(
        web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
        bs_kwargs={"parse_only": bs4_strainer},
    )
    docs = loader.load()

    assert len(docs) == 1
    print(f"Total characters: {len(docs[0].page_content)}")
    Total characters: 43131
    print(docs[0].page_content[:500])
    LLM Powered Autonomous Agents

    Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng

    Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.

    Agent System Overview
    In
    Go deeper: DocumentLoader is an object that loads data from a source and returns a list of Documents.

    Splitting documents

    Our loaded document is over 42k characters, which is too long to fit into the context window of many models. And even models that could fit the full post in their context window can struggle to find information in very long inputs. To handle this we’ll split the Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant parts of the blog post at run time. As in the semantic search tutorial, we use a RecursiveCharacterTextSplitter, which recursively splits the document using common separators like new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,  # chunk size (characters)
        chunk_overlap=200,  # chunk overlap (characters)
        add_start_index=True,  # track index in original document
    )
    all_splits = text_splitter.split_documents(docs)

    print(f"Split blog post into {len(all_splits)} sub-documents.")
    Split blog post into 66 sub-documents.
    Go deeper: TextSplitter is an object that splits a list of Document objects into smaller chunks for storage and retrieval.
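    Because we set add_start_index=True, each chunk’s metadata records the character offset at which it began in the source document. As a quick check (our addition, not part of the original tutorial; exact values will vary by run):
    # Expect a 'source' URL and a 'start_index' offset in each chunk's metadata.
    print(all_splits[0].metadata)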

    Storing documents

    Now we need to index our 66 text chunks so that we can search over them at runtime. Following the semantic search tutorial, our approach is to embed the contents of each document split and insert these embeddings into a vector store. Given an input query, we can then use vector search to retrieve relevant documents. We can embed and store all of our document splits in a single command using the vector store and embeddings model selected at the start of the tutorial.
    document_ids = vector_store.add_documents(documents=all_splits)

    print(document_ids[:3])
    ['07c18af6-ad58-479a-bfb1-d508033f9c64', '9000bf8e-1993-446f-8d4d-f4e507ba4b8f', 'ba3b5d14-bed9-4f5f-88be-44c88aedc2e6']
    Go deeper: Embeddings is a wrapper around a text embedding model, used for converting text to embeddings; VectorStore is a wrapper around a vector database, used for storing and querying embeddings.
    This completes the Indexing portion of the pipeline. At this point we have a queryable vector store containing the chunked contents of our blog post. Given a user question, we should ideally be able to return the snippets of the blog post that answer the question.
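    As a quick sanity check (this snippet is our addition, not part of the original tutorial), we can query the store directly and confirm that relevant chunks come back:
    # Search the vector store directly and inspect the top result.
    results = vector_store.similarity_search("What is task decomposition?")
    print(results[0].page_content[:200])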

    2. Retrieval and Generation

    RAG applications commonly work as follows:
    1. Retrieve: Given a user input, relevant splits are retrieved from storage using a Retriever.
    2. Generate: A model produces an answer using a prompt that includes both the question and the retrieved data.
    (Diagram of the retrieval and generation flow.)
    Now let’s write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer. We will demonstrate:
    1. A RAG agent that executes searches with a simple tool. This is a good general-purpose implementation.
    2. A two-step RAG chain that uses just a single LLM call per query. This is a fast and effective method for simple queries.

    RAG agents

    One formulation of a RAG application is as a simple agent with a tool that retrieves information. We can assemble a minimal RAG agent by implementing a tool that wraps our vector store:
    from langchain.tools import tool


    @tool(response_format="content_and_artifact")
    def retrieve_context(query: str):
        """Retrieve information to help answer a query."""
        retrieved_docs = vector_store.similarity_search(query, k=2)
        serialized = "\n\n".join(
            f"Source: {doc.metadata}\nContent: {doc.page_content}"
            for doc in retrieved_docs
        )
        return serialized, retrieved_docs
    Here we use the tool decorator to configure the tool to attach raw documents as artifacts to each ToolMessage. This lets us access document metadata in our application, separate from the stringified representation that is sent to the model.
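    For example, here is a minimal sketch (our addition, not from the original tutorial) of reading an artifact back: invoking the tool with a ToolCall dict makes it return a ToolMessage, whose artifact field holds the raw Documents.
    # Invoke the tool with a ToolCall dict so it returns a ToolMessage.
    message = retrieve_context.invoke(
        {
            "name": "retrieve_context",
            "args": {"query": "task decomposition"},
            "id": "call_1",  # placeholder ID for illustration
            "type": "tool_call",
        }
    )
    print(message.content[:100])  # stringified text that would be sent to the model
    print([doc.metadata for doc in message.artifact])  # raw Documents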
    Retrieval tools are not limited to a single string query argument, as in the above example. You can force the LLM to specify additional search parameters by adding arguments, for example a category:
    from typing import Literal


    def retrieve_context(query: str, section: Literal["beginning", "middle", "end"]):
        ...
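    Purely for illustration, here is a hedged sketch of how such an argument might be wired into the search. It assumes each chunk’s metadata carries a hypothetical "section" tag added at indexing time, and that the vector store accepts a callable filter (InMemoryVectorStore does; other stores have their own filter syntax). The rest of the tutorial continues with the single-argument tool above.
    from typing import Literal

    from langchain.tools import tool


    @tool(response_format="content_and_artifact")
    def retrieve_context(query: str, section: Literal["beginning", "middle", "end"]):
        """Retrieve information from a given section of the post."""
        retrieved_docs = vector_store.similarity_search(
            query,
            k=2,
            # Hypothetical: assumes chunks were tagged with a "section" field.
            filter=lambda doc: doc.metadata.get("section") == section,
        )
        serialized = "\n\n".join(
            f"Source: {doc.metadata}\nContent: {doc.page_content}"
            for doc in retrieved_docs
        )
        return serialized, retrieved_docs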
    Given our tool, we can construct the agent:
    from langchain.agents import create_agent

    tools = [retrieve_context]

    # If desired, specify custom instructions
    prompt = (
        "You have access to a tool that retrieves context from a blog post. "
        "Use the tool to help answer user queries."
    )
    agent = create_agent(model, tools, system_prompt=prompt)
    Let’s test this out. We construct a question that would typically require an iterative sequence of retrieval steps to answer:
    query = (
        "What is the standard method for Task Decomposition?\n\n"
        "Once you get the answer, look up common extensions of that method."
    )

    for event in agent.stream(
        {"messages": [{"role": "user", "content": query}]},
        stream_mode="values",
    ):
        event["messages"][-1].pretty_print()
    ================================ Human Message =================================

    What is the standard method for Task Decomposition?

    Once you get the answer, look up common extensions of that method.
    ================================== Ai Message ==================================
    Tool Calls:
      retrieve_context (call_d6AVxICMPQYwAKj9lgH4E337)
     Call ID: call_d6AVxICMPQYwAKj9lgH4E337
      Args:
        query: standard method for Task Decomposition
    ================================= Tool Message =================================
    Name: retrieve_context

    Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
    Content: Task decomposition can be done...

    Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
    Content: Component One: Planning...
    ================================== Ai Message ==================================
    Tool Calls:
      retrieve_context (call_0dbMOw7266jvETbXWn4JqWpR)
     Call ID: call_0dbMOw7266jvETbXWn4JqWpR
      Args:
        query: common extensions of the standard method for Task Decomposition
    ================================= Tool Message =================================
    Name: retrieve_context

    Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
    Content: Task decomposition can be done...

    Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
    Content: Component One: Planning...
    ================================== Ai Message ==================================

    The standard method for Task Decomposition often used is the Chain of Thought (CoT)...
    Note that the agent:
    1. Generates a query to search for a standard method for task decomposition;
    2. After receiving the answer, generates a second query to search for common extensions of it;
    3. Having received all necessary context, answers the question.
    We can see the full sequence of steps, along with latency and other metadata, in the LangSmith trace.
    You can add a deeper level of control and customization by using the LangGraph framework directly. For example, you can add steps to grade document relevance and rewrite search queries. Check out LangGraph’s Agentic RAG tutorial for more advanced formulations.

    RAG chains

    In the above agentic RAG formulation we allow the LLM to use its discretion in generating a tool call to help answer user queries. This is a good general-purpose solution, but it comes with some trade-offs:
    ✅ Benefits:
    • Search only when needed: the LLM can handle greetings, follow-ups, and simple queries without triggering unnecessary searches.
    • Contextual search queries: by treating search as a tool with a query input, the LLM crafts its own queries that incorporate conversational context.
    • Multiple searches allowed: the LLM can execute several searches in support of a single user query.
    ⚠️ Drawbacks:
    • Two inference calls: when a search is performed, it requires one call to generate the query and another to produce the final response.
    • Reduced control: the LLM may skip searches when they are actually needed, or issue extra searches when unnecessary.
    Another common approach is a two-step chain, in which we always run a search (potentially using the raw user query) and incorporate the result as context for a single LLM query. This results in a single inference call per query, buying reduced latency at the expense of flexibility. In this approach we no longer call the model in a loop, but instead make a single pass. We can implement this chain by removing tools from the agent and instead incorporating the retrieval step into a custom prompt:
    from langchain.agents.middleware import dynamic_prompt, ModelRequest


    @dynamic_prompt
    def prompt_with_context(request: ModelRequest) -> str:
        """Inject context into state messages."""
        last_query = request.state["messages"][-1].text
        retrieved_docs = vector_store.similarity_search(last_query)
        docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)
        system_message = (
            "You are a helpful assistant. Use the following context in your response:"
            f"\n\n{docs_content}"
        )
        return system_message


    agent = create_agent(model, tools=[], middleware=[prompt_with_context])
    Let’s try this out:
    query= "What is task decomposition?"for stepin agent.stream(    {"messages": [{"role":"user","content": query}]},    stream_mode="values",):    step["messages"][-1].pretty_print()
    ================================ Human Message =================================

    What is task decomposition?
    ================================== Ai Message ==================================

    Task decomposition is...
    In the LangSmith trace we can see the retrieved context incorporated into the model prompt. This is a fast and effective method for simple queries in constrained settings, where we typically do want to run user queries through semantic search to pull additional context.
    The above RAG chain incorporates retrieved context into a single system message for that run. As in the agentic RAG formulation, we sometimes want to include raw source documents in the application state to have access to document metadata. We can do this for the two-step chain case by:
    1. Adding a key to the state to store the retrieved documents.
    2. Adding a new node via a pre-model hook to populate that key (as well as inject the context).
    from typing import Any

    from langchain.agents.middleware import AgentMiddleware, AgentState
    from langchain_core.documents import Document


    class State(AgentState):
        context: list[Document]


    class RetrieveDocumentsMiddleware(AgentMiddleware[State]):
        state_schema = State

        def before_model(self, state: AgentState) -> dict[str, Any] | None:
            last_message = state["messages"][-1]
            retrieved_docs = vector_store.similarity_search(last_message.text)
            docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)
            augmented_message_content = (
                f"{last_message.text}\n\n"
                "Use the following context to answer the query:\n"
                f"{docs_content}"
            )
            return {
                "messages": [
                    last_message.model_copy(update={"content": augmented_message_content})
                ],
                "context": retrieved_docs,
            }


    agent = create_agent(
        model,
        tools=[],
        middleware=[RetrieveDocumentsMiddleware()],
    )
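    A brief usage sketch (our addition, not part of the original tutorial): after a run, the retrieved documents are available under the state key added by the middleware.
    result = agent.invoke(
        {"messages": [{"role": "user", "content": "What is task decomposition?"}]}
    )
    # The raw Documents retrieved for this run, populated by before_model:
    for doc in result["context"]:
        print(doc.metadata)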

    Next steps

    Now that we’ve implemented a simple RAG application via create_agent, we can easily incorporate new features and go deeper.
