Graph RAG
This guide provides an introduction to Graph RAG. For detailed documentation of all supported features and configurations, refer to the Graph RAG Project Page.
Overview
The GraphRetriever from the langchain-graph-retriever package provides a LangChain retriever that combines unstructured similarity search on vectors with structured traversal of metadata properties. This enables graph-based retrieval over an existing vector store.
Integration details
| Retriever | Source | PyPI Package | Project Page |
| --- | --- | --- | --- |
| GraphRetriever | github.com/datastax/graph-rag | langchain-graph-retriever | Graph RAG |
Benefits
- Link based on existing metadata: Use existing metadata fields without additional processing. Retrieve more from an existing vector store!
- Change links on demand: Edges can be specified on the fly, allowing different relationships to be traversed based on the question.
- Pluggable traversal strategies: Use built-in traversal strategies like Eager or MMR, or define custom logic to select which nodes to explore.
- Broad compatibility: Adapters are available for a variety of vector stores, with support for additional stores easily added.
Setup
Installation
This retriever lives in the langchain-graph-retriever package.
```shell
pip install -qU langchain-graph-retriever
```
Instantiation
The following examples will show how to perform graph traversal over some sample Documents about animals.
Prerequisites
Ensure you have Python 3.10+ installed.
Install the following package that provides sample data.
```shell
pip install -qU graph_rag_example_helpers
```
Download the test documents:
```python
from graph_rag_example_helpers.datasets.animals import fetch_documents

animals = fetch_documents()
```

Select an embeddings model:
```shell
pip install -qU langchain-openai
```
```python
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
```
Populating the Vector store
This section shows how to populate a variety of vector stores with the sample data.
For help on choosing one of the vector stores below, or to add support for your vector store, consult the documentation about Adapters and Supported Stores.
- AstraDB
- Apache Cassandra
- OpenSearch
- Chroma
- InMemory
Install the langchain-graph-retriever package with the astra extra:
```shell
pip install "langchain-graph-retriever[astra]"
```
Then create a vector store and load the test documents:
```python
from langchain_astradb import AstraDBVectorStore

vector_store = AstraDBVectorStore.from_documents(
    documents=animals,
    embedding=embeddings,
    collection_name="animals",
    api_endpoint=ASTRA_DB_API_ENDPOINT,
    token=ASTRA_DB_APPLICATION_TOKEN,
)
```
For the ASTRA_DB_API_ENDPOINT and ASTRA_DB_APPLICATION_TOKEN credentials, consult the AstraDB Vector Store Guide.
For faster initial testing, consider using the InMemory Vector Store.
Install the langchain-graph-retriever package with the cassandra extra:
```shell
pip install "langchain-graph-retriever[cassandra]"
```
Then create a vector store and load the test documents:
```python
from langchain_community.vectorstores.cassandra import Cassandra
from langchain_graph_retriever.transformers import ShreddingTransformer

vector_store = Cassandra.from_documents(
    documents=list(ShreddingTransformer().transform_documents(animals)),
    embedding=embeddings,
    table_name="animals",
)
```
For help creating a Cassandra connection, consult the Apache Cassandra Vector Store Guide.

Apache Cassandra doesn't support searching in nested metadata. Because of this, it is necessary to use the ShreddingTransformer when inserting documents.
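Conceptually, shredding flattens nested or list-valued metadata into flat, individually indexable keys that a store without nested-metadata search can still filter on. The sketch below illustrates the idea only; the `shred` helper and its key format are hypothetical, not the transformer's actual encoding:

```python
# Hypothetical sketch of metadata "shredding" (NOT the transformer's real
# encoding): flatten list- and dict-valued metadata into flat keys so a store
# without nested-metadata search can still filter on each value.

def shred(metadata: dict) -> dict:
    flat = {}
    for key, value in metadata.items():
        if isinstance(value, list):
            # one flat key per element so each element is individually searchable
            for item in value:
                flat[f"{key}.{item}"] = True
        elif isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat

print(shred({"habitat": "wetlands", "diet": ["plants", "insects"]}))
# {'habitat': 'wetlands', 'diet.plants': True, 'diet.insects': True}
```

The packaged ShreddingTransformer handles this round trip for you; the point here is only why flat keys are needed for stores like Cassandra.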
Install the langchain-graph-retriever package with the opensearch extra:
```shell
pip install "langchain-graph-retriever[opensearch]"
```
Then create a vector store and load the test documents:
```python
from langchain_community.vectorstores import OpenSearchVectorSearch

vector_store = OpenSearchVectorSearch.from_documents(
    documents=animals,
    embedding=embeddings,
    engine="faiss",
    index_name="animals",
    opensearch_url=OPEN_SEARCH_URL,
    bulk_size=500,
)
```
For help creating an OpenSearch connection, consult the OpenSearch Vector Store Guide.
Install the langchain-graph-retriever package with the chroma extra:
```shell
pip install "langchain-graph-retriever[chroma]"
```
Then create a vector store and load the test documents:
```python
from langchain_chroma.vectorstores import Chroma
from langchain_graph_retriever.transformers import ShreddingTransformer

vector_store = Chroma.from_documents(
    documents=list(ShreddingTransformer().transform_documents(animals)),
    embedding=embeddings,
    collection_name="animals",
)
```
For help creating a Chroma connection, consult the Chroma Vector Store Guide.

Chroma doesn't support searching in nested metadata. Because of this, it is necessary to use the ShreddingTransformer when inserting documents.
Install the langchain-graph-retriever package:
```shell
pip install "langchain-graph-retriever"
```
Then create a vector store and load the test documents:
```python
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore.from_documents(
    documents=animals,
    embedding=embeddings,
)
```
Using the InMemoryVectorStore is the fastest way to get started with Graph RAG, but it isn't recommended for production use. Instead, it is recommended to use AstraDB or OpenSearch.
Graph Traversal
This graph retriever starts with a single animal that best matches the query, then traverses to other animals sharing the same habitat and/or origin.
```python
from graph_retriever.strategies import Eager
from langchain_graph_retriever import GraphRetriever

traversal_retriever = GraphRetriever(
    store=vector_store,
    edges=[("habitat", "habitat"), ("origin", "origin")],
    strategy=Eager(k=5, start_k=1, max_depth=2),
)
```
The above creates a graph-traversing retriever that starts with the nearest animal (start_k=1), retrieves 5 documents (k=5), and limits the search to documents that are at most 2 steps away from the first animal (max_depth=2).
The edges define how metadata values can be used for traversal. In this case, every animal is connected to other animals with the same habitat and/or origin.
```python
results = traversal_retriever.invoke("what animals could be found near a capybara?")

for doc in results:
    print(f"{doc.id}: {doc.page_content}")
```
```text
capybara: capybaras are the largest rodents in the world and are highly social animals.
heron: herons are wading birds known for their long legs and necks, often seen near water.
crocodile: crocodiles are large reptiles with powerful jaws and a long lifespan, often living over 70 years.
frog: frogs are amphibians known for their jumping ability and croaking sounds.
duck: ducks are waterfowl birds known for their webbed feet and quacking sounds.
```
Graph traversal improves retrieval quality by leveraging structured relationships in the data. Unlike standard similarity search (see below), it provides a clear, explainable rationale for why documents are selected.
In this case, the documents capybara, heron, crocodile, frog, and duck all share the same habitat=wetlands, as defined by their metadata. This should increase document relevance and the quality of the answer from the LLM.
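The traversal mechanics can be sketched in plain Python. This is a toy illustration of the Eager parameters (k, start_k, max_depth) over invented documents and metadata; it is not the library's implementation, and start_k is fixed at 1:

```python
# Toy breadth-first traversal over shared metadata values (illustrative only;
# documents and metadata are invented, and start_k is fixed at 1).
from collections import deque

docs = {
    "capybara": {"habitat": "wetlands", "origin": "south america"},
    "heron": {"habitat": "wetlands", "origin": "north america"},
    "frog": {"habitat": "wetlands", "origin": "worldwide"},
    "lion": {"habitat": "savanna", "origin": "africa"},
}

def traverse(start: str, edges: list[str], k: int, max_depth: int) -> list[str]:
    selected: list[str] = []
    queue = deque([(start, 0)])  # (doc_id, depth); the start node is depth 0
    visited = {start}
    while queue and len(selected) < k:
        doc_id, depth = queue.popleft()
        selected.append(doc_id)
        if depth >= max_depth:
            continue  # don't expand past max_depth steps from the start
        for other, meta in docs.items():
            # an edge exists when any declared metadata field matches
            if other not in visited and any(
                meta.get(e) == docs[doc_id].get(e) for e in edges
            ):
                visited.add(other)
                queue.append((other, depth + 1))
    return selected

print(traverse("capybara", ["habitat", "origin"], k=3, max_depth=2))
# ['capybara', 'heron', 'frog']
```

Note that with max_depth=0 the same function returns only the start node, which previews the comparison to standard retrieval below.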
Comparison to Standard Retrieval
When max_depth=0, the graph-traversing retriever behaves like a standard retriever:
```python
standard_retriever = GraphRetriever(
    store=vector_store,
    edges=[("habitat", "habitat"), ("origin", "origin")],
    strategy=Eager(k=5, start_k=5, max_depth=0),
)
```
This creates a retriever that starts with the nearest 5 animals (start_k=5) and returns them without any traversal (max_depth=0). The edge definitions are ignored in this case.
This is essentially the same as:
```python
standard_retriever = vector_store.as_retriever(search_kwargs={"k": 5})
```
For either case, invoking the retriever returns:
```python
results = standard_retriever.invoke("what animals could be found near a capybara?")

for doc in results:
    print(f"{doc.id}: {doc.page_content}")
```
```text
capybara: capybaras are the largest rodents in the world and are highly social animals.
iguana: iguanas are large herbivorous lizards often found basking in trees and near water.
guinea pig: guinea pigs are small rodents often kept as pets due to their gentle and social nature.
hippopotamus: hippopotamuses are large semi-aquatic mammals known for their massive size and territorial behavior.
boar: boars are wild relatives of pigs, known for their tough hides and tusks.
```
These documents are joined based on similarity alone. Any structural data that existed in the store is ignored. Compared to graph retrieval, this can decrease document relevance because the returned results have a lower chance of being helpful for answering the query.
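The difference between the two modes can be made concrete with a small sketch. The similarity scores and habitats below are invented for illustration, and neither function is the library's code:

```python
# Contrast: similarity-only ranking vs. one traversal step from the best match.
# Scores and metadata are hypothetical.

docs = {
    # doc_id: (similarity score to the query, habitat)
    "capybara": (0.92, "wetlands"),
    "iguana": (0.84, "rainforest"),
    "guinea pig": (0.80, "grassland"),
    "heron": (0.55, "wetlands"),
    "frog": (0.50, "wetlands"),
}

def by_similarity(k: int) -> list[str]:
    # standard retrieval: top-k by score, metadata ignored
    return sorted(docs, key=lambda d: docs[d][0], reverse=True)[:k]

def by_traversal(k: int) -> list[str]:
    # graph retrieval with start_k=1: seed with the best match, then
    # follow the habitat edge to structurally related documents
    start = by_similarity(1)[0]
    habitat = docs[start][1]
    linked = [d for d in docs if docs[d][1] == habitat and d != start]
    return ([start] + linked)[:k]

print(by_similarity(3))  # ['capybara', 'iguana', 'guinea pig']
print(by_traversal(3))   # ['capybara', 'heron', 'frog']
```

Documents with middling similarity but the right structural relationship (heron, frog) displace high-similarity but unrelated documents (iguana, guinea pig), which is the behavior shown in the two result lists above.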
Usage
Following the examples above, .invoke is used to initiate retrieval on a query.
Use within a chain
Like other retrievers, GraphRetriever can be incorporated into LLM applications via chains.
```shell
pip install -qU "langchain[google-genai]"
```
```python
import getpass
import os

if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai")
```
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.
Context: {context}
Question: {question}"""
)


def format_docs(docs):
    return "\n\n".join(
        f"text: {doc.page_content} metadata: {doc.metadata}" for doc in docs
    )


chain = (
    {"context": traversal_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```
```python
chain.invoke("what animals could be found near a capybara?")
```

```text
Animals that could be found near a capybara include herons, crocodiles, frogs,
and ducks, as they all inhabit wetlands.
```
API reference
To explore all available parameters and advanced configurations, refer to the Graph RAG API reference.
Related
- Retriever conceptual guide
- Retriever how-to guides