Neo4j GraphRAG Package for Python

The official Neo4j GraphRAG package for Python enables developers to build graph retrieval augmented generation (GraphRAG) applications using the power of Neo4j and Python. As a first-party library, it offers a robust, feature-rich, and high-performance solution, with the added assurance of long-term support and maintenance directly from Neo4j.

📄 Documentation

Documentation can be found at neo4j.com/docs/neo4j-graphrag-python/current/.

Resources

A series of blog posts demonstrating how to use this package:

Build a Knowledge Graph and use GenAI to answer questions:
- GraphRAG Python Package: Accelerating GenAI With Knowledge Graphs
Retrievers: when the Neo4j graph is already populated:

A list of Neo4j GenAI-related features can also be found at Neo4j GenAI Ecosystem.

🐍 Python Version Support

Version	Supported?
3.13	✓
3.12	✓
3.11	✓
3.10	✓
3.9	✓
3.8	✗
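A script can guard against an unsupported interpreter before importing the package. The sketch below simply encodes the table above; the name `is_supported_python` is hypothetical and is not part of the package.

```python
import sys

# (major, minor) pairs marked as supported in the table above.
SUPPORTED_VERSIONS = {(3, 9), (3, 10), (3, 11), (3, 12), (3, 13)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the interpreter's (major, minor) pair is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


print(is_supported_python())
```

Passing a tuple such as `(3, 8)` lets you check a version other than the one currently running.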

📦 Installation

To install the latest stable version, run:

pip install neo4j-graphrag

Optional Dependencies

This package has some optional features that can be enabled using the extra dependencies described below:

LLM providers (at least one is required for RAG and KG Builder Pipeline):
- ollama: LLMs from Ollama
- openai: LLMs from OpenAI (including AzureOpenAI)
- google: LLMs from Vertex AI
- cohere: LLMs from Cohere
- anthropic: LLMs from Anthropic
- mistralai: LLMs from MistralAI
sentence-transformers: to use embeddings from the sentence-transformers Python package
Vector database (to use External Retrievers):
- weaviate: store vectors in Weaviate
- pinecone: store vectors in Pinecone
- qdrant: store vectors in Qdrant
experimental: experimental features mainly related to the Knowledge Graph creation pipelines.

Install package with optional dependencies with (for instance):

pip install "neo4j-graphrag[openai]"
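To see which optional integrations are importable in the current environment, a small check like the one below may help. It is only a sketch: the mapping from extra name to top-level module name is an assumption made for illustration (it is not read from the package metadata), and the `google` extra is left out because its module name is not given here.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to the top-level module it provides.
# These module names are illustrative guesses, not taken from the package.
EXTRA_MODULES = {
    "ollama": "ollama",
    "openai": "openai",
    "cohere": "cohere",
    "anthropic": "anthropic",
    "mistralai": "mistralai",
    "sentence-transformers": "sentence_transformers",
    "weaviate": "weaviate",
    "pinecone": "pinecone",
    "qdrant": "qdrant_client",
}


def available_extras(modules=EXTRA_MODULES):
    """Return {extra_name: bool} telling whether each module can be imported."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


for extra, installed in available_extras().items():
    print(f"{extra}: {'installed' if installed else 'missing'}")
```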

💻 Example Usage

The scripts below demonstrate how to get started with the package and make use of its key features. To run these examples, ensure that you have a Neo4j instance up and running and update the NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD variables in each script with the details of your Neo4j instance. For the examples, make sure to export your OpenAI key as an environment variable named OPENAI_API_KEY. Additional examples are available in the examples folder.
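If you would rather not edit each script, one option is to read the connection details from the environment as well. This is a sketch under that assumption: the examples in this README hard-code the NEO4J_* values, and `load_settings` is a hypothetical helper, not part of the package. The defaults mirror the values used in the scripts.

```python
import os


def load_settings(env=os.environ):
    """Collect Neo4j connection settings, falling back to the example defaults."""
    # Fail early if the OpenAI key has not been exported.
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "uri": env.get("NEO4J_URI", "neo4j://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", "password"),
    }
```

The returned values can then be passed to `GraphDatabase.driver(settings["uri"], auth=(settings["username"], settings["password"]))` in place of the module-level constants.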

Knowledge Graph Construction

NOTE: The APOC core library must be installed in your Neo4j instance in order to use this feature.

This package offers two methods for constructing a knowledge graph.

The Pipeline class provides extensive customization options, making it ideal for advanced use cases. See the examples/pipeline folder for examples of how to use this class.

For a more streamlined approach, the SimpleKGPipeline class offers a simplified abstraction layer over the Pipeline, making it easier to build knowledge graphs. Both classes support working directly with text and PDFs.

```python
import asyncio

from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm import OpenAILLM

NEO4J_URI = "neo4j://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "password"

# Connect to the Neo4j database
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

# List the entities and relations the LLM should look for in the text
node_types = ["Person", "House", "Planet"]
relationship_types = ["PARENT_OF", "HEIR_OF", "RULES"]
patterns = [
    ("Person", "PARENT_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet"),
]

# Create an Embedder object
embedder = OpenAIEmbeddings(model="text-embedding-3-large")

# Instantiate the LLM
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={
        "max_tokens": 2000,
        "response_format": {"type": "json_object"},
        "temperature": 0,
    },
)

# Instantiate the SimpleKGPipeline
kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=driver,
    embedder=embedder,
    schema={
        "node_types": node_types,
        "relationship_types": relationship_types,
        "patterns": patterns,
    },
    on_error="IGNORE",
    from_pdf=False,
)

# Run the pipeline on a piece of text
text = (
    "The son of Duke Leto Atreides and the Lady Jessica, Paul is the heir of House "
    "Atreides, an aristocratic family that rules the planet Caladan."
)
asyncio.run(kg_builder.run_async(text=text))
driver.close()
```

Warning: In order to run this code, the openai Python package needs to be installed: pip install "neo4j-graphrag[openai]"
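Each pattern in the schema is a (source, relationship, target) triple built from the node and relationship types listed before it. Before spending LLM calls, it can be useful to confirm that every pattern only refers to declared types. The helper below is a hypothetical, standalone sketch; it makes no claim about what validation the package performs internally.

```python
def check_schema(node_types, relationship_types, patterns):
    """Return a list of problems; an empty list means every pattern uses declared types."""
    problems = []
    for source, relation, target in patterns:
        if source not in node_types:
            problems.append(f"unknown source node type: {source}")
        if relation not in relationship_types:
            problems.append(f"unknown relationship type: {relation}")
        if target not in node_types:
            problems.append(f"unknown target node type: {target}")
    return problems


node_types = ["Person", "House", "Planet"]
relationship_types = ["PARENT_OF", "HEIR_OF", "RULES"]
patterns = [
    ("Person", "PARENT_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet"),
]

print(check_schema(node_types, relationship_types, patterns))  # prints []
```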

Example knowledge graph created using the above script:

Creating a Vector Index

When creating a vector index, make sure you match the number of dimensions in the index with the number of dimensions your embeddings have.

```python
from neo4j import GraphDatabase
from neo4j_graphrag.indexes import create_vector_index

NEO4J_URI = "neo4j://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "password"
INDEX_NAME = "vector-index-name"

# Connect to the Neo4j database
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

# Create the index
create_vector_index(
    driver,
    INDEX_NAME,
    label="Chunk",
    embedding_property="embedding",
    dimensions=3072,
    similarity_fn="euclidean",
)
driver.close()
```
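The index above is created with dimensions=3072, which is the default output size of the text-embedding-3-large model used elsewhere in these examples. A quick length check before writing vectors can catch a mismatch early. The function below is a hypothetical helper for illustration, not part of the package.

```python
def mismatched_vectors(vectors, dimensions):
    """Return the positions of vectors whose length differs from the index dimensions."""
    return [i for i, vector in enumerate(vectors) if len(vector) != dimensions]


INDEX_DIMENSIONS = 3072  # value passed to create_vector_index in the example above

# The second vector has the wrong length and is reported by position.
vectors = [[0.0] * 3072, [0.0] * 1536]
print(mismatched_vectors(vectors, INDEX_DIMENSIONS))  # prints [1]
```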

Populating a Vector Index

This example demonstrates one method for upserting data in your Neo4j database. It's important to note that there are alternative approaches, such as using the Neo4j Python driver.

Ensure that your vector index is created prior to executing this example.

```python
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.indexes import upsert_vectors
from neo4j_graphrag.types import EntityType

NEO4J_URI = "neo4j://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "password"

# Connect to the Neo4j database
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

# Create an Embedder object
embedder = OpenAIEmbeddings(model="text-embedding-3-large")

# Generate an embedding for some text
text = (
    "The son of Duke Leto Atreides and the Lady Jessica, Paul is the heir of House "
    "Atreides, an aristocratic family that rules the planet Caladan."
)
vector = embedder.embed_query(text)

# Upsert the vector
upsert_vectors(
    driver,
    ids=["1234"],
    embedding_property="vectorProperty",
    embeddings=[vector],
    entity_type=EntityType.NODE,
)
driver.close()
```

Performing a Similarity Search

Please note that when querying a Neo4j vector index, approximate nearest neighbor search is used, which may not always deliver exact results. For more information, refer to the Neo4j documentation on limitations and issues of vector indexes.

In the example below, we perform a simple vector search using a retriever that conducts a similarity search over the vector-index-name vector index.

This library provides more retrievers beyond just the VectorRetriever. See the examples folder for examples of how to use these retrievers.

Before running this example, make sure your vector index has been created and populated.

```python
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.retrievers import VectorRetriever

NEO4J_URI = "neo4j://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "password"
INDEX_NAME = "vector-index-name"

# Connect to the Neo4j database
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))

# Create an Embedder object
embedder = OpenAIEmbeddings(model="text-embedding-3-large")

# Initialize the retriever
retriever = VectorRetriever(driver, INDEX_NAME, embedder)

# Instantiate the LLM
llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})

# Instantiate the RAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)

# Query the graph
query_text = "Who is Paul Atreides?"
response = rag.search(query_text=query_text, retriever_config={"top_k": 5})
print(response.answer)
driver.close()
```
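To make the note about approximate nearest neighbor search more concrete, the sketch below shows what an exact search computes: it compares the query against every stored vector and keeps the k closest. This is a plain-Python illustration on toy data, not how Neo4j or this package implements search. On a small sample, a baseline like this could be used to see how often the approximate results agree with the exact ones.

```python
import math


def exact_top_k(query, vectors, k):
    """Return the ids of the k vectors closest to the query by Euclidean distance."""
    ranked = sorted(vectors.items(), key=lambda item: math.dist(query, item[1]))
    return [vector_id for vector_id, _ in ranked[:k]]


# Toy 2-dimensional data; real embeddings have far more dimensions.
vectors = {
    "a": [0.0, 0.0],
    "b": [1.0, 1.0],
    "c": [5.0, 5.0],
}

print(exact_top_k([0.9, 0.8], vectors, k=2))  # prints ['b', 'a']
```

The exact scan costs one distance computation per stored vector, which is why vector indexes trade some accuracy for speed on large collections.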

🤝 Contributing

You must sign the contributors license agreement in order to make contributions to this project.

Install Dependencies

Our Python dependencies are managed using Poetry. If Poetry is not yet installed on your system, you can follow the instructions here to set it up. To begin development on this project, start by cloning the repository and then install all necessary dependencies, including the development dependencies, with the following command:

poetry install --with dev

Reporting Issues

If you have a bug to report or feature to request, first search to see if an issue already exists. If a related issue doesn't exist, please raise a new issue using the issue form.

If you're a Neo4j Enterprise customer, you can also reach out to Customer Support.

If you don't have a bug to report or feature request, but you need a hand with the library, community support is available via Neo4j Online Community and/or Discord.

Workflow for Contributions

Fork the repository.
Install Python and Poetry.
Create a working branch from main and start with your changes!

Code Formatting and Linting

Our codebase follows strict formatting and linting standards using Ruff for code quality checks and Mypy for type checking. Before contributing, ensure that all code is properly formatted, free of linting issues, and includes accurate type annotations.

To install Ruff, follow the instructions here.
To set up Mypy, follow the steps outlined here.

Adherence to these standards is required for contributions to be accepted.

Using Pre-commit

We recommend setting up pre-commit to automate code quality checks. This ensures your changes meet our guidelines before committing.

Install pre-commit by following the installation guide.
Set up the pre-commit hooks by running:
```
pre-commit install
```
To manually check if a file meets the quality requirements, run:
```
pre-commit run --file path/to/file
```

Pull Requests

When you're finished with your changes, create a pull request (PR) using the following workflow.

Ensure you have formatted and linted your code.
Ensure that you have signed the CLA.
Ensure that the base of your PR is set to main.
Don't forget to link your PR to an issue if you are solving one.
Check the checkbox to allow maintainer edits so that maintainers can make any necessary tweaks and update your branch for merge.
Reviewers may ask for changes to be made before a PR can be merged, either using suggested changes or normal pull request comments. You can apply suggested changes directly through the UI. Any other changes can be made in your fork and committed to the PR branch.
As you update your PR and apply changes, mark each conversation as resolved.
Update the CHANGELOG.md if you have made significant changes to the project. These include:
- Major changes:
  - New features
  - Bug fixes with high impact
  - Breaking changes
- Minor changes:
  - Documentation improvements
  - Code refactoring without functional impact
  - Minor bug fixes
Keep CHANGELOG.md changes brief and focus on the most important changes.

Updating the `CHANGELOG.md`

You can automatically generate a changelog suggestion for your PR by commenting on it using CodiumAI:

@CodiumAI-Agent /update_changelog

Edit the suggestion if necessary and update the appropriate subsection in the CHANGELOG.md file under 'Next'.
Commit the changes.

🧪 Tests

To run all tests, all extra packages need to be installed. This is achieved by:

poetry install --all-extras

Unit Tests

Install the project dependencies then run the following command to run the unit tests locally:

poetry run pytest tests/unit

E2E tests

To execute end-to-end (e2e) tests, you need the following services to be running locally:

neo4j
weaviate
weaviate-text2vec-transformers

The simplest way to set these up is by using Docker Compose:

docker compose -f tests/e2e/docker-compose.yml up

(Tip: if you encounter any caching issues within the databases, you can completely remove them by running docker compose -f tests/e2e/docker-compose.yml down.)
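Docker Compose can return before the services are ready to accept connections, so a quick reachability check may save a confusing test failure. The sketch below only tests whether a TCP port accepts connections. Port 7687 is the Bolt port from the NEO4J_URI used in the examples; port 8080 for Weaviate is an assumption, so check tests/e2e/docker-compose.yml for the actual mappings.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 7687 comes from the example NEO4J_URI; 8080 for Weaviate is an assumption.
for name, port in {"neo4j": 7687, "weaviate": 8080}.items():
    print(f"{name}: {'up' if is_port_open('localhost', port) else 'down'}")
```

An open port does not guarantee that the database has finished starting, so treat a positive result as a necessary condition rather than a full health check.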

Once all the services are running, execute the following command to run the e2e tests: