neo4j-labs/llm-graph-builder

Neo4j graph construction from unstructured data using LLMs

Python · FastAPI · React

Transform unstructured data (PDFs, DOCs, TXT, YouTube videos, web pages, etc.) into a structured Knowledge Graph stored in Neo4j using the power of Large Language Models (LLMs) and the LangChain framework.

This application allows you to upload files from various sources (local machine, GCS, S3 bucket, or web sources), choose your preferred LLM model, and generate a Knowledge Graph.


Key Features

Knowledge Graph Creation

  • Seamlessly transform unstructured data into structured Knowledge Graphs using advanced LLMs.
  • Extract nodes, relationships, and their properties to create structured graphs.

Schema Support

  • Use a custom schema or existing schemas configured in the settings to generate graphs.

Graph Visualization

  • View graphs for specific or multiple data sources simultaneously in Neo4j Bloom.

Chat with Data

  • Interact with your data in the Neo4j database through conversational queries.
  • Retrieve metadata about the source of responses to your queries.
  • For a dedicated chat interface, use the standalone chat application at the /chat-only route (see the example below).
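
For example, the standalone chat UI is reached by appending the route to wherever the frontend is served (host and port below are placeholders, not project defaults):

    http://<frontend-host>:<port>/chat-only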

LLMs Supported

  1. OpenAI
  2. Gemini
  3. Diffbot
  4. Azure OpenAI (dev deployed version)
  5. Anthropic (dev deployed version)
  6. Fireworks (dev deployed version)
  7. Groq (dev deployed version)
  8. Amazon Bedrock (dev deployed version)
  9. Ollama (dev deployed version)
  10. Deepseek (dev deployed version)
  11. Other OpenAI-compatible base-URL models (dev deployed version)

Getting Started

Prerequisites

  • Neo4j Database 5.23 or later with APOC installed (see the check below).
    • Neo4j Aura databases (including the free tier) are supported.
    • If using Neo4j Desktop, you will need to deploy the backend and frontend separately (docker-compose is not supported).
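
One quick way to verify the APOC requirement, assuming cypher-shell is installed and the placeholders are replaced with your own connection details (a sketch, not part of the project's tooling):

    # prints the installed APOC version; an error here usually means APOC is not installed
    cypher-shell -a <your-neo4j-uri> -u <your-username> -p <your-password> "RETURN apoc.version();"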

Deployment Options

Local Deployment

Using Docker-Compose

Run the application using the default docker-compose configuration.

  1. Supported LLM Models:

    • By default, only OpenAI and Diffbot are enabled. Gemini requires additional GCP configurations.
    • Use the VITE_LLM_MODELS_PROD variable to configure the models you need. Example:
      VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
  2. Input Sources:

    • By default, the following sources are enabled: local, YouTube, Wikipedia, AWS S3, and web.
    • To add Google Cloud Storage (GCS) integration, include gcs and your Google client ID (both settings are combined in the sketch after this list):
      VITE_REACT_APP_SOURCES="local,youtube,wiki,s3,gcs,web"
      VITE_GOOGLE_CLIENT_ID="your-google-client-id"
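
Putting the two settings together, a minimal sketch of the variables as docker-compose might read them (placement in a root-level .env file is an assumption; adapt to your setup):

    # hypothetical .env fragment combining the examples above
    VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
    VITE_REACT_APP_SOURCES="local,youtube,wiki,s3,gcs,web"
    VITE_GOOGLE_CLIENT_ID="your-google-client-id"

    # then start the stack
    docker-compose up --build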

Chat Modes

Configure chat modes using the VITE_CHAT_MODES variable:

  • By default, all modes are enabled: vector, graph_vector, graph, fulltext, graph_vector_fulltext, entity_vector, and global_vector.
  • To enable only specific modes, update the variable. For example:
    VITE_CHAT_MODES="vector,graph"

Running Backend and Frontend Separately

For development, you can run the backend and frontend independently.

Frontend Setup

  1. Create the .env file in the frontend folder by copying frontend/example.env.
  2. Update environment variables as needed.
  3. Run:
    cd frontend
    yarn
    yarn run dev

Backend Setup

  1. Create the .env file in the backend folder by copying backend/example.env.
  2. Preconfigure user credentials in the .env file to bypass the login dialog:
    NEO4J_URI=<your-neo4j-uri>
    NEO4J_USERNAME=<your-username>
    NEO4J_PASSWORD=<your-password>
    NEO4J_DATABASE=<your-database-name>
  3. Run:
    cd backend
    python -m venv envName
    source envName/bin/activate
    pip install -r requirements.txt
    uvicorn score:app --reload
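
On Windows the activation step differs; a commonly used equivalent for the same envName virtual environment:

    # Windows (cmd); PowerShell uses envName\Scripts\Activate.ps1 instead
    envName\Scripts\activate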

Cloud Deployment

Deploy the application on Google Cloud Platform using the following commands:

Frontend Deployment

gcloud run deploy dev-frontend \
  --source . \
  --region us-central1 \
  --allow-unauthenticated

Backend Deployment

gcloud run deploy dev-backend \
  --set-env-vars "OPENAI_API_KEY=<your-openai-api-key>" \
  --set-env-vars "DIFFBOT_API_KEY=<your-diffbot-api-key>" \
  --set-env-vars "NEO4J_URI=<your-neo4j-uri>" \
  --set-env-vars "NEO4J_USERNAME=<your-username>" \
  --set-env-vars "NEO4J_PASSWORD=<your-password>" \
  --source . \
  --region us-central1 \
  --allow-unauthenticated

For local LLMs (Ollama)

  1. Pull the Ollama docker image:
    docker pull ollama/ollama
  2. Run the Ollama docker image:
    docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
  3. Run any LLM model, e.g. llama3:
    docker exec -it ollama ollama run llama3
  4. Configure the env variable in docker-compose:
    LLM_MODEL_CONFIG_ollama_<model_name>
    # example
    LLM_MODEL_CONFIG_ollama_llama3=${LLM_MODEL_CONFIG_ollama_llama3-llama3,http://host.docker.internal:11434}
  5. Configure the backend API URL:
    VITE_BACKEND_API_URL=${VITE_BACKEND_API_URL-backendurl}
  6. Open the application in the browser and select the Ollama model for the extraction (a quick connectivity check follows this list).
  7. Enjoy Graph Building!
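
Before selecting the Ollama model in the UI, it can help to confirm the container is actually serving. Ollama exposes a small HTTP API on port 11434, so listing the pulled models is a quick sanity check:

    # should return a JSON list of locally available models (e.g. llama3);
    # an empty "models" array means the pull/run steps did not complete
    curl http://localhost:11434/api/tags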

Usage

  1. Connect to a Neo4j Aura instance (either AuraDS or AuraDB) by passing the URI and password through the backend env, filling in the login dialog, or dragging and dropping the Neo4j credentials file.
  2. To differentiate the two, different icons are shown right under the Neo4j Connection details label: a database icon for AuraDB and a scientific-molecule icon for AuraDS.
  3. Choose your source from the list of unstructured sources to create a graph.
  4. If required, change the LLM that will be used to generate the graph from the drop-down.
  5. Optionally, define a schema (node and relationship labels) in the entity graph extraction settings.
  6. Either select specific files and click 'Generate Graph', or let all files in 'New' status be processed for graph creation.
  7. Inspect the graph for individual files using 'View' in the grid, or select one or more files and click 'Preview Graph'.
  8. Ask the chatbot questions about the processed/completed sources, and get detailed information about how your answers were generated by the LLM.

ENV

Backend ENV

| Env Variable Name | Mandatory/Optional | Default Value | Description |
|---|---|---|---|
| OPENAI_API_KEY | Mandatory | | OpenAI API key required to authenticate and track requests when using OpenAI LLM models |
| DIFFBOT_API_KEY | Mandatory | | API key required to use Diffbot's NLP service to extract entities and relationships from unstructured data |
| BUCKET | Mandatory | | Bucket name used to store uploaded files on GCS |
| NEO4J_USER_AGENT | Optional | llm-graph-builder | Name of the user agent used to track Neo4j database activity |
| ENABLE_USER_AGENT | Optional | true | Boolean value to enable/disable the Neo4j user agent |
| DUPLICATE_TEXT_DISTANCE | Mandatory | 5 | Distance used to find all node pairs in the graph, calculated based on node properties |
| DUPLICATE_SCORE_VALUE | Mandatory | 0.97 | Node score value used to match duplicate nodes |
| EFFECTIVE_SEARCH_RATIO | Mandatory | 1 | |
| GRAPH_CLEANUP_MODEL | Optional | 0.97 | Model name used to clean up the graph in post-processing |
| MAX_TOKEN_CHUNK_SIZE | Optional | 10000 | Maximum token size used to process file content |
| YOUTUBE_TRANSCRIPT_PROXY | Optional | | Proxy key used to fetch transcripts for YouTube videos |
| EMBEDDING_MODEL | Optional | all-MiniLM-L6-v2 | Model for generating the text embedding (all-MiniLM-L6-v2, openai, vertexai) |
| IS_EMBEDDING | Optional | true | Flag to enable text embedding |
| KNN_MIN_SCORE | Optional | 0.94 | Minimum score for the KNN algorithm |
| GEMINI_ENABLED | Optional | False | Flag to enable Gemini |
| GCP_LOG_METRICS_ENABLED | Optional | False | Flag to enable Google Cloud logs |
| NUMBER_OF_CHUNKS_TO_COMBINE | Optional | 5 | Number of chunks to combine when processing embeddings |
| UPDATE_GRAPH_CHUNKS_PROCESSED | Optional | 20 | Number of chunks processed before updating progress |
| NEO4J_URI | Optional | neo4j://database:7687 | URI of the Neo4j database |
| NEO4J_USERNAME | Optional | neo4j | Username for the Neo4j database |
| NEO4J_PASSWORD | Optional | password | Password for the Neo4j database |
| LANGCHAIN_API_KEY | Optional | | API key for LangChain |
| LANGCHAIN_PROJECT | Optional | | Project for LangChain |
| LANGCHAIN_TRACING_V2 | Optional | true | Flag to enable LangChain tracing |
| GCS_FILE_CACHE | Optional | False | If set to True, saves the files to process in GCS; if set to False, saves them locally |
| LANGCHAIN_ENDPOINT | Optional | https://api.smith.langchain.com | Endpoint for the LangChain API |
| ENTITY_EMBEDDING | Optional | False | If set to True, adds embeddings for each entity in the database |
| LLM_MODEL_CONFIG_ollama_<model_name> | Optional | | Ollama config in the form model_name,model_local_url for local deployments |
| RAGAS_EMBEDDING_MODEL | Optional | openai | Embedding model used by the RAGAS evaluation framework |

Frontend ENV

| Env Variable Name | Mandatory/Optional | Default Value | Description |
|---|---|---|---|
| VITE_BLOOM_URL | Mandatory | https://workspace-preview.neo4j.io/workspace/explore?connectURL={CONNECT_URL}&search=Show+me+a+graph&featureGenAISuggestions=true&featureGenAISuggestionsInternal=true | URL for Bloom visualization |
| VITE_REACT_APP_SOURCES | Mandatory | local,youtube,wiki,s3 | List of input sources that will be available |
| VITE_CHAT_MODES | Mandatory | vector,graph+vector,graph,hybrid | Chat modes available for Q&A |
| VITE_ENV | Mandatory | DEV or PROD | Environment variable for the app |
| VITE_LLM_MODELS | Mandatory | 'diffbot,openai_gpt_3.5,openai_gpt_4o,openai_gpt_4o_mini,gemini_1.5_pro,gemini_1.5_flash,azure_ai_gpt_35,azure_ai_gpt_4o,ollama_llama3,groq_llama3_70b,anthropic_claude_3_5_sonnet' | Models supported by the application |
| VITE_BACKEND_API_URL | Optional | http://localhost:8000 | URL of the backend API |
| VITE_TIME_PER_PAGE | Optional | 50 | Time per page for processing |
| VITE_CHUNK_SIZE | Optional | 5242880 | Size (in bytes) of each file chunk for upload |
| VITE_GOOGLE_CLIENT_ID | Optional | | Client ID for Google authentication |
| VITE_LLM_MODELS_PROD | Optional | openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash | Distinguishes the models available per environment (PROD or DEV) |
| VITE_AUTH0_CLIENT_ID | Mandatory if authentication is enabled, otherwise optional | | Okta OAuth client ID for authentication |
| VITE_AUTH0_DOMAIN | Mandatory if authentication is enabled, otherwise optional | | Okta OAuth client domain |
| VITE_SKIP_AUTH | Optional | true | Flag to skip authentication |
| VITE_CHUNK_OVERLAP | Optional | 20 | Variable to configure chunk overlap |
| VITE_TOKENS_PER_CHUNK | Optional | 100 | Variable to configure the token count per chunk; gives flexibility for users who need different chunk sizes for various tokenization tasks, especially with large datasets or specific language models |
| VITE_CHUNK_TO_COMBINE | Optional | 1 | Variable to configure the number of chunks to combine for parallel processing |
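
As a starting point, a minimal backend .env sketch combining a few of the variables above with a local Neo4j connection; all values are placeholders or defaults from the table, and variables you may not need (e.g. BUCKET without GCS) are omitted:

    # backend/.env — minimal sketch; replace placeholders with real values
    OPENAI_API_KEY=<your-openai-api-key>
    DIFFBOT_API_KEY=<your-diffbot-api-key>
    NEO4J_URI=neo4j://localhost:7687
    NEO4J_USERNAME=neo4j
    NEO4J_PASSWORD=<your-password>
    EMBEDDING_MODEL=all-MiniLM-L6-v2
    IS_EMBEDDING=true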

Links

LLM Knowledge Graph Builder Application

Neo4j Workspace

Reference

Demo of application

Contact

For any inquiries or support, feel free to raise a GitHub issue.

Happy Graph Building!

