HKUDS/LightRAG (Public)


Feat: Add Workspace Isolation for Pipeline Status and In-memory Storage #2369

Merged

danielaskdd merged 86 commits into main from workspace-isolation on Nov 18, 2025


Conversation

danielaskdd (Collaborator) commented Nov 17, 2025

Feat: Add Workspace Isolation for Pipeline Status and In-memory Storage

🎯 Problem Statement

When multiple LightRAG objects with different workspace values are instantiated simultaneously, the following issues occur:

Pipeline Status Sharing Conflicts: All workspaces share a single pipeline_status, so pipeline states from different workspaces interfere with each other
Lock Mechanism Deficiency: Existing locks (_pipeline_status_lock, _graph_db_lock, _storage_lock) are not workspace-isolated, so operations from different workspaces block each other unnecessarily
In-memory JSON KV Storage Lacks Workspace Isolation: The related namespace functions do not accept a workspace parameter, which prevents true workspace isolation

✨ Solution

1. Workspace Isolation for Pipeline Status

Treat pipeline_status as a special namespace (storage type), similar to KV storage but without persistence
Create an independent pipeline_status namespace for each workspace
Namespace format: <workspace>:pipeline_status

2. Unified Workspace-Based Lock Mechanism

Remove legacy global locks: _pipeline_status_lock, _graph_db_lock, _storage_lock
Introduce unified keyed lock mechanism: implemented via _storage_keyed_lock
Lock namespace: <workspace>:<storage_type>
Lock key: Fixed as default_key
Benefits: Fine-grained workspace-level isolation, avoiding cross-workspace lock contention
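
As a rough illustration of the keyed-lock idea described above, the sketch below keeps one asyncio lock per `<workspace>:<storage_type>` namespace and a fixed key. The `KeyedLockRegistry` class and `demo` function are hypothetical stand-ins written for this example; they are not LightRAG's `_storage_keyed_lock` implementation, which may differ substantially (for instance in multi-process support).

```python
import asyncio


class KeyedLockRegistry:
    """Hypothetical registry: one asyncio.Lock per (namespace, key) pair."""

    def __init__(self) -> None:
        self._locks = {}

    def get(self, workspace: str, storage_type: str, key: str = "default_key") -> asyncio.Lock:
        # Lock namespace follows the <workspace>:<storage_type> format from the PR text.
        lock_id = (f"{workspace}:{storage_type}", key)
        if lock_id not in self._locks:
            self._locks[lock_id] = asyncio.Lock()
        return self._locks[lock_id]


async def demo() -> list:
    registry = KeyedLockRegistry()
    events = []

    async def worker(workspace: str) -> None:
        # Each workspace takes its own lock, so neither worker waits for the other.
        async with registry.get(workspace, "pipeline_status"):
            events.append(f"{workspace}:enter")
            await asyncio.sleep(0.01)
            events.append(f"{workspace}:exit")

    await asyncio.gather(worker("tenant_a"), worker("tenant_b"))
    return events
```

Running `asyncio.run(demo())` should show both workspaces entering their critical sections before either one exits, which is the cross-workspace behavior the unified lock mechanism is aiming for.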

3. New `get_namespace_lock()` Function

def get_namespace_lock(namespace: str, workspace: str | None = None, enable_logging: bool = False) -> NamespaceLock

Simplifies namespace-level lock acquisition
Automatically handles workspace and namespace combination
Unified lock interface, replacing multiple independent locks

4. Add Workspace Parameter to All Namespace Operations

Updated function signatures to support workspace parameter:

initialize_pipeline_status(workspace: str | None = None)
get_namespace_data(namespace: str, first_init: bool = False, workspace: str | None = None)
get_update_flag(namespace: str, workspace: str | None = None)
set_all_update_flags(namespace: str, workspace: str | None = None)
clear_all_update_flags(namespace: str, workspace: str | None = None)
get_all_update_flags_status(workspace: str | None = None)
try_initialize_namespace(namespace: str, workspace: str | None = None)

5. Default Workspace Support (Backward Compatibility)

Added global variable _default_workspace
Added function set_default_workspace(workspace: str | None = None)
Added function get_default_workspace() -> str
Purpose: Maintain compatibility with legacy code that doesn't provide workspace parameter
Behavior: Automatically use default workspace when workspace parameter is None
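
The fallback behavior described above can be sketched in a few lines. This is an illustrative re-implementation rather than the code in `shared_storage.py`: the function names follow the PR description, while `resolve_workspace` is a hypothetical helper added here to show where the fallback would apply, and mapping `None` to an empty string is an assumption.

```python
from typing import Optional

# Illustrative module-level default; the real variable lives in shared_storage.py.
_default_workspace: Optional[str] = None


def set_default_workspace(workspace: Optional[str] = None) -> None:
    """Record the workspace that legacy callers should implicitly use."""
    global _default_workspace
    _default_workspace = workspace or ""  # assumption: None is stored as ""


def get_default_workspace() -> str:
    """Return the default workspace, or "" if none has been set."""
    return _default_workspace or ""


def resolve_workspace(workspace: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit workspace wins, None falls back to the default."""
    return get_default_workspace() if workspace is None else workspace
```

With this shape, legacy code that never passes a workspace keeps working after a single `set_default_workspace("tenant_a")` call, while multi-instance code can still pass its workspace explicitly.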

6. Unified Namespace Naming Convention

Added get_final_namespace() function:

def get_final_namespace(namespace: str, workspace: str | None = None) -> str

Centralized logic for combining workspace and namespace
Format: <workspace>:<namespace> or <namespace> (when workspace is empty)
Ensures consistent naming across all namespace operations
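
A minimal sketch of the naming rule stated above, under the assumption that an empty or missing workspace yields the bare namespace. The function is deliberately named `final_namespace` (hypothetical) to make clear it is not the library's `get_final_namespace()`; in particular it does not consult any default workspace.

```python
from typing import Optional


def final_namespace(namespace: str, workspace: Optional[str] = None) -> str:
    """Combine workspace and namespace as <workspace>:<namespace>.

    Simplification: None and "" are both treated as "no workspace".
    """
    if workspace:
        return f"{workspace}:{namespace}"
    return namespace
```

For example, `final_namespace("pipeline_status", "tenant_a")` gives `tenant_a:pipeline_status`, matching the pipeline status namespace format from section 1.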

7. Standardize empty workspace handling from "_" to "" across storage

Unify empty workspace behavior by changing workspace from "_" to ""
Fixed incorrect empty workspace detection in get_all_update_flags_status()

8. Auto-initialize pipeline status in `initialize_storages()`

Auto-initialize pipeline status inside the initialize_storages method
Remove manual initialize_pipeline_status() calls across the codebase
Update error and warning messages for clarity
Update docs and examples

📝 Key Modified Files

lightrag/kg/shared_storage.py: Core modification file
- Added workspace isolation logic
- Implemented get_namespace_lock()
- Implemented get_final_namespace()
- Added default workspace support
- Added workspace parameter to all namespace operation functions
Storage Implementation Files (using new lock mechanism):
- lightrag/kg/json_kv_impl.py
- lightrag/kg/json_doc_status_impl.py
- lightrag/kg/nano_vector_db_impl.py
- lightrag/kg/faiss_impl.py
- lightrag/kg/networkx_impl.py
- All storage implementations now use get_namespace_lock() instead of legacy locks
API and Core Logic Files:
- lightrag/lightrag.py: Set default workspace
- lightrag/api/lightrag_server.py: Pipeline status initialization
- lightrag/api/routers/document_routes.py: Use new namespace lock interface

🧪 Testing Recommendations

Multi-Workspace Concurrency Test: Create multiple LightRAG instances with different workspaces simultaneously, verify no interference
Pipeline Status Isolation Test: Verify pipeline status for different workspaces runs independently
Backward Compatibility Test: Verify legacy code without workspace specification still works correctly
Lock Mechanism Test: Verify new keyed lock mechanism works correctly without deadlocks
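
The isolation tests recommended above could take roughly the following shape. This sketch uses a toy in-memory store instead of real LightRAG instances, so `pipeline_status`, `run_job`, and `check_isolation` are all hypothetical names; a real test would create LightRAG objects with different workspaces and inspect their actual pipeline status.

```python
import asyncio

# Toy stand-in for the shared namespace data, keyed by final namespace name.
_store: dict = {}


def pipeline_status(workspace: str) -> dict:
    """Return the per-workspace status dict, creating it on first access."""
    name = f"{workspace}:pipeline_status" if workspace else "pipeline_status"
    return _store.setdefault(name, {"busy": False, "job_name": ""})


async def run_job(workspace: str, job_name: str) -> None:
    status = pipeline_status(workspace)
    status.update(busy=True, job_name=job_name)
    await asyncio.sleep(0.01)  # simulate work while the other workspace runs
    status["busy"] = False


async def check_isolation() -> dict:
    # Run both workspaces concurrently, then report what each one recorded.
    await asyncio.gather(
        run_job("tenant_a", "index docs"),
        run_job("tenant_b", "delete docs"),
    )
    return {ws: pipeline_status(ws)["job_name"] for ws in ("tenant_a", "tenant_b")}
```

A test can then assert that `asyncio.run(check_isolation())` reports a different job name per workspace, meaning neither job overwrote the other's status.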

🎉 Expected Outcomes

✅ Complete workspace-level isolation
✅ LightRAG instances with different workspaces can run concurrently without interference
✅ Pipeline status no longer interferes across workspaces
✅ Optimized lock granularity, reduced unnecessary lock contention
✅ 100% backward compatible with existing code

BukeLy and others added 30 commits

November 17, 2025 12:53

feat: Add workspace isolation support for pipeline status

eb52ec9

Problem:
In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance.

Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern)
- Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
  * lightrag.py: process_document_queue(), aremove_document()
  * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents()

Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected N× performance improvement for N concurrent tenants

Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag

Code changes: 4 files, 141 insertions(+), 28 deletions(-)

Testing: All syntax checks passed, comprehensive workspace isolation tests completed

fix: Add default workspace support for backward compatibility

18a4870

Fixes two compatibility issues in workspace isolation:

1. Problem: lightrag_server.py calls initialize_pipeline_status() without workspace parameter, causing pipeline to initialize in global namespace instead of rag's workspace.
   Solution: Add set_default_workspace() mechanism in shared_storage. LightRAG.initialize_storages() now sets default workspace, which initialize_pipeline_status() uses when called without parameters.

2. Problem: /health endpoint hardcoded to use "pipeline_status", cannot return workspace-specific status or support frontend workspace selection.
   Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now extracts workspace from header or falls back to server default, returning correct workspace-specific pipeline status.

Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper, update /health endpoint to support LIGHTRAG-WORKSPACE header

Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces

Related: #2353

support async chunking func to improve processing performance when a heavy `chunking_func` is passed in by user

7740500

easier version: detect chunking_func result is coroutine or not

5016025

Support async chunking functions in LightRAG processing pipeline

af54239

- Add Awaitable and Union type imports
- Update chunking_func type annotation
- Handle coroutine results with await
- Add return type validation
- Update docstring for async support

Replace PyPDF2 with pypdf for PDF processing

c434879

- Update import from PyPDF2 to pypdf
- Change dependency to pypdf>=6.1.0
- Update all requirements files
- Remove PyPDF2 from lock file
- Use modern pypdf library

Update env.example

ff8f158

Add data sanitization to JSON writing to prevent UTF-8 encoding errors

23cbb9c

• Add _sanitize_json_data helper function
• Recursively clean strings in data
• Sanitize before JSON serialization
• Prevent encoding-related crashes
• Use existing sanitize_text_for_encoding

Add specialized JSON string sanitizer to prevent UTF-8 encoding errors

5885637

• Remove surrogate characters (U+D800-DFFF)
• Filter Unicode non-characters
• Direct char-by-char filtering

Improve JSON data sanitization to handle tuples and dict keys

abeaac8

- Sanitize dictionary keys
- Preserve tuple types
- Handle nested structures better

Remove deprecated response_type parameter from query settings

93a3e47

- Bump API version to 0254
- Remove response format UI controls
- Hard-code response_type in query params
- Add migration for version 19
- Clean up settings store structure

Optimize JSON write with fast/slow path to reduce memory usage

f289cf6

- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage

Optimize JSON string sanitization with precompiled regex and zero-copy

7f54f47

- Precompile regex pattern at module level
- Zero-copy path for clean strings
- Use C-level regex for performance
- Remove deprecated _sanitize_json_data
- Fast detection for common case
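
The precompiled-regex and zero-copy ideas from this commit can be illustrated as follows. This is a sketch and not the code in the repository: the pattern here covers the surrogate range plus only two of the Unicode non-characters (U+FFFE and U+FFFF), and the name `sanitize_json_string` is hypothetical.

```python
import re

# Compiled once at import time: surrogates (U+D800-U+DFFF) plus U+FFFE and U+FFFF.
_UNSAFE_CHARS = re.compile("[\ud800-\udfff\ufffe\uffff]")


def sanitize_json_string(text: str) -> str:
    """Strip characters that cannot be encoded as UTF-8 JSON text."""
    if _UNSAFE_CHARS.search(text) is None:
        # Fast path for the common case: return the same object, no copy made.
        return text
    # Slow path: build a new string without the offending characters.
    return _UNSAFE_CHARS.sub("", text)
```

Clean strings come back as the identical object, so the common case costs one regex scan and no allocation; only strings that contain unsafe characters are rebuilt.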

Fix migration to reload sanitized data and prevent memory corruption

cca0800

• Reload cleaned data after sanitization
• Update shared memory with clean data
• Add specific surrogate char tests
• Test migration sanitization flow
• Prevent dirty data in memory

Fix empty dict handling after JSON sanitization

a08bc72

• Replace truthy checks with `is not None`
• Handle empty dict edge case properly
• Prevent data reload failures
• Add comprehensive test coverage
• Fix JsonKVStorage and DocStatusStorage
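
The empty-dict edge case mentioned in this commit comes down to the difference between a truthiness test and an explicit `None` check. The helper below is a hypothetical illustration of that difference, not the actual storage code.

```python
from typing import Optional


def pick_reloaded(loaded: Optional[dict], current: dict) -> dict:
    """Prefer freshly loaded data, even when it is an empty dict.

    Writing `if loaded:` here would treat {} as "nothing loaded" and
    silently keep the stale `current` data.
    """
    if loaded is not None:
        return loaded
    return current
```

With the `is not None` check, a sanitization pass that legitimately produces `{}` still replaces the old in-memory data, while a true `None` (nothing loaded) leaves it alone.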

Update env.example

72f68c2

Replace asyncio.iscoroutine with inspect.isawaitable for better detection

7d394fb
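
The detection approach named in this commit can be sketched as below: call the user-supplied function, then await the result only if it is awaitable. The `run_chunking` wrapper and the one-argument chunker signature are simplifying assumptions for illustration; LightRAG's real `chunking_func` takes more parameters and returns richer chunk objects.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, List, Union

ChunkFunc = Callable[[str], Union[List[str], Awaitable[List[str]]]]


async def run_chunking(chunking_func: ChunkFunc, text: str) -> List[str]:
    """Accept either a sync or an async chunking function."""
    result = chunking_func(text)
    # isawaitable covers coroutines, Tasks, Futures and objects with __await__.
    if inspect.isawaitable(result):
        result = await result
    if not isinstance(result, list):
        raise TypeError(f"chunking_func must return a list, got {type(result).__name__}")
    return result


def sync_chunker(text: str) -> List[str]:
    return text.split()


async def async_chunker(text: str) -> List[str]:
    await asyncio.sleep(0)  # stands in for heavy or I/O-bound chunking work
    return text.split()
```

Both `asyncio.run(run_chunking(sync_chunker, "a b"))` and the same call with `async_chunker` return the same list, so callers do not need to know which kind of function was supplied.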

refactor: move document deps to api group, remove dynamic imports

69a0b74

- Merge offline-docs into api extras
- Remove pipmaster dynamic installs
- Add async document processing
- Pre-check docling availability
- Update offline deployment docs

Implement lazy configuration initialization for API server

7b7f93d

• Add lazy config initialization
• Maintain backward compatibility
• Support programmatic usage
• Add gunicorn dependency
• Explicit config in entry points

Update uv.lock

fa9206d

Add support for environment variable fallback for API key and default host for cloud models

5127bf2

Add a better regex

67dfd85

Improve error handling and logging in cloud model detection

6351047

Improve docling integration with macOS compatibility and CLI flag

c246eff

- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)

Add macOS compatibility check for DOCLING with multi-worker Gunicorn

2f2f35b

Fix null reference errors in graph database error handling

423e4e9

- Initialize result vars to None
- Add null checks before consume calls
- Prevent crashes in except blocks
- Apply fix to both Neo4J and Memgraph

Refactor exception handling in MemgraphStorage label methods

8283c86

Add max_token_size parameter to embedding function decorators

7722156

- Add max_token_size=8192 to all embed funcs
- Move siliconcloud to deprecated folder
- Import wrap_embedding_func_with_attrs
- Update EmbeddingFunc docstring
- Fix langfuse import type annotation

Improve Bedrock error handling with retry logic and custom exceptions

f5b4858

• Add specific exception types
• Implement proper retry mechanism
• Better error classification
• Enhanced logging and validation
• Enable embedding retry decorator

Add configurable embedding token limit with validation

14a6c24

- Add EMBEDDING_TOKEN_LIMIT env var
- Set max_token_size on embedding func
- Add token limit property to LightRAG
- Validate summary length vs limit
- Log warning when limit exceeded

danielaskdd added 8 commits

November 18, 2025 08:07

Fix linting

fc9f7c7

Reduce log level and improve workspace mismatch message clarity

6cef8df

• Change warning to info level
• Simplify workspace mismatch wording

test: add concurrent execution to workspace isolation test

6ae0c14

• Add async sleep to mock functions
• Test concurrent ainsert operations
• Use asyncio.gather for parallel exec
• Measure concurrent execution time

Refactor test configuration to use pytest fixtures and CLI options

1fe05df

• Add pytest command-line options
• Create session-scoped fixtures
• Remove hardcoded environment vars
• Update test function signatures
• Improve configuration priority

Standardize test directory creation and remove tempfile dependency

4fef731

• Remove unused tempfile import
• Use consistent project temp/ structure
• Clean up existing directories first
• Create directories with os.makedirs
• Use descriptive test directory names

Add GitHub CI workflow and test markers for offline/integration tests

4ea2124

- Add GitHub Actions workflow for CI
- Mark integration tests requiring services
- Add offline test markers for isolated tests
- Skip integration tests by default
- Configure pytest markers and collection

Fix test to use default workspace parameter behavior

41bf6d0

Add testing workflow guidelines to basic development rules

a11912f

* Define pytest marker patterns
* Document CI/CD test execution
* Specify offline vs integration tests
* Add test isolation best practices
* Reference testing guidelines doc

danielaskdd (Collaborator, Author) commented Nov 18, 2025

@codex review

chatgpt-codex-connector (bot) reviewed on Nov 18, 2025 and left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

pyproject.toml (resolved)

lightrag/kg/shared_storage.py (resolved)

danielaskdd added 2 commits

November 18, 2025 12:17

Replace pytest group reference with explicit dependencies in evaluation

472b498

• Remove pytest group dependency
• Add explicit pytest>=8.4.2
• Add pytest-asyncio>=1.2.0
• Add pre-commit directly
• Fix potential circular dependency

Fix namespace parsing when workspace contains colons

f8dd2e0

• Use rsplit instead of split
• Handle colons in workspace names
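
To see why `rsplit` matters here, consider a workspace name that itself contains a colon. The parser below is a hypothetical sketch (the name `parse_final_namespace` and its return shape are assumptions), and it relies on the namespace part never containing a colon.

```python
from typing import Tuple


def parse_final_namespace(final_namespace: str) -> Tuple[str, str]:
    """Split "<workspace>:<namespace>" into (workspace, namespace).

    Assumes the namespace part never contains ":", so splitting once
    from the right keeps any colons inside the workspace name intact.
    """
    if ":" not in final_namespace:
        return "", final_namespace
    workspace, namespace = final_namespace.rsplit(":", 1)
    return workspace, namespace
```

For `"org:team:pipeline_status"` this yields `("org:team", "pipeline_status")`, whereas `split(":", 1)` would cut at the first colon and return `["org", "team:pipeline_status"]`.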

danielaskdd (Collaborator, Author) commented Nov 18, 2025

@codex review

chatgpt-codex-connector (bot) reviewed on Nov 18, 2025 and left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.


lightrag/lightrag.py (outdated, resolved)

lightrag/api/routers/document_routes.py (outdated, resolved)

Fix missing workspace parameter in update flags status call

1745b30

danielaskdd (Collaborator, Author) commented Nov 18, 2025

@codex review

chatgpt-codex-connector (bot) reviewed on Nov 18, 2025 and left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.


lightrag/lightrag.py (outdated, resolved)

Fix: auto-acquire pipeline when idle in document deletion

4048fc4

• Track if we acquired the pipeline lock
• Auto-acquire pipeline when idle
• Only release if we acquired it
• Prevent concurrent deletion conflicts
• Improve deletion job validation
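
The acquire-if-idle pattern listed in this commit might look roughly like the sketch below. The `run_deletion` function, the `status` dict layout, and the choice to proceed without releasing when the pipeline is already busy are all assumptions made for illustration; the actual deletion logic in lightrag.py also validates the running job's name.

```python
import asyncio


async def run_deletion(status: dict, lock: asyncio.Lock, job_name: str) -> bool:
    """Return True if this call acquired (and later released) the pipeline."""
    acquired = False
    async with lock:
        # Take the pipeline only when nobody else is marked as running.
        if not status.get("busy", False):
            status.update(busy=True, job_name=job_name)
            acquired = True
    try:
        await asyncio.sleep(0)  # placeholder for the real deletion work
    finally:
        # Release only what this call acquired, never another owner's flag.
        if acquired:
            async with lock:
                status["busy"] = False
    return acquired
```

Tracking `acquired` separately is what keeps a deletion that runs inside an already-busy pipeline from clearing a busy flag it does not own.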

danielaskdd (Collaborator, Author) commented Nov 18, 2025

@codex review

danielaskdd added 2 commits

November 18, 2025 13:33

Rename test classes to prevent warning from pytest

7e9c8ed

• TestResult → ExecutionResult
• TestStats → ExecutionStats
• Update class docstrings
• Update type hints
• Update variable references

Rename GitHub workflow from "Tests" to "Offline Unit Tests"

656025b

chatgpt-codex-connector (bot) reviewed on Nov 18, 2025 and left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.


lightrag/lightrag.py (resolved)

Fix document deletion concurrency control and validation logic

702cfd2

• Clarify job naming for single vs batch deletion
• Update job name validation in busy pipeline check

danielaskdd (Collaborator, Author) commented Nov 18, 2025

@codex review

chatgpt-codex-connector (bot) commented Nov 18, 2025

Codex Review: Didn't find any major issues. What shall we delve into next?


danielaskdd merged commit dfbc973 into main

Nov 18, 2025

4 checks passed

danielaskdd deleted the workspace-isolation branch

November 18, 2025 07:21

danielaskdd mentioned this pull request

Nov 22, 2025

[Feature Request]: Workspace-Level Document Isolation and Retrieval Separation #2373

Open

2 tasks

xtfocus mentioned this pull request

Nov 25, 2025

Multi-User / Multi-Tenant #310

Closed

sillydong mentioned this pull request

Dec 1, 2025

feat: Finish implement workspace isolation in lightrag_server #2445

Open

4 tasks

Labels

None yet

5 participants
