Feat: Add Workspace Isolation for Pipeline Status and In-memory Storage #2369

Merged
danielaskdd merged 86 commits into main from workspace-isolation
Nov 18, 2025

@danielaskdd
Collaborator

Feat: Add Workspace Isolation for Pipeline Status and In-memory Storage

🎯 Problem Statement

When multiple LightRAG objects with different workspace values are instantiated simultaneously, the following issues occur:

  1. Pipeline Status Sharing Conflicts: All workspaces share a single pipeline_status, causing pipeline states from different workspaces to interfere with each other
  2. Lock Mechanism Deficiency: Existing locks (_pipeline_status_lock, _graph_db_lock, _storage_lock) are not workspace-isolated, causing operations from different workspaces to block each other unnecessarily
  3. In-Memory JSON KV Storage Lacks Workspace Isolation: Related namespace functions don't provide workspace parameters, preventing true workspace isolation

✨ Solution

1. Workspace Isolation for Pipeline Status

  • Treat pipeline_status as a special namespace (storage type), similar to KV storage but without persistence
  • Create an independent pipeline_status namespace for each workspace
  • Namespace format: <workspace>:pipeline_status

2. Unified Workspace-Based Lock Mechanism

  • Remove legacy global locks: _pipeline_status_lock, _graph_db_lock, _storage_lock
  • Introduce unified keyed lock mechanism: implemented via _storage_keyed_lock
  • Lock namespace: <workspace>:<storage_type>
  • Lock key: Fixed as default_key
  • Benefits: Fine-grained workspace-level isolation, avoiding cross-workspace lock contention

3. New get_namespace_lock() Function

def get_namespace_lock(namespace: str, workspace: str | None = None, enable_logging: bool = False) -> NamespaceLock
  • Simplifies namespace-level lock acquisition
  • Automatically handles workspace and namespace combination
  • Unified lock interface, replacing multiple independent locks

4. Add Workspace Parameter to All Namespace Operations

Updated function signatures to support workspace parameter:

  • initialize_pipeline_status(workspace: str | None = None)
  • get_namespace_data(namespace: str, first_init: bool = False, workspace: str | None = None)
  • get_update_flag(namespace: str, workspace: str | None = None)
  • set_all_update_flags(namespace: str, workspace: str | None = None)
  • clear_all_update_flags(namespace: str, workspace: str | None = None)
  • get_all_update_flags_status(workspace: str | None = None)
  • try_initialize_namespace(namespace: str, workspace: str | None = None)

5. Default Workspace Support (Backward Compatibility)

  • Added global variable _default_workspace
  • Added function set_default_workspace(workspace: str | None = None)
  • Added function get_default_workspace() -> str
  • Purpose: Maintain compatibility with legacy code that doesn't provide a workspace parameter
  • Behavior: Automatically use the default workspace when the workspace parameter is None

6. Unified Namespace Naming Convention

Added get_final_namespace() function:

def get_final_namespace(namespace: str, workspace: str | None = None) -> str
  • Centralized logic for combining workspace and namespace
  • Format: <workspace>:<namespace> or <namespace> (when workspace is empty)
  • Ensures consistent naming across all namespace operations

7. Standardize empty workspace handling from "_" to "" across storage

  • Unify empty workspace behavior by changing workspace from "_" to ""
  • Fixed incorrect empty workspace detection in get_all_update_flags_status()

8. Auto-initialize pipeline status in initialize_storages()

  • Auto-initialize pipeline status in the initialize_storages method
  • Remove manual initialize_pipeline_status() calls across the codebase
  • Update error and warning messages for clarity
  • Update docs and examples

📝 Key Modified Files

  • lightrag/kg/shared_storage.py: Core modification file

    • Added workspace isolation logic
    • Implemented get_namespace_lock()
    • Implemented get_final_namespace()
    • Added default workspace support
    • Added workspace parameter to all namespace operation functions
  • Storage Implementation Files (using new lock mechanism):

    • lightrag/kg/json_kv_impl.py
    • lightrag/kg/json_doc_status_impl.py
    • lightrag/kg/nano_vector_db_impl.py
    • lightrag/kg/faiss_impl.py
    • lightrag/kg/networkx_impl.py
    • All storage implementations now use get_namespace_lock() instead of legacy locks
  • API and Core Logic Files:

    • lightrag/lightrag.py: Set default workspace
    • lightrag/api/lightrag_server.py: Pipeline status initialization
    • lightrag/api/routers/document_routes.py: Use new namespace lock interface

🧪 Testing Recommendations

  1. Multi-Workspace Concurrency Test: Create multiple LightRAG instances with different workspaces simultaneously, verify no interference
  2. Pipeline Status Isolation Test: Verify pipeline status for different workspaces runs independently
  3. Backward Compatibility Test: Verify legacy code without workspace specification still works correctly
  4. Lock Mechanism Test: Verify new keyed lock mechanism works correctly without deadlocks

🎉 Expected Outcomes

  • ✅ Complete workspace-level isolation
  • ✅ LightRAG instances with different workspaces can run concurrently without interference
  • ✅ Pipeline status no longer interferes across workspaces
  • ✅ Optimized lock granularity, reduced unnecessary lock contention
  • ✅ 100% backward compatible with existing code

chatgpt-codex-connector[bot], Mobious, superuely, and tongda reacted with thumbs up emoji
BukeLy and others added 30 commits November 17, 2025 12:53
Problem:
In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance.

Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern)
- Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
  * lightrag.py: process_document_queue(), aremove_document()
  * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents()

Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected N× performance improvement for N concurrent tenants

Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag

Code changes: 4 files, 141 insertions(+), 28 deletions(-)

Testing: All syntax checks passed, comprehensive workspace isolation tests completed

Fixes two compatibility issues in workspace isolation:

1. Problem: lightrag_server.py calls initialize_pipeline_status() without workspace parameter, causing pipeline to initialize in global namespace instead of rag's workspace.
   Solution: Add set_default_workspace() mechanism in shared_storage. LightRAG.initialize_storages() now sets default workspace, which initialize_pipeline_status() uses when called without parameters.
2. Problem: /health endpoint hardcoded to use "pipeline_status", cannot return workspace-specific status or support frontend workspace selection.
   Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now extracts workspace from header or falls back to server default, returning correct workspace-specific pipeline status.

Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper, update /health endpoint to support LIGHTRAG-WORKSPACE header

Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces

Related: #2353

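The header fallback described in this commit might look something like the sketch below. The commit names a `get_workspace_from_request()` helper, but its body is not shown here, so `workspace_from_headers` is a hypothetical function operating on a plain mapping of headers rather than on a real request object.

```python
from collections.abc import Mapping


def workspace_from_headers(headers: Mapping[str, str], server_default: str) -> str:
    """Pick the workspace from the LIGHTRAG-WORKSPACE header, else the default."""
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "lightrag-workspace" and value.strip():
            return value.strip()
    return server_default
```

A blank or missing header falls back to the server's own workspace, which keeps existing clients working unchanged.
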
- Add Awaitable and Union type imports
- Update chunking_func type annotation
- Handle coroutine results with await
- Add return type validation
- Update docstring for async support

- Update import from PyPDF2 to pypdf
- Change dependency to pypdf>=6.1.0
- Update all requirements files
- Remove PyPDF2 from lock file
- Use modern pypdf library

• Add _sanitize_json_data helper function
• Recursively clean strings in data
• Sanitize before JSON serialization
• Prevent encoding-related crashes
• Use existing sanitize_text_for_encoding

• Remove surrogate characters (U+D800-DFFF)
• Filter Unicode non-characters
• Direct char-by-char filtering

- Sanitize dictionary keys
- Preserve tuple types
- Handle nested structures better

- Bump API version to 0254
- Remove response format UI controls
- Hard-code response_type in query params
- Add migration for version 19
- Clean up settings store structure

- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage

- Precompile regex pattern at module level
- Zero-copy path for clean strings
- Use C-level regex for performance
- Remove deprecated _sanitize_json_data
- Fast detection for common case

• Reload cleaned data after sanitization
• Update shared memory with clean data
• Add specific surrogate char tests
• Test migration sanitization flow
• Prevent dirty data in memory

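The sanitization commits above describe a precompiled pattern, a fast path for clean strings, and recursive cleaning that covers dictionary keys and preserves tuples. A simplified sketch of those ideas follows. The names `sanitize_text` and `sanitize` are hypothetical, only U+FFFE and U+FFFF are treated as non-characters here, and the later custom-encoder approach is not reproduced.

```python
import re

# Precompiled once at import time; matches lone surrogates (U+D800-U+DFFF)
# and the non-characters U+FFFE and U+FFFF.
_UNSAFE_CHARS = re.compile("[\ud800-\udfff\ufffe\uffff]")


def sanitize_text(text: str) -> str:
    """Return text with unsafe code points removed."""
    # Fast path: clean strings are returned as-is, without copying.
    if _UNSAFE_CHARS.search(text) is None:
        return text
    return _UNSAFE_CHARS.sub("", text)


def sanitize(value):
    """Recursively clean strings in dicts (keys included), lists and tuples."""
    if isinstance(value, str):
        return sanitize_text(value)
    if isinstance(value, dict):
        return {sanitize(k): sanitize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(sanitize(v) for v in value)  # preserves tuple type
    return value
```

Lone surrogates cannot be encoded as UTF-8, so stripping them before serialization avoids the encoding crashes the commits mention.
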
• Replace truthy checks with `is not None`
• Handle empty dict edge case properly
• Prevent data reload failures
• Add comprehensive test coverage
• Fix JsonKVStorage and DocStatusStorage

- Merge offline-docs into api extras
- Remove pipmaster dynamic installs
- Add async document processing
- Pre-check docling availability
- Update offline deployment docs

• Add lazy config initialization
• Maintain backward compatibility
• Support programmatic usage
• Add gunicorn dependency
• Explicit config in entry points

- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)

- Initialize result vars to None
- Add null checks before consume calls
- Prevent crashes in except blocks
- Apply fix to both Neo4J and Memgraph

- Add max_token_size=8192 to all embed funcs
- Move siliconcloud to deprecated folder
- Import wrap_embedding_func_with_attrs
- Update EmbeddingFunc docstring
- Fix langfuse import type annotation

• Add specific exception types
• Implement proper retry mechanism
• Better error classification
• Enhanced logging and validation
• Enable embedding retry decorator

- Add EMBEDDING_TOKEN_LIMIT env var
- Set max_token_size on embedding func
- Add token limit property to LightRAG
- Validate summary length vs limit
- Log warning when limit exceeded

• Change warning to info level
• Simplify workspace mismatch wording

• Add async sleep to mock functions
• Test concurrent ainsert operations
• Use asyncio.gather for parallel exec
• Measure concurrent execution time

• Add pytest command-line options
• Create session-scoped fixtures
• Remove hardcoded environment vars
• Update test function signatures
• Improve configuration priority

• Remove unused tempfile import
• Use consistent project temp/ structure
• Clean up existing directories first
• Create directories with os.makedirs
• Use descriptive test directory names

- Add GitHub Actions workflow for CI
- Mark integration tests requiring services
- Add offline test markers for isolated tests
- Skip integration tests by default
- Configure pytest markers and collection

* Define pytest marker patterns
* Document CI/CD test execution
* Specify offline vs integration tests
* Add test isolation best practices
* Reference testing guidelines doc

@danielaskdd
Collaborator, Author

@codex review

@chatgpt-codex-connector[bot] left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

• Remove pytest group dependency
• Add explicit pytest>=8.4.2
• Add pytest-asyncio>=1.2.0
• Add pre-commit directly
• Fix potential circular dependency

• Use rsplit instead of split
• Handle colons in workspace names

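The rsplit change can be illustrated with a short sketch. `split_final_namespace` is a hypothetical helper, and it assumes the namespace part itself never contains a colon, so the last colon always marks the boundary.

```python
def split_final_namespace(final_namespace: str) -> tuple[str, str]:
    """Split '<workspace>:<namespace>' into (workspace, namespace)."""
    if ":" not in final_namespace:
        return "", final_namespace  # no workspace prefix
    # rsplit with maxsplit=1 splits at the LAST colon, so colons inside the
    # workspace name stay part of the workspace.
    workspace, namespace = final_namespace.rsplit(":", 1)
    return workspace, namespace
```

A plain `split(":")` on `org:team:pipeline_status` would yield three parts and cut the workspace down to `org`, which is the failure mode this commit addresses.
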
@danielaskdd
Collaborator, Author

@codex review

@chatgpt-codex-connector[bot] left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@danielaskdd
Collaborator, Author

@codex review

@chatgpt-codex-connector[bot] left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

• Track if we acquired the pipeline lock
• Auto-acquire pipeline when idle
• Only release if we acquired it
• Prevent concurrent deletion conflicts
• Improve deletion job validation

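The "only release if we acquired it" pattern from this commit can be sketched as below. The function `delete_documents` here is a simplified, hypothetical stand-in: it assumes a status dict with a `busy` flag guarded by an `asyncio.Lock`, and it simply refuses to run when the pipeline is busy, which may be stricter than the real job-name validation.

```python
import asyncio


async def delete_documents(status: dict, lock: asyncio.Lock, doc_ids: list[str]) -> bool:
    """Run a deletion job, claiming the pipeline only if it is idle."""
    acquired = False
    async with lock:
        if not status["busy"]:
            status["busy"] = True  # pipeline was idle, so we claim it
            acquired = True
    if not acquired:
        return False  # another job owns the pipeline; do not touch its flag
    try:
        await asyncio.sleep(0)  # placeholder for the real deletion work
        return True
    finally:
        # Release only what we acquired ourselves.
        async with lock:
            status["busy"] = False


async def demo() -> tuple[bool, bool, bool]:
    lock = asyncio.Lock()
    idle, busy = {"busy": False}, {"busy": True}
    ok = await delete_documents(idle, lock, ["doc-1"])
    rejected = await delete_documents(busy, lock, ["doc-2"])
    return ok, rejected, busy["busy"]


result = asyncio.run(demo())
```

Tracking acquisition in a local flag ensures that a job which never claimed the pipeline cannot clear the busy flag that belongs to someone else.
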
@danielaskdd
Collaborator, Author

@codex review

• TestResult → ExecutionResult
• TestStats → ExecutionStats
• Update class docstrings
• Update type hints
• Update variable references

@chatgpt-codex-connector[bot] left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

• Clarify job naming for single vs batch deletion
• Update job name validation in busy pipeline check

@danielaskdd
Collaborator, Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. What shall we delve into next?

@danielaskdd merged commit dfbc973 into main Nov 18, 2025
4 checks passed
@danielaskdd deleted the workspace-isolation branch November 18, 2025 07:21
@xtfocus mentioned this pull request Nov 25, 2025

Reviewers

@chatgpt-codex-connector[bot] left review comments

5 participants

@danielaskdd @BukeLy @tongda @LacombeLouis @sleeepyin
