
feat: Vector Storage Model Isolation with Automatic Migration #2391

Merged
63 commits merged into HKUDS:main from BukeLy:feature/vector-model-isolation
Dec 21, 2025

Conversation

@BukeLy (Contributor) commented Nov 19, 2025 (edited)
Vector Storage Model Isolation with Unified Auto-Migration

Problem

LightRAG's current vector storage forces all embedding models to share the same dimension:

  1. Dimension Lock: First workspace locks dimension, subsequent workspaces with different dimensions fail
  2. Model Confusion: Different models' data mixed in same collection, no way to distinguish source

Solution

Automatic Model Suffix

Collections/tables are auto-named with a model identifier: `{base}_{model}_{dim}d`

Examples:

  • lightrag_vdb_chunks_text_embedding_ada_002_1536d
  • lightrag_vdb_chunks_bge_small_768d
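The naming scheme above can be sketched as follows. The helper names `model_identifier` and `collection_name` are hypothetical stand-ins (the PR's actual logic lives in `_generate_collection_suffix` and the storage backends); they only illustrate how a model name might be sanitized into a storage-safe suffix:

```python
import re

def model_identifier(model_name: str, embedding_dim: int) -> str:
    """Sanitize a model name into a storage-safe suffix, e.g.
    'text-embedding-ada-002' + 1536 -> 'text_embedding_ada_002_1536d'."""
    # Replace every run of non-alphanumeric characters with '_',
    # then trim stray underscores so the name stays readable.
    safe = re.sub(r"[^a-zA-Z0-9]+", "_", model_name.lower()).strip("_")
    return f"{safe}_{embedding_dim}d"

def collection_name(base: str, model_name: str, embedding_dim: int) -> str:
    """Build the isolated collection/table name: {base}_{model}_{dim}d."""
    return f"{base}_{model_identifier(model_name, embedding_dim)}"

print(collection_name("lightrag_vdb_chunks", "text-embedding-ada-002", 1536))
# lightrag_vdb_chunks_text_embedding_ada_002_1536d
```

Both examples above fall out of this rule: dashes and dots become underscores, and the dimension is appended with a `d` suffix so two models with the same name but different dimensions still land in separate collections.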

Unified Migration & Cleanup

PostgreSQL and Qdrant follow the same 4-case logic:

  • Case 1: Both exist → Check and migrate current workspace's unmigrated data (multi-tenant support)
  • Case 2: Only new exists → Normal operation
  • Case 3: Neither exists → Create new
  • Case 4: Only legacy exists → Migrate (500/batch) → Verify → Auto-delete legacy
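The four cases can be sketched as a single dispatch against a toy in-memory store. Everything here (`FakeStore`, `ensure_collection`) is a hypothetical simplification of the real PostgreSQL/Qdrant operations, shown only to model the control flow and the workspace-scoped cleanup:

```python
class FakeStore:
    """Toy stand-in for a vector backend: name -> list of (workspace, record)."""
    def __init__(self):
        self.collections = {}

    def exists(self, name):
        return name in self.collections

    def create(self, name):
        self.collections.setdefault(name, [])

    def rows(self, name, workspace):
        return [r for w, r in self.collections[name] if w == workspace]


def ensure_collection(store, new, legacy, workspace, batch_size=500):
    new_exists, legacy_exists = store.exists(new), store.exists(legacy)
    if not new_exists and not legacy_exists:        # Case 3: fresh install
        store.create(new)
        return "created"
    if new_exists and not legacy_exists:            # Case 2: normal operation
        return "ready"
    # Cases 1 and 4: legacy data present -> migrate this workspace's rows only
    store.create(new)
    pending = [(w, r) for w, r in store.collections[legacy] if w == workspace]
    for i in range(0, len(pending), batch_size):    # 500 records per batch
        store.collections[new].extend(pending[i:i + batch_size])
    # Verify before touching legacy data
    assert len(store.rows(new, workspace)) >= len(pending)
    store.collections[legacy] = [
        (w, r) for w, r in store.collections[legacy] if w != workspace
    ]
    if not store.collections[legacy]:               # Case 4: legacy now empty
        del store.collections[legacy]               # auto-delete legacy
    return "migrated"
```

Note how a legacy collection holding another tenant's rows survives the migration: only the current workspace's records move, which is the multi-tenant behavior Case 1 describes.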

Key Features

EmbeddingFunc Enhancement

```python
embedding_func = EmbeddingFunc(
    embedding_dim=1536,
    func=my_embed,
    model_name="text-embedding-ada-002",  # New parameter
)
```

Fully Automated

  • Migration auto-triggered
  • Data integrity verified
  • Legacy auto-cleanup
  • Zero manual intervention

Unified Behavior

  • PostgreSQL and Qdrant identical logic
  • Same safety guarantees
  • Same user experience

Usage

Single Model

```python
embedding_func = EmbeddingFunc(
    embedding_dim=1536,
    func=my_embed,
    model_name="text-embedding-ada-002",
)

rag = LightRAG(
    working_dir="./rag_storage",
    embedding_func=embedding_func,
    vector_storage="PGVectorStorage",
)
await rag.initialize_storages()  # Auto-migration + cleanup
```

Multi-Model Isolation

```python
# Tenant A: BGE-Small (768d)
rag_a = LightRAG(
    embedding_func=EmbeddingFunc(768, func=bge_small, model_name="bge-small"),
    vector_storage="QdrantVectorDBStorage",
)

# Tenant B: BGE-Large (1024d)
rag_b = LightRAG(
    embedding_func=EmbeddingFunc(1024, func=bge_large, model_name="bge-large"),
    vector_storage="QdrantVectorDBStorage",
)

# Auto-creates isolated collections
await rag_a.initialize_storages()  # lightrag_vdb_chunks_bge_small_768d
await rag_b.initialize_storages()  # lightrag_vdb_chunks_bge_large_1024d
```

Security Fixes

This PR fixes two security issues discovered during code review:

Bug 1: UnifiedLock False Security

Problem: When `self._lock` is None, `__aenter__` only logs a WARNING but still returns successfully, allowing critical sections to run without lock protection.

Fix:

  • UnifiedLock.__aenter__ now raises `RuntimeError` when the lock is None
  • Provides clear error message to guide developers
  • Prevents race conditions and data corruption
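A minimal sketch of the fix, as a hypothetical simplification of `UnifiedLock` in lightrag/kg/shared_storage.py (the real class carries more state; the error message here is illustrative, not the actual one):

```python
class UnifiedLock:
    """Fail loudly instead of silently running critical sections unlocked."""

    def __init__(self, lock=None, name="unnamed"):
        self._lock = lock
        self._name = name

    async def __aenter__(self):
        if self._lock is None:
            # Before the fix, this path only logged a WARNING and returned,
            # so callers proceeded without any lock protection.
            raise RuntimeError(
                f"Lock '{self._name}' is not initialized; "
                "shared storage must be set up before entering the lock"
            )
        await self._lock.acquire()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self._lock.release()
```

With this change, a misconfigured process crashes at the lock boundary with a diagnosable error rather than corrupting shared data in a silent race.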

Files Modified:

  • lightrag/kg/shared_storage.py
  • tests/test_unified_lock_safety.py (3 tests)

Bug 2: Cross-Workspace Data Leakage in Migration

Problem: `setup_table()` doesn't filter by workspace during legacy table migration, causing workspace A to receive data from all workspaces.

Fix:

  • Added a `workspace` parameter to `setup_table()`
  • Added workspace filtering in the COUNT and SELECT queries
  • Logs a WARNING when workspace is None

Files Modified:

  • lightrag/kg/postgres_impl.py
  • tests/test_workspace_migration_isolation.py (3 tests)
  • tests/test_e2e_multi_instance.py (1 E2E test)

Backward Compatible:

  • The `workspace` parameter is optional and defaults to None
  • Existing code continues working without modification

Testing

Unit Tests (32) - All Pass ✅

  • PostgreSQL: 9 migration tests
  • Qdrant: 8 migration tests
  • Security: 4 no-suffix tests + 3 UnifiedLock tests + 3 workspace isolation tests
  • Dimension mismatch: 5 tests

E2E Tests (6 jobs) - All Pass ✅

  • Python 3.10 + 3.12, real database containers
  • PostgreSQL: Legacy migration + auto-delete + multi-model + workspace isolation + dimension mismatch
  • Qdrant: Legacy migration + auto-delete + multi-model + dimension mismatch

Code Quality

  • ruff-format, ruff, trailing-whitespace passed
  • Clear comments, unified style

Migration Guide

New Users

Add the `model_name` parameter:

```python
embedding_func = EmbeddingFunc(..., model_name="your-model")
```

Existing Users

Option 1: Keep current setup (no action needed)

Option 2: Enable model isolation (recommended)

  1. Add the `model_name` parameter
  2. Restart app
  3. Auto-migration runs once
  4. Legacy auto-deleted

Before: 5-6 steps with manual cleanup
After: 2 steps fully automated

Backward Compatible

✅ No breaking changes

  • Existing code without `model_name` continues working
  • Legacy naming supported
  • No API changes

Design Principles

  1. Safety First: Never delete collections with data
  2. Automation: No manual configuration needed
  3. Consistency: Both backends behave identically
  4. Simplicity: No extra config options

Test Results: 38/38 Pass ✅

  • Unit: 32/32 (17 migration + 7 safety + 3 lock + 5 dimension)
  • E2E: 6/6 jobs (Python 3.10 + 3.12, real containers)
  • Lint: All checks passed

BukeLy and others added 30 commits November 19, 2025 02:11
Why this change is needed: To support vector storage model isolation, we need to track which model is used for embeddings and generate unique identifiers for collections/tables.

How it solves it:
- Added model_name field to EmbeddingFunc
- Added get_model_identifier() method to generate a sanitized suffix
- Added unit tests to verify behavior

Impact: Enables subsequent changes in storage backends to isolate data by model.

Testing: Added tests/test_embedding_func.py; tests pass.
Why this change is needed: To enforce a consistent naming and migration strategy across all vector storages.

How it solves it:
- Added _generate_collection_suffix() helper
- Added _get_legacy_collection_name() and _get_new_collection_name() interfaces

Impact: Prepares storage implementations for multi-model support.

Testing: Added tests/test_base_storage_integrity.py; tests pass.
Why this change is needed: To implement vector storage model isolation for Qdrant, allowing different workspaces to use different embedding models without conflict, and automatically migrating existing data.

How it solves it:
- Modified QdrantVectorDBStorage to use model-specific collection suffixes
- Implemented automated migration logic from legacy collections to the new schema
- Fixed a shared-data lock re-entrancy issue in multiprocess mode
- Added comprehensive tests for collection naming and migration triggers

Impact:
- Existing users will have data automatically migrated on next startup
- New workspaces will use isolated collections based on the embedding model
- Fixes potential lock-related bugs in shared storage

Testing:
- Added tests/test_qdrant_migration.py; tests pass
- Verified migration logic covers all 4 states (new/legacy existence combinations)
Why this change is needed: PostgreSQL vector storage needs model isolation to prevent dimension conflicts when different workspaces use different embedding models. Without this, the first workspace locks the vector dimension for all subsequent workspaces, causing failures.

How it solves it:
- Implements dynamic table naming with model suffix: {table}_{model}_{dim}d
- Adds setup_table() method mirroring Qdrant's approach for consistency
- Implements 4-branch migration logic: both exist -> warn, only new -> use, neither -> create, only legacy -> migrate
- Batch migration: 500 records/batch (same as Qdrant)
- No automatic rollback, to support idempotent re-runs

Impact:
- PostgreSQL tables now isolated by embedding model and dimension
- Automatic data migration from legacy tables on startup
- Backward compatible: model_name=None defaults to "unknown"
- All SQL operations use dynamic table names

Testing:
- 6 new tests for PostgreSQL migration (100% pass)
- Tests cover: naming, migration trigger, scenarios 1-3
- 3 additional scenario tests added for Qdrant completeness

Co-Authored-By: Claude <noreply@anthropic.com>
Why this change is needed: After implementing model isolation, two critical bugs were discovered that would cause data access failures.

Bug 1: In delete_entity_relation(), the SQL query uses positional parameters ($1, $2) but the parameter dict was not converted to a list of values before passing to db.execute(). This caused parameter binding failures when trying to delete entity relations.

Bug 2: Four read methods (get_by_id, get_by_ids, get_vectors_by_ids, drop) were still using namespace_to_table_name(self.namespace) to get legacy table names instead of self.table_name with the model suffix. This meant these methods would query the wrong table (legacy, without suffix) while data was being inserted into the new table (with suffix), causing data-not-found errors.

How it solves it:
- Bug 1: Convert the parameter dict to a list using list(params.values()) before passing to db.execute(), matching the pattern used in other methods
- Bug 2: Replace all namespace_to_table_name(self.namespace) calls with self.table_name in the four affected methods, ensuring they query the correct model-specific table

Impact:
- delete_entity_relation now correctly deletes relations by entity name
- All read operations now correctly query model-specific tables
- Data written with model isolation can now be properly retrieved
- Maintains consistency with write operations using self.table_name

Testing:
- All 6 PostgreSQL migration tests pass (test_postgres_migration.py)
- All 6 Qdrant migration tests pass (test_qdrant_migration.py)
- Verified parameter binding works correctly
- Verified read methods access the correct tables
Why this is needed: Users need practical examples to understand how to use the new vector storage model isolation feature. Without examples, the automatic migration and multi-model coexistence patterns may not be clear to developers implementing this feature.

What this adds:
- Comprehensive demo covering three key scenarios:
  1. Creating a new workspace with an explicit model name
  2. Automatic migration from the legacy format (without model_name)
  3. Multiple embedding models coexisting safely
- Detailed inline comments explaining each scenario
- Expected collection/table naming patterns
- Verification steps for each scenario

Impact:
- Provides clear guidance for users upgrading to model isolation
- Demonstrates best practices for specifying model_name
- Shows how to verify successful migrations
- Reduces support burden by answering common questions upfront

Testing: Example code includes complete async/await patterns and can be run directly after configuring OpenAI API credentials. Each scenario is self-contained with explanatory output.

Related commits:
- df5aacb: Qdrant model isolation implementation
- ad68624: PostgreSQL model isolation implementation
Why this change is needed: The previous fix in commit 7dc1f83 incorrectly "fixed" delete_entity_relation by converting the parameter dict to a list. However, PostgreSQLDB.execute() expects a dict[str, Any] parameter, not a list. The execute() method internally converts dict values to a tuple (line 1487: tuple(data.values())), so passing a list bypasses the expected interface and causes parameter binding issues.

What was wrong:

```python
params = {"workspace": self.workspace, "entity_name": entity_name}
await self.db.execute(delete_sql, list(params.values()))  # WRONG
```

The correct approach (matching the delete_entity method):

```python
await self.db.execute(
    delete_sql, {"workspace": self.workspace, "entity_name": entity_name}
)
```

How it solves it:
- Pass parameters as a dict directly to db.execute(), matching the method signature
- Maintain consistency with delete_entity(), which correctly passes a dict
- Let db.execute() handle the dict-to-tuple conversion internally as designed

Impact:
- delete_entity_relation now correctly passes parameters to PostgreSQL
- Method interface consistency with other delete operations
- Proper parameter binding ensures reliable entity relation deletion

Testing:
- All 6 PostgreSQL migration tests pass
- Verified parameter passing matches the delete_entity pattern
- Code review identified the issue before production use

Related:
- Fixes the incorrect "fix" from commit 7dc1f83
- Aligns with the PostgreSQLDB.execute() interface (lines 1477-1480)
Why this change is needed: Before creating a PR, we need to validate that the vector storage model isolation feature works correctly in the CI environment. The existing tests.yml only runs on main/dev branches and only runs tests marked as 'offline'. We need a dedicated workflow to test feature branches and specifically run migration tests.

What this adds:
- New workflow: feature-tests.yml
- Triggers on:
  1. Manual dispatch (workflow_dispatch) - can be triggered from the GitHub UI
  2. Push to feature/** branches - automatic testing
  3. Pull requests to main/dev - pre-merge validation
- Runs migration tests across Python 3.10, 3.11, 3.12
- Specifically tests:
  - test_qdrant_migration.py (6 tests)
  - test_postgres_migration.py (6 tests)
- Uploads test results as artifacts

How to use:
1. Automatic: Push to feature/vector-model-isolation triggers tests
2. Manual: Go to Actions tab → Feature Branch Tests → Run workflow
3. PR: Tests run automatically when a PR is created

Impact:
- Enables pre-PR validation on GitHub infrastructure
- Catches issues before code review
- Provides test results across multiple Python versions
- No need for local test environment setup

Testing: After pushing this commit, tests will run automatically on the feature branch. Tests can also be triggered manually from the GitHub Actions UI.
Why this change is needed: While unit tests with mocks verify code logic, they cannot catch real-world issues like database connectivity, SQL syntax errors, vector dimension mismatches, or actual data migration failures. E2E tests with real database services provide confidence that the feature works in production-like environments.

What this adds:
1. E2E workflow (.github/workflows/e2e-tests.yml):
   - PostgreSQL job with ankane/pgvector:latest service
   - Qdrant job with qdrant/qdrant:latest service
   - Runs on Python 3.10 and 3.12
   - Manual trigger + automatic on PR
2. PostgreSQL E2E tests (test_e2e_postgres_migration.py):
   - Fresh installation: Create a new table with model suffix
   - Legacy migration: Migrate 10 real records from a legacy table
   - Multi-model: Two models create separate tables with different dimensions
   - Tests real SQL execution, pgvector operations, data integrity
3. Qdrant E2E tests (test_e2e_qdrant_migration.py):
   - Fresh installation: Create a new collection with model suffix
   - Legacy migration: Migrate 10 real vectors from a legacy collection
   - Multi-model: Two models create separate collections (768d vs 1024d)
   - Tests real Qdrant API calls, collection creation, vector operations

How it solves it:
- Uses GitHub Actions services to spin up real databases
- Tests connect to an actual PostgreSQL with the pgvector extension
- Tests connect to an actual Qdrant server over its HTTP API
- Verifies the complete data flow: create → migrate → verify
- Validates dimension isolation and data integrity

Impact:
- Catches database-specific issues before production
- Validates migration logic with real data
- Confirms multi-model isolation works end-to-end
- Provides high confidence for merge to main

Testing: After this commit, E2E tests can be triggered manually from the GitHub Actions UI: Actions → E2E Tests (Real Databases) → Run workflow.

Expected results:
- PostgreSQL E2E: 3 tests pass (fresh install, migration, multi-model)
- Qdrant E2E: 3 tests pass (fresh install, migration, multi-model)
- Total: 6 E2E tests validating real database operations

Note: E2E tests are separate from fast unit tests and only run on:
1. Manual trigger (workflow_dispatch)
2. Pull requests that modify storage implementation files
This keeps the main CI fast while providing thorough validation when needed.
Fix pytest fixture scope incompatibility with pytest-asyncio. Changed fixture scope from "module" to "function" to match pytest-asyncio's default event loop scope.

Issue: ScopeMismatch error when accessing a function-scoped event loop fixture from module-scoped fixtures.

Testing: Fixes E2E test execution in GitHub Actions.
Add missing connection retry configuration parameters:
- connection_retry_attempts: 3
- connection_retry_backoff: 0.5
- connection_retry_backoff_max: 5.0
- pool_close_timeout: 5.0

These are required by PostgreSQLDB initialization.

Issue: KeyError: 'connection_retry_attempts' in E2E tests.
Replaced storage-level E2E tests with comprehensive LightRAG-based tests.

Key improvements:
- Use complete LightRAG initialization (not just storage classes)
- Proper mock LLM/embedding functions matching real usage patterns
- Added tokenizer support for realistic testing

Test coverage:
1. test_legacy_migration_postgres: Automatic migration from a legacy table (1536d)
2. test_multi_instance_postgres: Multiple LightRAG instances (768d + 1024d)
3. test_multi_instance_qdrant: Multiple Qdrant instances (768d + 1024d)

Scenarios tested:
- ✓ Multi-dimension support (768d, 1024d, 1536d)
- ✓ Multiple model names (model-a, model-b, text-embedding-ada-002)
- ✓ Legacy migration (backward compatibility)
- ✓ Multi-instance coexistence
- ✓ PostgreSQL and Qdrant storage backends

Removed:
- tests/test_e2e_postgres_migration.py (replaced)
- tests/test_e2e_qdrant_migration.py (replaced)

Updated:
- .github/workflows/e2e-tests.yml: Use the unified test file
Why this change is needed: Complete E2E test coverage for the vector model isolation feature requires testing legacy data migration for both PostgreSQL and Qdrant backends. Previously only PostgreSQL migration was tested.

How it solves it:
- Add a test_legacy_migration_qdrant() function to test automatic migration from a legacy collection (no model suffix) to a model-suffixed collection
- The test creates a legacy "lightrag_vdb_chunks" collection with 1536d vectors
- Initializes LightRAG with model_name="text-embedding-ada-002"
- Verifies automatic migration to "lightrag_vdb_chunks_text_embedding_ada_002_1536d"
- Validates vector count, dimension, and collection existence

Impact:
- Ensures Qdrant migration works correctly in real scenarios
- Provides parity with PostgreSQL E2E test coverage
- Will be automatically run in CI via the -k "qdrant" filter

Testing:
- Test follows the same pattern as test_legacy_migration_postgres
- Uses complete LightRAG initialization with mock LLM and embedding
- Includes proper cleanup via the qdrant_cleanup fixture
- Syntax validated with python3 -m py_compile
Why this change is needed: E2E tests were failing in GitHub Actions CI with two critical issues:
1. PostgreSQL tests failed with "ModuleNotFoundError: No module named 'qdrant_client'"
2. The Qdrant container health check never became healthy

How it solves it:
1. Added qdrant-client to the PostgreSQL job dependencies
   - test_e2e_multi_instance.py imports QdrantClient at module level
   - Even with the -k "postgres" filter, pytest imports the whole module first
   - Both PostgreSQL and Qdrant tests now share dependencies
2. Changed the Qdrant health check from curl to wget
   - The Qdrant Docker image may not have curl pre-installed
   - wget is more commonly available in minimal container images
   - New command: wget --no-verbose --tries=1 --spider

Impact:
- Fixes PostgreSQL E2E test import errors
- Enables the Qdrant container to pass health checks
- Allows both test suites to run successfully in CI

Testing:
- Will verify in the next CI run that both jobs complete successfully
- Health check should now return "healthy" status within the retry window
Why this change is needed: The Qdrant Docker image does not have curl or wget pre-installed, causing the health check to always fail and the container to be marked as unhealthy after timeout.

How it solves it: Remove the health check from the Qdrant service container configuration. The E2E test already has a "Wait for Qdrant" step that uses curl from the runner environment to verify service readiness before running tests.

Impact:
- The Qdrant container starts immediately, without health check delays
- Service readiness is still verified by the test-level wait step
- Eliminates container startup failures

Testing: The next CI run should successfully start the Qdrant container and pass the wait/verify steps in the test workflow.
Changes made:
- Updated the batch insert logic to use a dictionary for row values, improving clarity and ensuring compatibility with the database execution method
- Adjusted the insert query construction to use named parameters, enhancing readability and maintainability

Impact: Streamlines the insertion process and reduces potential errors related to parameter binding.

Testing: Functionality remains intact; no new tests required, as existing tests cover the insert operations.
Why this change is needed: E2E tests were failing with TypeError because they used the non-existent parameters kv_storage_cls_kwargs, graph_storage_cls_kwargs, and doc_status_storage_cls_kwargs. These parameters do not exist in LightRAG's __init__ method.

How it solves it: Removed the three non-existent parameters from all LightRAG initializations in test_e2e_multi_instance.py:
- test_legacy_migration_postgres
- test_multi_instance_postgres (both instances A and B)

PostgreSQL storage classes (PGKVStorage, PGGraphStorage, PGDocStatusStorage) use ClientManager, which reads configuration from environment variables (POSTGRES_HOST, POSTGRES_PORT, etc.) that are already set in the E2E workflow, so no additional kwargs are needed.

Impact:
- Fixes TypeError on LightRAG initialization
- E2E tests can now properly instantiate with PostgreSQL storages
- Configuration still works via environment variables

Testing: The next E2E run should successfully initialize LightRAG instances and proceed to actual migration/multi-instance testing.
Why this change is needed: E2E tests were failing with:
"ValueError: Storage implementation 'PGKVStorage' requires the following environment variables: POSTGRES_DATABASE"

The workflow was setting POSTGRES_DB, but LightRAG's check_storage_env_vars() expects POSTGRES_DATABASE (matching ClientManager.get_config()).

How it solves it: Changed the environment variable name from POSTGRES_DB to POSTGRES_DATABASE in the "Run PostgreSQL E2E tests" step.

Impact:
- PGKVStorage, PGGraphStorage, and PGDocStatusStorage can now properly initialize using ClientManager's configuration
- Fixes ValueError during LightRAG initialization

Testing: The next E2E run should pass environment variable validation and proceed to actual test execution.
Why this change is needed: The previous wait strategy used the `/health` endpoint with the `-f` flag and only a 30-second timeout, causing timeouts in GitHub Actions.

How it solves it:
- Use the root endpoint `/` instead of `/health` (the Qdrant API root responds)
- Remove the `-f` flag to accept any response (not just 2xx)
- Increase the timeout from 30s to 60s
- Add progress output for each attempt
- Add a clear error message on failure

Impact: More reliable Qdrant service detection in E2E tests.

Testing: Will verify on a GitHub Actions E2E test run.
Why this change is needed: Tests were accessing rag.chunk_entity_relation_graph.chunk_vdb, which doesn't exist. The chunk_entity_relation_graph is a BaseGraphStorage and doesn't have a chunk_vdb attribute.

How it solves it: Changed all occurrences to use direct LightRAG attributes:
- rag.chunks_vdb.table_name (PostgreSQL)
- rag.chunks_vdb.final_namespace (Qdrant)

Impact: Fixes an AttributeError that would occur when E2E tests run.

Testing: Will verify on a GitHub Actions E2E test run.
Why these changes are needed:
1. LightRAG wraps embedding_func with the priority_limit_async_func_call decorator, causing loss of the get_model_identifier method
2. UnifiedLock.__aexit__ set the main_lock_released flag incorrectly

How it solves them:
1. _generate_collection_suffix now tries multiple approaches:
   - First check if embedding_func has get_model_identifier
   - Fall back to the original EmbeddingFunc in global_config
   - Return an empty string for backward compatibility
2. Move main_lock_released = True inside the if block so the flag is only set when the lock actually exists and is released

Impact:
- Fixes E2E tests that initialize complete LightRAG instances
- Fixes incorrect async lock cleanup in exception scenarios
- Maintains backward compatibility

Testing: All unit tests pass (test_qdrant_migration.py, test_postgres_migration.py).
Why this change is needed: asdict() converts nested dataclasses to dicts. When LightRAG creates global_config with asdict(self), the embedding_func field (which is an EmbeddingFunc dataclass) gets converted to a plain dict, losing its get_model_identifier() method.

How it solves it:
1. Save the original EmbeddingFunc object before the asdict() call
2. Restore it in global_config after asdict()
3. Add a null check and debug logging in _generate_collection_suffix

Impact:
- E2E tests with full LightRAG initialization now work correctly
- Vector storage model isolation features function properly
- Maintains backward compatibility

Testing: All unit tests pass (12/12 in migration tests).
Why this change is needed: The legacy_namespace logic was incorrectly including workspace in the collection name, causing migration to fail in E2E tests. When workspace was set (e.g., to a temp directory path), legacy_namespace became "/tmp/xxx_chunks" instead of "lightrag_vdb_chunks", so the migration logic couldn't find the legacy collection.

How it solves it: Changed legacy_namespace to always use the old naming scheme without the workspace prefix: "lightrag_vdb_{namespace}". This matches the actual collection names from pre-migration code and aligns with PostgreSQL's approach, where legacy_table_name = base_table (without workspace).

Impact:
- Qdrant legacy data migration now works correctly in E2E tests
- All unit tests pass (6/6 for both Qdrant and PostgreSQL)
- E2E test_legacy_migration_qdrant should now pass

Testing:
- Unit tests: pytest tests/test_qdrant_migration.py -v (6/6 passed)
- Unit tests: pytest tests/test_postgres_migration.py -v (6/6 passed)
- Updated test_qdrant_collection_naming to verify the new legacy_namespace
Why this change is needed: The PostgreSQLDB class doesn't have a fetch() method. The migration code was incorrectly using db.fetch() for batch data retrieval, causing an AttributeError during E2E tests.

How it solves it:
1. Changed db.fetch(sql, params) to db.query(sql, params, multirows=True)
2. Updated all test mocks to support the multirows parameter
3. Consolidated the mock_query implementation to handle both single- and multi-row queries

Impact:
- PostgreSQL legacy data migration now works correctly in E2E tests
- All unit tests pass (6/6)
- Aligns with PostgreSQLDB's actual API

Testing:
- pytest tests/test_postgres_migration.py -v (6/6 passed)
- Updated the test_postgres_migration_trigger mock
- Updated the test_scenario_2_legacy_upgrade_migration mock
- Updated the base mock_pg_db fixture
…n CI)

Why this change is needed: E2E PostgreSQL tests were failing because they specified graph_storage="PGGraphStorage", but the CI environment doesn't have the Apache AGE extension installed. This caused initialize_storages() to fail with "function create_graph(unknown) does not exist".

How it solves it: Removed the graph_storage="PGGraphStorage" parameter in all PostgreSQL E2E tests, allowing LightRAG to use the default NetworkXStorage, which doesn't require external dependencies.

Impact:
- PostgreSQL E2E tests can now run successfully in CI
- Vector storage migration tests can complete without the AGE extension dependency
- Maintains test coverage for the vector storage model isolation feature

Testing: The vector storage migration tests (the focus of this PR) don't depend on the graph storage implementation and can run with NetworkXStorage.
Remove unused embedding functions (C and D) that were defined but never used, causing F841 lint errors.

Also fix E712 errors by using 'is True' instead of '== True' for boolean comparisons in assertions.

Testing:
- All pre-commit hooks pass
- Verified with: uv run pre-commit run --all-files
Implement intelligent legacy collection detection to support multiple naming patterns from older LightRAG versions:
1. lightrag_vdb_{namespace} - current legacy format
2. {workspace}_{namespace} - old format with workspace
3. {namespace} - old format without workspace

This ensures users can seamlessly upgrade from any previous version without manual data migration.

Also add comprehensive test coverage for all migration scenarios:
- Case 1: Both new and legacy exist (warning)
- Case 2: Only new exists (already migrated)
- Backward compatibility with old workspace naming
- Backward compatibility with no-workspace naming
- Empty legacy collection handling
- Workspace isolation verification
- Model switching scenario

Testing:
- All 15 migration tests pass
- No breaking changes to existing tests
- Verified with: pytest tests/test_*migration*.py -v
Why this change is needed:
1. The uv.lock revision was downgraded from 3 to 2, causing potential dependency resolution issues
2. Code formatting in test_e2e_multi_instance.py did not match ruff-format requirements

How it solves it:
1. Restored uv.lock from the main branch to get revision 3 back
2. Ran ruff format to auto-fix code formatting issues:
   - Split a long print statement into multiple lines
   - Split a long VectorParams instantiation into multiple lines

Impact:
- uv.lock now has the correct revision number (3 instead of 2)
- Code formatting now passes pre-commit ruff-format checks
- Consistent with main branch dependency resolution

Testing:
- Verified uv.lock revision: head -3 uv.lock shows "revision = 3"
- Verified formatting: uv run ruff format tests/test_e2e_multi_instance.py reports "1 file reformatted"
Why this change is needed: CI lint checks were failing due to ruff-format violations in assert statements.

How it solves it: Applied pre-commit ruff-format rules to reformat assert statements to match the preferred style (condition on a new line before the error message).

Impact:
- Fixes all remaining lint errors in test_e2e_multi_instance.py
- Ensures CI passes for PR HKUDS#2391

Testing: Ran 'uv run pre-commit run --files tests/test_e2e_multi_instance.py', which reformatted 1 file with ~15-20 assert statement fixes.
@BukeLy closed this Nov 20, 2025
@BukeLy reopened this Nov 20, 2025
@BukeLy marked this pull request as ready for review November 20, 2025 04:33
@danielaskdd (Collaborator) commented:

@codex review

@chatgpt-codex-connector (bot) left a comment:


💡 Codex Review

Here are some automated review suggestions for this pull request.


…ffix restarts

Why this change is needed: Two critical issues were identified in the Codex review of PR HKUDS#2391:
1. Migration fails when legacy collections/tables use different embedding dimensions (e.g., upgrading from 1536d to 3072d models causes initialization failures)
2. When model_suffix is empty (no model_name provided), table_name equals legacy_table_name, causing the Case 1 logic to delete the only table/collection on second startup

How it solves it:
- Added dimension compatibility checks before migration in both Qdrant and PostgreSQL
- PostgreSQL uses two-method detection: pg_attribute metadata query + vector sampling fallback
- When dimensions mismatch, skip migration and create a new empty table/collection, preserving legacy data
- Added a safety check to detect when the new and legacy names are identical, preventing deletion
- Both backends log clear warnings about dimension mismatches and skipped migrations

Impact:
- lightrag/kg/qdrant_impl.py: Added dimension check (lines 254-297) and no-suffix safety (lines 163-169)
- lightrag/kg/postgres_impl.py: Added dimension check with fallback (lines 2347-2410) and no-suffix safety (lines 2281-2287)
- tests/test_no_model_suffix_safety.py: New test file with 4 test cases covering edge scenarios
- Backward compatible: All existing scenarios continue working unchanged

Testing:
- All 20 tests pass (16 existing migration tests + 4 new safety tests)
- E2E tests enhanced with explicit verification points for dimension mismatch scenarios
- Verified graceful degradation when dimension detection fails
- Code style verified with ruff and pre-commit hooks
…ation

Why this change is needed:
Two critical P0 security vulnerabilities were identified in Cursor review:
1. UnifiedLock silently allows unprotected execution when lock is None, creating
   false security and potential race conditions in multi-process scenarios
2. PostgreSQL migration copies ALL workspace data during legacy table migration,
   violating multi-tenant isolation and causing data leakage

How it solves it:
- UnifiedLock now raises RuntimeError when lock is None instead of a WARNING
- Added workspace parameter to setup_table() for proper data isolation
- Migration queries now filter by workspace in both COUNT and SELECT operations
- Added clear error messages to help developers diagnose initialization issues

Impact:
- lightrag/kg/shared_storage.py: UnifiedLock raises exception on None lock
- lightrag/kg/postgres_impl.py: Added workspace filtering to migration logic
- tests/test_unified_lock_safety.py: 3 tests for lock safety
- tests/test_workspace_migration_isolation.py: 3 tests for workspace isolation
- tests/test_dimension_mismatch.py: Updated table names and mocks
- tests/test_postgres_migration.py: Updated mocks for workspace filtering

Testing:
- All 31 tests pass (16 migration + 4 safety + 3 lock + 3 workspace + 5 dimension)
- Backward compatible: existing code continues working unchanged
- Code style verified with ruff and pre-commit hooks
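The lock-safety half of this fix (fail loudly instead of silently running unprotected) can be sketched as below. `UnifiedLockSketch` is a hypothetical stand-in for the real `UnifiedLock` in `shared_storage.py`, reduced to the one behavior the commit changes.

```python
import asyncio


class UnifiedLockSketch:
    """Minimal sketch: a None lock now raises instead of being ignored."""

    def __init__(self, lock):
        self._lock = lock

    async def __aenter__(self):
        if self._lock is None:
            # Previously a WARNING that let code run unprotected; now a hard
            # failure so races cannot silently occur in multi-process setups.
            raise RuntimeError(
                "UnifiedLock: lock is None; shared storage was not "
                "initialized before use"
            )
        await self._lock.acquire()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self._lock.release()
```

With a real `asyncio.Lock` the context manager works as usual; with `None` it raises at entry, which is the new contract.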
Why this change is needed:
Add end-to-end test to verify the P0 bug fix for cross-workspace data
leakage during PostgreSQL migration. Unit tests use mocks and cannot verify
that real SQL queries correctly filter by workspace in an actual database.

What this test does:
- Creates legacy table with MIXED data (workspace_a + workspace_b)
- Initializes LightRAG for workspace_a only
- Verifies ONLY workspace_a data migrated to new table
- Verifies workspace_b data NOT leaked to new table (0 records)
- Verifies workspace_b data preserved in legacy table (3 records)
- Verifies workspace_a data cleaned from legacy after migration (0 records)

Impact:
- tests/test_e2e_multi_instance.py: Add test_workspace_migration_isolation_e2e_postgres
- Validates multi-tenant isolation in real PostgreSQL environment
- Prevents regression of critical security fix

Testing:
E2E test passes with real PostgreSQL container, confirming workspace
filtering works correctly with actual SQL execution.
Problem:
When UnifiedLock.__aexit__ encountered an exception during async_lock.release(),
the error recovery logic would incorrectly attempt to release async_lock again
because it only checked the main_lock_released flag. This could cause:
- Double-release attempts on already-failed locks
- Masking of original exceptions
- Undefined behavior in lock state

Root Cause:
The recovery logic used only main_lock_released to determine whether to attempt
async_lock release, without tracking whether async_lock.release() had already
been attempted and failed.

Fix:
- Added async_lock_released flag to track async_lock release attempts
- Updated recovery logic condition to check both main_lock_released AND
  async_lock_released before attempting async_lock release
- This ensures async_lock.release() is only called once, even if it fails

Testing:
- Added test_aexit_no_double_release_on_async_lock_failure:
  Verifies async_lock.release() is called only once when it fails
- Added test_aexit_recovery_on_main_lock_failure:
  Verifies recovery logic still works when main lock fails
- All 5 UnifiedLock safety tests pass

Impact:
- Eliminates double-release bugs in multiprocess lock scenarios
- Preserves correct error propagation
- Maintains recovery logic for legitimate failure cases

Files Modified:
- lightrag/kg/shared_storage.py: Added async_lock_released tracking
- tests/test_unified_lock_safety.py: Added 2 new tests (5 total now pass)
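The two-flag protocol from this fix can be sketched outside the async context manager. This is a simplified illustration (function and flag names are invented; the real logic sits in `UnifiedLock.__aexit__`): each lock's `release()` is attempted at most once, even when an earlier release raises.

```python
class CountingLock:
    """Test double: counts release() calls and can be made to fail."""

    def __init__(self, fail=False):
        self.releases = 0
        self.fail = fail

    def release(self):
        self.releases += 1
        if self.fail:
            raise RuntimeError("release failed")


def safe_release(main_lock, async_lock):
    """Release both locks with at-most-once semantics per lock."""
    main_released = False
    async_attempted = False
    try:
        main_lock.release()
        main_released = True
        async_attempted = True
        async_lock.release()
    except Exception:
        # Recovery (the fix): consult BOTH flags before touching async_lock,
        # so a release that was already attempted and failed is never retried.
        if main_released and not async_attempted:
            try:
                async_lock.release()
            except Exception:
                pass  # preserve the original exception
        raise
```

The key property is visible in the failure path: when `async_lock.release()` raises, its call count stays at 1 rather than being bumped again by the recovery branch.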
…le creation

This commit fixes two critical issues in PostgreSQL storage:

BUG 1: Legacy table cleanup causing data loss across workspaces
---------------------------------------------------------------
PROBLEM:
- After migrating workspace_a data from legacy table, the ENTIRE legacy
  table was deleted
- This caused workspace_b's data (still in legacy table) to be lost
- Multi-tenant data isolation was violated

FIX:
- Implement workspace-aware cleanup: only delete migrated workspace's data
- Check if other workspaces still have data before dropping table
- Only drop legacy table when it becomes completely empty
- If other workspace data exists, preserve legacy table with remaining records

Location: postgres_impl.py PGVectorStorage.setup_table() lines 2510-2567

Test verification:
- test_workspace_migration_isolation_e2e_postgres validates this fix

BUG 2: PGDocStatusStorage missing table initialization
-------------------------------------------------------
PROBLEM:
- PGDocStatusStorage.initialize() only set workspace, never created table
- Caused "relation 'lightrag_doc_status' does not exist" errors
- Document insertion (ainsert) failed immediately

FIX:
- Add table creation to initialize() method using _pg_create_table()
- Consistent with other storage implementations:
  * MongoDocStatusStorage creates collections
  * JsonDocStatusStorage creates directories
  * PGDocStatusStorage now creates tables ✓

Location: postgres_impl.py PGDocStatusStorage.initialize() lines 2965-2971

Test Results:
- Unit tests: 13/13 passed (test_unified_lock_safety,
  test_workspace_migration_isolation, test_dimension_mismatch)
- E2E tests require PostgreSQL server

Related: PR HKUDS#2391 (Vector Storage Model Isolation)
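The cleanup decision for BUG 1 reduces to a small rule, sketched here in pure Python with a list of dicts standing in for legacy-table rows. The real code runs DELETE, COUNT, and DROP statements inside `setup_table()`; this sketch only models the decision.

```python
def cleanup_legacy(rows, migrated_workspace):
    """Workspace-aware legacy cleanup (illustrative model).

    Delete only the migrated workspace's rows; report the table as
    droppable only when no other workspace still has data in it.
    Returns (remaining_rows, table_dropped).
    """
    remaining = [r for r in rows if r["workspace"] != migrated_workspace]
    return remaining, len(remaining) == 0
```

Migrating workspace_a first leaves workspace_b's rows in place and keeps the table; only once every workspace has migrated does the table become empty and safe to drop.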
Critical Bug Fix:
PostgreSQLDB.execute() expects data as dict, but workspace cleanup
was passing a list [workspace], causing cleanup to fail with
"PostgreSQLDB.execute() expects data as dict, got list" error.

Changes:
1. Fixed postgres_impl.py:2522
   - Changed: await db.execute(delete_query, [workspace])
   - To: await db.execute(delete_query, {"workspace": workspace})
2. Improved test_postgres_migration.py mock
   - Enhanced COUNT(*) mock to properly distinguish between:
     * Legacy table with workspace filter (returns 50)
     * Legacy table without filter after deletion (returns 0)
     * New table verification (returns 50)
   - Uses storage.legacy_table_name dynamically instead of hardcoded strings
   - Detects table type by checking for model suffix patterns
3. Fixed test_unified_lock_safety.py formatting
   - Applied ruff formatting to assert statement

Impact:
- Workspace-aware legacy cleanup now works correctly
- Legacy tables properly deleted when all workspace data migrated
- Legacy tables preserved when other workspace data remains

Tests: All 25 unit tests pass
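The contract behind this fix can be sketched as follows. `bind_params` is an illustrative helper, not the library's API: it models `execute()` rejecting non-dict parameters and flattening the dict to positional values, relying on Python 3.7+ dict insertion order so `.values()` lines up with the query's placeholders.

```python
def bind_params(data):
    """Validate then flatten query parameters (illustrative sketch)."""
    if not isinstance(data, dict):
        # Mirrors the error message quoted in the commit above.
        raise TypeError(
            f"PostgreSQLDB.execute() expects data as dict, got {type(data).__name__}"
        )
    # Insertion order is guaranteed, so values map to placeholders in order.
    return tuple(data.values())
```

Passing `{"workspace": workspace}` succeeds, while the old list form `[workspace]` is rejected up front instead of failing deep inside cleanup.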
Apply ruff-format fixes to 6 test files to pass pre-commit checks:
- test_dimension_mismatch.py
- test_e2e_multi_instance.py
- test_no_model_suffix_safety.py
- test_postgres_migration.py
- test_unified_lock_safety.py
- test_workspace_migration_isolation.py

Changes are primarily assert statement reformatting to match the ruff style guide.
@danielaskdd
Collaborator

@codex review

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Why this change is needed:
In multi-tenant deployments, when workspace A migrates first (creating
the new model-suffixed table), subsequent workspace B initialization
enters Case 1 (both tables exist). The original Case 1 logic only
checked if the legacy table was empty globally, without checking if
the current workspace had unmigrated data. This caused workspace B's
data to remain in the legacy table while the application queried the
new table, resulting in data loss for workspace B.

How it solves the problem:
1. Extracted migration logic into _pg_migrate_workspace_data() helper
   function to avoid code duplication
2. Modified Case 1 to check if current workspace has data in legacy
   table and migrate it if found
3. Both Case 1 and Case 4 now use the same migration helper, ensuring
   consistent behavior
4. After migration, only delete the current workspace's data from
   legacy table, preserving other workspaces' data

Impact:
- Prevents data loss in multi-tenant PostgreSQL deployments
- Maintains backward compatibility with single-tenant setups
- Reduces code duplication between Case 1 and Case 4

Testing:
All PostgreSQL migration tests pass (8/8)
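How the Case 1 change fits into the four-case startup logic can be sketched as a decision function. Names and return strings are invented for this sketch; the key point is that Case 1 now consults the current workspace's leftover legacy rows rather than a global emptiness check.

```python
def choose_startup_action(new_exists, legacy_exists, workspace_legacy_count):
    """Pick the startup action for one workspace's vector storage (sketch)."""
    if new_exists and legacy_exists:
        # Case 1: both exist. Migrate THIS workspace's unmigrated rows.
        # (The old code only asked whether the legacy table was empty
        # globally, which lost workspace B's data.)
        return "migrate_workspace" if workspace_legacy_count > 0 else "noop"
    if new_exists:
        return "noop"         # Case 2: only the new table exists
    if legacy_exists:
        return "migrate_all"  # Case 4: migrate, verify, workspace-aware cleanup
    return "create_new"       # Case 3: neither exists
```

In the bug scenario, workspace B arrives with both tables present and 3 leftover rows, so the fixed logic returns `migrate_workspace` instead of treating the non-empty legacy table as already handled.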
Add test_case1_sequential_workspace_migration to verify the fix for
the multi-tenant data loss bug in PostgreSQL Case 1 migration.

Problem:
- When workspace_a migrates first (Case 4: only legacy table exists)
- Then workspace_b initializes later (Case 1: both tables exist)
- Bug: Case 1 only checked if legacy table was globally empty
- Result: workspace_b's data was not migrated, causing data loss

Test Scenario:
1. Legacy table contains data from both workspace_a (3 records) and
   workspace_b (3 records)
2. workspace_a initializes first → triggers Case 4 migration
3. workspace_b initializes second → triggers Case 1 migration
4. Verify workspace_b's data is correctly migrated to new table
5. Verify workspace_b's data is deleted from legacy table
6. Verify legacy table is dropped when empty

This test uses mock tracking of inserted records to verify migration
behavior without requiring a real PostgreSQL database.

Related: GitHub PR HKUDS#2391 comment #2553973066
@BukeLy closed this on Nov 25, 2025
@BukeLy reopened this on Nov 25, 2025
@danielaskdd
Collaborator

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍


@BukeLy force-pushed the feature/vector-model-isolation branch from 310da3e to ee38449 on November 25, 2025 18:24
Why this change is needed:
1. Added clarifying comments to _pg_migrate_workspace_data() parameter handling
2. Removed dead code from PGDocStatusStorage.initialize() that was never executed

Changes:
1. PostgreSQL Migration Parameter Documentation (lightrag/kg/postgres_impl.py:2240-2241):
   - Added comments explaining dict rebuild for correct value ordering
   - Clarifies that Python 3.7+ dict insertion order is relied upon
   - Documents that execute() converts dict to tuple via .values()
2. Dead Code Removal (lightrag/kg/postgres_impl.py:3061-3062):
   - Removed unreachable table creation code from PGDocStatusStorage.initialize()
   - Table is already created by PostgreSQLDB.initdb() during initialization
   - This code path was never executed, as the table always exists before initialize() is called
   - Added NOTE comment explaining where table creation actually happens

Impact:
- No functional changes, only code clarification and cleanup
- Reduces maintenance burden by removing unreachable code
- Improves code readability with better documentation

Testing:
- All 14 PostgreSQL migration tests pass
- All 5 UnifiedLock safety tests pass
- Pre-commit checks pass (ruff-format, ruff)
@BukeLy force-pushed the feature/vector-model-isolation branch from ee38449 to cf68cdf on November 25, 2025 18:29
@BukeLy
Contributor (Author)

@danielaskdd Hi Daniel, any updates on the PR?

@danielaskdd
Collaborator

The data migration logic currently contains multiple issues, particularly in handling various legacy data schema version transitions. Due to the complexity and intricacy of the problem, it is challenging to fully convey the details in text. I will submit a PR to address these issues directly. See: PR#2513.

BukeLy reacted with heart emoji

@danielaskdd closed this pull request by merging all changes into HKUDS:main in de48940 on Dec 21, 2025
@BukeLy deleted the feature/vector-model-isolation branch on December 22, 2025 03:20