Refactor: Add PostgreSQL Connection Retry Mechanism with Network Robustness #2192

Merged
danielaskdd merged 3 commits into HKUDS:main from danielaskdd:postgres-network-retry
Oct 9, 2025
danielaskdd (Collaborator) commented Oct 9, 2025 (edited)
Add PostgreSQL Connection Retry Mechanism with Network Robustness

Summary

Adds automatic retry logic with exponential backoff for PostgreSQL database operations to handle transient network failures gracefully. This enhancement significantly improves system reliability in production environments with unstable network connections.


Problem Statement

Currently, the PostgreSQL implementation lacks resilience against transient network issues:

  • ❌ Single connection failure causes immediate operation failure
  • ❌ No automatic recovery from temporary network interruptions
  • ❌ Connection pool issues can cascade into system-wide failures
  • ❌ No protection against connection pool deadlocks during cleanup

This makes the system fragile in production environments with:

  • Unstable network connections
  • Database server restarts
  • Network infrastructure maintenance
  • Temporary connection pool exhaustion

Solution

Implements a comprehensive retry mechanism using the tenacity library with the following features:

Core Retry Logic

  • Automatic retries on transient failures (configurable, default: 3 attempts)
  • Exponential backoff with configurable min/max delays
  • Connection pool reset after failures to ensure fresh connections
  • Concurrency-safe pool management with asyncio locks (safe across coroutines on one event loop, not across OS threads)
  • Timeout protection for pool cleanup operations
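
The retry-with-backoff behavior listed above can be sketched as follows. This is a minimal stdlib-only illustration, not the PR's code: the PR uses the tenacity library, whereas this sketch hand-rolls the loop so it runs without extra dependencies. The names `run_with_retry`, `flaky`, and `demo` are hypothetical, and the doubling schedule is an assumption about how the delay grows.

```python
import asyncio


async def run_with_retry(operation, *, attempts=3, backoff=0.5, backoff_max=5.0,
                         retry_on=(TimeoutError, ConnectionError)):
    """Await operation() until it succeeds or `attempts` calls have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Assumed schedule: the delay doubles per attempt, capped at backoff_max.
            delay = min(backoff_max, backoff * 2 ** (attempt - 1))
            await asyncio.sleep(delay)


async def demo():
    calls = {"n": 0}

    async def flaky():
        # Hypothetical operation: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated network drop")
        return "ok"

    result = await run_with_retry(flaky, attempts=3, backoff=0.01, backoff_max=0.05)
    return result, calls["n"]
```

Errors outside `retry_on` propagate immediately, which mirrors the split between transient and non-transient errors described in this PR.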

Transient Error Handling

Automatically retries on:

  • asyncio.TimeoutError / TimeoutError
  • ConnectionError / OSError
  • asyncpg.exceptions.InterfaceError
  • asyncpg.exceptions.TooManyConnectionsError
  • asyncpg.exceptions.CannotConnectNowError
  • asyncpg.exceptions.PostgresConnectionError
  • asyncpg.exceptions.ConnectionDoesNotExistError
  • asyncpg.exceptions.ConnectionFailureError

Non-Transient Errors

Correctly handles without retry:

  • UniqueViolationError
  • DuplicateTableError
  • Other application-level errors
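
One way to express the transient versus non-transient split is a single `isinstance` check against a tuple of exception types. The sketch below is illustrative only: `TRANSIENT_ERRORS`, `is_transient`, and the `UniqueViolation` stand-in class are hypothetical names, and the asyncpg exception types are left out so the example runs without asyncpg installed.

```python
import asyncio

# Stdlib exception types from the retry list above. In real code the asyncpg
# types named in that list would be appended to this tuple; they are omitted
# here so the sketch has no third-party dependency.
TRANSIENT_ERRORS = (asyncio.TimeoutError, TimeoutError, ConnectionError, OSError)


class UniqueViolation(Exception):
    """Hypothetical stand-in for an application-level database error."""


def is_transient(exc: BaseException) -> bool:
    """Return True when the error is worth retrying."""
    return isinstance(exc, TRANSIENT_ERRORS)
```

Note that `OSError` is broad: `ConnectionError` and `TimeoutError` are already subclasses of it, and it also covers errors such as `FileNotFoundError`, so including it makes the retry predicate fairly permissive.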

Technical Implementation

Key Changes to lightrag/kg/postgres_impl.py

  1. New Retry Configuration (lines 82-119)

    # Configurable via environment variables
    self.connection_retry_attempts = max(1, min(10, int(os.environ.get("POSTGRES_CONNECTION_RETRIES", 3))))
    self.connection_retry_backoff = max(0.1, min(5.0, float(os.environ.get("POSTGRES_CONNECTION_RETRY_BACKOFF", 0.5))))
    self.connection_retry_backoff_max = max(self.connection_retry_backoff, min(60.0, float(...)))
    self.pool_close_timeout = max(1.0, min(30.0, float(os.environ.get("POSTGRES_POOL_CLOSE_TIMEOUT", 5.0))))

  2. Central Retry Method _run_with_retry() (lines 298-346)

    • Orchestrates retry logic for all database operations
    • Ensures pool availability before operations
    • Handles AGE configuration when needed
    • Implements exponential backoff strategy
  3. Pool Management Methods

    • _ensure_pool(): Lazy pool initialization with double-check locking
    • _reset_pool(): Safe pool cleanup with timeout protection
    • _before_sleep(): Retry callback with logging and pool reset
  4. Enhanced Error Reporting (lines 3296-3299)

    "detail": repr(e),  # Better debugging info
    "error_type": e.__class__.__name__,  # Exception type tracking

  5. Refactored Methods

    • query(): Now uses _run_with_retry()
    • execute(): Improved error handling with proper logging
    • initdb(): Retry logic for initialization
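
The pool management ideas in item 3 (lazy initialization with double-check locking, and a reset whose close step is bounded by a timeout) can be sketched as below. This is a stdlib-only illustration under stated assumptions, not the PR's implementation: `FakePool`, `PoolManager`, `ensure_pool`, `reset_pool`, and `demo` are hypothetical names, and the fake pool only simulates a slow `close()`.

```python
import asyncio


class FakePool:
    """Hypothetical stand-in for a connection pool with a slow close()."""

    def __init__(self, close_delay=0.0):
        self.close_delay = close_delay
        self.closed = False

    async def close(self):
        await asyncio.sleep(self.close_delay)
        self.closed = True


class PoolManager:
    def __init__(self, create_pool, close_timeout=5.0):
        self._create_pool = create_pool
        self._close_timeout = close_timeout
        self._lock = asyncio.Lock()
        self.pool = None

    async def ensure_pool(self):
        if self.pool is None:              # fast path: no lock once the pool exists
            async with self._lock:
                if self.pool is None:      # re-check after acquiring the lock
                    self.pool = await self._create_pool()
        return self.pool

    async def reset_pool(self):
        async with self._lock:
            pool, self.pool = self.pool, None
        if pool is not None:
            try:
                # Bound the close so a hung pool cannot block recovery.
                await asyncio.wait_for(pool.close(), timeout=self._close_timeout)
            except asyncio.TimeoutError:
                pass  # abandon the hung close; the next ensure_pool() builds a new pool


async def demo():
    created = []

    async def create_pool():
        await asyncio.sleep(0.01)          # simulate connection setup
        pool = FakePool(close_delay=1.0)   # close() takes longer than the timeout
        created.append(pool)
        return pool

    manager = PoolManager(create_pool, close_timeout=0.05)
    pools = await asyncio.gather(*(manager.ensure_pool() for _ in range(5)))
    await manager.reset_pool()
    return len(created), len({id(p) for p in pools}), manager.pool
```

In `demo`, five concurrent callers share one pool because only the first caller passes the second `None` check, and `reset_pool()` returns after the timeout even though the simulated close would take much longer.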

Configuration

All parameters configurable via environment variables with safe defaults:

Environment Variable                    Default  Min      Max    Description
POSTGRES_CONNECTION_RETRIES             3        1        10     Number of retry attempts
POSTGRES_CONNECTION_RETRY_BACKOFF       0.5s     0.1s     5.0s   Initial backoff delay
POSTGRES_CONNECTION_RETRY_BACKOFF_MAX   5.0s     backoff  60.0s  Maximum backoff delay
POSTGRES_POOL_CLOSE_TIMEOUT             5.0s     1.0s     30.0s  Pool close timeout

Example configuration:

POSTGRES_CONNECTION_RETRIES=3
POSTGRES_CONNECTION_RETRY_BACKOFF=0.5
POSTGRES_CONNECTION_RETRY_BACKOFF_MAX=5.0
POSTGRES_POOL_CLOSE_TIMEOUT=5.0
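
The defaults and bounds above amount to clamping each parsed value into its allowed range. A minimal sketch of that clamping follows; the helper names `clamp` and `load_retry_config` are hypothetical, and the function reads from a plain mapping rather than `os.environ` directly so it is easy to exercise.

```python
def clamp(value, low, high):
    """Force value into [low, high]; the lower bound wins if low > high."""
    return max(low, min(high, value))


def load_retry_config(env):
    """Parse retry settings from a mapping such as os.environ."""
    attempts = clamp(int(env.get("POSTGRES_CONNECTION_RETRIES", 3)), 1, 10)
    backoff = clamp(float(env.get("POSTGRES_CONNECTION_RETRY_BACKOFF", 0.5)), 0.1, 5.0)
    # The maximum delay may never be smaller than the initial delay.
    backoff_max = clamp(
        float(env.get("POSTGRES_CONNECTION_RETRY_BACKOFF_MAX", 5.0)), backoff, 60.0
    )
    pool_close_timeout = clamp(
        float(env.get("POSTGRES_POOL_CLOSE_TIMEOUT", 5.0)), 1.0, 30.0
    )
    return {
        "attempts": attempts,
        "backoff": backoff,
        "backoff_max": backoff_max,
        "pool_close_timeout": pool_close_timeout,
    }
```

Out-of-range values are silently pulled back into range rather than rejected, while a non-numeric value still raises `ValueError` from `int()` or `float()`.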

Behavior Changes

Improved Reliability

  • Before: Single network hiccup → immediate failure
  • After: Automatic retry with exponential backoff → recovery from transient failures, up to the configured attempt limit

Better Logging

  • Warning logs for each retry attempt with attempt number
  • Detailed error information with repr() instead of str()
  • Exception type tracking for better debugging

Configuration Visibility

Startup log shows active retry configuration:

INFO: PostgreSQL, Retry config: attempts=3, backoff=0.50s, backoff_max=5.00s, pool_close_timeout=5.00s

Breaking Changes

None. This is a backward-compatible enhancement:

  • ✅ All existing code continues to work unchanged
  • ✅ Default behavior is unchanged when operations succeed; retries are enabled by default and only take effect on transient failures
  • ✅ Optional environment variables for customization
  • ✅ No API changes

Performance Impact

  • Normal operations: Negligible overhead (single pool check)
  • Failure scenarios: Additional latency from retries (expected trade-off)
  • Concurrent operations: No additional overhead (lock-free in success path)

Commits:

…ndling
  • Implement connection retry with backoff
  • Add transient error detection
  • Pool management with timeout guards

  - Add retry environment variables
  - Fix asyncpg import in retry tests

  • Move retry config to ClientManager
  • Remove env var parsing from PostgreSQLDB
  • Add config params to test setup

danielaskdd changed the title from "## Refactor: PostgreSQL Connection Retry Mechanism with Network Robustness" to "## Refactor: Add PostgreSQL Connection Retry Mechanism with Network Robustness" on Oct 9, 2025
danielaskdd merged commit b4d61eb into HKUDS:main on Oct 9, 2025
1 check passed
danielaskdd deleted the postgres-network-retry branch on October 10, 2025 07:37
danielaskdd changed the title from "## Refactor: Add PostgreSQL Connection Retry Mechanism with Network Robustness" to "Refactor: Add PostgreSQL Connection Retry Mechanism with Network Robustness" on Oct 11, 2025