Search optimization and indexing based on datetime #405


Open
GrzegorzPustulka wants to merge 6 commits into stac-utils:main from GrzegorzPustulka:search_optimization

GrzegorzPustulka (Contributor) commented Jun 18, 2025

Related Issue(s):

Index Management System with Time-based Partitioning

Description

This PR introduces a new index management system that enables automatic index partitioning based on dates and index size control with automatic splitting.

How it works

System Architecture

The system consists of several main components:

1. Search Engine Adapters

  • SearchEngineAdapter - base class
  • ElasticsearchAdapter and OpenSearchAdapter - implementations for specific engines

2. Index Selection Strategies

  • AsyncDatetimeBasedIndexSelector / SyncDatetimeBasedIndexSelector - date-based index filtering
  • UnfilteredIndexSelector - returns all indexes (fallback)
  • Cache with TTL (default 1 hour) for performance

3. Data Insertion Strategies

  • Simple strategy: one index per collection (behavior as before)
  • Datetime strategy: indexes partitioned by dates with automatic partitioning

Datetime Strategy - Operation Details

Index Format:

items_collection-name_2025-01-01-2025-03-31
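A minimal sketch of how such a name could be composed, assuming the layout `items_<collection>_<start>[-<end>]` seen in the example above. The helper `build_index_name` is hypothetical and is not taken from this PR's code.

```python
from datetime import date
from typing import Optional


def build_index_name(collection_id: str, start: date, end: Optional[date] = None) -> str:
    """Compose items_<collection>_<start>[-<end>] (hypothetical helper)."""
    name = f"items_{collection_id}_{start.isoformat()}"
    if end is not None:
        # A closed index carries its end date; an open-ended one does not.
        name += f"-{end.isoformat()}"
    return name


print(build_index_name("collection-name", date(2025, 1, 1), date(2025, 3, 31)))
# items_collection-name_2025-01-01-2025-03-31
```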

Item Insertion Process:

  1. System checks item date (properties.datetime)
  2. Looks for existing index that covers this date
  3. If not found - creates new index from this date
  4. Checks target index size
  5. If exceeds limit (DATETIME_INDEX_MAX_SIZE_GB) - splits index
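Steps 2 and 3 above can be sketched as a lookup over known date ranges. This is an in-memory illustration only: the `IndexRanges` mapping and `find_covering_index` are assumed names, and the real system reads this information from search engine aliases.

```python
from datetime import date
from typing import Dict, Optional, Tuple

# index name -> (start date, end date or None for an open-ended index)
IndexRanges = Dict[str, Tuple[date, Optional[date]]]


def find_covering_index(indexes: IndexRanges, item_date: date) -> Optional[str]:
    """Return the index whose date range contains item_date, or None if no index covers it."""
    for name, (start, end) in indexes.items():
        if start <= item_date and (end is None or item_date <= end):
            return name
    return None


indexes: IndexRanges = {
    "items_collection_2025-01-01-2025-03-15": (date(2025, 1, 1), date(2025, 3, 15)),
    "items_collection_2025-03-16": (date(2025, 3, 16), None),
}
print(find_covering_index(indexes, date(2025, 2, 10)))
print(find_covering_index(indexes, date(2024, 12, 15)))  # None: a new index would be needed
```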

Early Date Handling:
If item has date earlier than oldest index:

  1. Creates new index from this earlier date
  2. Sets the new index's alias to end one day before the oldest existing index starts (as in Scenario 3 below)
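A sketch of the date arithmetic for early dates, assuming (as Scenario 3 later in this description suggests) that the new index runs from the item's date to the day before the oldest existing index starts. `early_index_range` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Tuple


def early_index_range(item_date: date, oldest_index_start: date) -> Tuple[date, date]:
    """Date range for a new index placed in front of the oldest existing one."""
    if item_date >= oldest_index_start:
        raise ValueError("item_date is not earlier than the oldest index")
    return item_date, oldest_index_start - timedelta(days=1)


start, end = early_index_range(date(2024, 12, 15), date(2025, 2, 1))
print(f"items_collection_{start}-{end}")  # items_collection_2024-12-15-2025-01-31
```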

Index Splitting:
When index exceeds size limit:

  1. Updates current index alias to end on last item's date
  2. Creates new index from next day
  3. New items go to new index
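The splitting steps can be sketched as follows. Both functions are hypothetical, and treating "exceeds" as a strict greater-than comparison is an assumption; the PR's actual size check may differ.

```python
from datetime import date, timedelta
from typing import Tuple


def needs_split(size_gb: float, max_size_gb: float = 25.0) -> bool:
    """True when the index is larger than the configured limit (assumed strict comparison)."""
    return size_gb > max_size_gb


def split_index(start: date, last_item_date: date) -> Tuple[Tuple[date, date], date]:
    """Close the current index on last_item_date and start the next index the following day."""
    closed_range = (start, last_item_date)
    next_start = last_item_date + timedelta(days=1)
    return closed_range, next_start


if needs_split(size_gb=25.4):
    closed, next_start = split_index(date(2025, 1, 1), date(2025, 3, 15))
    print(closed, next_start)
```

With these inputs the closed range ends on 2025-03-15 and the new index starts on 2025-03-16, matching Scenario 2.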

Cache and Performance

IndexCacheManager:

  • Stores mapping of collection aliases to index lists
  • TTL default 1 hour
  • Automatic refresh on expiration
  • Manual refresh after index modifications
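The cache behavior listed above (TTL, refresh on expiration, manual refresh) can be illustrated with a small stand-in class. `TTLIndexCache` is not the PR's `IndexCacheManager`; its names and signatures are assumptions made for this sketch.

```python
import time
from typing import Callable, Dict, List, Optional

AliasMap = Dict[str, List[str]]  # collection alias -> list of index names


class TTLIndexCache:
    """Hypothetical stand-in for a TTL cache of alias-to-index mappings."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Optional[AliasMap] = None
        self._loaded_at = 0.0

    def get(self, loader: Callable[[], AliasMap]) -> AliasMap:
        """Return cached data, calling loader() when the cache is empty or expired."""
        if self._data is None or self._clock() - self._loaded_at >= self._ttl:
            self._data = loader()
            self._loaded_at = self._clock()
        return self._data

    def invalidate(self) -> None:
        """Force a reload on the next get(), e.g. after an index is created or split."""
        self._data = None
```

Injecting the clock keeps the expiry logic testable without sleeping; in production code the default `time.monotonic` would be used.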

AsyncIndexAliasLoader / SyncIndexAliasLoader:

  • Load alias information from search engine
  • Use cache manager to store results
  • Async and sync versions for different usage contexts

Configuration

New Environment Variables:

# Enable datetime strategy (default false)
ENABLE_DATETIME_INDEX_FILTERING=true

# Maximum index size in GB before splitting (default 25)
DATETIME_INDEX_MAX_SIZE_GB=50
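One way an application could read these two variables, shown only as a sketch. The function name and the exact parsing rules (case-insensitive `true`, float size) are assumptions; the PR may parse them differently.

```python
import os
from typing import Dict, Mapping, Union


def load_datetime_index_settings(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Union[bool, float]]:
    """Read the datetime-index settings, falling back to the documented defaults."""
    enabled = env.get("ENABLE_DATETIME_INDEX_FILTERING", "false").strip().lower() == "true"
    max_size_gb = float(env.get("DATETIME_INDEX_MAX_SIZE_GB", "25"))
    return {"enabled": enabled, "max_size_gb": max_size_gb}


print(load_datetime_index_settings({
    "ENABLE_DATETIME_INDEX_FILTERING": "true",
    "DATETIME_INDEX_MAX_SIZE_GB": "50",
}))
```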

Usage Examples

Scenario 1: Adding items to new collection

  1. First item with date 2025-01-15 → creates index items_collection_2025-01-15
  2. Subsequent items with similar dates → go to same index

Scenario 2: Size limit exceeded

  1. Index items_collection_2025-01-01 reaches 25GB
  2. New item with date 2025-03-15 → system splits index:
    • Old: items_collection_2025-01-01-2025-03-15
    • New: items_collection_2025-03-16

Scenario 3: Item with early date

  1. Existing index: items_collection_2025-02-01
  2. New item with date 2024-12-15 → creates:
    • New: items_collection_2024-12-15-2025-01-31

Search

System automatically filters indexes during search:

Query with date range:

{
  "datetime": {
    "gte": "2025-02-01",
    "lte": "2025-02-28"
  }
}

Searches only indexes containing items from this period, instead of all collection indexes.
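The filtering described here amounts to an interval-overlap test between the query range and each index's date range. The sketch below uses an in-memory mapping and a hypothetical `select_indexes` function; it is not the PR's selector implementation.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# index name -> (start date, end date or None for an open-ended index)
IndexRanges = Dict[str, Tuple[date, Optional[date]]]


def select_indexes(indexes: IndexRanges, gte: date, lte: date) -> List[str]:
    """Return the indexes whose date range overlaps the query range [gte, lte]."""
    selected = []
    for name, (start, end) in indexes.items():
        starts_before_query_ends = start <= lte
        ends_after_query_starts = end is None or end >= gte
        if starts_before_query_ends and ends_after_query_starts:
            selected.append(name)
    return selected


indexes: IndexRanges = {
    "items_collection_2024-12-15-2025-01-31": (date(2024, 12, 15), date(2025, 1, 31)),
    "items_collection_2025-02-01-2025-03-15": (date(2025, 2, 1), date(2025, 3, 15)),
    "items_collection_2025-03-16": (date(2025, 3, 16), None),
}
print(select_indexes(indexes, date(2025, 2, 1), date(2025, 2, 28)))
# ['items_collection_2025-02-01-2025-03-15']
```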

Factories

IndexSelectorFactory:

  • Creates appropriate selector based on configuration
  • create_async_selector() / create_sync_selector()

IndexInsertionFactory:

  • Creates insertion strategy based on configuration
  • Automatically detects engine type and creates appropriate adapter

SearchEngineAdapterFactory:

  • Detects whether you're using Elasticsearch or OpenSearch
  • Creates appropriate adapter with engine-specific methods

Backward Compatibility

  • When ENABLE_DATETIME_INDEX_FILTERING=false → works as before
  • Existing indexes remain unchanged

All operations have sync and async versions for different usage contexts in the application.

PR Checklist:

  • Code is formatted and linted (run pre-commit run --all-files)
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable
  • Changes are added to the changelog

GrzegorzPustulka marked this pull request as ready for review July 7, 2025 20:01

GrzegorzPustulka (Contributor, Author) commented Jul 8, 2025

@jonhealy1
@StijnCaerts
@jamesfisher-geo

The PR is finished and ready for code review.
