Description
Summary
Use bloom filters at the delegator level to pre-filter segments by primary key values before dispatching search/query requests to QueryNodes. This significantly reduces the number of segments that need to be searched when queries contain PK term filters.
Motivation
When users search or query with a primary key filter (e.g., pk in [1, 2, 3]), every segment on every QueryNode currently processes the request, even though most segments don't contain the target PKs. This wastes CPU and I/O resources, especially for large collections with many segments.
Design
Proxy hint detection: The proxy analyzes the filter expression and sets a HasPKFilter hint flag in the search/query request when it detects an optimizable PK term filter.
Delegator bloom filter check: When the delegator receives a request with the PK filter hint, it:
- Unmarshals the plan to extract PK values
- Checks each segment's bloom filter against the PK values
- Builds per-segment PK hint sets containing only the PKs that may exist in that segment
- Skips segments where no PKs pass the bloom filter check
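The delegator-side steps above can be sketched roughly as follows. This is a minimal illustration, not the code in segment_hint_builder.go: the `membership` interface, the `exactSet` type, and `buildSegmentHints` are hypothetical names, and `exactSet` is an exact set standing in for a real bloom filter so that the example stays deterministic (a real bloom filter may also report false positives, which is safe here because the segment is simply searched anyway).

```go
package main

import "fmt"

// membership is a hypothetical stand-in for a segment's bloom filter:
// false means "definitely absent", true means "possibly present".
type membership interface {
	MayContain(pk int64) bool
}

// exactSet is a toy membership test with no false positives.
// It is used here only to keep the example deterministic.
type exactSet map[int64]struct{}

func (s exactSet) MayContain(pk int64) bool {
	_, ok := s[pk]
	return ok
}

// buildSegmentHints returns, for each segment ID, the PKs that may exist
// in that segment. Segments with no candidate PKs get no entry, which
// corresponds to skipping them entirely.
func buildSegmentHints(segments map[int64]membership, pks []int64) map[int64][]int64 {
	hints := make(map[int64][]int64)
	for segID, filter := range segments {
		for _, pk := range pks {
			if filter.MayContain(pk) {
				hints[segID] = append(hints[segID], pk)
			}
		}
	}
	return hints
}

func main() {
	segments := map[int64]membership{
		101: exactSet{1: {}, 2: {}},
		102: exactSet{3: {}},
		103: exactSet{42: {}}, // holds none of the requested PKs
	}
	hints := buildSegmentHints(segments, []int64{1, 2, 3})
	// Segment 103 is absent from hints, so it would not be dispatched.
	fmt.Println(len(hints), hints[101], hints[102]) // prints: 2 [1 2] [3]
}
```

Keeping a narrowed PK list per segment, rather than a plain skip/keep flag, is what lets the execution layer evaluate the term filter against fewer values on the segments that are still searched.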
C++ TermExpr optimization: The narrowed PK hint sets are passed through to the C++ execution layer via QueryContext, allowing TermExpr to use the pre-filtered PK values instead of the original full set.
Key Changes
- internal/proxy/task_search.go / task_query.go: PK filter hint detection at proxy
- internal/querynodev2/delegator/segment_hint_builder.go: Bloom filter-based segment hint builder
- internal/core/src/exec/expression/TermExpr.cpp: C++ side PK hint consumption
- pkg/proto/plan.proto / internal.proto: New proto fields for PK hints
- pkg/metrics/querynode_metrics.go: New metrics for hint effectiveness tracking
Configuration
- queryNode.pkFilterHint.enabled: Enable/disable the feature (default: true)
- queryNode.pkFilterHint.maxPKCount: Maximum PK count threshold for hint optimization (default: 1000)
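One plausible way the two settings could gate the optimization is sketched below. The `pkFilterHintConfig` struct and `shouldApplyHint` function are hypothetical, and the sketch assumes that a request carrying more PKs than maxPKCount falls back to the unoptimized path and that the threshold is inclusive; the proposal does not spell out either point.

```go
package main

import "fmt"

// pkFilterHintConfig mirrors the two documented settings.
// The struct itself is hypothetical.
type pkFilterHintConfig struct {
	Enabled    bool // queryNode.pkFilterHint.enabled
	MaxPKCount int  // queryNode.pkFilterHint.maxPKCount
}

// defaultPKFilterHintConfig returns the documented defaults.
func defaultPKFilterHintConfig() pkFilterHintConfig {
	return pkFilterHintConfig{Enabled: true, MaxPKCount: 1000}
}

// shouldApplyHint reports whether the bloom filter pre-check should run.
// Assumption: requests with more PKs than MaxPKCount skip the optimization.
func shouldApplyHint(cfg pkFilterHintConfig, pkCount int) bool {
	return cfg.Enabled && pkCount > 0 && pkCount <= cfg.MaxPKCount
}

func main() {
	cfg := defaultPKFilterHintConfig()
	fmt.Println(shouldApplyHint(cfg, 3))    // small PK list: hint applies
	fmt.Println(shouldApplyHint(cfg, 5000)) // over the threshold: fall back
}
```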