feat: add super cluster support for metrics #9225
Greptile Overview
Greptile Summary
Implemented metrics super cluster v1 support that enables cross-region query aggregation by having each querier fetch data from both local and remote regions, then computing results in the leader region.
Key Changes
Issues Found
Confidence Score: 4/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
    participant User
    participant LeaderRegion as Leader Region (HTTP Handler)
    participant Querier1 as Querier (Region 1)
    participant Querier2 as Querier (Region 2)
    participant LocalStorage as Local Storage
    participant RemoteRegion as Remote Region (gRPC)
    User->>LeaderRegion: POST /api/{org}/prometheus/api/v1/query_range
    Note over User,LeaderRegion: Query params: query, start, end, step, regions, clusters
    LeaderRegion->>LeaderRegion: Parse parameters and check super_cluster enabled
    LeaderRegion->>LeaderRegion: Partition time range by querier count
    par Parallel Querier Dispatch
        LeaderRegion->>Querier1: gRPC Metrics.Query (time partition 1)
        LeaderRegion->>Querier2: gRPC Metrics.Query (time partition 2)
    end
    par Each Querier Processing
        Querier1->>LocalStorage: Load local region data
        LocalStorage-->>Querier1: Local metrics data
        alt Super Cluster Enabled
            Querier1->>RemoteRegion: gRPC Metrics.Data (same time range)
            RemoteRegion->>RemoteRegion: Query metrics from remote region
            RemoteRegion-->>Querier1: Stream metrics data
            Querier1->>Querier1: Merge local + remote data
        end
        Querier1->>Querier1: Execute PromQL computation
        Querier1-->>LeaderRegion: Return computed results
        Querier2->>LocalStorage: Load local region data
        LocalStorage-->>Querier2: Local metrics data
        alt Super Cluster Enabled
            Querier2->>RemoteRegion: gRPC Metrics.Data (same time range)
            RemoteRegion-->>Querier2: Stream metrics data
            Querier2->>Querier2: Merge local + remote data
        end
        Querier2->>Querier2: Execute PromQL computation
        Querier2-->>LeaderRegion: Return computed results
    end
    LeaderRegion->>LeaderRegion: Merge results from all queriers
    LeaderRegion->>LeaderRegion: Apply final aggregations
    LeaderRegion-->>User: Return final PromQL result
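The "Partition time range by querier count" step in the diagram can be pictured with a small sketch. This is only an illustration under stated assumptions: `partition_time_range` is a hypothetical helper, not a function from this PR, and it ignores details a real PromQL engine has to handle (such as lookback windows at partition boundaries).

```rust
/// Hypothetical helper (not from the PR): split the evaluation points
/// start, start+step, ..., (last point <= end) into at most `parts`
/// contiguous sub-ranges. All values share one unit (e.g. microseconds).
/// Each returned tuple is an inclusive (first_point, last_point) pair.
pub fn partition_time_range(start: i64, end: i64, step: i64, parts: usize) -> Vec<(i64, i64)> {
    if parts == 0 || step <= 0 || end < start {
        return Vec::new();
    }
    // Number of evaluation points in the whole query.
    let points = (end - start) / step + 1;
    // Never create more partitions than there are points.
    let parts = (parts as i64).min(points);
    let base = points / parts; // every partition gets at least this many points
    let extra = points % parts; // the first `extra` partitions get one more

    let mut out = Vec::with_capacity(parts as usize);
    let mut first = start;
    for i in 0..parts {
        let n = base + if i < extra { 1 } else { 0 };
        let last = first + (n - 1) * step;
        out.push((first, last));
        // The next partition starts one step after this one ends.
        first = last + step;
    }
    out
}
```

With two queriers and a range of 0..=90 at step 10, this yields two partitions covering five points each; each partition would then be sent to one querier as in the "Parallel Querier Dispatch" block.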
18 files reviewed, 2 comments
Pull request overview
This PR implements super cluster support for metrics queries, enabling a leader region to fetch and aggregate data from multiple regions. The implementation follows a distributed query pattern where the leader region partitions requests by time range, dispatches them to queriers across all regions, and merges the results.
Key changes:
- Added new gRPC streaming service `metrics.data` for cross-region data retrieval
- Introduced `QueryContext` struct to encapsulate query execution parameters
- Added API parameters: `search_type`, `regions`, and `clusters` for region/cluster selection
- Migrated from `std::collections::HashSet` to `hashbrown::HashSet` across promql modules
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/service/search/super_cluster/leader.rs | Renamed variables from "nodes" to "clusters" for clarity in super cluster context |
| src/service/promql/search/mod.rs | Refactored cache logic to use boolean `use_cache` instead of negated `cache_disabled`; added job ID generation from trace_id |
| src/service/promql/search/grpc/mod.rs | Added new `data()` function for streaming metrics responses with time-range partitioning |
| src/service/promql/search/grpc/wal.rs | Changed `label_selector` parameter from `Option<HashSet>` to `HashSet` |
| src/service/promql/engine.rs | Integrated super cluster data loading with local cluster data; refactored to use `QueryContext`; updated all tests |
| src/service/promql/exec.rs | Refactored `PromqlContext` to use new `QueryContext` struct for better parameter organization |
| src/service/promql/utils.rs | Simplified `apply_label_selector` to accept `HashSet` directly instead of `Option<HashSet>` |
| src/service/promql/mod.rs | Added new fields to `MetricsQueryRequest` for super cluster support |
| src/service/alerts/mod.rs | Updated alert evaluation to check for super cluster configuration |
| src/proto/proto/cluster/metrics.proto | Added new gRPC `Data` streaming method and new fields for super cluster configuration |
| src/proto/src/generated/cluster.rs | Generated code from proto changes |
| src/handler/grpc/request/metrics/querier.rs | Implemented new `data()` gRPC method for streaming metrics data |
| src/handler/grpc/mod.rs | Updated conversion logic for `MetricsQueryRequest` |
| src/handler/http/request/promql/mod.rs | Added super cluster detection and new API parameters handling |
| src/config/src/meta/promql/value.rs | Added `QueryContext` struct for query execution parameters |
| src/config/src/meta/promql/mod.rs | Added custom deserializer for comma-separated or array regions/clusters parameters |
| src/config/src/meta/cluster.rs | Added `is_local()` method to `NodeInfo` trait |
| src/config/src/cluster.rs | Refactored to extract `get_local_http_addr()` and `get_local_grpc_addr()` helper functions |
| src/common/infra/cluster/nats.rs | Used new helper functions for consistent address generation |
| src/infra/src/table/users.rs | Made user inserts idempotent by treating unique constraint violations as success |
| src/infra/src/table/organizations.rs | Made organization inserts idempotent by treating unique constraint violations as success |
| src/infra/src/table/org_users.rs | Made org_user inserts idempotent by treating unique constraint violations as success |
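The last three rows describe making inserts idempotent by treating unique constraint violations as success. A minimal sketch of that pattern follows, using an in-memory map in place of a database. Every name here (`DbError`, `UsersTable`, `insert_strict`, `add`, `row_count`) is hypothetical and chosen for illustration; the PR's actual code works against its own database layer.

```rust
use std::collections::HashMap;

/// Hypothetical error type standing in for a database driver's errors.
#[derive(Debug, PartialEq)]
pub enum DbError {
    UniqueViolation,
    Other(String),
}

/// Toy table keyed by email, standing in for a table with a unique constraint.
#[derive(Default)]
pub struct UsersTable {
    rows: HashMap<String, String>,
}

impl UsersTable {
    /// Strict insert: fails with `UniqueViolation` if the key already exists.
    fn insert_strict(&mut self, email: &str, name: &str) -> Result<(), DbError> {
        if self.rows.contains_key(email) {
            return Err(DbError::UniqueViolation);
        }
        self.rows.insert(email.to_string(), name.to_string());
        Ok(())
    }

    /// Idempotent insert: a unique constraint violation means the row is
    /// already present, so it is reported as success. The existing row is
    /// left unchanged. Any other error is still propagated.
    pub fn add(&mut self, email: &str, name: &str) -> Result<(), DbError> {
        match self.insert_strict(email, name) {
            Err(DbError::UniqueViolation) => Ok(()),
            other => other,
        }
    }

    /// Number of rows currently stored.
    pub fn row_count(&self) -> usize {
        self.rows.len()
    }
}
```

Calling `add` twice with the same key succeeds both times and leaves a single row, which is the property that makes repeated or replayed inserts safe.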
Merged commit fa0672a into main
Metrics super cluster v1
This version implements a simple solution: the leader region fetches data from the other regions and computes the result only in the leader region.
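As a rough sketch of that idea (fetch raw data from every region, merge it, then compute once in the leader), the following uses plain in-memory maps. The types and functions (`Sample`, `SeriesMap`, `merge_regions`, `sum_per_series`) are assumptions made for illustration and are not taken from the PR; the per-series sum merely stands in for the PromQL computation.

```rust
use std::collections::BTreeMap;

/// One raw sample: (timestamp, value). Hypothetical type for illustration.
pub type Sample = (i64, f64);
/// Series keyed by a label-set string such as `{host="a"}`.
pub type SeriesMap = BTreeMap<String, Vec<Sample>>;

/// Merge the local region's samples with those fetched from each remote
/// region. Samples of the same series are concatenated, then sorted by
/// timestamp so the computation step sees one ordered series.
pub fn merge_regions(local: SeriesMap, remotes: Vec<SeriesMap>) -> SeriesMap {
    let mut merged = local;
    for remote in remotes {
        for (key, samples) in remote {
            merged.entry(key).or_default().extend(samples);
        }
    }
    for samples in merged.values_mut() {
        samples.sort_by_key(|s| s.0);
    }
    merged
}

/// Stand-in for the computation that runs only in the leader region:
/// here, simply the sum of all sample values per series.
pub fn sum_per_series(data: &SeriesMap) -> BTreeMap<String, f64> {
    data.iter()
        .map(|(k, v)| (k.clone(), v.iter().map(|s| s.1).sum::<f64>()))
        .collect()
}
```

The point of the design is that remote regions only ship data; no partial aggregation happens there, so the leader's result is computed over the full merged input.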
The main logic is:
What we changed
- Added a new gRPC service `metrics.data` that allows you to get metrics data from other regions.
API changes
Form parameters
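Field descriptions:
- query
- start
- end
- step: … (e.g. `15s`, `1m`) or float number of seconds
- timeout: … (e.g. `30s`, `1m`)
- use_cache
- *use_streaming
- *search_type
- *regions: comma-separated (`,`) for multiple regions, e.g. `c1,c2`
- *clusters: comma-separated (`,`) for multiple clusters, e.g. `c1,c2`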
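To illustrate how a comma-separated `regions` or `clusters` form value such as `c1,c2` could be turned into a list, here is a minimal sketch. `parse_name_list` is a hypothetical helper written for this example; the PR itself uses a custom deserializer that also accepts an array form, which this sketch does not cover.

```rust
/// Hypothetical helper (not from the PR): parse a form value such as
/// "c1,c2" into a list of names. Surrounding whitespace is trimmed and
/// empty entries (e.g. from a trailing comma) are dropped.
pub fn parse_name_list(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(|s| s.trim())
        .filter(|s| !s.is_empty())
        .map(|s| s.to_string())
        .collect()
}
```

An empty string yields an empty list, which a caller could plausibly treat as "no region or cluster filter"; whether the real API does so is not stated here.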