feat: add super cluster support for metrics #9225


Merged
hengfeiyang merged 20 commits into main from feat/metrics-supercluster
Nov 27, 2025

@hengfeiyang commented Nov 21, 2025

Metrics super cluster v1

In this version we implement a simple solution: the leader region fetches data from the other regions, and the result is computed only in the leader region.

The main logic is:

  1. The user sends an HTTP PromQL request, which lands in one region; we call it the leader region.
  2. The leader region partitions the request by time range and dispatches it to all queriers in that region.
  3. Each querier loads data from the local region and also loads data for the same time range from the other regions.
  4. Each querier calculates its result and returns it to the leader region.
  5. The leader region merges the results and returns them to the user.
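Step 2 can be sketched as follows. This is a minimal illustration and not the code from this PR: `partition_time_range` and its parameters are hypothetical names, and the sketch assumes that partitions are split on multiples of `step` so that no evaluation point belongs to two partitions.

```rust
/// Split the inclusive range [start, end] into at most `queriers` contiguous,
/// non-overlapping sub-ranges. Boundaries fall on evaluation points
/// (start, start + step, start + 2 * step, ...).
/// Hypothetical helper for illustration only.
pub fn partition_time_range(start: i64, end: i64, step: i64, queriers: usize) -> Vec<(i64, i64)> {
    assert!(step > 0 && end >= start && queriers > 0);
    // Number of evaluation points that fit in [start, end].
    let points = (end - start) / step + 1;
    // Never create more partitions than there are points.
    let parts = (queriers as i64).min(points);
    let base = points / parts;
    let extra = points % parts;

    let mut out = Vec::with_capacity(parts as usize);
    let mut first = 0i64; // index of the first point in the current partition
    for i in 0..parts {
        // The first `extra` partitions take one additional point.
        let n = base + if i < extra { 1 } else { 0 };
        let last = first + n - 1;
        out.push((start + first * step, start + last * step));
        first = last + 1;
    }
    out
}
```

For example, `partition_time_range(0, 90, 10, 3)` yields the three ranges `(0, 30)`, `(40, 60)` and `(70, 90)`, one per querier.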

What we changed

  1. We added a gRPC service named metrics.data that lets a region fetch metrics data from other regions.
  2. Queriers now check whether super cluster is enabled and, if so, load data from the other regions.

API changes

POST /api/{org}/prometheus/api/v1/query_range

Form parameters

Field descriptions:

| Field | Type | Description |
|---|---|---|
| query | string | PromQL expression |
| start | string | Start timestamp, inclusive (RFC3339 or Unix timestamp) |
| end | string | End timestamp, inclusive (RFC3339 or Unix timestamp) |
| step | string | Query resolution step width in duration format (e.g., 15s, 1m) or float number of seconds |
| timeout | string | Evaluation timeout (e.g., 30s, 1m) |
| use_cache | boolean | Whether to use cache |
| * use_streaming | boolean | Whether to use streaming output |
| * search_type | string | Search event type |
| * regions | string[] | Regions to query. Default: all regions. Separate multiple regions with a comma, e.g. c1,c2 |
| * clusters | string[] | Clusters to query. Default: all clusters. Separate multiple clusters with a comma, e.g. c1,c2 |

* new parameter
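As a rough sketch of how a comma-separated `regions` or `clusters` value could be turned into a list: the function name `parse_name_list` is hypothetical and this is not the deserializer added in this PR. It assumes that an empty list stands for the documented default (all regions or all clusters).

```rust
/// Parse a form value such as "c1,c2" into a list of names.
/// Surrounding whitespace and empty items are dropped. An empty result is
/// assumed to mean "query everything", matching the documented default.
/// Hypothetical helper for illustration only.
pub fn parse_name_list(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .collect()
}
```

With this sketch, `regions=c1,c2` becomes a two-element list, while an absent or empty value becomes an empty list.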

@hengfeiyang marked this pull request as draft November 21, 2025 07:31
@hengfeiyang marked this pull request as ready for review November 27, 2025 05:09
@github-actions

Failed to generate code suggestions for PR

@greptile-apps

Greptile Overview

Greptile Summary

implemented metrics super cluster v1 support that enables cross-region query aggregation by having each querier fetch data from both local and remote regions, then computing results in the leader region

Key Changes

  • added new gRPC Metrics.Data streaming endpoint that allows queriers to fetch metrics data from other regions
  • introduced QueryContext struct to encapsulate query metadata (org_id, trace_id, is_super_cluster, etc.) for cleaner parameter passing throughout the codebase
  • refactored PromQL engine to spawn async tasks that fetch remote region data in parallel with local data loading when super cluster is enabled
  • added is_super_cluster flag to protobuf definitions and HTTP request handling
  • improved database table operations to handle unique constraint violations gracefully in concurrent scenarios

Issues Found

  • potential panic in gRPC handler when query field is None (line 90 in src/handler/grpc/request/metrics/querier.rs)
  • import style inconsistency in src/infra/src/table/organizations.rs (doesn't follow custom instruction about consolidating imports)

Confidence Score: 4/5

  • This PR is safe to merge with one critical fix needed for the potential panic in the gRPC handler
  • The implementation is well-structured with proper error handling, metrics tracking, and follows Rust best practices. The super cluster logic is cleanly separated with feature flags and only executes in enterprise mode. However, there's one critical issue where unwrap() could cause a panic if the query field is None. The database unique constraint handling improvements are a nice defensive programming addition. Once the panic issue is fixed, this should be safe to deploy.
  • Pay close attention to src/handler/grpc/request/metrics/querier.rs line 90 - fix the potential panic before merging
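The kind of fix the review asks for can be sketched as below. The types here are simplified stand-ins, not the protobuf-generated structs from this PR, and the `String` error is a placeholder for whatever error type the real handler returns; only the pattern (convert a missing `Option` into an error instead of calling `unwrap()`) is the point.

```rust
/// Simplified stand-in for the query payload (hypothetical fields).
pub struct Query {
    pub expr: String,
}

/// Simplified stand-in for the incoming request (hypothetical fields).
pub struct Request {
    pub org_id: String,
    pub query: Option<Query>,
}

/// Return an error instead of panicking when the optional `query` is missing.
pub fn extract_query(req: &Request) -> Result<&Query, String> {
    req.query
        .as_ref()
        .ok_or_else(|| format!("missing query in request for org {}", req.org_id))
}
```

A handler built this way can map the error to a client-facing status and keep serving other requests, rather than aborting the task on malformed input.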

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| src/service/promql/engine.rs | 4/5 | Refactored to support super cluster by adding QueryContext struct containing query metadata, spawning async tasks to fetch data from remote regions, and merging results with local data |
| src/service/promql/search/grpc/mod.rs | 5/5 | Added new data streaming endpoint that enables remote regions to fetch metrics data, supports efficient data transfer for super cluster queries |
| src/handler/grpc/request/metrics/querier.rs | 5/5 | Implemented gRPC server-side handlers for Metrics.Query and Metrics.Data endpoints with proper error handling and metrics tracking |
| src/proto/proto/cluster/metrics.proto | 5/5 | Added is_super_cluster flag and Data streaming RPC endpoint, plus query_data and label_selector fields to support super cluster feature |
| src/handler/http/request/promql/mod.rs | 4/5 | Added super cluster enablement check based on enterprise config, passes is_super_cluster flag to search layer |
| src/config/src/meta/promql/value.rs | 5/5 | Added QueryContext struct to encapsulate query metadata (trace_id, org_id, super cluster flag, etc.) for cleaner parameter passing |

Sequence Diagram

sequenceDiagram
    participant User
    participant LeaderRegion as Leader Region (HTTP Handler)
    participant Querier1 as Querier (Region 1)
    participant Querier2 as Querier (Region 2)
    participant LocalStorage as Local Storage
    participant RemoteRegion as Remote Region (gRPC)

    User->>LeaderRegion: POST /api/{org}/prometheus/api/v1/query_range
    Note over User,LeaderRegion: Query params: query, start, end, step, regions, clusters

    LeaderRegion->>LeaderRegion: Parse parameters and check super_cluster enabled
    LeaderRegion->>LeaderRegion: Partition time range by querier count

    par Parallel Querier Dispatch
        LeaderRegion->>Querier1: gRPC Metrics.Query (time partition 1)
        LeaderRegion->>Querier2: gRPC Metrics.Query (time partition 2)
    end

    par Each Querier Processing
        Querier1->>LocalStorage: Load local region data
        LocalStorage-->>Querier1: Local metrics data

        alt Super Cluster Enabled
            Querier1->>RemoteRegion: gRPC Metrics.Data (same time range)
            RemoteRegion->>RemoteRegion: Query metrics from remote region
            RemoteRegion-->>Querier1: Stream metrics data
            Querier1->>Querier1: Merge local + remote data
        end

        Querier1->>Querier1: Execute PromQL computation
        Querier1-->>LeaderRegion: Return computed results

        Querier2->>LocalStorage: Load local region data
        LocalStorage-->>Querier2: Local metrics data

        alt Super Cluster Enabled
            Querier2->>RemoteRegion: gRPC Metrics.Data (same time range)
            RemoteRegion-->>Querier2: Stream metrics data
            Querier2->>Querier2: Merge local + remote data
        end

        Querier2->>Querier2: Execute PromQL computation
        Querier2-->>LeaderRegion: Return computed results
    end

    LeaderRegion->>LeaderRegion: Merge results from all queriers
    LeaderRegion->>LeaderRegion: Apply final aggregations
    LeaderRegion-->>User: Return final PromQL result
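The "Merge local + remote data" step in the diagram could look roughly like the sketch below. It is an illustration under assumptions, not the engine code from this PR: series are keyed by a plain string standing in for a label-set fingerprint, samples are `(timestamp, value)` pairs, and `merge_series` is a hypothetical name. The sketch only concatenates and time-orders samples; how the real engine treats overlapping samples is not stated in this PR page.

```rust
use std::collections::BTreeMap;

/// A (timestamp, value) pair.
pub type Sample = (i64, f64);

/// Merge per-series samples loaded from the local region with samples
/// streamed from each remote region, then order every series by timestamp
/// so the PromQL evaluation sees one continuous series.
/// Hypothetical helper for illustration only.
pub fn merge_series(
    local: BTreeMap<String, Vec<Sample>>,
    remote: Vec<BTreeMap<String, Vec<Sample>>>,
) -> BTreeMap<String, Vec<Sample>> {
    let mut merged = local;
    for region in remote {
        for (series, samples) in region {
            // Series seen only remotely get a fresh entry.
            merged.entry(series).or_default().extend(samples);
        }
    }
    for samples in merged.values_mut() {
        // Stable sort: samples with equal timestamps keep insertion order.
        samples.sort_by_key(|s| s.0);
    }
    merged
}
```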

greptile-apps bot left a comment

18 files reviewed, 2 comments


Copilot AI left a comment

Pull request overview

This PR implements super cluster support for metrics queries, enabling a leader region to fetch and aggregate data from multiple regions. The implementation follows a distributed query pattern where the leader region partitions requests by time range, dispatches them to queriers across all regions, and merges the results.

Key changes:

  • Added new gRPC streaming service metrics.data for cross-region data retrieval
  • Introduced QueryContext struct to encapsulate query execution parameters
  • Added API parameters: search_type, regions, and clusters for region/cluster selection
  • Migrated from std::collections::HashSet to hashbrown::HashSet across promql modules

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| src/service/search/super_cluster/leader.rs | Renamed variables from "nodes" to "clusters" for clarity in super cluster context |
| src/service/promql/search/mod.rs | Refactored cache logic to use boolean use_cache instead of negated cache_disabled; added job ID generation from trace_id |
| src/service/promql/search/grpc/mod.rs | Added new data() function for streaming metrics responses with time-range partitioning |
| src/service/promql/search/grpc/wal.rs | Changed label_selector parameter from Option<HashSet> to HashSet |
| src/service/promql/engine.rs | Integrated super cluster data loading with local cluster data; refactored to use QueryContext; updated all tests |
| src/service/promql/exec.rs | Refactored PromqlContext to use new QueryContext struct for better parameter organization |
| src/service/promql/utils.rs | Simplified apply_label_selector to accept HashSet directly instead of Option<HashSet> |
| src/service/promql/mod.rs | Added new fields to MetricsQueryRequest for super cluster support |
| src/service/alerts/mod.rs | Updated alert evaluation to check for super cluster configuration |
| src/proto/proto/cluster/metrics.proto | Added new gRPC Data streaming method and new fields for super cluster configuration |
| src/proto/src/generated/cluster.rs | Generated code from proto changes |
| src/handler/grpc/request/metrics/querier.rs | Implemented new data() gRPC method for streaming metrics data |
| src/handler/grpc/mod.rs | Updated conversion logic for MetricsQueryRequest |
| src/handler/http/request/promql/mod.rs | Added super cluster detection and new API parameters handling |
| src/config/src/meta/promql/value.rs | Added QueryContext struct for query execution parameters |
| src/config/src/meta/promql/mod.rs | Added custom deserializer for comma-separated or array regions/clusters parameters |
| src/config/src/meta/cluster.rs | Added is_local() method to NodeInfo trait |
| src/config/src/cluster.rs | Refactored to extract get_local_http_addr() and get_local_grpc_addr() helper functions |
| src/common/infra/cluster/nats.rs | Used new helper functions for consistent address generation |
| src/infra/src/table/users.rs | Made user inserts idempotent by treating unique constraint violations as success |
| src/infra/src/table/organizations.rs | Made organization inserts idempotent by treating unique constraint violations as success |
| src/infra/src/table/org_users.rs | Made org_user inserts idempotent by treating unique constraint violations as success |
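The idempotent-insert pattern described for the three table modules can be illustrated with an in-memory stand-in. Everything here (`UserTable`, `DbError`, `insert_idempotent`) is hypothetical and does not reproduce the database layer used in this PR; the sketch only shows the idea of mapping a unique-constraint violation to success while still surfacing other errors.

```rust
use std::collections::HashMap;

/// Hypothetical error type standing in for the database layer's errors.
#[derive(Debug, PartialEq)]
pub enum DbError {
    UniqueViolation,
    Other(String),
}

/// In-memory stand-in for a table with a unique key (illustration only).
#[derive(Default)]
pub struct UserTable {
    rows: HashMap<String, String>,
}

impl UserTable {
    /// Strict insert: fails if the key already exists.
    fn insert(&mut self, email: &str, name: &str) -> Result<(), DbError> {
        if self.rows.contains_key(email) {
            return Err(DbError::UniqueViolation);
        }
        self.rows.insert(email.to_string(), name.to_string());
        Ok(())
    }

    /// Idempotent insert: a unique-constraint violation means the row is
    /// already there (for example, written by a concurrent request), so it
    /// is reported as success. Any other error is still returned.
    pub fn insert_idempotent(&mut self, email: &str, name: &str) -> Result<(), DbError> {
        match self.insert(email, name) {
            Ok(()) | Err(DbError::UniqueViolation) => Ok(()),
            Err(e) => Err(e),
        }
    }

    /// Number of stored rows.
    pub fn len(&self) -> usize {
        self.rows.len()
    }
}
```

Calling `insert_idempotent` twice with the same key succeeds both times and leaves a single row, which is the behavior wanted when several nodes try to create the same record.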


@hengfeiyang added the Needs-Testing label Nov 27, 2025
@hengfeiyang merged commit fa0672a into main Nov 27, 2025
37 of 41 checks passed
@hengfeiyang deleted the feat/metrics-supercluster branch November 27, 2025 12:39
@hengfeiyang mentioned this pull request Dec 4, 2025
16 tasks

Reviewers

Copilot left review comments
haohuaijin approved these changes
greptile-apps[bot] left review comments
