Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat: add data model for client side metrics#1187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
daniel-sanche wants to merge106 commits intomain
base:main
Choose a base branch
Loading
fromcsm_1_data_model

Conversation

@daniel-sanche
Copy link
Contributor

@daniel-sanchedaniel-sanche commentedAug 11, 2025
edited
Loading

Blocked on#1206

This PR revives#923, which was de-priotirized to work on the sync client. This PR brings it back, working with both async and sync. It also adds a grpc interceptor, as an improved way to capture metadata across both clients


Design

The main architecture looks like this:

300137129-bebbb05a-20f0-45c2-9d38-e95a314edf64 drawio (1)

Most of the work is done by the ActiveOperationMetric class, which is instantiated with each rpc call, and updated through the lifecycle of the call. When the rpc is complete, it will callon_operation_complete andon_attempt_complete on the MetricsHandler, which can then log the completed data into OpenTelemetry (or theoretically, other locations if needed)

Note that there are separate classes for active vs completed metrics (ActiveOperationMetric,ActiveAttemptMetric,CompletedOperationMetric,CompletedAttemptMetric). This is so that we can keep fields mutable and optional while the request is ongoing, but pass down static immutable copies once the attempt is completed and no new data is coming

daniel-sancheand others added30 commitsJuly 25, 2025 16:12
@daniel-sanchedaniel-sanche added the kokoro:force-runAdd this label to force Kokoro to re-run the tests. labelNov 26, 2025
@yoshi-kokoroyoshi-kokoro removed the kokoro:force-runAdd this label to force Kokoro to re-run the tests. labelNov 26, 2025
@daniel-sanchedaniel-sanche added the kokoro:force-runAdd this label to force Kokoro to re-run the tests. labelNov 26, 2025
@yoshi-kokoroyoshi-kokoro removed the kokoro:force-runAdd this label to force Kokoro to re-run the tests. labelNov 26, 2025
@daniel-sanchedaniel-sanche marked this pull request as ready for reviewNovember 27, 2025 00:57
@daniel-sanche
Copy link
ContributorAuthor

Before merging, we should re-run the benchmarking code to make sure we are satisfied with the performance


# default values for zone and cluster data, if not captured
DEFAULT_ZONE="global"
DEFAULT_CLUSTER_ID="unspecified"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
DEFAULT_CLUSTER_ID="unspecified"
DEFAULT_CLUSTER_ID="<unspecified>"

DEFAULT_CLUSTER_ID="unspecified"

# keys for parsing metadata blobs
BIGTABLE_METADATA_KEY="x-goog-ext-425905942-bin"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

nit: maybe a more descriptive name like BIGTABLE_LOCATION_METADATA_KEY?

classOperationType(Enum):
"""Enum for the type of operation being performed."""

READ_ROWS="ReadRows"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

there should also be a READ_ROW so we know if it's a point read or a scan

MUTATE_ROW="MutateRow"
CHECK_AND_MUTATE="CheckAndMutateRow"
READ_MODIFY_WRITE="ReadModifyWriteRow"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

how about BulkMutateRows for write batcher?

backoff_before_attempt_ns:int=0
# time waiting on grpc channel, in nanoseconds
# TODO: capture grpc_throttling_time
grpc_throttling_time_ns:int=0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

fyi: we realized that in java this metric also doesn't capture the time a request queued on the channel. So if it's hard to get it in python we can skip it.

op_type=self.op_type,
uuid=self.uuid,
completed_attempts=self.completed_attempts,
duration_ns=time.monotonic_ns()-self.start_time_ns,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

same here, can we add a sanity check to make sure it's >= 0 ( or if its negative use 0 ) so that in case there's a bug in the code csm won't break the client.


@CrossSync.convert
asyncdefintercept_unary_unary(self,continuation,client_call_details,request):
@_with_active_operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

where is this called? Can you point me to the code location? It feels a bit weird that starting an attempt is called from an interceptor

def__init__(self,**kwargs):
pass

defon_operation_complete(self,op:CompletedOperationMetric)->None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

how is on_operation_complete called vs end_with_status in ActiveOperationMetric?

# record metadata from failed rpc
ifisinstance(exc,GoogleAPICallError)andexc.errors:
rpc_error=exc.errors[-1]
metadata=list(rpc_error.trailing_metadata())+list(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

should this call metrics_interceptor._get_metadata()?

# record ending attempt for timeout failures
attempt_exc=exc_list[-1]
_track_retryable_error(operation)(attempt_exc)
operation.end_with_status(source_exc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

where is end_with_success called?

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@mutianfmutianfmutianf left review comments

At least 1 approving review is required to merge this pull request.

Assignees

@mutianfmutianf

Labels

api: bigtableIssues related to the googleapis/python-bigtable API.size: xlPull request size is extra large.

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@daniel-sanche@mutianf@yoshi-kokoro@vermas2012

[8]ページ先頭

©2009-2025 Movatter.jp