Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix: avoid unnecessary API call in QueryJob.result() when job is already finished#1900

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
tswast merged 14 commits intomainfromtswast-fix-query-retry
Apr 18, 2024

Conversation

@tswast
Copy link
Contributor

@tswasttswast commentedApr 12, 2024
edited
Loading

BEGIN_COMMIT_OVERRIDE
perf: avoid unnecessary API call inQueryJob.result() when job is already finished (#1900)

fix: retry query jobs that fail even with ambiguousjobs.getQueryResults REST errors (#1903,#1900)
END_COMMIT_OVERRIDE

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as abug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@product-auto-labelproduct-auto-labelbot added size: mPull request size is medium. api: bigqueryIssues related to the googleapis/python-bigquery API. labelsApr 12, 2024
@tswasttswast changed the titlefix: avoid unnecessary API call in QueryJob.result() when job is alre…fix: avoid unnecessary API call in QueryJob.result() when job is already finishedApr 12, 2024
@tswast
Copy link
ContributorAuthor

CC@chalmerlowe

self.assertEqual(result.location,"asia-northeast1")
self.assertEqual(result.query_id,"xyz-abc")

deftest_result_invokes_begins(self):
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

As of#967 released in google-cloud-bigquery 3.0.0, the _begin method is no longer used for query jobs.

@tswasttswast marked this pull request as ready for reviewApril 13, 2024 14:44
@tswasttswast requested review froma team ascode ownersApril 13, 2024 14:44
@tswasttswast requested a review fromobada-abApril 13, 2024 14:44
@product-auto-labelproduct-auto-labelbot added size: lPull request size is large. and removed size: mPull request size is medium. labelsApr 13, 2024
timeout=transport_timeout,
)

def_done_or_raise(self,retry=DEFAULT_RETRY,timeout=None):
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This was overridden because we wantedresult() from the superclass to calljobs.getQueryResults, not justjobs.get (i.e.job.reload() in Python). Now that we aren't using the superclass forresult(), this method is no longer necessary.

try:
self.reload(retry=retry,timeout=transport_timeout)
exceptexceptions.GoogleAPIErrorasexc:
self.set_exception(exc)
Copy link
ContributorAuthor

@tswasttswastApr 13, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thought: We probably should have been callingset_exception based on the job status. Need to look into this further.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

OK. We are. 😅

self.set_exception(exception)

Which we call from_set_properties

Which we call fromreload

self._set_properties(api_response)

@chalmerlowechalmerlowe self-assigned thisApr 15, 2024
# wait for the query to finish. Unlike most methods,
# jobs.getQueryResults hangs as long as it can to ensure we
# know when the query has finished as soon as possible.
self._reload_query_results(retry=retry,timeout=timeout)
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Uh oh, ifjobs.getQueryResults fails because the job failed it can throw an exception butrestart_query_job will still beFalse.

But we don't wantrestart_query_job = True because sometimes this can raise an ambiguous exception such as quota exceeded, where we don't know if it's the job quota and it's a failed job or at a higher level (Google Frontend - GFE) where the job might actually still be running and/or succeeded.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This isn't the worst way to fail, but it'd be nice to do thejobs.get call above in case of an exception to get a chance at retrying this job if the job failed.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Filed#1903 to track improvements to ambiguous errors.12fa9fb fixes an issue where we weren't actually retrying after an ambiguous failure even though we thought we were.

Copy link
Collaborator

@chalmerlowechalmerlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I am halfway through my review.
Releasing these comments for now.
Will come back to this to finish out my review as soon as possible.

is_job_done=job_retry(is_job_done)

do_get_result()
# timeout can be `None` or an object from our superclass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Whichsuperclass are we discussing here?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

google.api_core.future.polling.PollingFuture._DEFAULT_VALUE introduced ingoogleapis/python-api-core#462.

I've updated the comments with some more info as well as some things to consider in case we want to have a default value for timeout in future.

# rateLimitExceeded errors, which can be raised either by the Google load
# balancer or the BigQuery job server.
_DEFAULT_JOB_DEADLINE=3.0*_DEFAULT_RETRY_DEADLINE
_DEFAULT_JOB_DEADLINE=4.0*_DEFAULT_RETRY_DEADLINE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

What is the purpose of using 4.0 here?
Can we get a comment indicating why 4.0?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Updated to2.0 * (2.0 * _DEFAULT_RETRY_DEADLINE) and added some explanation both here and inQueryJob.result().

Note: This still only gets us 1 query retry in the face of the problematic ambiguous error codes fromjobs.getQueryResults() but that's better than the nothing that we were actually getting before in some cases. I don't feel comfortable bumping this much further, though maybe 3.0 * 2.0 * _DEFAULT_RETRY_DEADLINE would be slightly less arbitrary at 1 hour?

}
conn=make_connection(
query_resource,query_resource_done,job_resource_done,query_page_resource
job_resource,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Not sure I am tracking the relationship between the make connection inputs versus the assert_has_calls checks.

Can you explain how these tests are supposed to work?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

make_connection is a convention ingoogle-cloud-bigquery unit tests that actually predates our use of the "mock" package. It mocks out the responses to REST API calls, previously with a fake implementation of our "Connection" class from the_http module and now with a true mock object. For every quest that our test makes, there should be a corresponding response. As withMock.side_effect, any exceptions in this list will be raised, instead.

I'm guessing your question also relates to "Why thisparticular set of requests / responses?". I've added some comments explaining why we're expecting this sequence of API calls. I've also updated this test to more explicitly check for a possible cause of customer issue b/332850329.

connection.api_request.assert_has_calls(
[query_results_call,query_results_call,reload_call]
[
reload_call,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Same thing here. Can I get some clarity on what we are doing and looking for?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Added some explanation here as well as above in themake_connection() call.

Co-authored-by: Chalmer Lowe <chalmerlowe@google.com>
Copy link
Collaborator

@chalmerlowechalmerlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM


job.result()
withfreezegun.freeze_time("1970-01-01 00:00:00",tick=False):
job.result(timeout=1.125)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is there are reason we are using such a specific number?
1.125.

Can I get a comment here to let future me know why we picked this number?

method="GET",
path=f"/projects/{self.PROJECT}/queries/{self.JOB_ID}",
query_params={
"maxResults":0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

IsmaxResults of0 synonymous with asking for all results? OR is it really asking for zero results?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We actually want 0 rows. If we omit this or ask for non-zero number of rows, thejobs.getQueryResults API can hang when the query has wide rows (many columns).

Copy link
Collaborator

@chalmerlowechalmerlowe left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for all the comments, etc.
Future me thanks you as well.

LGTM, APPROVED.

tswast reacted with heart emoji
@tswasttswast merged commit1367b58 intomainApr 18, 2024
@tswasttswast deleted the tswast-fix-query-retry branchApril 18, 2024 14:31
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@chalmerlowechalmerlowechalmerlowe approved these changes

@obada-abobada-abAwaiting requested review from obada-abobada-ab was automatically assigned from googleapis/api-bigquery

Labels

api: bigqueryIssues related to the googleapis/python-bigquery API.size: lPull request size is large.

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@tswast@chalmerlowe@shollyman

[8]ページ先頭

©2009-2025 Movatter.jp