fix: avoid unnecessary API call in QueryJob.result() when job is already finished #1900
Conversation
tswast commented Apr 12, 2024
self.assertEqual(result.location, "asia-northeast1")
self.assertEqual(result.query_id, "xyz-abc")
def test_result_invokes_begins(self):
As of #967, released in google-cloud-bigquery 3.0.0, the _begin method is no longer used for query jobs.
timeout=transport_timeout,
)
def _done_or_raise(self, retry=DEFAULT_RETRY, timeout=None):
This was overridden because we wanted result() from the superclass to call jobs.getQueryResults, not just jobs.get (i.e. job.reload() in Python). Now that we aren't using the superclass for result(), this method is no longer necessary.
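A rough sketch of the two polling paths being contrasted (the helper names here are hypothetical; reload and _reload_query_results are the QueryJob methods mentioned in this thread):

```python
# Sketch only: contrast of the two polling endpoints discussed above.
def poll_via_jobs_get(job, retry=None, timeout=None):
    # jobs.get (job.reload() in Python) returns the job resource immediately,
    # whether or not the query has finished.
    job.reload(retry=retry, timeout=timeout)
    return job.state == "DONE"

def poll_via_get_query_results(job, retry=None, timeout=None):
    # jobs.getQueryResults holds the request open until the query finishes or
    # a server-side timeout elapses, so completion is noticed sooner.
    job._reload_query_results(retry=retry, timeout=timeout)
    return job._query_results.complete
```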
try:
    self.reload(retry=retry, timeout=transport_timeout)
except exceptions.GoogleAPIError as exc:
    self.set_exception(exc)
Thought: We probably should have been calling set_exception based on the job status. Need to look into this further.
OK. We are. 😅
self.set_exception(exception)
Which we call from _set_properties:
self._set_future_result()
Which we call from reload:
self._set_properties(api_response)
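Put together, a simplified sketch of that chain (approximate names; the real logic lives in google.cloud.bigquery.job._AsyncJob and the google.api_core polling future):

```python
# Sketch only: how a finished job's status propagates to the future result.
class SketchJob:
    def reload(self, api_response):
        # The real reload() performs the jobs.get API call; here the response
        # is passed in directly for illustration.
        self._set_properties(api_response)

    def _set_properties(self, api_response):
        self._properties = api_response
        self._set_future_result()

    def _set_future_result(self):
        status = self._properties.get("status", {})
        if status.get("state") != "DONE":
            return  # still running; leave the future unresolved
        error = status.get("errorResult")
        if error:
            # The real code converts errorResult into a Google API error type.
            self.set_exception(RuntimeError(error.get("message")))
        else:
            self.set_result(self)

    # Stand-ins for the concurrent.futures-style base class methods.
    def set_exception(self, exc):
        self._exception = exc

    def set_result(self, value):
        self._result = value
```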
# wait for the query to finish. Unlike most methods,
# jobs.getQueryResults hangs as long as it can to ensure we
# know when the query has finished as soon as possible.
self._reload_query_results(retry=retry, timeout=timeout)
Uh oh, if jobs.getQueryResults fails because the job failed, it can throw an exception, but restart_query_job will still be False.
But we don't want restart_query_job = True, because sometimes this can raise an ambiguous exception such as quota exceeded, where we don't know if it's the job quota (meaning a failed job) or something at a higher level (Google Frontend, GFE) where the job might actually still be running and/or have succeeded.
This isn't the worst way to fail, but it'd be nice to do the jobs.get call above in case of an exception, to get a chance at retrying this job if the job failed.
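A sketch of that suggestion (not the code in this PR; the function name is made up for illustration):

```python
from google.api_core import exceptions

def wait_or_maybe_restart(job, retry=None, timeout=None):
    # Sketch only: if jobs.getQueryResults raises, fall back to jobs.get so a
    # genuinely failed job can be detected and retried, while an ambiguous
    # error on a still-running job is left alone.
    restart_query_job = False
    try:
        job._reload_query_results(retry=retry, timeout=timeout)
    except exceptions.GoogleAPIError:
        job.reload(retry=retry, timeout=timeout)  # jobs.get
        if job.state == "DONE" and job.error_result is not None:
            restart_query_job = True
    return restart_query_job
```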
chalmerlowe left a comment
I am halfway through my review.
Releasing these comments for now.
Will come back to this to finish out my review as soon as possible.
google/cloud/bigquery/job/query.py Outdated
is_job_done = job_retry(is_job_done)
do_get_result()
# timeout can be `None` or an object from our superclass
Which superclass are we discussing here?
google.api_core.future.polling.PollingFuture._DEFAULT_VALUE, introduced in googleapis/python-api-core#462.
I've updated the comments with some more info, as well as some things to consider in case we want to have a default value for timeout in the future.
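For context, a minimal sketch of that sentinel-default pattern (illustrative only; the real sentinel is PollingFuture._DEFAULT_VALUE in google-api-core):

```python
# Sketch only: a module-level sentinel lets result() distinguish "caller
# passed timeout=None" from "caller passed no timeout at all".
_DEFAULT_VALUE = object()

def result(timeout=_DEFAULT_VALUE):
    if timeout is _DEFAULT_VALUE:
        # No timeout was given; this is where a library-chosen default could
        # be applied in the future.
        timeout = None
    return timeout
```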
google/cloud/bigquery/retry.py Outdated
# rateLimitExceeded errors, which can be raised either by the Google load
# balancer or the BigQuery job server.
_DEFAULT_JOB_DEADLINE = 3.0 * _DEFAULT_RETRY_DEADLINE
_DEFAULT_JOB_DEADLINE = 4.0 * _DEFAULT_RETRY_DEADLINE
What is the purpose of using 4.0 here?
Can we get a comment indicating why 4.0?
Updated to 2.0 * (2.0 * _DEFAULT_RETRY_DEADLINE) and added some explanation both here and in QueryJob.result().
Note: This still only gets us 1 query retry in the face of the problematic ambiguous error codes from jobs.getQueryResults(), but that's better than the nothing we were actually getting before in some cases. I don't feel comfortable bumping this much further, though maybe 3.0 * 2.0 * _DEFAULT_RETRY_DEADLINE would be slightly less arbitrary at 1 hour?
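Working the numbers out (assuming _DEFAULT_RETRY_DEADLINE is the module's 10 minutes, which the 1-hour figure above implies; the rationale comments are paraphrased, not quoted from the PR):

```python
# Sketch of the arithmetic only; the real constants live in
# google/cloud/bigquery/retry.py.
_DEFAULT_RETRY_DEADLINE = 10.0 * 60.0  # 600 seconds per API-level retry budget

# Outer 2.0 allows roughly one retry of the query itself; the inner 2.0 is
# presumably the per-attempt budget explained in the PR's retry.py comments.
_DEFAULT_JOB_DEADLINE = 2.0 * (2.0 * _DEFAULT_RETRY_DEADLINE)  # 2400 s = 40 min

# The alternative floated above:
# 3.0 * (2.0 * _DEFAULT_RETRY_DEADLINE) == 3600 s == 1 hour
```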
}
conn = make_connection(
    query_resource, query_resource_done, job_resource_done, query_page_resource
    job_resource,
Not sure I am tracking the relationship between the make_connection inputs and the assert_has_calls checks.
Can you explain how these tests are supposed to work?
make_connection is a convention in google-cloud-bigquery unit tests that actually predates our use of the "mock" package. It mocks out the responses to REST API calls, previously with a fake implementation of our "Connection" class from the _http module and now with a true mock object. For every request that our test makes, there should be a corresponding response. As with Mock.side_effect, any exceptions in this list will be raised instead.
I'm guessing your question also relates to "Why this particular set of requests / responses?". I've added some comments explaining why we're expecting this sequence of API calls. I've also updated this test to more explicitly check for a possible cause of customer issue b/332850329.
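For reference, a simplified sketch of that convention (the real helper lives in the unit-test helpers module and does a bit more):

```python
from unittest import mock

from google.cloud.bigquery import _http

def make_connection(*responses):
    # Each positional argument is the response to one REST call, in order.
    # An exception in the list is raised instead, just like Mock.side_effect.
    conn = mock.create_autospec(_http.Connection, instance=True)
    conn.api_request.side_effect = list(responses)
    return conn
```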
connection.api_request.assert_has_calls(
    [query_results_call, query_results_call, reload_call]
    [
        reload_call,
Same thing here. Can I get some clarity on what we are doing and looking for?
Added some explanation here as well as above in the make_connection() call.
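And a self-contained illustration of what assert_has_calls verifies: the mocked connection must have seen these requests, in this order, one per response that make_connection was seeded with (the paths below are placeholders, not the test's exact values):

```python
from unittest import mock

conn = mock.Mock()
conn.api_request(method="GET", path="/projects/p/jobs/j")     # like reload_call
conn.api_request(method="GET", path="/projects/p/queries/j")  # like query_results_call

# Passes only if the calls happened with these arguments, in this order.
conn.api_request.assert_has_calls(
    [
        mock.call(method="GET", path="/projects/p/jobs/j"),
        mock.call(method="GET", path="/projects/p/queries/j"),
    ]
)
```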
Co-authored-by: Chalmer Lowe <chalmerlowe@google.com>
chalmerlowe left a comment
LGTM
tests/unit/job/test_query.py Outdated
job.result()
with freezegun.freeze_time("1970-01-01 00:00:00", tick=False):
    job.result(timeout=1.125)
Is there a reason we are using such a specific number, 1.125?
Can I get a comment here to let future me know why we picked this number?
| method="GET", | ||
| path=f"/projects/{self.PROJECT}/queries/{self.JOB_ID}", | ||
| query_params={ | ||
| "maxResults":0, |
Is maxResults of 0 synonymous with asking for all results? Or is it really asking for zero results?
We actually want 0 rows. If we omit this or ask for a non-zero number of rows, the jobs.getQueryResults API can hang when the query has wide rows (many columns).
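A sketch of the request shape in question (path and parameters as in the test above; the helper name is made up):

```python
def get_query_results_status_only(conn, project, job_id, timeout=None):
    # maxResults=0 asks jobs.getQueryResults for job status (and schema) but
    # zero result rows, which avoids the hang described above for wide rows.
    return conn.api_request(
        method="GET",
        path=f"/projects/{project}/queries/{job_id}",
        query_params={"maxResults": 0},
        timeout=timeout,
    )
```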
Co-authored-by: Chalmer Lowe <chalmerlowe@google.com>
…ast-fix-query-retry
chalmerlowe left a comment (edited)
Thanks for all the comments, etc.
Future me thanks you as well.
LGTM, APPROVED.
BEGIN_COMMIT_OVERRIDE
perf: avoid unnecessary API call in QueryJob.result() when job is already finished (#1900)
fix: retry query jobs that fail even with ambiguous jobs.getQueryResults REST errors (#1903, #1900)
END_COMMIT_OVERRIDE
query_and_wait for lower-latency small queries (python-bigquery-magics#15), since this loop for waiting for the query to finish will make it easier to add a progress bar (if we decide that's needed)