fix: retry 404 errors in Client.query(...) #2135
Conversation
```python
if job_retry is None:
    future = do_query()
else:
    future = job_retry(do_query)()
```
Note: even though I add the retry here, we'll still need the one in `QueryJob`, because BigQuery queries can be started successfully but still fail for a retriable reason. By adding the retry here, we can retry the queries that fail for a retriable reason.
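The `job_retry(do_query)()` pattern above can be sketched in plain Python. This is an illustrative, stdlib-only stand-in: `SimpleRetry` and `TransientError` are hypothetical names, not the real `google.api_core.retry.Retry` implementation, but the call shape (a retry object wrapping a zero-argument callable and re-invoking it while a predicate says the exception is retriable) is the same.

```python
class TransientError(Exception):
    """Stand-in for a retriable API error."""


class SimpleRetry:
    """Minimal sketch of a Retry-like decorator object (not google.api_core)."""

    def __init__(self, predicate, max_attempts=3):
        self._predicate = predicate
        self._max_attempts = max_attempts

    def __call__(self, func):
        def wrapped(*args, **kwargs):
            for attempt in range(1, self._max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Re-raise if we are out of attempts or the error
                    # is not one the predicate considers retriable.
                    if attempt == self._max_attempts or not self._predicate(exc):
                        raise
        return wrapped


attempts = []


def do_query():
    attempts.append(1)
    if len(attempts) < 2:
        raise TransientError("query failed at start")
    return "rows"


job_retry = SimpleRetry(predicate=lambda exc: isinstance(exc, TransientError))
result = job_retry(do_query)()  # first call fails, second succeeds
```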
google/cloud/bigquery/retry.py Outdated
```python
# Per https://github.com/googleapis/python-bigquery/issues/2134, sometimes
# we get a 404 error. In this case, if we get this far, assume that the job
# doesn't actually exist and try again.
or isinstance(exc, exceptions.NotFound)
```
Thought: does this make queries retry if the table is not found? Or will this only happen when the query job itself is not found?
😢 It does hang / retry too much. Tested with:

```python
import google.cloud.bigquery

client = google.cloud.bigquery.Client()
job = client.query(
    "SELECT * FROM `swast-scratch.my_dataset.doesntexistnoreally`"
)
job.result()
```
Need to play a similar trick as with `get_job_retry`, instead.
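The `get_job_retry` "trick" amounts to capturing the original predicate in its own name first, then building the new predicate from that captured reference, so delegating to the original cannot recurse into the new one. A hedged sketch, where `NotFound` and `TooManyRequests` are stand-in exception classes rather than the real `google.api_core.exceptions` types:

```python
class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound."""


class TooManyRequests(Exception):
    """Stand-in for a transient error the base predicate already retries."""


def base_predicate(exc):
    return isinstance(exc, TooManyRequests)


# Capture the original predicate BEFORE defining the new one, so the
# delegation below refers to the old function, not to job_predicate itself.
saved_predicate = base_predicate


def job_predicate(exc):
    # Job 404s become retriable; everything else defers to the original.
    return isinstance(exc, NotFound) or saved_predicate(exc)
```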
Fixed in latest commit.
tswast commented Feb 19, 2025
I confirmed with my colleague that after at
```python
get_job_retry = retry.with_predicate(
    lambda exc: isinstance(exc, core_exceptions.NotFound)
    # Reference the original retry to avoid recursion.
    or retry._predicate(exc)
)
```
@parthea Do you have any thoughts on how stable this will be? I couldn't find any public way to pull the previous predicate out of a Retry object when looking at the https://github.com/googleapis/python-api-core/blob/main/google/api_core/retry/retry_base.py code.
An alternative to consider: do something like I did with `retry` versus `job_retry` and introduce more parameters, such as `job_insert_retry` and `job_get_retry`.
Filed googleapis/python-api-core#796; in the meantime I'll pursue an alternative that doesn't require access to private attributes.
```python
        return True
    # Reference the original job_retry to avoid recursion.
    return job_retry._predicate(exc)
```
@parthea likewise here. I ended up using a slightly different pattern here due to the double nesting of retry objects.
tswast commented Feb 19, 2025 • edited
system-3.12 failure: Edit: Actually, it does appear to be related. It's a different kind of 404: trying to query a table in a different location. I'll see if I can disambiguate that from the job 404. Edit 2: fixed in 8e20cb0. I use the message to disambiguate "job not found" from "dataset/table not found", based on the stack trace in #2134. System test passing locally (and much more quickly).
tswast commented Feb 20, 2025
Kokoro (https://btx.cloud.google.com/invocations/91aed90f-987e-4200-be14-09717b48923f/targets/cloud-devrel%2Fclient-libraries%2Fpython%2Fgoogleapis%2Fpython-bigquery%2Fpresubmit%2Fpresubmit/log) and system-3.12 and snippets tests are passing. Ready for review!
tswast commented Feb 20, 2025
Cover is failing: I'll see if I can update a unit test for this soon.
tswast commented Feb 21, 2025 • edited
```python
with pytest.raises(DataLoss, match="we lost your job, sorry"):
    client.query("SELECT 1;", job_id=None)


def test_query_job_rpc_fail_w_conflict_random_id_job_fetch_fails_no_retries(self):
```
Could you help me understand what this test is for? It feels like we are just verifying that `QueryJob._begin()` and `client.get_job()` are called.
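For context, a call-count check like the one described can be sketched with `unittest.mock`: the test asserts that specific RPC-level calls happened (and how many times), rather than asserting on a return value. The `jobs_insert`/`jobs_get` names below are hypothetical stand-ins for `QueryJob._begin()` and `client.get_job()`.

```python
from unittest import mock

api = mock.Mock()
# Simulate the insert failing with a conflict-style error.
api.jobs_insert.side_effect = RuntimeError("conflict")


def start_query():
    try:
        api.jobs_insert()  # stand-in for QueryJob._begin()
    except RuntimeError:
        api.jobs_get()  # stand-in for client.get_job(): fetch existing job


start_query()
```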
Linchin left a comment
Thank you, lgtm.
c6d5f8a into main
Fixes #2134 🦕