- Notifications
You must be signed in to change notification settings - Fork321
feat: add default timeout for Client.get_job()#1935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
tswast left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Please add some tests where*Job.reload(timeout=None),*Job.done(timeout=None), and*Job.result(timeout=None) disable the timeouts forjobs.get.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
google/cloud/bigquery/job/base.py Outdated
| extra_params["location"]=self.location | ||
| span_attributes= {"path":self.path} | ||
| kwargs:Dict[str,Any]= {} | ||
| iftype(timeout)isobject: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Let's add some comments explaining why we need to omit the timeout if we get_DEFAULT_VALUE. (Because we want to use the default API-level timeouts but wait indefinitely for the job to finish)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I added explanation whykwargs is used here, but I feel likeQueryjob.result() is a better place to explain the default timeout behavior, so I added it in docstring there instead.
google/cloud/bigquery/job/base.py Outdated
| self, | ||
| retry:"retries.Retry"=DEFAULT_RETRY, | ||
| timeout:Optional[float]=None, | ||
| timeout:Optional[Union[float,object]]=PollingFuture._DEFAULT_VALUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Done isn't waiting for the job to complete, so I wonder if we actually needPollingFuture._DEFAULT_VALUE here?
It seems more analogous toClient.get_job so maybe we can use the sameDEFAULT_GET_JOB_TIMEOUT, here? 🤔 We need to double-check that we aren't calling any other API requests like getQueryResults, if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
done() callsreload(), which in turn callsClient.get_job(). So I added the sentinel values along the pathresult() ->done() ->reload(). Indeed I realize we don't have to add sentinel fordone() orreload(), because it will be passed fromresult() anyway. As to getQueryResults, I don't think it's called fromdone() onward (it's used inresult()). I think either way (usingPollingFuture._DEFAULT_VALUE orDEFAULT_GET_JOB_TIMEOUT) it's effectively the same, so I think either way it's fine. Are there any other factors I'm not aware of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'd like to keep our type annotations as narrow as we can. My worry withPollingFuture._DEFAULT_VALUE is it opens it up toobject, which technically is anything (string, bytes, anything) in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Indeed, I ended up only using sentinel forresult(), and usedDEFAULT_GET_JOB_TIMEOUT for subsequent calls.
google/cloud/bigquery/job/query.py Outdated
| iftype(timeout)isobject: | ||
| timeout=None | ||
| get_job_timeout=PollingFuture._DEFAULT_VALUE | ||
| else: | ||
| get_job_timeout=timeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why not do the kwargs trick here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I wonder if it's necessary - the purpose here is slightly different from when callingclient.get_job(). I'm only preserving the sentinel value for thejob.done() call, and use None as default for any other calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
done, following later comments
Uh oh!
There was an error while loading.Please reload this page.
google/cloud/bigquery/job/base.py Outdated
| client=None, | ||
| retry:"retries.Retry"=DEFAULT_RETRY, | ||
| timeout:Optional[float]=None, | ||
| timeout:Optional[Union[float,object]]=POLLING_DEFAULT_VALUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'd much rather this defaulted to the same timeout asget_job() and anywhere that callsreload(), we make sure we do the check forPOLLING_DEFAULT_VALUE before callingreload() to make sure we only pass infloat orNone or omit the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
done
google/cloud/bigquery/job/base.py Outdated
| self, | ||
| retry:"retries.Retry"=DEFAULT_RETRY, | ||
| timeout:Optional[float]=None, | ||
| timeout:Optional[Union[float,object]]=POLLING_DEFAULT_VALUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Same here. I don't think this needs to handlePOLLING_DEFAULT_VALUE since it's not waiting for the job to finish, just making at most 1 API call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
done
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
google/cloud/bigquery/job/query.py Outdated
| # timeout to QueryJob.done(), and use None for the other timeouts. | ||
| iftype(timeout)isobject: | ||
| timeout=None | ||
| get_job_timeout=POLLING_DEFAULT_VALUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why not set toDEFAULT_GET_JOB_TIMEOUT or do the kwargs trick to omit it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
done
Co-authored-by: Tim Sweña (Swast) <swast@google.com>
Co-authored-by: Tim Sweña (Swast) <swast@google.com>
tswast left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Love it! Just a couple of typos to fix please before merging
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
When calling
QueryJob.result(),QueryJob.reload()is called in the background with an empty timeout value. This leadsQueryJobto waiting indefinitely ifQueryJob.reload()loses connection.This PR:
QueryJob.reload()to callClient.get_job()instead of calling API directlyClient.get_job()Fixes#1922 🦕