- Notifications
You must be signed in to change notification settings - Fork322
perf: usejobs.getQueryResults to download result sets#347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
tswast commentedOct 27, 2020
Based on#341 |
google/cloud/bigquery/job.py Outdated
| ) | ||
| self._query_results=None | ||
| self._get_query_results_kwargs= {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Does this need to be a thread-local variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Actually, the cached query results might need to be thread-local too. Imagine if two threads calledresult with different starting indexes and/or max results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We'll also need some logic like
to see if we can use the cached page ifresult is called more than once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Done in latest commit.
Since `getQueryResults` was already used to wait for the job to finish,this avoids an additional call to `tabledata.list`. The first page ofresults are cached in-memory.Additional changes will come in the future to avoid calling the BQStorage API when the cached results contain the full result set.
7364196 to983c8d2CompareAlso, move to thread-local variables for values that wereintended to track parameters across methods.
startIndex is no longer passed to the iteratorIt is used in the initial (cached) call togetQueryResults
| Iterator of row data | ||
| :class:`~google.cloud.bigquery.table.Row`-s. | ||
| """ | ||
| row_iterator=RowIterator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Be sure to populate extra args with the field projection. We only need rows and page token.
tswast commentedNov 3, 2020 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Per our discussion, I'll be splitting this into 2 PRs:
I'll base them on the refactoring to split up the giant job module here:#361 |
Uh oh!
There was an error while loading.Please reload this page.
Since
getQueryResultswas already used to wait for the job to finish,this avoids an additional call to
tabledata.list. The first page ofresults are cached in-memory.
Additional changes will come in the future to avoid calling the BQ
Storage API when the cached results contain the full result set.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Towards#362