
perf: use `jobs.getQueryResults` to download result sets #347


Closed

tswast wants to merge 8 commits into googleapis:master from tswast:optimized-query-getQueryResults


Conversation

Contributor

tswast commented Oct 27, 2020 • edited

Since `getQueryResults` was already used to wait for the job to finish,
this avoids an additional call to `tabledata.list`. The first page of
results is cached in memory.

Additional changes will come in the future to avoid calling the BQ
Storage API when the cached results contain the full result set.
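A minimal sketch of the idea, not the code in this PR: the class name `QueryJobSketch` and the injected `get_query_results` callable are hypothetical stand-ins for the job object and the HTTP call. The response keys `jobComplete`, `rows`, and `pageToken` follow the `jobs.getQueryResults` response; the rows here are simplified to plain lists.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical: a function that performs one jobs.getQueryResults request
# for the given page token and returns the parsed JSON response.
GetQueryResults = Callable[[Optional[str]], Dict[str, Any]]


class QueryJobSketch:
    """Illustrative stand-in for a query job; not the library's QueryJob."""

    def __init__(self, get_query_results: GetQueryResults):
        self._get_query_results = get_query_results
        self._first_page: Optional[Dict[str, Any]] = None

    def wait(self) -> None:
        # Poll until the job reports completion. The final response already
        # carries the first page of rows, so keep it instead of dropping it.
        while True:
            response = self._get_query_results(None)
            if response.get("jobComplete"):
                self._first_page = response
                return

    def rows(self) -> Iterator[List[Any]]:
        if self._first_page is None:
            self.wait()
        page = self._first_page
        while True:
            # Serve rows from the current page (the cached one first).
            for row in page.get("rows", []):
                yield row
            token = page.get("pageToken")
            if not token:
                return
            # Later pages come from the same endpoint, not tabledata.list.
            page = self._get_query_results(token)
```

With this shape, the request that confirms the job is done is also the request that delivers the first page, so no separate row-listing call is needed for it.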

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

- Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
- Ensure the tests and linter pass
- Code coverage does not decrease (if any source code was changed)
- Appropriate docs were updated (if necessary)

Towards #362

google-clabot added the "cla: yes" label (This human has signed the Contributor License Agreement.)

Oct 27, 2020

Contributor, Author

tswast commented Oct 27, 2020

Based on #341

tswast commented Oct 28, 2020

google/cloud/bigquery/job.py Outdated

		)

		self._query_results = None
		self._get_query_results_kwargs = {}

Contributor, Author

tswast Oct 28, 2020

Does this need to be a thread-local variable?

Contributor, Author

tswast Oct 28, 2020

Actually, the cached query results might need to be thread-local too. Imagine if two threads called `result` with different starting indexes and/or max results.
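To illustrate the concern, here is a sketch of keeping the cached response in thread-local storage with the standard library's `threading.local`. The class `JobStateSketch` and its `query_results` property are hypothetical names for illustration, not the attributes used in this PR.

```python
import threading


class JobStateSketch:
    """Hypothetical per-job state holder; not the library's QueryJob."""

    def __init__(self):
        # Attributes set on a threading.local() instance are visible only
        # to the thread that set them.
        self._thread_local = threading.local()

    @property
    def query_results(self):
        # Returns None in any thread that has not cached a response yet.
        return getattr(self._thread_local, "query_results", None)

    @query_results.setter
    def query_results(self, value):
        self._thread_local.query_results = value
```

Each thread that calls `result` would then see only the page it fetched itself, at the cost of not sharing a cached page across threads.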

Contributor, Author

tswast Oct 28, 2020

We'll also need some logic like

https://github.com/googleapis/google-cloud-go/blob/925033712191bce44fa99eb117d6531106042272/bigquery/iterator.go#L314

to see if we can use the cached page if `result` is called more than once
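One possible shape for that check, sketched under assumptions: this is not a port of the linked Go code, and the `cache` dict layout (`start_index`, `max_results`, `response`) and the function name are made up for illustration. The rule used here is the strictest one: reuse the cached page only when the paging parameters match exactly.

```python
from typing import Any, Dict, Optional


def usable_cached_response(
    cache: Optional[Dict[str, Any]],
    start_index: int = 0,
    max_results: Optional[int] = None,
) -> Optional[Dict[str, Any]]:
    """Return the cached response only if it was fetched with the same
    paging parameters; otherwise return None so the caller refetches.

    `cache` is a hypothetical dict with keys "start_index", "max_results"
    and "response"; it is not a structure from the library.
    """
    if cache is None:
        return None
    if cache["start_index"] != start_index:
        return None
    if cache["max_results"] != max_results:
        return None
    return cache["response"]
```

A looser rule (for example, serving a smaller `max_results` from a larger cached page) is possible but needs extra care with page tokens, so it is left out of this sketch.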

Contributor, Author

tswast Oct 30, 2020

Done in latest commit.

perf: use `jobs.getQueryResults` to download result sets

983c8d2

Since `getQueryResults` was already used to wait for the job to finish, this avoids an additional call to `tabledata.list`. The first page of results is cached in memory. Additional changes will come in the future to avoid calling the BQ Storage API when the cached results contain the full result set.

tswast force-pushed the optimized-query-getQueryResults branch from 7364196 to 983c8d2

October 29, 2020 19:15

fix: validate the query results cache before using

f52ed71

Also, move to thread-local variables for values that were intended to track parameters across methods.

tswast marked this pull request as ready for review

October 30, 2020 21:40

tswast requested review from a team and shollyman

October 30, 2020 21:40

tswast added 4 commits

November 2, 2020 09:44

Merge remote-tracking branch 'upstream/master' into optimized-query-g…

e149360

…etQueryResults

blacken. update dbapi to use thread local var

9b5920f

fix dbapi tests

07e6043

fix system test

af2e2cc

`startIndex` is no longer passed to the iterator. It is used in the initial (cached) call to `getQueryResults`.

tswast mentioned this pull request

Nov 2, 2020

refactor: split job.py and test_job.py #358

Closed

tswast added 2 commits

November 2, 2020 10:50

add unit tests for missing coverage

540d530

blacken

6e83fbf

tswast commented Nov 2, 2020

google/cloud/bigquery/client.py

		Iterator of row data
		:class:`~google.cloud.bigquery.table.Row`-s.
		"""
		row_iterator = RowIterator(

Contributor, Author

tswast Nov 2, 2020

Be sure to populate extra args with the field projection. We only need rows and page token.
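As a rough sketch of what that could look like: Google REST APIs accept a `fields` query parameter that requests a partial response. The helper name `get_query_results_params` is hypothetical, and the exact field list is an assumption taken from the comment above ("rows and page token"); the real change may need more fields.

```python
from typing import Any, Dict, Optional


def get_query_results_params(
    page_token: Optional[str] = None,
    max_results: Optional[int] = None,
) -> Dict[str, Any]:
    """Build query parameters for a jobs.getQueryResults page request.

    Hypothetical helper. The `fields` projection asks the API to return only
    the rows and the next page token, leaving out the schema and job stats.
    """
    params: Dict[str, Any] = {"fields": "rows,pageToken"}
    if page_token is not None:
        params["pageToken"] = page_token
    if max_results is not None:
        params["maxResults"] = max_results
    return params
```

A dict like this could be passed as the extra arguments of the row iterator so every page request carries the same projection.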

tswast added the "do not merge" label (Indicates a pull request not ready for merge, due to either quality or timing.)

Nov 2, 2020

Contributor, Author

tswast commented Nov 3, 2020 • edited

Per our discussion, I'll be splitting this into 2 PRs:

1. Call `getQueryResults` (no cache) from `RowIterator` -- make sure to add a projection to exclude the schema and other irrelevant job stats. (perf: use `jobs.getQueryResults` to download result sets #363)
2. Cache the first page of results.

I'll base them on the refactoring to split up the giant job module here: #361

tswast closed this

Nov 4, 2020

tswast mentioned this pull request

Nov 5, 2020

perf: cache first page of `jobs.getQueryResults` rows #374

Merged

4 tasks

