Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

perf: usejobs.getQueryResults to download result sets#347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation

@tswast
Copy link
Contributor

@tswasttswast commentedOct 27, 2020
edited
Loading

SincegetQueryResults was already used to wait for the job to finish,
this avoids an additional call totabledata.list. The first page of
results are cached in-memory.

Additional changes will come in the future to avoid calling the BQ
Storage API when the cached results contain the full result set.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as abug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Towards#362

@google-clagoogle-clabot added the cla: yesThis human has signed the Contributor License Agreement. labelOct 27, 2020
@tswast
Copy link
ContributorAuthor

Based on#341

)

self._query_results=None
self._get_query_results_kwargs= {}
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Does this need to be a thread-local variable?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actually, the cached query results might need to be thread-local too. Imagine if two threads calledresult with different starting indexes and/or max results.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We'll also need some logic like

https://github.com/googleapis/google-cloud-go/blob/925033712191bce44fa99eb117d6531106042272/bigquery/iterator.go#L314

to see if we can use the cached page ifresult is called more than once

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Done in latest commit.

Since `getQueryResults` was already used to wait for the job to finish,this avoids an additional call to `tabledata.list`. The first page ofresults are cached in-memory.Additional changes will come in the future to avoid calling the BQStorage API when the cached results contain the full result set.
@tswasttswastforce-pushed theoptimized-query-getQueryResults branch from7364196 to983c8d2CompareOctober 29, 2020 19:15
Also, move to thread-local variables for values that wereintended to track parameters across methods.
@tswasttswast marked this pull request as ready for reviewOctober 30, 2020 21:40
@tswasttswast requested review froma team andshollymanOctober 30, 2020 21:40
startIndex is no longer passed to the iteratorIt is used in the initial (cached) call togetQueryResults
Iterator of row data
:class:`~google.cloud.bigquery.table.Row`-s.
"""
row_iterator=RowIterator(
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Be sure to populate extra args with the field projection. We only need rows and page token.

@tswasttswast added the do not mergeIndicates a pull request not ready for merge, due to either quality or timing. labelNov 2, 2020
@tswast
Copy link
ContributorAuthor

tswast commentedNov 3, 2020
edited
Loading

Per our discussion, I'll be splitting this into 2 PRs:

  1. CallgetQueryResults (no cache) fromRowIterator -- make sure to add a projection to exclude the schema and other irrelevant job stats.perf: usejobs.getQueryResults to download result sets #363
  2. Cache the first page of results.

I'll base them on the refactoring to split up the giant job module here:#361

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@shollymanshollymanAwaiting requested review from shollyman

Assignees

No one assigned

Labels

cla: yesThis human has signed the Contributor License Agreement.do not mergeIndicates a pull request not ready for merge, due to either quality or timing.

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

1 participant

@tswast

[8]ページ先頭

©2009-2025 Movatter.jp