perf: DB-API uses more efficient `query_and_wait` when no job ID is provided #1747


Merged
gcf-merge-on-green merged 58 commits into main from b1745-dbapi-query_and_wait on Dec 19, 2023

Conversation

@tswast
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #1745 🦕

…or` of results. Set the `QUERY_PREVIEW_ENABLED=TRUE` environment variable to use this with the new `JOB_CREATION_OPTIONAL` mode (currently in preview).
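
As context (not part of the PR itself; the query and the default-credential connection below are placeholder assumptions), a minimal sketch of exercising the faster path from the DB-API: omit the job ID when executing, and set the environment variable mentioned above to opt into the preview mode.

# Minimal sketch, not from this PR: placeholder query and default credentials.
import os

os.environ["QUERY_PREVIEW_ENABLED"] = "TRUE"  # opt in to the JOB_CREATION_OPTIONAL preview

import google.cloud.bigquery.dbapi as bqdb

conn = bqdb.connect()        # default client / credentials
cur = conn.cursor()
cur.execute("SELECT 1")      # no job_id passed, so the query_and_wait path can apply
print(cur.fetchall())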
@tswast requested review from Linchin and chalmerlowe and removed the review request for obada-ab on December 11, 2023 19:49
@tswast
Contributor (Author)

I tested this manually in ipython as well, since it seems we don't have many (any?) system tests / samples in this repo that test this functionality.

# In [1]:
import google.cloud.bigquery.dbapi as bqdb
conn = bqdb.connect()
cur = conn.cursor()
cur.execute("SELECT 1")
cur.fetchall()
# Out[1]: [Row((1,), {'f0_': 0})]

# In [2]:
cur._query_rows._should_use_bqstorage(None, create_bqstorage_client=True)
# Out[2]: False

# In [4]:
cur.execute("SELECT name, SUM(`number`) AS total FROM `bigquery-public-data.usa_names.usa_1910_2013` GROUP BY name")
cur._query_rows._should_use_bqstorage(None, create_bqstorage_client=True)
# Out[4]: False

# In [5]:
r = cur.fetchall()
len(r)
# Out[5]: 29828

# In [7]:
cur.execute("SELECT name, number, year FROM `bigquery-public-data.usa_names.usa_1910_2013`")
cur._query_rows._should_use_bqstorage(None, create_bqstorage_client=True)
# Out[7]: True

# In [8]:
r = cur.fetchall()
len(r)
# Out[8]: 5552452

I have also tested these same queries with SQLAlchemy to make sure this doesn't somehow break the connector.

# In [1]:
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *
engine = create_engine('bigquery://swast-scratch')

# In [2]:
table = Table(
    'usa_1910_2013',
    MetaData(bind=engine),
    autoload=True,
    schema='bigquery-public-data.usa_names',
)
select([func.count('*')], from_obj=table).scalar()
# Out[2]: 5552452

# In [3]:
len(
    select(
        [table.c.name, func.sum(table.c.number).label('total')],
        from_obj=table,
    ).group_by(table.c.name).execute().all()
)
# Out[3]: 29828

# In [4]:
len(
    select(
        [table.c.name, table.c.number, table.c.year],
        from_obj=table,
    ).execute().all()
)
# Out[4]: 5552452

For sanity, I checked that there is a speedup when using this change:

After this change:

# In [1]:
import google.cloud.bigquery.dbapi as bqdb
conn = bqdb.connect()
cur = conn.cursor()

# In [2]:
%%timeit -n10 -r10
cur.execute("SELECT 1")
r = cur.fetchall()
# Out [2]:
# 319 ms ± 19.5 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

# In [3]:
%%timeit -n5 -r4
cur.execute("SELECT name, SUM(`number`) AS total FROM `bigquery-public-data.usa_names.usa_1910_2013` GROUP BY name")
cur.fetchall()
# Out [3]:
# 1.63 s ± 80.3 ms per loop (mean ± std. dev. of 4 runs, 5 loops each)

Before this change:

# In [1]:
import google.cloud.bigquery.dbapi as bqdb
conn = bqdb.connect()
cur = conn.cursor()

# In [2]:
%%timeit -n10 -r10
cur.execute("SELECT 1")
r = cur.fetchall()
# Out [2]:
# 1.18 s ± 73.2 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

# In [3]:
%%timeit -n5 -r4
cur.execute("SELECT name, SUM(`number`) AS total FROM `bigquery-public-data.usa_names.usa_1910_2013` GROUP BY name")
cur.fetchall()
# Out [3]:
# 1.67 s ± 62.2 ms per loop (mean ± std. dev. of 4 runs, 5 loops each)

This means that small query results (SELECT 1) see a (1.18 / 0.319) = 3.7x speedup! For medium-sized results/queries, the difference is much smaller, (1.67 / 1.63) ≈ 1.0x, and not statistically significant.
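
The quoted ratios follow directly from the %timeit means above (a trivial check, using only the numbers already reported):

# Speedup ratios computed from the %timeit means reported above.
before_small, after_small = 1.18, 0.319    # seconds, SELECT 1
before_medium, after_medium = 1.67, 1.63   # seconds, GROUP BY query

print(round(before_small / after_small, 1))    # 3.7
print(round(before_medium / after_medium, 2))  # 1.02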

"""
ifself._first_page_responseisNone:
if (
nothasattr(self,"_first_page_response")
Contributor


Because we set `self._first_page_response = first_page_response` at line 1591, this attribute will always exist? Maybe we can check whether the value is None or not.

Contributor (Author)


This also was needed for some tests where we have a mock row iterator but want to test with a real implementation of this method.

Linchin reacted with thumbs up emoji
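
A minimal sketch of that situation (hypothetical stand-in class, not the repo's actual test code): a test double that never runs the real constructor has no `_first_page_response` attribute at all, so only the `hasattr` guard keeps the check from raising.

# Hypothetical test double; it skips the real __init__, so the attribute
# is never set on the instance.
class FakeRowIterator:
    pass

fake = FakeRowIterator()

# A bare `fake._first_page_response is None` check would raise AttributeError;
# the guarded form just reports that no cached first page is available.
has_cached_first_page = (
    hasattr(fake, "_first_page_response")
    and fake._first_page_response is not None
)
print(has_cached_first_page)  # False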
if self.next_page_token is not None:
# The developer has already started paging through results if
# next_page_token is set.
if hasattr(self, "next_page_token") and self.next_page_token is not None:
Contributor


Just for my education, it looks like the attribute `next_page_token` is inherited from the "grandparent" class `Iterator` from the core library, which creates this attribute at init. Is it necessary to check whether this attribute exists or not?

Contributor (Author)


This was purely for some failing unit tests where this superclass was mocked out.

Linchin reacted with thumbs up emoji
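
A rough illustration of that failure mode (hypothetical classes, not the actual unit tests): if the base iterator's `__init__` is patched out, `next_page_token` is never assigned, and the `hasattr` guard keeps the check from raising.

# Hypothetical stand-ins for the real iterator classes, showing why the
# hasattr() guard helps when the superclass __init__ is mocked away.
from unittest import mock


class BaseIterator:
    def __init__(self):
        self.next_page_token = None


class RowIteratorLike(BaseIterator):
    def started_paging(self):
        return (
            hasattr(self, "next_page_token")
            and self.next_page_token is not None
        )


with mock.patch.object(BaseIterator, "__init__", lambda self: None):
    it = RowIteratorLike()      # patched __init__ assigns nothing
    print(it.started_paging())  # False, instead of AttributeError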
@Linchin
Contributor

Thank you Tim for the timely fix! LGTM, except for some nits.

@Linchin self-requested a review on December 17, 2023 05:58
@chalmerlowe (Collaborator) left a comment


LGTM

@tswast added the automerge label (Merge the pull request once unit tests and other checks pass.) on Dec 19, 2023
gcf-merge-on-green (bot) merged commit d225a94 into main on Dec 19, 2023
gcf-merge-on-green (bot) deleted the b1745-dbapi-query_and_wait branch on December 19, 2023 22:00
gcf-merge-on-green (bot) removed the automerge label (Merge the pull request once unit tests and other checks pass.) on Dec 19, 2023

Reviewers

@chalmerlowe approved these changes

@Linchin approved these changes

Assignees

No one assigned

Labels

  • api: bigquery (Issues related to the googleapis/python-bigquery API.)
  • size: l (Pull request size is large.)

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

dbapi: Skip storage client fetch when results cached

3 participants

@tswast, @Linchin, @chalmerlowe
