- Notifications
You must be signed in to change notification settings - Fork321
perf: DB-API uses more efficientquery_and_wait when no job ID is provided#1747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
…_is_completely_cached
…_is_completely_cached
…or` of resultsSet the `QUERY_PREVIEW_ENABLED=TRUE` environment variable to use this with thenew JOB_CREATION_OPTIONAL mode (currently in preview).
tswast commentedDec 12, 2023
I tested this manually in ipython as well, since it seems we don't have much (any?) system tests / samples in this repo that test this functionality. # In [1]:importgoogle.cloud.bigquery.dbapiasbqdbconn=bqdb.connect()cur=conn.cursor()cur.execute("SELECT 1")cur.fetchall()# Out[1]: [Row((1,), {'f0_': 0})]# In [2]:cur._query_rows._should_use_bqstorage(None,create_bqstorage_client=True)# Out[2]: False# In [4]:cur.execute("SELECT name, SUM(`number`) AS total FROM `bigquery-public-data.usa_names.usa_1910_2013` GROUP BY name")cur._query_rows._should_use_bqstorage(None,create_bqstorage_client=True)# Out[4]: False# In [5]:r=cur.fetchall()len(r)# Out[5]: 29828# In [7]:cur.execute("SELECT name, number, year FROM `bigquery-public-data.usa_names.usa_1910_2013`")cur._query_rows._should_use_bqstorage(None,create_bqstorage_client=True)# Out[7]: True# In [8]:r=cur.fetchall()len(r)# Out[8]: 5552452 I have also tested these same queries with SQLAlchemy to make sure this doesn't somehow break the connector. # In [1]:fromsqlalchemyimport*fromsqlalchemy.engineimportcreate_enginefromsqlalchemy.schemaimport*engine=create_engine('bigquery://swast-scratch')# In [2]:table=Table('usa_1910_2013',MetaData(bind=engine),autoload=True,schema='bigquery-public-data.usa_names',)select([func.count('*')],from_obj=table).scalar()# Out[2]: 5552452# In[3]:len(select( [table.c.name,func.sum(table.c.number).label('total')],from_obj=table).group_by(table.c.name).execute().all())# Out[3]: 29828# In[4]:len(select( [table.c.name,table.c.number,table.c.year],from_obj=table).execute().all())# Out[4]: 5552452 For sanity, I checked that there is a speedup when using this change: After this change: # In [1]:importgoogle.cloud.bigquery.dbapiasbqdbconn=bqdb.connect()cur=conn.cursor()# In [2]:%%timeit-n10-r10cur.execute("SELECT 1")r=cur.fetchall()# Out [2]:# 319 ms ± 19.5 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)# In [3]:%%timeit-n5-r4cur.execute("SELECT name, SUM(`number`) AS total FROM `bigquery-public-data.usa_names.usa_1910_2013` GROUP BY name")cur.fetchall()# Out [3]:# 1.63 s ± 80.3 ms per loop (mean ± std. dev. of 4 runs, 5 loops each) Before this change: This means that small query results (SELECT 1) have a (1.18 / 0.319) = 3.7x speedup! For medium-sized results/queries, this is less dramatic at (1.67 / 1.63) = 1.2x speedup and not statistically significant. |
| """ | ||
| ifself._first_page_responseisNone: | ||
| if ( | ||
| nothasattr(self,"_first_page_response") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Because we set at line 1591self._first_page_response = first_page_response, this attribute will always exist? Maybe we can check whether the value is None or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This also was needed for some tests where we have a mock row iterator but want to test with a real implementation of this method.
| ifself.next_page_tokenisnotNone: | ||
| # The developer has already started paging through results if | ||
| # next_page_token is set. | ||
| ifhasattr(self,"next_page_token")andself.next_page_tokenisnotNone: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Just for my education, it looks like attributenext_page_token is inherited from "grandparent" classIterator from the core library, whichcreates this attribute at init. Is it necessary to check whether this attribute exist or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This was purely for some failing unit tests where this superclass was mocked out.
Linchin commentedDec 17, 2023
Thank you Tim for the timely fix! LGTM, except for some nits. |
chalmerlowe left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes#1745 🦕