NotificationsYou must be signed in to change notification settings
Fork321
Star786

feat: add`job_id`,`location`,`project`, and`query_id` properties on`RowIterator`#1733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

Linchin merged 3 commits intomainfromb305260214-rowiterator-job-metadata

Nov 18, 2023

Merged

feat: add`job_id`,`location`,`project`, and`query_id` properties on`RowIterator`#1733

Linchin merged 3 commits intomainfromb305260214-rowiterator-job-metadata

Nov 18, 2023

Conversation

Copy link

Contributor

tswast commentedNov 17, 2023•
edited
Loading

These can be used to recover the original job metadata whenRowIterator is the result of aQueryJob. This will be especially important after#1722 wherequery_and_wait() won't return an actualQueryJob object.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as abug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Towards internal issue 305260214 and related to#589
🦕

feat: addjob_id,location,project, andquery_id properties …

1f9dd1e

…on `RowIterator`These can be used to recover the original job metadata when `RowIterator` isthe result of a `QueryJob`.

tswast requested review froma team ascode owners

November 17, 2023 20:59

tswast requested a review fromfarhan0102

November 17, 2023 20:59

product-auto-labelbot added size: m

Pull request size is medium.

api: bigqueryIssues related to the googleapis/python-bigquery API. labels

Nov 17, 2023

tswast requested review fromLinchin andchalmerlowe and removed request forfarhan0102

November 17, 2023 21:00

Linchin reviewed

Nov 17, 2023

View reviewed changes

google/cloud/bigquery/job/query.pyShow resolvedHide resolved

Linchin reviewed

Nov 17, 2023

View reviewed changes

google/cloud/bigquery/table.py Outdated

		bqstorage_download=functools.partial(
		_pandas_helpers.download_arrow_bqstorage,
		self._project,
		self._bqstorage_project,

Copy link

Contributor

LinchinNov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks like we now have two possible project values, does this mean aRowIterator can correspond to different projects for storage APIs compared to other APIs?

Linchin reviewed

Nov 17, 2023

View reviewed changes

google/cloud/bigquery/table.py

		@property
		def_bqstorage_project(self)->Optional[str]:
		"""GCP Project ID where BQ Storage API will bill to (if applicable)."""
		client=self.client

Copy link

Contributor

LinchinNov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I wonder what's the difference betweenself.client.project andself._project. When the client isn't of storage type, doesself.client.project still reflect the storage project id?

Copy link

ContributorAuthor

tswastNov 17, 2023•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The difference in other contexts is usually described as "billing project" versus "data project". I've renamed this private property to reflect that. This is easiest to understand in the context of public datasets.

I have permission to query tables in thebigquery-public-data project and even download the data in bulk with the BigQuery Storage Read API, but Idon't have permission to run queries in that project or start a BigQuery Storage Read API session from that project. Instead, I have start the query or bq storage read session in my own project and reference the tables in the other project.

Copy link

Contributor

LinchinNov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Wow I see! soself._project is the project for the data/table, andself.client.project a.k.a. billing project is the user's own project. So I guess this billing project doesn't just apply to storage?

Copy link

ContributorAuthor

tswastNov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That's true, though for queries we're usingjobs.getQueryResults, so the project will always be the billing project since that's where temp results for queries are stored.

Copy link

ContributorAuthor

tswastNov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

tabledata.list is free, but technically we should be overriding thex-goog-user-project header so that the quota is charged to the billing project. (I don't think we're actually doing that, though)

rename bqstorage_project to billing project

18abf49

Linchin reviewed

Nov 17, 2023

View reviewed changes

google/cloud/bigquery/table.py OutdatedShow resolvedHide resolved

Update google/cloud/bigquery/table.py

1a07c4e

Co-authored-by: Lingqing Gan <lingqing.gan@gmail.com>

Linchin approved these changes

Nov 18, 2023

View reviewed changes

Linchin merged commit494f275 intomain

Nov 18, 2023

Linchin deleted the b305260214-rowiterator-job-metadata branch

November 18, 2023 00:44

release-pleasebot mentioned this pull request

Nov 18, 2023

chore(main): release 3.14.0#1709

Merged

chalmerlowe reviewed

Nov 19, 2023

View reviewed changes

tests/unit/test_table.py


		deftest_job_id_missing(self):
		rows=self._make_one()
		self.assertIsNone(rows.job_id)

Copy link

Collaborator

chalmerloweNov 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@tswast

Just curious about testing philosophy.
For this collection of tests that are pretty much the same (i.e. all the tests that .assertIsNone()) is there a benefit in your mind to having each test broken out separately versus providing a parameterized list of inputs and expected outputs?

Copy link

ContributorAuthor

tswastNov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I matched the existing pattern in this test. I do think we could probably combine a lot of these into a single parametrized test.

I worry slightly about the requiredgetattr(property_name) being too complex for new contributors to understand. In general tests should avoid anything even as complex as an if statement.https://testing.googleblog.com/2014/07/testing-on-toilet-dont-put-logic-in.html In this case,getattr is a grey area. I wouldn't necessarily count it as "logic"

kiraksi pushed a commit to kiraksi/python-bigquery that referenced this pull request

Nov 20, 2023

feat: addjob_id,location,project, andquery_id properties …

570e17a

…on `RowIterator` (googleapis#1733)* feat: add `job_id`, `location`, `project`, and `query_id` properties on `RowIterator`These can be used to recover the original job metadata when `RowIterator` isthe result of a `QueryJob`.* rename bqstorage_project to billing project* Update google/cloud/bigquery/table.pyCo-authored-by: Lingqing Gan <lingqing.gan@gmail.com>---------Co-authored-by: Lingqing Gan <lingqing.gan@gmail.com>

This was referencedDec 5, 2023

December 04, 2023kitta65/bq-extension-vscode#257

Closed

December 04, 2023kitta65/prettier-plugin-bq#264

Closed

December 04, 2023kitta65/bq2cst#274

Closed

This was referencedDec 19, 2023

December 18, 2023kitta65/bq-extension-vscode#262

Closed

December 18, 2023kitta65/prettier-plugin-bq#269

Closed