- Notifications
You must be signed in to change notification settings - Fork321
feat: addjob_id,location,project, andquery_id properties onRowIterator#1733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
…on `RowIterator`These can be used to recover the original job metadata when `RowIterator` isthe result of a `QueryJob`.
Uh oh!
There was an error while loading.Please reload this page.
google/cloud/bigquery/table.py Outdated
| bqstorage_download=functools.partial( | ||
| _pandas_helpers.download_arrow_bqstorage, | ||
| self._project, | ||
| self._bqstorage_project, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks like we now have two possible project values, does this mean aRowIterator can correspond to different projects for storage APIs compared to other APIs?
| @property | ||
| def_bqstorage_project(self)->Optional[str]: | ||
| """GCP Project ID where BQ Storage API will bill to (if applicable).""" | ||
| client=self.client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I wonder what's the difference betweenself.client.project andself._project. When the client isn't of storage type, doesself.client.project still reflect the storage project id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The difference in other contexts is usually described as "billing project" versus "data project". I've renamed this private property to reflect that. This is easiest to understand in the context of public datasets.
I have permission to query tables in thebigquery-public-data project and even download the data in bulk with the BigQuery Storage Read API, but Idon't have permission to run queries in that project or start a BigQuery Storage Read API session from that project. Instead, I have start the query or bq storage read session in my own project and reference the tables in the other project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Wow I see! soself._project is the project for the data/table, andself.client.project a.k.a. billing project is the user's own project. So I guess this billing project doesn't just apply to storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That's true, though for queries we're usingjobs.getQueryResults, so the project will always be the billing project since that's where temp results for queries are stored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
tabledata.list is free, but technically we should be overriding thex-goog-user-project header so that the quota is charged to the billing project. (I don't think we're actually doing that, though)
Uh oh!
There was an error while loading.Please reload this page.
Co-authored-by: Lingqing Gan <lingqing.gan@gmail.com>
| deftest_job_id_missing(self): | ||
| rows=self._make_one() | ||
| self.assertIsNone(rows.job_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Just curious about testing philosophy.
For this collection of tests that are pretty much the same (i.e. all the tests that .assertIsNone()) is there a benefit in your mind to having each test broken out separately versus providing a parameterized list of inputs and expected outputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I matched the existing pattern in this test. I do think we could probably combine a lot of these into a single parametrized test.
I worry slightly about the requiredgetattr(property_name) being too complex for new contributors to understand. In general tests should avoid anything even as complex as an if statement.https://testing.googleblog.com/2014/07/testing-on-toilet-dont-put-logic-in.html In this case,getattr is a grey area. I wouldn't necessarily count it as "logic"
…on `RowIterator` (googleapis#1733)* feat: add `job_id`, `location`, `project`, and `query_id` properties on `RowIterator`These can be used to recover the original job metadata when `RowIterator` isthe result of a `QueryJob`.* rename bqstorage_project to billing project* Update google/cloud/bigquery/table.pyCo-authored-by: Lingqing Gan <lingqing.gan@gmail.com>---------Co-authored-by: Lingqing Gan <lingqing.gan@gmail.com>
Uh oh!
There was an error while loading.Please reload this page.
These can be used to recover the original job metadata when
RowIteratoris the result of aQueryJob. This will be especially important after#1722 wherequery_and_wait()won't return an actualQueryJobobject.Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Towards internal issue 305260214 and related to#589
🦕