fix!: use nullable Int64 and boolean dtypes in to_dataframe #786

@tswast (Contributor) commented Jul 20, 2021 (edited)

To override this behavior, specify the types for the desired columns with the
dtype argument.

BREAKING CHANGE: uses Int64 type by default to avoid loss-of-precision in results with large integer values
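
To illustrate the precision problem this change targets, here is a minimal pandas-only sketch. It does not call BigQuery; the values and variable names are made up for illustration. It assumes only that a nullable integer column holding a NULL would otherwise end up as float64, which cannot represent every 64-bit integer exactly.

```python
import pandas as pd

# 2**53 + 1 is the smallest positive integer that float64 cannot represent.
big = 2**53 + 1  # 9007199254740993

# An integer column with a NULL stored as float64: the value is rounded
# to the nearest representable float, 9007199254740992.
as_float = pd.Series([big, None], dtype="float64")

# The same data with the nullable Int64 extension dtype: the integer is
# kept exactly and the missing value becomes pd.NA.
as_int64 = pd.Series([big, None], dtype="Int64")

# The nullable "boolean" dtype plays the same role for BOOLEAN columns;
# without it, a column mixing True/False and None falls back to object.
as_object = pd.Series([True, None])
as_bool = pd.Series([True, None], dtype="boolean")

print(int(as_float.iloc[0]) == big)   # False: precision was lost
print(int(as_int64.iloc[0]) == big)   # True: value preserved
print(as_object.dtype, as_bool.dtype)
```

Callers who depend on the previous dtypes can request them per column through the dtype argument mentioned in the description above.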

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes https://issuetracker.google.com/144712110 🦕
Fixes #793

…frame` To override this behavior, specify the types for the desired columns with the `dtype` argument.
@tswast requested a review from a team (July 20, 2021 21:11)
@tswast requested a review from a team as a code owner (July 20, 2021 21:11)
@tswast requested review from stephaniewang526 and removed request for a team (July 20, 2021 21:11)
@product-auto-label bot added the "api: bigquery" label (Issues related to the googleapis/python-bigquery API) (Jul 20, 2021)
@google-cla bot added the "cla: yes" label (This human has signed the Contributor License Agreement) (Jul 20, 2021)
@tswast marked this pull request as draft (July 20, 2021 21:11)
@tswast removed request for a team and stephaniewang526 (July 20, 2021 21:12)
@tswast (Contributor, Author) commented:

I'll take a closer look at #776 before finishing this one, as it might mean fewer code paths to cover. I think the BQ Storage API will always be used for to_dataframe after that PR.

@tswast (Contributor, Author) commented:

I did a little bit of experimentation to see what the intermediate pyarrow.Table types are for both REST and BQ Storage API on all scalar types. They do align, so that's good. Also, floating points appear to be handled correctly, even with null values in the table.

It appears https://issuetracker.google.com/144712110 was fixed for FLOAT columns in #314 as of google-cloud-bigquery >= 2.2.0. (That was technically a breaking change [oops].)

I might still keep this open so that we can have some explicit tests for different data types. Also, we're relying on PyArrow -> Pandas to pick the right data types, so maybe there's some dtype defaults we can help with still.

@tswast changed the title from "feat!: use nullable types like float and Int64 by default in to_dataframe" to "feat!: use nullable Int64 dtype by default in to_dataframe" (Jul 21, 2021)
@tswast changed the title from "feat!: use nullable Int64 dtype by default in to_dataframe" to "fix!: use nullable Int64 dtype by default in to_dataframe" (Jul 21, 2021)
@plamut added the "semver: major" label (Hint for users that this is an API breaking change) (Jul 27, 2021)
@tswast changed the base branch from master to v3.x.x (July 27, 2021 17:07)
@tswast changed the base branch from v3.x.x to v3 (July 27, 2021 17:27)
@tswast marked this pull request as ready for review (August 9, 2021 20:05)
@tswast changed the title from "fix!: use nullable Int64 dtype by default in to_dataframe" to "fix!: use nullable Int64 and boolean dtype by default in to_dataframe" (Aug 9, 2021)
@tswast changed the title from "fix!: use nullable Int64 and boolean dtype by default in to_dataframe" to "fix!: use nullable Int64 and boolean dtypes in to_dataframe" (Aug 9, 2021)
@tswast (Contributor, Author) commented:

Re: system test failure:

_____________ TestBigQuery.test_load_avro_from_uri_then_dump_table _____________
...
E   google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.cloud.bigquery.job.load.LoadJob object at 0x7ff6eae07f40>>), last exception:

Didn't we increase the default deadline to 10 minutes? Maybe v3 branch needs a sync?

@tswast requested a review from plamut (August 11, 2021 16:45)

@plamut (Contributor) left a comment:

Two nits, but not essential, looks good.

pip install --upgrade pandas
- Alternatively, you can install the BigQuery python client library with
+ Alternatively, you can install the BigQuery Python client library with
@plamut (Contributor):

(nit)
Since we're already at this, there's at least one other occurrence of "python" not capitalized (line 69), which can also be fixed.

loss-of-precision.
Returns:
Dict[str, str]: mapping from column names to dtypes
@plamut (Contributor):

(nit) Can be expressed as the annotation of the function return type.
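
As a rough sketch of that suggestion, the return type can move from the docstring's "Returns:" section into the signature. The helper name `default_dtypes`, the `(name, type)` schema tuples, and the mapping table below are hypothetical and are not the code under review; only the `Dict[str, str]` return type and the two dtype names come from this PR.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical lookup table for illustration. It covers only the two
# nullable pandas dtypes named in the PR title.
_NULLABLE_DTYPES = {
    "INTEGER": "Int64",
    "INT64": "Int64",
    "BOOLEAN": "boolean",
    "BOOL": "boolean",
}


def default_dtypes(schema: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map column names to default pandas dtype names.

    Columns whose type has no nullable default are left out, so the
    caller can fall back to whatever dtype the conversion picks.
    """
    return {
        name: _NULLABLE_DTYPES[field_type.upper()]
        for name, field_type in schema
        if field_type.upper() in _NULLABLE_DTYPES
    }


# Example schema with made-up column names.
example = default_dtypes([("id", "INTEGER"), ("ok", "bool"), ("name", "STRING")])
print(example)
```

With the annotation in place, the docstring no longer needs to repeat the type, and type checkers can verify callers against it.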

    ("max_results",), ((None,), (10,),)  # Use BQ Storage API.  # Use REST API.
)
def test_list_rows_nullable_scalars_dtypes(bigquery_client, scalars_table, max_results):
    df = bigquery_client.list_rows(
@tswast (Contributor, Author):

Note to self: I'll need to exclude the INTERVAL column next time we sync with master

@tswast added the "automerge" label (Merge the pull request once unit tests and other checks pass) (Aug 16, 2021)
@tswast mentioned this pull request (Aug 16, 2021)
@gcf-merge-on-green bot merged commit dcd78c7 into googleapis:v3 (Aug 16, 2021)
@gcf-merge-on-green bot removed the "automerge" label (Aug 16, 2021)
@tswast deleted the b144712110-nullable-pandas-types branch (August 16, 2021 15:44)
tswast added a commit that referenced this pull request (Mar 29, 2022):

deps!: BigQuery Storage and pyarrow are required dependencies (#776)
fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` (#786)
feat!: destination tables are no-longer removed by `create_job` (#891)
feat!: In `to_dataframe`, use `dbdate` and `dbtime` dtypes from db-dtypes package for BigQuery DATE and TIME columns (#972)
fix!: automatically convert out-of-bounds dates in `to_dataframe`, remove `date_as_object` argument (#972)
feat!: mark the package as type-checked (#1058)
feat!: default to DATETIME type when loading timezone-naive datetimes from Pandas (#1061)
feat: add `api_method` parameter to `Client.query` to select `INSERT` or `QUERY` API (#967)
fix: improve type annotations for mypy validation (#1081)
feat: use `StandardSqlField` class for `Model.feature_columns` and `Model.label_columns` (#1117)
docs: Add migration guide from version 2.x to 3.x (#1027)
Release-As: 3.0.0

waltaskew pushed a commit with the same message to waltaskew/python-bigquery that referenced this pull request (Jul 20, 2022)
abdelmegahedgoogle pushed a commit with the same message to abdelmegahedgoogle/python-bigquery that referenced this pull request (Apr 17, 2023)
Reviewers: plamut approved these changes (1 more reviewer)
Assignees: No one assigned
Labels: api: bigquery, cla: yes, semver: major
Projects: None yet
Milestone: No milestone
Development: Successfully merging this pull request may close this issue: "use pandas Int64 type by default to avoid precision loss"
Participants (2): @tswast, @plamut