ENH: Allow passing schema and additional_metadata in to_parquet #3631


Open
m-mohr wants to merge 5 commits into geopandas:main from m-mohr:additional-metadata

Conversation

@m-mohr (Contributor) commented Jul 25, 2025 (edited)

Implements #3182

  • Allows setting an exact schema for the file (with metadata) through the schema parameter in to_parquet
    • Allows overriding the geo metadata
    • Values are not encoded by default; encoding must be done by the user
  • Allows adding additional metadata to Parquet file headers via a simple additional_metadata parameter in to_parquet (without redefining the schema)
    • Overrides the schema metadata if a conflict occurs
    • Doesn't allow overriding the geo metadata
    • Encodes values to JSON

So schema can be used if you need full control over "everything", and additional_metadata can be used just to add some additional file metadata in a simple way.
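
The precedence rules listed above can be sketched in plain Python. This is only an illustration of the described behaviour, not the code in this PR: the helper name merge_file_metadata, the use of bytes keys, and raising ValueError when additional_metadata tries to set geo are all assumptions made for the example.

```python
import json

GEO_KEY = b"geo"  # assumption: metadata keys are stored as bytes


def merge_file_metadata(generated_geo, schema_metadata=None, additional_metadata=None):
    """Hypothetical helper that mirrors the rules in the PR description."""
    # Start from the geo metadata that would normally be generated.
    merged = {GEO_KEY: json.dumps(generated_geo).encode("utf-8")}

    # Schema metadata is taken verbatim (the caller encodes it) and may
    # replace the generated geo entry.
    merged.update(schema_metadata or {})

    # additional_metadata wins over schema metadata on conflict, is JSON
    # encoded, and must not touch the geo entry (assumed to raise here).
    for key, value in (additional_metadata or {}).items():
        raw_key = key.encode("utf-8") if isinstance(key, str) else key
        if raw_key == GEO_KEY:
            raise ValueError("'geo' cannot be set through additional_metadata")
        merged[raw_key] = json.dumps(value).encode("utf-8")
    return merged


merged = merge_file_metadata(
    {"version": "1.1.0"},
    schema_metadata={b"title": b"raw title"},
    additional_metadata={"title": "Roads", "count": 3},
)
print(merged[b"title"])  # the JSON-encoded value from additional_metadata
```

With this ordering, a caller who needs to replace the geo entry has to go through the schema metadata, while additional_metadata stays a safe convenience path.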

m-mohr marked this pull request as draft July 25, 2025 21:40
@m-richards (Member) commented:

Hey @m-mohr, thanks for making a start on this. I wanted to check, does https://github.com/geopandas/geopandas/pull/3597/files also cover this use case?
I suppose you have to know about attrs, and know that geopandas preserves it, but it also gives a way of writing arbitrary metadata, and a way to retrieve it.

There's potentially still value in having this too, but interested in your thoughts.

Neither of these solutions deals with the schema updating you listed in your original issue, though; that would probably need a mechanism like this.

@m-mohr (Contributor, Author) commented:

Thanks for the pointer. Indeed, I want to update this PR so that it also covers updating the schema. This was a first attempt just to get going, and I realized during the night that I actually need more than just metadata ;-)

m-richards reacted with thumbs up emoji

m-mohr marked this pull request as ready for review July 28, 2025 14:11
@m-mohr (Contributor, Author) commented Jul 28, 2025 (edited)

I've updated the PR accordingly; see the updated PR description above for more details.
Setting this as ready for review to get some first feedback on whether the concept would be acceptable.
Happy to fine-tune as requested and potentially fix the codecov issue as well (although I'm not exactly sure what to do yet), but I'd like to get first opinions before spending more time on it.

#3597 does not solve the issue because it adds the metadata to PANDA_ATTRS, but I need it separately, i.e. not inside PANDA_ATTRS.

m-mohr changed the title from "ENH: Support additional_metadata in to_parquet #3182" to "ENH: Allow passing schema and additional_metadata in to_parquet #3182" Jul 28, 2025
m-richards changed the title from "ENH: Allow passing schema and additional_metadata in to_parquet #3182" to "ENH: Allow passing schema and additional_metadata in to_parquet" Oct 30, 2025
@m-richards (Member) left a comment

Thanks @m-mohr, this is looking good. I've left a couple of small comments. I'd like to get a second pair of eyes on it too, since I'm not a parquet expert.

One question: does feather similarly support schema specification and metadata, and would it be straightforward to add this there as well? I don't use feather, but generally the two are pretty much in sync.

schema_version=None,
write_covering_bbox=None,
schema=None,
additional_metadata={},

The default value here shouldn't be anything mutable such as a dict; this can lead to trouble because the dict is shared across calls to the function. Instead, set the default value to None, add an if additional_metadata is None check in the body, and reassign to an empty dict there. Same in to_parquet.
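
A minimal sketch of the pitfall and the suggested fix. The function names write_shared and write_fresh are hypothetical stand-ins and are not part of geopandas; only the None-default pattern itself is what the comment asks for.

```python
def write_shared(name, additional_metadata={}):
    # The default dict is created once, at function definition time,
    # so every call without the argument mutates the same object.
    additional_metadata.setdefault("files", []).append(name)
    return additional_metadata


def write_fresh(name, additional_metadata=None):
    # Use None as the sentinel and build a new dict on each call.
    if additional_metadata is None:
        additional_metadata = {}
    additional_metadata.setdefault("files", []).append(name)
    return additional_metadata


write_shared("a.parquet")
print(write_shared("b.parquet"))  # state from the first call leaks in

write_fresh("a.parquet")
print(write_fresh("b.parquet"))  # only the current call's entry
```

The None sentinel also keeps the signature honest: callers can tell from the default that no metadata is added unless they pass some.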

assert metadata == value


@pytest.mark.parametrize("int_type", [pa.int8(), pa.uint16(), pa.int64()])

Suggested change
- @pytest.mark.parametrize("int_type", [pa.int8(), pa.uint16(), pa.int64()])
+ @pytest.mark.parametrize("int_type", [pa.int8(), pa.uint16(), pa.int64(), pa.float64()])

Not sure how arrow will handle this, but I would like to test that either this works or that the error message from the schema makes sense from a geopandas user's perspective.


Reviewers

m-richards approved these changes


2 participants

@m-mohr, @m-richards
