ENH: Allow passing schema and additional_metadata in to_parquet #3631
m-richards commented Jul 28, 2025
Hey @m-mohr, thanks for making a start on this. I wanted to check: does https://github.com/geopandas/geopandas/pull/3597/files also cover this use case? There's potentially still value in having this too, but I'm interested in your thoughts. Neither of these solutions deals with the schema updating you listed in your original issue, though; that would probably need a mechanism like this.
m-mohr commented Jul 28, 2025
Thanks for the pointer. Indeed, I want to update this to actually cover updating the schema. This was a first attempt just to get going, and I realized during the night that I actually need more than just metadata ;-)
m-mohr commented Jul 28, 2025 (edited)
I've updated the PR accordingly; see the updated PR description above for more details. #3597 does not solve the issue because it adds the metadata to PANDAS_ATTRS, but I need it separately, i.e. not inside PANDAS_ATTRS.
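The distinction between "inside PANDAS_ATTRS" and "separately" is about where a generic Parquet reader finds the values. The sketch below is illustrative only: the `extra` dict and the `lookup` helper are hypothetical, and the nested layout is our rough understanding of how pandas serializes `DataFrame.attrs` (one JSON string under a single key), not code from #3597 or from this PR.

```python
import json

# Hypothetical user-supplied metadata (not taken from the PR).
extra = {"collection": "parcels", "license": "CC-BY-4.0"}

# Layout 1 (assumed pandas-style): everything nested in one JSON blob
# stored under a single PANDAS_ATTRS key.
nested = {b"PANDAS_ATTRS": json.dumps(extra).encode()}

# Layout 2: each entry becomes its own top-level key/value pair in the
# file metadata, which is what additional_metadata aims to produce.
top_level = {k.encode(): v.encode() for k, v in extra.items()}


def lookup(file_metadata, key):
    """Hypothetical helper: the value a reader sees for a top-level key, or None."""
    raw = file_metadata.get(key.encode())
    return raw.decode() if raw is not None else None


# A reader looking for "license" only finds it in the top-level layout;
# with the nested layout it must know to decode PANDAS_ATTRS first.
print(lookup(nested, "license"), lookup(top_level, "license"))
```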
m-richards left a comment
Thanks @m-mohr, this is looking good. I've left a couple of small comments. I'd like to get a second pair of eyes on it too, since I'm not a Parquet expert.
One question, does feather similarly support schema specification and metadata, would it be straightforward to add to that as well? I don't use feather, but generally the two are pretty in sync.
    schema_version=None,
    write_covering_bbox=None,
    schema=None,
    additional_metadata={},
The default value here shouldn't be mutable (e.g. a dict): the same dict object is shared across calls to the function, which can lead to trouble. Instead, set the default value to `None`, and do an `if additional_metadata is None` check in the body that reassigns it to an empty dict. Same in `to_parquet`.
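To illustrate the review point: a default value is evaluated once, when the function is defined, so a mutable default is shared by every call that relies on it. Both functions below are hypothetical stand-ins, not geopandas code; the second shows the suggested `None` sentinel pattern.

```python
def add_metadata_shared(key, value, metadata={}):
    # Anti-pattern: this dict is created once, at definition time, so every
    # call that uses the default mutates the same object.
    metadata[key] = value
    return metadata


def add_metadata(key, value, metadata=None):
    # Suggested pattern: None as a sentinel, with a fresh dict per call.
    if metadata is None:
        metadata = {}
    metadata[key] = value
    return metadata


first = add_metadata_shared("a", 1)
second = add_metadata_shared("b", 2)
# first and second are the same object: {"a": 1, "b": 2}

fresh_first = add_metadata("a", 1)
fresh_second = add_metadata("b", 2)
# independent dicts: {"a": 1} and {"b": 2}
```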
    assert metadata == value

    @pytest.mark.parametrize("int_type", [pa.int8(), pa.uint16(), pa.int64()])
Suggested change:
- @pytest.mark.parametrize("int_type", [pa.int8(), pa.uint16(), pa.int64()])
+ @pytest.mark.parametrize("int_type", [pa.int8(), pa.uint16(), pa.int64(), pa.float64()])
Not sure how Arrow will handle this, but I would like to test that either this works, or that the error message from the schema makes sense from a geopandas user's perspective.
Implements #3182

- Allow passing `schema` in `to_parquet`
- Add an `additional_metadata` parameter in `to_parquet` (to add metadata without redefining the schema)

So `schema` can be used if you need full control over "everything", and `additional_metadata` can be used just to add some additional file metadata in a simple way.