Group Backend Keyword Arguments #10422

Draft: kmuehlbauer wants to merge 4 commits into pydata:main from kmuehlbauer:backend_kwargs

kmuehlbauer (Contributor)

This is a first attempt and a basis for discussion.

This PR does the following:

  1. Split the `open_dataset` kwargs into four groups.
     Here I followed @shoyer's suggestion to use dataclasses (Group together decoding options into a single argument #4490).
     • coder_opts: options for the CF coders (e.g. mask_and_scale, decode_times)
     • open_opts: options for the backend file opener (e.g. driver, clobber, diskless, format)
     • backend_opts: options for xarray (e.g. chunk, cache, inline_array)
     • store_opts: options for the backend store (e.g. group, lock, autoclose)
  2. Define these classes in `BackendEntrypoint` and override them in the subclasses
     (for now only for the netcdf4/h5netcdf backends).
  3. Implement the logic in `open_dataset`.
  4. Implement the logic in `to_netcdf`.
  5. For backwards compatibility, reinitialize the above options from the given kwargs as needed.
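To make points 1 and 2 concrete, here is a minimal sketch of how option dataclasses could be declared on an entrypoint and overridden in a subclass. `BaseEntrypoint`, `NetCDF4Entrypoint`, `CoderOptions`, `OpenOptions`, `NetCDF4OpenOptions` and `resolve` are hypothetical names for illustration only; this is neither xarray's actual `BackendEntrypoint` API nor the code in this PR, and only two of the four groups are shown.

```python
from dataclasses import dataclass


# Hypothetical option groups, modelled on the groups listed above.
@dataclass(frozen=True)
class CoderOptions:
    mask_and_scale: bool = True
    decode_times: bool = True


@dataclass(frozen=True)
class OpenOptions:
    # Generic opener options: empty in this sketch.
    pass


@dataclass(frozen=True)
class NetCDF4OpenOptions(OpenOptions):
    # netCDF4-specific opener options (hypothetical subset).
    format: str = "NETCDF4"
    diskless: bool = False
    clobber: bool = True


class BaseEntrypoint:
    # Hypothetical stand-in for BackendEntrypoint: declares which
    # option classes this backend accepts.
    coder_options_class: type = CoderOptions
    open_options_class: type = OpenOptions

    def resolve(self, open_opts=None, coder_opts=None):
        """Fill in backend defaults and reject options meant for another backend."""
        if open_opts is None:
            open_opts = self.open_options_class()
        if coder_opts is None:
            coder_opts = self.coder_options_class()
        if not isinstance(open_opts, self.open_options_class):
            raise TypeError(
                f"expected {self.open_options_class.__name__}, "
                f"got {type(open_opts).__name__}"
            )
        return open_opts, coder_opts


class NetCDF4Entrypoint(BaseEntrypoint):
    # The subclass only overrides the option class it specializes.
    open_options_class = NetCDF4OpenOptions
```

Because the option class lives on the entrypoint, a call that omits a group gets that backend's defaults, and passing options built for a different backend fails early instead of being forwarded blindly.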

Example usage:

```python
import xarray as xr

# simple call, use backend default options
ds = xr.open_dataset("test.nc", engine="netcdf4")

# define once, use many; these should be imported from the backend
open_opts = NetCDF4OpenOptions(auto_complex=True)
coder_opts = NetCDF4CoderOptions(decode_times=False, mask_and_scale=False)
backend_opts = XarrayBackendOptions(chunk={"time": 10})
store_opts = NetCDF4StoreOptions(group="test")

# engine could also be the `BackendEntrypoint`
ds = xr.open_dataset(
    "test.nc",
    engine="netcdf4",
    open_opts=open_opts,
    coder_opts=coder_opts,
    backend_opts=backend_opts,
    store_opts=store_opts,
)
```
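For point 5 (backwards compatibility), the flat kwargs of the current `open_dataset` signature would have to be routed into the groups. A minimal sketch, assuming each group is a dataclass whose field names match the legacy kwargs; `split_kwargs`, `CoderOptions`, `BackendOptions` and `StoreOptions` are hypothetical names, not code from this PR.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical option groups; field names mirror a few legacy kwargs.
@dataclass(frozen=True)
class CoderOptions:
    mask_and_scale: bool = True
    decode_times: bool = True


@dataclass(frozen=True)
class BackendOptions:
    cache: Optional[bool] = None
    inline_array: bool = False


@dataclass(frozen=True)
class StoreOptions:
    group: Optional[str] = None
    autoclose: bool = False


def split_kwargs(kwargs, *groups):
    """Route flat keyword arguments into one instance per option class.

    Each keyword goes to the first group that declares a field of that
    name. Keywords that no group claims raise TypeError instead of being
    forwarded unchecked.
    """
    remaining = dict(kwargs)
    instances = []
    for cls in groups:
        names = {f.name for f in fields(cls)}
        picked = {k: remaining.pop(k) for k in list(remaining) if k in names}
        instances.append(cls(**picked))
    if remaining:
        raise TypeError(f"unexpected keyword arguments: {sorted(remaining)}")
    return instances
```

Rejecting unclaimed keywords is one way to address the "unconstrained forwarding" problem named in the linked issue.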

CONS:

  • Most users might not need these added options at all, but can fall back to the current behaviour
  • Users might complain about the additional complexity of setting up the dataclasses
  • tbc.

PROS:

  • strict separation of kwargs/options
  • easy forwarding
  • per backend kwargs/options
  • easy adding kwargs/options
  • tbc.

What this PR still needs to do:

  • implement everything above for the other built-in backends (zarr, scipy, pydap, etc.)

I have follow-up ideas:

  • Implement `save_dataset` in `BackendEntrypoint` to write to the engine's native format, like `to_netcdf` would be for scipy/netcdf4/h5netcdf and `to_zarr` would be for zarr. With that we could do the writing with a unified API, something like:

    ```python
    ds = xr.open_dataset("test.nc", engine="netcdf4")
    # Dataset API
    ds.save_dataset("test.zarr", engine="zarr")
    ds.save_dataset("test2.nc", engine="netcdf4")
    # general API
    xr.save_dataset(ds, "test2.nc", engine="netcdf4")
    ds.save_dataset("test.grib", engine="grib")  # my imagination
    ds.save_dataset("test.hdf5", engine="hdf5")  # my imagination
    ```
  • Further disentangle the current built-in backends from xarray so that each could become its own module
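One way the proposed `save_dataset` could dispatch on `engine` is a registry of writer functions. This is a sketch under assumptions: `register_writer`, `save_dataset` and `_WRITERS` are hypothetical, and the writers return placeholder tuples instead of writing files so that the dispatch logic can be shown without xarray installed.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping an engine name to a writer function.
_WRITERS: Dict[str, Callable[..., Any]] = {}


def register_writer(engine: str):
    """Decorator that registers a writer under an engine name."""
    def decorator(func):
        _WRITERS[engine] = func
        return func
    return decorator


def save_dataset(ds, path: str, engine: str, **kwargs):
    """Dispatch to the writer registered for `engine`."""
    try:
        writer = _WRITERS[engine]
    except KeyError:
        raise ValueError(f"no writer registered for engine {engine!r}") from None
    return writer(ds, path, **kwargs)


@register_writer("netcdf4")
def _write_netcdf4(ds, path, **kwargs):
    # A real writer could call ds.to_netcdf(path, engine="netcdf4", **kwargs).
    return ("netcdf4", path)


@register_writer("zarr")
def _write_zarr(ds, path, **kwargs):
    # A real writer could call ds.to_zarr(path, **kwargs).
    return ("zarr", path)
```

With a registry, third-party backends (e.g. a grib or hdf5 writer) could register themselves without xarray needing to know about them.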

I'm sure I have not taken into account all the possible pitfalls/problems which might arise here. I'd appreciate any comments and suggestions.

kmuehlbauer (Contributor, Author)

Please have a look at #10429, where I've split out the grouping of the CF-coder-related kwargs.

keewis (Collaborator)

To summarize what I argued for after the end of the meeting today, I think we should slowly transition to an API where we pass the entire decoding chain as a sequence of functions / callable objects into `xr.open_dataset`, which would be executed in the order they were passed. Additionally, backends should have the option to disable certain builtin coders (this is especially important when encoding).

This would require a lot of thought to figure out a good API, and even more to find a good way to transition towards it. I think this would make extending the coders a lot easier, and possibly pave the way towards dataset coders (or rather, multi-variable coders).
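A minimal sketch of the "decoding chain" idea, assuming a coder is any callable that takes and returns a mapping of variable name to values (real xarray coders operate on `Variable` objects and are not structured this way). `run_chain`, `mask` and `scale` are hypothetical; a backend disables built-in coders by name through `disabled`.

```python
from typing import Callable, Iterable, Sequence

# In this sketch a coder maps {name: list of values} -> {name: list of values}.
Coder = Callable[[dict], dict]


def mask(data: dict) -> dict:
    # Hypothetical stand-in for a _FillValue mask: -1 becomes None.
    return {k: [None if x == -1 else x for x in v] for k, v in data.items()}


def scale(data: dict) -> dict:
    # Hypothetical stand-in for a scale_factor decoder; masked values pass through.
    return {k: [None if x is None else x * 0.5 for x in v] for k, v in data.items()}


def run_chain(data: dict, chain: Sequence[Coder], disabled: Iterable[str] = ()) -> dict:
    """Apply coders in the order given, skipping any disabled by name."""
    skip = set(disabled)
    for coder in chain:
        if coder.__name__ in skip:
            continue
        data = coder(data)
    return data
```

Running `[mask, scale]` and `[scale, mask]` on `{"t": [-1, 4]}` gives different results (the fill value is only recognized before scaling), which is why the chain has to be executed in the order it was passed.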

I think it might be possible to change the dataclass added in this PR to act as a bridge towards the idea in #4490 (comment) (which should probably be extended to allow other libraries / backends to modify that chain).
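As one possible bridge, the grouped coder options could keep their boolean flags for backwards compatibility and translate them into such a chain. This is a sketch under that assumption; `CoderOptions.to_chain`, `mask_and_scale_coder`, `time_coder` and the `extra` field are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


def mask_and_scale_coder(var):
    # Hypothetical placeholder for the CF mask/scale decoder.
    return var


def time_coder(var):
    # Hypothetical placeholder for the CF time decoder.
    return var


@dataclass(frozen=True)
class CoderOptions:
    mask_and_scale: bool = True
    decode_times: bool = True
    # Extra coders supplied by users or backends, appended to the chain.
    extra: tuple = ()

    def to_chain(self) -> List[Callable]:
        """Translate the boolean flags into an ordered list of coders."""
        chain: List[Callable] = []
        if self.mask_and_scale:
            chain.append(mask_and_scale_coder)
        if self.decode_times:
            chain.append(time_coder)
        chain.extend(self.extra)
        return chain
```

Existing calls like `decode_times=False` keep working, while `extra` offers a hook for other libraries / backends to extend the chain.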

Reviewers: No reviews
Assignees: No one assigned
Projects: None yet
Milestone: No milestone

Development
Successfully merging this pull request may close these issues:
  • Unconstrained forwarding of backend keyword arguments

3 participants: @kmuehlbauer @keewis @dcherian
