Zarr-Python: roadmap after 3.1 #3250

jhamman started this conversation in General
Jul 15, 2025 · 4 comments · 4 replies

I gave this presentation at SciPy last week on the progress Zarr-Python has made over the last year. tl;dr: we've come a long way!

I also shared a potential feature list that I think could form the beginning of the roadmap beyond version 3.1 (which landed today!).

Targets for Zarr 3.2 and beyond

  • Performance tuning (sharding, codec-pipeline, async+multi-threading)
  • Additional array-types (e.g. sparse) and data-types (e.g. ml-dtypes)
  • New Extensions
    • Variable length chunk grids
    • Sparse arrays
  • GPU support
    • On-GPU (de)compression
    • Other hardware (Apple, MLX)
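
Of the extensions listed, variable-length chunk grids change the most basic lookup. With a regular grid, finding the chunk that holds an index is one integer division; when each chunk along an axis can have its own length, it becomes a binary search over cumulative chunk edges. A minimal 1-D sketch follows; the per-axis list of chunk lengths is an assumed representation, not the format of any Zarr extension or a Zarr-Python API:

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_edges(chunk_lengths: list[int]) -> list[int]:
    """Cumulative chunk boundaries along one axis, e.g. [3, 5, 2] -> [0, 3, 8, 10]."""
    return [0, *accumulate(chunk_lengths)]


def locate(index: int, chunk_lengths: list[int]) -> tuple[int, int]:
    """Return (chunk number, offset within that chunk) for a 1-D index
    on a variable-length grid, via binary search over the chunk edges."""
    edges = chunk_edges(chunk_lengths)
    if not 0 <= index < edges[-1]:
        raise IndexError(f"index {index} out of bounds for length {edges[-1]}")
    chunk = bisect_right(edges, index) - 1
    return chunk, index - edges[chunk]


def locate_regular(index: int, chunk_size: int) -> tuple[int, int]:
    """On a regular grid the same lookup is a single divmod."""
    return divmod(index, chunk_size)


print(locate(4, [3, 5, 2]))  # (1, 1): index 4 is the second element of chunk 1
print(locate_regular(4, 3))  # (1, 1)
```

For an N-dimensional rectilinear grid the same lookup runs independently along each axis.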

I'm curious to get input from others on what else we should work on next.

A nice outcome of this discussion would be an update to the Zarr-Python roadmap, which is now nicely out of date: https://zarr.readthedocs.io/en/stable/developers/roadmap.html

cc @zarr-developers/python-core-devs, @zarr-developers/python-emeritus

Adding read/write permissions to arrays and groups would be a good addition; I have this as a work in progress.

  • a lazy array indexing API.
    I actually think this will unlock some performance improvements in our array IO.
  • remove StorePath, give all of its methods to the Store classes
  • Do something other than return None for missing keys in the store APIs (e.g., use a Result type, assuming there are no performance penalties)
  • Define an awaitable Future object, and have all of our user-facing async APIs return instances of this Future object that wrap the awaitable they currently return. Future objects would also have a sync or result method that just calls sync on the underlying awaitable.

4 replies
@normanrz

  • a lazy array indexing API.
    I actually think this will unlock some performance improvements in our array IO.

Is that the same as array "views"? I would be a fan of that

  • remove StorePath, give all of its methods to the Store classes

What is the rationale for that? I think it is quite useful to have a pointer to a file (or even byte range) in a store.

@d-v-b

Is that the same as array "views"? I would be a fan of that

Yeah, we could think of it like "views", but I think the more basic analogy is with the semantics of slicing generic collections. When you slice into a tuple, you get another tuple, not a numpy array. Zarr should follow the same principle. In concrete terms this would require modelling a zarr array as supported by a collection of (stored object, index) tuples. Indexing the zarr array would create a new array supported by a strict subset of the original collection of (stored object, index) tuples. This would also allow indexing to be reversible, i.e. we could concatenate zarr arrays, or build zarr arrays from stored objects on different storage backends.
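
A toy version of that model, under loud simplifying assumptions: the array is 1-D, each backing entry is a hypothetical Segment(key, start, stop) standing in for a (stored object, index) pair, and a plain dict stands in for storage. Slicing rewrites the segment tuple without reading anything, and concatenation just joins tuples. LazyArray and Segment are illustrative names, not Zarr-Python classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Segment:
    """Elements [start, stop) of the stored object named `key`."""

    key: str
    start: int
    stop: int

    def __len__(self) -> int:
        return self.stop - self.start


@dataclass(frozen=True)
class LazyArray:
    """A 1-D array backed by a tuple of (stored object, index) segments."""

    segments: tuple[Segment, ...]

    def __len__(self) -> int:
        return sum(len(seg) for seg in self.segments)

    def __getitem__(self, sl: slice) -> LazyArray:
        # Slicing returns another LazyArray (like slicing a tuple returns a
        # tuple); no data is read. Only step-1 slices are handled here.
        start, stop, step = sl.indices(len(self))
        if step != 1:
            raise NotImplementedError("only contiguous slices in this sketch")
        kept: list[Segment] = []
        offset = 0  # array position where the current segment begins
        for seg in self.segments:
            lo, hi = max(start, offset), min(stop, offset + len(seg))
            if lo < hi:  # overlap: keep the intersecting part of this segment
                kept.append(Segment(seg.key, seg.start + lo - offset, seg.start + hi - offset))
            offset += len(seg)
        return LazyArray(tuple(kept))

    def concat(self, other: LazyArray) -> LazyArray:
        # Concatenation just joins segment tuples; segments may name objects
        # in different stores, since only `fetch` resolves keys.
        return LazyArray(self.segments + other.segments)

    def read(self, fetch: Callable[[str], Sequence[int]]) -> list[int]:
        # I/O happens only here: fetch each stored object, take its range.
        out: list[int] = []
        for seg in self.segments:
            out.extend(fetch(seg.key)[seg.start:seg.stop])
        return out


stored = {"a": [0, 1, 2, 3], "b": [10, 11, 12]}
arr = LazyArray((Segment("a", 0, 4), Segment("b", 0, 3)))
view = arr[2:5]
print(view.segments)  # (Segment(key='a', start=2, stop=4), Segment(key='b', start=0, stop=1))
print(view.read(stored.__getitem__))  # [2, 3, 10]
```

Nested slicing composes: arr[2:5][1:3] resolves to the same segments as arr[3:5]. A real implementation would need N-dimensional regions and chunk-aligned segments.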

What is the rationale for that? I think it is quite useful to have a pointer to a file (or even byte range) in a store.

If it's useful to have this, then it should be part of the Store API. StorePath is literally just a store, a string, and a set of convenience methods that use the store and the string. We can get the exact same functionality by putting all of this logic on the store classes themselves, with the benefit of removing a lot of unnecessary code.
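
To make the trade-off concrete, here is a toy sketch: a StorePath-style wrapper that holds only a store and a path string and delegates every call, next to a store whose methods take the key (and an optional byte range, per the point about pointing at a byte range) directly. MemoryStore and PathWrapper are simplified, synchronous stand-ins, not Zarr-Python's StorePath or Store classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy store: every method takes the full key directly."""

    data: dict[str, bytes] = field(default_factory=dict)

    def set(self, key: str, value: bytes) -> None:
        self.data[key] = value

    def get(self, key: str, byte_range: tuple[int, int] | None = None) -> bytes | None:
        value = self.data.get(key)
        if value is None or byte_range is None:
            return value
        start, stop = byte_range  # half-open [start, stop) byte range
        return value[start:stop]

    def exists(self, key: str) -> bool:
        return key in self.data


@dataclass(frozen=True)
class PathWrapper:
    """StorePath-style wrapper: a store, a path string, and delegating methods."""

    store: MemoryStore
    path: str

    def __truediv__(self, other: str) -> PathWrapper:
        # Join path segments, avoiding a leading "/" at the root.
        return PathWrapper(self.store, f"{self.path}/{other}" if self.path else other)

    def get(self, byte_range: tuple[int, int] | None = None) -> bytes | None:
        return self.store.get(self.path, byte_range)

    def exists(self) -> bool:
        return self.store.exists(self.path)


store = MemoryStore()
store.set("group/array/zarr.json", b'{"zarr_format": 3}')
p = PathWrapper(store, "group") / "array" / "zarr.json"
# Both spellings read the same bytes:
print(p.get((0, 1)), store.get("group/array/zarr.json", (0, 1)))  # b'{' b'{'
```

Each wrapper method is a one-line delegation, which is the heart of the argument for moving the logic onto the store; what the wrapper still provides is a single value that bundles "which store" with "which key".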

@normanrz

I think that would be great. It would need careful API design to be usable and not too confusing. For my use cases, an iterator that returns readable/writable chunk or shard views would suffice.
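
For a regular chunk grid, such an iterator can be sketched by walking the grid with itertools.product and yielding, for each chunk, the tuple of slices it covers, clipped at the array edge. iter_chunk_regions is a hypothetical helper; a real chunk or shard view would wrap each region with read and write methods rather than returning bare slices.

```python
from collections.abc import Iterator
from itertools import product


def iter_chunk_regions(
    shape: tuple[int, ...], chunks: tuple[int, ...]
) -> Iterator[tuple[tuple[int, ...], tuple[slice, ...]]]:
    """Yield (chunk coordinates, region) for every chunk of a regular grid.

    `region` is a tuple of slices, clipped so edge chunks stay in bounds.
    """
    grid = [-(-s // c) for s, c in zip(shape, chunks)]  # ceiling division: chunks per axis
    for coords in product(*(range(n) for n in grid)):
        region = tuple(
            slice(i * c, min((i + 1) * c, s)) for i, c, s in zip(coords, chunks, shape)
        )
        yield coords, region


# A 5 x 4 array with 2 x 4 chunks has three chunks; the last one is short.
for coords, region in iter_chunk_regions((5, 4), (2, 4)):
    print(coords, region)
```

Each region is an ordinary tuple of slices, so it can index a NumPy array (or a Zarr array) directly to read or write exactly one chunk.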

@d-v-b

Yes, my plan here is to start with low-level stuff like https://github.com/d-v-b/zarr-python/blob/0b9916443d555d9e762f5501314383dc828c26bf/src/zarr/core/array.py#L5253-L5286 and start working that into our indexing routines.

Define an awaitable Future object

This has been done before in dask-distributed (inspired by concurrent.futures). fsspec discussed following that model but decided not to, although of course we have sync() in common. I wonder if there is scope for a spinoff project (e.g., something like Rust's https://docs.rs/futures/latest/futures/) for the general public good.
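
For concreteness, a minimal sketch of the proposed object: a wrapper around the awaitable an async API would otherwise return, which async code can await directly and sync code can resolve with result(), which just calls sync on the underlying awaitable. The background event loop and the sync function here are local stand-ins modeled loosely on the sync() helper mentioned above; their signatures, and the name ZarrFuture, are assumptions, not Zarr-Python's or fsspec's API.

```python
from __future__ import annotations

import asyncio
import threading
from typing import Any, Awaitable, Generator, Generic, TypeVar

T = TypeVar("T")

# Hypothetical background I/O loop, standing in for the loop a library's
# sync() helper would manage.
_LOOP = asyncio.new_event_loop()
threading.Thread(target=_LOOP.run_forever, daemon=True).start()


def sync(awaitable: Awaitable[T], timeout: float | None = None) -> T:
    """Run `awaitable` on the background loop and block until it finishes."""
    try:
        running = asyncio.get_running_loop()
    except RuntimeError:  # no loop running in this thread
        running = None
    if running is _LOOP:
        raise RuntimeError("blocking on the I/O loop from inside it would deadlock")

    async def runner() -> T:
        return await awaitable

    return asyncio.run_coroutine_threadsafe(runner(), _LOOP).result(timeout)


class ZarrFuture(Generic[T]):
    """Wraps the awaitable an async API would otherwise return.

    Async callers `await` it; sync callers call `.result()`. The underlying
    awaitable (e.g. a coroutine) can be consumed only once, by either path.
    """

    def __init__(self, awaitable: Awaitable[T]) -> None:
        self._awaitable = awaitable

    def __await__(self) -> Generator[Any, None, T]:
        return self._awaitable.__await__()

    def result(self, timeout: float | None = None) -> T:
        return sync(self._awaitable, timeout)


# Usage with a stand-in for an async store read:
async def read_chunk(key: str) -> bytes:
    await asyncio.sleep(0.01)
    return key.encode()


def get_chunk(key: str) -> ZarrFuture[bytes]:
    return ZarrFuture(read_chunk(key))


async def main() -> bytes:
    return await get_chunk("c/1")  # awaited on the caller's own loop


print(get_chunk("c/0").result())  # b'c/0'
print(asyncio.run(main()))  # b'c/1'
```

In this sketch nothing runs until the future is awaited or result() is called, which matches the "wrap the awaitable" wording; dask-distributed futures, by contrast, represent work that has already been submitted.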

remove StorePath, give all of its methods to the Store classes

This is not really user-facing, so whatever makes the most sense internally. If you can remove code and complexity, it's probably worth it.

Category: General
Participants (5): @jhamman, @normanrz, @d-v-b, @martindurant, @dstansby
