Supporting the array api in zarr-python

The python array api standard is an effort to standardize the API for various NDArray objects across the python ecosystem. In the v3 roadmap, one of the goals is to "Align the Zarr-Python array API with the array API Standard". I would like to use this discussion to consider how we can achieve this goal.

Array Attributes

Here are the attributes defined in the array API standard: dtype, device, mT, ndim, shape, size, and T. Some of these (…)

Array methods / functions

The array API defines a LOT of functions and methods that transform arrays into new arrays or scalars. Besides indexing, I think implementing these routines in zarr-python would effectively mean building a lazy graph-based computation system, which I don't think belongs in this library. So I would suggest that we support operations that select data (i.e., indexing), but not operations that transform data.

How much of the API standard can we support

Without data-transforming functions and methods, not much, in percentage terms! I couldn't find guidelines for libraries that only support a subset of the standard, but maybe this describes most array APIs used today other than numpy. However, I think this is fine. As long as zarr arrays can be coerced to numpy / cupy / ... arrays as needed, users should be able to compute what they need using the numpy / cupy / ... APIs.

I'm curious to hear what other people think about this approach.
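To make this concrete, here is a minimal sketch of what that split could look like from a user's perspective, assuming zarr-python's existing creation and indexing behaviour; the store path, shape, and chunking below are illustrative, not from the discussion:

```python
import numpy as np
import zarr

# Illustrative only: a small on-disk array (path, shape, chunks are arbitrary).
z = zarr.open("example.zarr", mode="w", shape=(1000, 100), chunks=(100, 100), dtype="f8")
z[:] = 1.0

# Selection (indexing) is the part zarr supports directly: only the chunks
# that intersect the requested region are read and decoded.
block = z[100:200, :50]  # an in-memory numpy array

# Data-transforming operations are delegated to numpy / cupy / ... after
# coercion, rather than being implemented by zarr itself.
result = np.sqrt(block) + 1.0
```

The point is just that zarr does the selection and a full-featured array library does the computation.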
Replies: 1 comment
Today, in practice, we rely on Dask.Array pretty heavily to defer execution against Zarr-backed arrays. Dask already implements a "lazy graph-based computation system", and we should definitely not try to create a new alternative to that here. We could, however, aim to integrate with other similar libraries, such as cubed. There may be room for a more light-weight deferred-execution Array library (similar to Xarray's duck arrays), but I don't think that belongs in Zarr. We should document better how users can wrap their Zarr arrays in Dask / Cubed arrays in order to obtain deferred execution. In Zarr, we should focus on implementing operations that can be pushed down to the storage layer to optimize computational pipelines. This is primarily indexing...
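For reference, the "wrap it in Dask" pattern already looks roughly like this today (the store path is hypothetical, and `da.from_zarr` offers a shortcut for the first two steps):

```python
import dask.array as da
import zarr

# Open the zarr array without reading data, then wrap it in a dask array
# so that subsequent operations only build a task graph.
z = zarr.open("example.zarr", mode="r")
x = da.from_array(z, chunks=z.chunks)

# These operations are deferred; nothing is read from storage yet.
y = (x - x.mean(axis=0)) / x.std(axis=0)

# Only compute() (or a write) triggers chunk reads and execution.
result = y[:100].compute()
```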
...I think this is a very good rule of thumb.
This is an interesting example and hints at one ambiguity behind the idea that we don't support "operations that transform data". The fact is, this is exactly what codecs do. We already have a dtype codec. You could also imagine a generalized arithmetic codec that operates elementwise on each item, kind of like the […]. So one idea might be: if we know how to express an array-API operation as a codec, we could push it into the codec pipeline. This is something we could explore incrementally, one operation at a time.
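To make the "arithmetic codec" idea concrete, here is a hedged sketch of an elementwise filter written against the numcodecs Codec interface. The `AddOffset` name and behaviour are hypothetical, not an existing codec, and error handling is omitted:

```python
import numpy as np
from numcodecs import register_codec
from numcodecs.abc import Codec
from numcodecs.compat import ensure_ndarray, ndarray_copy


class AddOffset(Codec):
    """Hypothetical elementwise filter: decoded data is `stored + offset`,
    i.e. the operation `x + offset` is pushed into the codec pipeline."""

    codec_id = "hypothetical_add_offset"

    def __init__(self, offset, dtype):
        self.offset = offset
        self.dtype = np.dtype(dtype)

    def encode(self, buf):
        # Invert the operation on write so round-tripping is lossless.
        arr = ensure_ndarray(buf).view(self.dtype)
        return (arr - self.offset).astype(self.dtype)

    def decode(self, buf, out=None):
        # Apply the elementwise operation as part of reading the chunk.
        arr = ensure_ndarray(buf).view(self.dtype)
        dec = (arr + self.offset).astype(self.dtype)
        return ndarray_copy(dec, out)

    def get_config(self):
        return {"id": self.codec_id, "offset": self.offset, "dtype": self.dtype.str}


register_codec(AddOffset)
```

An array created with `filters=[AddOffset(offset=1.5, dtype="f8")]` would then apply the operation transparently on read, which is the sense in which an array-API operation could be "pushed down" into the pipeline.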
I think this is a perfectly reasonable idea. My reading of the API is that there is no expectation of cross-library understanding of device, so we are free to define it however we want. Could we use this information somehow? Like, if we know two arrays are on the same device, can we use that for any sort of optimization?
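Purely as a sketch of the "define it however we want" idea: nothing below exists in zarr-python and the names are made up, but a device derived from where decoded chunks end up would at least let two arrays be compared cheaply before deciding whether a transfer is needed.

```python
# Hypothetical sketch only; no part of this is current zarr-python API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    kind: str  # e.g. "cpu" or "gpu", depending on where decoded chunks live


def same_device(a, b) -> bool:
    """True if both arrays report the same (hypothetical) device, in which
    case an operation combining them could skip a host/device transfer."""
    return getattr(a, "device", None) == getattr(b, "device", None)
```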