RFC: Zarr-Python 3.0 Design Doc #1569
-
As part of the Zarr-Python Refactoring working group (#1480), I have developed a draft proposal for a major block of work in the Zarr-Python project, leading to version 3.0 and including support for the V3 spec. This post is both a Request for Comment and a forum for discussion.

Goals (read the doc for details)
Thanks to @d-v-b, @normanrz, @rabernat, @dcherian, @monodeldiablo, @olimcc, @martindurant, and @JackKelly for their input so far. cc @zarr-developers/python-core-devs.

Note: The next Zarr-Python Refactor working group is slated to meet at 9a PT on November 22, 2023. We'll be discussing this topic in detail then; please feel free to join (#1480).
-
Looks awesome!

**Working with batches of chunks when doing IO and/or (de)compression from sharded Zarrs**

TL;DR: The current design looks great for handling something like 1,000 chunks per second. But, if we want to push Zarr to handle more like one million chunks per second (from a sharded Zarr), then it'd be great to discuss whether this draft proposal would be an appropriate time to consider adding support for processing batches of chunks in one go (in a single function call). For example, allowing a single call to fetch and process many chunks at once, as in the sketch below.
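A minimal illustrative sketch of such a batched call; `get_items`, `read_shard`, and the float32 dtype are hypothetical here, not an agreed Zarr-Python API:

```python
import numpy as np


async def read_shard(store, keys: list[str]) -> list[np.ndarray]:
    # One call hands the whole batch of chunk keys to the store, which
    # can fetch (and maybe decode) them together, instead of paying one
    # Python call per chunk.
    raw_chunks = await store.get_items(keys)  # hypothetical batched method
    return [np.frombuffer(raw, dtype=np.float32) for raw in raw_chunks]
```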
Yes, in principle, batching could help. And, if we're trying to load one million chunks per second then we'll bump into the issue that function calls are surprisingly slow in Python: each function call (with arguments) takes about 350 ns (just for the function call, not doing any actual work!). A million function calls would take 0.35 seconds! So, it should be more efficient to do a single batched call than a million per-chunk calls. Some reasons why we might want to ignore this suggestion for Zarr-Python: …
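A quick way to see that per-call overhead for yourself; a rough sketch using `timeit` (absolute numbers vary by machine and CPython version):

```python
import timeit

def handle_chunk(chunk: bytes) -> bytes:
    return chunk  # no-op stand-in for per-chunk work

# Measure pure function-call overhead: a million calls that do no work.
n = 1_000_000
total = timeit.timeit("handle_chunk(b'')",
                      globals={"handle_chunk": handle_chunk}, number=n)
print(f"~{total / n * 1e9:.0f} ns per call; {n:,} calls took {total:.2f} s")
```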
-
I would say it's not really the loop or the function call that will make the difference, but (as other benchmarking already found) memcopies, CPython API calls and syscalls. For instance, … took 10.2 ms (where `b` is a bytes object).
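A sketch of that kind of measurement; this assumes the elided expression was a plain copy of a large bytes object (the original expression isn't visible above):

```python
import timeit

b = bytes(100 * 2**20)  # a 100 MiB bytes object

# bytearray(b) forces a real copy of the underlying buffer; at this
# size a single copy costs milliseconds, dwarfing per-call overhead.
t = timeit.timeit("bytearray(b)", globals={"b": b}, number=10) / 10
print(f"copying {len(b) // 2**20} MiB: {t * 1e3:.1f} ms")
```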
-
Hi @jni, ha! Sorry I didn't "connect the dots" between your blog and your GitHub ID! It's a small world, huh?! Thanks loads for re-doing the benchmarks!

@martindurant, I completely agree that memcopies are surprisingly slow. As you've mentioned in the past, the ideal case would be that, for uncompressed Zarr chunks, we can do exactly one copy: from storage to the final numpy array (ideally using DMA). If we have batched processing of chunks, then there's at least a chance that, even for chunks that need to be "scattered" throughout the final numpy array, it'd still be possible to do that using DMA (e.g. using …).
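To make the "scatter" step concrete, a minimal pure-NumPy sketch (no DMA; the shapes and the chunk-numbering scheme are made up for illustration):

```python
import numpy as np

# Destination array for a 4x4 grid of 64x64 chunks (made-up shapes).
dest = np.empty((256, 256), dtype=np.float32)

def scatter(chunk_id: int, chunk_data: bytes) -> None:
    # Interpret the raw bytes without copying, then write straight into
    # the chunk's slice of `dest`: one copy total per chunk.
    row, col = divmod(chunk_id, 4)
    view = np.frombuffer(chunk_data, dtype=np.float32).reshape(64, 64)
    dest[row * 64:(row + 1) * 64, col * 64:(col + 1) * 64] = view
```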
-
I was not really advocating for a low-level super-fix like that, but by all means try it. I think the Python `readinto` and `decompress_into` are probably enough, but the point is that I don't think the "batch" concept will make much difference so long as high-latency reads are awaited together. I consider this the low-hanging fruit, together with not reading bytes we don't need. All contingent on contiguous memory blocks. Where striding is at issue, we probably can't do better than numpy copies, unless we write our own decompression wrapper capable of "getting every fourth uncompressed byte" or a similar pattern.
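A small sketch of the `readinto` idea, assuming an uncompressed chunk and a C-contiguous destination buffer (function name and shapes are illustrative):

```python
import numpy as np

def read_chunk_into(f, dest: np.ndarray) -> int:
    # Fill the destination buffer in place from an already-positioned
    # binary file object: no intermediate bytes object, one copy total.
    return f.readinto(memoryview(dest).cast("B"))

# Example use:
#     with open("chunk.bin", "rb") as f:
#         dest = np.empty(64 * 64, dtype=np.float32)
#         read_chunk_into(f, dest)
```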
-
On the topic of where to put this in the stack... I feel I'm too new to the Zarr-Python world to have a particularly strong grasp of the pros and cons of where to place this. But I will give it more thought.

One option that I'm considering for my Zarr Rust experiments is to have a method which, ported to Python, would be something like:

```python
from typing import Callable, Iterable, Optional


class Store:
    async def get_items(
        self,
        keys: Iterable[str],
        transform: Optional[Callable] = None,
    ) -> list[bytes]:
        """Get multiple chunks.

        Args:
            keys: List of chunk keys.
            transform: A function that will be applied to every chunk.
                The function must take two arguments:
                    - chunk_id: int
                    - chunk_data: bytes
                and it must return bytes or None (if, for example, the
                processing function moves data to a final array).
        """
```

Then the store applies `transform` to each chunk as it arrives. So, this kind of dodges the question of "where to put decompression": the `transform` callback can decompress (or move data to a final array), and the store itself stays agnostic.

But I don't know if that's appropriate for Zarr-Python. Nor have I actually implemented this in Rust yet, so I don't know if it's a good idea for Rust, either!
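For illustration only, here's a `transform` that would pair with that sketch; the zlib codec, the float32 dtype, and the dict destination are placeholder assumptions, not Zarr-Python's actual codec pipeline:

```python
import zlib

import numpy as np

dest_chunks: dict[int, np.ndarray] = {}  # placeholder final destination

def decompress_and_store(chunk_id: int, chunk_data: bytes) -> None:
    # Decompress inside the store's batched loop and move the result to
    # its final home; returning None signals the data was consumed here.
    raw = zlib.decompress(chunk_data)
    dest_chunks[chunk_id] = np.frombuffer(raw, dtype=np.float32)

# Hypothetical usage, inside an async context:
#     chunks = await store.get_items(keys, transform=decompress_and_store)
```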
-
Were you following …?
-
Hi all, just wanted to offer support for the roadmap and design doc (#1583), kudos for plotting a clear course through some very tricky terrain.