Are trailing partial chunks padded? #3056

Answered by d-v-b
maxrjones asked this question in Q&A

I'm having difficulties with the default ArrayBytesCodec for the last partial chunk of a dataset. Upon inspection, it seems that Zarr-Python pads any trailing partial chunks. Is this interpretation correct? If so, is it intentional? I'm asking because the expectation that all chunks are complete seems to cause issues for virtualization from other file formats and will presumably also cause errors for any future virtual array concatenation of Zarr stores (e.g., zarr-developers/zarr-specs#288).

Here's an example:

    import zarr
    from zarr.storage import LocalStore

    # Create an array with one full chunk of shape (3,4) and one partial chunk (1,4)
    shape = (4, 4)
    chunks = (3, 4)
    new_dtype = "uint8"
    overwrite = True
    zarr_format = 3
    store = LocalStore(root=".vscode/zarr-data/example.zarr", read_only=False)
    arr = zarr.create_array(
        store,
        name="0",
        shape=shape,
        chunks=chunks,
        dtype=new_dtype,
        zarr_format=zarr_format,
        compressors=None,
        filters=None,
        overwrite=overwrite,
    )
    arr[:] = 42

Inspect the size of the two chunks on disk

    ls -l .vscode/zarr-data/example.zarr/0/c/0/0 | awk '{print $5}'  # 12 (expected)
    ls -l .vscode/zarr-data/example.zarr/0/c/1/0 | awk '{print $5}'  # 12 (I would expect 4)
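The sizes above are consistent with a regular grid in which every chunk is stored at full size. A minimal sketch of that arithmetic, assuming no compression or filters (as in the example); `chunk_nbytes` is a hypothetical helper for illustration and is not part of Zarr-Python:

```python
import itertools
import math


def chunk_nbytes(shape, chunks, itemsize):
    """Map each chunk index to (valid_nbytes, full_chunk_nbytes) for a regular grid.

    Hypothetical helper: valid_nbytes counts only the elements inside the array
    bounds, full_chunk_nbytes is the size of a complete, uncompressed chunk.
    """
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    full = itemsize * math.prod(chunks)
    sizes = {}
    for idx in itertools.product(*(range(g) for g in grid)):
        # Extent of this chunk that actually overlaps the array.
        valid_shape = [min(c, s - i * c) for i, c, s in zip(idx, chunks, shape)]
        sizes[idx] = (itemsize * math.prod(valid_shape), full)
    return sizes


# shape (4, 4), chunks (3, 4), uint8 (1 byte per element)
sizes = chunk_nbytes((4, 4), (3, 4), itemsize=1)
for idx, (valid, full) in sizes.items():
    print(idx, "valid bytes:", valid, "full-chunk bytes:", full)
```

For the trailing chunk, the valid region is 4 bytes while a full chunk is 12 bytes, which matches the 12 bytes reported by `ls`.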
Answered by d-v-b, May 13, 2025

> Is this interpretation correct? If so, is it intentional?

For the default chunk grid, yes to both questions.

This can still work for virtualization, but only if byte ranges are addressable in the virtualization scheme and the byte range for every boundary chunk has been calculated.

maxrjones (Maintainer, Author), May 13, 2025

Interesting, thanks for the link and explanation.

> This can still work for virtualization but only if byte ranges are addressable in the virtualization scheme, and the byte range for all the boundary chunks has been calculated.

Hmm, I think that alone is not sufficient. For example, I have the addressable byte ranges for boundary chunks but encounter a reshape error in the ArrayBytes codec because the buffer length is less than a full chunk's buffer length. IIUC one option is to add a BytesBytes codec (e.g., Pad) between any compressors and the ArrayBytes codec that pads the buffer to match a full chunk, making sure that the Pad codec aligns with the defined ArrayBytes codec so that the correct bytes are truncated. That seems a bit risky, though. Do you know offhand where in the codec pipeline the truncation happens?

An alternative would be to propose a chunk grid extension, but that risks limiting interoperability.

maxrjones (Maintainer, Author), May 13, 2025

I guess we could also define a custom ArrayBytes codec that pads before reshaping. Again, I'm not sure whether this is way too hacky, though.
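The reshape failure and the pad-before-reshape idea can be sketched with the standard library alone. This is an illustration, not Zarr-Python's codec API: `decode_boundary_chunk` is a hypothetical helper, `memoryview.cast` stands in for the NumPy reshape, and the sketch assumes uint8 data in C order with the chunk truncated along the first axis only, so that the valid bytes form a prefix of the full chunk.

```python
def decode_boundary_chunk(buf, chunk_shape, valid_rows, fill=0):
    """Pad a short uint8 buffer to full chunk size, reshape it, keep the valid rows.

    Hypothetical helper; assumes a 2-D chunk in C order that is truncated
    along axis 0 only.
    """
    rows, cols = chunk_shape
    full_nbytes = rows * cols  # uint8: one byte per element
    padded = buf + bytes([fill]) * (full_nbytes - len(buf))
    grid = memoryview(padded).cast("B", shape=[rows, cols]).tolist()
    return grid[:valid_rows]  # drop the padded rows again


short = bytes([42]) * 4  # a (1, 4) boundary chunk, 4 bytes instead of 12

# Reshaping the short buffer directly to the full chunk shape fails.
reshape_error = None
try:
    memoryview(short).cast("B", shape=[3, 4])
except (TypeError, ValueError) as exc:
    reshape_error = exc

# Padding first makes the reshape succeed.
valid = decode_boundary_chunk(short, (3, 4), valid_rows=1)
print(valid)
```

If the chunk were truncated along a later axis, appending padding at the end of the buffer would place bytes in the wrong positions, which is the alignment risk mentioned above.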

d-v-b

Ah, and I was wrong about the byte addressing thing. On the encoding side, I think partial chunks are padded to full size before the codec pipeline runs? If so, there's no way a byte range can be helpful, because the entire padded chunk will be compressed.
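A minimal sketch of why a byte range stops being useful once compression is involved, assuming the behavior described above (pad to the full chunk shape first, then encode). `encode_chunk` is a hypothetical helper, `zlib` stands in for whichever compressor is configured, and the fill value of 0 is an assumption:

```python
import zlib


def encode_chunk(valid_rows, chunk_shape, fill=0):
    """Pad uint8 rows to the full chunk shape, then compress the whole chunk.

    Hypothetical helper that mimics "pad, then run the codec pipeline".
    """
    rows, cols = chunk_shape
    padded = [list(r) for r in valid_rows]
    padded += [[fill] * cols for _ in range(rows - len(valid_rows))]
    raw = bytes(v for row in padded for v in row)  # C-order serialization
    return zlib.compress(raw)


encoded = encode_chunk([[42, 42, 42, 42]], chunk_shape=(3, 4))
decoded = zlib.decompress(encoded)

# The stored object decodes to a full 12-byte chunk; the 4 valid bytes are
# only recoverable after decompressing everything.
print(len(decoded), list(decoded[:4]))
```

The compressed stream covers the valid and padded bytes together, so there is no sub-range of the stored object that corresponds to just the 4 valid bytes.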

Answer selected by maxrjones

Note that this is the same issue as #3035.

Here's the explanation for this behavior that I provided there.

> It makes sense for the following reason: if every chunk is exactly the same size, then we can easily resize the array without ever having to rewrite chunks. Otherwise, Zarr would have to explicitly keep track of the size of the chunks somewhere.

For the default chunk grid, every chunk is identical in terms of how it is stored. There is nothing special about the final chunk.
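The resizing argument can be illustrated with a toy model. This is not how Zarr-Python is implemented: the `store` dictionary, the `read_array` helper, and the fill value of 0 in the padded rows are all assumptions made for the sketch.

```python
def read_array(store, shape, chunks):
    """Read a 2-D uint8 array row by row from full-size chunks (toy model).

    Assumes a single chunk column, i.e. shape[1] == chunks[1].
    """
    n_rows, _ = shape
    chunk_rows, chunk_cols = chunks
    out = []
    for r in range(n_rows):
        chunk = store[(r // chunk_rows, 0)]
        start = (r % chunk_rows) * chunk_cols
        out.append(list(chunk[start:start + chunk_cols]))
    return out


chunks = (3, 4)
# Both chunks are stored at full size (12 bytes); the second holds 1 valid row
# plus 2 padded rows.
store = {
    (0, 0): bytes([42] * 12),
    (1, 0): bytes([42] * 4 + [0] * 8),
}
before = dict(store)

shape = (4, 4)
original = read_array(store, shape, chunks)

# "Resize" to 6 rows: only the shape metadata changes, no chunk is rewritten.
shape = (6, 4)
grown = read_array(store, shape, chunks)

print(store == before, grown[3], grown[4])
```

Because the boundary chunk already has the full chunk layout, growing the array along the first axis only exposes rows that were previously out of bounds.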

3 participants: @maxrjones, @rabernat, @d-v-b
