For my work on the sparse codec (and after discussing with @d-v-b and @jhamman at the Zarr summit), I've realized that it should be possible to have codecs declare their input and output buffer types. The codec pipeline can then verify that the codecs form a chain of compatible buffer types (a bit like jigsaw puzzle pieces), and infer the pipeline's buffer prototype from the input type of the first array-to-array codec and the output type of the last bytes-to-bytes codec.
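To make the jigsaw idea concrete, here is a minimal, self-contained sketch. It does not use zarr-python's actual classes: `NDBuffer`, `GPUNDBuffer`, `BytesBuffer`, `Codec`, `Prototype`, `validate_chain` and `infer_prototype` are all hypothetical stand-ins. Each codec declares an input and an output buffer type, the pipeline checks that neighbouring pieces fit, and the prototype is read off the two ends of the chain (the first codec is the first array-to-array codec if there is one, otherwise the array-to-bytes codec).

```python
from dataclasses import dataclass


# Hypothetical buffer types, standing in for real buffer classes
# (for example a CPU ndarray buffer, a GPU ndarray buffer, and a bytes buffer).
class NDBuffer: ...


class GPUNDBuffer: ...


class BytesBuffer: ...


@dataclass(frozen=True)
class Codec:
    """Hypothetical codec that declares the buffer types it consumes and produces."""

    name: str
    input_buffer: type
    output_buffer: type


@dataclass(frozen=True)
class Prototype:
    """Hypothetical pipeline prototype: the array-side and bytes-side buffer types."""

    nd_buffer: type
    buffer: type


def validate_chain(codecs: list[Codec]) -> None:
    """Check that each codec's output fits the next codec's input, like jigsaw pieces."""
    if not codecs:
        raise ValueError("a codec pipeline needs at least one codec")
    for left, right in zip(codecs, codecs[1:]):
        # A subclass of the expected input type is accepted as a fit.
        if not issubclass(left.output_buffer, right.input_buffer):
            raise TypeError(
                f"{left.name!r} outputs {left.output_buffer.__name__}, "
                f"but {right.name!r} expects {right.input_buffer.__name__}"
            )


def infer_prototype(codecs: list[Codec]) -> Prototype:
    """Take the input type of the first codec and the output type of the last codec."""
    validate_chain(codecs)
    return Prototype(nd_buffer=codecs[0].input_buffer, buffer=codecs[-1].output_buffer)


pipeline = [
    Codec("transpose", NDBuffer, NDBuffer),  # array-to-array
    Codec("bytes", NDBuffer, BytesBuffer),  # array-to-bytes
    Codec("gzip", BytesBuffer, BytesBuffer),  # bytes-to-bytes
]
proto = infer_prototype(pipeline)
print(proto.nd_buffer.__name__, proto.buffer.__name__)  # NDBuffer BytesBuffer
```

Accepting subclasses (rather than requiring identical types) is one possible design choice: it lets a specialized buffer plug into a codec written for a more general one. Swapping the array-to-bytes codec for one declared as `Codec("bytes", GPUNDBuffer, BytesBuffer)` would make `validate_chain` raise a `TypeError`, because `NDBuffer` does not fit `GPUNDBuffer`.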

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in `docs/user-guide/*.md`
Changes documented as a new file in `changes/`
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

keewis added 6 commits

October 15, 2025 12:26

array registry infrastructure

d57b383

infer the prototype from the array type

b35030e

add buffer declarations to codecs

b36c1a1

use the codec pipeline's prototype instead

1f718a8

Revert "array registry infrastructure"

ca12282

This reverts commit d57b383.

get typing to pass

7ba5ced

github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes)

Oct 16, 2025


keewis (contributor, author) commented on Oct 16, 2025

It just dawned on me that we could potentially split the sparse codec (which is an array-to-bytes codec) into two parts: an array-to-array codec that extracts the metadata and component arrays of the sparse array and packs them into a specialized "multi-array buffer" for sparse arrays, and a generic array-to-bytes codec that takes the "multi-array buffer" and serializes it to bytes. This means that the extracted metadata has to live in the array-to-array codec's configuration.

Then, should we want a similar procedure for a different array type (e.g. masked arrays or geoarrow-encoded geometry arrays), we could simply create a specialized pair of array-to-array codec and "multi-array buffer" type, and reuse the "multi-array to bytes" codec.
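A rough sketch of that split, assuming NumPy only: `COOArray` is a minimal stand-in for a COO sparse array (not scipy's or pydata/sparse's type), and `MultiArrayBuffer`, `SparseToMultiArray` and `MultiArrayToBytes` are hypothetical names, not zarr-python API. The sparse-specific codec only splits the array into named components (keeping the format in its configuration), while the generic codec serializes any set of named arrays with a small JSON header describing names, dtypes and shapes. The byte layout is an arbitrary choice made for illustration.

```python
import json
from dataclasses import dataclass, field

import numpy as np


@dataclass
class MultiArrayBuffer:
    """Hypothetical buffer holding several named component arrays."""

    arrays: dict[str, np.ndarray]


@dataclass
class COOArray:
    """Minimal stand-in for a COO sparse array (assumption, for illustration only)."""

    coords: np.ndarray  # shape (ndim, nnz), integer indices
    data: np.ndarray  # shape (nnz,)
    shape: tuple[int, ...]


@dataclass
class SparseToMultiArray:
    """Hypothetical array-to-array codec; extracted metadata lives in its configuration."""

    configuration: dict = field(default_factory=lambda: {"format": "coo"})

    def encode(self, array: COOArray) -> MultiArrayBuffer:
        return MultiArrayBuffer({"coords": array.coords, "data": array.data})

    def decode(self, buf: MultiArrayBuffer, shape: tuple[int, ...]) -> COOArray:
        # Assumption: a real codec would get the chunk shape from the pipeline.
        return COOArray(buf.arrays["coords"], buf.arrays["data"], shape)


class MultiArrayToBytes:
    """Hypothetical generic array-to-bytes codec for any MultiArrayBuffer."""

    def encode(self, buf: MultiArrayBuffer) -> bytes:
        header = [
            {"name": name, "dtype": arr.dtype.str, "shape": list(arr.shape)}
            for name, arr in buf.arrays.items()
        ]
        header_bytes = json.dumps(header).encode()
        body = b"".join(np.ascontiguousarray(arr).tobytes() for arr in buf.arrays.values())
        # Prefix the header length so decode knows where the raw array bytes start.
        return len(header_bytes).to_bytes(8, "little") + header_bytes + body

    def decode(self, data: bytes) -> MultiArrayBuffer:
        n = int.from_bytes(data[:8], "little")
        header = json.loads(data[8 : 8 + n])
        offset = 8 + n
        arrays = {}
        for entry in header:
            dtype = np.dtype(entry["dtype"])
            count = int(np.prod(entry["shape"], dtype=np.int64))
            arr = np.frombuffer(data, dtype=dtype, count=count, offset=offset)
            arrays[entry["name"]] = arr.reshape(entry["shape"])
            offset += count * dtype.itemsize
        return MultiArrayBuffer(arrays)


coo = COOArray(
    coords=np.array([[0, 2], [1, 3]], dtype=np.int64),
    data=np.array([1.5, -2.0]),
    shape=(3, 4),
)
a2a, a2b = SparseToMultiArray(), MultiArrayToBytes()
roundtrip = a2a.decode(a2b.decode(a2b.encode(a2a.encode(coo))), shape=(3, 4))

# The same generic codec can serialize components produced by, e.g., a masked-array codec.
masked = MultiArrayBuffer({"data": np.arange(4.0), "mask": np.array([False, True, False, False])})
restored = a2b.decode(a2b.encode(masked))
```

The last two lines illustrate the reuse argument: only the array-to-array half is specific to the array type, while `MultiArrayToBytes` does not need to know whether its components came from a sparse or a masked array.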

keewis mentioned this pull request

Oct 16, 2025

configure zarr to use the sparse buffer for the sparse codec keewis/zarr-sparse#15

Open

keewis added 3 commits

October 17, 2025 11:10

also make the v2 codec declare its buffers

d53ca8d

more input / output declarations on codecs

78287f9

more buffer declarations

b9b7058



Draft: jigsaw codecs: allow codecs to specify the buffers they work on #3529
