Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Add CLI for converting v2 metadata to v3#3257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
d-v-b merged 70 commits intozarr-developers:mainfromK-Meech:km/v2-v3-conversion
Sep 17, 2025

Conversation

@K-Meech
Copy link
Contributor

@K-MeechK-Meech commentedJul 16, 2025
edited
Loading

For#1798

Adds a CLI usingtyper to convert v2 metadata (.zarray /.zattrs...) to v3 metadatazarr.json.

To test, you will need to install the new optional cli dependency e.g.
pip install -e ".[remote,cli]"

This should make thezarr-converter command available e.g. try:

zarr-converter --helpzarr-converter convert --helpzarr-converter clear --help

convert addszarr.json files to every group / array, leaving the v2 metadata as-is. A zarr with both sets of metadata can still be opened withzarr.open, but will give a UserWarning:Both zarr.json (Zarr format 3) and .zarray (Zarr format 2) metadata objects exist... Zarr v3 will be used.. This can be avoided by passingzarr_format=3 tozarr.open, or by using theclear command to remove the v2 metadata.

clear can also remove v3 metadata. This is useful if the conversion fails part way through e.g. if one of the arrays uses a codec with no v3 equivalent.

All code for the cli is insrc/zarr/core/metadata/converter/cli.py, with the actual conversion functions insrc/zarr/core/metadata/converter/converter_v2_v3.py. These functions can be called directly, for those who don't want to use the CLI (although currently they are part of/core which is considered private API, so it may be best to move them elsewhere in the package).

Some points to consider:

  • I had to modifyset_path fromtest_dtype_registry.py andtest_codec_entrypoints.py, as they were causing the CLI tests to fail if they were run after. This seems to be due to thelazy_load_list of the numcodecs codecs registries being cleared, meaning they were no longer available in my code which finds thenumcodecs.zarr3 equivalent of a numcodecs codec.
  • I tested this on local zarr images, so it would be great if someone with access to s3 / google cloud etc., could try it out on some small example images there.
  • I'm happy to add docs about how to use the CLI, but wanted to get feedback on the general structure first

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented indocs/user-guide/*.rst
  • Changes documented as a new file inchanges/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

jni and norlandrhagen reacted with heart emoji
@github-actionsgithub-actionsbot added the needs release notesAutomatically applied to PRs which haven't added release notes labelJul 16, 2025
Copy link
Contributor

@dstansbydstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

🎉 I think this is good now - I have one question about use of the logger, but it's not a blocker. I'll let this sit for a week or so because it's complicated, and would benefit from a second reviewer. If no-one reviews by then, I'll merge.


app = typer.Typer()

logger = logging.getLogger(__name__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is it deliberate that this is a new logger, instead of importing the logger object fromzarr? I don't tihnk it matters too much, but re-usingzarr._logger might save some code duplication because you could remove functions from this file for configuring the logger.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes this was intentional - most of the other libraries I've seen have their root logger (i.e.zarr._logger), then each individual file still callslogger = logging.getLogger(__name__) to create a child of that logger.

I could usezarr._logger here, but there are already other files inzarr that uselogger = logging.getLogger(__name__) - so I thought this would be more consistent. Also, I'm not sure changing this would remove code in this file?_set_logging_level usesverbose to determine if the logging level should be INFO or WARNING, then callszarr.set_log_level andzarr.set_format to set this onzarr._logger. So it's already using settings on the overall logger?

return False


def _convert_array_metadata(metadata_v2: ArrayV2Metadata) -> ArrayV3Metadata:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

later on we can put this inzarr.core.metadata, e.g. in aconversion module

TomNicholas reacted with thumbs up emoji
@d-v-b
Copy link
Contributor

a high-level comment about the conversion strategy. this is not a blocker for this PR, but the current conversion strategy is serial over arrays and groups. This means converting an array / group waits until the previous one is converted, and it increases the likelihood of conversion breaking in the middle, leaving a hybrid v2 / v3 hierarchy.

We could also do conversion with a concurrent API based on concurrently reading the entire input hierarchy withdict(group.members(max_depth=None)), preparing new metadata documents for each member, then concurrently writing those documents withcreate_hierarchy.

This could be considered merely a performance concern, but I suspect reading and writing the hierarchy concurrently might be the difference between success and failure for this CLI when people use it on high-latency storage backends.

@K-Meech if you want to experience how your CLI tool handles IO latency, we have astore wrapper class that adds latency to any other store class. Try running your conversion on a large zarr hierarchy with 100ms of latency added toget andset on a local store. If it takes more than a few seconds, then this strongly argues for moving away from the serial conversion strategy and using something concurrent.

TomNicholas reacted with thumbs up emoji

@K-Meech
Copy link
ContributorAuthor

Thanks@d-v-b . I tried converting a Zarr with 30 nested groups (each with a small array inside) and this took about 10 seconds total (with aLatencyStore set up asLatencyStore(local_store, get_latency=0.1, set_latency=0.1)). I'vecreated a gist of this small test, if you wanted to try it out.

Happy to look at changing this to use concurrent writes withcreate_hierachy - perhaps best to do in a separate PR after this one is merged? What do you think?

I've been keeping a list of potential follow up changes mentioned on this PR - happy to convert these into issues after merge:

  • Handle conversion of consolidated metadata (.zmetadata) files
  • Handle any combination ofnumcodecs codecs -see discussion in thread. Currently the code only handles filters that areArrayArrayCodecs and compressors that areBytesBytesCodecs.
  • Use concurrent writes, rather than serial

@d-v-b
Copy link
Contributor

Thanks@d-v-b . I tried converting a Zarr with 30 nested groups (each with a small array inside) and this took about 10 seconds total (with aLatencyStore set up asLatencyStore(local_store, get_latency=0.1, set_latency=0.1)). I'vecreated a gist of this small test, if you wanted to try it out.

cool, thanks for running this experiment! IMO 10 seconds for converting 30 arrays and 30 groups is fine to start with. We can treat the performance tuning as an implementation detail for a later PR.

@K-Meech
Copy link
ContributorAuthor

I've fixed the merge conflicts with this branch, and updated docstrings to reference the newzarr.codecs.numcodecs (rather thannumcodecs.zarr3). I think I've addressed everyone's comments as well -@dstansby , see my replies to your comments above.
Do let me know if anything else needs changing / updating here - otherwise it might be ready to go in?

d-v-b reacted with rocket emoji

@d-v-b
Copy link
Contributor

@zarr-developers/python-core-devs I would like to merge this soon (24 hours) so now is a great time to look over this PR!

K-Meech reacted with hooray emoji

@d-v-bd-v-benabled auto-merge (squash)September 17, 2025 08:13
@d-v-b
Copy link
Contributor

it's going in once the tests are green. Thanks so much@K-Meech! If you don't mind, I'd like to make a post about your work over at image.sc, so people know about this functionality. This might be a good way to get some early feedback from users.

K-Meech reacted with hooray emoji

@d-v-bd-v-b merged commit3c883a3 intozarr-developers:mainSep 17, 2025
31 checks passed
@lumberbot-app
Copy link

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
git checkout 3.1.xgit pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
git cherry-pick -x -m1 3c883a3c578b6e9fdb4d5fd7a160ce992aaec1b3
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
git commit -am 'Backport PR #3257: Add CLI for converting v2 metadata to v3'
  1. Push to a named branch:
git push YOURFORK 3.1.x:auto-backport-of-pr-3257-on-3.1.x
  1. Create a PR against branch 3.1.x, I would have named this PR:

"Backport PR#3257 on branch 3.1.x (Add CLI for converting v2 metadata to v3)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove theStill Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free tosuggest an improvement.

@K-Meech
Copy link
ContributorAuthor

Great - thanks@d-v-b ! I'll make some follow up issues foradditional features from the reviews on this PR.

@joshmoore
Copy link
Member

👏🏽 👏🏽 - The one thought I had while pondering this,@K-Meech, (and sorry if it was already discussed) is to what degree the CLI will follow the semver of the library itself. Not that anything needs doing but might be worth expectation management on the part of consumers.

@d-v-b
Copy link
Contributor

👏🏽 👏🏽 - The one thought I had while pondering this,@K-Meech, (and sorry if it was already discussed) is to what degree the CLI will follow the semver of the library itself. Not that anything needs doing but might be worth expectation management on the part of consumers.

the library doesn't follow semver -- we are usingeffver

@joshmoore
Copy link
Member

Fair enough, effver then. Can a user at this point reasonably expect the same level of stability from the CLI that they would for the API?

@d-v-b
Copy link
Contributor

I don't see why we should version the CLI any differently, other than the fact that it's pretty new and so it might need some polishing as people use it. To give ourselves some flexibility, we could consider adding a disclaimer to the CLI stating that it's still experimental and might experience breaking changes in future versions of zarr python.

joshmoore reacted with thumbs up emoji

@d-v-b
Copy link
Contributor

see#3465

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@d-v-bd-v-bd-v-b approved these changes

+1 more reviewer

@dstansbydstansbydstansby approved these changes

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Projects

None yet

Milestone

3.1.2

Development

Successfully merging this pull request may close these issues.

6 participants

@K-Meech@d-v-b@dstansby@TomNicholas@emmanuelmathot@joshmoore

[8]ページ先頭

©2009-2025 Movatter.jp