Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat: Implement ZEP 8 URL syntax support for zarr-python#3369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
jhamman wants to merge19 commits intozarr-developers:main
base:main
Choose a base branch
Loading
fromjhamman:feature/zep8-url-support

Conversation

@jhamman
Copy link
Member

@jhammanjhamman commentedAug 11, 2025
edited
Loading

This PR implements support for the ZEP 8 URL syntax in Zarr Python.

Some examples of what now works:

importzarrroot=zarr.open_group('s3://bucket/data.zip|zip:|zarr3:')# S3 → ZIP → Zarr v3arr=zarr.create_array('memory:|zarr2:group/array',shape=(10, ),dtype='i4')# Memory → Zarr v2# custom adapter for icechunkds=xr.open_zarr('s3://icechunk-public-data/v1/glad|icechunk:')# icechunk (from xarray)

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented indocs/user-guide/*.rst
  • Changes documented as a new file inchanges/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

closes#2943
fixes#2831
xref:zarr-developers/zeps#48

cc@jbms

maxrjones reacted with heart emojid-v-b reacted with rocket emoji
- Add comprehensive ZEP 8 URL parsing and resolution system- Implement StoreAdapter ABC for extensible storage adapters- Add built-in adapters for file, memory, S3, GCS, HTTPS schemes- Support pipe-chained URLs like s3://bucket/data.zip|zip:|zarr3:- Add URLSegment parsing with validation- Integrate with zarr.open_group and zarr.open_array APIs- Include demo script and comprehensive test suite- Pass all existing tests + 35 new ZEP 8-specific tests
@github-actionsgithub-actionsbot added the needs release notesAutomatically applied to PRs which haven't added release notes labelAug 11, 2025
@github-actionsgithub-actionsbot removed the needs release notesAutomatically applied to PRs which haven't added release notes labelAug 24, 2025
@jbms
Copy link

One tricky thing abouts3+https://endpoint/a/b is that it is ambiguous as to whether it is using "virtual host" syntax (i.e.endpoint refers to a single bucket and the path is "a/b") or "path" syntax (i.e. the bucket is "a" and the path is "b").

The "path" syntax is generally the default when running a regular s3-compatible server, but the "virtual host" syntax can commonly occur when someone defines a CNAME DNS entry to map their own domain or subdomain to an AWS S3 bucket.

When designing this syntax for Neuroglancer, it seemed like it would be annoying to require users to use separate syntax to disambiguate the two cases. Instead, for operations where it matters (namely List), Neuroglancer just automatically determines which of the two cases applies by trying both ways and seeing which one succeeds, and then caching the result so that subsequent list operations don't require two requests.

@codecov
Copy link

codecovbot commentedSep 11, 2025
edited
Loading

Codecov Report

❌ Patch coverage is67.57246% with179 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.87%. Comparing base (ee9c182) to head (35526a5).
⚠️ Report is 42 commits behind head on main.

Files with missing linesPatch %Lines
src/zarr/storage/_builtin_adapters.py58.36%97 Missing⚠️
src/zarr/storage/_zep8.py83.95%30 Missing⚠️
src/zarr/abc/store_adapter.py39.02%25 Missing⚠️
src/zarr/registry.py60.00%8 Missing⚠️
src/zarr/storage/_zip.py78.94%8 Missing⚠️
src/zarr/storage/_common.py85.00%3 Missing⚠️
src/zarr/storage/_register_adapters.py62.50%3 Missing⚠️
src/zarr/abc/__init__.py0.00%2 Missing⚠️
src/zarr/api/asynchronous.py0.00%2 Missing⚠️
src/zarr/storage/__init__.py0.00%1 Missing⚠️

❗ There is a different number of reports uploaded between BASE (ee9c182) and HEAD (35526a5). Click for more details.

HEAD has 4 uploads less than BASE
FlagBASE (ee9c182)HEAD (35526a5)
1410
Additional details and impacted files
@@             Coverage Diff             @@##             main    #3369       +/-   ##===========================================- Coverage   94.92%   60.87%   -34.06%===========================================  Files          79       86        +7       Lines        9500    10231      +731     ===========================================- Hits         9018     6228     -2790- Misses        482     4003     +3521
Files with missing linesCoverage Δ
src/zarr/storage/_logging.py61.94% <ø> (-38.06%)⬇️
src/zarr/storage/__init__.py9.52% <0.00%> (-85.48%)⬇️
src/zarr/abc/__init__.py0.00% <0.00%> (ø)
src/zarr/api/asynchronous.py71.42% <0.00%> (-19.45%)⬇️
src/zarr/storage/_common.py68.57% <85.00%> (-21.96%)⬇️
src/zarr/storage/_register_adapters.py62.50% <62.50%> (ø)
src/zarr/registry.py63.19% <60.00%> (-25.63%)⬇️
src/zarr/storage/_zip.py72.10% <78.94%> (-25.50%)⬇️
src/zarr/abc/store_adapter.py39.02% <39.02%> (ø)
src/zarr/storage/_zep8.py83.95% <83.95%> (ø)
... and1 more

... and64 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jhammanjhamman marked this pull request as ready for reviewSeptember 12, 2025 03:03
Examples::

# Basic ZIP file storage
zarr.open_array("file:data.zip|zip", mode='w', shape=(10, 10), dtype="f4")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It seems like a bad idea to break file uris.

https://datatracker.ietf.org/doc/html/rfc8089

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is considered in the spec doc this PR is building on:zarr-developers/zeps#48

https://github.com/jbms/zeps/blob/92bc64111c7612083560358efdd4450e061f3746/draft/ZEP0008.md?plain=1#L115-L119

And later is says:

  Implementations SHOULD not support `file://relative/path` since that  is ambiguous with the `file://hostname/path` syntax defined by  [RFC8089](https://datatracker.ietf.org/doc/html/rfc8089).

If you forsee serious issues here I'd encourage commenting on that PR on the standard.

Copy link
Contributor

@ianhiianhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Partway through the code at this point (currently following the logic of store resolution). Posting some comments now so I don't lose them (i've been burned by this before)

**In-memory storage:**

>>># Create array in memory
>>>z= zarr.open_array("memory:",mode='w',shape=(5,5),dtype="f4")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can I then access this from somewhere else using this syntax?

e.g.memory:aesr80s9e8ra?

..warning::
The:class:`zarr.storage.ObjectStore` class is experimental.

URL-based Storage (ZEP 8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think what this section is missing is a showcasing of what the equivalent zarr-pyhthon code would be to put it in terms people are more familiar with.

So each section would be:

zarr.open_array("file:zep8-data.zip|zip" ....)# is equivalent tozarr.open_array(zarr.storage.ZipStore(...)...)

Comment on lines +223 to +225
..note::
When using ZEP 8 URLs with third-party libraries like xarray, the URL syntax allows
seamless integration without requiring zarr-specific store creation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
..note::
When using ZEP 8 URLs with third-party libraries like xarray, the URL syntax allows
seamless integration without requiring zarr-specific store creation.

This is already effectively stated above.

Comment on lines +201 to +204
URL-based Storage (ZEP 8)
~~~~~~~~~~~~~~~~~~~~~~~~~

Zarr supports URL-based storage following the ZEP 8 specification, which allows you to specify storage locations using URLs with chained adapters::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Needs a link to the more extensive docs.


- ``file:path.zip|zip`` - ZIP file on local filesystem
- ``s3://bucket/data.zip|zip`` - ZIP file in S3 bucket
- ``memory:`` - In-memory storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
- ``memory:`` - In-memory storage

not an example of piping

Comment on lines +14 to +19
fromzarr.abc.store_adapterimportStoreAdapter
fromzarr.storage._localimportLocalStore
fromzarr.storage._loggingimportLoggingStore
fromzarr.storage._memoryimportMemoryStore
fromzarr.storage._zep8importURLStoreResolver
fromzarr.storage._zipimportZipStore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is it worth considering makign these lazy, or not a big enough gain?

Comment on lines +218 to +221
>>> is_zep8_url("s3://bucket/data.zarr")
False
>>> is_zep8_url("file:///data.zarr")
False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't really follow these returning false. seems like a downgrade in functionality, also having looked a the spec a bit I don't see where this is explicitly disallowed (though very possible i misread or misunderstood)

Comment on lines +343 to +348
ifnotis_zep8_url(url):
# Check if it's a simple scheme URL that we can handle
if"://"inurlor ((":"inurl)andnoturl.startswith("/")):
# Parse as a single segment URL - the parser should handle this
try:
segments=self.parser.parse(url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ahh I think this answers my question above. Since these are valid urls I think ideally is_zep8_url would handle these simple cases correctly.

Comment on lines +349 to +350
exceptException:
raiseValueError(f"Not a valid URL:{url}")fromNone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

What are the expected exception's we are catching here. As written this might silently quash an exception that is a real bug.

I think this whole section can be consolidated and simplified a bit, especially as we don't actually do anyhthing differently here then in theelse branch that calls the same parse

fori,segmentinenumerate(segments):
ifsegment.adapterin ("zarr2","zarr3"):
# Skip zarr format segments - they don't create stores
# TODO: these should propagate to the open call somehow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Calling out this TODO as importnat before merging

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Indeed this is crucial functionality:

importzarrarr=zarr.open_array("memory:|zarr2:",mode='w',shape=(5,),dtype='i4')print(f"Array:{arr}")print(f"Format version:{arr.metadata.zarr_format}")
Array: <Array memory://4411020480 shape=(5,) dtype=int32>Format version: 3

Copy link
Contributor

@ianhiianhi left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Accidentally submitted too early!

Ack! Hit submit instead of a different button! Below is still being filled out! I will comment when properly done.


Here are some edge cases I ran into:

Throws a FileNotFoundError:
Not sure what it should do, but I think this needs a Zep8 error

importzarrimportnumpyasnpimporttempfilefrompathlibimportPathtmpdir=Path(tempfile.mkdtemp())zip_path=tmpdir/"mydata.zip"arr_write=zarr.open_array(f"file:{zip_path}|zip:",mode='w',shape=(10,10),chunks=(5,5),dtype='f4')arr_write[:, :]=np.random.random((10,10))# throws FileNotFoundErrorarr_wrong=zarr.open_array(f"file:{zip_path}",# Missing |zip:mode='r')

Accidentally missing an adapter.

I think an absolutely killer feature here is to have automatic suggestions for how to correct a URL (like the python repr does for syntax errors)

e.g.

arr_wrong3=zarr.open_array(f"file:{zip_path}|log:",# Missing |zip:mode='r'    )print(f"Got data:{arr_wrong3[:]}")

gives

ArrayNotFoundError: No array found in store logging-file:///var/folders/tc/fkgp35zn7z913f9cmsxcl6pc0000gn/T/tmpxp7e5hc3/logged_data.zip at path

I would love to have a: "did you forget a|zip: as part of that error message

Nested Zips/nested adapters

I tried to replicate a URL from the spec locally of nested zip files. It simplifies to this:

importzarr,zipfile,tempfilefrompathlibimportPath# Create inner.zip with datatmpdir=Path(tempfile.mkdtemp())inner=tmpdir/"inner.zip"arr=zarr.open_array(f"file:{inner}|zip:",mode='w',shape=(3,),dtype='i4')arr[:]= [1,2,3]delarr# Close# Put inner.zip inside outer.zipouter=tmpdir/"outer.zip"withzipfile.ZipFile(outer,'w')aszf:zf.write(inner,"inner.zip")# Try nested read - THIS FAILSurl=f"file:{outer}|zip:inner.zip|zip:"arr_read=zarr.open_array(url,mode='r')# ❌ FileNotFoundError
FileNotFoundError: [Errno 2] No such file or directory: '[/var/folders/tc/fkgp35zn7z913f9cmsxcl6pc0000gn/T/tmp9kf2z2ks/outer.zip](http://localhost:8888/var/folders/tc/fkgp35zn7z913f9cmsxcl6pc0000gn/T/tmp9kf2z2ks/outer.zip)|zip:inner.zip'

somewhere in the URL reconstruction we lose the zep8 ness and try to just open the whole string as a file

Zarr supports URL-based storage following the ZEP 8 specification, which allows you to specify storage locations using URLs with chained adapters::

>>> # Store data directly in a ZIP file using ZEP 8 URL syntax
>>> z = zarr.open_array("file:data/example-zep8.zip|zip", mode='w', shape=(50, 50), chunks=(10, 10), dtype="f4")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

should probably be adding a .zarr in here somewhere. so we aren't creating otherwise unopenable zip files

"""Validate the URL segment."""
importre

fromzarr.storage._zep8importZEP8URLError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Let's move this to errors.py to avoid circular import shenanigans

Comment on lines +34 to +37
classURLParser:
"""Parse ZEP 8 URL syntax into components."""

defparse(self,url:str)->list[URLSegment]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is currently failing to error on what ought to be invalid URLs with multiple schemes.

z = zarr.open_array("file:memory:example.zip|zip", mode='w', shape=(100, 100), chunks=(10, 10), dtype="i4")

executes and creates teh file:memory:example.zip

I think this is prohibited according to the rules here:https://zeps--48.org.readthedocs.build/en/48/draft/ZEP0008.html#absolute-url-syntax

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Also

z = zarr.open_array("memory:memory:", mode='w', shape=(100, 100), chunks=(10, 10), dtype="i4")

creates a standard zarr folder on disk namedmemory:memory:

Comment on lines +56 to +58
path=preceding_url[5:]# Remove 'file:' prefix
ifnotpath:
path="."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

For security, implementation SHOULD place limits on where this scheme is permitted.

refering to.. which seems important here to not expose unlimited things in the filesystem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

not really sure how to get into the business of defining what is safe here. But flagging for consideration

Comment on lines +558 to +570
# Only extract paths from adapter segments, not scheme segments
# Scheme segments (like file:, s3:, https:) contain paths to the resource, not zarr paths within it
# Special handling for icechunk: paths with metadata references
# Both old format "branch:main", "tag:v1.0", "snapshot:abc123"
# and new format "@branch.main", "@tag.v1.0", "@abc123def456"
ifsegment.adapterin ("icechunk","ic"):
# Check old format: branch:main, tag:v1.0, snapshot:abc123
if":"insegment.pathandsegment.path.split(":")[0]in (
"branch",
"tag",
"snapshot",
):# pragma: no cover
continue# Skip icechunk metadata paths # pragma: no cover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

What's going on here? what is the old vs new?

can this all just be removed an handled by icechunk?

Comment on lines +604 to +606
exceptException:# pragma: no cover
# If parsing fails, treat as regular path # pragma: no cover
pass# pragma: no cover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think we shouldnot do this. Or at least be much more careful about what we pass through. e.g. myfile:memory: example I think should fail even if it's techinically valid to have a colon in a file name, i think this implementation will just make that illegal for zarr users perhaps?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

but we would need an escape hatch for whoever is already doing that :/

fori,segmentinenumerate(segments):
ifsegment.adapterin ("zarr2","zarr3"):
# Skip zarr format segments - they don't create stores
# TODO: these should propagate to the open call somehow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Indeed this is crucial functionality:

importzarrarr=zarr.open_array("memory:|zarr2:",mode='w',shape=(5,),dtype='i4')print(f"Array:{arr}")print(f"Format version:{arr.metadata.zarr_format}")
Array: <Array memory://4411020480 shape=(5,) dtype=int32>Format version: 3

Copy link
Contributor

@ianhiianhi left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hitting shift enter sending the review off is absolutely killing me. Sorry I keep opening this too early....

A weird one. Adding a log store via zep8 breaks some S3 permissions, but does not if you open with explicit stores

importzarrfromzarr.storageimportFsspecStore,LoggingStorestorage_options= {'anon':True}# WORKS!s3_store=FsspecStore.from_url("s3://noaa-nwm-retro-v2-zarr-pds",storage_options=storage_options)logging_store=LoggingStore(s3_store)group=zarr.open_group(logging_store)keys=list(group.keys())print(f"Keys:{keys[:3]}")# DIES WITH PERMISSION ERRORurl="s3://noaa-nwm-retro-v2-zarr-pds|log:"group=zarr.open_group(url,storage_options=storage_options)keys=list(group.keys())print(f"Keys:{keys[:3]}")

i suspect that this is related the storage_options not being passed through correctly

Comment on lines +18 to +27
fromzarr.storage._builtin_adaptersimport (
FileSystemAdapter,
GSAdapter,
HttpsAdapter,
LoggingAdapter,
MemoryAdapter,
S3Adapter,
S3HttpAdapter,
S3HttpsAdapter,
ZipAdapter,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We should addLatencyStore to this list with some sensible defaults it would be super helpful for testing

Copy link
Contributor

@ianhiianhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've started making some changes on a branch off of this. fixing the issues i found.

One architectural change is that in this PR as stands URLs can either go directly into fsspec (as a special case) or go through zep8 to end up in fsspec. I think it makes more sense to route all URLs through zep8 machinery. And possibly will for local stores.

I also think i found a bug with the inteded usage oflocal:// bypassing fsspec stores inis_fsspec_uri the second part of the if condition could never be triggered

Although

A weird one. Adding a log store via zep8 breaks some S3 permissions, but does not if you open with explicit stores

importzarrfromzarr.storageimportFsspecStore,LoggingStorestorage_options= {'anon':True}# WORKS!s3_store=FsspecStore.from_url("s3://noaa-nwm-retro-v2-zarr-pds",storage_options=storage_options)logging_store=LoggingStore(s3_store)group=zarr.open_group(logging_store)keys=list(group.keys())print(f"Keys:{keys[:3]}")# DIES WITH PERMISSION ERRORurl="s3://noaa-nwm-retro-v2-zarr-pds|log:"group=zarr.open_group(url,storage_options=storage_options)keys=list(group.keys())print(f"Keys:{keys[:3]}")

Comment on lines +458 to +459
ifstorage_options:
store_kwargs.update(storage_options)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This effectively unpacks the storage_options dict which I think is what breaks my example with S3+ log store.

needs somethign like:

Suggested change
ifstorage_options:
store_kwargs.update(storage_options)
ifstorage_options:
store_kwargs['storage_options']=storage_options

Comment on lines +236 to +237
defget_supported_schemes(cls)->list[str]:
return ["s3","s3+http","s3+https"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

for thes3+ urls we aren't reaching here unfortunately:

zarr.open("s3+https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr",storage_options={'anon':True})
ValueError: Protocol not known: s3+https

i think this relates to the prior mentioned issues withis_zep8_url

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@martindurantmartindurantmartindurant left review comments

+3 more reviewers

@jbmsjbmsjbms left review comments

@laramiellaramiellaramiel left review comments

@ianhiianhiianhi left review comments

Reviewers whose approvals may not affect merge requirements

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Support ZEP 8 URL Syntax Can't conveniently open zip store from path with zarr v3

5 participants

@jhamman@jbms@martindurant@laramiel@ianhi

[8]ページ先頭

©2009-2025 Movatter.jp