obstore-based Store implementation #1661
Amazing @kylebarron! I'll spend some time playing with this today.
With roeap/object-store-python#9 it should be possible to fetch multiple ranges within a file concurrently with range coalescing (using That PR also adds a
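To make "range coalescing" concrete, here is a minimal pure-Python sketch of the idea: nearby byte ranges are merged into fewer, larger requests, the merged requests are issued concurrently, and each caller's slice is cut back out of the merged payload. The names `ByteRange`, `coalesce_ranges`, and `fetch_ranges`, the `max_gap` threshold, and the in-memory `blob` standing in for a remote object are all assumptions made for illustration; this is not the API of object-store-python or of the Rust crate.

```python
import asyncio
from typing import NamedTuple


class ByteRange(NamedTuple):
    start: int  # inclusive byte offset
    end: int    # exclusive byte offset


def coalesce_ranges(ranges: list[ByteRange], max_gap: int) -> list[ByteRange]:
    """Merge ranges that overlap or are separated by at most max_gap bytes."""
    merged: list[ByteRange] = []
    for r in sorted(ranges):
        if merged and r.start - merged[-1].end <= max_gap:
            last = merged[-1]
            merged[-1] = ByteRange(last.start, max(last.end, r.end))
        else:
            merged.append(r)
    return merged


async def fetch_ranges(
    blob: bytes, ranges: list[ByteRange], max_gap: int = 1024
) -> list[bytes]:
    """Fetch the merged ranges concurrently, then slice out each request."""
    merged = coalesce_ranges(ranges, max_gap)

    async def fetch(r: ByteRange) -> bytes:
        await asyncio.sleep(0)  # hypothetical stand-in for a network round trip
        return blob[r.start : r.end]

    payloads = await asyncio.gather(*(fetch(r) for r in merged))

    results: list[bytes] = []
    for r in ranges:
        # Every requested range is fully contained in exactly one merged range.
        for m, data in zip(merged, payloads):
            if m.start <= r.start and r.end <= m.end:
                results.append(data[r.start - m.start : r.end - m.start])
                break
    return results


blob = bytes(range(256))
requested = [ByteRange(0, 4), ByteRange(6, 10), ByteRange(200, 210)]
merged = coalesce_ranges(requested, max_gap=4)
chunks = asyncio.run(fetch_ranges(blob, requested, max_gap=4))
```

In this sketch the first two ranges are merged into one request because only two bytes separate them, while the third stays separate. The trade-off is a few wasted bytes per merged request in exchange for fewer round trips.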
Great work @kylebarron!
I suggest we see whether it makes any improvements first, so it's author's choice for now.
While @rabernat has seen some impressive perf improvements in some settings when making many requests with Rust's tokio runtime, which would possibly also trickle down to a Python binding, the biggest advantage I see is improved ease of use in installation. A common hurdle I've seen is handling dependency management, especially around boto3, aioboto3, etc. dependencies. Versions need to be compatible at runtime with any other libraries the user also has in their environment. And Python doesn't allow multiple versions of the same dependency at the same time in one environment. With a Python library wrapping a statically-linked Rust binary, you can remove all Python dependencies and remove this class of hardship. The underlying Rust object-store crate is stable and under open governance via the Apache Arrow project. We'll just have to wait on some discussion in object-store-python for exactly where that should live. I don't have an opinion myself on where this should live, but it should be on the order of 100 lines of code wherever it is (unless the v3 store API changes dramatically).
👍
I want to keep an open mind about what the core stores provided by Zarr-Python are. My current thinking is that we should just do a
This is no longer an issue; s3fs has much more relaxed deps than it used to. Furthermore, it's very likely to be already part of an installation environment.
I agree with that. I think it is beneficial to keep the number of dependencies of core zarr-python small. But I am open to discussion.
Sure! That is certainly useful.
itsgifnotjiff commented Feb 23, 2024
This is awesome work, thank you all!!!
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
The I'd like to update this PR soonish to use that library instead.
If the zarr group prefers object-store-rs, we can move it into the zarr-developers org, if you like. I would like to be involved in developing it, particularly if it can grow more explicit fsspec compatible functionality.
kylebarron commented Oct 22, 2024 • edited
I have a few questions because the
I like that
This came up in the discussion at https://github.com/zarr-developers/zarr-python/pull/2426/files/5e0ffe80d039d9261517d96ce87220ce8d48e4f2#diff-bb6bb03f87fe9491ef78156256160d798369749b4b35c06d4f275425bdb6c4ad. By default, it's passed as Does it look compatible with what you need?
Now I'm just trying to get the tests to pass (re #1661 (comment)) and we should be good. (I can't get the tests to pass locally anyways; I get
In
Planning to merge this tomorrow if there aren't any objections.
9e8b50a into zarr-developers:main
Thanks for the great work everyone!
kylebarron commented Mar 24, 2025 • edited
Thanks all! Just published obstore 0.6, which adds easier, automatic-token-refreshing integration with Planetary Computer. And I was able to get their zarr example working with this latest main!

```python
import matplotlib.pyplot as plt
import pystac_client
import xarray as xr
from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider
from obstore.store import AzureStore
from zarr.storage import ObjectStore

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1/"
)
collection = catalog.get_collection("daymet-daily-hi")
asset = collection.assets["zarr-abfs"]

# The PlanetaryComputerCredentialProvider automatically fetches Planetary
# Computer SAS tokens as necessary and refreshes them before they expire
credential_provider = PlanetaryComputerCredentialProvider.from_asset(asset)
azure_store = AzureStore(credential_provider=credential_provider)
zarr_store = ObjectStore(azure_store, read_only=True)

ds = xr.open_dataset(zarr_store, consolidated=True, engine="zarr")
fig, ax = plt.subplots(figsize=(12, 12))
ds.sel(time="2009")["tmax"].mean(dim="time").plot.imshow(ax=ax, cmap="inferno")
fig
```

uv pyproject.toml:

```toml
[project]
name = "zarr-obstore-pc"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "matplotlib>=3.10.1",
    "obstore>=0.6.0",
    "pystac-client>=0.8.6",
    "xarray>=2025.3.0",
    "zarr",
]

[tool.uv.sources]
zarr = { git = "https://github.com/zarr-developers/zarr-python" }

[dependency-groups]
dev = [
    "ipykernel>=6.29.5",
]
```
Huge props to @kylebarron and @maxrjones for sticking with this PR and getting it in! We'll get this out as part of Zarr 3.1. 👏 👏 👏 👏 👏
ilan-gold commented Mar 28, 2025 • edited
Hi, this PR is very exciting - I am curious, is the performance expected to be better than EDIT: I see #1661 (comment) - could be great to highlight this work in the docs!
Yes, I expect it to be significantly faster, but we don't have rigorous benchmarks yet. I'd love to see some Zarr benchmarks, and then maybe we can update the docs to reflect those.
itsgifnotjiff commented Mar 28, 2025
I am not sure if this is within the scope of your benchmarking, but if you can test the single point query times and performance for Zarr stores in the 100 TB range that would be great. Zarr v2 had problems with both the number of inodes required and performance in my experience. The groups/tree addition along with the explosion of large scale data means Zarr stores are either already performant enough or not performant at all for different use cases (geospatial in mine).
I don't personally use Zarr much, so ideally I want to enable other people to do benchmarking. But happy to pair or support in any way I can.
itsgifnotjiff commented Mar 28, 2025
Makes perfect sense. I hope I get to benchmark it later this year. I will link my potential findings here as well 😊. Thank you so much for your work.
Hey @itsgifnotjiff, Davis Bennett wrote a great blog post for Earthmover that explains the general improvements in Zarr V3 with opening 100TB range datasets (which accounts for much of the time of single point queries) - you can read that here. The obstore store offers further improvements, as shown below. Full details are available in https://github.com/maxrjones/zarr-obstore-performance.
itsgifnotjiff commented Apr 5, 2025
Thank you very much for this. I can't wait to see if these kinds of performance improvements also apply to pseudo zarrs (zarrs backed by our binary files).
@itsgifnotjiff what are "pseudo zarrs"? Is it similar to a virtual Zarr? https://github.com/zarr-developers/VirtualiZarr
itsgifnotjiff commented Apr 5, 2025
Yes, I am trying to see if I can create Icechunk Arrays and/or Zarr stores for Petabytes of binary format data. I work with Environment and Climate Change Canada where we have a wonderful binary format for NWP model outputs and like all the organisations I've talked to we cannot abandon it but if we can build on top of it .... (Bit like gribjump from ECMWF or even some slides from Icechunk).
Yes that's exactly the problem VirtualiZarr was built to solve.
Those slides are referring to VirtualiZarr, which has facility for writing "virtual" zarr chunks into Icechunk (see virtualizarr docs or icechunk docs).
Interesting - I hadn't heard of this. But this issue is closed - @itsgifnotjiff let's continue this discussion on the VirtualiZarr repo - perhaps on this issue (or feel free to open a new one).




A Zarr store based on obstore, a Python library that uses the Rust object_store crate under the hood. object-store is a Rust crate for interoperating with remote object stores like S3, GCS, Azure, etc. See the highlights section of its docs.

obstore maps async Rust functions to async Python functions, and is able to stream GET and LIST requests, which all make it a good candidate for use with the Zarr v3 Store protocol.

You should be able to test this branch with the latest version of obstore:

TODO: