A Torrent ZarrStore#3036

alxmrs started this conversation in Ideas
May 5, 2025 · 6 comments · 13 replies

Coming back from Cloud Native Geo, a major theme was how to build resilient data infrastructure (or a resilient ecosystem). A common idea that came up in many discussion groups was whether we could somehow store scientific data via BitTorrent. I don't think this idea is too far-fetched; we simply need to figure out how the Zarr protocol and p2p torrent protocols align.

Projects like https://academictorrents.com/ (https://github.com/academictorrents) seem to validate that a peer-to-peer, public, global store of data is highly desirable – especially in a time when critical medical and environmental data is potentially being lost.

This repo in their project, in particular, seems very xpublish-shaped: https://github.com/academictorrents/torrentify. A deeper look into the intersection of these protocols is warranted right now (maybe even urgently).

Update: @samapriya is definitely one of the people at the conference who first brought up this idea!

Is IPFS a plausible alternative?

https://ipfs.tech/

@alxmrs

I do think that block constraints make IPFS a deal-breaker for Zarr performance. However, IPLD is a really interesting protocol, and I think an integration with it for a torrent-based Zarr store would be really cool to see.

Thanks for sharing the dclimate project; it's exciting and validating to see such an effort!

@Faolain

Thanks so much for sharing, and it's exciting to see that others see value in this effort! On the performance note, what speeds are you looking to achieve? Without any optimizations, we've seen 10-20 mb/s when making requests to remote gateways and 260 mb/s when running on local IPFS nodes. We think there are many areas for improvement here, such as divide-and-conquer algorithms with RAPIDE.

@alxmrs

For climate, weather, and geospatial datasets, read speeds in the ballpark of 3 GBPS are what we're shooting for: https://earthmover.io/blog/icechunk#:~:text=new%20async%20API.-,performance,-At%20this%20stage

I don't know the network characteristics of torrents, but I imagine large block sizes would help achieve this goal.
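To make the block-size intuition concrete, here is a rough back-of-the-envelope sketch. It assumes "3 GBPS" means 3 gigabytes per second, and the block sizes listed are purely illustrative, not measurements of IPFS or any torrent client.

```python
# Back-of-the-envelope: block fetches per second needed to sustain a target read rate.
# Assumption: "3 GBPS" is read as 3 gigabytes per second (3e9 bytes/s).
TARGET_BYTES_PER_S = 3 * 10**9


def fetches_per_second(block_size_bytes: int, target: int = TARGET_BYTES_PER_S) -> float:
    """Number of block-sized fetches per second required to reach `target`."""
    return target / block_size_bytes


# Illustrative block sizes only; not taken from any real network.
BLOCK_SIZES = {
    "256 KiB": 256 * 1024,
    "1 MiB": 1024**2,
    "16 MiB": 16 * 1024**2,
}

for label, size in BLOCK_SIZES.items():
    print(f"{label:>8}: {fetches_per_second(size):,.0f} fetches/s")
```

Under these assumptions, a 64x larger block needs 64x fewer fetches per second for the same throughput, which is why per-request overhead matters less as blocks grow.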

@Faolain

Ah, fair enough. This is a good and, I think, achievable goal. I appreciate you sharing that link, and I hope to have something to share in the not-too-distant future. For reference, IPFS team members have reported speeds in the low GB/s range; the numbers I reported above were measured using the IPFS public network on a home network.

Edit: With a private cluster and/or an enterprise server pipe we expect to see speeds exceeding that of S3. tl;dr your download bandwidth can be completely saturated given enough peers.

@Faolain

I wasn't too familiar with BitTorrent w.r.t. its implementation, but it seems like the "block" size is essentially equivalent for torrents:

[Three screenshots, captured 2025-05-23]

With that said, I'm not sure what advantage torrents would have over IPFS (given the latter already exists), or rather what drawback of IPFS would necessitate a torrent store if block size is the concern. Open to any suggestions or ideas here though.

The file structure of .torrent seems very Zarr-like to me: https://en.wikipedia.org/wiki/Torrent_file#File_Structure
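As a rough illustration of that resemblance, the sketch below builds the `info` dictionary of a BitTorrent v1 multi-file torrent from a toy key-to-bytes mapping that stands in for a Zarr store. The helper name `torrent_info_from_store`, the toy keys, and the 16 KiB piece length are all assumptions made for the example; only the field names (`name`, `piece length`, `pieces`, `files`, `length`, `path`) come from the v1 metainfo format.

```python
import hashlib


def torrent_info_from_store(name, store, piece_length=16384):
    """Build a BitTorrent v1 multi-file `info` dict from a key -> bytes mapping.

    Hypothetical helper for illustration; `store` stands in for a Zarr store.
    """
    keys = sorted(store)
    # Each Zarr key becomes one file entry; the path is a list of components.
    files = [{"length": len(store[k]), "path": k.split("/")} for k in keys]
    # v1 torrents hash the concatenation of all files in fixed-size pieces.
    payload = b"".join(store[k] for k in keys)
    pieces = b"".join(
        hashlib.sha1(payload[i:i + piece_length]).digest()
        for i in range(0, len(payload), piece_length)
    )
    return {"name": name, "piece length": piece_length, "pieces": pieces, "files": files}


# Toy store: metadata documents plus two chunks (contents are placeholders).
store = {
    "zarr.json": b'{"zarr_format": 3, "node_type": "group"}',
    "temperature/zarr.json": b'{"node_type": "array"}',
    "temperature/c/0/0": b"\x00" * 40000,
    "temperature/c/0/1": b"\x01" * 40000,
}
info = torrent_info_from_store("example.zarr", store)
for entry in info["files"]:
    print("/".join(entry["path"]), entry["length"])
```

One difference worth noting: v1 pieces are cut from the concatenated payload, so a piece can straddle two chunk files, whereas Zarr addresses each chunk independently.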

I think aria2 may be a useful component for managing IO of torrent-based Zarr stores. According to the docs, it already has BitTorrent support: https://aria2.github.io/manual/en/html/aria2c.html#bittorrent-download
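A minimal sketch of how a store might drive aria2, assuming the `aria2c` binary is installed. The helper `aria2c_command` and the file names are hypothetical; the flags used (`--dir`, `--seed-time`, `--select-file`) are documented aria2c options, and the file indexes are the 1-based ones that `aria2c --show-files` prints. The sketch only builds the command; it does not run it.

```python
import shlex


def aria2c_command(torrent_path, out_dir, file_indexes=None, seed_minutes=0):
    """Build an aria2c invocation that downloads (a subset of) a torrent.

    Hypothetical helper: returns the argument list instead of executing it.
    """
    cmd = ["aria2c", f"--dir={out_dir}", f"--seed-time={seed_minutes}"]
    if file_indexes:
        # Restrict the download to selected files, e.g. only the chunks a read needs.
        cmd.append("--select-file=" + ",".join(str(i) for i in file_indexes))
    cmd.append(torrent_path)
    return cmd


cmd = aria2c_command("example.zarr.torrent", "/tmp/zarr", file_indexes=[1, 3])
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`. Note that `--seed-time=0` stops seeding once the download completes, which is convenient for a one-off copy but gives nothing back to the swarm.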

@samapriya

We literally had this discussion on academic torrents in our group, so it's exciting to see this form.

FWIW, a bunch of data has been backed up on sciop.net (example), and I've been using aria2 to grab certain datasets and copy them to object storage to make them more accessible, like the NCEI Estuarine Bathymetry: https://nbviewer.org/gist/rsignell/7339b3a4aa8d39eff4cd766e127aa77e

@alxmrs

Really interesting, Rich! The sciop.net catalog is really validating. Further, from reading your notebook, I wonder if what we need is not a new Zarr store but a torrent-based filesystem (say, in fsspec).
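To sketch what the filesystem-level idea could look like, here is a read-only key-to-bytes mapping over a directory that a torrent client downloads into. This is not fsspec's API; `TorrentDirMapping` and its `fetch` hook are hypothetical names, and the hook is faked with a local write so the example runs with the standard library only.

```python
import tempfile
from collections.abc import Mapping
from pathlib import Path


class TorrentDirMapping(Mapping):
    """Read-only key -> bytes view over a torrent download directory.

    `fetch` is a hypothetical hook: given a key, it should ask a torrent
    client to download that file into `root` and return once it is on disk.
    """

    def __init__(self, root, fetch=None):
        self.root = Path(root)
        self.fetch = fetch

    def __getitem__(self, key):
        path = self.root / key
        if not path.is_file() and self.fetch is not None:
            self.fetch(key)  # lazily pull the missing file from the swarm
        if not path.is_file():
            raise KeyError(key)
        return path.read_bytes()

    def __iter__(self):
        # Lists only files that are already on disk.
        for p in sorted(self.root.rglob("*")):
            if p.is_file():
                yield p.relative_to(self.root).as_posix()

    def __len__(self):
        return sum(1 for _ in self)


with tempfile.TemporaryDirectory() as tmp:
    def fake_fetch(key):
        # Stand-in for a real torrent download of one file.
        target = Path(tmp) / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(b"chunk-bytes")

    store = TorrentDirMapping(tmp, fetch=fake_fetch)
    data = store["temperature/c/0/0"]
    keys = list(store)

print(data, keys)
```

A real implementation would also need to list keys from the .torrent metadata rather than from disk, and to map byte ranges onto pieces; both are left out here.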

Would this torrent approach work hand-in-hand with Zarr stores that get extended (i.e., "Modifying existing Zarr stores")? Would that break the integrity of the torrent shared between peers?

@alxmrs

Sorry I missed this. Extending Zarr stores with torrents for p2p distribution is an interesting idea. Do I understand you correctly? Can you say more here?

@fmigneault

No problem. I am trying to understand whether using a Torrent ZarrStore would imply that the data represented this way is inherently read-only / constant, given that modifying it (e.g., incrementally adding new variables at a later time) would change the .torrent metadata and the pieces it contains, therefore making it a "different torrent"?
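The concern can be demonstrated with a small sketch. In BitTorrent v1, a torrent is identified by the SHA-1 of its bencoded `info` dictionary, so any change to the file list or piece hashes yields a new identifier. The helpers `bencode`, `make_info`, and `info_hash` below are minimal illustrative implementations, and the file names and contents are made up.

```python
import hashlib


def bencode(obj):
    """Minimal bencoder for ints, str/bytes, lists, and dicts."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode("utf-8")
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Dictionary keys are byte strings, emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in obj.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")


def make_info(files, piece_length=16384):
    """Build a v1 multi-file info dict from a name -> bytes mapping (toy data)."""
    names = sorted(files)
    payload = b"".join(files[n] for n in names)
    pieces = b"".join(
        hashlib.sha1(payload[i:i + piece_length]).digest()
        for i in range(0, len(payload), piece_length)
    )
    return {
        "name": "example.zarr",
        "piece length": piece_length,
        "pieces": pieces,
        "files": [{"length": len(files[n]), "path": n.split("/")} for n in names],
    }


def info_hash(info):
    """The v1 infohash: SHA-1 over the bencoded info dict."""
    return hashlib.sha1(bencode(info)).hexdigest()


base = {"zarr.json": b"{}", "temperature/c/0": b"\x00" * 20000}
extended = dict(base, **{"humidity/c/0": b"\x01" * 20000})  # add a new variable

before_hash = info_hash(make_info(base))
after_hash = info_hash(make_info(extended))
print(before_hash)
print(after_hash)
```

The two hashes differ, so peers holding the original torrent would not automatically see the extended store. They would need the new .torrent (or some layer above it that maps a stable name to the latest infohash).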

@alxmrs

I see what you mean. I am hoping for the Zarr stores to be writeable, but optimized for read access. It's possible that the torrent format is not amenable to writes/updates. If so, thanks for calling this out early.

That said, I bet there could be some interesting tricks for constancy and mutability under distributed replication.

I just got a tip from my boss: Torrent Zarr stores may have performance advantages related to higher available bandwidth. See the "transfer speed" section of https://web.archive.org/web/20090613010209/https://daniel.haxx.se/docs/bittorrent-vs-http.html

6 participants: @alxmrs, @alexgleith, @samapriya, @Faolain, @fmigneault, @rsignell