-
Coming back from Cloud Native Geo, a major theme was how to build resilient data infrastructure (or a resilient ecosystem). A common idea that came up in many discussion groups was whether we could somehow store scientific data via BitTorrent. I don't think this idea is too far-fetched; we simply need to figure out how the Zarr protocol and p2p torrent protocols align. Projects like https://academictorrents.com/ (https://github.com/academictorrents) seem to validate that a peer-to-peer, public, global store of data is highly desirable, especially at a time when critical medical and environmental data is potentially being lost. One repo in their project, in particular, seems very xpublish-shaped: https://github.com/academictorrents/torrentify. A deeper look into the intersection of these protocols is warranted right now (maybe even urgently).

Update: @samapriya is definitely one of the people at the conference who first brought up this idea!
Replies: 6 comments 13 replies
-
Is IPFS a plausible alternative?
-
I do think that block constraints make IPFS a deal-breaker for Zarr performance. However, IPLD is a really interesting protocol, and I think integrating it with a torrent-based Zarr store would be really cool to see. Thanks for sharing the dclimate project; it's exciting and validating to see such an effort!
-
Thanks so much for sharing, and it's exciting to see that others see value in this effort! On the performance note, what speeds are you looking to achieve? Without any optimizations, we've seen 10-20mb/s when making requests to remote gateways and 260mb/s when running on local IPFS nodes. We think there are many areas for improvement here, such as divide-and-conquer algorithms with RAPIDE.
-
For climate, weather, and geospatial datasets, read speeds in the ballpark of 3 GBPS are what we're shooting for: https://earthmover.io/blog/icechunk#:~:text=new%20async%20API.-,performance,-At%20this%20stage

I don't know the network characteristics of torrents, but I imagine large block sizes would help achieve this goal.
-
Ah, fair enough. This is a good and, I think, achievable goal. I appreciate you sharing that link and hope to have something to share in the not-too-distant future. For reference, IPFS team members have reported speeds in the low GB/s range; the numbers I reported above were measured using the IPFS public network on a home network.

Edit: With a private cluster and/or an enterprise server pipe, we expect to see speeds exceeding those of S3. tl;dr: your download bandwidth can be completely saturated given enough peers.
-
The file structure of
-
I think aria2 may be a useful component for managing the I/O of torrent-based Zarr stores. According to the docs, it already has BitTorrent support: https://aria2.github.io/manual/en/html/aria2c.html#bittorrent-download
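As a rough sketch of what driving aria2 from Python might look like: the helper names `build_aria2c_command` and `download`, the torrent file name, and the output directory below are all hypothetical. The flags (`--dir`, `--seed-time`, `--select-file`) are taken from the aria2c manual linked above, and this assumes `aria2c` is installed and on the `PATH`.

```python
import subprocess
from typing import Optional, Sequence


def build_aria2c_command(
    torrent_path: str,
    out_dir: str,
    file_indexes: Optional[Sequence[int]] = None,
    seed_minutes: Optional[int] = None,
) -> list:
    """Build an aria2c command line for fetching a torrent's payload.

    file_indexes: 1-based file indexes inside the torrent (for example,
    only the chunks of one variable); None downloads everything.
    seed_minutes: how long to keep seeding after the download finishes;
    None leaves aria2's default seeding behaviour untouched.
    """
    cmd = ["aria2c", f"--dir={out_dir}"]
    if file_indexes:
        cmd.append("--select-file=" + ",".join(str(i) for i in file_indexes))
    if seed_minutes is not None:
        cmd.append(f"--seed-time={seed_minutes}")
    cmd.append(torrent_path)
    return cmd


def download(torrent_path: str, out_dir: str, **kwargs) -> None:
    """Run aria2c and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_aria2c_command(torrent_path, out_dir, **kwargs), check=True)


# Hypothetical file names, for illustration only.
print(build_aria2c_command("dataset.zarr.torrent", "data", file_indexes=[1, 3]))
```

Selecting files by index could map onto fetching only some chunks of a store, though whether that is fast enough for interactive reads is an open question.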
-
We literally had this discussion about Academic Torrents in our group, so it's exciting to see this form.
-
FWIW, a bunch of data has been backed up on sciop.net (example), and I've been using aria2 to grab certain datasets and copy them to object storage to make them more accessible, like the NCEI Estuarine Bathymetry: https://nbviewer.org/gist/rsignell/7339b3a4aa8d39eff4cd766e127aa77e
-
Really interesting, Rich! The sciop.net catalog is really validating. Further, from reading your notebook, I wonder if what we need is not a new Zarr store but a torrent-based filesystem (say, in fsspec).
-
Would this torrent approach work hand-in-hand with Zarr stores that get extended (i.e., "Modifying existing Zarr stores")? Would that break the integrity of the torrent shared between peers?
-
Sorry I missed this. Extending Zarr stores with torrents for p2p distribution is an interesting idea. Do I understand you correctly? Can you say more here?
-
No problem. I am trying to understand whether using Torrent ZarrStore would imply that the data represented this way would be inherently read-only / constant, given that modifying it (e.g., incrementally adding new variables at a later time) would change the
-
I see what you mean. I am hoping for the Zarr stores to be writeable, but optimized for read access. It's possible that the torrent format is not amenable to writes/updates. If so, thanks for calling this out early. That said, I bet there could be some interesting tricks for constancy and mutability under distributed replication.
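To make the integrity question concrete, here is a toy sketch (not the torrent file format itself). In BitTorrent v1, the payload is split into fixed-size pieces, each verified by a SHA-1 hash, and a torrent is identified by a hash over metadata that includes those piece hashes. `toy_identifier` below is a hypothetical stand-in for that identifier, and the 7-byte piece length and fake chunk payloads are assumptions chosen purely for illustration.

```python
import hashlib

# Toy piece length so that each piece holds exactly one fake 7-byte "chunk".
# Real torrents use much larger pieces.
PIECE_LENGTH = 7


def piece_hashes(payload: bytes, piece_length: int = PIECE_LENGTH) -> list:
    """SHA-1 digest of every fixed-size piece of the payload."""
    return [
        hashlib.sha1(payload[i:i + piece_length]).digest()
        for i in range(0, len(payload), piece_length)
    ]


def toy_identifier(payload: bytes, piece_length: int = PIECE_LENGTH) -> str:
    """Hypothetical stand-in for a torrent identifier: a hash over all piece hashes."""
    return hashlib.sha1(b"".join(piece_hashes(payload, piece_length))).hexdigest()


original = b"chunk-0" + b"chunk-1"   # pretend these are two Zarr chunks
extended = original + b"chunk-2"     # the store grows by one chunk

old_hashes = piece_hashes(original)
new_hashes = piece_hashes(extended)

# Pieces that were already shared keep their hashes...
print(new_hashes[:len(old_hashes)] == old_hashes)
# ...but the identifier over the whole payload differs.
print(toy_identifier(original) == toy_identifier(extended))
```

Under these assumptions, appending piece-aligned data leaves the existing piece hashes intact but still yields a different identifier, so an extended store would likely have to be published as a new torrent even if peers could reuse the pieces they already hold.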
-
I just got a tip from my boss: Torrent Zarr stores may have performance advantages related to higher available bandwidth. See the "transfer speed" section of https://web.archive.org/web/20090613010209/https://daniel.haxx.se/docs/bittorrent-vs-http.html


