All-or-nothing generation of multiple Groups/Arrays/... with zarr: Possible Approaches? #3094

FabricioArendTorres started this conversation in General

Hi everyone,

Context

I'm working on a data format built on top of zarr, which uses zarr for array and metadata storage.
Part of this involves creating a zarr hierarchy and writing arrays, metadata, etc.
I would like to bundle these operations into transactions, so that I never end up in an inconsistent state after a failure (e.g. a system crash).

Question

How should I approach this in zarr?

Options I considered:

  1. Build the node somewhere else, then move it
    At a quick glance, the V2 copy implementations do not seem to be atomic,
    so first building the group node in a temporary group and then moving it does not solve the problem.
    I assume this also depends heavily on the store.

  2. Create Locks in zarr store
    Another approach would be to create lock files.
    My first idea would be to use temporary groups as lock files.
    Is there a nicer, store-agnostic approach to locking?
    I would like to avoid messing with the lower-level zarr metadata.
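
A minimal sketch of both options, assuming a local POSIX-like filesystem (not an object store) and using only the Python standard library. `exclusive_lock`, `build_then_publish` and the `meta.json` stand-in are hypothetical names, not zarr APIs. The lock relies on `os.open` with `O_CREAT | O_EXCL` failing when the file already exists; the publish step relies on a directory rename within a single filesystem being atomic, so readers see either no node or the complete node. In practice the `build` callback could open the staging directory with zarr (e.g. via `zarr.open_group`) and write the arrays there.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator


@contextmanager
def exclusive_lock(lock_path: Path) -> Iterator[None]:
    """Option 2 (hypothetical helper): a lock file created with O_EXCL.

    os.open raises FileExistsError if another writer already holds the lock.
    Only meaningful on a local filesystem; most object stores differ.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def build_then_publish(target: Path, build: Callable[[Path], None]) -> None:
    """Option 1 (hypothetical helper): build in a staging dir, then rename.

    The exists() check is not race-free on its own; hold exclusive_lock
    around this call if several writers may target the same path.
    """
    target = Path(target)
    if target.exists():
        raise FileExistsError(target)
    # Stage next to the target so the rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=".staging-", dir=target.parent))
    try:
        build(staging)              # write all metadata and arrays here
        os.rename(staging, target)  # the single atomic "commit"
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise


def write_node(path: Path) -> None:
    # Hypothetical stand-in for creating a zarr group and writing arrays.
    (path / "meta.json").write_text("{}")
    (path / "data.bin").write_bytes(b"\x00" * 16)


root = Path(tempfile.mkdtemp())
with exclusive_lock(root / ".write.lock"):
    build_then_publish(root / "my_group", write_node)
```

A hard crash in the middle of `build` leaves only a `.staging-*` directory, which never becomes visible under the target name but does need cleaning up later. None of this carries over to S3-style stores, which do not offer an atomic directory rename.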

Other potential issues?

I guess that in all cases I might run into issues with the async design of zarr v3, or have to force synchronization for the transactions.
Happy to hear any opinions on this.

Thank you and best regards,
Fabricio


Replies: 1 comment 1 reply


Icechunk, which builds on top of zarr, provides exactly this.

Create Locks in zarr store

This might not be an option, depending on which file or object storage system you're using. Object stores like S3 don't provide atomic multi-object updates, so Zarr alone isn't enough. Consolidated metadata can help with the subset of use cases where you only ever append new data (since the array data can be written ahead of time, and the update to the consolidated metadata file is atomic).
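
To make the append-only pattern concrete, here is a local-filesystem sketch using only the Python standard library. `manifest.json`, `read_manifest` and `append_chunk` are hypothetical stand-ins, not zarr's consolidated-metadata format or API; `os.replace` on a single local file plays the role that a single-object write plays on an object store. It assumes a single writer: two concurrent appenders could both read the old manifest, and one of the appends would be lost.

```python
import json
import os
import tempfile
from pathlib import Path

MANIFEST = "manifest.json"  # hypothetical stand-in for consolidated metadata


def read_manifest(root: Path) -> dict:
    """Readers consult only the manifest, never a directory listing."""
    path = root / MANIFEST
    if not path.exists():
        return {"chunks": []}
    return json.loads(path.read_text())


def append_chunk(root: Path, name: str, payload: bytes) -> None:
    # Step 1: write the new data. Nothing references it yet, so a crash
    # here leaves at most an orphaned file that readers never see.
    (root / name).write_bytes(payload)

    # Step 2: build the new manifest from the current one.
    manifest = read_manifest(root)
    manifest["chunks"].append(name)

    # Step 3: publish with one atomic replace of a single file.
    fd, tmp = tempfile.mkstemp(dir=root, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(manifest, f)
            f.flush()
            os.fsync(f.fileno())  # flush the contents to disk before the swap
        os.replace(tmp, root / MANIFEST)  # readers see old or new, never half
    except BaseException:
        Path(tmp).unlink(missing_ok=True)
        raise


root = Path(tempfile.mkdtemp())
append_chunk(root, "chunk-0", b"\x01\x02")
append_chunk(root, "chunk-1", b"\x03\x04")
print(read_manifest(root))  # {'chunks': ['chunk-0', 'chunk-1']}
```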

But icechunk is probably the way to go.

1 reply
@FabricioArendTorres

Oh, icechunk seems really cool and very fitting for that, thank you!

Category: General
Labels: None yet
Participants (2): @FabricioArendTorres, @TomAugspurger
