Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[WIP] Fix race condition in concurrent artifact additions#27574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
rhatdan wants to merge1 commit intocontainers:main
base:main
Choose a base branch
Loading
fromrhatdan:artifact

Conversation

@rhatdan
Copy link
Member

@rhatdanrhatdan commentedNov 20, 2025
edited
Loading

This fixes a race condition where concurrent 'podman artifact add' commands for different artifacts would result in only one artifact being created, without any error messages.

The root cause was in the artifact store's Add() method, which would:

  1. Acquire lock
  2. Read OCI layout index
  3. Create ImageDestination (which snapshots the index)
  4. RELEASE lock (optimization for blob copying)
  5. Copy blobs (while unlocked)
  6. Reacquire lock
  7. Commit changes (write new index)

When two concurrent additions happened:

  • Process A: Lock → Read index → Create dest A → Unlock → Copy blobs
  • Process B: Lock → Read index (no artifact A!) → Create dest B → Unlock
  • Process A: Lock → Commit (write index with A)
  • Process B: Lock → Commit (write index with B, OVERWRITING A)

The fix keeps the lock held for the entire operation. While this reduces concurrency for blob copying, it prevents the index file corruption that caused artifacts to be lost.

Changes:

  • Remove lock release/reacquire around blob copying in store.Add()
  • Simplify lock management (no more conditional unlock)
  • Add e2e test for concurrent artifact additions
  • Add standalone test script to verify the fix

Fixes:#27569

Generated-with: Cursor AI

Checklist

Ensure you have completed the following checklist for your pull request to be reviewed:

  • Certify you wrote the patch or otherwise have the right to pass it on as an open-source patch by signing all
    commits. (git commit -s). (If needed, usegit commit -s --amend). The author email must match
    the sign-off email address. SeeCONTRIBUTING.md
    for more information.
  • Referenced issues usingFixes: #00000 in commit message (if applicable)
  • Tests have been added/updated (or no tests are needed)
  • Documentation has been updated (or no documentation changes are needed)
  • All commits passmake validatepr (format/lint checks)
  • Release note entered in the section below (orNone if no user-facing changes)

Does this PR introduce a user-facing change?

podman artifact add can now run without race conditions on adds.

@openshift-ci
Copy link
Contributor

[APPROVALNOTIFIER] This PR isNOT APPROVED

This pull-request has been approved by:rhatdan
Once this PR has been reviewed and has the lgtm label, please assignmtrmac for approval. For more information seethe Code Review Process.

The full list of commands accepted by this bot can be foundhere.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing/approve in a comment
Approvers can cancel approval by writing/approve cancel in a comment

@baude
Copy link
Member

@rhatdan you know you cannot do this in podman, it has to be done in common

@packit-as-a-service
Copy link

[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

@mheon
Copy link
Member

This is going to badly hurt usability - artifact addition on large artifacts can be slow, and now it's going to block all other artifact operations. The locking@Luap99 is doing for pulls could potentially be adapted to improve matters, but the amount of effort he had to put in to get that working was substantial, and this is a different enough codepath that I don't think much would carry over from that work.

@mheon
Copy link
Member

Not that we shouldn't merge this, once it's migrated to container-libs... Just that it's going to suck once we do.

Not sure how I feel about the test, honestly. Is the race 100% consistent on main? I don't like deterministic tests, if it doesn't fail with 100% consistency, we can break without realizing it, and it just shows up as a flake after the fact.

@Luap99
Copy link
Member

This is going to badly hurt usability - artifact addition on large artifacts can be slow, and now it's going to block all other artifact operations. The locking@Luap99 is doing for pulls could potentially be adapted to improve matters, but the amount of effort he had to put in to get that working was substantial, and this is a different enough codepath that I don't think much would carry over from that work.

Yeah this is going to suck. IMO the design to use the oci layout as backing store was not to great as we funnel all updates thought the index.json writes where in many cases a more parallel approach could have been possible I suppose. The content addressed sha based file store does make sense for me overall but having the metdata in this one file is causing to much contention and since we just slapped locking on top without fine grade locking around index.json we are in this situation now.

Like in many other parts of our codebase when doing state updates once must take the lock, then read the state from disk, then update state and write it back to disk and only then unlock.

So what we did wrong here is that the state was no loaded back from disk as such we wrote an older in memory kept copy back to disk, so the actual fix should be to re-read the store which is more or less what my unlocked layer add work in storages comes down to.


we do have a some parallel testing for a few things, i.e. this one

@test"podman parallel build should not race" {

I don't mind such tests as long as they actually somewhat reliable reproduce the current issue and it help to safe guard again other bugs in the future. If this actually do flakes we know the fix didn't work and can always disable the test until we can fix the code.

@rhatdan
Copy link
MemberAuthor

@rhatdan you know you cannot do this in podman, it has to be done in common

Cursor, did it, I will take a look and move it to container-libs

This fixes a race condition where concurrent 'podman artifact add'commands for different artifacts would result in only one artifactbeing created, without any error messages.The root cause was in the artifact store's Add() method, whichwould:1. Acquire lock2. Read OCI layout index3. Create ImageDestination (which snapshots the index)4. RELEASE lock (optimization for blob copying)5. Copy blobs (while unlocked)6. Reacquire lock7. Commit changes (write new index)When two concurrent additions happened:- Process A: Lock → Read index → Create dest A → Unlock → Copy blobs- Process B: Lock → Read index (no artifact A!) → Create dest B → Unlock- Process A: Lock → Commit (write index with A)- Process B: Lock → Commit (write index with B, OVERWRITING A)The fix keeps the lock held for the entire operation. While thisreduces concurrency for blob copying, it prevents the index filecorruption that caused artifacts to be lost.Changes:- Remove lock release/reacquire around blob copying in store.Add()- Simplify lock management (no more conditional unlock)- Add e2e test for concurrent artifact additions- Add standalone test script to verify the fixFixes:containers#27569Generated-with: Cursor AISigned-off-by: Daniel J Walsh <dwalsh@redhat.com>
@rhatdanrhatdan changed the titleFix race condition in concurrent artifact additions[WIP] Fix race condition in concurrent artifact additionsNov 21, 2025
@openshift-ciopenshift-cibot added the do-not-merge/work-in-progressIndicates that a PR should not merge because it is a work in progress. labelNov 21, 2025
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

do-not-merge/work-in-progressIndicates that a PR should not merge because it is a work in progress.release-note

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Running two podman artifact adds at the same time, fails to create both artifacts with no errors.

4 participants

@rhatdan@baude@mheon@Luap99

[8]ページ先頭

©2009-2025 Movatter.jp