Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix: send prebuild job notification after job build db commit#20693

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
ssncferreira wants to merge1 commit intomain
base:main
Choose a base branch
Loading
fromssncferreira/fix-prebuild-job-notification-race

Conversation

@ssncferreira
Copy link
Contributor

@ssncferreirassncferreira commentedNov 10, 2025
edited
Loading

Problem

Fix race condition in prebuilds reconciler. Previously, a job notification event was sent inside the provisioning database transaction. This means that if a provisioner daemon received the notification and queried for the job before the transaction commits, it won't find the job in the database.

This manifested as a flake failure inTestReinitializeAgent, where provisioners would occasionally miss newly created jobs.

The solution consists in moving the job notification outside the database transaction to ensure it's only sent after the transaction commits.

Changes

  • Theprovision() andprovisionDelete() functions now return the provisioner job instead of sending notifications internally.
  • A newpublishProvisionerJob() helper centralizes the notification logic and is called after each transaction completes.

Closes:coder/internal#963

@ssncferreirassncferreiraforce-pushed thessncferreira/fix-prebuild-job-notification-race branch 5 times, most recently fromc31db3d to2c77682CompareNovember 10, 2025 13:48
@ssncferreirassncferreiraforce-pushed thessncferreira/fix-prebuild-job-notification-race branch from2c77682 tof161c82CompareNovember 10, 2025 14:28
@ssncferreirassncferreira marked this pull request as ready for reviewNovember 10, 2025 14:33
Comment on lines +746 to +748
// Publish provisioner job event to notify the acquirer that a new job was posted
c.publishProvisionerJob(ctx,provisionerJob,prebuiltWorkspaceID)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Should we only do this ifprovisionerJob is not nil?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

provisionerJob should never benil, we already check it inprovision and return an error in that case:

ifprovisionerJob==nil {
// This should not happen, builder.Build() should either return a job or an error.
// Returning an error to fail fast if we hit this unexpected case.
returnnil,xerrors.Errorf("provision succeeded but returned no job")
}

Nevertheless, I'm also checking inpublishProvisionerJob if the job isnil, just in case:

ifprovisionerJob==nil {
return
}

// Publish provisioner job events to notify the acquirer that new jobs were posted
forworkspaceID,job:=rangeprovisionerJobs {
c.publishProvisionerJob(ctx,job,workspaceID)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If the same channel name is signaled multiple times with identical payload strings within the same transaction, only one instance of the notification event is delivered to listeners.

Link

We are now publishing these job events outside of the transaction, so we don't get this automatic de-duping. I'm not sure if this will cause issues with larger deployments?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

AFAIK, we were never publishing the events within the transaction. Previously, we would send the job to theprovisionNotifyCh channel, which would then publish the job outside the transactionhttps://github.com/coder/coder/blob/main/enterprise/coderd/prebuilds/reconcile.go#L164
Therefore, this deduplication was not happening in the previous implementation as well. The only difference here is that we guarantee that we only publish the job after the transaction commits, ensuring provisioners can see the job when they query the database.

Also, there seems to be a rule to avoid calling publish within a transaction:https://github.com/coder/coder/blob/8274251f/scripts/rules.go#L143

Nevertheless, this is a good point about deduping. Because it seems the job is published as (link):

msg, err := json.Marshal(JobPosting{    OrganizationID:  job.OrganizationID,    ProvisionerType: job.Provisioner,    Tags:            job.Tags,})

Meaning that all of these canceled jobs would have the same payload. So theoretically, we only need to send one job event, right? 🤔

Copy link
Member

@johnstcnjohnstcnNov 10, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Therefore, this deduplication was not happening in the previous implementation as well.

Fair enough 👍 Just wanted to make sure this was called out.

Also, there seems to be a rule to avoid calling publish within a transaction:https://github.com/coder/coder/blob/8274251f/scripts/rules.go#L143

I completely forgot about that 😁 good callout!

Meaning that all of these canceled jobs would have the same payload. So theoretically, we only need to send one job event, right? 🤔

That's how I understand it, yes!provisionerdserver.Acquirer listens to allprovisioner_job_posted events and 'wakes up' its provisioner if the tags match.

EDIT: now that I think about it more, I'm not sure that pubsub events actually impact canceled jobs.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@johnstcnjohnstcnjohnstcn left review comments

@SasSwartSasSwartAwaiting requested review from SasSwart

At least 1 approving review is required to merge this pull request.

Assignees

@ssncferreirassncferreira

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

flake: TestReinitializeAgent/#0

3 participants

@ssncferreira@johnstcn

[8]ページ先頭

©2009-2025 Movatter.jp