- Notifications
You must be signed in to change notification settings - Fork1.1k
feat: cancel pending prebuilds from non-active template versions#20387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
f1a8e32 to9cfb23aCompare9cfb23a to29cc88eCompare73cf135 to51fa3efCompareUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
johnstcn left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Mostly LGTM, but I'm not 100% convinced about the row locking?
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
coderd/database/dbauthz/dbauthz.go Outdated
| func (q*querier)UpdatePrebuildProvisionerJobWithCancel(ctx context.Context,arg database.UpdatePrebuildProvisionerJobWithCancelParams) ([]uuid.UUID,error) { | ||
| // This is a system-only operation for canceling pending prebuild-related jobs | ||
| // when a new template version is promoted. | ||
| iferr:=q.authorizeContext(ctx,policy.ActionRead,rbac.ResourceSystem);err!=nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
note(non-blocking): just want to noterbac.ResourceSystem usage here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
If we givesubjectPrebuildsOrchestrator the ability to read/update provisioner jobs, we should be able to userbac.ResourceProvisionerJob here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yeah, permissions are a bit tricky here.
There is aResourceProvisionerJobs that we could use here, but that would mean that all users with this permissions would be abel to execute it.
The way it is done forUpdateProvisionerJobWithCancelByID, for instance, is to check theUpdate permissions on the workspace. Since it is a prebuild, we could also do aif err := q.authorizeContext(ctx, policy.ActionUpdate, rbac.ResourcePrebuiltWorkspace); err != nil { Maybe this one makes the most sense 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That's not a bad approximation actually 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Added therbac.ResourcePrebuiltWorkspace update check indcb6de1
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
ssncferreira commentedOct 22, 2025
Note: Moving this PR back to draft. After internal discussion, we've decided to move the logic to cancel pending prebuilds to the prebuilds reconciliation loop rather than handling it in the This PR will be updated to reflect these changes:
The PR will be marked ready for review once the new implementation is in place. |
…cel-pending-prebuilds
…cel-pending-prebuilds
coderd/database/dbauthz/dbauthz.go Outdated
| func (q*querier)UpdatePrebuildProvisionerJobWithCancel(ctx context.Context,arg database.UpdatePrebuildProvisionerJobWithCancelParams) ([]uuid.UUID,error) { | ||
| // This is a system-only operation for canceling pending prebuild-related jobs | ||
| // when a new template version is promoted. | ||
| iferr:=q.authorizeContext(ctx,policy.ActionRead,rbac.ResourceSystem);err!=nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
If we givesubjectPrebuildsOrchestrator the ability to read/update provisioner jobs, we should be able to userbac.ResourceProvisionerJob here.
| -- Cancels all pending provisioner jobs for prebuilt workspaces on a specific preset from an | ||
| -- inactive template version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit: This comment is slightly misleading, the query doesn't check that the preset relates to an inactive template version. It's currently the responsibility of the caller to verify that@preset_id is not related to the current active version ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That is a good catch! I've updated the query to filter out active template versions:0497f8b
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
| funcTestCancelPendingPrebuilds(t*testing.T) { | ||
| t.Parallel() | ||
| for_,tt:=range []struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit, non-blocking: Since these tests all boil down to "only these jobs should be canceled and nothing else", I think we could collapse this to a single test case with all of the "should not cancel" data in one. The rationale for this is twofold:
- It reduces the number of test databases we have to create
- It gives us better assurance that only the jobs we care about are being canceled in the presence of more antagonist data.
SasSwart left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
One nit/question. Otherwise, looks good!
Uh oh!
There was an error while loading.Please reload this page.
f6e86c6 intomainUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Description
This PR introduces an optimization to automatically cancel pending prebuild-related jobs from non-active template versions in the reconciliation loop.
Problem
Currently, when a template is configured with more prebuild instances than available provisioners, the provisioner queue can become flooded with pending prebuild jobs. This issue is worsened when provisioning/deprovisioning operations take a long time.
When the prebuild reconciliation loop generates jobs faster than provisioners can process them, pending jobs accumulate in the queue. Since prebuilt workspaces should always run the latest active template version, pending prebuild jobs from non-active versions become obsolete once a new version is promoted.
Solution
The reconciliation loop cancels pending prebuild-related jobs from non-active template versions that match the following criteria:
pendingworker_idisNULL)startThis prevents the queue from being cluttered with stale prebuild jobs that would provision workspaces on an outdated template version that would consequently need to be deprovisioned.
Changes
CountPendingNonActivePrebuildsto identify presets with pending jobs from non-active versionsUpdatePrebuildProvisionerJobWithCancelto cancel jobs for a specific presetActionTypeCancelPendinghandles the cancellation logicFollow-up PR
Canceling pending prebuild jobs leaves workspaces in a Canceled state. While no Terraform resources need to be destroyed (since jobs were canceled before provisioning started), these database records should still be cleaned up. This will be addressed in a follow-up PR.
Closes:#20242