- Notifications
You must be signed in to change notification settings - Fork1k
feat: allow users to pause prebuilt workspace reconciliation#18700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM so far, just missing the final piece of plumbing
…whether prebuilds are currently paused
johnstcn commentedJul 2, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
EDIT: prior art exists with |
MetricEligibleGauge=namespace+"eligible" | ||
MetricPresetHardLimitedGauge=namespace+"preset_hard_limited" | ||
MetricLastUpdatedGauge=namespace+"metrics_last_updated" | ||
MetricReconciliationPausedGauge=namespace+"reconciliation_paused" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Paused is difficult to reason about; I'd suggest inverting and make it_running
so that a good value is 1 and a bad value is 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
paused
in this case has a technical meaning relating to a specific setting. If we rename this torunning
then I think we need to change the setting to be{"reconciliation_running": true}
, which doesn't communicate the intent of the API clearly.
I'd like to keep it how it currently is, but as compensatory measures I will:
- update the metric documentation to be clear about its meaning
- add a second metric to represent a general "check engine light" for prebuilds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This will be pretty handy, LGTM!
return | ||
} | ||
ifbytes.Equal(settingsJSON, []byte(currentSettingsJSON)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Nit: not a problem now but when we add more settings, map ordering may cause this check to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
good catch.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
require.Len(t,workspaces,2,"should have created 2 prebuilds") | ||
// Now pause prebuilds reconciliation | ||
err=db.UpsertPrebuildsSettings(ctx,`{"reconciliation_paused": true}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'd suggest going via the API here to avoid duplicating this business logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
good point
ssncferreira left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM 🎉 Thanks for creating this.
There is just a question about rbac: with this change we require the permissionrbac.ResourceDeploymentConfig
, so that means that only the owner of the resouce or a user with roleauditorRole
can upsert this setting. Should we extend it to Admins?
Uh oh!
There was an error while loading.Please reload this page.
} | ||
func (q*querier)UpsertPrebuildsSettings(ctx context.Context,valuestring)error { | ||
iferr:=q.authorizeContext(ctx,policy.ActionUpdate,rbac.ResourceDeploymentConfig);err!=nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Just to clarify here on the rbac side, this means that only a user with this permission can update the setting. Fromhttps://github.com/coder/coder/blob/main/coderd/rbac/roles.go currently, only theauditorRole
has this permission. Is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Auditor only has read though?
Line 327 in8a69f6a
ResourceDeploymentConfig.Type: {policy.ActionRead}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yes, good catch! 👀 So it means that only the owner of theResourceDeploymentConfig
can update the settings, right? Should we maybe extend it to the Admins? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't think there exists any other suitable role that should have permission to edit site-wide deployment config? Our choices are site-wide template admin or site-wide user admin, neither of which feel right to me. Site-wide auditor should only have read permissions, and all of the org-scoped admins are out of site-wide scope.
This might be something to explore in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yes, this could definitely be addressed in a separate PR. You might be right that adding the permission to admin roles isn’t the best approach. My main concern was that currently only users with theOwner
role can update this setting. In our dogfood environment, this works fine since many users have theOwner
role, but this is not the typical case.
On second thought, limiting this permission to theOwner
role does make sense, since it’s an important command and users should be fully aware of the implications when using it.
typePrebuildsSettingsstruct { | ||
ID uuid.UUID`db:"id" json:"id"` | ||
ReconciliationPausedbool`db:"reconciliation_paused" json:"reconciliation_paused"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Do you think it would also be useful to add a timestamp?
ifbytes.Equal(settingsJSON, []byte(currentSettingsJSON)) { | ||
// See: https://www.rfc-editor.org/rfc/rfc7232#section-4.1 | ||
httpapi.Write(ctx,rw,http.StatusNotModified,nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
nit: according to the RFC the 304 Not Modified is normally a response to a GET request.
Maybe a 204 No Content makes more sense in this case, wdyt?https://datatracker.ietf.org/doc/html/rfc7231#section-6.3.5
returnxerrors.Errorf("unable to pause prebuilds: %w",err) | ||
} | ||
_,_=fmt.Fprintln(inv.Stderr,"Prebuilds are now paused.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why not?
_,_=fmt.Fprintln(inv.Stderr,"Prebuilds are now paused.") | |
_,_=fmt.Fprintln(inv.Stdout,"Prebuilds are now paused.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
My reasoning was that this command doesn't actually produce any output of its own.
It reports its own status, but that's more logging than output.
assert.Contains(t,buf.String(),"Prebuilds are now paused.") | ||
// Verify the settings were actually updated | ||
//nolint:gocritic // Only owners can change deployment settings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why the//nolint:gocritic
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
There's a rule to check that we test with non-owner roles as they're usually more appropriate. In this case, owner is the appropriate role.
err=db.UpsertPrebuildsSettings(ctx,`{"reconciliation_paused": false}`) | ||
require.NoError(t,err) | ||
// Run reconciliation again - it should now recreate the prebuilds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Nice test 🤩
01163ea
intomainUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
This PR provides two commands:
coder prebuilds pause
coder prebuilds resume
These allow the suspension of all prebuilds activity, intended for use if prebuilds are misbehaving.