- Notifications
You must be signed in to change notification settings - Fork905
fix: reduce cost of prebuild failure#17697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation
ab27056
to3ef469a
Compare1e9c551
tofcb3e5d
CompareThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Query looks good, we need more visibility of this problem and its remediation, though.
func (q *querier) GetPresetsAtFailureLimit(ctx context.Context, hardLimit int64) ([]database.GetPresetsAtFailureLimitRow, error) { | ||
// GetPresetsAtFailureLimit returns a list of template version presets that have reached the hard failure limit. | ||
// Request the same authorization permissions as GetPresetsBackoff, since the methods are similar. | ||
if err := q.authorizeContext(ctx, policy.ActionViewInsights, rbac.ResourceTemplate.All()); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I still think "insights" is a weird auth context to use here.
Why did we choose this again@SasSwart?
evgeniy-scherbinaMay 12, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@dannykopping
Basically we have 3 actions:
Read
Template.All()Use
Template.All()ViewInsights
Template.All()
Read
is for reading the content of templateUse
is for creating workspacesViewInsights
is for gathering stats & metrics across templates
Looks likeGetPresetsBackoff
falls intoViewInsights
category - because we gathering statistics across all template presets, calculating number of failed builds, etc...
I had a discussion with Steven regarding this, but I'm open to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Repurposing other RBAC roles just because they overlap logically feels like a bad design choice.
Each feature should have its own RBAC policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We can introducenew resource type
here:https://github.com/coder/coder/blob/main/coderd/rbac/object_gen.go
Then use it like this:
iferr:=q.authorizeContext(ctx,policy.ActionViewInsights,rbac.ResourcePrebuilds.All());err!=nil {}
Optionally introduce newaction type
as well, and use it like this:
iferr:=q.authorizeContext(ctx,policy.ActionCollectStatistics,rbac.ResourcePrebuilds.All());err!=nil {}
But I think we don't want to introduce too many action types.
cc@Emyrk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We can do this in a follow-up, by the way 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't have all the context for prebuilds. Can we throw a quick disc chat on the calendar?
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
e88e301
to0fdd096
Compareab90612
to5c60065
CompareThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looking good so far, but I don't see any metric added which we previously discussed.
Given the size of the PR, it can be a follow-up.
coderd/database/migrations/000328_prebuild_failure_limit_notification.up.sql OutdatedShow resolvedHide resolved
Uh oh!
There was an error while loading.Please reload this page.
coderd/database/migrations/000328_prebuild_failure_limit_notification.up.sql OutdatedShow resolvedHide resolved
Uh oh!
There was an error while loading.Please reload this page.
coderd/database/migrations/000328_prebuild_failure_limit_notification.up.sqlShow resolvedHide resolved
Uh oh!
There was an error while loading.Please reload this page.
coderd/database/migrations/000328_prebuild_failure_limit_notification.up.sql OutdatedShow resolvedHide resolved
Uh oh!
There was an error while loading.Please reload this page.
coderd/database/migrations/000329_add_status_to_template_presets.up.sql OutdatedShow resolvedHide resolved
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM 👍
coderd/database/migrations/000328_prebuild_failure_limit_notification.up.sqlShow resolvedHide resolved
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
53e8e9c
intomainUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Relates to#17432
Part 1:
Notes:
GetPresetsAtFailureLimit
SQL query is added, which is similar toGetPresetsBackoff
, they use same CTEs:filtered_builds
,time_sorted_builds
, but they are still different.Query is executed on every loop iteration. We can consider marking specific preset as permanently failed as an optimization to avoid executing query on every loop iteration. But I decided don't do it for now.
By default
FailureHardLimit
is set to 3.FailureHardLimit
is configurable. Setting it to zero - means that hard limit is disabled.Part 2
Notes:
PrebuildFailureLimitReached
notification is added.log.Warn
on every loop iteration.validation_failed
not used in this PR, but I think it will be used in next one, so I wanted to save us an extra migration.Latest notification views: