Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix: reduce cost of prebuild failure#17697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
evgeniy-scherbina merged 51 commits intomainfrom17432-limit-prebuild-failure-cost
May 21, 2025

Conversation

evgeniy-scherbina
Copy link
Contributor

@evgeniy-scherbinaevgeniy-scherbina commentedMay 6, 2025
edited
Loading

Relates to#17432

Part 1:

Notes:

  • GetPresetsAtFailureLimit SQL query is added, which is similar toGetPresetsBackoff, they use same CTEs:filtered_builds,time_sorted_builds, but they are still different.

  • Query is executed on every loop iteration. We can consider marking specific preset as permanently failed as an optimization to avoid executing query on every loop iteration. But I decided don't do it for now.

  • By defaultFailureHardLimit is set to 3.

  • FailureHardLimit is configurable. Setting it to zero - means that hard limit is disabled.

Part 2

Notes:

  • PrebuildFailureLimitReached notification is added.
  • Notification is sent to template admins.
  • Notification is sent only the first time, when hard limit is reached. But it willlog.Warn on every loop iteration.
  • I introduced this enum:
CREATETYPEprebuild_statusAS ENUM ('normal',-- Prebuilds are working as expected; this is the default, healthy state.'hard_limited',-- Prebuilds have failed repeatedly and hit the configured hard failure limit; won't be retried anymore.'validation_failed'-- Prebuilds failed due to a non-retryable validation error (e.g. template misconfiguration); won't be retried.);

validation_failed not used in this PR, but I think it will be used in next one, so I wanted to save us an extra migration.

  • Notification looks like this:
image

Latest notification views:

imageimage

@evgeniy-scherbinaevgeniy-scherbinaforce-pushed the17432-limit-prebuild-failure-cost branch from1e9c551 tofcb3e5dCompareMay 9, 2025 21:40
@evgeniy-scherbinaevgeniy-scherbina marked this pull request as ready for reviewMay 9, 2025 21:40
Copy link
Contributor

@dannykoppingdannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Query looks good, we need more visibility of this problem and its remediation, though.

func (q *querier) GetPresetsAtFailureLimit(ctx context.Context, hardLimit int64) ([]database.GetPresetsAtFailureLimitRow, error) {
// GetPresetsAtFailureLimit returns a list of template version presets that have reached the hard failure limit.
// Request the same authorization permissions as GetPresetsBackoff, since the methods are similar.
if err := q.authorizeContext(ctx, policy.ActionViewInsights, rbac.ResourceTemplate.All()); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I still think "insights" is a weird auth context to use here.
Why did we choose this again@SasSwart?

Copy link
ContributorAuthor

@evgeniy-scherbinaevgeniy-scherbinaMay 12, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@dannykopping
Basically we have 3 actions:

  • Read Template.All()
  • Use Template.All()
  • ViewInsights Template.All()

Read is for reading the content of template
Use is for creating workspaces
ViewInsights is for gathering stats & metrics across templates

Looks likeGetPresetsBackoff falls intoViewInsights category - because we gathering statistics across all template presets, calculating number of failed builds, etc...

I had a discussion with Steven regarding this, but I'm open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Repurposing other RBAC roles just because they overlap logically feels like a bad design choice.
Each feature should have its own RBAC policy.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We can introducenew resource type here:https://github.com/coder/coder/blob/main/coderd/rbac/object_gen.go

Then use it like this:

iferr:=q.authorizeContext(ctx,policy.ActionViewInsights,rbac.ResourcePrebuilds.All());err!=nil {}

Optionally introduce newaction type as well, and use it like this:

iferr:=q.authorizeContext(ctx,policy.ActionCollectStatistics,rbac.ResourcePrebuilds.All());err!=nil {}

But I think we don't want to introduce too many action types.

cc@Emyrk

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We can do this in a follow-up, by the way 👍

evgeniy-scherbina reacted with thumbs up emoji
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't have all the context for prebuilds. Can we throw a quick disc chat on the calendar?

@evgeniy-scherbinaevgeniy-scherbinaforce-pushed the17432-limit-prebuild-failure-cost branch frome88e301 to0fdd096CompareMay 12, 2025 19:26
@evgeniy-scherbinaevgeniy-scherbina changed the titlefix: limit prebuild failure costfix: reduce cost of prebuild failureMay 14, 2025
@evgeniy-scherbinaevgeniy-scherbinaforce-pushed the17432-limit-prebuild-failure-cost branch fromab90612 to5c60065CompareMay 15, 2025 18:37
Copy link
Contributor

@dannykoppingdannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looking good so far, but I don't see any metric added which we previously discussed.
Given the size of the PR, it can be a follow-up.

Copy link
Contributor

@dannykoppingdannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM 👍

@evgeniy-scherbinaevgeniy-scherbina merged commit53e8e9c intomainMay 21, 2025
37 of 39 checks passed
@evgeniy-scherbinaevgeniy-scherbina deleted the 17432-limit-prebuild-failure-cost branchMay 21, 2025 19:16
@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsMay 21, 2025
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@EmyrkEmyrkEmyrk left review comments

@dannykoppingdannykoppingdannykopping approved these changes

@SasSwartSasSwartAwaiting requested review from SasSwart

Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

3 participants
@evgeniy-scherbina@dannykopping@Emyrk

[8]ページ先頭

©2009-2025 Movatter.jp