Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat: cancel stuck pending jobs#17803

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
ibetitsmike merged 24 commits intomainfrommike/16488-cancel-stuck-pending-jobs
May 20, 2025

Conversation

ibetitsmike
Copy link
Contributor

Closes:#16488

@ibetitsmikeibetitsmike marked this pull request as ready for reviewMay 13, 2025 18:46
@matifali
Copy link
Member

Does this also solve#12331?

@deansheather
Copy link
Member

Does this also solve#12331?

@matifali Partially, but it shouldn't be closed until there's also a way for users to cancel these jobs. See#17791

matifali reacted with thumbs up emoji

Comment on lines 2319 to 2322
if err := q.authorizeContext(ctx, policy.ActionRead, rbac.ResourceProvisionerJobs); err != nil {
return nil, err
}
return q.db.GetProvisionerJobsByIDsWithQueuePosition(ctx, ids)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

These should really be scoped. If users are inside orgs, even if they areorg-admins they should not be able to read across org boundaries.

The organization boundaries have to be kept.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Right now it's impossible to check this without extending the database that would link ProvisionerJobs to owners.

I'll do that in a separate PR.

@@ -503,7 +504,7 @@ func ReloadBuiltinRoles(opts *RoleOptions) {
// the ability to create templates and provisioners has
// a lot of overlap.
ResourceProvisionerDaemon.Type: {policy.ActionCreate, policy.ActionRead, policy.ActionUpdate, policy.ActionDelete},
ResourceProvisionerJobs.Type: {policy.ActionRead},
ResourceProvisionerJobs.Type: {policy.ActionRead, policy.ActionUpdate, policy.ActionCreate},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Wouldn't regular members need this permission too? To create the jobs for their workspaces?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

They would, but I'm not sure I want to polute the RBAC with assigning members access to all provisioner jobs.

We currently don't check this at all. I'll introduce another PR that separetly focuses on solving this issue.

@ibetitsmikeibetitsmikeforce-pushed themike/16488-cancel-stuck-pending-jobs branch from635218d toc03bfa3CompareMay 19, 2025 07:57
// TODO: Remove this once we have a proper rbac check for provisioner jobs.
// Currently ProvisionerJobs are not associated with a user, so we can't
// check for a user's permissions. We'd need to check for the associated workspace
// and verify ownership through that.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The comment should probably mention all the ways a job could be made and what the permission check should look like. I think the following is correct:

  • template_version_import jobs require eithertemplate_version:create on the org (if creating a new template) ortemplate:update andtemplate_version:create on the template (if pushing a new version)
  • template_version_dry_run jobs require permissions toworkspace:create in the org as well astemplate_version:read on the specific template version
  • workspace_build jobs requireworkspace:update on the specific workspace as well astemplate_version:read on the specific template version

I just don't want the comment to only say that workspace ownership needs to be checked, and for some poor soul to start working on this thinking it'll be simple when in reality it will probably be difficult.

Likely should have a ticket created to track this too. You might want to add a comment here in the code with the ticket number after you create it.

johnstcn and ibetitsmike reacted with thumbs up emoji
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Left a simple description with link to the issue to not duplicate information.

Comment on lines +1290 to +1296
Name: "JobReaper Detect Interval",
Description: "Interval to poll for hungand pendingjobs and automatically terminate them.",
Flag: "job-hang-detector-interval",
Env: "CODER_JOB_HANG_DETECTOR_INTERVAL",
Hidden: true,
Default: time.Minute.String(),
Value: &c.JobHangDetectorInterval,
Value: &c.JobReaperDetectorInterval,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Follow-up suggestion, non-blocking: there is now a disconnect between our internal naming ("job reaper") and the external flags ("hang detector"). We may wish to deprecate the old env in future to avoid confusion.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

As agreed - leaving as-is for now. The description & name reflect this, while changing Flag & Env will break existing setups.

johnstcn reacted with thumbs up emoji
Comment on lines +311 to +320

// If the job was never started (pending), set the StartedAt time to the current
// time so that the build duration is correct.
if job.JobStatus == database.ProvisionerJobStatusPending {
job.StartedAt = sql.NullTime{
Time: now,
Valid: true,
}
}
err = db.UpdateProvisionerJobWithCompleteWithStartedAtByID(ctx, database.UpdateProvisionerJobWithCompleteWithStartedAtByIDParams{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

potential follow-up: The fact that we need to worry about this implies to me that we need a separate manager for provisioner job transitions similar to thewsbuild package.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Could be, are there any more scenarios we'd like to validate and put in a new issue?

Copy link
Member

@johnstcnjohnstcnMay 19, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I did up a table of all of the possible provisioner job states based on the four fields we use:

#completed_aterrorcanceled_atstarted_atstatuscomment
1NULLNULLNULLNULLpending
2NULLNULLNULLtimestamprunning
3NULLNULLtimestampNULLcanceling
4NULLNULLtimestamptimestampcanceling
5NULLtextNULLNULLfailed
6NULLtextNULLtimestampfailedHow can a job have failed if it never completed?
7NULLtexttimestampNULLfailedHow can a job have failed it it never started?
8NULLtexttimestamptimestampfailedHow can a job have failed if it never completed?
9timestampNULLNULLNULLsucceededHow can a job have succeeded if it never started?
10timestampNULLNULLtimestampsucceeded
11timestampNULLtimestampNULLcanceled
12timestampNULLtimestamptimestampcanceled
13timestamptextNULLNULLfailedHow can a job have failed if it never started?
14timestamptextNULLtimestampfailed
15timestamptexttimestampNULLfailedHow can a job have failed if it never started?
16timestamptexttimestamptimestampfailed

It'sprobably definitely out of scope of this PR, but I think there's a case to be made for drawing a state transition digram of provisioner jobs and figuring out which ones should never be possible. The ones I've commented on are my best guess at present.

Copy link
Member

@johnstcnjohnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Some other nits, but I don't need to review again. Thanks for looking into this!

@ibetitsmikeibetitsmike merged commit769c9ee intomainMay 20, 2025
35 of 37 checks passed
@ibetitsmikeibetitsmike deleted the mike/16488-cancel-stuck-pending-jobs branchMay 20, 2025 13:22
@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsMay 20, 2025
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@EmyrkEmyrkEmyrk left review comments

@johnstcnjohnstcnjohnstcn approved these changes

@deansheatherdeansheatherdeansheather approved these changes

Assignees

@ibetitsmikeibetitsmike

Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

Permanently enqueued provisioner jobs affect the position in queue
5 participants
@ibetitsmike@matifali@deansheather@johnstcn@Emyrk

[8]ページ先頭

©2009-2025 Movatter.jp