docs: add troubleshooting steps for prebuilt workspaces #20231


Open
SasSwart wants to merge 4 commits into main from jjs/coder-19490


@SasSwart (Contributor) commented Oct 9, 2025 (edited)

This PR adds troubleshooting steps to guide Coder operators when they suspect that prebuilds might have overwhelmed their deployments.

Relates to #19490

This PR does not yet close the issue entirely. We still need to document how to detect and clean orphaned resources that occur when running prebuild jobs are cancelled.

@SasSwart marked this pull request as ready for review October 9, 2025 08:32

@ssncferreira (Contributor) left a comment

Nice work 🚀 Should we also mention that users can tune the `CODER_PREBUILDS_RECONCILIATION_INTERVAL` to manage how frequently the prebuild reconciliation loop runs? That might help reduce the load from frequent reconciliations. Wdyt?

Comment on lines +254 to +255
1. **Organic overload**: Not enough provisioners to meet the deployment's needs
2. **Broken template**: A template that mistakenly requests too many prebuilt workspaces

I think the issue here is actually a combination of these two factors: there aren’t enough resources to handle the high demand from prebuild-related provisioner jobs. This problem can be further amplified when those jobs take a long time to complete.

Additionally, it might be worth explaining another scenario: a user creates a new template version (a user-initiated job), and once that job is processed and the prebuild reconciliation loop runs, the loop adds even more load by scheduling new prebuild-related jobs. This means the queue could now include jobs for both template version 1 and version 2.
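The point that demand, provisioner capacity, and job duration compound can be made concrete with a back-of-the-envelope estimate. The sketch below is not part of Coder and queries nothing; it assumes each provisioner runs one job at a time and that every job takes the same (operator-supplied, hypothetical) number of minutes, and it shows how the second template version's jobs stack on top of the first's in the same queue:

```bash
#!/usr/bin/env bash
# Back-of-the-envelope estimate of how long queued provisioner jobs take to drain.
# Every input is a number the operator supplies; nothing here queries Coder.

# ceil_div A B: integer ceiling of A / B (B must be positive).
ceil_div() {
  local a=$1 b=$2
  echo $(( (a + b - 1) / b ))
}

# estimate_drain_minutes PENDING_V1 PENDING_V2 PROVISIONERS AVG_JOB_MINUTES
# Assumes each provisioner runs one job at a time and every job takes AVG_JOB_MINUTES,
# so the queue is processed in waves of PROVISIONERS jobs.
estimate_drain_minutes() {
  local pending_v1=$1 pending_v2=$2 provisioners=$3 avg_minutes=$4
  if (( provisioners <= 0 )); then
    echo "PROVISIONERS must be positive" >&2
    return 1
  fi
  # Jobs for the old and the new template version share the same queue.
  local total=$(( pending_v1 + pending_v2 ))
  local waves
  waves=$(ceil_div "$total" "$provisioners")
  echo $(( waves * avg_minutes ))
}
```

For example, with 12 pending jobs for each of two template versions, 4 provisioners, and 5-minute jobs, `estimate_drain_minutes 12 12 4 5` prints `30`, while `estimate_drain_minutes 12 0 4 5` (only the first version queued) prints `15`. Longer jobs scale the result linearly, which is why slow templates amplify the problem.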


If your Coder deployment is exhibiting the above symptoms, follow these instructions to verify and then rectify the situation:

First, run:

nit: maybe having this numbered would help?

Suggested change
First, run:
1) Pause prebuilt workspace reconciliation

```bash
coder prebuilds pause
```

This prevents further pollution of your provisioner queues by stopping the prebuilt workspaces feature from scheduling new creation jobs. Jobs that have already been enqueued will still be processed.

nit: maybe worth adding a note that this will pause prebuilds system-wide, not just organization-wide


This will show a list of all pending jobs that have been enqueued by the prebuilt workspace system. The length of this list indicates whether prebuilt workspaces have overwhelmed your Coder deployment.
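A small helper can turn the length of that list into a pass/fail signal. This is a hypothetical sketch, not a Coder command: it assumes you pipe in pending prebuild job IDs one per line (for example, the output of the `jq` stage in the cancellation pipeline in this PR) and that you know how many provisioner daemons serve the affected organization. Treating "more pending jobs than provisioners" as the warning sign is an illustrative threshold, not a limit defined by Coder.

```bash
#!/usr/bin/env bash
# check_prebuild_backlog PROVISIONERS
# Reads pending prebuild job IDs from stdin (one per line), counts them, and compares the count
# with the size of the provisioner pool. The threshold used here is an illustrative assumption.
check_prebuild_backlog() {
  local provisioners=$1 count=0 id
  if (( provisioners <= 0 )); then
    echo "PROVISIONERS must be positive" >&2
    return 2
  fi
  # The `|| [[ -n $id ]]` clause also counts a final line that lacks a trailing newline.
  while IFS= read -r id || [[ -n $id ]]; do
    if [[ -n $id ]]; then
      count=$(( count + 1 ))
    fi
  done
  echo "pending prebuild jobs: $count"
  if (( count > provisioners )); then
    echo "backlog is larger than the pool of $provisioners provisioners; users may wait behind prebuild jobs"
    return 1
  fi
  echo "backlog fits within the provisioner pool"
}
```

With 4 provisioners, `printf 'a\nb\nc\nd\ne\n' | check_prebuild_backlog 4` reports 5 pending jobs and returns a non-zero status, so the helper can be dropped into a monitoring or alerting script.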

Human-initiated jobs have priority over pending prebuild jobs, but running prebuild jobs cannot be preempted. A long list of pending prebuild jobs increases the likelihood that all provisioners are already occupied when a user wants to create a workspace. This increases the likelihood that users will experience delays waiting for the next available provisioner.

Nice 👍


To ensure that the next available provisioner will be given to a human-initiated job, run:

I’m not sure this sentence is entirely accurate. Since human-initiated jobs already have priority over prebuild-related jobs, the next available provisioner will automatically be assigned a human-initiated job if there is one. The purpose of this behavior is to help clear the queue and prevent situations where all provisioner daemons are occupied with prebuild-related jobs, which could delay human-initiated ones.

```bash
coder provisioner jobs list --status=pending --initiator=prebuilds --output=json | jq -r '.[].id' | xargs -P2 -I{} coder provisioner jobs cancel {}
```

AFAIU, this command won’t actually print the list of jobs — it will pipe them directly into jq. I think it would be useful to show the list of jobs first, so users can review them before deciding to cancel. That way, they could choose to cancel only a subset of prebuilds if needed.

Wouldn’t it make more sense for `coder provisioner jobs cancel` to accept a list of job IDs?
Right now, we don’t support cancelling multiple jobs simultaneously (either through the CLI or the dashboard), so adding that capability would be a nice improvement.
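One way to cover both points without a new CLI feature is a small wrapper that prints the selected job IDs before anything is cancelled and only cancels when explicitly told to. This is a hypothetical helper (`cancel_listed_jobs` and its `--yes` switch are not part of Coder); it assumes the `coder provisioner jobs cancel <id>` form mentioned above and one job ID per input line.

```bash
#!/usr/bin/env bash
# cancel_listed_jobs [--yes]
# Reads provisioner job IDs (one per line) from stdin and prints them for review.
# Without --yes it stops there (a dry run); with --yes it cancels each job with its own CLI call,
# because cancelling several jobs in one call is not supported today.
# Assumes the `coder provisioner jobs cancel <id>` form mentioned in the review.
cancel_listed_jobs() {
  local confirm=0 failed=0 id
  local -a ids=()
  if [[ ${1:-} == "--yes" ]]; then
    confirm=1
  fi
  while IFS= read -r id || [[ -n $id ]]; do
    if [[ -n $id ]]; then
      ids+=("$id")
    fi
  done
  printf 'Jobs selected for cancellation (%d):\n' "${#ids[@]}"
  if (( ${#ids[@]} > 0 )); then
    printf '  %s\n' "${ids[@]}"
  fi
  if (( confirm == 0 )); then
    echo "Dry run: re-run with --yes to cancel these jobs."
    return 0
  fi
  for id in "${ids[@]}"; do
    if ! coder provisioner jobs cancel "$id"; then
      echo "failed to cancel $id" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

An operator could save the IDs to a file (the name `ids.txt` is only an example), review it with `cancel_listed_jobs < ids.txt`, delete any lines for jobs that should keep running, and then run `cancel_listed_jobs --yes < ids.txt`. Cancelling sequentially trades speed for a clear per-job error report.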


At this stage, most prebuild-related impact will have been mitigated. There may still be a broken template version, but it will no longer pollute provisioner queues with prebuilt workspace jobs. If the latest version of a template is also broken for reasons unrelated to prebuilds, users can still create workspaces using a previous template version. Some running jobs may have been initiated by the prebuild system, but these cannot be cancelled without potentially orphaning resources that Terraform has already deployed. Depending on your deployment and template provisioning times, it might be best to upload a new template version and wait for it to be processed organically.

If you need to expedite the processing of human-initiated jobs at the cost of some infrastructure housekeeping, you can run:

I think it would be good to include a warning about the infrastructure housekeeping implications here, and clarify that this command should generally be used as a last resort.
