docs: add troubleshooting steps for prebuilt workspaces #20231


Open
SasSwart wants to merge 4 commits into main from jjs/coder-19490
@@ -247,6 +247,71 @@ When prebuilt workspaces are configured for an organization, Coder creates a "pr

If a quota is exceeded, the prebuilt workspace will fail provisioning the same way other workspaces do.

### Managing prebuild provisioning queues

Prebuilt workspaces can overwhelm a Coder deployment, causing significant delays when users and template administrators attempt to create new workspaces or manage their templates. This can happen in two scenarios:

1. **Organic overload**: Not enough provisioners to meet the deployment's needs
2. **Broken template**: A template that mistakenly requests too many prebuilt workspaces

Review comment (Contributor), on lines +254 to +255:

I think the issue here is actually a combination of these two factors: there aren’t enough resources to handle the high demand from prebuild-related provisioner jobs. This problem can be further amplified when those jobs take a long time to complete.

Additionally, it might be worth explaining an additional scenario: when a user creates a new template version (a user-initiated job), once that job is processed and the prebuild reconciliation loop runs, it adds even more load by scheduling new prebuild-related jobs. This means the queue could now include jobs for both template version 1 and version 2.


In the second case, it can be difficult to fix the situation because you cannot upload a corrected template version while the provisioners are overloaded.

The troubleshooting steps below will help you resolve this situation:

- Pause prebuilt workspace reconciliation to stop the problem from getting worse
- Check how many prebuild jobs are clogging your provisioner queue
- Cancel excess prebuild jobs to free up provisioners for human users
- Fix any problematic templates that are causing the issue
- Resume prebuilt reconciliation once everything is back to normal

If your Coder deployment is exhibiting the above symptoms, follow these instructions to verify and then rectify the situation:

First, run:

Review comment (Contributor):

nit: maybe having this numbered would help?

Suggested change: replace "First, run:" with "1) Pause prebuilt workspace reconciliation".


```bash
coder prebuilds pause
```

This prevents further pollution of your provisioner queues by stopping the prebuilt workspaces feature from scheduling new creation jobs. Jobs that have already been enqueued will still be processed.

Review comment (Contributor):

nit: maybe worth adding a note that this will pause prebuilds system-wide, not just organization-wide


**Important**: Remember to run `coder prebuilds resume` once all impact has been mitigated (see the last step in this section).

Next, run:

```bash
coder provisioner jobs list --status=pending --initiator=prebuilds
```

This will show a list of all pending jobs that have been enqueued by the prebuilt workspace system. The length of this list indicates whether prebuilt workspaces have overwhelmed your Coder deployment.
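
As a rough illustration of using the list length as a signal, the sketch below counts pending prebuild jobs and compares the count with a threshold. It assumes that `coder provisioner jobs list` accepts `--output json` and emits a JSON array of job objects (the `jq` pipeline later in this section relies on the same shape), and that `jq` is installed. The function names and the default threshold of 50 are hypothetical and are not part of the Coder CLI.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the threshold are illustrative, not Coder CLI features.

# Print the number of pending jobs enqueued by the prebuilds system.
# Assumption: `--output json` yields a JSON array of job objects.
count_pending_prebuild_jobs() {
  coder provisioner jobs list --status=pending --initiator=prebuilds --output json \
    | jq 'length'
}

# Return 0 if the pending count exceeds the threshold (default 50),
# non-zero otherwise (including when the count could not be determined).
prebuild_queue_overloaded() {
  local threshold="${1:-50}"
  local count
  count="$(count_pending_prebuild_jobs)"
  [ "$count" -gt "$threshold" ]
}

# Example:
#   if prebuild_queue_overloaded 100; then
#     echo "More than 100 pending prebuild jobs; consider the steps in this section."
#   fi
```

A sensible threshold depends on how many provisioners you run and how long your templates take to build, so treat the number as something to tune rather than a recommendation.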

Human-initiated jobs have priority over pending prebuild jobs, but running prebuild jobs cannot be preempted. A long list of pending prebuild jobs increases the likelihood that all provisioners are already occupied when a user wants to create a workspace, so users are more likely to experience delays while waiting for the next available provisioner.

Review comment (Contributor):

Nice 👍


To ensure that the next available provisioner will be given to a human-initiated job, run:

Review comment (Contributor):

I’m not sure this sentence is entirely accurate. Since human-initiated jobs already have priority over prebuild-related jobs, the next available provisioner will automatically be assigned a human-initiated job if there is one. The purpose of this behavior is to help clear the queue and prevent situations where all provisioner daemons are occupied with prebuild-related jobs, which could delay human-initiated ones.


```bash
coder provisioner jobs list --status=pending --initiator=prebuilds | jq -r '.[].id' | xargs -n1 -P2 -I{} coder provisioner jobs cancel {}
```

Review comment (Contributor):

AFAIU, this command won’t actually print the list of jobs; it will pipe them directly into jq. I think it would be useful to show the list of jobs first, so users can review them before deciding to cancel. That way, they could choose to cancel only a subset of prebuilds if needed.

Wouldn’t it make more sense for `coder provisioner jobs cancel` to accept a list of job IDs?
Right now, we don’t support cancelling multiple jobs simultaneously (either through the CLI or the dashboard), so adding that capability would be a nice improvement.

This will clear the provisioner queue of all jobs that were not initiated by a human being, which increases the probability that a provisioner will be available when the next human operator needs it. It does not cancel running provisioner jobs, so there may still be some delay in processing new provisioner jobs until a provisioner completes its current job.
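
If you would rather review the queue before cancelling anything, one possible approach is to write the pending job IDs to a file, edit that file, and cancel only what remains. The sketch below is a minimal illustration and not a Coder feature: `cancel_jobs_from_file` is a hypothetical helper, and the only CLI call it makes is `coder provisioner jobs cancel <id>`, as used in the pipeline above. The example workflow in the trailing comments additionally assumes that `coder provisioner jobs list` supports `--output json`, since the `jq` filter needs JSON input.

```bash
#!/usr/bin/env bash
# Sketch only: cancel_jobs_from_file is a hypothetical helper, not part of the Coder CLI.

# Cancel every job ID listed in the given file, one ID per line.
# Blank lines and lines starting with '#' are skipped, so you can
# comment out the jobs you want to keep.
# Returns non-zero if any cancellation failed.
cancel_jobs_from_file() {
  local file="$1" id failed=0
  # `read` with the default IFS trims surrounding whitespace;
  # the `|| [ -n "$id" ]` part handles a final line without a newline.
  while read -r id || [ -n "$id" ]; do
    case "$id" in
      ''|'#'*) continue ;;
    esac
    # Redirect stdin so the CLI cannot consume the rest of the ID list.
    coder provisioner jobs cancel "$id" </dev/null || failed=1
  done < "$file"
  return "$failed"
}

# Example workflow (assumes `--output json` is supported and jq is installed):
#   coder provisioner jobs list --status=pending --initiator=prebuilds --output json \
#     | jq -r '.[].id' > pending-prebuild-jobs.txt
#   "${EDITOR:-vi}" pending-prebuild-jobs.txt   # delete or comment out jobs to keep
#   cancel_jobs_from_file pending-prebuild-jobs.txt
```

Cancelling sequentially is slower than the parallel `xargs -P2` form, but it keeps the sketch simple and makes failures easier to attribute to a specific job.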

At this stage, most prebuild-related impact will have been mitigated. There may still be a bugged template version, but it will no longer pollute provisioner queues with prebuilt workspace jobs. If the latest version of a template is also broken for reasons unrelated to prebuilds, users can still create workspaces using a previous template version. Some running jobs may have been initiated by the prebuild system, but these cannot be cancelled without potentially orphaning resources that have already been deployed by Terraform. Depending on your deployment and template provisioning times, it might be best to upload a new template version and wait for it to be processed organically.

If you need to expedite the processing of human-initiated jobs at the cost of some infrastructure housekeeping, you can run:

Review comment (Contributor):

I think it would be good to include a warning about the infrastructure housekeeping implications here, and clarify that this command should generally be used as a last resort.


```bash
coder provisioner jobs list --status=running --initiator=prebuilds | jq -r '.[].id' | xargs -n1 -P2 -I{} coder provisioner jobs cancel {}
```

This will cancel running prebuild jobs (orphaning any resources that have already been deployed) and immediately make room for human-initiated jobs.
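
Because this step can orphan resources, you may want a deliberate confirmation in front of it when scripting the procedure. The following is a minimal sketch of such a guard; `confirm_orphaning` and its confirmation phrase are hypothetical and are not provided by the Coder CLI.

```bash
#!/usr/bin/env bash
# Sketch only: confirm_orphaning is a hypothetical guard, not part of the Coder CLI.

# Ask the operator to type an exact phrase before a destructive step.
# Returns 0 only if the phrase matches; any other input (or EOF) returns non-zero.
confirm_orphaning() {
  local phrase="cancel running prebuilds" answer
  printf 'This may orphan resources already deployed by Terraform.\n' >&2
  printf 'Type "%s" to continue: ' "$phrase" >&2
  IFS= read -r answer
  [ "$answer" = "$phrase" ]
}

# Example:
#   confirm_orphaning && <the cancel pipeline shown above>
```

Requiring a typed phrase rather than a single `y` keypress is a design choice meant to make an accidental confirmation less likely for a last-resort command.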

Once the provisioner queue has been cleared and all templates have been fixed, resume prebuild reconciliation by running:

```bash
coder prebuilds resume
```

This re-enables the prebuilt workspaces feature and allows the reconciliation loop to resume normal operation. The system will begin creating new prebuilt workspaces according to your template configurations.

### Template configuration best practices

#### Preventing resource replacement