This PR does not yet close the issue entirely. We still need to document how to detect and clean up orphaned resources that can be left behind when running prebuild jobs are cancelled.

docs: add troubleshooting steps for prebuilt workspaces

4b9bdfe

SasSwart requested a review from ssncferreira

October 9, 2025 08:02

github-actions bot assigned SasSwart

Oct 9, 2025

SasSwart added 3 commits

October 9, 2025 08:09

make lint/markdown

846d724

ask an LLM to review my documentation for grammar, style and tone

c4df5b3

Make the linter happy

00b5a07

SasSwart marked this pull request as ready for review

October 9, 2025 08:32

ssncferreira reviewed

Oct 9, 2025

ssncferreira left a comment

Nice work 🚀 Should we also mention that users can tune the CODER_PREBUILDS_RECONCILIATION_INTERVAL to manage how frequently the prebuild reconciliation loop runs? That might help reduce the load from frequent reconciliations. Wdyt?
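For illustration, tuning the interval mentioned above might look like the following server environment setting. The `5m` value is hypothetical, and the assumption that the variable accepts a Go-style duration string should be checked against the server reference for your Coder version.

```shell
# Hypothetical value: run the prebuild reconciliation loop less often, so it
# schedules new prebuild jobs less frequently. Assumes a Go-style duration string.
CODER_PREBUILDS_RECONCILIATION_INTERVAL=5m
```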

docs/admin/templates/extending-templates/prebuilt-workspaces.md

Comment on lines +254 to +255

		1. Organic overload: Not enough provisioners to meet the deployment's needs
		2. Broken template: A template that mistakenly requests too many prebuilt workspaces

ssncferreira Oct 9, 2025

I think the issue here is actually a combination of these two factors: there aren’t enough resources to handle the high demand from prebuild-related provisioner jobs. This problem can be further amplified when those jobs take a long time to complete.

Additionally, it might be worth explaining another scenario: when a user creates a new template version (a user-initiated job), once that job is processed and the prebuild reconciliation loop runs, the loop adds even more load by scheduling new prebuild-related jobs. This means the queue could then include jobs for both template version 1 and version 2.

docs/admin/templates/extending-templates/prebuilt-workspaces.md


		If your Coder deployment is exhibiting the above symptoms, follow these instructions to verify and then rectify the situation:

		First, run:

ssncferreira Oct 9, 2025

nit: maybe having this numbered would help?

Suggested change

	First, run:
	1) Pause prebuilt workspace reconciliation

docs/admin/templates/extending-templates/prebuilt-workspaces.md

		```bash
		coder prebuilds pause
		```

		This prevents further pollution of your provisioner queues by stopping the prebuilt workspaces feature from scheduling new creation jobs. Jobs that have already been enqueued will still be processed.

ssncferreira Oct 9, 2025

nit: maybe worth adding a note that this will pause prebuilds system-wide, not just organization-wide

docs/admin/templates/extending-templates/prebuilt-workspaces.md


		This will show a list of all pending jobs that have been enqueued by the prebuilt workspace system. The length of this list indicates whether prebuilt workspaces have overwhelmed your Coder deployment.

		Human-initiated jobs have priority over pending prebuild jobs, but running prebuild jobs cannot be preempted. A long list of pending prebuild jobs makes it more likely that every provisioner is already occupied when a user wants to create a workspace, so that user has to wait for the next available provisioner.

ssncferreira Oct 9, 2025

Nice 👍

docs/admin/templates/extending-templates/prebuilt-workspaces.md


		To ensure that the next available provisioner will be given to a human-initiated job, run:

ssncferreira Oct 9, 2025

I’m not sure this sentence is entirely accurate. Since human-initiated jobs already have priority over prebuild-related jobs, the next available provisioner will automatically be assigned a human-initiated job if there is one. The purpose of this step is rather to help clear the queue and prevent situations where all provisioner daemons are occupied with prebuild-related jobs, which could delay human-initiated ones.

docs/admin/templates/extending-templates/prebuilt-workspaces.md

		```bash
		coder provisioner jobs list --status=pending --initiator=prebuilds | jq -r '.[].id' | xargs -P2 -I{} coder provisioner jobs cancel {}
		```

ssncferreira Oct 9, 2025

AFAIU, this command won’t actually print the list of jobs — it will pipe them directly into jq. I think it would be useful to show the list of jobs first, so users can review them before deciding to cancel. That way, they could choose to cancel only a subset of prebuilds if needed.

Wouldn’t it make more sense for coder provisioner jobs cancel to accept a list of job IDs?
Right now, we don’t support cancelling multiple jobs simultaneously (either through the CLI or the dashboard), so adding that capability would be a nice improvement.
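The review-then-cancel flow suggested above could be sketched as two shell functions: one saves the IDs of pending prebuild jobs to a file, and the other cancels only the IDs left in that file after you have inspected or trimmed it. This is a sketch under assumptions, not the PR's procedure. It assumes, as the `jq -r '.[].id'` filter in the PR implies, that `coder provisioner jobs list` emits a JSON array of objects with an `id` field (if your CLI version prints a table by default, you may need its JSON output option), and that `coder provisioner jobs cancel` accepts one job ID per call. The function names and the `CODER` override variable are hypothetical; the variable exists only so the functions can be dry-run against a stub.

```shell
#!/usr/bin/env bash
# Sketch: review pending prebuild jobs before cancelling them.
# Assumptions: `coder provisioner jobs list` prints a JSON array of objects with
# an "id" field, and `coder provisioner jobs cancel` takes a single job ID.

# Hypothetical hook: set CODER to a stub command or function for a dry run.
CODER="${CODER:-coder}"

# Step 1: write the IDs of pending, prebuild-initiated jobs to a file for review.
# The body runs in a subshell, so the strict-mode options do not leak into the
# shell that sourced this file.
save_pending_prebuild_jobs() (
  set -euo pipefail
  out="$1"
  "$CODER" provisioner jobs list --status=pending --initiator=prebuilds |
    jq -r '.[].id' >"$out"
  echo "$(wc -l <"$out" | tr -d ' ') pending prebuild job(s) written to $out"
)

# Step 2: after reviewing (and optionally trimming) the file, cancel what remains,
# one job per call. Failures are reported and the loop moves on to the next ID.
cancel_jobs_from_file() {
  local in="$1" id=""
  # `|| [ -n "$id" ]` keeps a final line that has no trailing newline.
  while IFS= read -r id || [ -n "$id" ]; do
    [ -n "$id" ] || continue # skip blank lines left behind after editing
    "$CODER" provisioner jobs cancel "$id" || echo "failed to cancel job $id" >&2
  done <"$in"
}
```

Typical use would be to source the file, run `save_pending_prebuild_jobs pending.txt`, review or edit `pending.txt`, and then run `cancel_jobs_from_file pending.txt`. Cancelling sequentially gives up the parallelism of `xargs -P2` in exchange for one readable line per failed cancellation.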

docs/admin/templates/extending-templates/prebuilt-workspaces.md


		At this stage, most prebuild-related impact will have been mitigated. A broken template version may still exist, but it will no longer pollute provisioner queues with prebuilt workspace jobs. If the latest version of a template is also broken for reasons unrelated to prebuilds, users can still create workspaces from a previous template version. Some running jobs may have been initiated by the prebuild system, but these cannot be cancelled without potentially orphaning resources that Terraform has already deployed. Depending on your deployment and template provisioning times, it might be best to upload a new template version and wait for it to be processed organically.

		If you need to expedite the processing of human-initiated jobs at the cost of some infrastructure housekeeping, you can run:

ssncferreira Oct 9, 2025

I think it would be good to include a warning about the infrastructure housekeeping implications here, and clarify that this command should generally be used as a last resort.
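One way to make a last-resort step like this harder to run by accident is to gate it behind an explicit confirmation that restates the housekeeping cost. The function below is a hypothetical guard (its name and the typed phrase are illustrative, not part of the Coder CLI), and it is deliberately independent of whichever cancellation command follows it.

```shell
#!/usr/bin/env bash
# Hypothetical guard for last-resort cancellation of running prebuild jobs.
# Prints the warning to stderr and succeeds only if the operator types "orphan".
confirm_last_resort() {
  local count="$1" answer=""
  {
    echo "WARNING: cancelling $count running prebuild job(s) may orphan resources"
    echo "that Terraform has already deployed. These must be found and cleaned up"
    echo "manually. Only continue if human-initiated jobs cannot wait."
    printf 'Type "orphan" to continue: '
  } >&2
  read -r answer || return 1 # no input (EOF) counts as "no"
  [ "$answer" = "orphan" ]
}
```

For example, `confirm_last_resort 3 && <cancel command>` would run the cancellation only after the operator types the phrase; any other answer, or no answer at all, leaves the jobs untouched.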

david-fraley self-requested a review

October 9, 2025 13:10

docs: add troubleshooting steps for prebuilt workspaces #20231

SasSwart commented Oct 9, 2025 (edited)
