Description
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
When I create workspaces from the same template with identical settings except for the name, some of them (10 to 20% in my case) get stuck in the starting phase (the same happened to me while deleting a stopped workspace, but I don't have proof on hand):
There are no logs explaining what exactly is going on, besides the audit log entry for the create action, and there is no progress unless I cancel the build and wait 5 minutes until the following message appears:
Then I can delete the workspace.
Relevant Log Output
2025-06-25 10:19:05.612 [info] coderd: audit_log ID=c4b9b12c-3f95-412d-9111-3bf371763106 Time="2025-06-25T10:19:05.506227Z" UserID=1af29cb0-af43-4741-b7de-57a1c05a0c28 OrganizationID=afc95229-f7e3-48af-a413-0d13f265f3fe Ip=hidden UserAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36" ResourceType=workspace ResourceID=747b39bd-7779-4a32-a1ba-636df16456e2 ResourceTarget=rose-mouse-31 Action=create Diff="{\"automatic_updates\":{\"Old\":\"\",\"New\":\"never\",\"Secret\":false},\"id\":{\"Old\":\"\",\"New\":\"747b39bd-7779-4a32-a1ba-636df16456e2\",\"Secret\":false},\"name\":{\"Old\":\"\",\"New\":\"rose-mouse-31\",\"Secret\":false},\"owner_id\":{\"Old\":\"\",\"New\":\"1af29cb0-af43-4741-b7de-57a1c05a0c28\",\"Secret\":false},\"template_id\":{\"Old\":\"\",\"New\":\"ef6f520c-c8bb-47a2-a479-16efdd352733\",\"Secret\":false},\"ttl\":{\"Old\":0,\"New\":10800000000000,\"Secret\":false}}" StatusCode=201 AdditionalFields="{\"workspace_name\":\"\",\"build_number\":\"\",\"build_reason\":\"\",\"workspace_owner\":\"hidden\",\"workspace_id\":\"00000000-0000-0000-0000-000000000000\"}" RequestID=6c23e9ce-548a-47b5-a603-a14374e4cd20 ResourceIcon="" actor="&{ID:1af29cb0-af43-4741-b7de-57a1c05a0c28 Email:hidden Username:hidden}"
2025-06-25 11:11:01.874 [info] provisionerd-coder-5568dc94ff-6z6gx-0.runner: attempting graceful cancellation job_id=4cbebb78-4ac9-47b5-aa18-6606d11976be template_name=nodejs template_version=nostalgic_lumiere2 workspace_build_id=303b2ba5-0df3-46b0-8f01-f247d81a3bff workspace_id=747b39bd-7779-4a32-a1ba-636df16456e2 workspace_name=rose-mouse-31 workspace_owner=hidden workspace_transition=start
2025-06-25 11:21:02.025 [warn] provisionerd-coder-5568dc94ff-6z6gx-0.runner: failed to call FailJob job_id=4cbebb78-4ac9-47b5-aa18-6606d11976be template_name=nodejs template_version=nostalgic_lumiere2 workspace_build_id=303b2ba5-0df3-46b0-8f01-f247d81a3bff workspace_id=747b39bd-7779-4a32-a1ba-636df16456e2 workspace_name=rose-mouse-31 workspace_owner=hidden workspace_transition=start ...
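To make the timeline in these entries easier to read, here is a small sketch that computes the gaps between them. The timestamps and messages are copied from the log output above; the helper names (`parse`, `gaps`) are only illustrative. It shows roughly 52 minutes between the create audit entry and the cancellation attempt, and about 10 minutes between the cancellation attempt and the FailJob warning.

```python
from datetime import datetime

# Timestamps and short labels copied from the three log entries above.
ENTRIES = [
    ("2025-06-25 10:19:05.612", "audit_log Action=create"),
    ("2025-06-25 11:11:01.874", "attempting graceful cancellation"),
    ("2025-06-25 11:21:02.025", "failed to call FailJob"),
]


def parse(ts: str) -> datetime:
    """Parse the timestamp prefix used by the log lines."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def gaps(entries):
    """Return (label, seconds since the previous entry) for every entry after the first."""
    times = [parse(ts) for ts, _ in entries]
    return [
        (entries[i][1], (times[i] - times[i - 1]).total_seconds())
        for i in range(1, len(entries))
    ]


for label, seconds in gaps(ENTRIES):
    print(f"{label}: +{seconds / 60:.1f} min")
```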
Expected Behavior
At the very least, there should be an indication of what is going on in this scenario. Ideally there would also be a mechanism that detects that this has occurred and either retries the build or fails it, so that the workspace doesn't stay stuck in this state indefinitely.
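As a rough illustration of the kind of detection I have in mind (this is a hypothetical sketch, not how Coder implements anything; `Job`, `last_update`, and `find_stale` are names I made up), a watchdog could flag any job that has reported no progress for longer than some threshold and then retry or fail it:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    """Hypothetical view of a provisioner job; not a Coder data structure."""
    job_id: str
    last_update: datetime  # last time the job reported any progress


def find_stale(jobs, now, threshold=timedelta(minutes=5)):
    """Return the jobs that have reported nothing for longer than `threshold`."""
    return [job for job in jobs if now - job.last_update > threshold]
```

The caller would then decide per stale job whether to retry the build or mark it as failed with a visible reason. The 5-minute default is an arbitrary choice for the example.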
Steps to Reproduce
- Create workspace from the template
- If the issue hasn't occurred, keep creating workspaces until it does.
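The steps above can be automated with a loop like the following sketch. It assumes the `coder` CLI is installed and logged in, that `coder create` blocks until the build finishes, and that the template needs no interactive parameter input. The `repro-N` workspace names, the 10-minute timeout, and the helper names are my own choices for illustration; a timed-out call is only a candidate for a stuck build and should be checked in the UI.

```python
import subprocess


def create_command(name, template):
    """Build the CLI invocation for creating one workspace without prompts."""
    return ["coder", "create", name, "--template", template, "--yes"]


def try_create(name, template, timeout_s=600):
    """Return True if `coder create` returned in time, False if it timed out."""
    try:
        subprocess.run(create_command(name, template), timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        return False


def reproduce(template, attempts=20, prefix="repro"):
    """Create `attempts` workspaces and return the names whose creation timed out."""
    stuck = []
    for i in range(attempts):
        name = f"{prefix}-{i}"
        if not try_create(name, template):
            stuck.append(name)
    return stuck
```

For example, `reproduce("nodejs")` would create `repro-0` through `repro-19` from the `nodejs` template and return the names that did not finish within the timeout.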
Environment
- Host OS: ghcr.io/coder/coder, kubernetes env
- Coder version: v2.23.1
Additional Context
No response