Description
Problem
If a prebuilt workspace's template uses `ignore_changes` as recommended in the docs, its agent may not reconnect after a workspace template upgrade.
For example:
    resource "docker_container" "workspace" {
      lifecycle {
        ignore_changes = all
      }
      count      = data.coder_workspace.me.start_count
      entrypoint = ["sh", "-c", coder_agent.main.init_script]
      env        = ["CODER_AGENT_TOKEN=${coder_agent.main.token}"]
      ...
    }
Details
A template upgrade kicks off a `start` build. `start` builds set `coder_workspace.start_count` to `1`, which is used in the `count` attribute of compute resources (see the example above). If the workspace is already started, any resources that already have `count = 1` will be updated in place rather than created anew.
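A minimal sketch of the `count` wiring described above, assuming the Coder Terraform provider's `coder_workspace` data source and the Docker provider's `docker_container` resource. The container name and image are placeholders chosen for illustration, and `coder_agent.main` is assumed to be declared elsewhere in the template.

```hcl
# Exposes the current build (including its transition) to the template.
data "coder_workspace" "me" {}

resource "docker_container" "workspace" {
  # start_count is 1 for a start build and 0 for a stop build:
  # - stop build: count drops to 0, so the container is destroyed.
  # - start build on a stopped workspace: count goes 0 -> 1, so it is created.
  # - start build on a running workspace: count is already 1, so Terraform
  #   attempts to update the existing instance in place instead.
  count = data.coder_workspace.me.start_count

  # Placeholder values, for illustration only.
  name  = "coder-${data.coder_workspace.me.name}"
  image = "codercom/enterprise-base:ubuntu"

  entrypoint = ["sh", "-c", coder_agent.main.init_script]
  env        = ["CODER_AGENT_TOKEN=${coder_agent.main.token}"]
}
```

The key point is the third case: a template upgrade on a running workspace never takes `count` through `0`, so nothing forces the container to be recreated on its own.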
A `start` build causes the `coder_agent` to be recreated, which generates a new auth token. Normally, without `ignore_changes`, the `env` attribute above would be modified, since the token value changes. `env` is immutable (i.e. defined as `ForceNew`), so Terraform treats any change to this attribute as requiring the resource to be replaced. This leads to the `docker_container` being recreated, and the agent starts afresh and connects to the control plane.
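For contrast, a sketch of the container with the `lifecycle` block removed, assuming the Coder and Docker Terraform providers, with `data.coder_workspace.me` and `coder_agent.main` declared elsewhere in the template. The container name and image are placeholders for illustration. Because the token is interpolated into `env`, a new token produces a diff on a `ForceNew` attribute, and Terraform plans a destroy-and-create of the container.

```hcl
resource "docker_container" "workspace" {
  count = data.coder_workspace.me.start_count

  # Placeholder values, for illustration only.
  name  = "coder-${data.coder_workspace.me.name}"
  image = "codercom/enterprise-base:ubuntu"

  entrypoint = ["sh", "-c", coder_agent.main.init_script]

  # The token changes whenever coder_agent is recreated. Since env is
  # ForceNew, Terraform plans a replacement rather than an in-place update,
  # and the new container starts an agent that uses the new token.
  env = ["CODER_AGENT_TOKEN=${coder_agent.main.token}"]

  # No lifecycle { ignore_changes = all } here: the replacement above is
  # exactly what ignore_changes suppresses in prebuild-ready templates.
}
```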
With `ignore_changes`, however, changes to these attributes are ignored (this is what allows prebuilds to work), so the template update has no real effect on the workspace's compute resource. The `coder_agent`'s token still changes, though, so the agent can no longer connect to the control plane on behalf of the workspace: it keeps using the previous token, while the control plane accepts only the new one.
Workaround
Manually restarting the workspace will allow the agent to reconnect successfully.
Proposed Solution
Template updates should not be `start` builds, but rather a logical restart (i.e. successive `stop` and `start` builds), in order to guarantee the behaviour customers expect. This should apply to claimed prebuilt workspaces AND regular workspaces alike, to guarantee that the compute resource is created anew. I fear the current `start`-only mechanism works by accident: Terraform's handling of the changed attribute happens to destroy and recreate the resource, just as a `stop` + `start` would.