coder/coder

Workspace timings are incomplete #16535

Open

Feature

Labels

observability: Issues related to observability (metrics, dashboards, alerts, opentelemetry)

Description

dannykopping opened on Feb 12, 2025

Bottom-line upfront:

As a user, I would expect all aspects of workspace provisioning to show up in the timings panel. Currently two important aspects are missing: compute instance boot time and agent download time. Either could add seconds or even minutes to my workspace startup time, meaning my actual experience is worse than what is being measured.

Since v2.17, Coder has provided timing information for workspace builds.

We currently capture:

- terraform initialization, graphing, planning, and applying
- agent connection and startup script execution

As a reminder, this is how workspaces start up:

1. terraform applies the template, creating resources
2. some form of compute (VM, container) resource is provisioned and boots up
3. that compute executes the agent init script (example: bootstrap_linux.sh)
4. this script downloads the agent binary from coderd
5. the agent binary starts up and connects to coderd

Timings are not currently captured for steps 2 and 4 (3 is not captured either, but it's not worth measuring).

Both of these steps could introduce serious latency from a user's perspective, so we have to capture them.

Additionally, if we have this new information we can use it to enhance the determination of workspace states. We could introduce new states like BOOTING UP and AGENT DOWNLOADING, which would go a long way to helping users understand what's happening with their workspaces.

This view is not terribly helpful right now.

Implementation ideas:

The binary which the agent downloads is bundled into the coder binary, and accessed via a file handler:

coder/site/site.go

Line 114 in 2ace044

mux.Handle("/bin/", http.StripPrefix("/bin", http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {

We inject the workspace metadata into the environment of the compute instance:

coder/provisioner/terraform/provision.go

Lines 257 to 263 in 9520da3

	"CODER_WORKSPACE_ID="+metadata.GetWorkspaceId(),
	"CODER_WORKSPACE_OWNER_ID="+metadata.GetWorkspaceOwnerId(),
	"CODER_WORKSPACE_OWNER_SESSION_TOKEN="+metadata.GetWorkspaceOwnerSessionToken(),
	"CODER_WORKSPACE_TEMPLATE_ID="+metadata.GetTemplateId(),
	"CODER_WORKSPACE_TEMPLATE_NAME="+metadata.GetTemplateName(),
	"CODER_WORKSPACE_TEMPLATE_VERSION="+metadata.GetTemplateVersion(),
	"CODER_WORKSPACE_BUILD_ID="+metadata.GetWorkspaceBuildId(),

Based on this, we could pass along the value of CODER_WORKSPACE_BUILD_ID when downloading the agent, and track download attempts against this record. We need to use the build ID and not the workspace ID since we need these timings on a per-build (technically per-provisioner-job) level like other timings.

Knowing when this request was made will allow us to calculate (without precision but close enough):

2: compute boot time = (_first_ agent binary download attempt time) - (terraform apply end time)
4: agent download time = (agent connection start time) - (_first_ agent binary download attempt time)

The bootstrap script will retry to download the agent binary if it fails, so we need to consider these in the timings. In both cases, we should use the time of the first attempt to download the agent binary, since this is a good proxy metric for when the compute instance has first booted and also represents the full time taken to download the agent (including retries).

Along with the timings we can also have a query which returns the number of download attempts which could be added somewhere in the UI, maybe even the tooltip of the download timings.

NOTE: It might not be worth it to measure each individual download attempt. We'd either have to hook the file server, or send another request from the bootstrap script (or some extra metadata in each request) to capture the times at which downloads failed. We can probably leave this out for now since it's probably not that useful; it can be a future enhancement.
