Workspace timings are incomplete #16535

Status: Open
Assignees: johnstcn
Labels: observability (issues related to observability: metrics, dashboards, alerts, opentelemetry)
@dannykopping

Description

Bottom-line upfront:

As a user, I would expect all aspects of workspace provisioning to show up in the timings panel. Currently two important aspects are missing: compute instance boot time and agent download time. Either of these could add seconds or even minutes to my workspace startup time, meaning that as a user my experience is worse than what is being measured.


Since v2.17, Coder has provided timing information for workspace builds.

We currently capture:

  • terraform initialization, graphing, planning, and applying
  • agent connection, and startup script execution

As a reminder, this is how workspaces start up:

  1. terraform applies the template, creating resources
  2. some form of compute (VM, container) resource is provisioned and boots up
  3. that compute executes the agent init script (example: bootstrap_linux.sh)
  4. this script downloads the agent binary from coderd
  5. the agent binary starts up and connects to coderd

Timings are not currently captured for steps 2 and 4 (3 is not captured either, but it's not worth measuring).

Both of these steps could introduce serious latency from a user's perspective, so we have to capture them.


Additionally, if we have this new information we can use it to improve how workspace states are determined. We could introduce new states like BOOTING UP and AGENT DOWNLOADING, which would go a long way toward helping users understand what's happening with their workspaces.

Image

This view is not terribly helpful right now.
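As a rough illustration of the state idea above, the stage could be derived from whichever per-build timestamps have been observed so far. This is only a sketch: `BuildEvents`, `DeriveStage`, and the `PROVISIONING` and `AGENT CONNECTING` stage names are hypothetical and not part of Coder's API; only BOOTING UP and AGENT DOWNLOADING come from the proposal.

```go
package main

import (
	"fmt"
	"time"
)

// WorkspaceStage is a hypothetical, finer-grained startup stage.
type WorkspaceStage string

const (
	StageProvisioning     WorkspaceStage = "PROVISIONING"      // assumed name
	StageBootingUp        WorkspaceStage = "BOOTING UP"        // from the proposal
	StageAgentDownloading WorkspaceStage = "AGENT DOWNLOADING" // from the proposal
	StageAgentConnecting  WorkspaceStage = "AGENT CONNECTING"  // assumed name
)

// BuildEvents holds the timestamps observed so far for one build.
// A zero time.Time means the event has not happened yet.
type BuildEvents struct {
	ApplyEnded           time.Time // terraform apply finished
	FirstDownloadAttempt time.Time // first request for the agent binary
	AgentConnectStarted  time.Time // agent began connecting to coderd
}

// DeriveStage returns the latest stage implied by the recorded events.
func DeriveStage(ev BuildEvents) WorkspaceStage {
	switch {
	case !ev.AgentConnectStarted.IsZero():
		return StageAgentConnecting
	case !ev.FirstDownloadAttempt.IsZero():
		return StageAgentDownloading
	case !ev.ApplyEnded.IsZero():
		return StageBootingUp
	default:
		return StageProvisioning
	}
}

// sampleEvents returns illustrative timestamps; the values are made up.
func sampleEvents() BuildEvents {
	applyEnded := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	return BuildEvents{
		ApplyEnded:           applyEnded,
		FirstDownloadAttempt: applyEnded.Add(42 * time.Second),
		AgentConnectStarted:  applyEnded.Add(50 * time.Second),
	}
}

func main() {
	full := sampleEvents()

	// Replay the events one at a time to show each stage transition.
	var ev BuildEvents
	fmt.Println(DeriveStage(ev))
	ev.ApplyEnded = full.ApplyEnded
	fmt.Println(DeriveStage(ev))
	ev.FirstDownloadAttempt = full.FirstDownloadAttempt
	fmt.Println(DeriveStage(ev))
	ev.AgentConnectStarted = full.AgentConnectStarted
	fmt.Println(DeriveStage(ev))
}
```

Checking the latest event first keeps the function correct even if an earlier timestamp was never recorded for some reason.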


Implementation ideas:

The binary which the agent downloads is bundled into the coder binary, and accessed via a file handler:

mux.Handle("/bin/", http.StripPrefix("/bin", http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {

We inject the workspace metadata into the environment of the compute instance:

"CODER_WORKSPACE_ID=" + metadata.GetWorkspaceId(),
"CODER_WORKSPACE_OWNER_ID=" + metadata.GetWorkspaceOwnerId(),
"CODER_WORKSPACE_OWNER_SESSION_TOKEN=" + metadata.GetWorkspaceOwnerSessionToken(),
"CODER_WORKSPACE_TEMPLATE_ID=" + metadata.GetTemplateId(),
"CODER_WORKSPACE_TEMPLATE_NAME=" + metadata.GetTemplateName(),
"CODER_WORKSPACE_TEMPLATE_VERSION=" + metadata.GetTemplateVersion(),
"CODER_WORKSPACE_BUILD_ID=" + metadata.GetWorkspaceBuildId(),

Based on this, we could pass along the value of CODER_WORKSPACE_BUILD_ID when downloading the agent, and track download attempts against this record. We need to use the build ID and not the workspace ID since we need these timings on a per-build (technically per-provisioner-job) level like other timings.
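A minimal sketch of what that tracking could look like, assuming the bootstrap script sends the build ID as a `build_id` query parameter. The parameter name, `DownloadTracker`, its in-memory map, and the `/bin/example-agent` path are all hypothetical; a real implementation would presumably persist attempts alongside the other provisioner job timings.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// attemptRecord stores the first attempt time and a running count.
type attemptRecord struct {
	first time.Time
	count int
}

// DownloadTracker is a hypothetical in-memory store keyed by build ID.
type DownloadTracker struct {
	mu      sync.Mutex
	records map[string]*attemptRecord
}

func NewDownloadTracker() *DownloadTracker {
	return &DownloadTracker{records: make(map[string]*attemptRecord)}
}

// Record notes a download attempt. Only the first attempt sets the
// timestamp; later attempts (retries) just increment the counter.
func (t *DownloadTracker) Record(buildID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	rec, ok := t.records[buildID]
	if !ok {
		rec = &attemptRecord{first: time.Now()}
		t.records[buildID] = rec
	}
	rec.count++
}

// Lookup returns the first attempt time and the number of attempts.
func (t *DownloadTracker) Lookup(buildID string) (first time.Time, attempts int, ok bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	rec, found := t.records[buildID]
	if !found {
		return time.Time{}, 0, false
	}
	return rec.first, rec.count, true
}

// Wrap records an attempt for requests carrying the assumed "build_id"
// query parameter, then delegates to the wrapped handler.
func (t *DownloadTracker) Wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		if id := r.URL.Query().Get("build_id"); id != "" {
			t.Record(id)
		}
		next.ServeHTTP(rw, r)
	})
}

func main() {
	tracker := NewDownloadTracker()

	// Stand-in for the real binary file handler.
	bin := http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		_, _ = rw.Write([]byte("agent binary placeholder"))
	})

	mux := http.NewServeMux()
	mux.Handle("/bin/", tracker.Wrap(http.StripPrefix("/bin", bin)))

	// Simulate an initial download plus one retry for the same build.
	for i := 0; i < 2; i++ {
		req := httptest.NewRequest(http.MethodGet, "/bin/example-agent?build_id=build-123", nil)
		mux.ServeHTTP(httptest.NewRecorder(), req)
	}

	_, attempts, _ := tracker.Lookup("build-123")
	fmt.Println("attempts:", attempts) // prints "attempts: 2"
}
```

Wrapping the existing handler keeps the file server itself untouched, and the same record yields both the first-attempt time and an attempt count.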

Knowing when this request was made will allow us to calculate (not precisely, but close enough):

2: compute boot time = (_first_ agent binary download attempt time) - (terraform apply end time)
4: agent download time = (agent connection start time) - (_first_ agent binary download attempt time)
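The two subtractions above can be sketched as follows. The `BuildTimestamps` type, `DeriveTimings`, and the sample values are hypothetical and only illustrate the arithmetic; they are not existing Coder code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// BuildTimestamps holds the three per-build events the formulas rely on.
type BuildTimestamps struct {
	ApplyEnded           time.Time // terraform apply end time
	FirstDownloadAttempt time.Time // first agent binary download attempt
	AgentConnectStarted  time.Time // agent connection start time
}

var ErrBadTimestamps = errors.New("timestamps are missing or out of order")

// DeriveTimings computes step 2 (compute boot time) and step 4
// (agent download time, including any retries).
func DeriveTimings(ts BuildTimestamps) (boot, download time.Duration, err error) {
	if ts.ApplyEnded.IsZero() || ts.FirstDownloadAttempt.IsZero() || ts.AgentConnectStarted.IsZero() {
		return 0, 0, ErrBadTimestamps
	}
	if ts.FirstDownloadAttempt.Before(ts.ApplyEnded) ||
		ts.AgentConnectStarted.Before(ts.FirstDownloadAttempt) {
		return 0, 0, ErrBadTimestamps
	}
	boot = ts.FirstDownloadAttempt.Sub(ts.ApplyEnded)
	download = ts.AgentConnectStarted.Sub(ts.FirstDownloadAttempt)
	return boot, download, nil
}

// sampleTimestamps returns made-up values for illustration.
func sampleTimestamps() BuildTimestamps {
	applyEnded := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	return BuildTimestamps{
		ApplyEnded:           applyEnded,
		FirstDownloadAttempt: applyEnded.Add(42 * time.Second),
		AgentConnectStarted:  applyEnded.Add(50 * time.Second),
	}
}

func main() {
	boot, download, err := DeriveTimings(sampleTimestamps())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("compute boot time:", boot)       // prints "compute boot time: 42s"
	fmt.Println("agent download time:", download) // prints "agent download time: 8s"
}
```

Rejecting missing or out-of-order timestamps avoids reporting negative durations, which could otherwise appear if clocks or event ordering are off.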

The bootstrap script will retry the agent binary download if it fails, so we need to account for retries in the timings. In both cases, we should use the time of the first attempt to download the agent binary, since this is a good proxy metric for when the compute instance has first booted and also represents the full time taken to download the agent (including retries).

Along with the timings, we could also have a query which returns the number of download attempts. This could be shown somewhere in the UI, maybe even in the tooltip of the download timings.

NOTE: It might not be worth it to measure each individual download attempt. We'd either have to hook the file server, or send another request from the bootstrap script (or some extra metadata in each request) to capture the times at which downloads failed. We can probably leave this out for now since it's probably not that useful; it can be a future enhancement.
