Scale Testing
Scaling Coder involves planning and testing to ensure it can handle more load without compromising service. This process encompasses infrastructure setup, traffic projections, and aggressive testing to identify and mitigate potential bottlenecks.
A dedicated Kubernetes cluster for Coder is recommended to configure, host, and manage Coder workloads. Kubernetes provides container orchestration capabilities, allowing Coder to efficiently deploy, scale, and manage workspaces across a distributed infrastructure. This ensures high availability, fault tolerance, and scalability for Coder deployments. Coder is deployed on this cluster using the Helm chart.
For more information about scaling, see our Coder scaling best practices.
Methodology
Our scale tests include the following stages:
1. Prepare environment: create expected users and provision workspaces.
2. SSH connections: establish user connections with agents, verifying their ability to echo back received content.
3. Web Terminal: verify the PTY connection used for communication with the Web Terminal.
4. Workspace application traffic: assess the handling of user connections with specific workspace apps, confirming their capability to echo back received content effectively.
5. Dashboard evaluation: verify the responsiveness and stability of Coder dashboards under varying load conditions. This is achieved by simulating user interactions using instances of headless Chromium browsers.
6. Cleanup: delete workspaces and users created in step 1.
Infrastructure and setup requirements
The scale tests runner can distribute the workload to overlap single scenarios based on the workflow configuration:
|                      | T0 | T1 | T2 | T3 | T4 | T5 | T6 |
|----------------------|----|----|----|----|----|----|----|
| SSH connections      | X  | X  | X  | X  |    |    |    |
| Web Terminal (PTY)   |    | X  | X  | X  | X  |    |    |
| Workspace apps       |    |    | X  | X  | X  | X  |    |
| Dashboard (headless) |    |    |    | X  | X  | X  | X  |
This pattern closely reflects how our customers naturally use the system. SSH connections are heavily utilized because they're the primary communication channel for IDEs via the VS Code and JetBrains plugins.
The basic setup of the scale test environment involves:
- Scale tests runner (32 vCPU, 128 GB RAM)
- Coder: 2 replicas (4 vCPU, 16 GB RAM)
- Database: 1 instance (2 vCPU, 32 GB RAM)
- Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)
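Summing the components above gives a rough total footprint for the test environment. A quick back-of-the-envelope check (a sketch in Python; the numbers are taken directly from the list above):

```python
# (vCPU, memory_gb, replicas) for each component of the test environment
components = {
    "runner": (32, 128, 1),
    "coder": (4, 16, 2),
    "database": (2, 32, 1),
    "provisioner": (0.5, 0.5, 50),  # 512 MB = 0.5 GB
}

total_vcpu = sum(cpu * n for cpu, _, n in components.values())
total_mem_gb = sum(mem * n for _, mem, n in components.values())
print(total_vcpu, total_mem_gb)  # 67.0 217.0
```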
The test is deemed successful if:
- Users did not experience interruptions in their workflows,
- `coderd` did not crash or require restarts, and
- No other internal errors were observed.
Traffic Projections
In our scale tests, we simulate activity from 2000 users, 2000 workspaces, and 2000 agents, with two items of workspace agent metadata being sent every 10 seconds. Here are the resulting metrics:
Coder:
- Median CPU usage for `coderd`: 3 vCPU, peaking at 3.7 vCPU while all tests are running concurrently.
- Median API request rate: 350 RPS during dashboard tests, 250 RPS during Web Terminal and workspace apps tests.
- 2000 agent API connections with latency: p90 at 60 ms, p95 at 220 ms.
- On average, 2400 WebSocket connections during dashboard tests.
Provisionerd:
- Median CPU usage is 0.35 vCPU during workspace provisioning.
Database:
- Median CPU utilization is 80%, with a significant portion dedicated to writing workspace agent metadata.
- Memory utilization averages at 40%.
- `write_ops_count` is between 6.7 and 8.4 operations per second.
Available reference architectures
Hardware recommendation
Control plane: coderd
To ensure stability and reliability of the Coder control plane, it's essential to focus on node sizing, resource limits, and the number of replicas. We recommend referencing public cloud providers such as AWS, GCP, and Azure for guidance on optimal configurations. A reasonable approach involves using scaling formulas based on factors like CPU, memory, and the number of users.
While the minimum requirements specify 1 CPU core and 2 GB of memory per `coderd` replica, we recommend that you allocate additional resources depending on the workload size to ensure deployment stability.
CPU and memory usage
Enabling agent stats collection (optional) may increase memory consumption.
Enabling direct connections between users and workspace agents (apps or SSH traffic) can help prevent an increase in CPU usage. It is recommended to keep this option enabled unless there are compelling reasons to disable it.
Inactive users do not consume Coder resources.
Scaling formula
When determining scaling requirements, consider the following factors:
- `1 vCPU x 2 GB memory` for every 250 users: a reasonable formula to determine resource allocation based on the number of users and their expected usage patterns.
- API latency/response time: monitor API latency and response times to ensure optimal performance under varying loads.
- Average number of HTTP requests: track the average number of HTTP requests to gauge system usage and identify potential bottlenecks.
- The number of proxied connections: for a very high number of proxied connections, more memory is required.
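To make the first rule of thumb concrete, the user-count formula can be sketched as a small sizing helper. This is illustrative only; the function name and ceiling-based rounding are our choices, while the 250-users-per-unit ratio comes from the formula above:

```python
import math

USERS_PER_UNIT = 250  # 1 vCPU x 2 GB memory for every 250 users

def coderd_sizing(users: int) -> dict:
    """Estimate aggregate coderd CPU and memory from the user count."""
    units = math.ceil(users / USERS_PER_UNIT)
    return {"vcpus": units, "memory_gb": 2 * units}

print(coderd_sizing(2000))  # {'vcpus': 8, 'memory_gb': 16}
```

For the 2000 simulated users in the scale tests, this yields 8 vCPU and 16 GB in aggregate, which you would then spread across your `coderd` replicas.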
HTTP API latency
For a reliable Coder deployment dealing with medium to high loads, it's important that API calls for workspace/template queries and workspace build operations respond within 300 ms. However, API template insights calls, which involve browsing workspace agent stats and user activity data, may require more time. Moreover, the Coder API exposes WebSocket long-lived connections for the Web Terminal (bidirectional) and workspace events/logs (unidirectional).
If the Coder deployment expects traffic from developers spread across the globe, be aware that customer-facing latency might be higher because of the distance between users and the load balancer. Fortunately, the latency can be improved with a deployment of Coder workspace proxies.
Node Autoscaling
We recommend disabling autoscaling for `coderd` nodes. Autoscaling can cause interruptions for user connections; see Autoscaling for more details.
Control plane: Workspace Proxies
When scaling workspace proxies, follow the same guidelines as for `coderd` above:

- `1 vCPU x 2 GB memory` for every 250 users.
- Disable autoscaling.
Control plane: provisionerd
Each external provisioner can run a single concurrent workspace build. For example, running 10 provisioner containers will allow 10 users to start workspaces at the same time.

By default, the Coder server runs 3 built-in provisioner daemons, but the Premium Coder release allows for running external provisioners to separate the load caused by workspace provisioning on the `coderd` nodes.
Scaling formula
When determining scaling requirements, consider the following factors:
- `1 vCPU x 1 GB memory x 2 concurrent workspace builds`: a formula to determine resource allocation based on the number of concurrent workspace builds and the standard complexity of a Terraform template. Rule of thumb: the more provisioners are free/available, the more concurrent workspace builds can be performed.
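Applied to a target build concurrency, the formula above can be sketched as follows (the helper name is illustrative; the per-2-builds ratio and the one-build-per-provisioner rule come from this section):

```python
import math

def provisioner_sizing(concurrent_builds: int) -> dict:
    """1 vCPU x 1 GB memory per 2 concurrent workspace builds.

    Each external provisioner handles one build at a time, so the
    replica count equals the target build concurrency.
    """
    units = math.ceil(concurrent_builds / 2)
    return {
        "provisioners": concurrent_builds,
        "total_vcpus": units,
        "total_memory_gb": units,
    }

print(provisioner_sizing(50))
# {'provisioners': 50, 'total_vcpus': 25, 'total_memory_gb': 25}
```

For 50 concurrent builds this works out to 25 vCPU and 25 GB in total, consistent with the 50 provisioner instances at 0.5 vCPU and 512 MB each used in the scale tests.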
Node Autoscaling
Autoscaling provisioners is not an easy problem to solve unless you can predict when the number of concurrent workspace builds will increase.

We recommend disabling autoscaling and adjusting the number of provisioners to developer needs based on the workspace build queuing time.
Data plane: Workspaces
To determine workspace resource limits and keep the best developer experience for workspace users, administrators must be aware of a few assumptions.
- Workspace pods run on the same Kubernetes cluster, but possibly in a different namespace or on a separate set of nodes.
- Workspace limits (per workspace user):
  - Evaluate the workspace utilization pattern. For instance, web application development does not require high CPU capacity at all times, but will spike during builds or testing.
  - Evaluate minimal limits for a single workspace. Include in the calculation the requirements for the Coder agent running in an idle workspace: 0.1 vCPU and 256 MB. For instance, developers can choose between 0.5-8 vCPUs and 1-16 GB memory.
Scaling formula
When determining scaling requirements, consider the following factors:
- `1 vCPU x 2 GB memory x 1 workspace`: a formula to determine resource allocation based on the minimal requirements for an idle workspace with a running Coder agent and occasional CPU and memory bursts for building projects.
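Under that formula, node-pool sizing reduces to fitting workspaces onto nodes by whichever resource runs out first. A sketch, assuming a hypothetical 8 vCPU / 32 GB node size (not a recommendation):

```python
import math

WS_VCPU, WS_MEM_GB = 1, 2  # 1 vCPU x 2 GB memory per workspace

def workspace_node_count(workspaces: int, node_vcpus: int, node_mem_gb: int) -> int:
    """Nodes needed so that neither CPU nor memory is oversubscribed."""
    by_cpu = math.ceil(workspaces * WS_VCPU / node_vcpus)
    by_mem = math.ceil(workspaces * WS_MEM_GB / node_mem_gb)
    return max(by_cpu, by_mem)

print(workspace_node_count(2000, node_vcpus=8, node_mem_gb=32))  # 250
```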
Node Autoscaling
Workspace nodes can be set to operate in autoscaling mode to mitigate the risk of prolonged high resource utilization.
One approach is to scale up workspace nodes when total CPU usage or memory consumption reaches 80%. Another option is to scale based on metrics such as the number of workspaces or active users. It's important to note that as new users onboard, the autoscaling configuration should account for ongoing workspaces.
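As a sketch, the first policy is just a threshold check over the node pool's aggregate utilization (the 80% threshold comes from the paragraph above; the function itself is illustrative):

```python
def should_scale_up(cpu_util: float, mem_util: float, threshold: float = 0.80) -> bool:
    """Trigger a scale-up when either CPU or memory crosses the threshold."""
    return cpu_util >= threshold or mem_util >= threshold

print(should_scale_up(0.85, 0.40))  # True
print(should_scale_up(0.50, 0.50))  # False
```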
Scaling down workspace nodes to zero is not recommended, as it will result in longer wait times for workspace provisioning by users. However, this may be necessary for workspaces with special resource requirements (e.g. GPUs) that incur significant cost overheads.